mirror of
https://github.com/torvalds/linux.git
synced 2026-05-28 17:13:52 +02:00
master
136 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
13f24586a2 |
arm64 updates for 7.1 (second round):
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a recent
commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation, uninitialised
variable and checker warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnmQZcACgkQa9axLQDI
XvGouA//SXlo7hQyM41rkgRru9oqftrGg0y6nxz4Z089kv50cm3Jlf/nUuti6vah
BMBLCGXA1iOQrGIVmuvtCxDRrfYZWpfKGuT9A0gmEoMqrGIpWl9gfBQG+uR+YrQX
4kp5DLqB85WrJIPiy7HUV6GQoCbFuMrRJwxl89IdWZSobaei3SczTmnttwyJtxG5
/BMitl024TYdiOPNo8bhiML1wIJCaTHvH4IrtCHPyUHEAtsHSMy00y0OrSKBtA/9
ZHZRpY7Po/jnL7YUs1AfYwsaSXjkvqXN0K1Tdavzm75k6lpJmbM3VsZabG/CEuvK
PCOGV++is4Y/A+7aQsCwXKeVnY3b6AC4sextytNq0g3GZ7I+Ht9O6nbsp5ZmyXzB
HRiFxmFS1pSQOMX9f1neKi3vxDMTy1tKPeccTTzL8dNnxTvUBXnoWfPoJh3cpbjm
Dbhe1kksiEn01WWFacGtkIPDa9c+Bkd2T+8wrsk85Z+u3Z0JPM5PfOn6v3X9YlKl
K7W8fhvlDL1wP+iyWcMT5zdo+xzHY4ZxuyWbi9a4RhKc6lFHVVG2mpUuPwSsh2ma
NnxkDouriuoADHBir89U71N483HSnNfSjhlVSFYD2LFCre5KOZM4KYZ2vwWb8Sy4
79q+BlVRUTQ5O6XjePoSPjUW4APPNviHJsF4E4IiqHkd9O5lMZU=
=LNY2
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"The main 'feature' is a workaround for C1-Pro erratum 4193714
requiring IPIs during TLB maintenance if a process is running in user
space with SME enabled.
The hardware acknowledges the DVMSync messages before completing
in-flight SME accesses, with security implications. The workaround
makes use of the mm_cpumask() to track the cores that need
interrupting (arm64 hasn't used this mask before).
The rest are fixes for MPAM, CCA and generated header that turned up
during the merging window or shortly before.
Summary:
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a
recent commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation,
uninitialised variable and checker warnings"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
|
||
|
|
858fbd7248 |
Merge branch 'for-next/c1-pro-erratum-4193714' into for-next/core
* for-next/c1-pro-erratum-4193714:
: Work around C1-Pro erratum 4193714 (CVE-2026-0995)
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
|
||
|
|
f1d26d72f0 |
IOMMU Updates for Linux v7.1:
Including:
- Core:
- Support for RISC-V IO-page-table format in generic iommupt
code
- ARM-SMMU Updates:
- Introduction of an "invalidation array" for SMMUv3, which enables
future scalability work and optimisations for devices with a large
number of SMMUv3 instances.
- Update the conditions under which the SMMUv3 driver works around
hardware errata for invalidation on MMU-700 implementations.
- Fix broken command filtering for the host view of NVIDIA's "cmdqv"
SMMUv3 extension.
- MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs.
- Intel VT-d:
- Support for dirty tracking on domains attached to PASID
- Removal of unnecessary read*()/write*() wrappers
- Improvements to the invalidation paths
- AMD Vi:
- Race-condition fixed in debugfs code
- Make log buffer allocation NUMA aware
- RISC-V:
- IO-TLB flushing improvements
- Minor fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmneJdAACgkQK/BELZcB
GuO3TQ//cWG59NaY32ByWMJTug+DnlmunPG4xlUrD0JQCJXIcHrzKruhet4NifJL
lE86+vN+s6I9EQgabmiDJPRZkZR20irDFMvuhfhVp8hprwxoan/fW7b4WY8Es0DM
Q/AUioacyULBuKIl5XeEU1kAvKbgj2mlaWERTVKknh0jyItWwkEDvfR7G3eYIKDV
dNhVjuow1byKnjrhE4Rslqz7HDalJVkDiADAdfbkV5+/HxqBJ19r84STW2YPkYLE
ARIdwv3AI+NZfk9O0sBavMBs5v2nyNqr1j6kEiKd8hoFYOXY2Da1pWRSflTgFWil
dYlXQZNGsznghBuc3VsqW75CMbSFtpWPx0LcQ0ClZUE11zxh+pfc1BekzvwHzunZ
CTOcWastVkYHlvZTi3zHz68puy1omblN3r5juQhKOjQL+8N9BtRL4FObwj9XH8R0
5Q9Fm2uWYr/1GCNZg0OvJNtQllxw8BTx/ssvof6sbq6P7OeT4cgErtqA9aQTdKuT
6EvsW74GIllTNk2yuNHvQTpNWcW9iW7cEgwPaN/S5cXPZHZXhLjimOUGJkPmP3NY
k2gcPAMkXi8MqC/gAr0kCDfxZPp5V8XWXI0lHd9qjxMvjZbLQlf0y6LArfA0hmQk
oMzMJQuF81CK3+6fY4fXePOR5y0r+eWSCKYSr0HeXIH8eUvsrsY=
=T6YT
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core:
- Support for RISC-V IO-page-table format in generic iommupt code
ARM-SMMU Updates:
- Introduction of an "invalidation array" for SMMUv3, which enables
future scalability work and optimisations for devices with a large
number of SMMUv3 instances
- Update the conditions under which the SMMUv3 driver works around
hardware errata for invalidation on MMU-700 implementations
- Fix broken command filtering for the host view of NVIDIA's "cmdqv"
SMMUv3 extension
- MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi
SoCs
Intel VT-d:
- Support for dirty tracking on domains attached to PASID
- Removal of unnecessary read*()/write*() wrappers
- Improvements to the invalidation paths
AMD Vi:
- Race-condition fixed in debugfs code
- Make log buffer allocation NUMA aware
RISC-V:
- IO-TLB flushing improvements
- Minor fixes"
* tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits)
iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY
dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC
iommu/amd: Invalidate IRT cache for DMA aliases
iommu/riscv: Remove overflows on the invalidation path
iommu/amd: Fix clone_alias() to use the original device's devid
iommu/vt-d: Remove the remaining pages along the invalidation path
iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages
iommu/vt-d: Split piotlb invalidation into range and all
iommu/vt-d: Remove dmar_writel() and dmar_writeq()
iommu/vt-d: Remove dmar_readl() and dmar_readq()
iommufd/selftest: Test dirty tracking on PASID
iommu/vt-d: Support dirty tracking on PASID
iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer
iommu/vt-d: Block PASID attachment to nested domain with dirty tracking
iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain
iommu/riscv: Fix signedness bug
iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs
iommu/amd: Fix illegal device-id access in IOMMU debugfs
iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement
iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init()
...
|
||
|
|
0baba94a97 |
arm64: errata: Work around early CME DVMSync acknowledgement
C1-Pro acknowledges DVMSync messages before completing the SME/CME memory accesses. Work around this by issuing an IPI to the affected CPUs if they are running in EL0 with SME enabled. Note that we avoid the local DSB in the IPI handler as the kernel runs with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory accesses at EL0 on taking an exception to EL1. On the return to user path, no barrier is necessary either. See the comment in sme_set_active() and the more detailed explanation in the link below. To avoid a potential IPI flood from malicious applications (e.g. madvise(MADV_PAGEOUT) in a tight loop), track where a process is active via mm_cpumask() and only interrupt those CPUs. Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3 Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
|
|
4ce0a2ccc0 |
arm64: mpam: Add initial MPAM documentation
MPAM (Memory Partitioning and Monitoring) is now exposed to user-space via resctrl. Add some documentation so the user knows what features to expect. Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com> |
||
|
|
aeb8595a5f |
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears. This tells us the monitor never finishes scanning the cache. The erratum document says to wait the maximum time, then ignore the field. Add a flag to indicate whether this is the final attempt to read the counter, and when this quirk is applied, ignore the NRDY field. This means accesses to this counter will always retry, even if the counter was previously programmed to the same values. The counter value is not expected to be stable, it drifts up and down with each allocation and eviction. The CSU register provides the value for a point in time. Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com> |
||
|
|
dc48eb1ff2 |
arm_mpam: Add workaround for T241-MPAM-6
The registers MSMON_MBWU_L and MSMON_MBWU return the number of requests rather than the number of bytes transferred. Bandwidth resource monitoring is performed at the last level cache, where each request arrive in 64Byte granularity. The current implementation returns the number of transactions received at the last level cache but does not provide the value in bytes. Scaling by 64 gives an accurate byte count to match the MPAM specification for the MSMON_MBWU and MSMON_MBWU_L registers. This patch fixes the issue by reporting the actual number of bytes instead of the number of transactions from __ris_msmon_read(). Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com> |
||
|
|
a7efe23ed6 |
arm_mpam: Add workaround for T241-MPAM-4
In the T241 implementation of memory-bandwidth partitioning, in the absence of contention for bandwidth, the minimum bandwidth setting can affect the amount of achieved bandwidth. Specifically, the achieved bandwidth in the absence of contention can settle to any value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set zero (below 0.78125%), once a core enters a throttled state, it will never leave that state. The first issue is not a concern if the MPAM software allows to program MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed. In the scenario where the resctrl doesn't support the MBW_MIN interface via sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention, software should configure a relatively narrow gap between MBW_MIN and MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem. Clear the feature MBW_MIN feature from the class to ensure we don't accidentally change behaviour when resctrl adds support for a MBW_MIN interface. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com> |
||
|
|
70e81fbedc |
arm_mpam: Add workaround for T241-MPAM-1
The MPAM bandwidth partitioning controls will not be correctly configured, and hardware will retain default configuration register values, meaning generally that bandwidth will remain unprovisioned. To address the issue, follow the below steps after updating the MBW_MIN and/or MBW_MAX registers. - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets (0x360048 + slice*0x10000 + partid*8). These registers are read-only. - Continue iterating until all 12 shadow register values match in a loop. pr_warn_once if the values fail to match within the loop count 1000. - Perform 64b writes with the value 0x0 to the two spare registers at offsets 0x1b0000 and 0x1c0000. In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers are transformed into broadcast writes to the 12 shadow registers. The final two writes to the spare registers cause a final rank of downstream micro-architectural MPAM registers to be updated from the shadow copies. The intervening loop to read the 12 shadow registers helps avoid a race condition where writes to the spare registers occur before all shadow registers have been updated. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com> |
||
|
|
3b79398383 |
iommu/arm-smmu-v3: Update Arm errata
MMU-700 r1p1 has subsequently fixed some of the errata for which we've been applying the workarounds unconditionally, so we can now make those conditional. However, there have also been some more new cases identified where we must rely on range invalidation commands, and thus still nominally avoid DVM being inadvertently enabled. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
72c395024d |
A slightly calmer cycle for docs this time around, though there is still a
fair amount going on, including:
- Some signs of life on the long-moribund Japanese translation
- Documentation on policies around the use of generative tools for patch
submissions, and a separate document intended for consumption by
generative tools.
- The completion of the move of the documentation tools to tools/docs. For
now we're leaving a /scripts/kernel-doc symlink behind to avoid breaking
scripts.
- Ongoing build-system work includes the incorporation of documentation in
Python code, better support for documenting variables, and lots of
improvements and fixes.
- Automatic linking of man-page references -- cat(1), for example -- to
the online pages in the HTML build.
...and the usual array of typo fixes and such.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCgAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmmKM8YPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YLK4H/2gqVxY8wKbVymiB95/zudiba8EtWlKE4hZl
KAd4+csZ8RCTMxHJLI23FXOi56CYr3XOQol0DIDUGimQiQx/Cxa2QDWewpkqbNH1
tHPTaNWAj16wKzrZxXhWt+6FoBHd7wrqilLH180IRmezRhu+7kURQ5XEAAXfK08B
CfDXBsCpnGsKn+m72x04cpvnsf/iK3pznbKrZ0ZYGIoaZb6+BV3+jqVaLROWSQZt
Nvt1rYjsi0vaeNapbQL8q72UJ/+zO4nK9am13s7p20zD+jUVY48yfQB/ZqvHp/1L
aymcJUCq0h5sSOHnfSqY5MTZUR/0CK+npRcEPgDYzLBigc9XU9g=
=hHq1
-----END PGP SIGNATURE-----
Merge tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux
Pull documentation updates from Jonathan Corbet:
"A slightly calmer cycle for docs this time around, though there is
still a fair amount going on, including:
- Some signs of life on the long-moribund Japanese translation
- Documentation on policies around the use of generative tools for
patch submissions, and a separate document intended for consumption
by generative tools
- The completion of the move of the documentation tools to
tools/docs. For now we're leaving a /scripts/kernel-doc symlink
behind to avoid breaking scripts
- Ongoing build-system work includes the incorporation of
documentation in Python code, better support for documenting
variables, and lots of improvements and fixes
- Automatic linking of man-page references -- cat(1), for example --
to the online pages in the HTML build
...and the usual array of typo fixes and such"
* tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (107 commits)
doc: development-process: add notice on testing
tools: sphinx-build-wrapper: improve its help message
docs: sphinx-build-wrapper: allow -v override -q
docs: kdoc: Fix pdfdocs build for tools
docs: ja_JP: process: translate 'Obtain a current source tree'
docs: fix 're-use' -> 'reuse' in documentation
docs: ioctl-number: fix a typo in ioctl-number.rst
docs: filesystems: ensure proc pid substitutable is complete
docs: automarkup.py: Skip common English words as C identifiers
Documentation: use a source-read extension for the index link boilerplate
docs: parse_features: make documentation more consistent
docs: add parse_features module documentation
docs: jobserver: do some documentation improvements
docs: add jobserver module documentation
docs: kabi: helpers: add documentation for each "enum" value
docs: kabi: helpers: add helper for debug bits 7 and 8
docs: kabi: system_symbols: end docstring phrases with a dot
docs: python: abi_regex: do some improvements at documentation
docs: python: abi_parser: do some improvements at documentation
docs: add kabi modules documentation
...
|
||
|
|
45bf4bc87c |
arm64 updates for 7.0
ACPI:
- Add interrupt signalling support to the AGDI handler.
- Add Catalin and myself to the arm64 ACPI MAINTAINERS entry.
CPU features:
- Drop Kconfig options for PAN and LSE (these are detected at runtime).
- Add support for 64-byte single-copy atomic instructions (LS64/LS64V).
- Reduce MTE overhead when executing in the kernel on Ampere CPUs.
- Ensure POR_EL0 value exposed via ptrace is up-to-date.
- Fix error handling on GCS allocation failure.
CPU frequency:
- Add CPU hotplug support to the FIE setup in the AMU driver.
Entry code:
- Minor optimisations and cleanups to the syscall entry path.
- Preparatory rework for moving to the generic syscall entry code.
Hardware errata:
- Work around Spectre-BHB on TSV110 processors.
- Work around broken CMO propagation on some systems with the SI-L1
interconnect.
Miscellaneous:
- Disable branch profiling for arch/arm64/ to avoid issues with noinstr.
- Minor fixes and cleanups (kexec + ubsan, WARN_ONCE() instead of
WARN_ON(), reduction of boolean expression).
- Fix custom __READ_ONCE() implementation for LTO builds when operating
on non-atomic types.
Perf and PMUs:
- Support for CMN-600AE.
- Be stricter about supported hardware in the CMN driver.
- Support for DSU-110 and DSU-120.
- Support for the cycles event in the DSU driver (alongside the
dedicated cycles counter).
- Use IRQF_NO_THREAD instead of IRQF_ONESHOT in the cxlpmu driver.
- Use !bitmap_empty() as a faster alternative to bitmap_weight().
- Fix SPE error handling when failing to resume profiling.
Selftests:
- Add support for the FORCE_TARGETS option to the arm64 kselftests.
- Avoid nolibc-specific my_syscall() function.
- Add basic test for the LS64 HWCAP.
- Extend fp-pidbench to cover additional workload patterns.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmmJ16QQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNFO0CACxYy5klTAdqB+RjhxSbibZ6mmch/mX3zLl
qBp0HeKuwSv+EWEblmr6KMrr/S5YjtEMboOC7GnmEdSeM6hucu5gPiEb7bIDQfbB
5KJXoUlN1C+7Na4CimfordWiHIZw2d7mTumJcnsqMo7XN83t+4yqs98sET7WRCy5
WUyqHH/dGZsGcsBMrRRO3UriHCpU425GE0jopF5OkwCs4dwyUVIV8SmZtK4Ovi3u
Kin75UYgeVum8A218oHosTfYTBt4p/cSdfVVv5f3P14Q7lIstyJR7ZDqepTP2XSJ
95W3nuMfOT1lx7f6pqvHc+4Ccl/+DQnbW1PWwv8f3nXAKVVMafWf
=3SfC
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's a little less than normal, probably due to LPC & Christmas/New
Year meaning that a few series weren't quite ready or reviewed in
time. It's still useful across the board, despite the only real
feature being support for the LS64 feature enabling 64-byte atomic
accesses to endpoints that support it.
ACPI:
- Add interrupt signalling support to the AGDI handler
- Add Catalin and myself to the arm64 ACPI MAINTAINERS entry
CPU features:
- Drop Kconfig options for PAN and LSE (these are detected at runtime)
- Add support for 64-byte single-copy atomic instructions (LS64/LS64V)
- Reduce MTE overhead when executing in the kernel on Ampere CPUs
- Ensure POR_EL0 value exposed via ptrace is up-to-date
- Fix error handling on GCS allocation failure
CPU frequency:
- Add CPU hotplug support to the FIE setup in the AMU driver
Entry code:
- Minor optimisations and cleanups to the syscall entry path
- Preparatory rework for moving to the generic syscall entry code
Hardware errata:
- Work around Spectre-BHB on TSV110 processors
- Work around broken CMO propagation on some systems with the SI-L1
interconnect
Miscellaneous:
- Disable branch profiling for arch/arm64/ to avoid issues with
noinstr
- Minor fixes and cleanups (kexec + ubsan, WARN_ONCE() instead of
WARN_ON(), reduction of boolean expression)
- Fix custom __READ_ONCE() implementation for LTO builds when
operating on non-atomic types
Perf and PMUs:
- Support for CMN-600AE
- Be stricter about supported hardware in the CMN driver
- Support for DSU-110 and DSU-120
- Support for the cycles event in the DSU driver (alongside the
dedicated cycles counter)
- Use IRQF_NO_THREAD instead of IRQF_ONESHOT in the cxlpmu driver
- Use !bitmap_empty() as a faster alternative to bitmap_weight()
- Fix SPE error handling when failing to resume profiling
Selftests:
- Add support for the FORCE_TARGETS option to the arm64 kselftests
- Avoid nolibc-specific my_syscall() function
- Add basic test for the LS64 HWCAP
- Extend fp-pidbench to cover additional workload patterns"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (43 commits)
perf/arm-cmn: Reject unsupported hardware configurations
perf: arm_spe: Properly set hw.state on failures
arm64/gcs: Fix error handling in arch_set_shadow_stack_status()
arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
arm64: poe: fix stale POR_EL0 values for ptrace
kselftest/arm64: Raise default number of loops in fp-pidbench
kselftest/arm64: Add a no-SVE loop after SVE in fp-pidbench
perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD
arm64: mte: Set TCMA1 whenever MTE is present in the kernel
arm64/ptrace: Return early for ptrace_report_syscall_entry() error
arm64/ptrace: Split report_syscall()
arm64: Remove unused _TIF_WORK_MASK
kselftest/arm64: Add missing file in .gitignore
arm64: errata: Workaround for SI L1 downstream coherency issue
kselftest/arm64: Add HWCAP test for FEAT_LS64
arm64: Add support for FEAT_{LS64, LS64_V}
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
...
|
||
|
|
79c0abaf06 |
sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
When none of the allowed CPUs of a task are online, it gets migrated to the fallback cpumask which is all the non nohz_full CPUs. However just like nohz_full CPUs, domain isolated CPUs don't want to be disturbed by tasks that have lost their CPU affinities. And since nohz_full rely on domain isolation to work correctly, the housekeeping mask of domain isolated CPUs should always be a subset of the housekeeping mask of nohz_full CPUs (there can be CPUs that are domain isolated but not nohz_full, OTOH there shouldn't be nohz_full CPUs that are not domain isolated): HK_TYPE_DOMAIN & HK_TYPE_KERNEL_NOISE == HK_TYPE_DOMAIN Therefore use HK_TYPE_DOMAIN as the appropriate fallback target for tasks. Note that cpuset isolated partitions are not supported on those systems and may result in undefined behaviour. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org |
||
|
|
78a00cac1e |
docs: fix 're-use' -> 'reuse' in documentation
Signed-off-by: Rhys Tumelty <rhys@tumelty.co.uk> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260128220233.179439-1-rhys@tumelty.co.uk> |
||
|
|
2f8aed5e97 |
Merge branch 'for-next/errata' into for-next/core
* for-next/errata: arm64: errata: Workaround for SI L1 downstream coherency issue |
||
|
|
a592a36e49 |
Documentation: use a source-read extension for the index link boilerplate
The root document usually has a special :ref:`genindex` link to the generated index. This is also the case for Documentation/index.rst. The other index.rst files deeper in the directory hierarchy usually don't. For SPHINXDIRS builds, the root document isn't Documentation/index.rst, but some other index.rst in the hierarchy. Currently they have a ".. only::" block to add the index link when doing SPHINXDIRS html builds. This is obviously very tedious and repetitive. The link is also added to all index.rst files in the hierarchy for SPHINXDIRS builds, not just the root document. Put the boilerplate in a sphinx-includes/subproject-index.rst file, and include it at the end of the root document for subproject builds in an ad-hoc source-read extension defined in conf.py. For now, keep having the boilerplate in translations, because this approach currently doesn't cover translated index link headers. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Tested-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> [jc: did s/doctree/kern_doc_dir/ ] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260123143149.2024303-1-jani.nikula@intel.com> |
||
|
|
3fed7e0059 |
arm64: errata: Workaround for SI L1 downstream coherency issue
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or clean&invalidate by virtual address (DC CVAC or DC
CIVAC), that hits on unique dirty data in the CPU or DSU cache.
This results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The implementation of workaround put a second loop of CMOs at the same
virtual address whose operation meet erratum conditions to wait until
cache data be cleaned to PoC. This way of implementation mitigates
performance penalty compared to purely duplicate original CMO.
Signed-off-by: Lucas Wei <lucaswei@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
58ce78667a |
arm64: Add support for FEAT_{LS64, LS64_V}
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
These features are identified by ID_AA64ISAR1_EL1.LS64 and the
use of such instructions in userspace (EL0) can be trapped.
As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell
apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later
patch will be exported to userspace, FEAT_LS64_V will be enabled only
in kernel.
In order to support the use of corresponding instructions in userspace:
- Make ID_AA64ISAR1_EL1.LS64 visbile to userspace
- Add identifying and enabling in the cpufeature list
- Expose these support of these features to userspace through HWCAP3
and cpuinfo
ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for
special memory (device memory) so requires support by the CPU, system
and target memory location (device that support these instructions).
The HWCAP3_LS64, implies the support of CPU and system (since no
identification method from system, so SoC vendors should advertise
support in the CPU if system also support them).
Otherwise for ld64b/st64b the atomicity may not be guaranteed or a
DABT will be generated, so users (probably userspace driver developer)
should make sure the target memory (device) also have the support.
For st64bv 0xffffffffffffffff will be returned as status result for
unsupported memory so user should check it.
Document the restrictions along with HWCAP3_LS64.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
17c05cb0ef |
Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY perf/arm-ni: Fix and optimise register offset calculation perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() perf/arm-ni: Add NoC S3 support perf/arm_cspmu: nvidia: Add pmevfiltr2 support perf/arm_cspmu: nvidia: Add revision id matching perf/arm_cspmu: Add pmpidr support perf/arm_cspmu: Add callback to reset filter config perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores * for-next/misc: : Miscellaneous patches arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index arm64: mm: make linear mapping permission update more robust for patial range arm64/mm: Elide TLB flush in certain pte protection transitions arm64/mm: Rename try_pgd_pgtable_alloc_init_mm arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arm64: add unlikely hint to MTE async fault check in el0_svc_common arm64: acpi: add newline to deferred APEI warning arm64: entry: Clean out some indirection arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz() arm64: remove unused ARCH_PFN_OFFSET arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack arm64: Remove assertion on CONFIG_VMAP_STACK * for-next/kselftest: : arm64 kselftest patches kselftest/arm64: Align zt-test register dumps * for-next/efi-preempt: : arm64: Make EFI calls preemptible arm64/efi: Call EFI runtime services without disabling preemption arm64/efi: Move uaccess en/disable out of efi_set_pgd() arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper arm64/fpsimd: Permit kernel mode NEON with IRQs off arm64/fpsimd: Don't warn when EFI execution context is preemptible efi/runtime-wrappers: Keep track of the efi_runtime_lock owner efi: Add missing static initializer for efi_mm::cpus_allowed_lock * for-next/assembler-macro: : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers * for-next/typos: : Random typo/spelling fixes arm64: Fix double word in comments arm64: Fix typos and spelling errors in comments * for-next/sme-ptrace-disable: : Support disabling streaming mode via ptrace on SME only systems kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems arm64/sme: Support disabling streaming mode via ptrace on SME only systems * for-next/local-tlbi-page-reused: : arm64, mm: avoid TLBI broadcast if page reused in write fault arm64, tlbflush: don't TLBI broadcast if page reused in write fault mm: add spurious fault fixing support for huge pmd * for-next/mpam: (34 commits) : Basic Arm MPAM driver (more to follow) MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state arm_mpam: Use long MBWU counters if supported arm_mpam: Probe for long/lwd mbwu counters arm_mpam: Consider overflow in bandwidth counter state arm_mpam: Track bandwidth counter state for power management arm_mpam: Add mpam_msmon_read() to read monitor value arm_mpam: Add helpers to allocate monitors arm_mpam: Probe and reset the rest of the features arm_mpam: Allow configuration to be applied and restored during cpu online arm_mpam: Use a static key to indicate when mpam is enabled arm_mpam: Register and enable IRQs arm_mpam: Extend reset logic to allow devices to be reset any time arm_mpam: Add a helper to touch an MSC from any CPU arm_mpam: Reset MSC controls from cpuhp callbacks arm_mpam: Merge supported features during mpam_enable() into mpam_class arm_mpam: Probe the hardware features resctrl supports arm_mpam: Add helpers for managing the locking around the mon_sel registers ... * for-next/acpi: : arm64 acpi updates ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() * for-next/documentation: : arm64 Documentation updates Documentation/arm64: Fix the typo of register names |
||
|
|
4b7a59fa70 |
Documentation/arm64: Fix the typo of register names
The register name 'HWFGWTR_EL2' and 'HWFGRTR_EL2' is wrong, should be 'HFGWTR_EL2' and 'HFGRTR_EL2'. Find the register description on arm website here, https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGWTR-EL2--Hypervisor-Fine-Grained-Write-Trap-Register https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGRTR-EL2--Hypervisor-Fine-Grained-Read-Trap-Register?lang=en Signed-off-by: Zenon Xiu <zenonxiu@outlook.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
|
|
472800cd5e |
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
Currently it is not possible to disable streaming mode via ptrace on SME only systems, the interface for doing this is to write via NT_ARM_SVE but such writes will be rejected on a system without SVE support. Enable this functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such writes currently error since we require that a vector length is specified which should minimise the risk that existing software is relying on current behaviour. Reads are not supported since I am not aware of any use case for this and there is some risk that an existing userspace application may be confused if it reads NT_ARM_SVE on a system without SVE. Existing kernels will return FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not stored, for example if the task has not used SVE. Returning a vector length of 0 would create a risk that software would try to do things like allocate space for register state with zero sizes, while returning a vector length of 128 bits would look like SVE is supported. It seems safer to just not make the changes to add read support. It remains possible for userspace to detect a SME only system via the ptrace interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in non-streaming mode is available via REGSET_FPR. sve_set_common() already avoids allocating SVE storage when doing a FPSIMD formatted write and allocating SME storage when doing a NT_ARM_SVE write so we change the function to validate the new case and skip setting a vector length for it. The aim is to make a minimally invasive change, no operation that would previously have succeeded will be affected, and we use a previously defined interface in new circumstances rather than define completely new ABI. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: David Spickett <david.spickett@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
|
|
f2d64a22fa |
Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (29 commits) perf/dwc_pcie: Fix use of uninitialized variable Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU Documentation: hisi-pmu: Fix of minor format error drivers/perf: hisi: Add support for L3C PMU v3 drivers/perf: hisi: Refactor the event configuration of L3C PMU drivers/perf: hisi: Extend the field of tt_core drivers/perf: hisi: Extract the event filter check of L3C PMU drivers/perf: hisi: Simplify the probe process of each L3C PMU version drivers/perf: hisi: Export hisi_uncore_pmu_isr() drivers/perf: hisi: Relax the event ID check in the framework perf: Fujitsu: Add the Uncore PMU driver perf/arm-cmn: Fix CMN S3 DTM offset perf: arm_spe: Prevent overflow in PERF_IDX2OFF() coresight: trbe: Prevent overflow in PERF_IDX2OFF() MAINTAINERS: Remove myself from HiSilicon PMU maintainers drivers/perf: hisi: Add support for HiSilicon MN PMU driver drivers/perf: hisi: Add support for HiSilicon NoC PMU perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS arm64/boot: Factor out a macro to check SPE version ... |
||
|
|
e0669b95f7 |
Merge branch 'for-next/docs' into for-next/core
* for-next/docs: arm64/sme: Drop inaccurate documentation of streaming mode switches |
||
|
|
0c33aa1804 |
arm64: errata: Apply workarounds for Neoverse-V3AE
Neoverse-V3AE is also affected by erratum #3312417, as described in its Software Developer Errata Notice (SDEN) document: Neoverse V3AE (MP172) SDEN v9.0, erratum 3312417 https://developer.arm.com/documentation/SDEN-2615521/9-0/ Enable the workaround for Neoverse-V3AE, and document this. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
00d7a1af5a |
arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
SPE data source filtering (optional from Armv8.8) requires that traps to the filter register PMSDSFR be disabled. Document the requirements and disable the traps if the feature is present. Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
220928e52c |
arm64/hwcap: Add hwcap for FEAT_LSFE
FEAT_LSFE (Large System Float Extension), providing atomic floating point memory operations, is optional from v9.5. This feature adds no new architectural stare and we have no immediate use for it in the kernel so simply provide a hwcap for it to support discovery by userspace. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
3c0979c644 |
arm64/sme: Drop inaccurate documentation of streaming mode switches
The SME ABI documentation contains an inaccurate description of the architectural streaming mode entry/exit behaviour, just remove it since this is better documented by the architecture or with the rest of the documentation for the specific software interfaces concerned. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
63eb28bb14 |
ARM:
- Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes. LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups. RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo. - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone. - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list). - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC. - Various cleanups and fixes. x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still). - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care. - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance. - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data. Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique. - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions. - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL. - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely. - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation. Selftests: - Fix a comment typo. - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing). - Skip tests that hit EACCES when attempting to access a file, and rpint a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw== =j2At -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ... |
||
|
|
5b1ae9de71 |
Merge branch 'for-next/feat_mte_store_only' into for-next/core
* for-next/feat_mte_store_only: : MTE feature to restrict tag checking to store only operations kselftest/arm64/mte: Add MTE_STORE_ONLY testcases kselftest/arm64/mte: Preparation for mte store only test kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test KVM: arm64: Expose MTE_STORE_ONLY feature to guest arm64/hwcaps: Add MTE_STORE_ONLY hwcaps arm64/kernel: Support store-only mte tag check prctl: Introduce PR_MTE_STORE_ONLY arm64/cpufeature: Add MTE_STORE_ONLY feature |
||
|
|
3ae8cef210 |
Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', 'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits) drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation perf/cxlpmu: Remove unintended newline from IRQ name format string perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe() perf: arm_spe: Relax period restriction perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) KVM: arm64: nvhe: Disable branch generation in nVHE guests arm64: Handle BRBE booting requirements arm64/sysreg: Add BRBE registers and fields perf/arm: Add missing .suppress_bind_attrs perf/arm-cmn: Reduce stack usage during discovery perf: imx9_perf: make the read-only array mask static const perf/arm-cmn: Broaden module description for wider interconnect support ... * for-next/livepatch: : Support for HAVE_LIVEPATCH on arm64 arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: Implement HAVE_LIVEPATCH arm64: stacktrace: Implement arch_stack_walk_reliable() arm64: stacktrace: Check kretprobe_find_ret_addr() return value arm64/module: Use text-poke API for late relocations. * for-next/user-contig-bbml2: : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts arm64/mm: Elide tlbi in contpte_convert() under BBML2 iommu/arm: Add BBM Level 2 smmu feature arm64: Add BBM Level 2 cpu feature arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type * for-next/misc: : Miscellaneous arm64 patches arm64/gcs: task_gcs_el0_enable() should use passed task arm64: signal: Remove ISB when resetting POR_EL0 arm64/mm: Drop redundant addr increment in set_huge_pte_at() arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get arm64: pi: use 'targets' instead of extra-y in Makefile * for-next/acpi: : Various ACPI arm64 changes ACPI: Suppress misleading SPCR console message when SPCR table is absent ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled * for-next/debug-entry: : Simplify the debug exception entry path arm64: debug: remove debug exception registration infrastructure arm64: debug: split bkpt32 exception entry arm64: debug: split brk64 exception entry arm64: debug: split hardware watchpoint exception entry arm64: debug: split single stepping exception entry arm64: debug: refactor reinstall_suspended_bps() arm64: debug: split hardware breakpoint exception entry arm64: entry: Add entry and exit functions for debug exceptions arm64: debug: remove break/step handler registration infrastructure arm64: debug: call step handlers statically arm64: debug: call software breakpoint handlers statically arm64: refactor aarch32_break_handler() arm64: debug: clean up single_step_handler logic * for-next/feat_mte_tagged_far: : Support for reporting the non-address bits during a synchronous MTE tag check fault kselftest/arm64/mte: Add mtefar tests on check_mmap_options kselftest/arm64/mte: Refactor check_mmap_option test kselftest/arm64/mte: Add verification for address tag in signal handler kselftest/arm64/mte: Add address tag related macro and function kselftest/arm64/mte: Check MTE_FAR feature is supported kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS kselftest/arm64: Add MTE_FAR hwcap test KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature * for-next/kselftest: : Kselftest updates for arm64 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems kselftest/arm4: Provide local defines for AT_HWCAP3 kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace kselftest/arm64: Fix check for setting new VLs in sve-ptrace kselftest/arm64: Convert tpidr2 test to use kselftest.h * for-next/mdscr-cleanup: : Drop redundant DBG_MDSCR_* macros KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t arm64/debug: Drop redundant DBG_MDSCR_* macros * for-next/vmap-stack: : Force VMAP_STACK on arm64 arm64: remove CONFIG_VMAP_STACK checks from entry code arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN arm64: efi: Remove CONFIG_VMAP_STACK check arm64: Mandate VMAP_STACK arm64: efi: Fix KASAN false positive for EFI runtime stack arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst |
||
|
|
42d36969e3 |
docs: arm64: gic-v5: Document booting requirements for GICv5
Document the requirements for booting a kernel on a system implementing a GICv5 interrupt controller. Specifically, other than DT/ACPI providing the required firmware representation, define what traps must be disabled if the kernel is booted at EL1 on a system where EL2 is implemented. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-30-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
ae344bcb0d |
arm64: Handle BRBE booting requirements
To use the Branch Record Buffer Extension (BRBE), some configuration is
necessary at EL3 and EL2. This patch documents the requirements and adds
the initial EL2 setup code, which largely consists of configuring the
fine-grained traps and initializing a couple of BRBE control registers.
Before this patch, __init_el2_fgt() would initialize HDFGRTR_EL2 and
HDFGWTR_EL2 with the same value, relying on the read/write trap controls
for a register occupying the same bit position in either register. The
'nBRBIDR' trap control only exists in bit 59 of HDFGRTR_EL2, while bit
59 of HDFGWTR_EL2 is RES0, and so this assumption no longer holds.
To handle HDFGRTR_EL2 and HDFGWTR_EL2 having (slightly) different bit
layouts, __init_el2_fgt() is changed to accumulate the HDFGRTR_EL2 and
HDFGWTR_EL2 control bits separately. While making this change the
open-coded value (1 << 62) is replaced with
HDFG{R,W}TR_EL2_nPMSNEVFR_EL1_MASK.
The BRBCR_EL1 and BRBCR_EL2 registers are unusual and require special
initialisation: even though they are subject to E2H renaming, both have
an effect regardless of HCR_EL2.TGE, even when running at EL2. So we
must initialize BRBCR_EL2 in case we run in nVHE mode. This is handled
in __init_el2_brbe() with a comment to explain the situation.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
[Mark: rewrite commit message, fix typo in comment]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
tested-by: Adam Young <admiyo@os.amperecomputing.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-2-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
f620372209 |
arm64/hwcaps: Add MTE_STORE_ONLY hwcaps
Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raise of tag check fault on store operation only. add MTE_STORE_ONLY hwcaps so that user can use this feature. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250618092957.2069907-5-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
|
|
7c7f55039b |
arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
If FEAT_MTE_TAGGED_FAR (Armv8.9) is supported, bits 63:60 of the fault address
are preserved in response to synchronous tag check faults (SEGV_MTESERR).
This patch modifies below to support this feature:
- Use the original FAR_EL1 value when an MTE tag check fault occurs,
if ARM64_MTE_FAR is supported so that not only logical tag
(bits 59:56) but also address tag (bits 63:60] being reported too.
- Add HWCAP for mtefar to let user know bits 63:60 includes
address tag information when when FEAT_MTE_TAGGED_FAR is supported.
Applications that require this information should install
a signal handler with the SA_EXPOSE_TAGBITS flag.
While this introduces a minor ABI change,
most applications do not set this flag and therefore will not be affected.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250618084513.1761345-3-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
||
|
|
c0c7fa4e7a |
docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
Fix trivial ICC_SRE_EL2 register spelling typo in booting.rst. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> CC: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250610120935.852034-1-lpieralisi@kernel.org Signed-off-by: Will Deacon <will@kernel.org> |
||
|
|
43db111107 |
ARM:
* Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. * Enable nested virtualisation support on systems that support it, though it is disabled by default. * Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. * Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. * Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. * New selftest checking the pKVM ownership transition rules * Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. * Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. * Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. * Add a new selftest for the SVE host state being corrupted by a guest. * Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. * Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. * Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. * and the usual random cleanups. LoongArch: * Don't flush tlb if the host supports hardware page table walks. * Add KVM selftests support. RISC-V: * Add vector registers to get-reg-list selftest * VCPU reset related improvements * Remove scounteren initialization from VCPU reset * Support VCPU reset from userspace using set_mpstate() ioctl x86: * Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit |
||
|
|
53a087046a |
Merge branch 'for-next/sme-fixes' into for-next/core
* for-next/sme-fixes: (35 commits)
arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected
arm64/fpsimd: ptrace: Gracefully handle errors
arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state
arm64/fpsimd: ptrace: Do not present register data for inactive mode
arm64/fpsimd: ptrace: Save task state before generating SVE header
arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data
arm64/fpsimd: Make clone() compatible with ZA lazy saving
arm64/fpsimd: Clear PSTATE.SM during clone()
arm64/fpsimd: Consistently preserve FPSIMD state during clone()
arm64/fpsimd: Remove redundant task->mm check
arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()
arm64/fpsimd: Add task_smstop_sm()
arm64/fpsimd: Factor out {sve,sme}_state_size() helpers
arm64/fpsimd: Clarify sve_sync_*() functions
arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE
arm64/fpsimd: signal: Consistently read FPSIMD context
arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state
arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only
arm64/fpsimd: Do not discard modified SVE state
...
|
||
|
|
fed55f49fa |
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
b87c8c4aca |
arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
Currently, vec_set_vector_length() can manipulate a task into an invalid
state as a result of a prctl/ptrace syscall which changes the SVE/SME
vector length, resulting in several problems:
(1) When changing the SVE vector length, if the task initially has
PSTATE.ZA==1, and sve_alloc() fails to allocate memory, the task
will be left with PSTATE.ZA==1 and sve_state==NULL. This is not a
legitimate state, and could result in a subsequent null pointer
dereference.
(2) When changing the SVE vector length, if the task initially has
PSTATE.SM==1, the task will be left with PSTATE.SM==1 and
fp_type==FP_STATE_FPSIMD. Streaming mode state always needs to be
saved in SVE format, so this is not a legitimate state.
Attempting to restore this state may cause a task to erroneously
inherit stale streaming mode predicate registers and FFR contents,
behaving non-deterministically and potentially leaving information
from another task.
While in this state, reads of the NT_ARM_SSVE regset will indicate
that the registers are not stored in SVE format. For the NT_ARM_SSVE
regset specifically, debuggers interpret this as meaning that
PSTATE.SM==0.
(3) When changing the SME vector length, if the task initially has
PSTATE.SM==1, the lower 128 bits of task's streaming mode vector
state will be migrated to non-streaming mode, rather than these bits
being zeroed as is usually the case for changes to PSTATE.SM.
To fix the first issue, we can eagerly allocate the new sve_state and
sme_state before modifying the task. This makes it possible to handle
memory allocation failure without modifying the task state at all, and
removes the need to clear TIF_SVE and TIF_SME.
To fix the second issue, we either need to clear PSTATE.SM or not change
the saved fp_type. Given we're going to eagerly allocate sve_state and
sme_state, the simplest option is to preserve PSTATE.SM and the saves
fp_type, and consistently truncate the SVE state. This ensures that the
task always stays in a valid state, and by virtue of not exiting
streaming mode, this also sidesteps the third issue.
I believe these changes should not be problematic for realistic usage:
* When the SVE/SME vector length is changed via prctl(), syscall entry
will have cleared PSTATE.SM. Unless the task's state has been
manipulated via ptrace after entry, the task will have PSTATE.SM==0.
* When the SVE/SME vector length is changed via a write to the
NT_ARM_SVE or NT_ARM_SSVE regsets, PSTATE.SM will be forced
immediately after the length change, and new vector state will be
copied from userspace.
* When the SME vector length is changed via a write to the NT_ARM_ZA
regset, the (S)SVE state is clobbered today, so anyone who cares about
the specific state would need to install this after writing to the
NT_ARM_ZA regset.
As we need to free the old SVE state while TIF_SVE may still be set, we
cannot use sve_free(), and using kfree() directly makes it clear that
the free pairs with the subsequent assignment. As this leaves sve_free()
unused, I've removed the existing sve_free() and renamed __sve_free() to
mirror sme_free().
Fixes:
|
||
|
|
cde5c32db5 |
arm64/fpsimd: Make clone() compatible with ZA lazy saving
Linux is intended to be compatible with userspace written to Arm's
AAPCS64 procedure call standard [1,2]. For the Scalable Matrix Extension
(SME), AAPCS64 was extended with a "ZA lazy saving scheme", where SME's
ZA tile is lazily callee-saved and caller-restored. In this scheme,
TPIDR2_EL0 indicates whether the ZA tile is live or has been saved by
pointing to a "TPIDR2 block" in memory, which has a "za_save_buffer"
pointer. This scheme has been implemented in GCC and LLVM, with
necessary runtime support implemented in glibc and bionic.
AAPCS64 does not specify how the ZA lazy saving scheme is expected to
interact with thread creation mechanisms such as fork() and
pthread_create(), which would be implemented in terms of the Linux clone
syscall. The behaviour implemented by Linux and glibc/bionic doesn't
always compose safely, as explained below.
Currently the clone syscall is implemented such that PSTATE.ZA and the
ZA tile are always inherited by the new task, and TPIDR2_EL0 is
inherited unless the 'flags' argument includes CLONE_SETTLS,
in which case TPIDR2_EL0 is set to 0/NULL. This doesn't make much sense:
(a) TPIDR2_EL0 is part of the calling convention, and changes as control
is passed between functions. It is *NOT* used for thread local
storage, despite superficial similarity to TPIDR_EL0, which is is
used as the TLS register.
(b) TPIDR2_EL0 and PSTATE.ZA are tightly coupled in the procedure call
standard, and some combinations of states are illegal. In general,
manipulating the two independently is not guaranteed to be safe.
In practice, code which is compliant with the procedure call standard
may issue a clone syscall while in the "ZA dormant" state, where
PSTATE.ZA==1 and TPIDR2_EL0 is non-null and indicates that ZA needs to
be saved. This can cause a variety of problems, including:
* If the implementation of pthread_create() passes CLONE_SETTLS, the
new thread will start with PSTATE.ZA==1 and TPIDR2==NULL. Per the
procedure call standard this is not a legitimate state for most
functions. This can cause data corruption (e.g. as code may rely on
PSTATE.ZA being 0 to guarantee that an SMSTART ZA instruction will
zero the ZA tile contents), and may result in other undefined
behaviour.
* If the implementation of pthread_create() does not pass CLONE_SETTLS, the
new thread will start with PSTATE.ZA==1 and TPIDR2 pointing to a
TPIDR2 block on the parent thread's stack. This can result in a
variety of problems, e.g.
- The child may write back to the parent's za_save_buffer, corrupting
its contents.
- The child may read from the TPIDR2 block after the parent has reused
this memory for something else, and consequently the child may abort
or clobber arbitrary memory.
Ideally we'd require that userspace ensures that a task is in the "ZA
off" state (with PSTATE.ZA==0 and TPIDR2_EL0==NULL) prior to issuing a
clone syscall, and have the kernel force this state for new threads.
Unfortunately, contemporary C libraries do not do this, and simply
forcing this state within the implementation of clone would break
fork().
Instead, we can bodge around this by considering the CLONE_VM flag, and
manipulate PSTATE.ZA and TPIDR2_EL0 as a pair. CLONE_VM indicates that
the new task will run in the same address space as its parent, and in
that case it doesn't make sense to inherit a stale pointer to the
parent's TPIDR2 block:
* For fork(), CLONE_VM will not be set, and it is safe to inherit both
PSTATE.ZA and TPIDR2_EL0 as the new task will have its own copy of the
address space, and cannot clobber its parent's stack.
* For pthread_create() and vfork(), CLONE_VM will be set, and discarding
PSTATE.ZA and TPIDR2_EL0 for the new task doesn't break any existing
assumptions in userspace.
Implement this behaviour for clone(). We currently inherit PSTATE.ZA in
arch_dup_task_struct(), but this does not have access to the clone
flags, so move this logic under copy_thread(). Documentation is updated
to describe the new behaviour.
[1] https://github.com/ARM-software/abi-aa/releases/download/2025Q1/aapcs64.pdf
[2]
|
||
|
|
17efc1acee |
arm64: Expose AIDR_EL1 via sysfs
The KVM PV ABI recently added a feature that allows the VM to discover
the set of physical CPU implementations, identified by a tuple of
{MIDR_EL1, REVIDR_EL1, AIDR_EL1}. Unlike other KVM PV features, the
expectation is that the VMM implements the hypercall instead of KVM as
it has the authoritative view of where the VM gets scheduled.
To do this the VMM needs to know the values of these registers on any
CPU in the system. While MIDR_EL1 and REVIDR_EL1 are already exposed,
AIDR_EL1 is not. Provide it in sysfs along with the other identification
registers.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250403231626.3181116-1-oliver.upton@linux.dev
Signed-off-by: Will Deacon <will@kernel.org>
|
||
|
|
b376108e1f |
arm64/fpsimd: signal: Clear TPIDR2 when delivering signals
Linux is intended to be compatible with userspace written to Arm's
AAPCS64 procedure call standard [1,2]. For the Scalable Matrix Extension
(SME), AAPCS64 was extended with a "ZA lazy saving scheme", where SME's
ZA tile is lazily callee-saved and caller-restored. In this scheme,
TPIDR2_EL0 indicates whether the ZA tile is live or has been saved by
pointing to a "TPIDR2 block" in memory, which has a "za_save_buffer"
pointer. This scheme has been implemented in GCC and LLVM, with
necessary runtime support implemented in glibc.
AAPCS64 does not specify how the ZA lazy saving scheme is expected to
interact with signal handling, and the behaviour that AAPCS64 currently
recommends for (sig)setjmp() and (sig)longjmp() does not always compose
safely with signal handling, as explained below.
When Linux delivers a signal, it creates signal frames which contain the
original values of PSTATE.ZA, the ZA tile, and TPIDR_EL2. Between saving
the original state and entering the signal handler, Linux clears
PSTATE.ZA, but leaves TPIDR2_EL0 unchanged. Consequently a signal
handler can be entered with PSTATE.ZA=0 (meaning accesses to ZA will
trap), while TPIDR_EL0 is non-null (which may indicate that ZA needs to
be lazily saved, depending on the contents of the TPIDR2 block). While
in this state, libc and/or compiler runtime code, such as longjmp(), may
attempt to save ZA. As PSTATE.ZA=0, these accesses will trap, causing
the kernel to inject a SIGILL. Note that by virtue of lazy saving
occurring in libc and/or C runtime code, this can be triggered by
application/library code which is unaware of SME.
To avoid the problem above, the kernel must ensure that signal handlers
are entered with PSTATE.ZA and TPIDR2_EL0 configured in a manner which
complies with the ZA lazy saving scheme. Practically speaking, the only
choice is to enter signal handlers with PSTATE.ZA=0 and TPIDR2_EL0=NULL.
This change should not impact SME code which does not follow the ZA lazy
saving scheme (and hence does not use TPIDR2_EL0).
An alternative approach that was considered is to have the signal
handler inherit the original values of both PSTATE.ZA and TPIDR2_EL0,
relying on lazy save/restore sequences being idempotent and capable of
racing safely. This is not safe as signal handlers must be assumed to
have a "private ZA" interface, and therefore cannot be entered with
PSTATE.ZA=1 and TPIDR2_EL0=NULL, but it is legitimate for signals to be
taken from this state.
With the kernel fixed to clear TPIDR2_EL0, there are a couple of
remaining issues (largely masked by the first issue) that must be fixed
in userspace:
(1) When a (sig)setjmp() + (sig)longjmp() pair cross a signal boundary,
ZA state may be discarded when it needs to be preserved.
Currently, the ZA lazy saving scheme recommends that setjmp() does
not save ZA, and recommends that longjmp() is responsible for saving
ZA. A call to longjmp() in a signal handler will not have visibility
of ZA state that existed prior to entry to the signal, and when a
longjmp() is used to bypass a usual signal return, unsaved ZA state
will be discarded erroneously.
To fix this, it is necessary for setjmp() to eagerly save ZA state,
and for longjmp() to configure PSTATE.ZA=0 and TPIDR2_EL0=NULL. This
works regardless of whether a signal boundary is crossed.
(2) When a C++ exception is thrown and crosses a signal boundary before
it is caught, ZA state may be discarded when it needs to be
preserved.
AAPCS64 requires that exception handlers are entered with
PSTATE.{SM,ZA}={0,0} and TPIDR2_EL0=NULL, with exception unwind code
expected to perform any necessary save of ZA state.
Where it is necessary to perform an exception unwind across an
exception boundary, the unwind code must recover any necessary ZA
state (along with TPIDR2) from signal frames.
Fix the kernel as described above, with setup_return() clearing
TPIDR2_EL0 when delivering a signal. Folk on CC are working on fixes for
the remaining userspace issues, including updates/fixes to the AAPCS64
specification and glibc.
[1] https://github.com/ARM-software/abi-aa/releases/download/2025Q1/aapcs64.pdf
[2]
|
||
|
|
eb0ece1602 |
- The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was founf to be incorrect. - The 4 patch series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The 17 patch series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The 5 patch series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The 4 patch series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The 12 patch series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The 2 patch series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The 3 patch series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The 4 patch series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The 4 patch series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The 4 patch series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The 18 patch series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The 27 patch series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The 12 patch series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The 8 patch series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The 2 patch series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The 3 patch series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The 3 patch series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The 3 patch series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The 9 patch series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The 5 patch series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The 13 patch series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The 13 patch series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The 3 patch series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The 8 patch series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The 2 patch series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The 3 patch series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The 4 patch series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The 3 patch series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The 5 patch series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The 5 patch series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The 4 patch series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The 2 patch series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF Ee93qSSLR1BkNdDw+931Pu0mXfbnBw== =Pn2K -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ... |
||
|
|
2d09a9449e |
arm64 updates for 6.15:
Perf and PMUs:
- Support for the "Rainier" CPU PMU from Arm
- Preparatory driver changes and cleanups that pave the way for BRBE
support
- Support for partial virtualisation of the Apple-M1 PMU
- Support for the second event filter in Arm CSPMU designs
- Minor fixes and cleanups (CMN and DWC PMUs)
- Enable EL2 requirements for FEAT_PMUv3p9
Power, CPU topology:
- Support for AMUv1-based average CPU frequency
- Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds
a generic topology_is_primary_thread() function overridden by x86 and
powerpc
New(ish) features:
- MOPS (memcpy/memset) support for the uaccess routines
Security/confidential compute:
- Fix the DMA address for devices used in Realms with Arm CCA. The
CCA architecture uses the address bit to differentiate between shared
and private addresses
- Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
default
Memory management clean-ups:
- Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs
- Some minor page table accessor clean-ups
- PIE/POE (permission indirection/overlay) helpers clean-up
Kselftests:
- MTE: skip hugetlb tests if MTE is not supported on such mappings and
user correct naming for sync/async tag checking modes
Miscellaneous:
- Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
request)
- Sysreg updates for new register fields
- CPU type info for some Qualcomm Kryo cores
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmfjB2QACgkQa9axLQDI
XvGrfg//W3Bx9+jw1G/XHHEQqGEVFmvltvxZUkvgV0Qki0rPSMnappJhZRL9n0Nm
V6PvGd2KoKHZuL3g5ViZb3cs2R9BiD2JB6PncwBKuxumHGh3vz3kk1JMkDVfWdHv
qAceOckFJD9rXjPZn+PDsfYiEi2i3RRWIP5VglZ14ue8j3prHQ6DJXLUQF2GYvzE
/bgLSq44wp5N59ddy23+qH9rxrHzz3bgpbVv/F56W/LErvE873mRmyFwiuGJm+M0
Pn8ra572rI6a4sgSwrMTeNPBU+F9o5AbqwauVhkz428RdMvgfEuW6qHUBnGWJDmt
HotXmu+4Eb2KJks/iQkDo4OTJ38yUqvvZZJtP171ms3E4yqESSJngWP6O2A6LF+y
xhe0sESF/Ew6jLhM6/hvOmBcE2AyB14JE3ymqLkXbWub4NXddBn2AF1WXFjF4CBw
F8KSUhNLekrCYKv1k9M3nhvkcpoS9FkTF/TI+zEg546alI/GLPih6uDRkgMAODh1
RDJYixHsf2NDDRQbfwvt9Xua/KKpDF6qNkHLA4OiqqVUwh1hkas24Lrnp8vmce4o
wIpWCLqYWey8Rl3XWuWgWz2Xu58fHH4Dl2k72Z8I0pwp3abCDa9xEj79G0Svk7Si
Q+FCYrNlpKee1RXBC+1MUD/Gl5r/28dEUFkAzPD80F7AgafXPd0=
=Kc9c
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Nothing major this time around.
Apart from the usual perf/PMU updates, some page table cleanups, the
notable features are average CPU frequency based on the AMUv1
counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
the uaccess routines.
Perf and PMUs:
- Support for the 'Rainier' CPU PMU from Arm
- Preparatory driver changes and cleanups that pave the way for BRBE
support
- Support for partial virtualisation of the Apple-M1 PMU
- Support for the second event filter in Arm CSPMU designs
- Minor fixes and cleanups (CMN and DWC PMUs)
- Enable EL2 requirements for FEAT_PMUv3p9
Power, CPU topology:
- Support for AMUv1-based average CPU frequency
- Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
adds a generic topology_is_primary_thread() function overridden by
x86 and powerpc
New(ish) features:
- MOPS (memcpy/memset) support for the uaccess routines
Security/confidential compute:
- Fix the DMA address for devices used in Realms with Arm CCA. The
CCA architecture uses the address bit to differentiate between
shared and private addresses
- Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
default
Memory management clean-ups:
- Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs
- Some minor page table accessor clean-ups
- PIE/POE (permission indirection/overlay) helpers clean-up
Kselftests:
- MTE: skip hugetlb tests if MTE is not supported on such mappings
and user correct naming for sync/async tag checking modes
Miscellaneous:
- Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
request)
- Sysreg updates for new register fields
- CPU type info for some Qualcomm Kryo cores"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
arm64: mm: Don't use %pK through printk
perf/arm_cspmu: Fix missing io.h include
arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
arm64: cputype: Add MIDR_CORTEX_A76AE
arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
arm64/sysreg: Enforce whole word match for open/close tokens
arm64/sysreg: Fix unbalanced closing block
arm64: Kconfig: Enable HOTPLUG_SMT
arm64: topology: Support SMT control on ACPI based system
arch_topology: Support SMT control for OF based system
cpu/SMT: Provide a default topology_is_primary_thread()
arm64/mm: Define PTDESC_ORDER
perf/arm_cspmu: Add PMEVFILT2R support
perf/arm_cspmu: Generalise event filtering
perf/arm_cspmu: Move register definitons to header
arm64/kernel: Always use level 2 or higher for early mappings
arm64/mm: Drop PXD_TABLE_BIT
arm64/mm: Check pmd_table() in pmd_trans_huge()
...
|
||
|
|
0f40464674 |
Updates for interrupt chip drivers:
- Support for hard indices on RISC-V. The hart index identifies a hart
(core) within a specific interrupt domain in RISC-V's Priviledged
Architecture.
- Rework of the RISC-V MSI driver.
This moves the driver over to the generic MSI library and solves the
affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI
controllers are prone to lose interrupts when the MSI message is
updated to change the affinity because the message write consists of
three 32-bit subsequent writes, which update address and data. As these
writes are non-atomic versus the device raising an interrupt, the
device can observe a half written update and issue an interrupt on the
wrong vector. This is mitiated by a carefully orchestrated step by step
update and the observation of an eventually pending interrupt on the
CPU which issues the update. The algorithm follows the well established
method of the X86 MSI driver.
- A new driver for the RISC-V Sophgo SG2042 MSI controller
- Overhaul of the Renesas RZQ2L driver.
Simplification of the probe function by using devm_*() mechanisms,
which avoid the endless list of error prone gotos in the failure paths.
- Expand the Renesas RZV2H driver to support RZ/G3E SoCs
- A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to
ensure that the addressing is limited to the lower 32-bit of the
physical address space.
- Add support for the Allwinner AS23 NMI controller
- Expand the IMX irqsteer driver to handle up to 960 input interrupts
- The usual small updates, cleanups and device tree changes.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff454THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZqoD/4kdHzbxfLpf7vC3NnG8NWwTq5FpbSx
6grQC9hWNMAs4n2IFjJRFLrjeX3AcdAQXL/BWuM0LfW9tQDQaVmqlSIlB/bn69KB
7HyAR6ozbOgnHKGAqFUXSLf+4pq+6q3mOgGKIF289dy14HFu4ta0DqKgkPZeQnVs
R/J8i7REUnn+YuxzSt5eOqyDPyt2EHJosSUABSWQZBlrM9jy1W7f6NqDFwawiVsa
+tv4U/bz91vjzVxwTIgt7nJK+b2HVYdxoZYuKJwPaTsj26ANPp6ltjRTeOmZhb5h
uKgw+OyzDnk6q+tjGcRqrqwl291VKxCvnRiqHFfu3CERdmI9qvpN9IRcEJqIbkcN
cakekhAyt7OO7sEPcql5vBL97e9hpb7EcH78gYxwHf8Dy0rFZUvSC5v+L6VRFnJS
XcKA1L+f9B6u5qxnBtLan9IW08HYNdvmPq6AuVjk+ndKioPUFqB2q6AtXpuA3Rmu
Y3XH/wh/q5wk0pgeByxQW6swsfpMN3OYK3mpLx475wFh2NKzcdGlwGhDFhiw8DKX
m1AESy3UZatj1a0qGaFS/M+mm9KGrDYIMrje832Wf4Yf1LGmTsDkd3/V99oazSsq
Jm4qhDASXChJXd0imQICX9hPw0aHTlLYNs54obUXVULH4HivQKIgWhUXrjG0dBDL
+tttjuv5FJxr3A==
=jPHa
-----END PGP SIGNATURE-----
Merge tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq driver updates from Thomas Gleixner:
- Support for hard indices on RISC-V. The hart index identifies a hart
(core) within a specific interrupt domain in RISC-V's Priviledged
Architecture.
- Rework of the RISC-V MSI driver
This moves the driver over to the generic MSI library and solves the
affinity problem of unmaskable PCI/MSI controllers. Unmaskable
PCI/MSI controllers are prone to lose interrupts when the MSI message
is updated to change the affinity because the message write consists
of three 32-bit subsequent writes, which update address and data. As
these writes are non-atomic versus the device raising an interrupt,
the device can observe a half written update and issue an interrupt
on the wrong vector. This is mitiated by a carefully orchestrated
step by step update and the observation of an eventually pending
interrupt on the CPU which issues the update. The algorithm follows
the well established method of the X86 MSI driver.
- A new driver for the RISC-V Sophgo SG2042 MSI controller
- Overhaul of the Renesas RZQ2L driver
Simplification of the probe function by using devm_*() mechanisms,
which avoid the endless list of error prone gotos in the failure
paths.
- Expand the Renesas RZV2H driver to support RZ/G3E SoCs
- A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to
ensure that the addressing is limited to the lower 32-bit of the
physical address space.
- Add support for the Allwinner AS23 NMI controller
- Expand the IMX irqsteer driver to handle up to 960 input interrupts
- The usual small updates, cleanups and device tree changes
* tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
irqchip/imx-irqsteer: Support up to 960 input interrupts
irqchip/sunxi-nmi: Support Allwinner A523 NMI controller
dt-bindings: irq: sun7i-nmi: Document the Allwinner A523 NMI controller
irqchip/davinci-cp-intc: Remove public header
irqchip/renesas-rzv2h: Add RZ/G3E support
irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP}
irqchip/renesas-rzv2h: Update TSSR_TIEN macro
irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable
irqchip/renesas-rzv2h: Use devm_pm_runtime_enable()
irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted()
irqchip/renesas-rzv2h: Simplify rzv2h_icu_init()
irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv
irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type()
dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC
riscv: sophgo: dts: Add msi controller for SG2042
irqchip: Add the Sophgo SG2042 MSI interrupt controller
dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI
arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI
...
|
||
|
|
f81c2b8150 |
It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9. Much of
this is preparatory to replacing the ancient Perl scripts/kernel-doc
horror with a slightly less horrifying Python implementation, expected
for 6.16.
- Update the minimum Sphinx required to 3.4.3, allowing us to remove a
bunch of older compatibility code.
- Rework and improve the generation of the ABI documentation.
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that work will
still get to you via docs-next
- Try to standardize the format for indicating a developer's affiliation in
commit tags.
- Clarify the TAB's role in CoC enforcement actions.
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission.
Plus lots of other typo fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
=Gvdp
-----END PGP SIGNATURE-----
Merge tag 'docs-6.15' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9
Much of this is preparatory to replacing the ancient Perl
scripts/kernel-doc horror with a slightly less horrifying Python
implementation, expected for 6.16
- Update the minimum Sphinx required to 3.4.3, allowing us to remove
a bunch of older compatibility code
- Rework and improve the generation of the ABI documentation
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that
work will still get to you via docs-next
- Try to standardize the format for indicating a developer's
affiliation in commit tags
- Clarify the TAB's role in CoC enforcement actions
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission
Plus lots of other typo fixes and updates"
* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
docs/zh_CN: fix spelling mistake
docs/Chinese: change the disclaimer words
docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
docs: driver-api: firmware: clarify userspace requirements
docs: clarify rules wrt tagging other people
docs: Remove outdated highuid.rst documentation
Documentation: dma-buf: heaps: Add heap name definitions
docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
docs: Correct installation instruction
Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
Documentation/CoC: Spell out the TAB role in enforcement decisions
Documentation: ocxl.rst: Update consortium site
scripts: get_feat.pl: substitute s390x with s390
scripts/kernel-doc: drop dead code for Wcontents_before_sections
scripts/kernel-doc: don't add not needed new lines
docs: driver-api/infiniband.rst: fix Kerneldoc markup
drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
include/asm-generic/io.h: fix kerneldoc markup
Docs/arch/arm64: Fix spelling in amu.rst
...
|
||
|
|
a5c96dfd47 |
docs: arm64: drop PTDUMP config options from ptdump.rst
Both GENERIC_PTDUMP and PTDUMP_CORE are not user selectable config options. Just drop these from documentation. Link: https://lkml.kernel.org/r/20250226122404.1927473-4-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
858c7bfcb3 |
arm64/boot: Enable EL2 requirements for FEAT_PMUv3p9
FEAT_PMUv3p9 registers such as PMICNTR_EL0, PMICFILTR_EL0, and PMUACR_EL1 access from EL1 requires appropriate EL2 fine grained trap configuration via FEAT_FGT2 based trap control registers HDFGRTR2_EL2 and HDFGWTR2_EL2. Otherwise such register accesses will result in traps into EL2. Add a new helper __init_el2_fgt2() which initializes FEAT_FGT2 based fine grained trap control registers HDFGRTR2_EL2 and HDFGWTR2_EL2 (setting the bits nPMICNTR_EL0, nPMICFILTR_EL0 and nPMUACR_EL1) to enable access into PMICNTR_EL0, PMICFILTR_EL0, and PMUACR_EL1 registers. Also update booting.rst with SCR_EL3.FGTEn2 requirement for all FEAT_FGT2 based registers to be accessible in EL2. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kvmarm@lists.linux.dev Fixes: |
||
|
|
696d107c68 |
Docs/arch/arm64: Fix spelling in amu.rst
Change though to through. Signed-off-by: Gabriel <gshahrouzi@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/67bd05b5.c80a0220.205997.19df@mx.google.com |
||
|
|
2d81e1bb62 |
irqchip/gic-v3: Add Rockchip 3568002 erratum workaround
Rockchip RK3566/RK3568 GIC600 integration has DDR addressing limited to the first 32bit of physical address space. Rockchip assigned Erratum ID #3568002 for this issue. Add driver quirk for this Rockchip GIC Erratum. Note, that the 0x0201743b GIC600 ID is not Rockchip-specific and is common for many ARM GICv3 implementations. Hence, there is an extra of_machine_is_compatible() check. Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20250216221634.364158-2-dmitry.osipenko@collabora.com |