linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-25 07:33:19 +02:00

Author	SHA1	Message	Date
Linus Torvalds	64edfa6506	Delete some obsolete networking code Old code like amateur radio and NFC have long been a burden to core networking developers. syzbot loves to find bugs in BKL-era code, and noobs try to fix them. If we want to have a fighting chance of surviving the LLM-pocalypse this code needs to find a dedicated owner or get deleted. We've talked about these deletions multiple times in the past and every time someone wanted the code to stay. It is never very clear to me how many of those people actually use the code vs are just nostalgic to see it go. Amateur radio did have occasional users (or so I think) but most users switched to user space implementations since its all super slow stuff. Nobody stepped up to maintain the kernel code. We were lucky enough to find someone who wants to help with NFC so we're giving that a chance. Let's try to put the rest of this code behind us. Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmnqqWYACgkQMUZtbf5S IrtEpQ/9F5+8POE6dg6gJVLDKx1+i6GiaOIweAl8h5DatzhBAAGuGr9JyTw0P/iy QX7/SU8WQIhi+LVTYBX9M5bJ3Rf+Iws4dll0CyoTTdOFvGwCAck8Ee/w+1gZdsQY aG0mQPmftfMEdZGX3KXt8UPDWG7QX4w1gSqxqYcSs1ohN6Txi1F94tmgqXgzYHzv vxWP3cF3XTv4eM6BpQj4tiLT3hvrTUfoCZEn9oF4Hn+miYU/yNlWxh0/pmfNjcxd vpNN0VfJVK48uPrj57Ep2x9OjkHPviojrUZT0Y55ENBhn1Lykry4MaxsJVsVYhuC OqJHQYTFyxwT/USTJxs1gplFyO0i37oCEEt43BKm2KS7rYHgc4pQgMJz7R2IS3wL z1xFl45QFt5kX3pw8BvWPXwBomkbDeFORB40Y1qc8RHMfAUKqOhbhzV8rDq9uKup 0nJxdijdh3/2qdO+LB1pU5rq/MbfAxOQSnRJmKLoKLVljaZHMAVbm829sdap8OM+ VMnyPF5hOAuTHV0NZJJ2BbcznI4MFDxM1lNEWFuRC39RQeeGRIHsNMjvs4HMHLaW V827UBXpUOK6HR3nGCKX3VpLJByUYAIkdIKvRugbWdynvXAw+FJUHx4wRzvFi6oi E7ucUY+FI5YOS1rmQJ+rqBjhThcIAdj2U9SNAykDKRVa7zPEUMU= =3vMU -----END PGP SIGNATURE----- Merge tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking deletions from Jakub Kicinski: "Delete some obsolete networking code Old code like amateur radio and NFC have long been a burden to core networking developers. syzbot loves to find bugs in BKL-era code, and noobs try to fix them. If we want to have a fighting chance of surviving the LLM-pocalypse this code needs to find a dedicated owner or get deleted. We've talked about these deletions multiple times in the past and every time someone wanted the code to stay. It is never very clear to me how many of those people actually use the code vs are just nostalgic to see it go. Amateur radio did have occasional users (or so I think) but most users switched to user space implementations since its all super slow stuff. Nobody stepped up to maintain the kernel code. We were lucky enough to find someone who wants to help with NFC so we're giving that a chance. Let's try to put the rest of this code behind us" * tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: drivers: net: 8390: wd80x3: Remove this driver drivers: net: 8390: ultra: Remove this driver drivers: net: 8390: AX88190: Remove this driver drivers: net: fujitsu: fmvj18x: Remove this driver drivers: net: smsc: smc91c92: Remove this driver drivers: net: smsc: smc9194: Remove this driver drivers: net: amd: nmclan: Remove this driver drivers: net: amd: lance: Remove this driver drivers: net: 3com: 3c589: Remove this driver drivers: net: 3com: 3c574: Remove this driver drivers: net: 3com: 3c515: Remove this driver drivers: net: 3com: 3c509: Remove this driver net: packetengines: remove obsolete yellowfin driver and vendor dir net: packetengines: remove obsolete hamachi driver net: remove unused ATM protocols and legacy ATM device drivers net: remove ax25 and amateur radio (hamradio) subsystem net: remove ISDN subsystem and Bluetooth CMTP caif: remove CAIF NETWORK LAYER	2026-04-24 09:41:58 -07:00
Jakub Kicinski	dd8d4bc28a	net: remove ax25 and amateur radio (hamradio) subsystem Remove the amateur radio (AX.25, NET/ROM, ROSE) protocol implementation and all associated hamradio device drivers from the kernel tree. This set of protocols has long been a huge bug/syzbot magnet, and since nobody stepped up to help us deal with the influx of the AI-generated bug reports we need to move it out of tree to protect our sanity. The code is moved to an out-of-tree repo: https://github.com/linux-netdev/mod-orphan if it's cleaned up and reworked there we can accept it back. Minimal stub headers are kept for include/net/ax25.h (AX25_P_IP, AX25_ADDR_LEN, ax25_address) and include/net/rose.h (ROSE_ADDR_LEN) so that the conditional integration code in arp.c and tun.c continues to compile and work when the out-of-tree modules are loaded. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Carlos Bilbao <carlos.bilbao@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260421021824.1293976-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-23 10:24:02 -07:00
Linus Torvalds	9cdca33667	integrity-v7.1 -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCad/SPRQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5TDuAQCT+OttUlEqKfGLUrmXjsqw+BdgSm59 vOwTUfY0uIjAsgEAzFY8bOt5WWud9bpfEE3iarKIZQI0RidSHylyaB4FRg8= =6soG -----END PGP SIGNATURE----- Merge tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "There are two main changes, one feature removal, some code cleanup, and a number of bug fixes. Main changes: - Detecting secure boot mode was limited to IMA. Make detecting secure boot mode accessible to EVM and other LSMs - IMA sigv3 support was limited to fsverity. Add IMA sigv3 support for IMA regular file hashes and EVM portable signatures Remove: - Remove IMA support for asychronous hash calculation originally added for hardware acceleration Cleanup: - Remove unnecessary Kconfig CONFIG_MODULE_SIG and CONFIG_KEXEC_SIG tests - Add descriptions of the IMA atomic flags Bug fixes: - Like IMA, properly limit EVM "fix" mode - Define and call evm_fix_hmac() to update security.evm - Fallback to using i_version to detect file change for filesystems that do not support STATX_CHANGE_COOKIE - Address missing kernel support for configured (new) TPM hash algorithms - Add missing crypto_shash_final() return value" * tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: evm: Enforce signatures version 3 with new EVM policy 'bit 3' integrity: Allow sigv3 verification on EVM_XATTR_PORTABLE_DIGSIG ima: add support to require IMA sigv3 signatures ima: add regular file data hash signature version 3 support ima: Define asymmetric_verify_v3() to verify IMA sigv3 signatures ima: remove buggy support for asynchronous hashes integrity: Eliminate weak definition of arch_get_secureboot() ima: Add code comments to explain IMA iint cache atomic_flags ima_fs: Correctly create securityfs files for unsupported hash algos ima: check return value of crypto_shash_final() in boot aggregate ima: Define and use a digest_size field in the ima_algo_desc structure powerpc/ima: Drop unnecessary check for CONFIG_MODULE_SIG ima: efi: Drop unnecessary check for CONFIG_MODULE_SIG/CONFIG_KEXEC_SIG ima: fallback to using i_version to detect file change evm: fix security.evm for a file with IMA signature s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT evm: Don't enable fix mode when secure boot is enabled integrity: Make arch_ima_get_secureboot integrity-wide	2026-04-17 15:42:01 -07:00
Linus Torvalds	01f492e181	Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported. This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor. - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn. LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel(). - Add DMSINTC irqchip in kernel support. RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors. - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code. - Fix LPSW/E to update the bear (which of course is the breaking event address register). x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized. - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes. - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage. x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace. - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier"). - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O. - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times. - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM. - Exempt SMM from CPUID faulting on Intel, as per the spec. - Misc hardening and cleanup changes. x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs. - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs. - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input. - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION. - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault. - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl. - Convert a pile of kvm->lock SEV code to guard(). - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6. - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2. - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE. - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore. - Add a variety of missing nSVM consistency checks. - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT. - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions. - Add support for save+restore of virtualized LBRs (on SVM). - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain. - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features. - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests. - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses). - Cache all used vmcb12 fields to further harden against TOCTOU bugs. x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros. - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate. - Code cleanups. guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to. LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests. x86 selftests: - Add support for Hygon CPUs in KVM selftests. - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP. - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnftRQUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAzwf+NKO4Ktv+7A22ImN0SBl0nlUuulsz vTcw3+hxdRoIw83GdNS+hG5js0wrpMDnbv3t4+VliDNBSSxrBzcSWX2wpilW0Xtw qGo1MWhs2lKPy1NlaRVOwPS6j7uF3AR0TQ1iQLGMedQuCU9WpiKJxyhNXJdbLrt3 8EgFzsvtEsv+jKNRUNDf9+d0j4gZsFyIe+Brhianbw+u3/UCiUClLCdsKPc4+5ZX 08otYXytacGNIf/5Ev1vT4pHkHL0yqKXAtX7LEtaS3+0KrPuLjV4slemivzE9vf5 Evafm5AhA4wpaNMb1ZerhY3T94lsMaJpWxotjR//0Q7C9B59pCQnXCm8mg== =CcE0 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Linus Torvalds	334fbe734e	mm.git review status for linus..mm-stable Everything: Total patches: 368 Reviews/patch: 1.56 Reviewed rate: 74% Excluding DAMON: Total patches: 316 Reviews/patch: 1.77 Reviewed rate: 81% Excluding DAMON and zram: Total patches: 306 Reviews/patch: 1.81 Reviewed rate: 82% Excluding DAMON, zram and maple_tree: Total patches: 276 Reviews/patch: 2.01 Reviewed rate: 91% Significant patch series in this merge: - The 30 patch series "maple_tree: Replace big node with maple copy" from Liam Howlett is mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - The 12 patch series "mm, swap: swap table phase III: remove swap_map" from Kairui Song offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush Yadav adds file seal preservation to LUO's memfd code. - The 2 patch series "mm: zswap: add per-memcg stat for incompressible pages" from Jiayuan Chen adds additional userspace stats reportng to zswap. - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike Rapoport implements some cleanups for our handling of ZERO_PAGE() and zero_pfn. - The 2 patch series "mm/kmemleak: Improve scan_should_stop() implementation" from Zhongqiu Han provides an robustness improvement and some cleanups in the kmemleak code. - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang "improves the khugepaged scan logic and reduces CPU consumption by prioritizing scanning tasks that access memory frequently". - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies Kexec Handover by "transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel" - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's tracepointing. - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation. - The 2 patch series "Fix KASAN support for KHO restored vmalloc regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO restores a vmalloc area. - The 4 patch series "mm: Remove stray references to pagevec" from Tal Zussman provides several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago. - The 17 patch series "mm: Eliminate fake head pages from vmemmap optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page. - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters" from SeongJae Park improves two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used. - The 3 patch series "mm/damon: strictly respect min_nr_regions" from SeongJae Park improves DAMON usability by extending the treatment of the min_nr_regions user-settable parameter. - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil Babka is a proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ennsed. - The 16 patch series "mm: cleanups around unmapping / zapping" from David Hildenbrand implements "a bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions". - The 6 patch series "support batched checking of the young flag for MGLRU" from Baolin Wang supports batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64. - The 5 patch series "memcg: obj stock and slab stat caching cleanups" from Johannes Weiner provides memcg cleanup and robustness improvements. - The 5 patch series "Allow order zero pages in page reporting" from Yuvraj Sakshith enhances page_reporting's free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is cleanup work following from the recent conversion of the VMA flags to a bitmap. - The 10 patch series "mm/damon: add optional debugging-purpose sanity checks" from SeongJae Park adds some more developer-facing debug checks into DAMON core. - The 2 patch series "mm/damon: test and document power-of-2 min_region_sz requirement" from SeongJae Park adds an additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling. - The 3 patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time overflow issue in DAMON core. - The 7 patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation" from SeongJae Park is a "batch of misc/minor improvements and fixups" for DAMON. - The 4 patch series "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - The 6 patch series "zram: recompression cleanups and tweaks" from Sergey Senozhatsky provides "a somewhat random mix of fixups, recompression cleanups and improvements" in the zram code. - The 11 patch series "mm/damon: support multiple goal-based quota tuning algorithms" from SeongJae Park extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select. - The 4 patch series "mm: thp: reduce unnecessary start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged. - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes provides some cleanups and slight fixes in the mremap, mmap and vma code. - The 5 patch series "mm/damon: support addr_unit on default monitoring targets for modules" from SeongJae Park extends the use of DAMON core's addr_unit tunable. - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites" from Nico Pache provides cleanups in the khugepaged and is a base for Nico's planned khugepaged mTHP support. - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups" from David Hildenbrand implements code movement and cleanups in the memhotplug and sparsemem code. - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some memhotplug Kconfig support. - The 6 patch series "change young flag check functions to return bool" from Baolin Wang is "a cleanup patchset to change all young flag check functions to return bool". - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues" from Josh Law and SeongJae Park fixes a few potential DAMON bugs. - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code" from "converts a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it". Mainly in the vma code. - The 21 patch series "mm: expand mmap_prepare functionality and usage" from Lorenzo Stoakes "expands the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time". Cleanups, documentation, extension of mmap_prepare into filesystem drivers. - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from Lorenzo Stoakes simplifies and cleans up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ= =1Q7o -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...	2026-04-15 12:59:16 -07:00
Linus Torvalds	7de6b4a246	workqueue: Changes for v7.1 - New default WQ_AFFN_CACHE_SHARD affinity scope subdivides LLCs into smaller shards to improve scalability on machines with many CPUs per LLC. - Misc: system_dfl_long_wq for long unbound works, devm_alloc_workqueue() for device-managed allocation, sysfs exposure for ordered workqueues and the EFI workqueue, removal of HK_TYPE_WQ from wq_unbound_cpumask, and various small fixes. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCad0npw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGdZmAQD4BbIhTGKGcq89jwQRQpmUIZK6yIIWwd0cSvLC Biko2AD9FP2M9bqUzo2cZ83AfSC4LTK020e9VmsZStkw+u0s3ws= =cSEW -----END PGP SIGNATURE----- Merge tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: - New default WQ_AFFN_CACHE_SHARD affinity scope subdivides LLCs into smaller shards to improve scalability on machines with many CPUs per LLC - Misc: - system_dfl_long_wq for long unbound works - devm_alloc_workqueue() for device-managed allocation - sysfs exposure for ordered workqueues and the EFI workqueue - removal of HK_TYPE_WQ from wq_unbound_cpumask - various small fixes * tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (21 commits) workqueue: validate cpumask_first() result in llc_populate_cpu_shard_id() workqueue: use NR_STD_WORKER_POOLS instead of hardcoded value workqueue: avoid unguarded 64-bit division docs: workqueue: document WQ_AFFN_CACHE_SHARD affinity scope workqueue: add test_workqueue benchmark module tools/workqueue: add CACHE_SHARD support to wq_dump.py workqueue: set WQ_AFFN_CACHE_SHARD as the default affinity scope workqueue: add WQ_AFFN_CACHE_SHARD affinity scope workqueue: fix typo in WQ_AFFN_SMT comment workqueue: Remove HK_TYPE_WQ from affecting wq_unbound_cpumask workqueue: unlink pwqs from wq->pwqs list in alloc_and_link_pwqs() error path workqueue: Remove NULL wq WARN in __queue_delayed_work() workqueue: fix parse_affn_scope() prefix matching bug workqueue: devres: Add device-managed allocate workqueue workqueue: Add system_dfl_long_wq for long unbound works tools/workqueue/wq_dump.py: add NODE prefix to all node columns tools/workqueue/wq_dump.py: fix column alignment in node_nr/max_active section tools/workqueue/wq_dump.py: remove backslash separator from node_nr/max_active header efi: Allow to expose the workqueue via sysfs workqueue: Allow to expose ordered workqueues via sysfs ...	2026-04-15 10:32:08 -07:00
Linus Torvalds	e9635f2a73	- We made the FRED support an opt-in initially out of fear of it breaking machines left and right in the case of a hw bug in the first generation of machines supporting it. Now that that the FRED code has seen a lot of hammering, flip the logic to be opt-out as is the usual case with new hw features -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmndQUQACgkQEsHwGGHe VUpSQRAAqn7bxSqhZoLyzejovGxTSvHqSMaJGDhnkafA1xUNsiAvgONdMyQqxq3r XNQuLST/BCtYqllM7YLfJkKWAV+rpdp6teLA7QnxbU4BFEPwzKQ9OFDeiFDOTBMq 3Bde+t4YbwB05PpwPN8veZEEIXVDxPzzHe9I4M4x5VdsXhDHYVKuonR/DCAK1rpQ UKIXNhDGEcBdSCauOujtNwDZcmYTtcVpxWFkIwzOW0CiM8n/ys6forBTK90W3wg2 geL9r0LZqgBfeYdL16qqFtOCvqpE1pF9ot8JrPaQZa8r/Zw/wJ06faSjQcygQg/d 8gEJzC92eSU/IbwPdJBw/25D5vf/ZXztUgw2DDXCzQ6tBZ7N584YamVpHbh/WFRg degZ2Qu9AnLZquknBwJqp9y3nx8NQvIjsJZ/Q04+USh73LBlFhqHxwOcMGQ7yJHV nPhM9idlzcyTamjiJ1xw9WKx/07sI8P5SNmnte+XWJkbfwM5+mkhIuaRwH7IfRXu y/IODC3ulOLIKYyWN4ybUZnVXbUou5Fsz6o0tv6msAjStUj0O/mv/ImfiDLi6unU 73NQdXzYlFz6YKBvXfQxvjP3/r48i4XWeCc1wJMDAeOegXsBbJRDBuv27bE4kujp +7apQCe8BVyQDk0M+DdlnMUJRt0w6NHNakB671wJl+ooFDBENas= =XX9t -----END PGP SIGNATURE----- Merge tag 'x86_fred_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FRED updates from Borislav Petkov: "We made the FRED support an opt-in initially out of fear of it breaking machines left and right in the case of a hw bug in the first generation of machines supporting it. Now that that the FRED code has seen a lot of hammering, flip the logic to be opt-out as is the usual case with new hw features" * tag 'x86_fred_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fred: Remove kernel log message when initializing exceptions x86/fred: Enable FRED by default	2026-04-14 14:50:51 -07:00
Linus Torvalds	9f2bb6c7b3	- Complete LASS enabling: deal with vsyscall and EFI - Clean up CPUID usage in newer Intel audio driver -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmndDsMACgkQaDWVMHDJ krAOhRAAi3BsToTbmKayQJKWwVr5aGbLYXe3FiBv5juPprsJDF4otd0ALkdbf5Ls n7nND4L9RBfcGZBu25vo60Y1I1b2JReu9Y8YXxsrEx+WIxTPYClOJa7AqakyNxc4 zbTCTv6TuEyXqG5VWe3AItWrMToc49TvUVoN2fi+V89fLWA8IOkMoIhbAWwQPBFI ir0N4UwC/RtRBFf9qc9jesirqGSgh6SEAK6oBYM3C2PV/njoKljTconaCnF7uJlq WxxVcb07dIvOdpdE941OZ6jyS6eePr2feXOazNjnPb0py5Ai4WCsEGdK68ltQ5fM EhY24Ex6ltudgphB/cajKIKZ8LUCWgnMwTotY8IQMQiQ47JKmyrucdLUwvhl0qTD IAWiOMvb5d3X+CoKt3lXlzc29WhbogXzvxjZE29ad8Dm7DVBEOV57bXDwhnHb7LS rro7odiohsw7FMhfqxlb6NhYzCbdpBbmY3IFzXVgTZlWIXU1g/yBVz/2O0qsKLaw QQ2NKre6lioP2H3pWc0fi53336svzCSuuExCiu47Hx9WcYBX1Poq/AkpAoxPErq+ DRn8lXVwqMpefevlHK9XlZjSpwwwltULmId3LFxh3Z53b1NABrKpVEEJ/VChpzzA xC8dLdu1pbJa2jdccL6mBDYlMvyWLCmKfAlAHdRh8nLmOZxKRHg= =kPr7 -----END PGP SIGNATURE----- Merge tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Dave Hansen: - Complete LASS enabling: deal with vsyscall and EFI The existing Linear Address Space Separation (LASS) support punted on support for common EFI and vsyscall configs. Complete the implementation by supporting EFI and vsyscall=xonly. - Clean up CPUID usage in newer Intel "avs" audio driver and update the x86-cpuid-db file * tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.0 ASoC: Intel: avs: Include CPUID header at file scope ASoC: Intel: avs: Check maximum valid CPUID leaf x86/cpu: Remove LASS restriction on vsyscall emulation x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE x86/vsyscall: Restore vsyscall=xonly mode under LASS x86/traps: Consolidate user fixups in the #GP handler x86/vsyscall: Reorganize the page fault emulation code x86/cpu: Remove LASS restriction on EFI x86/efi: Disable LASS while executing runtime services x86/cpu: Defer LASS enabling until userspace comes up	2026-04-14 14:24:45 -07:00
Linus Torvalds	c1fe867b5b	Updates for the timer/timekeeping core: - A rework of the hrtimer subsystem to reduce the overhead for frequently armed timers, especially the hrtick scheduler timer. - Better timer locality decision - Simplification of the evaluation of the first expiry time by keeping track of the neighbor timers in the RB-tree by providing a RB-tree variant with neighbor links. That avoids walking the RB-tree on removal to find the next expiry time, but even more important allows to quickly evaluate whether a timer which is rearmed changes the position in the RB-tree with the modified expiry time or not. If not, the dequeue/enqueue sequence which both can end up in rebalancing can be completely avoided. - Deferred reprogramming of the underlying clock event device. This optimizes for the situation where a hrtimer callback sets the need resched bit. In that case the code attempts to defer the re-programming of the clock event device up to the point where the scheduler has picked the next task and has the next hrtick timer armed. In case that there is no immediate reschedule or soft interrupts have to be handled before reaching the reschedule point in the interrupt entry code the clock event is reprogrammed in one of those code paths to prevent that the timer becomes stale. - Support for clocksource coupled clockevents The TSC deadline timer is coupled to the TSC. The next event is programmed in TSC time. Currently this is done by converting the CLOCK_MONOTONIC based expiry value into a relative timeout, converting it into TSC ticks, reading the TSC adding the delta ticks and writing the deadline MSR. As the timekeeping core has the conversion factors for the TSC already, the whole back and forth conversion can be completely avoided. The timekeeping core calculates the reverse conversion factors from nanoseconds to TSC ticks and utilizes the base timestamps of TSC and CLOCK_MONOTONIC which are updated once per tick. This allows a direct conversion into the TSC deadline value without reading the time and as a bonus keeps the deadline conversion in sync with the TSC conversion factors, which are updated by adjtimex() on systems with NTP/PTP enabled. - Allow inlining of the clocksource read and clockevent write functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR. With all those enhancements in place a hrtick enabled scheduler provides the same performance as without hrtick. But also other hrtimer users obviously benefit from these optimizations. - Robustness improvements and cleanups of historical sins in the hrtimer and timekeeping code. - Rewrite of the clocksource watchdog. The clocksource watchdog code has over time reached the state of an impenetrable maze of duct tape and staples. The original design, which was made in the context of systems far smaller than today, is based on the assumption that the to be monitored clocksource (TSC) can be trivially compared against a known to be stable clocksource (HPET/ACPI-PM timer). Over the years this rather naive approach turned out to have major flaws. Long delays between the watchdog invocations can cause wrap arounds of the reference clocksource. The access to the reference clocksource degrades on large multi-sockets systems dure to interconnect congestion. This has been addressed with various heuristics which degraded the accuracy of the watchdog to the point that it fails to detect actual TSC problems on older hardware which exposes slow inter CPU drifts due to firmware manipulating the TSC to hide SMI time. The rewrite addresses this by: - Restricting the validation against the reference clocksource to the boot CPU which is usually closest to the legacy block which contains the reference clocksource (HPET/ACPI-PM). - Do a round robin validation betwen the boot CPU and the other CPUs based only on the TSC with an algorithm similar to the TSC synchronization code during CPU hotplug. - Being more leniant versus remote timeouts - The usual tiny fixes, cleanups and enhancements all over the place -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnbxdMQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoeq6EAC4h9wuBr5yCkxmog1Bhlk9cnK0oX1THb7V Q4z6DrYAiDXP6z4IDwqSW+3vvmNw1QXOeqpyMTiiIcQ5mNSs1IDnCt5HOEwY+ICm fiSUMYkXkH6xdFWspYWFkD7aExHJRT3hd/bo+WnXGHhHclPj5NHZssLMIDboHrzX jLV1hljmthfwg/uOXDGmQUPRFjqr2ZQjo7zGA5SwfVg8Krz7g/qRVy2wUns9TdW/ NYwihDm1YV7qkK/+f1GnMdd70toqb1OZo/fS9FPbBrPLdyi8V+UbnFSUeZu8kCwV KubAzjLZR4xYCnrlaHhoi208GMd0TOvHMOrdAA0zkQHfhmszGl4N0pbF/EI29Ft2 tQG/FUTG+nzgNOrMCPN2nr3u/UOXLP+gO+2hkyyQjqUP35IyaTYQn10JgBmPTdJL Ab6E8WL9gTMCd+t/bVjdU/B8W9ruMihKBtWkTfMBCcQ9uNJFCEGzrcMF8hzFYRTs /4rMDr3NlGYydAnbKPj6bkC5gtjBvh/L08kOdUFyXCMSqIzvJkZJ4241ogl1Awi6 VfdwjF5KZCQo3M1ujpep+1L010wC/yulqLt2brQMO9Nt05dRhgwM3lxy7cnlMNm3 NdfMgi+OG0CzQ+ZUpvo20hCgTDUVgWN9g5R3rar8FJX+Ym3T+ZoEKlShZF+fSRjf YAUIbUyi7A== =2qc8 -----END PGP SIGNATURE----- Merge tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - A rework of the hrtimer subsystem to reduce the overhead for frequently armed timers, especially the hrtick scheduler timer: - Better timer locality decision - Simplification of the evaluation of the first expiry time by keeping track of the neighbor timers in the RB-tree by providing a RB-tree variant with neighbor links. That avoids walking the RB-tree on removal to find the next expiry time, but even more important allows to quickly evaluate whether a timer which is rearmed changes the position in the RB-tree with the modified expiry time or not. If not, the dequeue/enqueue sequence which both can end up in rebalancing can be completely avoided. - Deferred reprogramming of the underlying clock event device. This optimizes for the situation where a hrtimer callback sets the need resched bit. In that case the code attempts to defer the re-programming of the clock event device up to the point where the scheduler has picked the next task and has the next hrtick timer armed. In case that there is no immediate reschedule or soft interrupts have to be handled before reaching the reschedule point in the interrupt entry code the clock event is reprogrammed in one of those code paths to prevent that the timer becomes stale. - Support for clocksource coupled clockevents The TSC deadline timer is coupled to the TSC. The next event is programmed in TSC time. Currently this is done by converting the CLOCK_MONOTONIC based expiry value into a relative timeout, converting it into TSC ticks, reading the TSC adding the delta ticks and writing the deadline MSR. As the timekeeping core has the conversion factors for the TSC already, the whole back and forth conversion can be completely avoided. The timekeeping core calculates the reverse conversion factors from nanoseconds to TSC ticks and utilizes the base timestamps of TSC and CLOCK_MONOTONIC which are updated once per tick. This allows a direct conversion into the TSC deadline value without reading the time and as a bonus keeps the deadline conversion in sync with the TSC conversion factors, which are updated by adjtimex() on systems with NTP/PTP enabled. - Allow inlining of the clocksource read and clockevent write functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR. With all those enhancements in place a hrtick enabled scheduler provides the same performance as without hrtick. But also other hrtimer users obviously benefit from these optimizations. - Robustness improvements and cleanups of historical sins in the hrtimer and timekeeping code. - Rewrite of the clocksource watchdog. The clocksource watchdog code has over time reached the state of an impenetrable maze of duct tape and staples. The original design, which was made in the context of systems far smaller than today, is based on the assumption that the to be monitored clocksource (TSC) can be trivially compared against a known to be stable clocksource (HPET/ACPI-PM timer). Over the years this rather naive approach turned out to have major flaws. Long delays between the watchdog invocations can cause wrap arounds of the reference clocksource. The access to the reference clocksource degrades on large multi-sockets systems dure to interconnect congestion. This has been addressed with various heuristics which degraded the accuracy of the watchdog to the point that it fails to detect actual TSC problems on older hardware which exposes slow inter CPU drifts due to firmware manipulating the TSC to hide SMI time. The rewrite addresses this by: - Restricting the validation against the reference clocksource to the boot CPU which is usually closest to the legacy block which contains the reference clocksource (HPET/ACPI-PM). - Do a round robin validation betwen the boot CPU and the other CPUs based only on the TSC with an algorithm similar to the TSC synchronization code during CPU hotplug. - Being more leniant versus remote timeouts - The usual tiny fixes, cleanups and enhancements all over the place * tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) alarmtimer: Access timerqueue node under lock in suspend hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check posix-timers: Fix stale function name in comment timers: Get this_cpu once while clearing the idle state clocksource: Rewrite watchdog code completely clocksource: Don't use non-continuous clocksources as watchdog x86/tsc: Handle CLOCK_SOURCE_VALID_FOR_HRES correctly MIPS: Don't select CLOCKSOURCE_WATCHDOG parisc: Remove unused clocksource flags hrtimer: Add a helper to retrieve a hrtimer from its timerqueue node hrtimer: Remove trailing comma after HRTIMER_MAX_CLOCK_BASES hrtimer: Mark index and clockid of clock base as const hrtimer: Drop unnecessary pointer indirection in hrtimer_expire_entry event hrtimer: Drop spurious space in 'enum hrtimer_base_type' hrtimer: Don't zero-initialize ret in hrtimer_nanosleep() hrtimer: Remove hrtimer_get_expires_ns() timekeeping: Mark offsets array as const timekeeping/auxclock: Consistently use raw timekeeper for tk_setup_internals() timer_list: Print offset as signed integer tracing: Use explicit array size instead of sentinel elements in symbol printing ...	2026-04-14 10:27:07 -07:00
Linus Torvalds	5181afcdf9	A busier cycle than I had expected for docs, including: - Translations: some overdue updates to the Japanese translations, Chinese translations for some of the Rust documentation, and the beginnings of a Portuguese translation. - New documents covering CPU isolation, managed interrupts, debugging Python gbb scripts, and more. - More tooling work from Mauro, reducing docs-build warnings, adding self tests, improving man-page output, bringing in a proper C tokenizer to replace (some of) the mess of kernel-doc regexes, and more. - Update and synchronize changes.rst and scripts/ver_linux, and put both into alphabetical order. ...and a long list of documentation updates, typo fixes, and general improvements. -----BEGIN PGP SIGNATURE----- iQFDBAABCgAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmnb9GkPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YqnEH/R+9jPgcEsdaLGXxM0Li/obLMw2bzj+FhLOT tpS53AcK34ObVg9Xe5CsrQyjaI7Advy5QevLTqC3ZpfN7sxAtCZ7alUT/u1pk5+u P+OoB/jyD55KA/c5jgIQlgBn574fjaDIVD6W+sf0g9MhmOnDa/7EaOIEdvm+Xpey Dxx/Sq0jBk4PziqeCz03txsh//+O/R/wXoRbHeqKIbs3XQt1DF1LLRc+Ni8RYBCB PpCacoe1RpvaSG7CqXCwPB1bbiEPXRU3oiW2SPFcn0j4WyNtEJkv56g2lpzzWyCQ Gc837Mmu29GifQyH4Vi0bnvLtFBC/x+rJKsvCmp8jFfu6sMpuew= =liXF -----END PGP SIGNATURE----- Merge tag 'docs-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux Pull documentation updates from Jonathan Corbet: "A busier cycle than I had expected for docs, including: - Translations: some overdue updates to the Japanese translations, Chinese translations for some of the Rust documentation, and the beginnings of a Portuguese translation. - New documents covering CPU isolation, managed interrupts, debugging Python gbb scripts, and more. - More tooling work from Mauro, reducing docs-build warnings, adding self tests, improving man-page output, bringing in a proper C tokenizer to replace (some of) the mess of kernel-doc regexes, and more. - Update and synchronize changes.rst and scripts/ver_linux, and put both into alphabetical order. ... and a long list of documentation updates, typo fixes, and general improvements" * tag 'docs-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (162 commits) Documentation: core-api: real-time: correct spelling doc: Add CPU Isolation documentation Documentation: Add managed interrupts Documentation: seq_file: drop 2.6 reference docs/zh_CN: update rust/index.rst translation docs/zh_CN: update rust/quick-start.rst translation docs/zh_CN: update rust/coding-guidelines.rst translation docs/zh_CN: update rust/arch-support.rst translation docs/zh_CN: sync process/2.Process.rst with English version docs/zh_CN: fix an inconsistent statement in dev-tools/testing-overview tracing: Documentation: Update histogram-design.rst for fn() handling docs: sysctl: Add documentation for /proc/sys/xen/ Docs: hid: intel-ish-hid: make long URL usable Documentation/kernel-parameters: fix architecture alignment for pt, nopt, and nobypass sched/doc: Update yield_task description in sched-design-CFS Documentation/rtla: Convert links to RST format docs: fix typos and duplicated words across documentation docs: fix typo in zoran driver documentation docs: add an Assisted-by mention to submitting-patches.rst Revert "scripts/checkpatch: add Assisted-by: tag validation" ...	2026-04-14 08:47:08 -07:00
Linus Torvalds	d7c8087a9c	Power management updates for 7.1-rc1 - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) - Refine stopped tick handling in the menu cpuidle governor and rearrange stopped tick handling in the teo cpuidle governor (Rafael Wysocki) - Add Panther Lake C-states table to the intel_idle driver (Artem Bityutskiy) - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha) - Simplify cpuidle_register_device() with guard() (Huisong Li) - Use performance level if available to distinguish between rates in OPP debugfs (Manivannan Sadhasivam) - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar) - Return -ENODATA if the snapshot image is not loaded (Alberto Garcia) - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric Biggers) - Clean up and rearrange the intel_rapl power capping driver to make the respective interface drivers (TPMI, MSR, and MMOI) hold their own settings and primitives and consolidate PL4 and PMU support flags into rapl_defaults (Kuppuswamy Sathyanarayanan) - Correct kernel-doc function parameter names in the power capping core code (Randy Dunlap) - Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko) - Use _visible attribute to replace create/remove_sysfs_files() in devfreq (Pengjie Zhang) - Add Tegra114 support to activity monitor device in tegra30-devfreq as a preparation to upcoming EMC controller support (Svyatoslav Ryhel) - Fix mistakes in cpupower man pages, add the boost and epp options to the cpupower-frequency-info man page, and add the perf-bias option to the cpupower-info man page (Roberto Ricci) - Remove unnecessary extern declarations from getopt.h in arguments parsing functions in cpufreq-set, cpuidle-info, cpuidle-set, cpupower-info, and cpupower-set utilities (Kaushlendra Kumar) -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmnY9TISHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1G9gH/j5mEqfPpiwX6fQ/ZwOGdNOOPVA5w9j4 KPHSMwMD5lZkoaZfasp2vt27KY5SOoVVvRZ2DKkFJ3Jai4I3cUPZYypga2nre1ag tgzX4vOjcw2r40Eda6ezWl1h4mca/xJJBX7xH2+hn1JY+Y1in37g50CqMIjKh96z Uugkk6UZytL1XcF55PMhIUgDf6pDtRT5UOW9xOKOkUt8FVWTJ7ei3HaWyV5kDmVq b5eQ42+OH7y6sWNnoKczFd8fStvh6J/avoJurBEvcOQhMcjaIaB48G19+KjDg73E NjrVcgG20P2rltBvV2d0J1TKskZHkaP7XjIeWfkwjGZhee3FL7ssS/g= =fRCO -----END PGP SIGNATURE----- Merge tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Once again, cpufreq is the most active development area, mostly because of the new feature additions and documentation updates in the amd-pstate driver, but there are also changes in the cpufreq core related to boost support and other assorted updates elsewhere. Next up are power capping changes due to the major cleanup of the Intel RAPL driver. On the cpuidle front, a new C-states table for Intel Panther Lake is added to the intel_idle driver, the stopped tick handling in the menu and teo governors is updated, and there are a couple of cleanups. Apart from the above, support for Tegra114 is added to devfreq and there are assorted cleanups of that code, there are also two updates of the operating performance points (OPP) library, two minor updates related to hibernation, and cpupower utility man pages updates and cleanups. Specifics: - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) - Refine stopped tick handling in the menu cpuidle governor and rearrange stopped tick handling in the teo cpuidle governor (Rafael Wysocki) - Add Panther Lake C-states table to the intel_idle driver (Artem Bityutskiy) - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha) - Simplify cpuidle_register_device() with guard() (Huisong Li) - Use performance level if available to distinguish between rates in OPP debugfs (Manivannan Sadhasivam) - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar) - Return -ENODATA if the snapshot image is not loaded (Alberto Garcia) - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric Biggers) - Clean up and rearrange the intel_rapl power capping driver to make the respective interface drivers (TPMI, MSR, and MMOI) hold their own settings and primitives and consolidate PL4 and PMU support flags into rapl_defaults (Kuppuswamy Sathyanarayanan) - Correct kernel-doc function parameter names in the power capping core code (Randy Dunlap) - Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko) - Use _visible attribute to replace create/remove_sysfs_files() in devfreq (Pengjie Zhang) - Add Tegra114 support to activity monitor device in tegra30-devfreq as a preparation to upcoming EMC controller support (Svyatoslav Ryhel) - Fix mistakes in cpupower man pages, add the boost and epp options to the cpupower-frequency-info man page, and add the perf-bias option to the cpupower-info man page (Roberto Ricci) - Remove unnecessary extern declarations from getopt.h in arguments parsing functions in cpufreq-set, cpuidle-info, cpuidle-set, cpupower-info, and cpupower-set utilities (Kaushlendra Kumar)" * tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP cpupower: remove extern declarations in cmd functions cpuidle: Simplify cpuidle_register_device() with guard() PM / devfreq: tegra30-devfreq: add support for Tegra114 PM / devfreq: use _visible attribute to replace create/remove_sysfs_files() PM / devfreq: Remove unneeded casting for HZ_PER_KHZ MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer cpufreq: Pass the policy to cpufreq_driver->adjust_perf() cpufreq/amd-pstate: Pass the policy to amd_pstate_update() cpufreq/amd-pstate-ut: Add a unit test for raw EPP cpufreq/amd-pstate: Add support for raw EPP writes cpufreq/amd-pstate: Add support for platform profile class cpufreq/amd-pstate: add kernel command line to override dynamic epp cpufreq/amd-pstate: Add dynamic energy performance preference Documentation: amd-pstate: fix dead links in the reference section cpufreq/amd-pstate: Cache the max frequency in cpudata Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count} Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file amd-pstate-ut: Add a testcase to validate the visibility of driver attributes ...	2026-04-13 19:47:52 -07:00
Rafael J. Wysocki	8e2308c1ae	Merge branches 'acpica', 'acpi-osl' and 'acpi-tables' Merge ACPICA updates, an ACPI OS service layer (OSL) update and assorted updates related to parsing ACPI tables for 7.1-rc1: - Update maintainers information regarding ACPICA (Rafael Wysocki) - Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() (Kees Cook) - Trigger an ordered system power off after encountering a fatal error operator in AML (Armin Wolf) - Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao) - Remove the temporary stop-gap acpi_pptt_cache_v1_full structure from the ACPI PPTT parser (Ben Horgan) - Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate DeSimone) * acpica: ACPICA: Update maintainers information ACPICA: Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() * acpi-osl: ACPI: OSL: Poweroff when encountering a fatal ACPI error * acpi-tables: ACPI: tables: Enable FPDT on LoongArch Documentation: ABI: add FBPT and S3PT entries to sysfs-firmware-acpi ACPI: FPDT: expose FBPT and S3PT subtables via sysfs ACPI: PPTT: Remove duplicate structure, acpi_pptt_cache_v1_full	2026-04-09 21:19:34 +02:00
Li RongQing	15d49089e5	Documentation/kernel-parameters: fix architecture alignment for pt, nopt, and nobypass Commit `ab0e7f2076` ("Documentation: Merge x86-specific boot options doc into kernel-parameters.txt") introduced a formatting regression where architecture tags were placed on separate lines with broken indentation. This caused the 'nopt' [X86] parameter to appear as if it belonged to the [PPC/POWERNV] section. Furthermore, since the main 'iommu=' parameter heading already specifies it is for [X86, EARLY], the subsequent standalone [X86] tags for 'pt', 'nopt', and the AMD GART options are redundant and clutter the documentation. Clean up the formatting by removing these redundant tags and properly attributing the 'nobypass' option to [PPC/POWERNV]. Fixes: `ab0e7f2076` ("Documentation: Merge x86-specific boot options doc into kernel-parameters.txt") Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260330105957.2271-1-lirongqing@baidu.com>	2026-04-09 08:36:52 -06:00
Marco Elver	da735962d0	kfence: add kfence.fault parameter Add kfence.fault parameter to control the behavior when a KFENCE error is detected (similar in spirit to kasan.fault=<mode>). The supported modes for kfence.fault=<mode> are: - report: print the error report and continue (default). - oops: print the error report and oops. - panic: print the error report and panic. In particular, the 'oops' mode offers a trade-off between no mitigation on report and panicking outright (if panic_on_oops is not set). Link: https://lkml.kernel.org/r/20260225203639.3159463-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:53:06 -07:00
Mario Limonciello (AMD)	da8afb1c66	cpufreq/amd-pstate: add kernel command line to override dynamic epp Add `amd_dynamic_epp=enable` and `amd_dynamic_epp=disable` to override the kernel configuration option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP` locally. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-04-02 11:29:11 -05:00
Breno Leitao	41e3ccca00	docs: workqueue: document WQ_AFFN_CACHE_SHARD affinity scope Update kernel-parameters.txt and workqueue.rst to reflect the new cache_shard affinity scope and the default change from cache to cache_shard. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-01 10:24:18 -10:00
Will Deacon	287c6981f1	KVM: arm64: Add some initial documentation for pKVM Add some initial documentation for pKVM to help people understand what is supported, the limitations of protected VMs when compared to non-protected VMs and also what is left to do. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-33-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
H. Peter Anvin	ac66a73be0	x86/fred: Enable FRED by default When FRED was added to the mainline kernel, it was set up as an explicit opt-in due to the risk of regressions before hardware was available publicly. Now, Panther Lake (Core Ultra 300 series) has been released, and benchmarking by Phoronix has shown that it provides a significant performance benefit on most workloads: https://www.phoronix.com/review/intel-fred-panther-lake Accordingly, enable FRED by default if the CPU supports it. FRED can of course still be disabled via the fred=off command line option. Touch up Kconfig help too. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://patch.msgid.link/20260325230151.1898287-2-hpa@zytor.com	2026-03-27 16:04:47 +01:00
Ingo Molnar	f6472b1793	Linux 7.0-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmm3G/UeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZJUH/R0vQ3Vha48QDEic 1NREwaHxAoTFi0i3y7OPPklqrP2V09D1qg4Q6fExYQVTQgV6F2DRjVbyPKrmr4ay BA6aHrUdnFngYHpDlI1b1r7rJiAIN4WFHl7StO70bS+EB+UPsP9cfP3CKXUfKfqT kyHXzUrd5QnjYmlb9rQw1E6rzsRamNtGUtZf7TwDidJYjtm3sPeDHUkjyRy4xkYd UouIu6W7UXoicl38bJAgaWBY5BiYtjN6ktnY4/gcqDeqYd7mTM3Eb1B+OSXgFfip F0OYfJhfWn+63WnPA+1I5jXWC1UrdVXTMK/NTYjhmGlfdmkLcWDlNGtu+qKZbpwj fmF3Kyo= =6nX1 -----END PGP SIGNATURE----- Merge tag 'v7.0-rc4' into timers/core, to resolve conflict Resolve conflict between this change in the upstream kernel: `4c652a4772` ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline") ... and this pending change in timers/core: `0e98eb1481` ("entry: Prepare for deferred hrtimer rearming") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2026-03-21 08:02:36 +01:00
Thomas Gleixner	763aacf86f	clocksource: Rewrite watchdog code completely The clocksource watchdog code has over time reached the state of an impenetrable maze of duct tape and staples. The original design, which was made in the context of systems far smaller than today, is based on the assumption that the to be monitored clocksource (TSC) can be trivially compared against a known to be stable clocksource (HPET/ACPI-PM timer). Over the years it turned out that this approach has major flaws: - Long delays between watchdog invocations can result in wrap arounds of the reference clocksource - Scalability of the reference clocksource readout can degrade on large multi-socket systems due to interconnect congestion This was addressed with various heuristics which degraded the accuracy of the watchdog to the point that it fails to detect actual TSC problems on older hardware which exposes slow inter CPU drifts due to firmware manipulating the TSC to hide SMI time. To address this and bring back sanity to the watchdog, rewrite the code completely with a different approach: 1) Restrict the validation against a reference clocksource to the boot CPU, which is usually the CPU/Socket closest to the legacy block which contains the reference source (HPET/ACPI-PM timer). Validate that the reference readout is within a bound latency so that the actual comparison against the TSC stays within 500ppm as long as the clocks are stable. 2) Compare the TSCs of the other CPUs in a round robin fashion against the boot CPU in the same way the TSC synchronization on CPU hotplug works. This still can suffer from delayed reaction of the remote CPU to the SMP function call and the latency of the control variable cache line. But this latency is not affecting correctness. It only affects the accuracy. With low contention the readout latency is in the low nanoseconds range, which detects even slight skews between CPUs. Under high contention this becomes obviously less accurate, but still detects slow skews reliably as it solely relies on subsequent readouts being monotonically increasing. It just can take slightly longer to detect the issue. 3) Rewrite the watchdog test so it tests the various mechanisms one by one and validating the result against the expectation. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Jiri Wiesner <jwiesner@suse.de> Reviewed-by: Daniel J Blueman <daniel@quora.org> Link: https://patch.msgid.link/20260123231521.926490888@kernel.org Link: https://patch.msgid.link/87h5qeomm5.ffs@tglx	2026-03-20 13:36:32 +01:00
Sohil Mehta	b36d1f53d9	x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE The EMULATE mode of vsyscall maps the vsyscall page with a high kernel address directly into user address space. Reading the vsyscall page in EMULATE mode would cause LASS to trigger a #GP. Fixing the LASS violation in EMULATE mode would require complex instruction decoding because the resulting #GP does include the necessary error information, and the vsyscall address is not readily available in the RIP. The EMULATE mode has been deprecated since 2022 and can only be enabled using the command line parameter vsyscall=emulate. See commit `bf00745e77` ("x86/vsyscall: Remove CONFIG_LEGACY_VSYSCALL_EMULATE") for details. At this point, no one is expected to be using this insecure mode. The rare usages that need it obviously do not care about security. Disable LASS when EMULATE mode is requested to avoid breaking legacy user software. Also, update the vsyscall documentation to reflect this. LASS will only be supported if vsyscall mode is set to XONLY (default) or NONE. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260309181029.398498-5-sohil.mehta@intel.com	2026-03-19 15:11:13 -07:00
Eric Biggers	7a60fe48af	ima: remove buggy support for asynchronous hashes IMA computes hashes using the crypto_shash or crypto_ahash API. The latter is used only when ima.ahash_minsize is set on the command line, and its purpose is ostensibly to make the hash computation faster. However, going off the CPU to a crypto engine and back again is actually quite slow, especially compared with the acceleration that is built into modern CPUs and the kernel now enables by default for most algorithms. Typical performance results for SHA-256 on a modern platform can be found at https://lore.kernel.org/linux-crypto/20250615184638.GA1480@sol/ Partly for this reason, several other kernel subsystems have already dropped support for the crypto_ahash API. The other problem with crypto_ahash is that bugs are also common, not just in the underlying drivers, but also in the code using it, since it is very difficult to use correctly. Just from a quick review, here are some of the bugs I noticed in IMA's ahash code: - [Use after free] ima_alloc_atfm() isn't thread-safe and can trigger a use-after-free if multiple threads try to initialize the global ima_ahash_tfm at the same time. - [Deadlock] If only one buffer is allocated and there is an error reading from the file, then ahash_wait() is executed twice, causing a deadlock in wait_for_completion(). - [Crash or incorrect hash computed] calc_buffer_ahash_atfm() is sometimes passed stack buffers which can be vmalloc addresses, but it puts them in a scatterlist assuming they are linear addresses. This causes the hashing to be done on the wrong physical address. - [Truncation to 32-bit length] ima_alloc_pages() incorrectly assumes an loff_t value fits in an unsigned long. calc_buffer_ahash_atfm() incorrectly assumes that a loff_t value fits in an unsigned int. So, not exactly a great track record so far, even disregarding driver bugs which are an even larger problem. Fortunately, in practice it's unlikely that many users are actually setting the ima.ahash_minsize kernel command-line parameter which enables this code. However, given that this code is almost certainly no longer useful (if it ever was), let's just remove it instead of attempting to fix all these issues. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>	2026-03-17 18:53:49 -04:00
Linus Torvalds	69237f8c1f	USB fixes for 7.0-rc4 Here is a large chunk of USB driver fixes for 7.0-rc4. Included in here are: - usb gadget reverts due to reported issues, and then a follow-on fix to hopefully resolve the reported overall problem - xhci driver fixes - dwc3 driver fixes - usb core "killable" bulk message api addition to fix a usbtmc driver bug where userspace could hang the driver for forever - small USB driver fixes for reported issues - new usb device quirks All except the last USB device quirk change have been in linux-next with no reported issues. That one came in too late, and is "obviously correct" :) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCabVKjw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynxIACfUxlNXBrfcAz2xTRqEvFSFKiLscoAoMnGZG1U l7DU2htiO4Gacf5g6Yvj =wXSL -----END PGP SIGNATURE----- Merge tag 'usb-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here is a large chunk of USB driver fixes for 7.0-rc4. Included in here are: - usb gadget reverts due to reported issues, and then a follow-on fix to hopefully resolve the reported overall problem - xhci driver fixes - dwc3 driver fixes - usb core "killable" bulk message api addition to fix a usbtmc driver bug where userspace could hang the driver for forever - small USB driver fixes for reported issues - new usb device quirks All except the last USB device quirk change have been in linux-next with no reported issues. That one came in too late, and is 'obviously correct' :)" * tag 'usb-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (35 commits) USB: ezcap401 needs USB_QUIRK_NO_BOS to function on 10gbs usb speed usb: roles: get usb role switch from parent only for usb-b-connector Revert "tcpm: allow looking for role_sw device in the main node" usb: gadget: f_ncm: Fix net_device lifecycle with device_move Revert "usb: gadget: u_ether: add gether_opts for config caching" Revert "usb: gadget: u_ether: use <linux/hex.h> header file" Revert "usb: gadget: u_ether: Add auto-cleanup helper for freeing net_device" Revert "usb: gadget: f_ncm: align net_device lifecycle with bind/unbind" Revert "usb: legacy: ncm: Fix NPE in gncm_bind" Revert "usb: gadget: f_ncm: Fix atomic context locking issue" usb: typec: altmode/displayport: set displayport signaling rate in configure message usb: dwc3: pci: add support for the Intel Nova Lake -H usb/core/quirks: Add Huawei ME906S-device to wakeup quirk usb: gadget: uvc: fix interval_duration calculation xhci: Fix NULL pointer dereference when reading portli debugfs files usb: xhci: Prevent interrupt storm on host controller error (HCE) usb: xhci: Fix memory leak in xhci_disable_slot() usb: class: cdc-wdm: fix reordering issue in read code path usb: renesas_usbhs: fix use-after-free in ISR during device removal usb: cdc-acm: Restore CAP_BRK functionnality to CH343 ...	2026-03-14 09:43:12 -07:00
Rafael J. Wysocki	f862f29196	Merge back ACPI OS services layer (OSL) material for 7.1	2026-03-12 21:16:46 +01:00
Jie Deng	9f6a983cfa	usb: core: new quirk to handle devices with zero configurations Some USB devices incorrectly report bNumConfigurations as 0 in their device descriptor, which causes the USB core to reject them during enumeration. logs: usb 1-2: device descriptor read/64, error -71 usb 1-2: no configurations usb 1-2: can't read configurations, error -22 However, these devices actually work correctly when treated as having a single configuration. Add a new quirk USB_QUIRK_FORCE_ONE_CONFIG to handle such devices. When this quirk is set, assume the device has 1 configuration instead of failing with -EINVAL. This quirk is applied to the device with VID:PID 5131:2007 which exhibits this behavior. Signed-off-by: Jie Deng <dengjie03@kylinos.cn> Link: https://patch.msgid.link/20260227084931.1527461-1-dengjie03@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-03-11 16:17:19 +01:00
Jens Axboe	d90c470b0e	nvme fixes for Linux 7.0 - Improve quirk visibility and configurability (Maurizio) - Fix runtime user modification to queue setup (Keith) - Fix multipath leak on try_module_get failure (Keith) - Ignore ambiguous spec definitions for better atomics support (John) - Fix admin queue leak on controller reset (Ming) - Fix large allocation in persistent reservation read keys (Sungwoo Kim) - Fix fcloop callback handling (Justin) - Securely free DHCHAP secrets (Daniel) - Various cleanups and typo fixes (John, Wilfred) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmmoSbMACgkQPe3zGtjz RgkpuQ/9EfCp24xowwKEXycX7pquojwjEAh1n5WsUyBDXQls/7Dq3w0EXtkc8fA8 SUcDpTj7ABiF/faschCoFO47R5/0TPtNMCleWFSdW0OG6B7IYaUt9Cj86JK1dzme Zn7luH47Pesmd+H184IOIfDhsiVs5Z3YCISlT1aa1EFg+3/neDqGGpT4+ySOjSZe 9j8ASUTOqfuBZ2Xc8RNvumABBEkEkUd4xwYTLRi+o/PR9econGrpiEqDyUBAf8dr VrZoL0aoQoUEaU08tJOci4GH3Spp4RXlpQo92RBE4yDTxWozRRBWwoCycmPKHQ5b +5nC77t1p2OyzgP0xPngQZVMi7A+QTFZf4shq0Xho5kifjB8ZTqVSJJSGK7RlwE4 GmXgHfMs8Gvn3aew8BcpXilhe4InXfY1LqYmTvJxo9VLK/u7apo94vrJICewHh2z lsiWTOHe9xSm8wR20fcxp3D3kXpQ5sMcMoco96dVFetw1WNE30qDy+xtpOvPwdL5 9mloguR7Pmsu+gVim2VaqSA8HsPIYEbXymLMVzTeVbtPALzrKsGLLW8k/DYFhSTm +Ow4KeItyL5hgDU2jenjS3xwshKqKTeJDueue4WBFxgqdbH9hwiJ6aVWS2eoJxev RAZXSGTmxEo8X5nDsNz048iT96lFpM7ERViHOWnrptLcFX4yFNM= =fMd5 -----END PGP SIGNATURE----- Merge tag 'nvme-7.0-2026-03-04' of git://git.infradead.org/nvme into block-7.0 Pull NVMe fixes from Keith: "- Improve quirk visibility and configurability (Maurizio) - Fix runtime user modification to queue setup (Keith) - Fix multipath leak on try_module_get failure (Keith) - Ignore ambiguous spec definitions for better atomics support (John) - Fix admin queue leak on controller reset (Ming) - Fix large allocation in persistent reservation read keys (Sungwoo Kim) - Fix fcloop callback handling (Justin) - Securely free DHCHAP secrets (Daniel) - Various cleanups and typo fixes (John, Wilfred)" * tag 'nvme-7.0-2026-03-04' of git://git.infradead.org/nvme: nvme: fix memory allocation in nvme_pr_read_keys() nvme-multipath: fix leak on try_module_get failure nvmet-fcloop: Check remoteport port_state before calling done callback nvme-pci: do not try to add queue maps at runtime nvme-pci: cap queue creation to used queues nvme-pci: ensure we're polling a polled queue nvme: fix memory leak in quirks_param_set() nvme: correct comment about nvme_ns_remove() nvme: stop setting namespace gendisk device driver data nvme: add support for dynamic quirk configuration via module parameter nvme: fix admin queue leak on controller reset nvme-fabrics: use kfree_sensitive() for DHCHAP secrets nvme: stop using AWUPF nvme: expose active quirks in sysfs nvme/host: fixup some typos	2026-03-04 08:15:17 -07:00
Armin Wolf	591230c6f2	ACPI: OSL: Poweroff when encountering a fatal ACPI error The ACPI spec states that the operating system should respond to a fatal ACPI error by "performing a controlled OS shutdown in a timely fashion". Comply with the ACPI specification by powering off the system when ACPICA signals a fatal ACPI error. Users can still disable this behavior by using the acpi.poweroff_on_fatal kernel option to work around firmware bugs. Link: https://uefi.org/specs/ACPI/6.6/19_ASL_Reference.html#fatal-fatal-error-check Signed-off-by: Armin Wolf <W_Armin@gmx.de> [ rjw: Dropped the new Kconfig option, adjusted header file inclusions ] Link: https://patch.msgid.link/20260204212931.3860-1-W_Armin@gmx.de Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-02-27 21:24:41 +01:00
Pranav Kharche	4058b259b9	docs: kernel-parameters: Fix repeated word in initramfs_options entry Remove the duplicate word 'for' in the initramfs_options description. Signed-off-by: Pranav Kharche <pranavkharche7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260225132951.36624-1-pranavkharche7@gmail.com>	2026-02-25 10:35:46 -07:00
Linus Torvalds	64275e9fda	LoongArch changes for v7.0 1, Select HAVE_CMPXCHG_{LOCAL,DOUBLE}; 2, Add 128-bit atomic cmpxchg support; 3, Add HOTPLUG_SMT implementation; 4, Wire up memfd_secret system call; 5, Fix boot errors and unwind errors for KASAN; 6, Use BPF prog pack allocator and add BPF arena support; 7, Update dts files to add nand controllers; 8, Some bug fixes and other small changes. -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmmMTnsWHGNoZW5odWFj YWlAa2VybmVsLm9yZwAKCRAChivD8uImeo0AEACFniyK/cbaBchYAONJb5TxXcW6 7pvFEAbNrTzvQ8TTGpt+EBsOZlqE+y/afB/NlR06Aow8ifvUnOxJu9Ur1afo2r6A syB3Y7OsuUd8nxsATgrfJrNZnqq30dCJWxnBlP+YCCHQ2FFjLHIGcheRNM7rTrzd LvGCnBwHSKmKv5wGxsDJufYxbHgeb4YvrwZiNJC0ELRM9VqMSCogkIlayJrfC26S Or89+6i2XLC3K+Rrd1MgPp2HX6W9utzhB7kSmro0piUyX6F5UtL1YGHC9t1hamIZ yuTStXOZA2bYQPwEmXNNVucX8UfmPOeUQgl0P0n8XG09RGq0uNKFhfkSy9d+lxUl 2jftUZGujgV3/RsehrsKcto1ZBwwd2FyKL7uLWucuop+XJvrqIus/hsR+M2FI9IY 6sngOJZkKWfxMECTL7+FAMOGuxnghRk0VBZRJ8PqHTU/9YkKLQf0iyYqmvl+wOgu ByJmEapmVdrdGG78zUHsMDAqUFo518ixABhExWuqwEE2/zSj2jQIliIAcHRSJkvT ZOW1CZBX54AuFfRvjelYucSz1Q89lHC7U9WjYkte8Rv4tyPOYnTUmg3ouBPm5W2+ MuPVt1Y4rJN8RnD+1sSHJa4laMo7gZN2Cr4LsELc0mURxOfRK/hU1bjA9JM4mv2X 2L69IvQDbaG2H61qBQ== =WKfm -----END PGP SIGNATURE----- Merge tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Select HAVE_CMPXCHG_{LOCAL,DOUBLE} - Add 128-bit atomic cmpxchg support - Add HOTPLUG_SMT implementation - Wire up memfd_secret system call - Fix boot errors and unwind errors for KASAN - Use BPF prog pack allocator and add BPF arena support - Update dts files to add nand controllers - Some bug fixes and other small changes * tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: dts: loongson-2k1000: Add nand controller support LoongArch: dts: loongson-2k0500: Add nand controller support LoongArch: BPF: Implement bpf_addr_space_cast instruction LoongArch: BPF: Implement PROBE_MEM32 pseudo instructions LoongArch: BPF: Use BPF prog pack allocator LoongArch: Use IS_ERR_PCPU() macro for KGDB LoongArch: Rework KASAN initialization for PTW-enabled systems LoongArch: Disable instrumentation for setup_ptwalker() LoongArch: Remove some extern variables in source files LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RT LoongArch: Handle percpu handler address for ORC unwinder LoongArch: Use %px to print unmodified unwinding address LoongArch: Prefer top-down allocation after arch_mem_init() LoongArch: Add HOTPLUG_SMT implementation LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODE LoongArch: Wire up memfd_secret system call LoongArch: Replace seq_printf() with seq_puts() for simple strings LoongArch: Add 128-bit atomic cmpxchg support LoongArch: Add detection for SC.Q support LoongArch: Select HAVE_CMPXCHG_LOCAL in Kconfig	2026-02-14 12:47:15 -08:00
Linus Torvalds	cb5573868e	Loongarch: - Add more CPUCFG mask bits. - Improve feature detection. - Add lazy load support for FPU and binary translation (LBT) register state. - Fix return value for memory reads from and writes to in-kernel devices. - Add support for detecting preemption from within a guest. - Add KVM steal time test case to tools/selftests. ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception. - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process. - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers. - More pKVM fixes for features that are not supposed to be exposed to guests. - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor. - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding. - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest. - Preliminary work for guest GICv5 support. - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures. - A small set of FPSIMD cleanups. - Selftest fixes addressing the incorrect alignment of page allocation. - Other assorted low-impact fixes and spelling fixes. RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host. - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit `bf2c3138ae` ("Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD"). - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN. - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry. - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly. - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue. - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM. - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace. - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2. - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration. - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs). - Clean up KVM's handling of marking mapped vCPU pages dirty. - Drop a pile of ancient sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless. - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n. - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y. - Add a regression test for recent APICv update fixes. - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information. - Drop a dead function declaration. - Minor cleanups. x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans. - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM. - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed. - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist. - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch. x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure. - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) must flush the RAP. - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model. - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig. - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0. - Treat exit_code as an unsigned 64-bit value through all of KVM. - Add support for fetching SNP certificates from userspace. - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2. - Misc fixes and cleanups. x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking. - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests). - Extend several nested VMX tests to also cover nested SVM. - Add a selftest for nested VMLOAD/VMSAVE. - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain. guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP. - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources. - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock. Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless. -----BEGIN PGP SIGNATURE----- iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmNqwwUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPaZAf/cJx5B67lnST272esz0j29MIuT/Ti jnf6PI9b7XubKYOtNvlu5ZW4Jsa5dqRG0qeO/JmcXDlwBf5/UkWOyvqIXyiuTl0l KcSUlKPtTgKZSoZpJpTppuuDE8FSYqEdcCmjNvoYzcJoPjmaeJbK6aqO0AkBbb6e L5InrLV7nV9iua6rFvA0s/G8/Eq2DG8M9hTRHe6NcI/z4hvslOudvpUXtC8Jygoo cV8vFavUwc+atrmvhAOLvSitnrjfNa4zcG6XMOlwXPfIdvi3zqTlQTgUpwGKiAGQ RIDUVZ/9bcWgJqbPRsdEWwaYRkNQWc5nmrAHRpEEaYV/NeBBNf4v6qfKSw== =SkJ1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit `bf2c3138ae` ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of ancient sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) must flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...	2026-02-13 11:31:15 -08:00
Linus Torvalds	cee73b1e84	RISC-V updates for v7.0 - Add support for control flow integrity for userspace processes. This is based on the standard RISC-V ISA extensions Zicfiss and Zicfilp - Improve ptrace behavior regarding vector registers, and add some selftests - Optimize our strlen() assembly - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI volume mounting - Clean up some code slightly, including defining copy_user_page() as copy_page() rather than memcpy(), aligning us with other architectures; and using max3() to slightly simplify an expression in riscv_iommu_init_check() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmmOYpYACgkQx4+xDQu9 KkvzOQ/9Fq8ZxWgYofhTPtw9/vps3avheOHlEoRrBWYfn1VkTRPAcbUULL4PGXwg dnVFEl3AcrpOFikIthbukklLeLoOnUshZJBU25zY5h0My1jb63V1//gEwJR6I0dg +V+GJmfzc4+YVaHK6UFdn7j3GgKUbTC7xXRMuGEriAzKPnm3AXAjh94wMNx6depv Li3IXRoZT/HvqIAyfeAoM9STwOzJtE3Sc6fXABkzsIbNTjjdgIqoRSsQsKY10178 z6ox/sVStnLmVaMbOd/ZVN0J70JRDsvK0TC0/13K1ESUbnVia9a3bPIxLRmSapKC wXnwAuSeevtFshGGyd5LZO0QQGxzG1H63Gky2GRoh8bTQbd2tQcfQzANdnPkBAQS j2aOiSsiUQeNZqfZAfEBwRd27GXRYlKb/MpgCZKUH+ZO9VG6QaD3VGvg17/Caghy nVdbBQ81ZV9tkz9EMN0vt2VJHmEqARh88w619laHjg+ioPTG4/UIDPzskt1I+Fgm Y6NQLeFyfaO3RKKDYWGPcY7fmWQI9V8MECHOvyVI4xJcgqAbqnfsgytjuiFbrfRo fTvpuB7kvltBZ180QSB79xj0sWGFTWR02MeWy3uOaLZz2eIm2ZTZbMUSgNYR0ldG L3y7CEkTkoVF1ijYgAfuMgptk3Yf0dpa66D9HUo947wWkNrW5ds= =4fTk -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Add support for control flow integrity for userspace processes. This is based on the standard RISC-V ISA extensions Zicfiss and Zicfilp - Improve ptrace behavior regarding vector registers, and add some selftests - Optimize our strlen() assembly - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI volume mounting - Clean up some code slightly, including defining copy_user_page() as copy_page() rather than memcpy(), aligning us with other architectures; and using max3() to slightly simplify an expression in riscv_iommu_init_check() * tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: lib: optimize strlen loop efficiency selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function selftests: riscv: verify ptrace accepts valid vector csr values selftests: riscv: verify ptrace rejects invalid vector csr inputs selftests: riscv: verify syscalls discard vector context selftests: riscv: verify initial vector state with ptrace selftests: riscv: test ptrace vector interface riscv: ptrace: validate input vector csr registers riscv: csr: define vtype register elements riscv: vector: init vector context with proper vlenb riscv: ptrace: return ENODATA for inactive vector extension kselftest/riscv: add kselftest for user mode CFI riscv: add documentation for shadow stack riscv: add documentation for landing pad / indirect branch tracking riscv: create a Kconfig fragment for shadow stack and landing pad support arch/riscv: add dual vdso creation logic and select vdso based on hw arch/riscv: compile vdso with landing pad and shadow stack note riscv: enable kernel access to shadow stack memory via the FWFT SBI call riscv: add kernel command line option to opt out of user CFI riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe ...	2026-02-12 19:17:44 -08:00
Linus Torvalds	2c75a8d92c	ATA changes for 6.20 - Cleanup IRQ masking in the handling of completed report zones commands (Niklas). - Improve the handling of Thunderbolt attached devices to speed up device removal (Henry). - Several patches to generalize the existing max_sec quirks to facilitates quirking the maximum command size of buggy drives, many of which have recently showed up with the recent increase of the default max_sectors block limit (Niklas). - Cleanup the ahci-platform and sata dt-bindings schema (Rob, Manivannan). - Improve device node scan in the ahci-dwc driver (Krzysztof). - Remove clang W=1 warnings with the ahci-imx and ahci-xgene drivers (Krzysztof). - Fix a long standing potential command starvation situation with non-NCQ commands issued when NCQ commands are on-going (me). - Limit max_sectors to 8191 on the INTEL SSDSC2KG480G8 SSD (Niklas). - Remove Vesa Local Bus (VLB) support in the pata_legacy driver (Ethan). - Simple fixes in the pata_cypress (typo) and pata_ftide010 (timing) drivers (Ethan, Linus W.) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCaY5uKwAKCRDdoc3SxdoY dvn1AQCyhAHcegeAuQLL9L6pTdtKmObR0AOeeTkqOvGOWdb4agD+OVCeivi7KPBL zwzaJ5BhvwOS8FTiZzd+KHVpAQ0LtQk= =HvkS -----END PGP SIGNATURE----- Merge tag 'ata-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ATA updates from Damien Le Moal: - Cleanup IRQ masking in the handling of completed report zones commands (Niklas) - Improve the handling of Thunderbolt attached devices to speed up device removal (Henry) - Several patches to generalize the existing max_sec quirks to facilitates quirking the maximum command size of buggy drives, many of which have recently showed up with the recent increase of the default max_sectors block limit (Niklas) - Cleanup the ahci-platform and sata dt-bindings schema (Rob, Manivannan) - Improve device node scan in the ahci-dwc driver (Krzysztof) - Remove clang W=1 warnings with the ahci-imx and ahci-xgene drivers (Krzysztof) - Fix a long standing potential command starvation situation with non-NCQ commands issued when NCQ commands are on-going (me) - Limit max_sectors to 8191 on the INTEL SSDSC2KG480G8 SSD (Niklas) - Remove Vesa Local Bus (VLB) support in the pata_legacy driver (Ethan) - Simple fixes in the pata_cypress (typo) and pata_ftide010 (timing) drivers (Ethan, Linus W) * tag 'ata-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: pata_ftide010: Fix some DMA timings ata: pata_cypress: fix typo in error message ata: pata_legacy: remove VLB support ata: libata-core: Quirk INTEL SSDSC2KG480G8 max_sectors dt-bindings: ata: sata: Document the graph port ata: libata-scsi: avoid Non-NCQ command starvation ata: libata-scsi: refactor ata_scsi_translate() ata: ahci-xgene: Fix Wvoid-pointer-to-enum-cast warning ata: ahci-imx: Fix Wvoid-pointer-to-enum-cast warning ata: ahci-dwc: Simplify with scoped for each OF child loop dt-bindings: ata: ahci-platform: Drop unnecessary select schema ata: libata: Allow more quirks ata: libata: Add libata.force parameter max_sec ata: libata: Add support to parse equal sign in libata.force ata: libata: Change libata.force to use the generic ATA_QUIRK_MAX_SEC quirk ata: libata: Add ata_force_get_fe_for_dev() helper ata: libata: Add ATA_QUIRK_MAX_SEC and convert all device quirks ata: libata: avoid long timeouts on hot-unplugged SATA DAS ata: libata-scsi: Remove superfluous local_irq_save()	2026-02-12 17:12:43 -08:00
Linus Torvalds	136114e0ab	mm.git review status for linus..mm-nonmm-stable Total patches: 107 Reviews/patch: 1.07 Reviewed rate: 67% - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" from Heming Zhao saves disk space by teaching ocfs2 to reclaim suballocator block group space. - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in various places. - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size. - The 7 patch series "kallsyms: Prevent invalid access when showing module buildid" from Petr Mladek cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces. - The 3 patch series "Address page fault in ima_restore_measurement_list()" from Harshit Mogalapalli fixes a kexec-related crash that can occur when booting the second-stage kernel on x86. - The 6 patch series "kho: ABI headers and Documentation updates" from Mike Rapoport updates the kexec handover ABI documentation. - The 4 patch series "Align atomic storage" from Finn Thain adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh. - The 2 patch series "kho: clean up page initialization logic" from Pratyush Yadav simplifies the page initialization logic in kho_restore_page(). - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves several things out of kernel.h and into more appropriate places. - The 7 patch series "don't abuse task_struct.group_leader" from Oleg Nesterov removes the usage of ->group_leader when it is "obviously unnecessary". - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin adds some infrastructure improvements to the live update orchestrator. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww= =mmsH -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...	2026-02-12 12:13:01 -08:00
Linus Torvalds	61e629596f	- dm-verity: various optimizations and fixes related to forward error correction - dm-verity: add a .dm-verity keyring - dm-integrity: fix bugs with growing a device in bitmap mode - dm-mpath: fix leaking fake timeout requests, a UAF bug caused by stale rq->bio and minor bugs in device creation - dm-core: fix a bug related to blkg association, avoid unnecessary blk-crypto work on invalid keys - dm-bufio: dm-bufio cleanup and optimization (reducing hash table lookups) - various other minor fixes and cleanups -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCaYsfghQcbXBhdG9ja2FA cmVkaGF0LmNvbQAKCRATAyx9YGnhbcwnAP9Ny3YqC63GI5qH2cd8N42zRQ8P4sN+ k4cz1XtrstoA4wEA/357lutlowGgUoHUkVRKK03AcGnbcVJk0yMyRDo/0gs= =wvTt -----END PGP SIGNATURE----- Merge tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - dm-verity: - various optimizations and fixes related to forward error correction - add a .dm-verity keyring - dm-integrity: fix bugs with growing a device in bitmap mode - dm-mpath: - fix leaking fake timeout requests - fix UAF bug caused by stale rq->bio - fix minor bugs in device creation - dm-core: - fix a bug related to blkg association - avoid unnecessary blk-crypto work on invalid keys - dm-bufio: - dm-bufio cleanup and optimization (reducing hash table lookups) - various other minor fixes and cleanups * tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits) dm mpath: make pg_init_delay_msecs settable Revert "dm: fix a race condition in retrieve_deps" dm mpath: Add missing dm_put_device when failing to get scsi dh name dm vdo encodings: clean up header and version functions dm: use bio_clone_blkg_association dm: fix excessive blk-crypto operations for invalid keys dm-verity: fix section mismatch error dm-unstripe: fix mapping bug when there are multiple targets in a table dm-integrity: fix recalculation in bitmap mode dm-bufio: avoid redundant buffer_tree lookups dm-bufio: merge cache_put() into cache_put_and_wake() selftests: add dm-verity keyring selftests dm-verity: add dm-verity keyring dm: clear cloned request bio pointer when last clone bio completes dm-verity: fix up various workqueue-related comments dm-verity: switch to bio_advance_iter_single() dm-verity: consolidate the BH and normal work structs dm: add WQ_PERCPU to alloc_workqueue users dm-integrity: fix a typo in the code for write/discard race dm: use READ_ONCE in dm_blk_report_zones ...	2026-02-11 17:04:21 -08:00
Linus Torvalds	1e0ea4dff0	IOMMU Updates for Linux v7.0 Including: - Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets - Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates - AMD-Vi changes: - Support for nested translations - Other minor improvements - ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS". - ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes. - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors. - Add device-tree support for NVIDIA's CMDQV extension. - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields. - Don't disable ATS for nested S1-bypass nested domains. - Additions to the kunit selftests. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmmLDZwACgkQK/BELZcB GuNHgg//Yf9K/+T6+IOemA5Z8k3x2p39Q/Dv5x+SEGkh+CUh2C5dX97WD9LHntus 1mgIHlSgbM3bgMB+XTS1Q5ghy1QH71XOMnGCPhthwg843iCP2CcrB84ZZKKnNmw9 2YJdxYlNcbAMpvSd0F1XKaXoiNl9qzWx+QFtnVaTXMptNEhYOxMOlaZPtlEuwfJa T7h4cwtsiMDLWA4pw85y4hfvc5jKRv4dMoohin0lNEBpWkCfYE6b2Cjpff+9TtU2 Jyvvcvyns0US3amEwPHlIyfTUPKdaq6Vv3NX8TkAJUhGyEzdfwEtzqAvWMvOEYFh HfnE/LjZZLB1CUkF5MTib9dBgJACf/jtvOtuh4wZkx+7O2WIR6Ebo41dtWBM6dxh cHGeeQGqxdDZ5UJbIonF8Am0lxsaZx2zs09tlHEMGl2pNDi6vUppk1iTOkv3Wog0 zy4GhDBl0n/IcyCaIinnWck8C+BsAMcRGpDP2AB0I9/C2qpsaFY/NdNkbIGidhaJ 3khdAcjWsNPiJPNbUx66n6t8RSXdYKUuhJq2a/GgYmtAjhRR9cJlupB8/QYCBS5j fxXpHp4xMtw+Cgj58xC+gYXDivQOEThPs/BhL/qrxOzWE03HWI15MFydqRFWicnI gJCZSevMncBfNUTIJUSUmuT7ukP40cnh58QBeRkTmKGcW6HjuyY= =W/nW -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates AMD-Vi changes: - Support for nested translations - Other minor improvements ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS" ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors - Add device-tree support for NVIDIA's CMDQV extension - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields - Don't disable ATS for nested S1-bypass nested domains - Additions to the kunit selftests" * tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommupt: Always add IOVA range to iotlb_gather in gather_range_pages() iommu/amd: serialize sequence allocation under concurrent TLB invalidations iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence iommu/arm-smmu-v3: Add device-tree support for CMDQV driver iommu/tegra241-cmdqv: Decouple driver from ACPI iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p iommu/vt-d: Fix race condition during PASID entry replacement iommu/vt-d: Clear Present bit before tearing down context entry iommu/vt-d: Clear Present bit before tearing down PASID entry iommu/vt-d: Flush piotlb for SVM and Nested domain iommu/vt-d: Flush cache for PASID table before using it iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode rust: iommu: fix `srctree` link warning rust: iommu: fix Rust formatting ...	2026-02-11 16:36:08 -08:00
Linus Torvalds	9bdc64892d	workqueue: Changes for v6.20 - Rework the rescuer to process work items one-by-one instead of slurping all pending work items in a single pass. As there is only one rescuer per workqueue, a single long-blocking work item could cause high latency for all tasks queued behind it, even after memory pressure is relieved and regular kworkers become available to service them. - Add CONFIG_BOOTPARAM_WQ_STALL_PANIC build-time option and workqueue.panic_on_stall_time parameter for time-based stall panic, giving systems more control over workqueue stall handling. - Replace BUG_ON() with panic() in the stall panic path for clearer intent and more informative output. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYov/A4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGWXnAQCfyELl+evz3RdFhyTiVCM1TiOnC1TsBjgkm3SJ orMhwgEAkgg40jino34wgeZRfdIAThxQ1O6bsvTpooWKjlCYcQY= =zZCF -----END PGP SIGNATURE----- Merge tag 'wq-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: - Rework the rescuer to process work items one-by-one instead of slurping all pending work items in a single pass. As there is only one rescuer per workqueue, a single long-blocking work item could cause high latency for all tasks queued behind it, even after memory pressure is relieved and regular kworkers become available to service them. - Add CONFIG_BOOTPARAM_WQ_STALL_PANIC build-time option and workqueue.panic_on_stall_time parameter for time-based stall panic, giving systems more control over workqueue stall handling. - Replace BUG_ON() with panic() in the stall panic path for clearer intent and more informative output. * tag 'wq-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: replace BUG_ON with panic in panic_on_wq_watchdog workqueue: add time-based panic for stalls workqueue: add CONFIG_BOOTPARAM_WQ_STALL_PANIC option workqueue: Process extra works in rescuer on memory pressure workqueue: Process rescuer work items one-by-one using a cursor workqueue: Make send_mayday() take a PWQ argument directly	2026-02-11 13:13:32 -08:00
Paolo Bonzini	bf2c3138ae	Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD KVM mediated PMU support for 6.20 Add support for mediated PMUs, where KVM gives the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allows direct access to data MSRs and PMCs (restricted by the vPMU model), but intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. To keep overall complexity reasonable, mediated PMU usage is all or nothing for a given instance of KVM (controlled via module param). The Mediated PMU is disabled default, partly to maintain backwards compatilibity for existing setup, partly because there are tradeoffs when running with a mediated PMU that may be non-starters for some use cases, e.g. the host loses the ability to profile guests with mediated PMUs, the fastpath run loop is also a blind spot, entry/exit transitions are more expensive, etc. Versus the emulated PMU, where KVM is "just another perf user", the mediated PMU delivers more accurate profiling and monitoring (no risk of contention and thus dropped events), with significantly less overhead (fewer exits and faster emulation/programming of event selectors) E.g. when running Specint-2017 on a single-socket Sapphire Rapids with 56 cores and no-SMT, and using perf from within the guest: Perf command: a. basic-sampling: perf record -F 1000 -e 6-instructions -a --overwrite b. multiplex-sampling: perf record -F 1000 -e 10-instructions -a --overwrite Guest performance overhead: --------------------------------------------------------------------------- \| Test case \| emulated vPMU \| all passthrough \| passthrough with \| \| \| \| \| event filters \| --------------------------------------------------------------------------- \| basic-sampling \| 33.62% \| 4.24% \| 6.21% \| --------------------------------------------------------------------------- \| multiplex-sampling \| 79.32% \| 7.34% \| 10.45% \| ---------------------------------------------------------------------------	2026-02-11 12:45:40 -05:00
Linus Torvalds	192c015940	powerpc updates for 7.0 - Implement masked user access - Add support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task() - Fix pSeries MSI-X allocation failure when quota is exceeded - Fix recursive pci_lock_rescan_remove locking in EEH event handling - Support tailcalls with subprogs & BPF exceptions on 64bit - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM) Thanks to: Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, Venkat Rao Bagalkote, -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmmL7S0ACgkQpnEsdPSH ZJR2xA/9G+tZp9+2TidbLSyPT5o063uC5H5+j5QcvvHWi/ImGUNtixlDm5HcZLCR yKywE1piKYBU8HoISjzAt0+JCVd3ZjJ8chTpKgCHLXPRSBTgdR1MG+SXQysDJSWb yA4pwDikmwoLlfi+pf500F0nX2DRCRdU2Yi28ZFeaF/khJ7ebwj41QJ7LjN22+Q1 G8Kq543obZluzSoVvfG4xUK4ByWER+Zdd2F6699iMP68yw5PJ8PPc0SUGt8nuD4i FUs0Lw7XV7i/K3+zm/ZgH5+Cvn7wOIcMNkXgFlxJXkFit97KXUDijifYPoXQyKLL ksD7SPFdV0++Sc+3mWcgW4j+hQZC0Pn864unmh8C6ug3SagQ+3pE1JYWKwCmoyXd 49ROH0y+npArJ4NAc79eweunhafGcRYTSG+zV7swQvpRocMujEqa4CDz4uk1ll5W 1yAac08AN6PnfcXl2VMrcDboziTlQVFcnNQbK/ieYMC7KpgA+udw1hd2rOWNZCPd u0byXxR1ak5YaAEuyMztd/39hrExx8306Jtkh5FIRZKWGAO+3np5bi3vxk11rDni c9BGh2JIMtuPUGys3wcFPGMRTKwF2bDFW/pB+5hMHeLUdlkni9WGCX8eLe2klYsy T7fBVb4d99IVrJGYv3J1lwELgjrgXvv35XOaUiyJyZhcbng15cc= =zJoL -----END PGP SIGNATURE----- Merge tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates for 7.0 - Implement masked user access - Add bpf support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task() functions - Fix pSeries MSI-X allocation failure when quota is exceeded - Fix recursive pci_lock_rescan_remove locking in EEH event handling - Support tailcalls with subprogs & BPF exceptions on 64bit - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM) Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote. * tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits) powerpc/pseries: plpks: export plpks_wrapping_is_supported docs: trusted-encryped: add PKWM as a new trust source keys/trusted_keys: establish PKWM as a trusted source pseries/plpks: add HCALLs for PowerVM Key Wrapping Module pseries/plpks: expose PowerVM wrapping features via the sysfs powerpc/pseries: move the PLPKS config inside its own sysfs directory pseries/plpks: fix kernel-doc comment inconsistencies powerpc/smp: Add check for kcalloc() failure in parse_thread_groups() powerpc: kgdb: Remove OUTBUFMAX constant powerpc64/bpf: Additional NVR handling for bpf_throw powerpc64/bpf: Support exceptions powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT powerpc64/bpf: Avoid tailcall restore from trampoline powerpc64/bpf: Support tailcalls with subprogs powerpc64/bpf: Moving tail_call_cnt to bottom of frame powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf() powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs ...	2026-02-10 21:46:12 -08:00
Linus Torvalds	dcb4971018	- Extend the resctrl machinery to support telemetry monitoring on Intel. The practical usage of this is being able to tell how much energy or how much work can be attributed to a group of tasks tracked under a single idenitifier. Prepend this work with proper refactoring of resctrl domains handling code. Work by Tony Luck -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmKaKcACgkQEsHwGGHe VUrzTxAAkQP0r5DdcNZA6SjnaHMjjjxV/+BlxhVDJV1EOT2/luRDelPQwm+GwaHz Rk02cpUvLUmGQ3/vD/VLmH2Oar0gkKfzEgSJU/OAr7lDpeA3eN3BRQhyLL99fFFQ XaH5JaXjweer8er2+ultyux+0yXmwA2Albeh2IVNR6heGjJNIG4/p9YB6z9aCS1b B9Freb548ISq2MPqzczu5+Ku7N4nsA5TEL1wE+ndMz0NwdSvvhz8LWX3H/5cfgtM Vf3thqgtIMZq8EiS29tDtE6EeveMMXC5XGpQ4Ts6GLgV6a3L/0Ppc2VfUQ1AsC8u tkRIrXN9gxqcqTzI/GQ/b2QgcqH6/qy/wHowLDOjf9j8GJbRYtDda1rPo7imnoKL 6Ljn0E6qbstatBz6QBojYJB5fzC3wj0VMdG3jI2ZxaH1+iVwkb4D0fW2EScwcV2X C6GTg+VsEDUvhm/V/1gyNSZJHPibHpNkmD61M+J9rn3COlzslRF8p7sEUogjfWxu RXan9aJDocQA7/PDw7uYLlXZYQsyzwZbs3yv0UA8dU2vkNcNPQradlj305Fn2FxI D8oaFCx/AB0iPI8W6wZPdLR2gbCmwS9cNgt2PQAdSFCy3z4ceAkRYgLn64Sv4SJ7 gQVJXG+V2NZuKhFi7DhSIeffDqtMT6Agia3fWW/p3NIns4FyyXo= =WMoi -----END PGP SIGNATURE----- Merge tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Extend the resctrl machinery to support telemetry monitoring on Intel (Tony Luck) The practical usage of this is being able to tell how much energy or how much work can be attributed to a group of tasks tracked under a single idenitifier. Prepend this work with proper refactoring of resctrl domains handling code. * tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86,fs/resctrl: Update documentation for telemetry events x86/resctrl: Enable RDT_RESOURCE_PERF_PKG fs/resctrl: Move RMID initialization to first mount x86,fs/resctrl: Compute number of RMIDs as minimum across resources fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG x86/resctrl: Add energy/perf choices to rdt boot option x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() fs/resctrl: Refactor mkdir_mondata_subdir() x86/resctrl: Read telemetry events x86/resctrl: Find and enable usable telemetry events x86,fs/resctrl: Add architectural event pointer x86,fs/resctrl: Fill in details of events for performance and energy GUIDs x86/resctrl: Discover hardware telemetry events fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains x86,fs/resctrl: Add and initialize a resource for package scope monitoring x86,fs/resctrl: Add an architectural hook called for first mount x86,fs/resctrl: Support binary fixed point event counters x86,fs/resctrl: Handle events that can be read from any CPU ...	2026-02-10 18:24:56 -08:00
Linus Torvalds	5668a64622	x86/boot changes for v7.0: - x86/acpi: Add acpi=spcr to use SPCR-provided default console (Shenghao Yang) - x86/acpi/boot: Correct the acpi_is_processor_usable() check again (Yazen Ghannam) - Refresh the x86 memory map (e820 table) handling code, and make the printouts a bit more informative. (Ingo Molnar) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJk1wRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gcvA/9E00IuhZS6RxIPhqPFVhUvYlOEMnBBfnb Sv0de0etGb2Q5GRL5j+XqMbTE2mns1we8Jh7dyY9N4K1eRlhXJZQWEFjqsFHX5/A VxSc3/QrfzzpAxYDAg0hOSjMeJv3W4pdLjfwDpNLa/EZ+KA5tVcwu6ufd9ZFXO3a DWlk0H3JbRuiObFlKqSpccaf9/taRz9LZ9DNqTT0K9zDN4yukWKYXxjhwxASLZ2C i5cAzgJqvxiK4xtx5WbCs5b0swLzDi/4V7L1D9ojhckoCkrlLUjo7eAvSiS2kYik NcA7LHDehMmP6mKOQUq9YJhR/P9FfmeItd0VgdAmVTuMdsl0LQDCl5v8XkWoBbXG yYabeJl3eBrd5OVqsMK/vrnRkxMuBnTHdysyUpJrtCXHrMbQnpxerMIgqwhDH5rx Z3zN4MRDf2buDJZDQT53ItKjaCGuUAEu869vo12eWhbA+OeL83G6+cs8kPUXquIq p3NfPTLWj3U7qcwN35x8WTT1PWBrOtiUdjm27lLFUFsrY4xqXrgcHoXkoAp3tqQX eVoxjROVGsrmFdzVr6c9D/1ZyVDwsetKQdixHrLWiAlm/S0lwDYee180uB+ocYLH ML4QD7rvb6npAHcxUtvftpDEBO7TRnxZWlSgS70ZOYUnnVxLBhFrhygLMJ1giEew aAodneB2Os8= =brCM -----END PGP SIGNATURE----- Merge tag 'x86-boot-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/boot updates from Ingo Molnar: - x86/acpi: Add acpi=spcr to use SPCR-provided default console (Shenghao Yang) - x86/acpi/boot: Correct the acpi_is_processor_usable() check again (Yazen Ghannam) - Refresh the x86 memory map (e820 table) handling code, and make the printouts a bit more informative (Ingo Molnar) * tag 'x86-boot-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/acpi: Add acpi=spcr to use SPCR-provided default console x86/boot/e820: Use <linux/sizes.h> symbols for literals x86/boot/e820: Make sure e820_search_gap() finds all gaps x86/boot/e820: Simplify the e820__range_remove() API x86/boot/e820: Remove e820__range_remove()'s unused return parameter x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables x86/boot/e820: Standardize __init/__initdata tag placement x86/boot/e820: Simplify & clarify __e820__range_add() a bit x86/boot/e820: Rename gap_start/gap_size to max_gap_start/max_gap_start in e820_search_gap() et al x86/boot/e820: Change e820_search_gap() to search for the highest-address PCI gap x86/boot/e820: Clean up e820__setup_pci_gap()/e820_search_gap() a bit x86/boot/e820: Change struct e820_table::nr_entries type from __u32 to u32 x86/boot/e820: Standardize e820 table index variable types under 'u32' x86/boot/e820: Standardize e820 table index variable names under 'idx' x86/boot/e820: Remove unnecessary header inclusions x86/boot/e820: Clean up __refdata use a bit x86/boot/e820: Clean up __e820__range_add() a bit x86/boot/e820: Improve e820_print_type() messages x86/boot/e820: Clean up confusing and self-contradictory verbiage around E820 related resource allocations x86/boot/e820: Remove pointless early_panic() indirection ...	2026-02-10 13:12:54 -08:00
Linus Torvalds	36ae1c45b2	Scheduler changes for v7.0: Scheduler Kconfig space updates: - Further consolidate configurable preemption modes: reduce the number of architectures that are allowed to offer PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of preemption models from four to just two: 'full' and 'lazy' on up-to-date architectures (arm64, loongarch, powerpc, riscv, s390, x86). None and voluntary are only available as legacy features on platforms that don't implement lazy preemption yet, or which don't even support preemption. The goal is to eventually remove cond_resched() and voluntary preemption altogether. (Peter Zijlstra) RSEQ based 'scheduler time slice extension' support: This allows a thread to request a time slice extension when it enters a critical section to avoid contention on a resource when the thread is scheduled out inside of the critical section. - Add fields and constants for time slice extension - Provide static branch for time slice extensions - Add statistics for time slice extensions - Add prctl() to enable time slice extensions - Implement sys_rseq_slice_yield() - Implement syscall entry work for time slice extensions - Implement time slice extension enforcement timer - Reset slice extension when scheduled - Implement rseq_grant_slice_extension() - entry: Hook up rseq time slice extension - selftests: Implement time slice extension test (Thomas Gleixner) - Allow registering RSEQ with slice extension - Move slice_ext_nsec to debugfs - Lower default slice extension - selftests/rseq: Add rseq slice histogram script (Peter Zijlstra) Scheduler performance/scalability improvements: - Update rq->avg_idle when a task is moved to an idle CPU, which improves the scalability of various workloads. (Shubhang Kaushik) - Reorder fields in 'struct rq' for better caching (Blake Jones) - Fair scheduler SMP NOHZ balancing code speedups: - Move checking for nohz cpus after time check - Change likelyhood of nohz.nr_cpus - Remove nohz.nr_cpus and use weight of cpumask instead (Shrikanth Hegde) - Avoid false sharing for sched_clock_irqtime (Wangyang Guo) - Drop useless cpumask_empty() in find_energy_efficient_cpu() - Simplify task_numa_find_cpu() - Use cpumask_weight_and() in sched_balance_find_dst_group() (Yury Norov) DL scheduler updates: - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel Fernandes, with fixes by Peter Zijlstra) RT scheduler updates: - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang) Entry code updates and performance improvements, which is part of the scheduler tree in this cycle due to interdependencies with the RSEQ based time slice extension work: - Remove unused syscall argument from syscall_trace_enter() - Rework syscall_exit_to_user_mode_work() for architecture reuse - Add arch_ptrace_report_syscall_entry/exit() - Inline syscall_exit_work() and syscall_trace_enter() (Jinjie Ruan) Scheduler core updates: - Rework sched_class::wakeup_preempt() and rq_modified_() - Avoid rq->lock bouncing in sched_balance_newidle() - Rename rcu_dereference_check_sched_domain() => rcu_dereference_sched_domain() - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper (Peter Zijlstra) Fair scheduler updates/refactoring: - Fold the sched_avg update - Change rcu_dereference_check_sched_domain() to rcu-sched - Switch to rcu_dereference_all() - Remove superfluous rcu_read_lock() - Limit hrtick work (Peter Zijlstra) - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks - Clean up comments in 'struct cfs_rq' - Separate se->vlag from se->vprot - Rename cfs_rq::avg_load to cfs_rq::sum_weight - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for wrapped-signed aritmetics - Sort out 'blocked_load' namespace noise (Ingo Molnar) Scheduler debugging code updates: - Export hidden tracepoints to modules (Gabriele Monaco) - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user() (Fushuai Wang) - Add assertions to QUEUE_CLASS (Peter Zijlstra) - hrtimer: Fix tracing oddity (Thomas Gleixner) Misc fixes and cleanups: - Re-evaluate scheduling when migrating queued tasks out of throttled cgroups (Zicheng Qu) - Remove task_struct->faults_disabled_mapping (Christoph Hellwig) - Fix math notation errors in avg_vruntime comment (Zhan Xusheng) - sched/cpufreq: Use %pe format for PTR_ERR() printing (zenghongling) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJj+IRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1grtQ//WyXYGVE/WicdqslfaCY2Mr0uJnL0tLSM CJp+0LROdkmy+ChJmftO8RgjCUSsjhC4/xcBhUQXApf/ffQi3b2jH6nkTp/Z64Ms p2IXLkBiZjwdcO6fGbB0JE2G1J4hGRC5BlqfgkZzWidMf3kIbmrHg99mVWGzODLY N/cPW4d0WGf9TScl1FgEiOqgF3czMLlqvTDJqaFMpsTzSUcRBnrG4xushb4W/bBx 573eqxgZJ6urNSGu8niY9PAl9F7gskXW3YxI3k8SH7VmJKSevWlwI9vMEhcRDzud E0XxD7J8iPOKtr7ypXm7anMBv4jWVUdAnPbYi4TDsyDDU/HguqMqT1McTGn8wQ+F jmdhmMC9/TEIzq93SNLbCYieibqDsmJoNVFFi0FWfPLMtYbcZd5a884SIz532vx4 DegdlDXdazUwhxzDiQR3sq1CsHXpxNS2YdrpadAtF/r2gU86DQjsEew8yBvXi7bb Wrkzpax70sU1AFI23wJQkEb/OnnXyehAHAhhQN6GVvuiGr9P7C02WLEGLlmSmJrx zl2F750P76yhTfGcvTfJ/5LTfSB+yRozGvcdXnIkyzWotY6a2D1MKNusAfVax+IR kyfAWqVdxBhlKnqYbu92lTogvnPh3Lymd6G4TZZRkSH2jixyGd2oS7nZaDBAeBEM NHQtr9R+KyU= =Xj2f -----END PGP SIGNATURE----- Merge tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Scheduler Kconfig space updates: - Further consolidate configurable preemption modes (Peter Zijlstra) Reduce the number of architectures that are allowed to offer PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of preemption models from four to just two: 'full' and 'lazy' on up-to-date architectures (arm64, loongarch, powerpc, riscv, s390, x86). None and voluntary are only available as legacy features on platforms that don't implement lazy preemption yet, or which don't even support preemption. The goal is to eventually remove cond_resched() and voluntary preemption altogether. RSEQ based 'scheduler time slice extension' support (Thomas Gleixner and Peter Zijlstra): This allows a thread to request a time slice extension when it enters a critical section to avoid contention on a resource when the thread is scheduled out inside of the critical section. - Add fields and constants for time slice extension - Provide static branch for time slice extensions - Add statistics for time slice extensions - Add prctl() to enable time slice extensions - Implement sys_rseq_slice_yield() - Implement syscall entry work for time slice extensions - Implement time slice extension enforcement timer - Reset slice extension when scheduled - Implement rseq_grant_slice_extension() - entry: Hook up rseq time slice extension - selftests: Implement time slice extension test - Allow registering RSEQ with slice extension - Move slice_ext_nsec to debugfs - Lower default slice extension - selftests/rseq: Add rseq slice histogram script Scheduler performance/scalability improvements: - Update rq->avg_idle when a task is moved to an idle CPU, which improves the scalability of various workloads (Shubhang Kaushik) - Reorder fields in 'struct rq' for better caching (Blake Jones) - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde): - Move checking for nohz cpus after time check - Change likelyhood of nohz.nr_cpus - Remove nohz.nr_cpus and use weight of cpumask instead - Avoid false sharing for sched_clock_irqtime (Wangyang Guo) - Cleanups (Yury Norov): - Drop useless cpumask_empty() in find_energy_efficient_cpu() - Simplify task_numa_find_cpu() - Use cpumask_weight_and() in sched_balance_find_dst_group() DL scheduler updates: - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel Fernandes, with fixes by Peter Zijlstra) RT scheduler updates: - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang) Entry code updates and performance improvements (Jinjie Ruan) This is part of the scheduler tree in this cycle due to inter- dependencies with the RSEQ based time slice extension work: - Remove unused syscall argument from syscall_trace_enter() - Rework syscall_exit_to_user_mode_work() for architecture reuse - Add arch_ptrace_report_syscall_entry/exit() - Inline syscall_exit_work() and syscall_trace_enter() Scheduler core updates (Peter Zijlstra): - Rework sched_class::wakeup_preempt() and rq_modified_() - Avoid rq->lock bouncing in sched_balance_newidle() - Rename rcu_dereference_check_sched_domain() => rcu_dereference_sched_domain() - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar): - Fold the sched_avg update - Change rcu_dereference_check_sched_domain() to rcu-sched - Switch to rcu_dereference_all() - Remove superfluous rcu_read_lock() - Limit hrtick work - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks - Clean up comments in 'struct cfs_rq' - Separate se->vlag from se->vprot - Rename cfs_rq::avg_load to cfs_rq::sum_weight - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for wrapped-signed aritmetics - Sort out 'blocked_load' namespace noise Scheduler debugging code updates: - Export hidden tracepoints to modules (Gabriele Monaco) - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user() (Fushuai Wang) - Add assertions to QUEUE_CLASS (Peter Zijlstra) - hrtimer: Fix tracing oddity (Thomas Gleixner) Misc fixes and cleanups: - Re-evaluate scheduling when migrating queued tasks out of throttled cgroups (Zicheng Qu) - Remove task_struct->faults_disabled_mapping (Christoph Hellwig) - Fix math notation errors in avg_vruntime comment (Zhan Xusheng) - sched/cpufreq: Use %pe format for PTR_ERR() printing (zenghongling)" * tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups sched/cpufreq: Use %pe format for PTR_ERR() printing sched/rt: Skip currently executing CPU in rto_next_cpu() sched/clock: Avoid false sharing for sched_clock_irqtime selftests/sched_ext: Add test for DL server total_bw consistency selftests/sched_ext: Add test for sched_ext dl_server sched/debug: Fix dl_server (re)start conditions sched/debug: Add support to change sched_ext server params sched_ext: Add a DL server for sched_ext tasks sched/debug: Stop and start server based on if it was active sched/debug: Fix updating of ppos on server write ops sched/deadline: Clear the defer params entry: Inline syscall_exit_work() and syscall_trace_enter() entry: Add arch_ptrace_report_syscall_entry/exit() entry: Rework syscall_exit_to_user_mode_work() for architecture reuse entry: Remove unused syscall argument from syscall_trace_enter() sched: remove task_struct->faults_disabled_mapping sched: Update rq->avg_idle when a task is moved to an idle CPU selftests/rseq: Add rseq slice histogram script hrtimer: Fix trace oddity ...	2026-02-10 12:50:10 -08:00
Huacai Chen	009ee0c964	LoongArch: Add HOTPLUG_SMT implementation For benchmarking or debugging purpose, we usually want to control SMT via boot parameter and sysfs knobs. So add HOTPLUG_SMT implementation. 1. Boot parameters: nosmt: Disable SMT, can be enabled via sysfs knobs. nosmt=force: Disable SMT, cannot be enabled via sysfs knobs. 2. Runtime sysfs controls: Write "on", "off", "forceoff" or the number of SMT threads (1, 2, ...) to /sys/devices/system/cpu/smt/control. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-02-10 19:31:13 +08:00
Linus Torvalds	72c395024d	A slightly calmer cycle for docs this time around, though there is still a fair amount going on, including: - Some signs of life on the long-moribund Japanese translation - Documentation on policies around the use of generative tools for patch submissions, and a separate document intended for consumption by generative tools. - The completion of the move of the documentation tools to tools/docs. For now we're leaving a /scripts/kernel-doc symlink behind to avoid breaking scripts. - Ongoing build-system work includes the incorporation of documentation in Python code, better support for documenting variables, and lots of improvements and fixes. - Automatic linking of man-page references -- cat(1), for example -- to the online pages in the HTML build. ...and the usual array of typo fixes and such. -----BEGIN PGP SIGNATURE----- iQFDBAABCgAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmmKM8YPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YLK4H/2gqVxY8wKbVymiB95/zudiba8EtWlKE4hZl KAd4+csZ8RCTMxHJLI23FXOi56CYr3XOQol0DIDUGimQiQx/Cxa2QDWewpkqbNH1 tHPTaNWAj16wKzrZxXhWt+6FoBHd7wrqilLH180IRmezRhu+7kURQ5XEAAXfK08B CfDXBsCpnGsKn+m72x04cpvnsf/iK3pznbKrZ0ZYGIoaZb6+BV3+jqVaLROWSQZt Nvt1rYjsi0vaeNapbQL8q72UJ/+zO4nK9am13s7p20zD+jUVY48yfQB/ZqvHp/1L aymcJUCq0h5sSOHnfSqY5MTZUR/0CK+npRcEPgDYzLBigc9XU9g= =hHq1 -----END PGP SIGNATURE----- Merge tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux Pull documentation updates from Jonathan Corbet: "A slightly calmer cycle for docs this time around, though there is still a fair amount going on, including: - Some signs of life on the long-moribund Japanese translation - Documentation on policies around the use of generative tools for patch submissions, and a separate document intended for consumption by generative tools - The completion of the move of the documentation tools to tools/docs. For now we're leaving a /scripts/kernel-doc symlink behind to avoid breaking scripts - Ongoing build-system work includes the incorporation of documentation in Python code, better support for documenting variables, and lots of improvements and fixes - Automatic linking of man-page references -- cat(1), for example -- to the online pages in the HTML build ...and the usual array of typo fixes and such" * tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (107 commits) doc: development-process: add notice on testing tools: sphinx-build-wrapper: improve its help message docs: sphinx-build-wrapper: allow -v override -q docs: kdoc: Fix pdfdocs build for tools docs: ja_JP: process: translate 'Obtain a current source tree' docs: fix 're-use' -> 'reuse' in documentation docs: ioctl-number: fix a typo in ioctl-number.rst docs: filesystems: ensure proc pid substitutable is complete docs: automarkup.py: Skip common English words as C identifiers Documentation: use a source-read extension for the index link boilerplate docs: parse_features: make documentation more consistent docs: add parse_features module documentation docs: jobserver: do some documentation improvements docs: add jobserver module documentation docs: kabi: helpers: add documentation for each "enum" value docs: kabi: helpers: add helper for debug bits 7 and 8 docs: kabi: system_symbols: end docstring phrases with a dot docs: python: abi_regex: do some improvements at documentation docs: python: abi_parser: do some improvements at documentation docs: add kabi modules documentation ...	2026-02-09 20:53:18 -08:00
Linus Torvalds	33120a2f8f	xen: branch for v7.0-rc1 -----BEGIN PGP SIGNATURE----- iJEEABYKADkWIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCaYmAPhsUgAAAAAAEAA5t YW51MiwyLjUrMS4xMSwyLDIACgkQgFxhu0/YY75PKAEAgNaon3unLoRnCGPjkRxL kruzd/KHcodevfz0koL7hOUA/iHOEpGIibCHJpmj2xt/kgLCxxMCQVupalDfeKwq 4jYN =Vy9z -----END PGP SIGNATURE----- Merge tag 'for-linus-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix running as Xen PVH guest in 32-bit mode without PAE - fix PV device handling for suspend/resume when running as a Xen guest - clean up workqueue usage - fix the Xen balloon driver for PVH dom0 - introduce the possibility to use hypercalls for console messages in unprivileged guests - enable Xen dom0 use of virtio devices in nested virtualization setups - simplify the xen-mcelog driver * tag 'for-linus-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xenbus: Rename helpers to freeze/thaw/restore xenbus: Use .freeze/.thaw to handle xenbus devices xen/mcelog: simplify MCE_GETCLEAR_FLAGS using xchg() xen/balloon: improve accuracy of initial balloon target for dom0 Partial revert "x86/xen: fix balloon target initialization for PVH dom0" xen: introduce xen_console_io option xen/virtio: Don't use grant-dma-ops when running as Dom0 x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set xen: privcmd: WQ_PERCPU added to alloc_workqueue users xen/events: replace use of system_wq with system_percpu_wq	2026-02-09 20:38:27 -08:00
Linus Torvalds	996812c453	vfs-7.0-rc1.initrd Please consider pulling these changes from the signed vfs-7.0-rc1.initrd tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc ordBAQD4d6Y5Zvr852s9deMTDv+ng3bFk1YHqybGe3wEuATstwD/QcvAoNFW9Nn5 n7/268Nk6jTEygT7Fm3tn42SnwOxhgU= =T203 -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs initrd removal from Christian Brauner: "Remove the deprecated linuxrc-based initrd code path and related dead code. The linuxrc initrd path was deprecated in 2020 and this series completes its removal. If we see real-life regressions we'll revert. The core change removes handle_initrd() and init_linuxrc() — the entire flow that ran /linuxrc from an initrd, pivoted roots, and handed off to the real root filesystem. With that gone, initrd_load() becomes void (no longer short-circuits prepare_namespace()), rd_load_image() is simplified to always load /initrd.image instead of taking a path, and rd_load_disk() is deleted. The /proc/sys/kernel/real-root-dev sysctl and its backing variable are removed since they only existed for linuxrc to communicate the real root device back to the kernel. The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped, and noinitrd and ramdisk_start= gain deprecation warnings. Initramfs is entirely unaffected. The non-linuxrc initrd path (root=/dev/ram0) is preserved but now carries a deprecation warning targeting January 2027 removal" * tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: init: remove /proc/sys/kernel/real-root-dev initrd: remove deprecated code path (linuxrc) init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters	2026-02-09 11:03:25 -08:00
Linus Torvalds	ef852baaf6	RCU changes for v7.0 RCU Tasks Trace: Re-implement RCU tasks trace in term of SRCU-fast, not only more than 500 lines of code are saved because of the reimplementation, a new set of API, rcu_read_{,un}lock_tasks_trace(), becomes possible as well. Compared to the previous rcu_read_{,un}lock_trace(), the new API avoid the task_struct accesses thanks to the SRCU-fast semantics. As a result, the old rcu_read{,un}lock_trace() API is now deprecated. RCU Torture Test: - Multiple improvements on kvm-series.sh (parallel run and progress showing metrics) - Add context checks to rcu_torture_timer(). - Make config2csv.sh properly handle comments in .boot files. - Include commit discription in testid.txt. Miscellaneous RCU changes: - Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early. - Use suitable gfp_flags for the init_srcu_struct_nodes(). - Fix rcu_read_unlock() deadloop due to softirq. - Correctly compute probability to invoke ->exp_current() in rcutorture. - Make expedited RCU CPU stall warnings detect stall-end races. RCU nocb: - Remove unnecessary WakeOvfIsDeferred wake path and callback overload handling. - Extract nocb_defer_wakeup_cancel() helper. -----BEGIN PGP SIGNATURE----- iQFFBAABCAAvFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmmARZERHGJvcXVuQGtl cm5lbC5vcmcACgkQSXnow7UH+rh8SAf+PDIBWAkdbGgs32EfgpFY42RB4CWygH47 YRup/M3+nU0JcBzNnona1srpHBXRySBJQbvRbsOdlM45VoNQ2wPjig/3vFVRUKYx uqj9Tze00DS74IIGESoTGp0amZde9SS9JakNRoEfTr+Zpj8N6LFERQw0ywUwjR5b RR6bz7q05TAl3u2BYUAgNdnf3VWWTmj4WYwArlQ+qRFAyGN+TVj8Ezra6+K5TJ7u SQYrf7WmRGOhHbVVolvVEOVdACccI8dFl3ebJVE2Ky0gp1o3BLPkcDLJ6gBdTCoE rRrbnkeqs5V7tOkPFDBeUhLPrm1QxrdEDxUQFWjSApbv161sx7AOZA== =O9QQ -----END PGP SIGNATURE----- Merge tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: - RCU Tasks Trace: Re-implement RCU tasks trace in term of SRCU-fast, not only more than 500 lines of code are saved because of the reimplementation, a new set of API, rcu_read_{,un}lock_tasks_trace(), becomes possible as well. Compared to the previous rcu_read_{,un}lock_trace(), the new API avoid the task_struct accesses thanks to the SRCU-fast semantics. As a result, the old rcu_read{,un}lock_trace() API is now deprecated. - RCU Torture Test: - Multiple improvements on kvm-series.sh (parallel run and progress showing metrics) - Add context checks to rcu_torture_timer() - Make config2csv.sh properly handle comments in .boot files - Include commit discription in testid.txt - Miscellaneous RCU changes: - Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early - Use suitable gfp_flags for the init_srcu_struct_nodes() - Fix rcu_read_unlock() deadloop due to softirq - Correctly compute probability to invoke ->exp_current() in rcutorture - Make expedited RCU CPU stall warnings detect stall-end races - RCU nocb: - Remove unnecessary WakeOvfIsDeferred wake path and callback overload handling - Extract nocb_defer_wakeup_cancel() helper * tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (25 commits) rcu/nocb: Extract nocb_defer_wakeup_cancel() helper rcu/nocb: Remove dead callback overload handling rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early srcu: Use suitable gfp_flags for the init_srcu_struct_nodes() rcu: Fix rcu_read_unlock() deadloop due to softirq rcutorture: Correctly compute probability to invoke ->exp_current() rcu: Make expedited RCU CPU stall warnings detect stall-end races rcutorture: Add --kill-previous option to terminate previous kvm.sh runs rcutorture: Prevent concurrent kvm.sh runs on same source tree torture: Include commit discription in testid.txt torture: Make config2csv.sh properly handle comments in .boot files torture: Make kvm-series.sh give run numbers and totals torture: Make kvm-series.sh give build numbers and totals torture: Parallelize kvm-series.sh guest-OS execution rcutorture: Add context checks to rcu_torture_timer() rcutorture: Test rcu_tasks_trace_expedite_current() srcu: Create an rcu_tasks_trace_expedite_current() function checkpatch: Deprecate rcu_read_{,un}lock_trace() rcu: Update Requirements.rst for RCU Tasks Trace ...	2026-02-09 09:46:26 -08:00
Breno Leitao	f84c9dd34e	workqueue: add time-based panic for stalls Add a new module parameter 'panic_on_stall_time' that triggers a panic when a workqueue stall persists for longer than the specified duration in seconds. Unlike 'panic_on_stall' which counts accumulated stall events, this parameter triggers based on the duration of a single continuous stall. This is useful for catching truly stuck workqueues rather than accumulating transient stalls. Usage: workqueue.panic_on_stall_time=120 This would panic if any workqueue pool has been stalled for 120 seconds or more. The stall duration is measured from the workqueue last progress (poll_ts) which accounts for legitimate system stalls. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-07 06:54:38 -10:00
Joerg Roedel	ad09563660	Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2026-02-06 11:10:40 +01:00
Maurizio Lombardi	7bb8c40f5a	nvme: add support for dynamic quirk configuration via module parameter Introduce support for enabling or disabling specific NVMe quirks at module load time through the `quirks` module parameter. This mechanism allows users to apply known quirks dynamically based on the device's PCI vendor and device IDs, without requiring to add hardcoded entries in the driver and recompiling the kernel. While the generic PCI new_id sysfs interface exists for dynamic configuration, it is insufficient for scenarios where the system fails to boot (for example, this has been reported to happen because of the bogus_nid quirk). The new_id attribute is writable only after the system has booted and sysfs is mounted. The `quirks` parameter accepts a list of quirk specifications separated by a '-' character in the following format: <VID>:<DID>:<quirk_names>[-<VID>:<DID>:<quirk_names>-..] Each quirk is represented by its name and can be prefixed with `^` to indicate that the quirk should be disabled; quirk names are separated by a ',' character. Example: enable BOGUS_NID and BROKEN_MSI, disable DEALLOCATE_ZEROES: $ modprobe nvme quirks=7170:2210:bogus_nid,broken_msi,^deallocate_zeroes Tested-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-02-05 07:35:58 -08:00
Breno Leitao	32d572e390	workqueue: add CONFIG_BOOTPARAM_WQ_STALL_PANIC option Add a kernel config option to set the default value of workqueue.panic_on_stall, similar to CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, CONFIG_BOOTPARAM_HARDLOCKUP_PANIC and CONFIG_BOOTPARAM_HUNG_TASK_PANIC. This allows setting the number of workqueue stalls before triggering a kernel panic at build time, which is useful for high-availability systems that need consistent panic-on-stall, in other words, those servers which run with CONFIG_BOOTPARAM_*_PANIC=y already. The default remains 0 (disabled). Setting it to 1 will panic on the first stall, and higher values will panic after that many stall warnings. The value can still be overridden at runtime via the workqueue.panic_on_stall boot parameter or sysfs. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-03 09:37:59 -10:00

1 2 3 4 5 ...

1489 Commits