linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-27 08:33:17 +02:00

Author	SHA1	Message	Date
Jose Fernandez (Anthropic)	07d0f496fe	iommu/amd: Bounds-check devid in __rlookup_amd_iommu() iommu_device_register() walks every device on the PCI bus via bus_for_each_dev() and calls amd_iommu_probe_device() for each. The inlined check_device() path computes the device's sbdf, calls rlookup_amd_iommu() to find the owning IOMMU, and only afterwards verifies devid <= pci_seg->last_bdf. __rlookup_amd_iommu() indexes rlookup_table[devid] with no bounds check of its own, so for a PCI device whose BDF is not described by the IVRS, the lookup reads past the end of the allocation before the caller's bounds check can run. This was harmless before commit `e874c666b1` ("iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()"): the table was a zeroed page-order allocation, so the over-read returned NULL and the caller's NULL check skipped the device. After that commit the table is a tight kvcalloc() and the over-read returns adjacent slab contents, which check_device() then dereferences as a struct amd_iommu *, causing a boot-time GPF. 
Seen on Google Compute Engine ct6e VMs, where the virtualized IVRS describes only the four TPU endpoints 00:04.0-07.0; the gVNIC at 00:08.0 (devid 0x40) indexes 56 bytes past the 456-byte allocation, into the adjacent kmalloc-512 slab object: pci 0000:00:04.0: Adding to iommu group 0 pci 0000:00:05.0: Adding to iommu group 1 pci 0000:00:06.0: Adding to iommu group 2 pci 0000:00:07.0: Adding to iommu group 3 Oops: general protection fault, probably for non-canonical address 0x3a64695f78746382: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.22 #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/06/2025 RIP: 0010:amd_iommu_probe_device+0x54/0x3a0 Call Trace: __iommu_probe_device+0x107/0x520 probe_iommu_group+0x29/0x50 bus_for_each_dev+0x7e/0xe0 iommu_device_register+0xc9/0x240 iommu_go_to_state+0x9c0/0x1c60 amd_iommu_init+0x14/0x40 pci_iommu_init+0x16/0x60 do_one_initcall+0x47/0x2f0 Guard the array access in __rlookup_amd_iommu(). With the fix applied on 6.18.22, the gVNIC at 00:08.0 is skipped cleanly and the VM boots. Fixes: `e874c666b1` ("iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()") Cc: stable@vger.kernel.org Reported-by: Ziyuan Chen <zc@anthropic.com> Tested-by: Ziyuan Chen <zc@anthropic.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Assisted-by: Claude:unspecified Signed-off-by: Jose Fernandez (Anthropic) <jose.fernandez@linux.dev> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-05-11 10:07:52 +02:00
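The arithmetic and the guard described in the commit above can be modelled in user space. This is only a sketch: `struct iommu_model`, `struct pci_seg_model`, `rlookup_checked()` and `bytes_past_end()` are hypothetical names rather than the kernel's definitions, and 8-byte table entries are assumed, which is consistent with the 456-byte allocation in the report (57 entries, so a `last_bdf` of 0x38).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct iommu_model {
	int id;
};

struct pci_seg_model {
	uint16_t last_bdf;                  /* highest devid described by the IVRS */
	struct iommu_model **rlookup_table; /* holds last_bdf + 1 entries */
};

/* Return the owning IOMMU, or NULL when devid lies outside the table. */
static struct iommu_model *rlookup_checked(const struct pci_seg_model *seg,
					   uint16_t devid)
{
	assert(seg != NULL);
	if (devid > seg->last_bdf)
		return NULL; /* guard runs before the array is indexed */
	return seg->rlookup_table[devid];
}

/*
 * Distance in bytes from the end of the table to the start of entry
 * 'devid', assuming 8-byte entries. Returns 0 when the entry is in range.
 */
static size_t bytes_past_end(uint16_t last_bdf, uint16_t devid)
{
	size_t table_bytes = ((size_t)last_bdf + 1) * 8;
	size_t entry_offset = (size_t)devid * 8;

	return entry_offset > table_bytes ? entry_offset - table_bytes : 0;
}
```

With `last_bdf` 0x38 and devid 0x40, `bytes_past_end()` yields 56, matching the "56 bytes past the 456-byte allocation" figure in the report, and `rlookup_checked()` returns NULL for that devid instead of reading adjacent memory.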
Eder Zulian	8dfd3d8d74	iommu/amd: Remove latent out-of-bounds access in IOMMU debugfs In iommu_mmio_write() and iommu_capability_write(), the variables dbg_mmio_offset and dbg_cap_offset are declared as int. However, they are populated using kstrtou32_from_user(). If a user provides a sufficiently large value, it can become a negative integer. Prior to this patch, the AMD IOMMU debugfs implementation was already protected by different mechanisms. 1. #define OFS_IN_SZ 8 ensures the user string <= 8 bytes, so e.g. 0xffffffff isn't a valid input. if (cnt > OFS_IN_SZ) return -EINVAL; 2. Implicit type promotion in iommu_mmio_write(), dbg_mmio_offset is int and iommu->mmio_phys_end is u64 if (dbg_mmio_offset > iommu->mmio_phys_end - sizeof(u64)) return -EINVAL; 3. The show handlers would currently catch the negative number and refuse to perform the read. Replace kstrtou32_from_user() with kstrtos32_from_user() to parse the input, and check for negative values to explicitly prevent out-of-bounds memory accesses directly in iommu_mmio_write() and iommu_capability_write(). Signed-off-by: Eder Zulian <ezulian@redhat.com> Fixes: `7a4ee419e8` ("iommu/amd: Add debugfs support to dump IOMMU MMIO registers") Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-05-11 09:52:54 +02:00
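A user-space sketch of the two range checks discussed in the commit above. The function names are hypothetical, and the explicit `(uint64_t)` cast stands in for the implicit int-to-u64 conversion that the commit message describes in the pre-patch comparison; it is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Pre-patch style: the offset is an int, the limit is a u64. The cast below
 * spells out what the usual arithmetic conversions do implicitly: a negative
 * offset becomes a very large unsigned value and is rejected by accident.
 */
static int check_offset_promoted(int offset, uint64_t mmio_end)
{
	assert(mmio_end >= sizeof(uint64_t));
	if ((uint64_t)offset > mmio_end - sizeof(uint64_t))
		return -EINVAL;
	return 0;
}

/* Post-patch style: the sign is tested explicitly before the range check. */
static int check_offset_signed(int32_t offset, uint64_t mmio_end)
{
	assert(mmio_end >= sizeof(uint64_t));
	if (offset < 0)
		return -EINVAL;
	if ((uint64_t)offset > mmio_end - sizeof(uint64_t))
		return -EINVAL;
	return 0;
}
```

Both variants reject -1, but only the second one does so by design rather than as a side effect of type conversion.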
Weinan Liu	10161b4a79	iommu/amd: Fix precedence order in set_dte_passthrough() The bitwise OR | operator has higher precedence than the ternary ?: operator, so the expression is incorrectly evaluated as: new->data[1] |= (FIELD_PREP(...) | dev_data->ats_enabled) ? DTE_FLAG_IOTLB : 0; Wrap the conditional operation in parentheses to enforce the correct evaluation order. Fixes: `93eee2a49c` ("iommu/amd: Refactor logic to program the host page table in DTE") Signed-off-by: Weinan Liu <wnliu@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-05-04 10:26:16 +02:00
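The precedence issue in the commit above can be reproduced with plain integers. In this sketch `FLAG_IOTLB_MODEL` is a made-up bit, not the real `DTE_FLAG_IOTLB` value, and the `field` argument stands in for the `FIELD_PREP(...)` result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IOTLB_MODEL (1ULL << 32) /* hypothetical bit position */

/*
 * How `data |= field | ats_enabled ? FLAG : 0` is parsed: the OR binds
 * tighter than ?:, so 'field' becomes part of the condition and its bits
 * never reach 'data'.
 */
static uint64_t set_bits_buggy(uint64_t data, uint64_t field, bool ats_enabled)
{
	data |= (field | ats_enabled) ? FLAG_IOTLB_MODEL : 0;
	return data;
}

/* Intended meaning: OR in the field, plus the flag only when ATS is enabled. */
static uint64_t set_bits_fixed(uint64_t data, uint64_t field, bool ats_enabled)
{
	data |= field | (ats_enabled ? FLAG_IOTLB_MODEL : 0);
	return data;
}
```

With a non-zero field and ATS disabled, the buggy form sets the flag and drops the field, while the parenthesized form keeps the field and leaves the flag clear.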
Vasant Hegde	1f44aab79b	iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19 Due to CVE-2023-20585, the PPR log buffer must use the maximum supported size (512K) on Genoa (Family 0x19, model >= 0x10) systems when SNP is enabled, to mitigate a potential security vulnerability. Note that Family 0x19 models below 0x10 (Milan) do not support PPR when SNP is enabled. Hence the PPR log size increase is only applied for model >= 0x10. All other systems continue to use the default PPR log buffer size (8K). Apply the errata fix by making the following changes: - Introduce new global variable (amd_iommu_pprlog_size) to hold the PPR log buffer size. Adjust variable size for Genoa family. - Extend 'amd_iommu_apply_erratum_snp()' to also set the PPR log buffer size to maximum for Family 0x19 model >= 0x10 when SNP is enabled. - Rename PPR_* macros to make them more readable. Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3016.html Cc: Borislav Petkov <bp@alien8.de> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-04-27 13:49:36 +02:00
Vasant Hegde	58c0ac6125	iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19 Due to CVE-2023-20585, the Event log buffer must use the maximum supported size (512K) on Milan/Genoa (Family 0x19) systems when SNP is enabled, to mitigate a potential security vulnerability. All other systems continue to use the default Event log buffer size (8K). Apply the errata fix by making the following changes: * Introduce new global variable (amd_iommu_evtlog_size) to have event log buffer size. Adjust variable size for family 0x19. * Since 'iommu_snp_enable()' must be called after the core IOMMU subsystem is initialized, it cannot be moved to the early init stage. The SNP errata must also be applied after the 'iommu_snp_enable()' check. Therefore, 'alloc_event_buffer()' and 'iommu_enable_event_buffer()' are now called in the IOMMU_ENABLED state, after the errata is applied. * Adjust alloc_event_buffer() and iommu_enable_event_buffer() to handle all IOMMU instances. * Also rename EVT_* macros to make it more readable. Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3016.html Cc: Borislav Petkov <bp@alien8.de> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-04-27 13:49:16 +02:00
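The sizing rules stated in the two CVE-2023-20585 commits above reduce to a small decision function. This is a sketch of the stated policy only: the macro and function names are hypothetical, and the driver itself stores the result in `amd_iommu_evtlog_size` and `amd_iommu_pprlog_size` rather than calling helpers like these.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_SIZE_DEFAULT (8u * 1024u)   /* 8K default named in the commits */
#define LOG_SIZE_MAX     (512u * 1024u) /* 512K maximum named in the commits */

static_assert(LOG_SIZE_MAX > LOG_SIZE_DEFAULT, "maximum must exceed default");

/* Event log: maximum size on any Family 0x19 part when SNP is enabled. */
static size_t evtlog_size(unsigned int family, bool snp_enabled)
{
	return (snp_enabled && family == 0x19) ? LOG_SIZE_MAX : LOG_SIZE_DEFAULT;
}

/* PPR log: maximum size only on Family 0x19 with model >= 0x10 (Genoa). */
static size_t pprlog_size(unsigned int family, unsigned int model,
			  bool snp_enabled)
{
	if (snp_enabled && family == 0x19 && model >= 0x10)
		return LOG_SIZE_MAX;
	return LOG_SIZE_DEFAULT;
}
```

The difference between the two functions reflects the note that Family 0x19 models below 0x10 do not support PPR when SNP is enabled, so only the event log is enlarged there.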
Will Deacon	f8d5e7066d	Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2026-04-09 13:18:27 +01:00
Magnus Kalland	5aac28784d	iommu/amd: Invalidate IRT cache for DMA aliases DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared between multiple device IDs. See commit `3c124435e8` ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more information on this. However, the AMD IOMMU driver currently invalidates IRTE cache entries on a per-device basis whenever an IRTE is updated, not for each alias. This approach leaves stale IRTE cache entries when an IRTE is cached under one DMA alias but later updated and invalidated through a different alias. In such cases, the original device ID is never invalidated, since it is programmed via aliasing. This incoherency bug has been observed when IRTEs are cached for one Non-Transparent Bridge (NTB) DMA alias, later updated via another. Fix this by invalidating the interrupt remapping table cache for all DMA aliases when updating an IRTE. Co-developed-by: Lars B. Kristiansen <larsk@dolphinics.com> Signed-off-by: Lars B. Kristiansen <larsk@dolphinics.com> Co-developed-by: Jonas Markussen <jonas@dolphinics.com> Signed-off-by: Jonas Markussen <jonas@dolphinics.com> Co-developed-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Magnus Kalland <magnus@dolphinics.com> Link: https://lore.kernel.org/linux-iommu/9204da81-f821-4034-b8ad-501e43383b56@amd.com/ Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-04-02 11:42:45 +02:00
Vasant Hegde	faad224fe0	iommu/amd: Fix clone_alias() to use the original device's devid Currently clone_alias() assumes first argument (pdev) is always the original device pointer. This function is called by pci_for_each_dma_alias() which based on topology decides to send original or alias device details in first argument. This meant that the source devid used to look up and copy the DTE may be incorrect, leading to wrong or stale DTE entries being propagated to alias device. Fix this by passing the original pdev as the opaque data argument to both the direct clone_alias() call and pci_for_each_dma_alias(). Inside clone_alias(), retrieve the original device from data and compute devid from it. Fixes: `3332364e4e` ("iommu/amd: Support multiple PCI DMA aliases in device table") Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-04-02 09:31:24 +02:00
Jason Gunthorpe	1c18a1212c	iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain iommupt always supports the semantics required for DMA-FQ; when drivers are converted to use it they automatically get support. Detect iommupt directly instead of using IOMMU_CAP_DEFERRED_FLUSH and remove IOMMU_CAP_DEFERRED_FLUSH from converted drivers. This will also enable DMA-FQ on RISC-V. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-04-01 09:50:20 +02:00
Guanghui Feng	0e59645683	iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs In the current AMD IOMMU debugfs, when multiple processes simultaneously access the IOMMU mmio/cap registers using the IOMMU debugfs, illegal access issues can occur in the following execution flow: 1. CPU1: Sets a valid access address using iommu_mmio/capability_write, and verifies the access address's validity in iommu_mmio/capability_show 2. CPU2: Sets an invalid address using iommu_mmio/capability_write 3. CPU1: accesses the IOMMU mmio/cap registers based on the invalid address, resulting in an illegal access. This patch modifies the execution process to first verify the address's validity and then access it based on the same address, ensuring correctness and robustness. Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-27 09:26:59 +01:00
Guanghui Feng	e4172c5b53	iommu/amd: Fix illegal device-id access in IOMMU debugfs In the current AMD IOMMU debugfs, when multiple processes use the IOMMU debugfs simultaneously, illegal access issues can occur in the following execution flow: 1. CPU1: Sets a valid sbdf via devid_write, then checks the sbdf's validity in execution flows such as devid_show, iommu_devtbl_show, and iommu_irqtbl_show. 2. CPU2: Sets an invalid sbdf via devid_write, at which point the sbdf value is -1. 3. CPU1: Accesses the IOMMU device table or IRQ table based on the invalid sbdf value of -1, resulting in an illegal access. This is especially problematic in monitoring scripts, where multiple scripts may access debugfs simultaneously, and some scripts may unexpectedly set invalid values, which triggers illegal access in debugfs. This patch modifies the execution flow of devid_show, iommu_devtbl_show, and iommu_irqtbl_show so that each handler validates and accesses the same device-id, thus guaranteeing correctness and robustness. Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-27 09:26:58 +01:00
Shameer Kolothum	a82efb8747	iommu: Add device ATS supported capability PCIe ATS may be disabled by platform firmware, root complex limitations, or kernel policy even when a device advertises the ATS capability in its PCI configuration space. Add a new IOMMU_CAP_PCI_ATS_SUPPORTED capability to allow IOMMU drivers to report the effective ATS decision for a device. When this capability is true for a device, ATS may be enabled for that device, but it does not imply that ATS is currently enabled. A subsequent patch will extend iommufd to expose the effective ATS status to userspace. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-17 14:05:05 +01:00
Joe Damato	ba17de9854	iommu/amd: Block identity domain when SNP enabled Previously, commit `8388f7df93` ("iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled") prevented users from changing the IOMMU domain to identity if SNP was enabled. This resulted in an error when writing to sysfs: # echo "identity" > /sys/kernel/iommu_groups/50/type -bash: echo: write error: Cannot allocate memory However, commit `4402f2627d` ("iommu/amd: Implement global identity domain") changed the flow of the code, skipping the SNP guard and allowing users to change the IOMMU domain to identity after a machine has booted. Once the user does that, they will probably try to bind and the device/driver will start to do DMA which will trigger errors: iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0 iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0: probe with driver bnxt_en failed with error -16 To prevent this from happening, create an attach wrapper for identity_domain_ops which returns EINVAL if amd_iommu_snp_en is true. 
With this commit applied: # echo "identity" > /sys/kernel/iommu_groups/62/type -bash: echo: write error: Invalid argument Fixes: `4402f2627d` ("iommu/amd: Implement global identity domain") Signed-off-by: Joe Damato <joe@dama.to> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-17 14:02:02 +01:00
Li RongQing	1e0c8d6b69	iommu/amd: Add NUMA node affinity for IOMMU log buffers Currently, PPR Log and GA logs for AMD IOMMU are allocated using iommu_alloc_pages_sz(), which does not account for NUMA affinity. This can lead to remote memory access latencies if the memory is allocated on a different node than the IOMMU hardware. Switch to iommu_alloc_pages_node_sz() to ensure that these data structures are allocated on the same NUMA node as the IOMMU device. If the node information is unavailable, it defaults to NUMA_NO_NODE. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-17 13:26:38 +01:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	1e0ea4dff0	IOMMU Updates for Linux v7.0 Including: - Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets - Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates - AMD-Vi changes: - Support for nested translations - Other minor improvements - ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS". - ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes. - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors. - Add device-tree support for NVIDIA's CMDQV extension. - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields. - Don't disable ATS for nested S1-bypass nested domains. - Additions to the kunit selftests. 
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmmLDZwACgkQK/BELZcB GuNHgg//Yf9K/+T6+IOemA5Z8k3x2p39Q/Dv5x+SEGkh+CUh2C5dX97WD9LHntus 1mgIHlSgbM3bgMB+XTS1Q5ghy1QH71XOMnGCPhthwg843iCP2CcrB84ZZKKnNmw9 2YJdxYlNcbAMpvSd0F1XKaXoiNl9qzWx+QFtnVaTXMptNEhYOxMOlaZPtlEuwfJa T7h4cwtsiMDLWA4pw85y4hfvc5jKRv4dMoohin0lNEBpWkCfYE6b2Cjpff+9TtU2 Jyvvcvyns0US3amEwPHlIyfTUPKdaq6Vv3NX8TkAJUhGyEzdfwEtzqAvWMvOEYFh HfnE/LjZZLB1CUkF5MTib9dBgJACf/jtvOtuh4wZkx+7O2WIR6Ebo41dtWBM6dxh cHGeeQGqxdDZ5UJbIonF8Am0lxsaZx2zs09tlHEMGl2pNDi6vUppk1iTOkv3Wog0 zy4GhDBl0n/IcyCaIinnWck8C+BsAMcRGpDP2AB0I9/C2qpsaFY/NdNkbIGidhaJ 3khdAcjWsNPiJPNbUx66n6t8RSXdYKUuhJq2a/GgYmtAjhRR9cJlupB8/QYCBS5j fxXpHp4xMtw+Cgj58xC+gYXDivQOEThPs/BhL/qrxOzWE03HWI15MFydqRFWicnI gJCZSevMncBfNUTIJUSUmuT7ukP40cnh58QBeRkTmKGcW6HjuyY= =W/nW -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates AMD-Vi changes: - Support for nested translations - Other minor improvements ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS" ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors - Add device-tree support for NVIDIA's CMDQV extension - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields - Don't disable ATS for nested S1-bypass nested domains - Additions to the kunit selftests" * tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommupt: 
Always add IOVA range to iotlb_gather in gather_range_pages() iommu/amd: serialize sequence allocation under concurrent TLB invalidations iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence iommu/arm-smmu-v3: Add device-tree support for CMDQV driver iommu/tegra241-cmdqv: Decouple driver from ACPI iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p iommu/vt-d: Fix race condition during PASID entry replacement iommu/vt-d: Clear Present bit before tearing down context entry iommu/vt-d: Clear Present bit before tearing down PASID entry iommu/vt-d: Flush piotlb for SVM and Nested domain iommu/vt-d: Flush cache for PASID table before using it iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode rust: iommu: fix `srctree` link warning rust: iommu: fix Rust formatting ...	2026-02-11 16:36:08 -08:00
Linus Torvalds	4e21e585b6	A series of treewide cleanups to ensure interrupt request consistency. - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs. request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJs8MQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoTbvEACH4OegGofKri7aecUPNcpRdQDHBoueikni Rio/vydFJ/H2hto4xlSPC4C84onxuFqY9lJgo/tCQTCrO0t+ZQ4ZGqnlQKzLJzmv vcVzNgGsxDZ0p1wJO0rBpTRxJN8yTXi8VVv5e6OPuihjLhdXGesyYtk1zosR3nOS CF/w8r9jVMzsSMPvtEMr5AwXD9ZTziUqyhQv94fYlpsbyD4TPXnUxhVkdUFFHHo3 ROzWPFw1Ykh6wpdRPEpupcCf1d2Pq0TIAU86y3Sbf2msuXiTouHf+lH1uTd3EsLN 6qUIqRYjwWE8HTieh+3YcH415wrIsUsWJb8YDi0DpqhPbja3IXP5ACHqEWaaNHRA MaBE2Gc02se4ChXMWncYR3cdzyAAwAeKLUahpLNc+7U4cHOm1w2g60yy4I0v2krh V0vfEN88WQ8DgrM0VvDLST6ZinSz4ia+R0qYWywl6eIW4RVNtuBi6wrN5PtzSEtz jZ3LqnRLGmNfKwS/taHBCAme7NIJSNa1L0ao/icnW5XVQz/d2EHVcUsLHecHZSMx l9tr/g3t85tsFW1eIKfF8T1a5DrbCEP4afceQk9KexAfAkP7el53M1E1yQDk/kW8 so0CwZtbDJ136RQdBIQqx49QrUEOvtrgNDRQxPFBUrWEHcvjqbUuFclp9hpLheOj 8YnzkVe0Rg== =vrmm -----END PGP SIGNATURE----- Merge tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A series of treewide cleanups to ensure interrupt request consistency. 
- Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL" * tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: media: pci: mg4b: Use IRQF_NO_THREAD mfd: wm8350-core: Use IRQF_ONESHOT thermal/qcom/lmh: Replace IRQF_ONESHOT with IRQF_NO_THREAD rtc: amlogic-a4: Remove IRQF_ONESHOT usb: typec: fusb302: Remove IRQF_ONESHOT EDAC/altera: Remove IRQF_ONESHOT char: tpm: cr50: Remove IRQF_ONESHOT ARM: versatile: Remove IRQF_ONESHOT scsi: efct: Use IRQF_ONESHOT and default primary handler Bluetooth: btintel_pcie: Use IRQF_ONESHOT and default primary handler bus: fsl-mc: Use default primary handler mailbox: bcm-ferxrm-mailbox: Use default primary handler iommu/amd: Use core's primary handler and set IRQF_ONESHOT platform/x86: int0002: Remove IRQF_ONESHOT from request_irq() genirq: Set IRQF_COND_ONESHOT in devm_request_irq().	2026-02-10 13:22:50 -08:00
Joerg Roedel	ad09563660	Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2026-02-06 11:10:40 +01:00
Ankit Soni	9e249c4841	iommu/amd: serialize sequence allocation under concurrent TLB invalidations With concurrent TLB invalidations, completion wait randomly gets timed out because cmd_sem_val was incremented outside the IOMMU spinlock, allowing CMD_COMPL_WAIT commands to be queued out of sequence and breaking the ordering assumption in wait_on_sem(). Move the cmd_sem_val increment under iommu->lock so completion sequence allocation is serialized with command queuing. And remove the unnecessary return. Fixes: `d2a0cac105` ("iommu/amd: move wait_on_sem() out of spinlock") Tested-by: Srikanth Aithal <sraithal@amd.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-02-03 14:27:05 +01:00
Sebastian Andrzej Siewior	5bfcdccb4d	iommu/amd: Use core's primary handler and set IRQF_ONESHOT request_threaded_irq() is invoked with a primary and a secondary handler and no flags are passed. The primary handler is the same as irq_default_primary_handler() so there is no need to have an identical copy. The lack of the IRQF_ONESHOT can be dangerous because the interrupt source is not masked while the threaded handler is active. This means, especially on LEVEL typed interrupt lines, the interrupt can fire again before the threaded handler had a chance to run. Use the default primary interrupt handler by specifying NULL and set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: `72fe00f01f` ("x86/amd-iommu: Use threaded interupt handler") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-4-bigeasy@linutronix.de	2026-02-01 17:37:13 +01:00
Nathan Chancellor	5b0530bb16	iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() When building with -Wincompatible-function-pointer-types-strict, a warning designed to catch kernel control flow integrity (kCFI) issues at build time, there is an instance around amd_iommufd_hw_info(): drivers/iommu/amd/iommu.c:3141:13: error: incompatible function pointer types initializing 'void *(*)(struct device *, u32 *, enum iommu_hw_info_type *)' (aka 'void *(*)(struct device *, unsigned int *, enum iommu_hw_info_type *)') with an expression of type 'void *(struct device *, u32 *, u32 *)' (aka 'void *(struct device *, unsigned int *, unsigned int *)') [-Werror,-Wincompatible-function-pointer-types-strict] 3141 | .hw_info = amd_iommufd_hw_info, | ^~~~~~~~~~~~~~~~~~~ While 'u32 *' and 'enum iommu_hw_info_type *' are ABI compatible, hence no regular warning from -Wincompatible-function-pointer-types, the mismatch will trigger a kCFI violation when amd_iommufd_hw_info() is called indirectly. Update the type parameter of amd_iommufd_hw_info() to be 'enum iommu_hw_info_type *' to match the prototype in 'struct iommu_ops', clearing up the warning and kCFI violation. Fixes: `7d8b06ecc4` ("iommu/amd: Add support for hw_info for iommu capability query") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-28 15:13:01 +01:00
Suravee Suthikulpanit	c0a652a3d1	iommu/amd: Remove unused variable in amd_iommufd_viommu_destroy() This fixes warning reported by 0-DAY CI Kernel Test Service. Fixes: `757d2b1fdf` ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601190634.bl7Mjx5Q-lkp@intel.com/ Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-20 10:16:16 +01:00
Vasant Hegde	3222b6de51	iommu/amd: Fix error path in amd_iommu_probe_device() Currently, the error path of amd_iommu_probe_device() unconditionally references dev_data, which may not be initialized if an early failure occurs (like iommu_init_device() fails). Move the out_err label to ensure the function exits immediately on failure without accessing potentially uninitialized dev_data. Fixes: `19e5cc156c` ("iommu/amd: Enable support for up to 2K interrupts per function") Cc: Rakuram Eswaran <rakuram.e96@gmail.com> Cc: Jörg Rödel <joro@8bytes.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 11:03:12 +01:00
Suravee Suthikulpanit	103f4e7c85	iommu/amd: Add support for nested domain attach/detach Introduce set_dte_nested() to program guest translation settings in the host DTE when attaching a nested domain to a device. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit	93eee2a49c	iommu/amd: Refactor logic to program the host page table in DTE Introduce the amd_iommu_set_dte_v1() helper function to configure IOMMU host (v1) page table into DTE. This will be used later when attaching a nested domain. Also, remove the obsolete warning when SNP is enabled and domain id is zero since this check is no longer applicable. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit	4e1b09d90b	iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() To help avoid duplicate logic when programing DTE for nested translation. Note that this commit changes behavior of when the IOMMU driver is switching domain during attach and the blocking domain, where DTE bit fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt) and System management message could be affected. These DTE bits are specified in the IVRS table for specific devices, and should be persistent. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:14 +01:00
Suravee Suthikulpanit	757d2b1fdf	iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Each nested domain is assigned guest domain ID (gDomID), which guest OS programs into guest Device Table Entry (gDTE). For each gDomID, the driver assigns a corresponding host domain ID (hDomID), which will be programmed into the host Device Table Entry (hDTE). The hDomID is allocated during amd_iommu_alloc_domain_nested(), and free during nested_domain_free(). The gDomID-to-hDomID mapping info (struct guest_domain_mapping_info) is stored in a per-viommu xarray (struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID. Note also that parent domain can be shared among struct iommufd_viommu. Therefore, when hypervisor invalidates the nest parent domain, the AMD IOMMU command INVALIDATE_IOMMU_PAGES must be issued for each hDomID in the gdomid_array. This is handled by the iommu_flush_pages_v1_hdom_ids(), where it iterates through struct protection_domain.viommu_list. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:14 +01:00
Suravee Suthikulpanit	774180a74a	iommu/amd: Add support for nested domain allocation The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer table along with guest page tables. The struct iommu_hwpt_amd_guest contains this information, and is passed from user-space as a parameter of the struct iommu_ops.domain_alloc_nested(). Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:13 +01:00
Suravee Suthikulpanit	e113a72576	iommu/amd: Introduce struct amd_iommu_viommu Which stores reference to nested parent domain assigned during the call to struct iommu_ops.viommu_init(). Information in the nest parent is needed when setting up the nested translation. Note that the viommu initialization will be introduced in subsequent commit. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	b43a29def2	iommu/amd: Add support for nest parent domain allocation To support nested translation, the nest parent domain is allocated with IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page table for stage 2 (i.e. GPA->SPA). Also, only support nest parent domain on AMD system, which can support the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in order to program DTE[GCR3 Table Root Pointer] with the GPA. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	b2bb0573dd	iommu/amd: Always enable GCR3TRPMode when supported. The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured with GPA (instead of SPA). This simplifies the implementation, and is a pre-requisite for nested translation support. Therefore, always enable this feature if available. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	9b467a5af8	iommu/amd: Introduce helper function amd_iommu_update_dte() Which includes DTE update, clone_aliases, DTE flush and completion-wait commands to avoid code duplication when reused to set up the DTE for nested translation. Also, make amd_iommu_update_dte() non-static so it can be reused in a new nested.c file for nested translation. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:11 +01:00
Suravee Suthikulpanit	11cfa782f0	iommu/amd: Make amd_iommu_make_clear_dte() non-static inline This will be reused in a new nested.c file for nested translation. Also, remove unused function parameter ptr. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:11 +01:00
Suravee Suthikulpanit	5335fc1657	iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Also change the define to use GENMASK_ULL instead. There is no functional change. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:10 +01:00
Suravee Suthikulpanit	7d8b06ecc4	iommu/amd: Add support for hw_info for iommu capability query AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers specify features supported by each IOMMU hardware instance. The IOMMU driver checks each feature-specific bits before enabling each feature at run time. For IOMMUFD, the hypervisor passes the raw value of amd_iommu_efr and amd_iommu_efr2 to VMM via iommufd IOMMU_DEVICE_GET_HW_INFO ioctl. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:09 +01:00
Rakuram Eswaran	2e66659565	iommu/amd: Drop incorrect NULL check for iommu in alloc_irq_table() alloc_irq_table() contains a conditional check for a NULL iommu pointer when computing the NUMA node, but the function dereferences iommu in multiple places afterwards. All callers ensure that a valid iommu pointer is passed in, and a NULL iommu is not expected by the current callers. Remove the incorrect NULL check to make the assumptions consistent and address the Smatch warning. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 11:17:43 +01:00
Ankit Soni	d2a0cac105	iommu/amd: move wait_on_sem() out of spinlock With iommu.strict=1, the existing completion wait path can cause soft lockups under stressed environment, as wait_on_sem() busy-waits under the spinlock with interrupts disabled. Move the completion wait in iommu_completion_wait() out of the spinlock. wait_on_sem() only polls the hardware-updated cmd_sem and does not require iommu->lock, so holding the lock during the busy wait unnecessarily increases contention and extends the time with interrupts disabled. Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:54:38 +01:00
Sairaj Kodilkar	c7fe9384c8	amd/iommu: Make protection domain ID functions non-static So that both iommu.c and init.c can utilize them. Also define a new function 'pdom_id_destroy()' to destroy 'pdom_ids' instead of directly calling ida functions. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-12-19 11:23:49 +01:00
Sairaj Kodilkar	c2e8dc1222	amd/iommu: Preserve domain ids inside the kdump kernel Currently the AMD IOMMU driver does not reserve domain ids programmed in the DTE while reusing the device table inside the kdump kernel. This can cause reallocation of these domain ids for newer domains that are created by the kdump kernel, which can lead to potential IO_PAGE_FAULTs. Hence reserve these ids inside pdom_ids. Fixes: `38e5f33ee3` ("iommu/amd: Reuse device table for kdump") Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reported-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-12-19 11:23:48 +01:00
Linus Torvalds	249872f53d	tsm for 6.19 - Introduce the PCI/TSM core for the coordination of device authentication, link encryption and establishment (IDE), and later management of the device security operational states (TDISP). Notify the new TSM core layer of PCI device arrival and departure. - Add a low level TSM driver for the link encryption establishment capabilities of the AMD SEV-TIO architecture. - Add a library of helpers TSM drivers to use for IDE establishment and the DOE transport. - Add skeleton support for 'bind' and 'guest_request' operations in support of TDISP. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCaTOdAwAKCRDfioYZHlFs Z/fWAQDS5mwS/8rn0UdH/SijTm/oKVxdiyIQbTstrjk8AySITgEA5ki9w2iKa0WG x1ACZKlo9gS9emyx4wuJpCBIMtR50Qc= =B4oG -----END PGP SIGNATURE----- Merge tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm Pull PCIe Link Encryption and Device Authentication from Dan Williams: "New PCI infrastructure and one architecture implementation for PCIe link encryption establishment via platform firmware services. This work is the result of multiple vendors coming to consensus on some core infrastructure (thanks Alexey, Yilun, and Aneesh!), and three vendor implementations, although only one is included in this pull. The PCI core changes have an ack from Bjorn, the crypto/ccp/ changes have an ack from Tom, and the iommu/amd/ changes have an ack from Joerg. PCIe link encryption is made possible by the soup of acronyms mentioned in the shortlog below. Link Integrity and Data Encryption (IDE) is a protocol for installing keys in the transmitter and receiver at each end of a link. That protocol is transported over Data Object Exchange (DOE) mailboxes using PCI configuration requests. The aspect that makes this a "platform firmware service" is that the key provisioning and protocol is coordinated through a Trusted Execution Environment (TEE) Security Manager (TSM). 
That is either firmware running in a coprocessor (AMD SEV-TIO), or quasi-hypervisor software (Intel TDX Connect / ARM CCA) running in a protected CPU mode. Now, the only reason to ask a TSM to run this protocol and install the keys rather than have a Linux driver do the same is so that later, a confidential VM can ask the TSM directly "can you certify this device?". That precludes host Linux from provisioning its own keys, because host Linux is outside the trust domain for the VM. It also turns out that all architectures, save for one, do not publish a mechanism for an OS to establish keys in the root port. So "TSM-established link encryption" is the only cross-architecture path for this capability for the foreseeable future. This unblocks the other arch implementations to follow in v6.20/v7.0, once they clear some other dependencies, and it unblocks the next phase of work to implement the end-to-end flow of confidential device assignment. The PCIe specification calls this end-to-end flow Trusted Execution Environment (TEE) Device Interface Security Protocol (TDISP). In the meantime, Linux gets a link encryption facility which has practical benefits along the same lines as memory encryption. It authenticates devices via certificates and may protect against interposer attacks trying to capture clear-text PCIe traffic. Summary: - Introduce the PCI/TSM core for the coordination of device authentication, link encryption and establishment (IDE), and later management of the device security operational states (TDISP). 
Notify the new TSM core layer of PCI device arrival and departure - Add a low level TSM driver for the link encryption establishment capabilities of the AMD SEV-TIO architecture - Add a library of helpers TSM drivers to use for IDE establishment and the DOE transport - Add skeleton support for 'bind' and 'guest_request' operations in support of TDISP" * tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: (23 commits) crypto/ccp: Fix CONFIG_PCI=n build virt: Fix Kconfig warning when selecting TSM without VIRT_DRIVERS crypto/ccp: Implement SEV-TIO PCIe IDE (phase1) iommu/amd: Report SEV-TIO support psp-sev: Assign numbers to all status codes and add new ccp: Make snp_reclaim_pages and __sev_do_cmd_locked public PCI/TSM: Add 'dsm' and 'bound' attributes for dependent functions PCI/TSM: Add pci_tsm_guest_req() for managing TDIs PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIs PCI/IDE: Initialize an ID for all IDE streams PCI/IDE: Add Address Association Register setup for downstream MMIO resource: Introduce resource_assigned() for discerning active resources PCI/TSM: Drop stub for pci_tsm_doe_transfer() drivers/virt: Drop VIRT_DRIVERS build dependency PCI/TSM: Report active IDE streams PCI/IDE: Report available IDE streams PCI/IDE: Add IDE establishment helpers PCI: Establish document for PCI host bridge sysfs attributes PCI: Add PCIe Device 3 Extended Capability enumeration PCI/TSM: Establish Secure Sessions and Link Encryption ...	2025-12-06 10:15:41 -08:00
Linus Torvalds	208eed95fc	soc: driver updates for 6.19 This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes. - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs. - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access. - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmky/8gACgkQmmx57+YA GNlqohAApPTLM6Q4gf1cIcsTVaP0uxx9CBgupCGuT5ORrOMKBghVWjTOTSxeEAab UQF465QwYUUu602GH34UmRaY9CKW2bMIsfmkgmxNB4Y4Qd7yCgQNJ/h/TnN0rBH+ qTeEsRH/hax4miSNsh0oOZfVkZkg+23VF02d1VL0CcaX7y4oT45RPBQugrNx/gNS fHfVwgIq8vJ8WyrmM1h2nv1i1vgSzEy50B3kY674BBw83FcJTafNLvD7N5DSgD1H /I/2xeyEpb+oL1VfeHcXZaX/jf04O+cmvSzBi+MOH1tI3MpdxJib1vEYBdggoOWN K/FFGgsOY+DNmJPpSnPTTu8UpzksS8SxGBP7M9Q8roKZwA2c9wLotxySvjki5yv8 2zvabRdzbrSaoYwsH9QnZdQ2hVkJ9W8MESu8PevD3yMNuFUzledPDWW0N1SbGm78 0ZdB6NPdaBZYHMNMRdFhN8P275/Mx5e0XWN9oYMQqjPooH7YkyT7hJWz6ao2PCJP 8mDmnW1RzL+LWf7mJ25ZEtS+YjmKA/PVmogRrGurKCadvdxXqCF09KNljICHhmmu t0KB4dqw02OXLPvBk21qCi0zL56w1JDgqtS8suFvDYo9sCceeAbAcmpyoUOFj2N+ Upn976tb4iqFrr9mFswpmCJWPpqJkU+A+KnKsIRPU7N4kSrP35I= =HvlN -----END PGP SIGNATURE----- Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google 
SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...	2025-12-05 17:29:04 -08:00
Alexey Kardashevskiy	eeb934137d	iommu/amd: Report SEV-TIO support The SEV-TIO switch in the AMD BIOS is reported to the OS via the IOMMU Extended Feature 2 register (EFR2), bit 1. Add helper to parse the bit and report the feature presence. Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Link: https://patch.msgid.link/20251202024449.542361-4-aik@amd.com Acked-by: Joerg Roedel <joerg.roedel@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2025-12-02 12:06:45 -08:00
Joerg Roedel	0d081b1694	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'mediatek', 'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2025-11-28 08:44:21 +01:00
Jason Gunthorpe	1eb0ae6fbd	iommupt/vtd: Support mgaw's less than a 4 level walk for first stage If the IOVA is limited to less than 48 the page table will be constructed with a 3 level configuration which is unsupported by hardware. Like the second stage the caller needs to pass in both the top_level and the vasz to specify a table that has more levels than required to hold the IOVA range. Fixes: `6cbc09b771` ("iommu/vt-d: Restore previous domain::aperture_end calculation") Reported-by: Calvin Owens <calvin@wbinvd.org> Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-28 08:43:55 +01:00
Jinhui Guo	2381a1b40b	iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga() The return type of __modify_irte_ga() is int, but modify_irte_ga() treats it as a bool. Casting the int to bool discards the error code. To fix the issue, change the type of ret to int in modify_irte_ga(). Fixes: `57cdb720ea` ("iommu/amd: Do not flush IRTE when only updating isRun and destination fields") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-25 15:10:23 +01:00
Thierry Reding	a97fbc3ee3	syscore: Pass context data to callbacks Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>	2025-11-14 10:01:52 +01:00
Jinhui Guo	75ba146c26	iommu/amd: Fix pci_segment memleak in alloc_pci_segment() Fix a memory leak of struct amd_iommu_pci_segment in alloc_pci_segment() when system memory (or contiguous memory) is insufficient. Fixes: `04230c1199` ("iommu/amd: Introduce per PCI segment device table") Fixes: `eda797a277` ("iommu/amd: Introduce per PCI segment rlookup table") Fixes: `99fc4ac3d2` ("iommu/amd: Introduce per PCI segment alias_table") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-13 16:15:56 +01:00
Dheeraj Kumar Srivastava	d1e281f832	iommu/amd: Enhance "Completion-wait Time-out" error message The current IOMMU driver prints the "Completion-wait Time-out" error message with insufficient information to debug the issue further. Enhance the error message as follows: 1. Log the IOMMU PCI device ID in the error message. 2. With the "amd_iommu_dump=1" kernel command line option, dump the entire command buffer, including the Head and Tail offsets. Dump the entire command buffer only on the first 'Completion-wait Time-out' to avoid dmesg spam. Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-13 16:13:26 +01:00

621 Commits