Commit Graph

347 Commits

Author SHA1 Message Date
Linus Torvalds
43cfbdda5a iommufd v7.1 merge window pull request
Several fixes:
 
 - Add missing static const
 
 - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is
   turned on
 
 - Fix selftest memory leak and syzkaller splat
 
 - Fix missed -EFAULT in fault reporting write() fops
 
 - Fix a race where map/unmap with the internal IOVA allocator can unmap
   things it should not

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Several fixes:

   - Add missing static const

   - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is
     turned on

   - Fix selftest memory leak and syzkaller splat

   - Fix missed -EFAULT in fault reporting write() fops

   - Fix a race where map/unmap with the internal IOVA allocator can
     unmap things it should not"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Fix a race with concurrent allocation and unmap
  iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format
  iommufd: Fix return value of iommufd_fault_fops_write()
  iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc()
  iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy}
  iommufd: vfio compatibility extension check for noiommu mode
  iommufd: Constify struct dma_buf_attach_ops
2026-04-16 21:21:55 -07:00
Linus Torvalds
f1d26d72f0 IOMMU Updates for Linux v7.1:
Including:
 
 	- Core:
 	  - Support for RISC-V IO-page-table format in generic iommupt
 	    code
 
 	- ARM-SMMU Updates:
 
 	  - Introduction of an "invalidation array" for SMMUv3, which enables
 	    future scalability work and optimisations for devices with a large
 	    number of SMMUv3 instances.
 
 	  - Update the conditions under which the SMMUv3 driver works around
 	    hardware errata for invalidation on MMU-700 implementations.
 
 	  - Fix broken command filtering for the host view of NVIDIA's "cmdqv"
 	    SMMUv3 extension.
 
 	  - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs.
 
 	- Intel VT-d:
 
 	  - Support for dirty tracking on domains attached to PASID
 	  - Removal of unnecessary read*()/write*() wrappers
 	  - Improvements to the invalidation paths
 
 	- AMD Vi:
 
 	  - Race-condition fixed in debugfs code
 	  - Make log buffer allocation NUMA aware
 
 	- RISC-V:
 
 	  - IO-TLB flushing improvements
 	  - Minor fixes

Merge tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core:

   - Support for RISC-V IO-page-table format in generic iommupt code

  ARM-SMMU Updates:

   - Introduction of an "invalidation array" for SMMUv3, which enables
     future scalability work and optimisations for devices with a large
     number of SMMUv3 instances

   - Update the conditions under which the SMMUv3 driver works around
     hardware errata for invalidation on MMU-700 implementations

   - Fix broken command filtering for the host view of NVIDIA's "cmdqv"
     SMMUv3 extension

   - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi
     SoCs

  Intel VT-d:

   - Support for dirty tracking on domains attached to PASID

   - Removal of unnecessary read*()/write*() wrappers

   - Improvements to the invalidation paths

  AMD Vi:

   - Race-condition fixed in debugfs code

   - Make log buffer allocation NUMA aware

  RISC-V:

   - IO-TLB flushing improvements

   - Minor fixes"

* tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits)
  iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY
  dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC
  iommu/amd: Invalidate IRT cache for DMA aliases
  iommu/riscv: Remove overflows on the invalidation path
  iommu/amd: Fix clone_alias() to use the original device's devid
  iommu/vt-d: Remove the remaining pages along the invalidation path
  iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages
  iommu/vt-d: Split piotlb invalidation into range and all
  iommu/vt-d: Remove dmar_writel() and dmar_writeq()
  iommu/vt-d: Remove dmar_readl() and dmar_readq()
  iommufd/selftest: Test dirty tracking on PASID
  iommu/vt-d: Support dirty tracking on PASID
  iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer
  iommu/vt-d: Block PASID attachment to nested domain with dirty tracking
  iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain
  iommu/riscv: Fix signedness bug
  iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs
  iommu/amd: Fix illegal device-id access in IOMMU debugfs
  iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement
  iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init()
  ...
2026-04-15 15:05:51 -07:00
Sina Hassani
8602018b1f iommufd: Fix a race with concurrent allocation and unmap
iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop
body when getting to the more expensive unmap operations. This is fine on
its own, except the loop condition is based on the first area that matches
the unmap address range. If a concurrent call to map picks an area that
was unmapped in previous iterations, the loop mistakenly tries to unmap
it.

This is reproducible by having one userspace thread map buffers and pass
them to another thread that unmaps them. The problem manifests as EBUSY
errors with single page mappings.

Fix this by advancing the start pointer after unmapping an area. This
ensures each iteration only examines the IOVA range that remains mapped,
which is guaranteed not to have overlaps.
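
The effect of advancing the start pointer can be sketched with a small single-threaded model. This is not the kernel code: the example_* names, the fixed-size table, and the while_unlocked hook (which stands in for a concurrent map call running while the lock is dropped) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_MAX_AREAS 8

/* Hypothetical stand-in for a mapped IOVA area: [start, last] inclusive. */
struct example_area {
	unsigned long start;
	unsigned long last;
	bool used;
};

struct example_table {
	struct example_area areas[EXAMPLE_MAX_AREAS];
};

/* Return the lowest used area overlapping [start, last], or NULL. */
static struct example_area *example_first_overlap(struct example_table *t,
						  unsigned long start,
						  unsigned long last)
{
	struct example_area *best = NULL;

	for (size_t i = 0; i < EXAMPLE_MAX_AREAS; i++) {
		struct example_area *a = &t->areas[i];

		if (a->used && a->start <= last && a->last >= start &&
		    (!best || a->start < best->start))
			best = a;
	}
	return best;
}

/* Models a concurrent mapper that reuses 0x0000-0x0fff once it is free. */
static void example_remap_first_page(struct example_table *t)
{
	if (example_first_overlap(t, 0x0000, 0x0fff))
		return;
	for (size_t i = 0; i < EXAMPLE_MAX_AREAS; i++) {
		if (!t->areas[i].used) {
			t->areas[i] = (struct example_area){ 0x0000, 0x0fff, true };
			return;
		}
	}
}

static size_t example_count_used(const struct example_table *t)
{
	size_t n = 0;

	for (size_t i = 0; i < EXAMPLE_MAX_AREAS; i++)
		if (t->areas[i].used)
			n++;
	return n;
}

/* Unmap every area in [start, last]; returns the number of bytes unmapped. */
static unsigned long example_unmap_range(struct example_table *t,
					 unsigned long start, unsigned long last,
					 void (*while_unlocked)(struct example_table *))
{
	unsigned long unmapped = 0;
	struct example_area *a;

	while ((a = example_first_overlap(t, start, last))) {
		unsigned long area_last = a->last;

		a->used = false;
		unmapped += area_last - a->start + 1;
		if (while_unlocked)
			while_unlocked(t);	/* lock dropped: a new map may land here */
		if (area_last >= last)
			break;
		start = area_last + 1;		/* skip the range already handled */
	}
	return unmapped;
}
```

With two adjacent 4 KiB areas and example_remap_first_page as the hook, the loop unmaps the original two areas and leaves the newly created mapping at 0x0000 alone, because the second iteration only looks at 0x1000 and above.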

Cc: stable@vger.kernel.org
Fixes: 51fe6141f0 ("iommufd: Data structure to provide IOVA to PFN mapping")
Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com
Signed-off-by: Sina Hassani <sina@openai.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-11 09:57:02 -03:00
Pranjal Shrivastava
8c4dc1a502 iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format
syzbot found that allocating a mock domain with AMDV1 format could
cause a WARN_ON because the selftest enabled DYNAMIC_TOP without
providing the required driver_ops.

The AMDV1 format in the selftest was a placeholder and was not actually
used by any of the existing selftests. Instead of adding dummy
driver_ops to satisfy the requirements of a format we don't currently
test, remove the AMDV1 format option from the selftest.

The MOCK_IOMMUPT_DEFAULT and MOCK_IOMMUPT_HUGE formats are unaffected as
they use the amdv1_mock variant which does not enable DYNAMIC_TOP.

Fixes: dcd6a011a8 ("iommupt: Add map_pages op")
Link: https://patch.msgid.link/r/20260330092609.2659235-1-praan@google.com
Reported-by: syzbot+453eb7add07c3767adab@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69c1d50b.a70a0220.3cae05.0001.GAE@google.com/
Signed-off-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31 13:13:08 -03:00
Zhenzhong Duan
aaca2aa927 iommufd: Fix return value of iommufd_fault_fops_write()
copy_from_user() returns the number of bytes it failed to copy. That
count must not be passed back to user space, where it would make a failed
write() look like a success. Return -EFAULT instead.
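
A user-space sketch of the pattern follows. mock_copy_from_user() is a hypothetical stand-in that mimics only the return convention (bytes not copied), and example_fault_write() is an invented name, not the kernel function.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): copies at most fail_after
 * bytes and returns the number of bytes that were NOT copied.
 */
static size_t mock_copy_from_user(void *dst, const void *src, size_t n,
				  size_t fail_after)
{
	size_t ok = n < fail_after ? n : fail_after;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Returns the byte count on success or -EFAULT on a short copy. */
static long example_fault_write(void *dst, const void *src, size_t n,
				size_t fail_after)
{
	if (mock_copy_from_user(dst, src, n, fail_after))
		return -EFAULT;	/* never leak the "bytes left" count */
	return (long)n;
}
```

Returning the raw non-zero result would look like a positive byte count to the caller of write(), which is the behavior the commit removes.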

Link: https://patch.msgid.link/r/20260330030755.12856-1-zhenzhong.duan@intel.com
Cc: stable@vger.kernel.org
Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31 13:12:18 -03:00
Kexin Sun
67cb50aee0 iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc()
The function iommufd_hw_pagetable_alloc() was renamed to
iommufd_hwpt_paging_alloc() by commit 89db31635c
("iommufd: Derive iommufd_hwpt_paging from
iommufd_hw_pagetable").  Update the stale reference in
iommufd_device_auto_get_domain().

Link: https://patch.msgid.link/r/20260321105759.6832-1-kexinsun@smail.nju.edu.cn
Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-25 09:35:14 -03:00
Shameer Kolothum
a11661a58c iommufd: Report ATS not supported status via IOMMU_GET_HW_INFO
If the IOMMU driver reports that ATS is not supported for a device, set
the IOMMU_HW_CAP_PCI_ATS_NOT_SUPPORTED flag in the returned hardware
capabilities.

This uses a negative flag for UAPI compatibility. Existing userspace
assumes ATS is supported if no flag is present. This also ensures that
new userspace works correctly on both old and new kernels, where a
zero value implies ATS support.

When this flag is set, ATS cannot be used for the device. When it is
clear, ATS may be enabled when an appropriate HWPT is attached.
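
How userspace might interpret the negative flag can be sketched as below. The bit position is a placeholder chosen for illustration and is not the real UAPI value; the function name is also hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration only, not the real UAPI definition. */
#define EXAMPLE_HW_CAP_PCI_ATS_NOT_SUPPORTED (UINT64_C(1) << 3)

/*
 * A clear bit means "ATS may be enabled", so a capabilities word from an
 * older kernel that never sets the bit keeps its previous meaning.
 */
static bool example_ats_may_be_enabled(uint64_t out_capabilities)
{
	return !(out_capabilities & EXAMPLE_HW_CAP_PCI_ATS_NOT_SUPPORTED);
}
```

A positive "ATS supported" flag would have read as "unsupported" on every older kernel, which is the compatibility problem the negative form avoids.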

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17 14:05:05 +01:00
Thorsten Blum
09c091fddb iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy}
mock_viommu_init() allocates two pages using __get_free_pages(..., 1),
but its error path and mock_viommu_destroy() only release the first page
using free_page(), leaking the second page. Use free_pages() with the
matching order instead to avoid any page leaks.
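
The arithmetic behind the leak, as a small sketch: an order-N allocation covers 2^N pages, and free_page() releases an order-0 block. The helper names are hypothetical, and the second helper assumes free_order is not larger than alloc_order.

```c
/* Number of pages covered by an allocation of the given order. */
static unsigned long example_pages_for_order(unsigned int order)
{
	return 1UL << order;
}

/* Pages left behind when the free uses a smaller order than the alloc. */
static unsigned long example_leaked_pages(unsigned int alloc_order,
					  unsigned int free_order)
{
	return example_pages_for_order(alloc_order) -
	       example_pages_for_order(free_order);
}
```

For the case in this commit, example_leaked_pages(1, 0) is 1, and passing the matching order gives 0.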

Fixes: 80478a2b45 ("iommufd/selftest: Add coverage for the new mmap interface")
Link: https://patch.msgid.link/r/20260312164040.457293-3-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-13 13:28:40 -03:00
Jacob Pan
7147ec874e iommufd: vfio compatibility extension check for noiommu mode
VFIO_CHECK_EXTENSION should return false for TYPE1_IOMMU variants when
in NO-IOMMU mode and IOMMUFD compat container is set. This change makes
the behavior match VFIO_CONTAINER in noiommu mode. It also prevents
userspace from incorrectly attempting to use TYPE1 IOMMU operations
in a no-iommu context.
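
The intended behavior can be sketched as a small decision function. The enum values and the function name are hypothetical and do not match the real VFIO constants; only the TYPE1 handling described above is modeled, and every other extension is simply reported as unsupported here.

```c
#include <stdbool.h>

/* Hypothetical extension identifiers, not the real VFIO values. */
enum example_ext {
	EXAMPLE_EXT_TYPE1,
	EXAMPLE_EXT_TYPE1V2,
	EXAMPLE_EXT_OTHER,
};

/* Returns 1 if the extension is reported as supported, otherwise 0. */
static int example_check_extension(enum example_ext ext, bool no_iommu_mode)
{
	switch (ext) {
	case EXAMPLE_EXT_TYPE1:
	case EXAMPLE_EXT_TYPE1V2:
		/* TYPE1 variants are not offered in no-iommu mode. */
		return !no_iommu_mode;
	default:
		return 0;
	}
}
```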

Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://patch.msgid.link/r/20260213183636.3340-1-jacob.pan@linux.microsoft.com
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-02 20:29:01 -04:00
Christophe JAILLET
46a93917bf iommufd: Constify struct dma_buf_attach_ops
'struct dma_buf_attach_ops' is not modified in this driver.

Constifying this structure moves some data to a read-only section, which
improves overall security, especially when the structure holds function
pointers.

On x86_64, with allmodconfig:
Before:
======
   text	   data	    bss	    dec	    hex	filename
  81096	  13899	    192	  95187	  173d3	drivers/iommu/iommufd/pages.o

After:
=====
   text	   data	    bss	    dec	    hex	filename
  81160	  13835	    192	  95187	  173d3	drivers/iommu/iommufd/pages.o

Link: https://patch.msgid.link/r/67e9126bbffa1d5c05124773a8dd4a3493be77ac.1772139886.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-02 20:24:52 -04:00
Leon Romanovsky
8c5f9645c3 iommufd: Add dma_buf_pin()
IOMMUFD relies on a private protocol with VFIO, and this always operated
in pinned mode.

Now that VFIO can support pinned importers, update IOMMUFD to invoke the
normal dma-buf flow to request a pin.

This isn't enough to allow IOMMUFD to work with other exporters; it still
needs a way to get the physical address list, which is another series.

IOMMUFD supports the defined revoke semantics. It immediately stops and
fences access to the memory inside its invalidate_mappings() callback,
and it currently doesn't use scatterlists, so it doesn't call map/unmap at
all.

It is expected that a future revision can synchronously call unmap from
the move_notify callback as well.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-8-463d956bd527@nvidia.com
2026-02-23 19:51:41 +01:00
Maxime Ripard
8b85987d3c Merge drm/drm-next into drm-misc-next
Let's merge 7.0-rc1 to start the new drm-misc-next window

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-02-23 11:48:20 +01:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".
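
A user-space analogue of the typed-allocation idea is sketched below. The example_alloc_* macros are hypothetical and are not the kernel helpers: they take no GFP flags, the array form uses calloc() and so zeroes the memory, and they accept only a type name, not the *VAR form.

```c
#include <stdlib.h>

/* Hypothetical type used only to exercise the macros. */
struct example_point {
	int x;
	int y;
};

/* Single object: the result is already "TYPE *", not "void *". */
#define example_alloc_obj(TYPE) ((TYPE *)malloc(sizeof(TYPE)))

/* Array of COUNT objects; calloc() checks the size multiplication. */
#define example_alloc_objs(TYPE, COUNT) ((TYPE *)calloc((COUNT), sizeof(TYPE)))
```

Because the macro result carries the element type, assigning it to a pointer of a different struct type draws a compiler diagnostic, which a plain "void *" return would not.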

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
cebcffe666 VFIO updates for v7.0-rc1
- Update outdated mdev comment referencing the renamed
    mdev_type_add() function. (Julia Lawall)
 
  - Introduce selftest support for IOMMU mapping of PCI MMIO BARs.
    (Alex Mastro)
 
  - Relax selftest assertion relative to differences in huge page
    handling between legacy (v1) TYPE1 IOMMU mapping behavior and
    the compatibility mode supported by IOMMUFD. (David Matlack)
 
  - Reintroduce memory poison handling support for non-struct-page-
    backed memory in the nvgrace-gpu variant driver. (Ankit Agrawal)
 
  - Replace dma_buf_phys_vec with phys_vec to avoid duplicate
    structure and semantics. (Leon Romanovsky)
 
  - Add missing upstream bridge locking across PCI function reset,
    resolving an assertion failure when secondary bus reset is used
    to provide that reset. (Anthony Pighin)
 
  - Fixes to hisi_acc vfio-pci variant driver to resolve corner case
    issues related to resets, repeated migration, and error injection
    scenarios. (Longfang Liu, Weili Qian)
 
  - Restrict vfio selftest builds to arm64 and x86_64, resolving
    compiler warnings on 32-bit archs. (Ted Logan)
 
  - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
    stepped up. (Ioana Ciornei)

Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:
 "A small cycle with the bulk in selftests and reintroducing poison
  handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and
  some dmabuf structure consolidation.

   - Update outdated mdev comment referencing the renamed
     mdev_type_add() function (Julia Lawall)

   - Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex
     Mastro)

   - Relax selftest assertion relative to differences in huge page
     handling between legacy (v1) TYPE1 IOMMU mapping behavior and the
     compatibility mode supported by IOMMUFD (David Matlack)

   - Reintroduce memory poison handling support for non-struct-page-
     backed memory in the nvgrace-gpu variant driver (Ankit Agrawal)

   - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure
     and semantics (Leon Romanovsky)

   - Add missing upstream bridge locking across PCI function reset,
     resolving an assertion failure when secondary bus reset is used to
     provide that reset (Anthony Pighin)

   - Fixes to hisi_acc vfio-pci variant driver to resolve corner case
     issues related to resets, repeated migration, and error injection
     scenarios (Longfang Liu, Weili Qian)

   - Restrict vfio selftest builds to arm64 and x86_64, resolving
     compiler warnings on 32-bit archs (Ted Logan)

   - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
     stepped up (Ioana Ciornei)"

* tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/fsl-mc: add myself as maintainer
  vfio: selftests: only build tests on arm64 and x86_64
  hisi_acc_vfio_pci: fix the queue parameter anomaly issue
  hisi_acc_vfio_pci: resolve duplicate migration states
  hisi_acc_vfio_pci: update status after RAS error
  hisi_acc_vfio_pci: fix VF reset timeout issue
  vfio/pci: Lock upstream bridge for vfio_pci_core_disable()
  types: reuse common phys_vec type instead of DMABUF open‑coded variant
  vfio/nvgrace-gpu: register device memory for poison handling
  mm: add stubs for PFNMAP memory failure registration functions
  vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
  vfio: selftests: Add vfio_dma_mapping_mmio_test
  vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
  vfio: selftests: Centralize IOMMU mode name definitions
  vfio/mdev: update outdated comment
2026-02-12 15:52:39 -08:00
Thomas Zimmermann
2bebc88d5e Merge drm/drm-next into drm-misc-next
Backmerging to get bug fixes from v6.19-rc7.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2026-02-05 10:33:06 +01:00
Deepanshu Kartikey
2724138b2f iommufd: Initialize batch->kind in batch_clear()
KMSAN reported an uninitialized value when batch_add_pfn_num() reads
batch->kind. This occurs because batch_clear() does not initialize the
kind field.

When batch_add_pfn_num() checks "if (batch->kind != kind)", it reads this
uninitialized value, triggering KMSAN warnings. However the algorithm is
fine with any value in kind at this point as the batch is always empty and
it always corrects kind if wrong.

Initialize batch->kind to zero in batch_clear() to silence the KMSAN
warning.
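
A minimal sketch of why any starting value of kind is harmless on an empty batch, and what the fix changes. The struct layout and function names are hypothetical simplifications, not the kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical, reduced batch: only the fields needed for the sketch. */
struct example_batch {
	unsigned int end;	/* number of entries in the batch */
	int kind;		/* kind shared by all entries */
};

static void example_batch_clear(struct example_batch *b)
{
	b->end = 0;
	b->kind = 0;	/* the fix: give the field a defined value */
}

/* Returns false when a non-empty batch holds a different kind. */
static bool example_batch_add(struct example_batch *b, int kind)
{
	if (b->kind != kind) {
		if (b->end)
			return false;
		b->kind = kind;	/* empty batch: just adopt the new kind */
	}
	b->end++;
	return true;
}
```

Without the assignment in example_batch_clear(), the comparison in example_batch_add() would read an indeterminate value, even though the outcome for an empty batch is the same either way.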

Link: https://patch.msgid.link/r/20260124132214.624041-1-kartikey406@gmail.com
Reported-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=df28076a30d726933015
Fixes: f394576eb1 ("iommufd: PFN handling for iopt_pages")
Tested-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com
Reported-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-01-28 12:49:17 -04:00
Leon Romanovsky
95308225e5 dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
Along with renaming the .move_notify() callback, rename the corresponding
dma-buf core function. This makes the expected behavior clear to exporters
calling this function.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-2-f98fca917e96@nvidia.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2026-01-27 10:44:30 +01:00
Leon Romanovsky
ef246da8e6 dma-buf: Rename .move_notify() callback to a clearer identifier
Rename the .move_notify() callback to .invalidate_mappings() to make its
purpose explicit and highlight that it is responsible for invalidating
existing mappings.

Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-1-f98fca917e96@nvidia.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2026-01-27 10:43:55 +01:00
Alex Williamson
fab06e956f * Reuse common phys_vec, phase out dma_buf_phys_vec
Signed-off-by: Alex Williamson <alex@shazbot.org>

Merge tag 'common_phys_vec_via_vfio' into v6.20/vfio/next

 * Reuse common phys_vec, phase out dma_buf_phys_vec

Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:25:24 -07:00
Leon Romanovsky
b703b31ea8 types: reuse common phys_vec type instead of DMABUF open‑coded variant
After commit fcf463b92a ("types: move phys_vec definition to common header"),
we can use the shared phys_vec type instead of the DMABUF‑specific
dma_buf_phys_vec, which duplicated the same structure and semantics.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260107-convert-to-pvec-v1-1-6e3ab8079708@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:13:29 -07:00
Jason Gunthorpe
7adfd68274 iommufd/selftest: Prevent module/builtin conflicts in kconfig
The selftest now depends on the AMDv1 page table, however the selftest
kconfig itself is just a sub-option of the main IOMMUFD module kconfig.

This means it cannot be modular and so kconfig allowed a modular
IOMMU_PT_AMDV1 with a built in IOMMUFD. This causes link failures:

   ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0':
   selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init'
   ld: vmlinux.o: in function `BSWAP_SHUFB_CTL':
   sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty'
   ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages'
   ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages'
   ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys'

Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible.

Fixes: e93d5945ed ("iommufd: Change the selftest to use iommupt instead of xarray")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:40:58 +01:00
Jason Gunthorpe
faa37ff3bf iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER
The test doesn't build without it, dma-buf.h does not provide stub
functions if it is not enabled. Compilation can fail with:

 ERROR:root:ld: vmlinux.o: in function `iommufd_test':
 (.text+0x3b1cdd): undefined reference to `dma_buf_get'
 ld: (.text+0x3b1d08): undefined reference to `dma_buf_put'
 ld: (.text+0x3b2105): undefined reference to `dma_buf_export'
 ld: (.text+0x3b211f): undefined reference to `dma_buf_fd'
 ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify'

Add the missing select.

Fixes: d2041f1f11 ("iommufd/selftest: Add some tests for the dmabuf flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:40:58 +01:00
Jason Gunthorpe
e6a973af11 iommufd/selftest: Check for overflow in IOMMU_TEST_OP_ADD_RESERVED
syzkaller found it could overflow math in the test infrastructure and
cause a WARN_ON by corrupting the reserved interval tree. This only
affects test kernels with CONFIG_IOMMUFD_TEST.

Validate the user input length in the test ioctl.
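
One way such a check can look is sketched below. The function name and the exact conditions are assumptions; the commit message does not spell out the check that was added.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a user-supplied (start, length) pair and compute the inclusive
 * last address without wrapping. Returns false for empty or overflowing
 * ranges and leaves *last untouched in that case.
 */
static bool example_range_is_valid(uint64_t start, uint64_t length,
				   uint64_t *last)
{
	if (length == 0)
		return false;
	if (length - 1 > UINT64_MAX - start)
		return false;	/* start + length - 1 would wrap */
	*last = start + (length - 1);
	return true;
}
```

Rejecting the range before computing the end address keeps a wrapped value from ever reaching the interval tree.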

Fixes: f4b20bb34c ("iommufd: Add kernel support for testing iommufd")
Link: https://patch.msgid.link/r/0-v1-cd99f6049ba5+51-iommufd_syz_add_resv_jgg@nvidia.com
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Reported-by: syzbot+57fdb0cf6a0c5d1f15a2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69368129.a70a0220.38f243.008f.GAE@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-16 11:53:40 -04:00
Jason Gunthorpe
b80fab2813 iommufd/selftest: Do not leak the hwpt if IOMMU_TEST_OP_MD_CHECK_MAP fails
If the input validation failed, the function returned without dropping
the hwpt refcount, causing a leak. This triggers a WARN_ON when closing the fd:

  WARNING: drivers/iommu/iommufd/main.c:369 at iommufd_fops_release+0x385/0x430, CPU#1: repro/724

Found by syzkaller.
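
The usual shape of such a fix is a single exit path that always drops the reference. The sketch below uses a hypothetical object with a plain integer refcount and an invented validation rule; it is not the selftest code.

```c
#include <errno.h>

/* Hypothetical refcounted object standing in for the hwpt. */
struct example_obj {
	int refcount;
};

/* Takes a reference, validates the input, and always drops the reference. */
static int example_check_map(struct example_obj *hwpt, unsigned long length)
{
	int rc = 0;

	hwpt->refcount++;		/* reference taken for this call */

	if (length == 0) {
		rc = -EINVAL;
		goto out_put;		/* an early "return" here would leak */
	}

	/* ... the actual checks would go here ... */

out_put:
	hwpt->refcount--;
	return rc;
}
```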

Fixes: e93d5945ed ("iommufd: Change the selftest to use iommupt instead of xarray")
Link: https://patch.msgid.link/r/0-v1-c8ed57e24380+44ae-iommufd_selftest_hwpt_leak_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/r/aTJGMaqwQK0ASj0G@ly-workstation
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-15 20:34:41 -04:00
Arnd Bergmann
69dc538a4f iommufd: Fix building without dmabuf
When DMABUF is disabled, trying to use it causes a link failure:

x86_64-linux-ld: drivers/iommu/iommufd/io_pagetable.o: in function `iopt_map_file_pages':
io_pagetable.c:(.text+0x1735): undefined reference to `dma_buf_get'
x86_64-linux-ld: io_pagetable.c:(.text+0x1775): undefined reference to `dma_buf_put'

Fixes: 44ebaa1744 ("iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE")
Link: https://patch.msgid.link/r/20251204100333.1034767-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-15 20:34:41 -04:00
Linus Torvalds
056daec292 iommufd 6.19 pull request
- Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO. This
   is the first step to broader DMABUF support in iommufd, right now it
   only works with VFIO. This closes the last functional gap with classic
   VFIO type 1 to safely support PCI peer to peer DMA by mapping the VFIO
   device's MMIO into the IOMMU.
 
 - Relax SMMUv3 restrictions on nesting domains to better support qemu's
   sequence to have an identity mapping before the vSID is established.

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "This is a pretty consequential cycle for iommufd, though this pull is
  not too big. It is based on a shared branch with VFIO that introduces
  VFIO_DEVICE_FEATURE_DMA_BUF a DMABUF exporter for VFIO device's MMIO
  PCI BARs. This was a large multiple series journey over the last year
  and a half.

  Based on that work IOMMUFD gains support for VFIO DMABUF's in its
  existing IOMMU_IOAS_MAP_FILE, which closes the last major gap to
  support PCI peer to peer transfers within VMs.

  In Joerg's iommu tree we have the "generic page table" work which aims
  to consolidate all the duplicated page table code in every iommu
  driver into a single algorithm. This will be used by iommufd to
  implement unique page table operations to start adding new features
  and improve performance.

  In here:

   - Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO.
     This is the first step to broader DMABUF support in iommufd, right
     now it only works with VFIO. This closes the last functional gap
     with classic VFIO type 1 to safely support PCI peer to peer DMA by
     mapping the VFIO device's MMIO into the IOMMU.

   - Relax SMMUv3 restrictions on nesting domains to better support
     qemu's sequence to have an identity mapping before the vSID is
     established"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases
  iommufd/selftest: Add some tests for the dmabuf flow
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Allow MMIO pages in a batch
  iommufd: Allow a DMABUF to be revoked
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Add DMABUF to iopt_pages
  vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
2025-12-04 18:50:11 -08:00
Linus Torvalds
ce5cfb0fa2 IOMMU Updates for Linux v6.19
Including:
 
 	- Introduction of the generic IO page-table framework with support for
 	  Intel and AMD IOMMU formats from Jason. This has good potential for
 	  unifying more IO page-table implementations and making future
 	  enhancements more easy. But this also needed quite some fixes during
 	  development. All known issues have been fixed, but my feeling is that
 	  there is a higher potential than usual that more might be needed.
 
 	- Intel VT-d updates:
 	  - Use right invalidation hint in qi_desc_iotlb().
 
 	  - Reduce the scope of INTEL_IOMMU_FLOPPY_WA.
 
 	- ARM-SMMU updates:
 	  - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
 	    and a new clock for the TBU.
 
 	  - Fix error handling if level 1 CD table allocation fails.
 
 	  - Permit more than the architectural maximum number of SMRs for funky
 	    Qualcomm mis-implementations of SMMUv2.
 
 	- Mediatek driver:
 	  - MT8189 iommu support.
 
 	- Move ARM IO-pgtable selftests to kunit.
 
 	- Device leak fixes for a couple of drivers.
 
 	- Random smaller fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmktjWgACgkQK/BELZcB
 GuMQgA//YljqMVbMmYpFQF/9nXsyTvkpzaaVqj3CscjfnLJQ7YNIrWHLMw4xcZP7
 c2zsSDMPc5LAe2nkyNPvMeFufsbDOYE1CcbqFfZhNqwKGWIVs7J1j73aqJXKfXB4
 RaVv5GA1NFeeIRRKPx/ZpD9W1WiR9PiUQXBxNTAJLzbBNhcTJzo+3YuSpJ6ckOak
 aG67Aox6Dwq/u0gHy8gWnuA2XL6Eit+nQbod7TfchHoRu+TUdbv8qWL+sUChj+u/
 IlyBt1YL/do3bJC0G1G2E81J1KGPU/OZRfB34STQKlopEdXX17ax3b2X0bt3Hz/h
 9Yk3BLDtGMBQ0aVZzAZcOLLlBlEpPiMKBVuJQj29kK9KJSYfmr2iolOK0cGyt+kv
 DfQ8+nv6HRFMbwjetfmhGYf6WemPcggvX44Hm/rgR2qbN3P+Q8/klyyH8MLuQeqO
 ttoQIwDd9DYKJelmWzbLgpb2vGE3O0EAFhiTcCKOk643PaudfengWYKZpJVIIqtF
 nEUEpk17HlpgFkYrtmIE7CMONqUGaQYO84R3j7DXcYXYAvqQJkhR3uJejlWQeh8x
 uMc9y04jpg3p5vC5c7LfkQ3at3p/jf7jzz4GuNoZP5bdVIUkwXXolir0ct3/oby3
 /bXXNA1pSaRuUADm7pYBBhKYAFKC7vCSa8LVaDR3CB95aNZnvS0=
 =KG7j
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:

 - Introduction of the generic IO page-table framework with support for
   Intel and AMD IOMMU formats from Jason.

   This has good potential for unifying more IO page-table
   implementations and making future enhancements more easy. But this
   also needed quite some fixes during development. All known issues
   have been fixed, but my feeling is that there is a higher potential
   than usual that more might be needed.

 - Intel VT-d updates:
    - Use right invalidation hint in qi_desc_iotlb()
    - Reduce the scope of INTEL_IOMMU_FLOPPY_WA

 - ARM-SMMU updates:
    - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
      and a new clock for the TBU.
    - Fix error handling if level 1 CD table allocation fails.
    - Permit more than the architectural maximum number of SMRs for
      funky Qualcomm mis-implementations of SMMUv2.

 - Mediatek driver:
    - MT8189 iommu support

 - Move ARM IO-pgtable selftests to kunit

 - Device leak fixes for a couple of drivers

 - Random smaller fixes and improvements

* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
  iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
  iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
  powerpc/pseries/svm: Make mem_encrypt.h self contained
  genpt: Make GENERIC_PT invisible
  iommupt: Avoid a compiler bug with sw_bit
  iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
  iommupt: Fix unlikely flows in increase_top()
  iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
  MAINTAINERS: Update my email address
  iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
  dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
  iommu/vt-d: Restore previous domain::aperture_end calculation
  iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
  iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
  iommu/tegra: fix device leak on probe_device()
  iommu/sun50i: fix device leak on of_xlate()
  iommu/omap: simplify probe_device() error handling
  iommu/omap: fix device leaks on probe_device()
  iommu/mediatek-v1: add missing larb count sanity check
  iommu/mediatek-v1: fix device leaks on probe()
  ...
2025-12-04 18:05:06 -08:00
Joerg Roedel
0d081b1694 Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'mediatek', 'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2025-11-28 08:44:21 +01:00
Jason Gunthorpe
5185c4d8a5 Merge branch 'iommufd_dmabuf' into k.o-iommufd/for-next
Jason Gunthorpe says:

====================
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:

   https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/

The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and if the underlying alignment
requirements are met it will be placed in the iommu_domain.

Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.

The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.

Instead iommufd relies on revocable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. Eg when QEMU goes to issue FLR it should
do the map/unmap to iommufd.

Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.

The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().

Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().

I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.

The latest series for interconnect negotiation to exchange a phys_addr is:
 https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com

And the discussion for design of revoke is here:
 https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
====================

Based on a shared branch with vfio.

* iommufd_dmabuf:
  iommufd/selftest: Add some tests for the dmabuf flow
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Allow MMIO pages in a batch
  iommufd: Allow a DMABUF to be revoked
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Add DMABUF to iopt_pages
  vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
  vfio/nvgrace: Support get_dmabuf_phys
  vfio/pci: Add dma-buf export support for MMIO regions
  vfio/pci: Enable peer-to-peer DMA transactions by default
  vfio/pci: Share the core device pointer while invoking feature functions
  vfio: Export vfio device get and put registration helpers
  dma-buf: provide phys_vec to scatter-gather mapping routine
  PCI/P2PDMA: Document DMABUF model
  PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
  PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
  PCI/P2PDMA: Simplify bus address mapping API
  PCI/P2PDMA: Separate the mmap() support from the core logic

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-26 14:04:10 -04:00
Jason Gunthorpe
d2041f1f11 iommufd/selftest: Add some tests for the dmabuf flow
Basic tests of establishing a dmabuf and revoking it. The selftest kernel
side provides a basic small dmabuf for this testing.

Link: https://patch.msgid.link/r/9-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:16 -04:00
Jason Gunthorpe
44ebaa1744 iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
Finally call iopt_alloc_dmabuf_pages() if the user passed in a DMABUF
through IOMMU_IOAS_MAP_FILE. This makes the feature visible to userspace.

Link: https://patch.msgid.link/r/8-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:16 -04:00
Jason Gunthorpe
217725f0b2 iommufd: Have iopt_map_file_pages convert the fd to a file
Since dmabuf only has APIs that work on an int fd and not a struct file *,
pass the fd deeper into the call chain so we can use the dmabuf APIs as
is.

Link: https://patch.msgid.link/r/7-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:16 -04:00
Jason Gunthorpe
74014a4b55 iommufd: Have pfn_reader process DMABUF iopt_pages
Make another sub implementation of pfn_reader for DMABUF. This version
will fill the batch using the struct phys_vec recorded during the
attachment.

Link: https://patch.msgid.link/r/6-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:16 -04:00
Jason Gunthorpe
3114c67440 iommufd: Allow MMIO pages in a batch
Addresses intended for MMIO should be propagated through to the iommu with
the IOMMU_MMIO flag set.

Keep track in the batch of whether all the pfns are cacheable or MMIO, and
flush the batch out if that ever needs to change. Switch to IOMMU_MMIO if
the batch is MMIO when mapping the iommu.
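
A minimal user-space sketch of the batching rule described above, not the
iommufd code itself. All names (demo_batch, demo_batch_add, demo_batch_flush)
and the batch size of 8 are hypothetical; the point is only that a batch
holds one kind of pfn and is flushed before the kind changes:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH_MAX 8 /* hypothetical batch capacity */

enum demo_kind { DEMO_CACHEABLE, DEMO_MMIO };

struct demo_batch {
	uint64_t pfns[DEMO_BATCH_MAX];
	size_t count;
	enum demo_kind kind;   /* kind shared by every pfn in the batch */
	size_t flushes;        /* total flushes, for illustration */
	size_t mmio_flushes;   /* flushes that would have used IOMMU_MMIO */
};

/* Hand the accumulated pfns to the mapper and empty the batch. */
static void demo_batch_flush(struct demo_batch *b)
{
	if (!b->count)
		return;
	/* A real mapper would add the MMIO prot flag here for MMIO batches. */
	if (b->kind == DEMO_MMIO)
		b->mmio_flushes++;
	b->flushes++;
	b->count = 0;
}

/* Add one pfn, flushing first if the kind changes or the batch is full. */
static void demo_batch_add(struct demo_batch *b, uint64_t pfn,
			   enum demo_kind kind)
{
	if (b->count && (b->kind != kind || b->count == DEMO_BATCH_MAX))
		demo_batch_flush(b);
	b->kind = kind;
	b->pfns[b->count++] = pfn;
}
```

Flushing on a kind change keeps every mapping call homogeneous, so a single
prot value applies to the whole batch.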

Link: https://patch.msgid.link/r/5-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:15 -04:00
Jason Gunthorpe
fc7063abd9 iommufd: Allow a DMABUF to be revoked
When connected to VFIO, the only DMABUF exporter that is accepted, the
move_notify callback will be made when VFIO wants to remove access to the
MMIO. This is being called revoke.

Wire up revoke to go through all the iommu_domain's that have mapped the
DMABUF and unmap them.

The locking here is unpleasant. Since the existing locking scheme was
designed to come from the iopt through the area to the pages, we cannot use
pages as the starting point for the locking. There is no way to obtain the
domains_rwsem before obtaining the pages mutex to reliably use the
existing domains_itree.

Solve this problem by adding a new tracking structure just for DMABUF
revoke. Record a linked list of areas and domains inside the pages
mutex. Clean the entries on the list during revoke. The map/unmaps are now
all done under a pages mutex while updating the tracking linked list so
nothing can get out of sync. Only one lock is required for revoke
processing.

Link: https://patch.msgid.link/r/4-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:15 -04:00
Jason Gunthorpe
71e2409a0c iommufd: Do not map/unmap revoked DMABUFs
Once a DMABUF is revoked the domain will be unmapped under the pages
mutex. Double unmapping will trigger a WARN, and mapping while revoked
will fail.

Check for revoked DMABUFs along all the map and unmap paths to resolve
this. Ensure that map/unmap is always done under the pages mutex so it is
synchronized with the revoke notifier.

If a revoke happens between allocating the iopt_pages and the population
to a domain then the population will succeed, and leave things unmapped as
though revoke had happened immediately after.

Currently there is no way to repopulate the domains. Userspace is expected
to know when it is about to do something that would trigger revoke (e.g. a
FLR); it should remove the DMABUF mappings beforehand and put them back
afterwards. The revoke is only to protect the kernel from misbehaving
userspace.
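
The revoked checks can be pictured with the following user-space sketch. It
is an illustration under assumptions, not the kernel implementation: the
names (demo_dmabuf_pages, demo_populate, demo_unpopulate, demo_revoke) are
hypothetical, and all three helpers are assumed to be called with one lock
held, standing in for the pages mutex:

```c
#include <stdbool.h>

struct demo_dmabuf_pages {
	bool revoked;   /* set once the exporter has revoked access */
	int mapped;     /* number of domain mappings currently installed */
};

/* Populate a domain: succeeds when revoked, but leaves things unmapped. */
static int demo_populate(struct demo_dmabuf_pages *p)
{
	if (p->revoked)
		return 0;
	p->mapped++;
	return 0;
}

/* Unpopulate a domain: skipped when revoke already did the unmap. */
static void demo_unpopulate(struct demo_dmabuf_pages *p)
{
	if (p->revoked)
		return; /* avoids the double unmap */
	if (p->mapped)
		p->mapped--;
}

/* Revoke: unmap everything once and remember that it happened. */
static void demo_revoke(struct demo_dmabuf_pages *p)
{
	p->mapped = 0;
	p->revoked = true;
}
```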

Link: https://patch.msgid.link/r/3-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:15 -04:00
Jason Gunthorpe
71db84a092 iommufd: Add DMABUF to iopt_pages
Add IOPT_ADDRESS_DMABUF to the iopt_pages and the basic infrastructure to
create an iopt_pages from a struct dma_buf *.

DMABUF pages are not supported for accesses, and for now can only be used
with the VFIO DMABUF exporter.

The overall flow will be similar to memfd where the user can pass in a
DMABUF file descriptor to IOMMU_IOAS_MAP_FILE and create an area and
pages. Like other areas it can be copied and otherwise manipulated, though
there is little point in doing so.

There is no pinned page accounting done for DMABUF maps.

The DMABUF attachment exists so long as the dmabuf is mapped into an IOAS,
even if the IOAS is not mapped to any domains.

Link: https://patch.msgid.link/r/2-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:15 -04:00
Gustavo A. R. Silva
b07bf253ef iommufd/iommufd_private.h: Avoid -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration to the end of the corresponding
structure. Notice that struct iommufd_vevent is a flexible
structure, that is, a structure that contains a flexible-array
member.

Fix the following warning:

drivers/iommu/iommufd/iommufd_private.h:621:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
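
As a generic illustration of the layout rule (the struct names here are
hypothetical and are not the iommufd definitions), a structure ending in a
flexible-array member should itself be the last member of any structure that
embeds it:

```c
#include <stddef.h>

/* A "flexible structure": its last member is a flexible array. */
struct demo_flex {
	unsigned int len;
	unsigned char data[];
};

/*
 * Layout the warning complains about, shown only as a comment:
 *
 *   struct demo_bad {
 *           struct demo_flex hdr;   // flexible structure not at the end
 *           int tail;
 *   };
 */

/* Fixed layout: the flexible structure is moved to the end. */
struct demo_outer {
	int tail;
	struct demo_flex hdr;
};
```

Embedding a flexible structure inside another structure is a GNU extension,
so this sketch assumes GCC or Clang.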

Link: https://patch.msgid.link/r/aRHOAwpATIE0oajj@kspp
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Fixes: e36ba5ab80 ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-21 15:38:27 -04:00
Gustavo A. R. Silva
ac84ff4533 iommufd/driver: Fix counter initialization for counted_by annotation
One of the requirements for counted_by annotations is that the counter
member must be initialized before the first reference to the
flexible-array member.

Move the vevent->data_len = data_len; initialization to before the
first access to flexible array vevent->event_data.
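
A small user-space sketch of the ordering rule, assuming a compiler that may
or may not support the counted_by attribute. The COUNTED_BY macro,
demo_event and demo_event_alloc are hypothetical names for illustration, not
the iommufd code:

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: expand to nothing where counted_by is unavailable. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct demo_event {
	size_t data_len;
	unsigned char event_data[] COUNTED_BY(data_len);
};

/* Allocate an event and copy data_len bytes of payload into it. */
static struct demo_event *demo_event_alloc(const void *data, size_t data_len)
{
	struct demo_event *ev = malloc(sizeof(*ev) + data_len);

	if (!ev)
		return NULL;
	/* Set the counter before the first access to the flexible array. */
	ev->data_len = data_len;
	memcpy(ev->event_data, data, data_len);
	return ev;
}
```

With bounds checking enabled, an access to event_data made while data_len is
still uninitialized can be checked against a garbage bound, which is why the
counter assignment comes first.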

Link: https://patch.msgid.link/r/aRL7ZFFqM5bRTd2D@kspp
Cc: stable@vger.kernel.org
Fixes: e8e1ef9b77 ("iommufd/viommu: Add iommufd_viommu_report_event helper")
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-21 15:38:27 -04:00
Jason Gunthorpe
afb47765f9 iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd returns ENOENT when attempting to unmap a range that is already
empty, while vfio type1 returns success. Fix vfio_compat to match.
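
The behavioural difference amounts to a small error translation in the
compatibility layer. A sketch, with hypothetical function names that stand in
for the native and compat unmap paths:

```c
#include <errno.h>

/* Stand-in for the native unmap: an already-empty range is -ENOENT. */
static int demo_native_unmap(int range_is_empty)
{
	return range_is_empty ? -ENOENT : 0;
}

/* Compat wrapper: type1 semantics treat an empty range as success. */
static int demo_compat_unmap(int range_is_empty)
{
	int rc = demo_native_unmap(range_is_empty);

	if (rc == -ENOENT)
		return 0;
	return rc;
}
```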

Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Reported-by: Alex Mastro <amastro@fb.com>
Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05 15:11:26 -04:00
Jason Gunthorpe
e93d5945ed iommufd: Change the selftest to use iommupt instead of xarray
The iommufd self test uses an xarray to store the pfns and their orders to
emulate a page table. Make it act more like a real iommu driver by
replacing the xarray with an iommupt based page table. The new AMDv1 mock
format behaves similarly to the xarray.

Add set_dirty() as an iommu_pt operation to allow the test suite to
simulate HW dirty.

Userspace can select between several formats including the normal AMDv1
format and a special MOCK_IOMMUPT_HUGE variation for testing huge page
dirty tracking. To make the dirty tracking test work the page table must
only store exactly 2M huge pages otherwise the logic the test uses
fails. They cannot be broken up or combined.

Aside from aligning the selftest with a real page table implementation,
this helps test the iommupt code itself.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05 09:07:13 +01:00
Nicolin Chen
fd714986e4 iommu: Pass in old domain to attach_dev callback functions
The IOMMU core attaches each device to a default domain on probe(). Then,
every new "attach" operation fundamentally means two things:
 - detach from its currently attached (old) domain
 - attach to a given new domain

Modern IOMMU drivers following this pattern usually want to clean up the
things related to the old domain, so they call iommu_get_domain_for_dev()
to fetch the old domain.

Pass in the old domain pointer from the core to drivers, aligning with the
set_dev_pasid op that does so already.

Ensure all low-level attach functions in the core can forward the correct
old domain pointer. Thus, rework those functions as well.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-10-27 13:55:35 +01:00
Jason Gunthorpe
cb30dfa75d iommufd: Don't overflow during division for dirty tracking
If pgshift is 63 then BITS_PER_TYPE(*bitmap->bitmap) * pgsize will overflow
to 0 and this triggers divide by 0.

In this case the index should just be 0, so reorganize things to divide
by shift and avoid hitting any overflows.
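
The arithmetic can be reproduced in user space. This is a sketch of the idea
only, assuming 64-bit bitmap words; bits_times_pgsize() and
word_index_by_shift() are hypothetical helpers, not the kernel functions:

```c
#include <stdint.h>

#define DEMO_BITS_PER_WORD 64u /* assumed width of one bitmap word */

/* Divisor used by the overflow-prone form: wraps to 0 when pgshift is 63. */
static uint64_t bits_times_pgsize(unsigned int pgshift)
{
	return (uint64_t)DEMO_BITS_PER_WORD << pgshift;
}

/* Shift-based form: index of the bitmap word covering iova, no division. */
static uint64_t word_index_by_shift(uint64_t iova, unsigned int pgshift)
{
	/* 64 bits per word is 2^6, so one word spans 2^(pgshift + 6) bytes. */
	unsigned int shift = pgshift + 6;

	/* A span wider than the address space means everything is in word 0. */
	return shift >= 64 ? 0 : iova >> shift;
}
```

For pgshift 12 both forms agree (one word covers 2^18 bytes); for pgshift 63
the product is 0 and must not be used as a divisor, while the shift form
simply yields index 0.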

Link: https://patch.msgid.link/r/0-v1-663679b57226+172-iommufd_dirty_div0_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 58ccf0190d ("vfio: Add an IOVA bitmap support")
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+093a8a8b859472e6c257@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-10-20 19:58:37 -03:00
Guixin Liu
2a918911ed iommufd: Register iommufd mock devices with fwspec
Since the bus ops were retired the iommu subsystem changed to using fwspec
to match the iommu driver to the iommu device. If a device has a NULL
fwspec then it is matched to the first iommu driver with a NULL fwspec,
effectively disabling support for systems with more than one non-fwspec
iommu driver.

Thus, if the iommufd selftests are run on an x86 system that registers a
non-fwspec iommu driver they fail to bind their mock devices to the mock
iommu driver.

Fix this by allocating a software fwnode for mock iommu driver's
iommu_device, and set it to the device which mock iommu driver created.

This is done by adding a new helper iommu_mock_device_add() which abuses
the internals of the fwspec system to establish a fwspec before the device
is added and is careful not to leak it. A matching dummy fwspec is
automatically added to the mock iommu driver.

Test by "make -C tools/testing/selftests TARGETS=iommu run_tests":
PASSED: 229 / 229 tests passed.

In addition, this issue can also be found on AMD platforms, and the fix
was also tested on an AMD machine.

Link: https://patch.msgid.link/r/20250925054730.3877-1-kanie@linux.alibaba.com
Fixes: 17de3f5fdd ("iommu: Retire bus ops")
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-30 09:54:12 -03:00
Jason Gunthorpe
53d0584eeb iommufd: WARN if an object is aborted with an elevated refcount
If something holds a refcount then it is at risk of UAFing. For abort
paths we expect the caller to never share the object with a parallel
thread and to clean up any refcounts it obtained on its own.

Add the missing dec inside iommufd_hwpt_paging_alloc() during error unwind
by making iommufd_hw_pagetable_attach/detach() proper pairs.

Link: https://patch.msgid.link/r/2-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19 10:34:49 -03:00
Jason Gunthorpe
4e034bf045 iommufd: Fix race during abort for file descriptors
fput() doesn't actually call file_operations release() synchronously, it
puts the file on a work queue and it will be released eventually.

This is normally fine, except for iommufd the file and the iommufd_object
are tied together. The file has the object as its private_data and holds
a users refcount, while the object is expected to remain alive as long as
the file is.

When the allocation of a new object aborts before installing the file it
will fput() the file and then go on to immediately kfree() the obj. This
causes a UAF once the workqueue completes the fput() and tries to
decrement the users refcount.

Fix this by putting the core code in charge of the file lifetime, and call
__fput_sync() during abort to ensure that release() is called before
kfree. __fput_sync() is a bit too tricky to open code in all the object
implementations. Instead the objects tell the core code where the file
pointer is and the core will take care of the life cycle.

If the object is successfully allocated then the file will hold a users
refcount and the iommufd_object cannot be destroyed.

It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an
issue because close() is already using a synchronous version of fput().

The UAF looks like this:

    BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
    Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164

    CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full)
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:94 [inline]
     dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
     print_address_description mm/kasan/report.c:378 [inline]
     print_report+0xcd/0x630 mm/kasan/report.c:482
     kasan_report+0xe0/0x110 mm/kasan/report.c:595
     check_region_inline mm/kasan/generic.c:183 [inline]
     kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189
     instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
     atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline]
     __refcount_dec include/linux/refcount.h:455 [inline]
     refcount_dec include/linux/refcount.h:476 [inline]
     iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
     __fput+0x402/0xb70 fs/file_table.c:468
     task_work_run+0x14d/0x240 kernel/task_work.c:227
     resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
     exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
     exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
     syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
     syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
     do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19 10:34:49 -03:00
Jason Gunthorpe
7a425ec75d iommufd: Fix refcounting race during mmap
The owner object of the imap can be destroyed while the imap remains in
the mtree. So access to the imap pointer without holding locks is racy
with destruction.

The imap is safe to access outside the lock once a users refcount is
obtained, the owner object cannot start destruction until users is 0.

Thus the users refcount should not be obtained at the end of
iommufd_fops_mmap() but instead inside the mtree lock held around the
mtree_load(). Move the refcount there and use refcount_inc_not_zero() as
we can have a 0 refcount inside the mtree during destruction races.
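
The pattern is to take the reference while the lookup lock is still held,
and to refuse objects whose count has already reached zero. A user-space
sketch using C11 atomics follows; the names (demo_obj, demo_table,
users_inc_not_zero, demo_lookup_get) are hypothetical, and a spinning
atomic_flag stands in for the mtree lock:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_obj {
	atomic_uint users;      /* 0 means destruction has started */
};

struct demo_table {
	atomic_flag lock;       /* stand-in for the mtree lock */
	struct demo_obj *slot;  /* single-entry "tree" for illustration */
};

/* Increment users unless it is already 0; returns true on success. */
static bool users_inc_not_zero(atomic_uint *users)
{
	unsigned int old = atomic_load(users);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(users, &old, old + 1))
			return true;
	}
	return false;
}

/* Look up the object and take a reference before dropping the lock. */
static struct demo_obj *demo_lookup_get(struct demo_table *t)
{
	struct demo_obj *obj;

	while (atomic_flag_test_and_set(&t->lock))
		; /* spin until the lock is free */
	obj = t->slot;
	if (obj && !users_inc_not_zero(&obj->users))
		obj = NULL; /* object is being destroyed */
	atomic_flag_clear(&t->lock);
	return obj;
}
```

Taking the reference only after the lock is dropped would leave a window in
which the owner can be destroyed, which is the race described above.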

Link: https://patch.msgid.link/r/0-v1-e6faace50971+3cc-iommufd_mmap_fix_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 56e9a0d8e5 ("iommufd: Add mmap interface")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19 10:34:49 -03:00
Akhilesh Patil
8fe8a09204 iommufd: viommu: free memory allocated by kvcalloc() using kvfree()
Use kvfree() instead of kfree() to free pages allocated by kvcalloc()
in iommufd_hw_queue_alloc_phys() to fix potential memory corruption.
Ensure the memory is properly freed, as kvcalloc() may internally use
vmalloc or kmalloc depending on available memory in the system.

Fixes: 2238ddc2b0 ("iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl")
Link: https://patch.msgid.link/r/aJifyVV2PL6WGEs6@bhairav-test.ee.iitb.ac.in
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-18 11:10:40 -03:00
Linus Torvalds
c93529ad4f iommufd 6.17 merge window pull
- IOMMU HW now has features to directly assign HW command queues to a
   guest VM. In this mode the command queue operates on a limited set of
   invalidation commands that are suitable for improving guest invalidation
   performance and easy for the HW to virtualize.
 
   This PR brings the generic infrastructure to allow IOMMU drivers to
   expose such command queues through the iommufd uAPI, mmap the doorbell
   pages, and get the guest physical range for the command queue ring
   itself.
 
 - An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built on
   the new iommufd command queue features. It works with the existing SMMU
   driver support for cmdqv in guest VMs.
 
 - Many precursor cleanups and improvements to support the above cleanly,
   changes to the general ioctl and object helpers, driver support for
   VDEVICE, and mmap pgoff cookie infrastructure.
 
 - Sequence VDEVICE destruction to always happen before VFIO device
   destruction. When using the above type features, and also in future
   confidential compute, the internal virtual device representation becomes
   linked to HW or CC TSM configuration and objects. If a VFIO device is
   removed from iommufd those HW objects should also be cleaned up to
   prevent a sort of UAF. This became important now that we have HW backing
   the VDEVICE.
 
 - Fix one syzkaller found error related to math overflows during iova
   allocation
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaIpl9AAKCRCFwuHvBreF
 YS5tAP9MDIRML5a/2IOhzcsc4LiDkWTMKm2m1wcRYd+iU2aFVQEAjdghINLHrUlx
 HVuIDvNvWIUED/oTAp5kCxQ7PBFN4gU=
 =NmCO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "This broadly brings the assigned HW command queue support to iommufd.
  This feature is used to improve SVA performance in VMs by avoiding
  paravirtualization traps during SVA invalidations.

  Along the way I think some of the core logic is in a much better state
  to support future driver backed features.

  Summary:

   - IOMMU HW now has features to directly assign HW command queues to a
     guest VM. In this mode the command queue operates on a limited set
     of invalidation commands that are suitable for improving guest
     invalidation performance and easy for the HW to virtualize.

     This brings the generic infrastructure to allow IOMMU drivers to
     expose such command queues through the iommufd uAPI, mmap the
     doorbell pages, and get the guest physical range for the command
     queue ring itself.

   - An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built
     on the new iommufd command queue features. It works with the
     existing SMMU driver support for cmdqv in guest VMs.

   - Many precursor cleanups and improvements to support the above
     cleanly, changes to the general ioctl and object helpers, driver
     support for VDEVICE, and mmap pgoff cookie infrastructure.

   - Sequence VDEVICE destruction to always happen before VFIO device
     destruction. When using the above type features, and also in future
     confidential compute, the internal virtual device representation
     becomes linked to HW or CC TSM configuration and objects. If a VFIO
     device is removed from iommufd those HW objects should also be
     cleaned up to prevent a sort of UAF. This became important now that
     we have HW backing the VDEVICE.

   - Fix one syzkaller found error related to math overflows during iova
     allocation"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits)
  iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size
  iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3
  iommufd: Rename some shortterm-related identifiers
  iommufd/selftest: Add coverage for vdevice tombstone
  iommufd/selftest: Explicitly skip tests for inapplicable variant
  iommufd/vdevice: Remove struct device reference from struct vdevice
  iommufd: Destroy vdevice on idevice destroy
  iommufd: Add a pre_destroy() op for objects
  iommufd: Add iommufd_object_tombstone_user() helper
  iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  iommufd/selftest: Test reserved regions near ULONG_MAX
  iommufd: Prevent ALIGN() overflow
  iommu/tegra241-cmdqv: import IOMMUFD module namespace
  iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Do not statically map LVCMDQs
  iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
  ...
2025-07-31 12:43:08 -07:00