This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
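The single and array patterns above can be mimicked in userspace to show why a typed return value helps. This is only an analogy: the demo_alloc_obj() and demo_alloc_objs() macros and struct demo_entry are hypothetical names built on calloc(), not the kernel definitions, and this simplified form accepts only a type name (not *VAR).

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogy only: these are NOT the kernel's kmalloc_obj() or
 * kmalloc_objs() definitions. All names below are hypothetical.
 */

/* Allocate one zeroed object; the result is TYPE *, not void *. */
#define demo_alloc_obj(TYPE)		((TYPE *)calloc(1, sizeof(TYPE)))

/* Allocate COUNT zeroed objects of the same type. */
#define demo_alloc_objs(TYPE, COUNT)	((TYPE *)calloc((COUNT), sizeof(TYPE)))

struct demo_entry {
	uint64_t iova;
	uint64_t paddr;
};

/* Returns 0 on success, -1 if either allocation failed. */
static int demo_typed_alloc(void)
{
	struct demo_entry *one = demo_alloc_obj(struct demo_entry);
	struct demo_entry *many = demo_alloc_objs(struct demo_entry, 16);
	int ret = (one && many) ? 0 : -1;

	/*
	 * Assigning the result to an unrelated pointer type, for example
	 * "int *p = demo_alloc_obj(struct demo_entry);", draws a compiler
	 * diagnostic, which a void * return value would not.
	 */
	free(many);
	free(one);
	return ret;
}
```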
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
__arm_lpae_unmap() returns size_t but was returning -ENOENT (negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE
on 64-bit systems).
This corrupted value propagates through the call chain:
__arm_lpae_unmap() returns -ENOENT as size_t
-> arm_lpae_unmap_pages() returns it
-> __iommu_unmap() adds it to iova address
-> iommu_pgsize() triggers BUG_ON due to corrupted iova
This can cause IOVA address overflow in __iommu_unmap() loop and
trigger BUG_ON in iommu_pgsize() from invalid address alignment.
Fix by returning 0 instead of -ENOENT. The WARN_ON already signals
the error condition, and returning 0 (meaning "nothing unmapped")
is the correct semantic for size_t return type. This matches the
behavior of other io-pgtable implementations (io-pgtable-arm-v7s,
io-pgtable-dart) which return 0 on error conditions.
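The wraparound described above can be reproduced in a few lines of userspace C. This is a sketch, not the kernel code: demo_unmap_buggy(), demo_unmap_fixed() and demo_advance() are hypothetical stand-ins for the unmap helper and for a caller that advances the IOVA by the returned size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the unmap helpers; not kernel code. */
static size_t demo_unmap_buggy(void)
{
	/* A negative errno converted to size_t wraps to a huge value. */
	return (size_t)-ENOENT;
}

static size_t demo_unmap_fixed(void)
{
	return 0;	/* "nothing unmapped" */
}

/* Mimics a caller that advances the IOVA by the returned size. */
static uint64_t demo_advance(uint64_t iova, size_t unmapped)
{
	return iova + unmapped;
}
```

With the fixed variant the caller's IOVA stays where it was; with the buggy variant it moves to an unrelated, misaligned address.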
Fixes: 3318f7b5ce ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()")
Cc: stable@vger.kernel.org
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Clean up the io-pgtable-arm library by moving the selftests out.
Next the tests will be registered with kunit.
This is also useful for factoring out kernel-specific code, so the
library can be compiled as part of the hypervisor object.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
At the moment, if the selftest fails it prints a lot of information
about the page table (size, levels...). This requires access to many
internals, which would have to be exposed in the next patch that moves
the tests out.
Instead, simplify the failure message to print only the fmt; ias, oas
and pgsize_bitmap are already printed for each test.
That is enough to identify the failed case, and the rest can
be deduced from the code.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Commit 33729a5fc0 ("iommu/io-pgtable-arm: Remove split on unmap
behavior") removed the last user of the macro iopte_prot. Remove the
macro definition of iopte_prot as well as three other related
definitions.
Fixes: 33729a5fc0 ("iommu/io-pgtable-arm: Remove split on unmap behavior")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250708211705.1567787-1-danielmentz@google.com
Signed-off-by: Will Deacon <will@kernel.org>
In situations where the mapping/unmapping sequence can be controlled by
userspace, attempting to map over a region that has not yet been
unmapped is an error, but not something that should spam dmesg.
Now that there is a quirk, we can also drop the selftest_running
flag, and use the quirk instead for selftests.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20250519175348.11924-6-robdclark@gmail.com
[will: Rename quirk to IO_PGTABLE_QUIRK_NO_WARN per Robin's suggestion]
Signed-off-by: Will Deacon <will@kernel.org>
In general a 'struct device' is way too large to be put on the kernel
stack. Apparently something just caused it to grow slightly larger,
which pushed the arm_lpae_do_selftests() function over the warning
limit in some configurations:
drivers/iommu/io-pgtable-arm.c:1423:19: error: stack frame size (1032) exceeds limit (1024) in 'arm_lpae_do_selftests' [-Werror,-Wframe-larger-than]
1423 | static int __init arm_lpae_do_selftests(void)
| ^
Change the function to use a dynamically allocated faux_device
instead of the on-stack device structure.
Fixes: ca25ec247a ("iommu/io-pgtable-arm: Remove iommu_dev==NULL special case")
Link: https://lore.kernel.org/all/ab75a444-22a1-47f5-b3c0-253660395b5a@arm.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250423164826.2931382-1-arnd@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Convert most of the places calling get_order() as an argument to the
iommu-pages allocator into order_base_2() or the _sz flavour
instead. These places already have an exact size, there is no particular
reason to use order here.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/19-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The entire allocator API is built around using the kernel virtual address,
it is illegal to pass GFP_HIGHMEM in as a GFP flag. Block it in the common
code. Remove the duplicated checks from drivers.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/14-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that we have a folio under the allocation iommu_free_pages() can know
the order of the original allocation and do the correct thing to free it.
The next patch will rename iommu_free_page() to iommu_free_pages() so we
have naming consistency with iommu_alloc_pages_node().
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add an io-pgtable method to walk the pgtable returning the raw PTEs that
would be traversed for a given iova access.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241210165127.600817-4-robdclark@gmail.com
[will: Removed 'arm_lpae_io_pgtable_walk_data::level' per Mostafa]
Signed-off-by: Will Deacon <will@kernel.org>
We can re-use this basic pgtable walk logic in a few places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241210165127.600817-2-robdclark@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
The newly introduced arm_lpae_concat_mandatory() function reads the
ias/oas fields from the 'io_pgtable_cfg' copy embedded inside the
'arm_lpae_io_pgtable' structure. However, this copy is not set until
later in alloc_io_pgtable_ops() after the alloc() function has been
called.
Use the address sizes passed in the 'io_pgtable_cfg' structure when
deciding whether or not to concatenate the PGD.
Fixes: 4dcac8407f ("iommu/io-pgtable-arm: Fix stage-2 concatenation with 16K")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241215200412.561400-1-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Run selftests with different OAS values instead of hardcoding it to 48
bits.
We always keep OAS >= IAS to make the config valid for stage-2.
This can be further improved, if we split IAS/OAS configuration for
stage-1 and stage-2 (to use input sizes compatible with VA_BITS as
SMMUv3 does, or IAS > OAS which is valid for stage-1).
However, that adds more complexity, and the current change improves
coverage and makes it possible to test all concatenation cases.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241202140604.422235-3-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
At the moment, io-pgtable-arm uses concatenation only if it is
possible at level 0, which misses a case where concatenation is
mandatory at level 1 according to R_SRKBC in Arm spec DDI0487 K.a.
Also, that means concatenation can be used when not mandated,
contradicting the comment on the code. However, these cases can only
happen if the SMMUv3 driver is changed to use ias != oas for stage-2.
This patch re-writes the code to use concatenation only if mandatory,
fixing the missing case for level-1 and granule 16K with PA = 40 bits.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241202140604.422235-2-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Including:
- Core Updates:
- Convert call-sites using iommu_domain_alloc() to more specific
versions and remove function.
- Introduce iommu_paging_domain_alloc_flags().
- Extend support for allocating PASID-capable domains to more
drivers.
- Remove iommu_present().
- Some smaller improvements.
- New IOMMU driver for RISC-V.
- Intel VT-d Updates:
- Add domain_alloc_paging support.
- Enable user space IOPFs in non-PASID and non-svm cases.
- Small code refactoring and cleanups.
- Add domain replacement support for pasid.
- AMD-Vi Updates:
- Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2
page-tables by default.
- Replace custom domain ID allocator with IDA allocator.
- Add ops->release_domain() support.
- Other improvements to device attach and domain allocation code
paths.
- ARM-SMMU Updates:
- SMMUv2:
- Return -EPROBE_DEFER for client devices probing before their SMMU.
- Devicetree binding updates for Qualcomm MMU-500 implementations.
- SMMUv3:
- Minor fixes and cleanup for NVIDIA's virtual command queue driver.
- IO-PGTable:
- Fix indexing of concatenated PGDs and extend selftest coverage.
- Remove unused block-splitting support.
- S390 IOMMU:
- Implement support for blocking domain.
- Mediatek IOMMU:
- Enable 35-bit physical address support for mt8186.
- OMAP IOMMU driver:
- Adapt to recent IOMMU core changes and unbreak driver.
Merge tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core Updates:
- Convert call-sites using iommu_domain_alloc() to more specific
versions and remove function
- Introduce iommu_paging_domain_alloc_flags()
- Extend support for allocating PASID-capable domains to more drivers
- Remove iommu_present()
- Some smaller improvements
New IOMMU driver for RISC-V
Intel VT-d Updates:
- Add domain_alloc_paging support
- Enable user space IOPFs in non-PASID and non-svm cases
- Small code refactoring and cleanups
- Add domain replacement support for pasid
AMD-Vi Updates:
- Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2
page-tables by default
- Replace custom domain ID allocator with IDA allocator
- Add ops->release_domain() support
- Other improvements to device attach and domain allocation code
paths
ARM-SMMU Updates:
- SMMUv2:
- Return -EPROBE_DEFER for client devices probing before their
SMMU
- Devicetree binding updates for Qualcomm MMU-500 implementations
- SMMUv3:
- Minor fixes and cleanup for NVIDIA's virtual command queue
driver
- IO-PGTable:
- Fix indexing of concatenated PGDs and extend selftest coverage
- Remove unused block-splitting support
S390 IOMMU:
- Implement support for blocking domain
Mediatek IOMMU:
- Enable 35-bit physical address support for mt8186
OMAP IOMMU driver:
- Adapt to recent IOMMU core changes and unbreak driver"
* tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (92 commits)
iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift
iommu: Make set_dev_pasid op support domain replacement
iommu/arm-smmu-v3: Make set_dev_pasid() op support replace
iommu/vt-d: Add set_dev_pasid callback for nested domain
iommu/vt-d: Make identity_domain_set_dev_pasid() to handle domain replacement
iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement
iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain
iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
iommu/vt-d: Add iommu_domain_did() to get did
iommu/vt-d: Consolidate the struct dev_pasid_info add/remove
iommu/vt-d: Add pasid replace helpers
iommu/vt-d: Refactor the pasid setup helpers
iommu/vt-d: Add a helper to flush cache for updating present pasid entry
iommu: Pass old domain to set_dev_pasid op
iommu/iova: Fix typo 'adderss'
iommu: Add a kdoc to iommu_unmap()
iommu/io-pgtable-arm-v7s: Remove split on unmap behavior
iommu/io-pgtable-arm: Remove split on unmap behavior
iommu/vt-d: Drain PRQs when domain removed from RID
iommu/vt-d: Drop pasid requirement for prq initialization
...
Force Write Back (FWB) changes how the S2 IOPTE's MemAttr field
works. When S2FWB is supported and enabled the IOPTE will force cachable
access to IOMMU_CACHE memory when nesting with a S1 and deny cachable
access when !IOMMU_CACHE.
When using a single stage of translation, a simple S2 domain, it doesn't
change things for PCI devices as it is just a different encoding for the
existing mapping of the IOMMU protection flags to cachability attributes.
For non-PCI it also changes the combining rules when incoming transactions
have inconsistent attributes.
However, when used with a nested S1, FWB has the effect of preventing the
guest from choosing a MemAttr in its S1 that would cause ordinary DMA to
bypass the cache. Consistent with KVM we wish to deny the guest the
ability to become incoherent with cached memory the hypervisor believes is
cachable so we don't have to flush it.
Allow NESTED domains to be created if the SMMU has S2FWB support and use
S2FWB for NESTING_PARENTS. This is an additional option to CANWBS.
Link: https://patch.msgid.link/r/10-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A minority of page table implementations (arm_lpae, armv7) are unique in
how they handle partial unmap of large IOPTEs.
Other implementations will unmap the large IOPTE and return its
length. For example if a 2M IOPTE is present and the first 4K is requested
to be unmapped then unmap will remove the whole 2M and report 2M as the
result.
arm_lpae instead replaces the IOPTE with a table of smaller IOPTEs, unmaps
the 4K and returns 4k. This is actually an illegal/non-hitless operation
on at least SMMUv3 because of the BBM level 0 rules.
Will says this was done to support VFIO, but upon deeper analysis this was
never strictly necessary:
https://lore.kernel.org/all/20241024134411.GA6956@nvidia.com/
In summary, historical VFIO supported the AMD behavior of unmapping the
whole large IOPTE and returning the size, even if asked to unmap a
portion. The driver would see this as a request to split a large IOPTE.
Modern VFIO always unmaps entire large IOPTEs (except on AMD) and drivers
don't see an IOPTE split.
Given it doesn't work fully correctly on SMMUv3 and relying on ARM unique
behavior would create portability problems across IOMMU drivers, retire
this functionality.
Outside the iommu users, this will potentially affect io_pgtable users of
ARM_32_LPAE_S1, ARM_32_LPAE_S2, ARM_64_LPAE_S1, ARM_64_LPAE_S2, and
ARM_MALI_LPAE formats.
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Add a case in the selftests that can detect some bugs with concatenated
page tables, where it maps the biggest supported page size at the end of
the IAS, this test would fail without the previous fix.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241024162516.2005652-3-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
ARM_LPAE_LVL_IDX() takes into account concatenated PGDs and can return
an index spanning multiple page-table pages given a sufficiently large
input address. However, when the resulting index is used to calculate
the number of remaining entries in the page, the possibility of
concatenation is ignored and we end up computing a negative upper bound:
max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
On the map path, this results in a negative 'mapped' value being
returned but on the unmap path we can leak child tables if they are
skipped in __arm_lpae_free_pgtable().
Introduce an arm_lpae_max_entries() helper to convert a table index into
the remaining number of entries within a single page-table page.
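One way such a helper could work is to reduce the index modulo the number of entries per page before subtracting. The sketch below is a userspace illustration under that assumption, not necessarily the code that was merged; demo_max_entries() and demo_max_entries_old() are hypothetical names, and ptes_per_table is assumed to be a power of two.

```c
/*
 * ptes_per_table is assumed to be a power of two (for example 512 for a
 * 4K granule with 8-byte entries). idx may exceed ptes_per_table when the
 * PGD is made of concatenated pages.
 */
static int demo_max_entries(int idx, int ptes_per_table)
{
	/* Entries left in the page that idx falls into. */
	return ptes_per_table - (idx & (ptes_per_table - 1));
}

/* The calculation quoted in the commit message, for comparison. */
static int demo_max_entries_old(int idx, int ptes_per_table)
{
	return ptes_per_table - idx;
}
```

For an index of 600 into a concatenated PGD with 512 entries per page, the old calculation yields -88, while the masked version yields 424.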
Cc: <stable@vger.kernel.org>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241024162516.2005652-2-smostafa@google.com
[will: Tweaked comment and commit message]
Signed-off-by: Will Deacon <will@kernel.org>
The current __arm_lpae_unmap() function calls dma_sync() on individual
PTEs after clearing them. Overall unmap performance can be improved by
around 25% for large buffer sizes by combining the syncs for adjacent
leaf entries.
Optimize the unmap time by clearing all the leaf entries and issuing a
single dma_sync() for them.
Below is detailed analysis of average unmap latency(in us) with and
without this optimization obtained by running dma_map_benchmark for
different buffer sizes.
                UnMap Latency(us)
Size    Without         With            % gain with
        optimization    optimization    optimization

4KB     3               3               0
8KB     4               3.8             5
16KB    6.1             5.4             11.48
32KB    10.2            8.5             16.67
64KB    18.5            14.9            19.46
128KB   35              27.5            21.43
256KB   67.5            52.2            22.67
512KB   127.9           97.2            24.00
1MB     248.6           187.4           24.62
2MB     65.5            65.5            0
4MB     119.2           119             0.17
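The difference between the two unmap patterns can be sketched in userspace by counting simulated sync calls. This is an illustration only: demo_sync() is a hypothetical stand-in for the cache-maintenance call, and the two loops are simplified models of the per-entry and batched approaches, not the driver code.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned int demo_sync_calls;	/* counts simulated syncs */

/* Hypothetical stand-in for a cache-maintenance call over n entries. */
static void demo_sync(uint64_t *ptep, size_t n)
{
	(void)ptep;
	(void)n;
	demo_sync_calls++;
}

/* Old pattern: clear and sync one entry at a time. */
static void demo_clear_each(uint64_t *ptep, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		ptep[i] = 0;
		demo_sync(&ptep[i], 1);
	}
}

/* New pattern: clear the whole run of leaf entries, then sync it once. */
static void demo_clear_batched(uint64_t *ptep, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ptep[i] = 0;
	demo_sync(ptep, n);
}
```

For a run of 16 adjacent leaf entries the first pattern issues 16 syncs and the second issues one, which is consistent with the gain growing with buffer size in the table.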
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240806105135.218089-1-amhetre@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These three implementations of map_pages() all succeed if a mapping is
requested with no read or write. Since they return back to __iommu_map()
leaving the mapped output as 0 it triggers an infinite loop. Therefore
nothing is using no-access protection bits.
Further, VFIO and iommufd rely on iommu_iova_to_phys() to get back PFNs
stored by map, if iommu_map() succeeds but iommu_iova_to_phys() fails that
will create serious bugs.
Thus remove this never used "nothing to do" concept and just fail map
immediately.
Fixes: e5fc9753b1 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
Fixes: e1d3c0fd70 ("iommu: add ARM LPAE page table allocator")
Fixes: 745ef1092b ("iommu/io-pgtable: Move Apple DART support to its own file")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/2-v1-1211e1294c27+4b1-iommu_no_prot_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If io-pgtable quirk flag indicates support for hardware update of
dirty state, enable HA/HD bits in the SMMU CD and also set the DBM
bit in the page descriptor.
Now report the dirty page tracking capability of SMMUv3 and
select IOMMUFD_DRIVER for ARM_SMMU_V3 if IOMMUFD is enabled.
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20240703101604.2576-6-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
.read_and_clear_dirty() IOMMU domain op takes care of reading the dirty
bits (i.e. PTE has DBM set and AP[2] clear) and marshalling into a
bitmap of a given page size.
While reading the dirty bits we also set the PTE AP[2] bit to mark it
as writeable-clean depending on read_and_clear_dirty() flags.
PTE states with respect to DBM bit:
DBM bit AP[2]("RDONLY" bit)
1. writable_clean 1 1
2. writable_dirty 1 0
3. read-only 0 1
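The state table above can be expressed as a small classifier. The sketch below is a userspace illustration: the bit positions (AP[2] at bit 7, DBM at bit 51) are assumptions about the LPAE descriptor layout, and demo_classify(), demo_read_and_clear() and the clear parameter are hypothetical names rather than the driver's interface.

```c
#include <stdint.h>

/* Assumed bit positions in the LPAE descriptor. */
#define DEMO_PTE_AP_RDONLY	(UINT64_C(1) << 7)	/* AP[2] */
#define DEMO_PTE_DBM		(UINT64_C(1) << 51)

enum demo_pte_state {
	DEMO_WRITABLE_CLEAN,	/* DBM = 1, AP[2] = 1 */
	DEMO_WRITABLE_DIRTY,	/* DBM = 1, AP[2] = 0 */
	DEMO_READ_ONLY,		/* DBM = 0, AP[2] = 1 */
	DEMO_OTHER,		/* DBM = 0, AP[2] = 0: not tracked */
};

static enum demo_pte_state demo_classify(uint64_t pte)
{
	int dbm = !!(pte & DEMO_PTE_DBM);
	int rdonly = !!(pte & DEMO_PTE_AP_RDONLY);

	if (dbm)
		return rdonly ? DEMO_WRITABLE_CLEAN : DEMO_WRITABLE_DIRTY;
	return rdonly ? DEMO_READ_ONLY : DEMO_OTHER;
}

/* Report whether the entry was dirty; optionally mark it clean again. */
static int demo_read_and_clear(uint64_t *ptep, int clear)
{
	if (demo_classify(*ptep) != DEMO_WRITABLE_DIRTY)
		return 0;
	if (clear)
		*ptep |= DEMO_PTE_AP_RDONLY;	/* back to writable-clean */
	return 1;
}
```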
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20240703101604.2576-4-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Convert iommu/io-pgtable-arm.c to use the new page allocation functions
provided in iommu-pages.h.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240413002522.1101315-5-pasha.tatashin@soleen.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We need that in order to implement the VM_BIND ioctl in the GPU driver
targeting new Mali GPUs.
VM_BIND is about executing MMU map/unmap requests asynchronously,
possibly after waiting for external dependencies encoded as dma_fences.
We intend to use the drm_sched framework to automate the dependency
tracking and VM job dequeuing logic, but this comes with its own set
of constraints, one of them being the fact we are not allowed to
allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
sort of deadlocks:
- VM_BIND map job needs to allocate a page table to map some memory
to the VM. No memory available, so kswapd is kicked
- GPU driver shrinker backend ends up waiting on the fence attached to
the VM map job or any other job fence depending on this VM operation.
With custom allocators, we will be able to pre-reserve enough pages to
guarantee the map/unmap operations we queued will take place without
going through the system allocator. But we can also optimize
allocation/reservation by not free-ing pages immediately, so any
upcoming page table allocation requests can be serviced by some free
page table pool kept at the driver level.
It might also be valuable for other aspects of GPU and similar
use-cases, like fine-grained memory accounting and resource limiting.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231124142434.1577550-3-boris.brezillon@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The pte format used by the DARTs found in the Apple M1 (t8103) is not
fully compatible with io-pgtable-arm. The 24 MSB are used for subpage
protection (mapping only parts of page) and conflict with the address
mask. In addition bit 1 is not available for tagging entries but disables
subpage protection. Subpage protection could be useful to support a CPU
granule of 4k with the fixed IOMMU page size of 16k.
The DARTs found on Apple M1 Pro/Max/Ultra use another different pte
format which is even less compatible. To support an output address size
of 42 bit the address is shifted down by 4. Subpage protection is
mandatory and bit 1 signifies uncached mappings used by the display
controller.
It would be advantageous to share code for all known Apple DART
variants to support common features. The page table allocator for DARTs
is less complex since it uses two levels of translation table without
support for huge pages.
Signed-off-by: Janne Grunau <j@jannau.net>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Acked-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20220916094152.87137-3-j@jannau.net
[ joro: Fix compile warning in __dart_alloc_pages()]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The special case to allow iommu_dev==NULL in __arm_lpae_alloc_pages() is
confusing to static checkers (and possibly readers in general), since
it's not obvious that that is only intended for the selftests. However
it only serves to get around the dev_to_node() call, and we can easily
fake up enough to make that work anyway, so let's simply remove this
consideration from the normal flow and punt the responsibility over to
the test harness itself.
Reported-by: Rustam Subkhankulov <subkhankulov@ispras.ru>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e2095eeda305071cb56c2cb8ac8a82dc3bd4dcab.1660580155.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Table descriptors were being installed without properly formatting the
address using paddr_to_iopte, which does not match up with the
iopte_deref in __arm_lpae_map. This is incorrect for the LPAE pte
format, as it does not handle the high bits properly.
This was found on Apple T6000 DARTs, which require a new pte format
(different shift); adding support for that to
paddr_to_iopte/iopte_to_paddr caused it to break badly, as even <48-bit
addresses would end up incorrect in that case.
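The mismatch can be illustrated with a userspace model of the 52-bit packing. This is a sketch under stated assumptions: it models only the 64K-granule case, where physical address bits 51:48 are stored in descriptor bits 15:12, and the demo_ function and macro names are hypothetical.

```c
#include <stdint.h>

/* Descriptor address field, bits 47:12 (assumed layout). */
#define DEMO_PTE_ADDR_MASK	UINT64_C(0x0000FFFFFFFFF000)

static uint64_t demo_paddr_to_iopte(uint64_t paddr)
{
	/* Fold PA bits 51:48 down into descriptor bits 15:12. */
	return (paddr | (paddr >> (48 - 12))) & DEMO_PTE_ADDR_MASK;
}

static uint64_t demo_iopte_to_paddr(uint64_t pte)
{
	uint64_t paddr = pte & DEMO_PTE_ADDR_MASK;

	/* Move descriptor bits 15:12 back up to PA bits 51:48. */
	return (paddr | (paddr << (48 - 12))) & (DEMO_PTE_ADDR_MASK << 4);
}
```

An address that goes through demo_paddr_to_iopte() survives the round trip, whereas a raw address placed in the descriptor and then decoded with demo_iopte_to_paddr() loses its high bits.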
Fixes: 6c89928ff7 ("iommu/io-pgtable-arm: Support 52-bit physical address")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20211120031343.88034-1-marcan@marcan.st
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Previously io-pgtable merely passed the iommu_iotlb_gather pointer
through to helpers, but now it has grown its own direct dereference.
This turns out to break the build for !IOMMU_API configs where the
structure only has a dummy definition. It will probably also crash
drivers who don't use the gather mechanism and simply pass in NULL.
Wrap this dereference in a suitable helper which can both be stubbed
out for !IOMMU_API and encapsulate a NULL check otherwise.
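A helper of this shape might look like the following userspace sketch. The struct demo_gather mock and its queued field are assumptions made for illustration (the real structure has more fields), and demo_gather_queued() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal mock; the real gather structure has more fields. */
struct demo_gather {
	bool queued;
};

/* NULL-safe accessor: callers that pass no gather get "not queued". */
static bool demo_gather_queued(const struct demo_gather *gather)
{
	return gather && gather->queued;
}
```

Keeping the dereference inside one accessor means a configuration with only a dummy structure definition can supply a stub that returns false, without touching the callers.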
Fixes: 7a7c5badf8 ("iommu: Indicate queued flushes via gather data")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/83672ee76f6405c82845a55c148fa836f56fbbc1.1629465282.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IO_PGTABLE_QUIRK_NON_STRICT was never a very comfortable fit, since it's
not a quirk of the pagetable format itself. Now that we have a more
appropriate way to convey non-strict unmaps, though, this last of the
non-quirk quirks can also go, and with the flush queue code also now
enforcing its own ordering we can have a lovely cleanup all round.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/155b5c621cd8936472e273a8b07a182f62c6c20d.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Apple's DART iommu uses a pagetable format that shares some
similarities with the ones already implemented by io-pgtable.c.
Add a new format variant to support the required differences
so that we don't have to duplicate the pagetable handling code.
Reviewed-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20210803121651.61594-2-sven@svenpeter.dev
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-11-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The PTE methods currently operate on a single entry. In preparation
for manipulating multiple PTEs in one map or unmap call, allow them
to handle multiple PTEs.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-10-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- IOVA allocation optimisations and removal of unused code
- Introduction of DOMAIN_ATTR_IO_PGTABLE_CFG for parameterising the
page-table of an IOMMU domain
- Support for changing the default domain type in sysfs
- Optimisation to the way in which identity-mapped regions are created
- Driver updates:
* Arm SMMU updates, including continued work on Shared Virtual Memory
* Tegra SMMU updates, including support for PCI devices
* Intel VT-D updates, including conversion to the IOMMU-DMA API
- Cleanup, kerneldoc and minor refactoring
Merge tag 'iommu-updates-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull IOMMU updates from Will Deacon:
"There's a good mixture of improvements to the core code and driver
changes across the board.
One thing worth pointing out is that this includes a quirk to work
around behaviour in the i915 driver (see 65f746e828 ("iommu: Add
quirk for Intel graphic devices in map_sg")), which otherwise
interacts badly with the conversion of the Intel IOMMU driver over to
the DMA-IOMMU API but has been fixed properly in the DRM tree.
We'll revert the quirk later this cycle once we've confirmed that
things don't fall apart without it.
Summary:
- IOVA allocation optimisations and removal of unused code
- Introduction of DOMAIN_ATTR_IO_PGTABLE_CFG for parameterising the
page-table of an IOMMU domain
- Support for changing the default domain type in sysfs
- Optimisation to the way in which identity-mapped regions are
created
- Driver updates:
* Arm SMMU updates, including continued work on Shared Virtual
Memory
* Tegra SMMU updates, including support for PCI devices
* Intel VT-D updates, including conversion to the IOMMU-DMA API
- Cleanup, kerneldoc and minor refactoring"
* tag 'iommu-updates-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (50 commits)
iommu/amd: Add sanity check for interrupt remapping table length macros
dma-iommu: remove __iommu_dma_mmap
iommu/io-pgtable: Remove tlb_flush_leaf
iommu: Stop exporting free_iova_mem()
iommu: Stop exporting alloc_iova_mem()
iommu: Delete split_and_remove_iova()
iommu/io-pgtable-arm: Remove unused 'level' parameter from iopte_type() macro
iommu: Defer the early return in arm_(v7s/lpae)_map
iommu: Improve the performance for direct_mapping
iommu: avoid taking iova_rbtree_lock twice
iommu/vt-d: Avoid GFP_ATOMIC where it is not needed
iommu/vt-d: Remove set but not used variable
iommu: return error code when it can't get group
iommu: Fix htmldocs warnings in sysfs-kernel-iommu_groups
iommu: arm-smmu-impl: Add a space before open parenthesis
iommu: arm-smmu-impl: Use table to list QCOM implementations
iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr
iommu/arm-smmu: Add support for pagetable config domain attribute
iommu: Document usage of "/sys/kernel/iommu_groups/<grp_id>/type" file
iommu: Take lock before reading iommu group default domain type
...
* Shutdown hook for GPU (to ensure GPU is idle before iommu goes away)
* GPU cooling device support
* DSI 7nm and 10nm phy/pll updates
* Additional sm8150/sm8250 DPU support (merge_3d and DSPP color
processing)
* Various DP fixes
* A whole bunch of W=1 fixes from Lee Jones
* GEM locking re-work (no more trylock_recursive in shrinker!)
* LLCC (system cache) support
* Various other fixes/cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGt0G=H3_RbF_GAQv838z5uujSmFd+7fYhL6Yg=23LwZ=g@mail.gmail.com
The only user of tlb_flush_leaf is a particularly hairy corner of the
Arm short-descriptor code, which wants a synchronous invalidation to
minimise the races inherent in trying to split a large page mapping.
This is already far enough into "here be dragons" territory that no
sensible caller should ever hit it, and thus it really doesn't need
optimising. Although using tlb_flush_walk there may technically be
more heavyweight than needed, it does the job and saves everyone else
having to carry around useless baggage.
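
A minimal sketch of the idea, assuming a simplified ops table: with no
leaf-only hook, the rare page-splitting path calls the walk flush
directly. The names flush_ops_sketch, flush_stats, count_flush_walk()
and split_block_sketch() are hypothetical and exist only for this
example; the real structure and call sites differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table with no separate leaf flush hook. */
struct flush_ops_sketch {
	void (*tlb_flush_all)(void *cookie);
	void (*tlb_flush_walk)(unsigned long iova, size_t size,
			       size_t granule, void *cookie);
};

/* Counts invalidations so the behaviour can be observed in a test. */
struct flush_stats {
	unsigned int walk_flushes;
};

static void count_flush_walk(unsigned long iova, size_t size,
			     size_t granule, void *cookie)
{
	struct flush_stats *stats = cookie;

	(void)iova;
	(void)size;
	(void)granule;
	stats->walk_flushes++;
}

/*
 * Rare path: after splitting a large mapping, invalidate with the
 * walk flush. It may do more work than a leaf-only flush would, but
 * no implementation has to provide an extra callback for it.
 */
static void split_block_sketch(const struct flush_ops_sketch *ops,
			       unsigned long iova, size_t size,
			       size_t granule, void *cookie)
{
	assert(ops != NULL && ops->tlb_flush_walk != NULL);

	/* ... the entries for the split would be rewritten here ... */
	ops->tlb_flush_walk(iova, size, granule, cookie);
}
```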
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/9844ab0c5cb3da8b2f89c6c2da16941910702b41.1606324115.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Miscellaneous IOMMU changes for 5.11. Largely cosmetic, apart from a
change to the way in which identity-mapped domains are configured so
that the requests are now batched and can potentially use larger pages
for the mapping.
* for-next/iommu/misc:
iommu/io-pgtable-arm: Remove unused 'level' parameter from iopte_type() macro
iommu: Defer the early return in arm_(v7s/lpae)_map
iommu: Improve the performance for direct_mapping
iommu: return error code when it can't get group
iommu: Modify the description of iommu_sva_unbind_device
Although handling a mapping request with no permissions is a
trivial no-op, defer the early return until after the size/range
checks so that we are consistent with other mapping requests.
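
The ordering can be sketched as follows. This is an illustrative mock,
not the driver code: pgtable_cfg_sketch, sketch_map() and the
PROT_*_BIT flags are hypothetical names, and the error codes chosen
for each check are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PROT_READ_BIT	(1u << 0)	/* hypothetical permission flags */
#define PROT_WRITE_BIT	(1u << 1)

/* Hypothetical, reduced page-table configuration. */
struct pgtable_cfg_sketch {
	uint64_t pgsize_bitmap;	/* supported page sizes, one bit each */
	unsigned int ias;	/* input address bits, must be < 64 */
	unsigned int oas;	/* output address bits, must be < 64 */
};

static int sketch_map(const struct pgtable_cfg_sketch *cfg, uint64_t iova,
		      uint64_t paddr, size_t size, unsigned int prot)
{
	assert(cfg != NULL);

	/* Size and range checks run first, for every request. */
	if (!size || ((uint64_t)size & cfg->pgsize_bitmap) != size)
		return -EINVAL;
	if ((iova >> cfg->ias) || (paddr >> cfg->oas))
		return -ERANGE;

	/* Only now take the trivial exit for "no access requested". */
	if (!(prot & (PROT_READ_BIT | PROT_WRITE_BIT)))
		return 0;

	/* ... the actual page-table update would happen here ... */
	return 0;
}
```

In this sketch a request with no permissions and an invalid size fails
in the same way as any other invalid request, rather than silently
succeeding.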
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Link: https://lore.kernel.org/r/20201207115758.9400-1-zhukeqian1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>