The split follows the pairing with the destroy functions:
- vfio_group_get_device_fd() destroyed by close()
- vfio_device_open() destroyed by vfio_device_fops_release()
- vfio_device_assign_container() destroyed by
vfio_group_try_dissolve_container()
The next patch will put a lock around vfio_device_assign_container().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is not a performance path, just use the group_rwsem to protect the
value.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Without locking userspace can trigger a UAF by racing
KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD:
CPU1 CPU2
ioctl(KVM_DEV_VFIO_GROUP_DEL)
ioctl(VFIO_GROUP_GET_DEVICE_FD)
vfio_group_get_device_fd
open_device()
intel_vgpu_open_device()
vfio_register_notifier()
vfio_register_group_notifier()
blocking_notifier_call_chain(&group->notifier,
VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);
set_kvm()
group->kvm = NULL
close()
kfree(kvm)
intel_vgpu_group_notifier()
vdev->kvm = data
[..]
kvm_get_kvm(vgpu->kvm);
// UAF!
Add a simple rwsem in the group to protect the kvm while the notifier is
using it.
Note this doesn't fix the race internal to i915 where userspace can
trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of
vgpu->kvm and trigger this same UAF, it just makes the notifier
self-consistent.
Fixes: ccd46dbae7 ("vfio: support notifier chain in vfio_group")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fix following coccicheck warning:
./virt/kvm/vfio.c:258:1-7: preceding lock on line 236
If kvm_vfio_file_iommu_group() failed, code would goto err_fdput with
mutex_lock acquired and then return ret. It might cause potential
deadlock. Move mutex_unlock bellow err_fdput tag to fix it.
Fixes: d55d9e7a45 ("kvm/vfio: Store the struct file in the kvm_vfio_group")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220517023441.4258-1-wanjiabing@vivo.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There is no macro called _IORW, so use _IOWR in the comment instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20220516101202.88373-1-thuth@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO PCI does a security check as part of hot reset to prove that the user
has permission to manipulate all the devices that will be impacted by the
reset.
Use a new API vfio_file_has_dev() to perform this security check against
the struct file directly and remove the vfio_group from VFIO PCI.
Since VFIO PCI was the last user of vfio_group_get_external_user() and
vfio_group_put_external_user() remove it as well.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
None of the VFIO APIs take in the vfio_group anymore, so we can remove it
completely.
This has a subtle side effect on the enforced coherency tracking. The
vfio_group_get_external_user() was holding on to the container_users which
would prevent the iommu_domain and thus the enforced coherency value from
changing while the group is registered with kvm.
It changes the security proof slightly into 'user must hold a group FD
that has a device that cannot enforce DMA coherence'. As opening the group
FD, not attaching the container, is the privileged operation this doesn't
change the security properties much.
On the flip side it paves the way to changing the iommu_domain/container
attached to a group at runtime which is something that will be required to
support nested translation.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>i
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Just change the argument from struct vfio_group to struct file *.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Instead of a general extension check change the function into a limited
test if the iommu_domain has enforced coherency, which is the only thing
kvm needs to query.
Make the new op self contained by properly refcounting the container
before touching it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_group_fops_open() ensures there is only ever one struct file open for
any struct vfio_group at any time:
/* Do we need multiple instances of the group open? Seems not. */
opened = atomic_cmpxchg(&group->opened, 0, 1);
if (opened) {
vfio_group_put(group);
return -EBUSY;
Therefor the struct file * can be used directly to search the list of VFIO
groups that KVM keeps instead of using the
vfio_external_group_match_file() callback to try to figure out if the
passed in FD matches the list or not.
Delete vfio_external_group_match_file().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The only caller wants to get a pointer to the struct iommu_group
associated with the VFIO group file. Instead of returning the group ID
then searching sysfs for that string to get the struct iommu_group just
directly return the iommu_group pointer already held by the vfio_group
struct.
It already has a safe lifetime due to the struct file kref, the vfio_group
and thus the iommu_group cannot be destroyed while the group file is open.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Following patches will change the APIs to use the struct file as the handle
instead of the vfio_group, so hang on to a reference to it with the same
duration of as the vfio_group.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
To make it easier to read and change in following patches.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that the iommu core takes care of isolation there is no race between
driver attach and container unset. Once iommu_group_release_dma_owner()
returns the device can immediately be re-used.
Remove this mechanism.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-a1e8791d795b+6b-vfio_container_q_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Once the group enters 'owned' mode it can never be assigned back to the
default_domain or to a NULL domain. It must always be actively assigned to
a current domain. If the caller hasn't provided a domain then the core
must provide an explicit DMA blocking domain that has no DMA map.
Lazily create a group-global blocking DMA domain when
iommu_group_claim_dma_owner is first called and immediately assign the
group to it. This ensures that DMA is immediately fully isolated on all
IOMMU drivers.
If the user attaches/detaches while owned then detach will set the group
back to the blocking domain.
Slightly reorganize the call chains so that
__iommu_group_set_core_domain() is the function that removes any caller
configured domain and sets the domains back a core owned domain with an
appropriate lifetime.
__iommu_group_set_domain() is the worker function that can change the
domain assigned to a group to any target domain, including NULL.
Add comments clarifying how the NULL vs detach_dev vs default_domain works
based on Robin's remarks.
This fixes an oops with VFIO and SMMUv3 because VFIO will call
iommu_detach_group() and then immediately iommu_domain_free(), but
SMMUv3 has no way to know that the domain it is holding a pointer to
has been freed. Now the iommu_detach_group() will assign the blocking
domain and SMMUv3 will no longer hold a stale domain reference.
Fixes: 1ea2a07a53 ("iommu: Add DMA ownership management interfaces")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Baolu Lu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
--
Just minor polishing as discussed
v3:
- Change names to __iommu_group_set_domain() /
__iommu_group_set_core_domain()
- Clarify comments
- Call __iommu_group_set_domain() directly in
iommu_group_release_dma_owner() since we know it is always selecting
the default_domain
- Remove redundant detach_dev ops check in __iommu_detach_device and
make the added WARN_ON fail instead
- Check for blocking_domain in __iommu_attach_group() so VFIO can
actually attach a new group
- Update comments and spelling
- Fix missed change to new_domain in iommu_group_do_detach_device()
v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com
v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The last user of this function is in PCI callbacks that want to convert
their struct pci_dev to a vfio_device. Instead of searching use the
vfio_device available trivially through the drvdata.
When a callback in the device_driver is called, the caller must hold the
device_lock() on dev. The purpose of the device_lock is to prevent
remove() from being called (see __device_release_driver), and allow the
driver to safely interact with its drvdata without races.
The PCI core correctly follows this and holds the device_lock() when
calling error_detected (see report_error_detected) and
sriov_configure (see sriov_numvfs_store).
Further, since the drvdata holds a positive refcount on the vfio_device
any access of the drvdata, under the device_lock(), from a driver callback
needs no further protection or refcounting.
Thus the remark in the vfio_device_get_from_dev() comment does not apply
here, VFIO PCI drivers all call vfio_unregister_group_dev() from their
remove callbacks under the device_lock() and cannot race with the
remaining callers.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Having a consistent pointer in the drvdata will allow the next patch to
make use of the drvdata from some of the core code helpers.
Use a WARN_ON inside vfio_pci_core_register_device() to detect drivers
that miss this.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When the open_device() op is called the container_users is incremented and
held incremented until close_device(). Thus, so long as drivers call
functions within their open_device()/close_device() region they do not
need to worry about the container_users.
These functions can all only be called between open_device() and
close_device():
vfio_pin_pages()
vfio_unpin_pages()
vfio_dma_rw()
vfio_register_notifier()
vfio_unregister_notifier()
Eliminate the calls to vfio_group_add_container_user() and add
vfio_assert_device_open() to detect driver mis-use. This causes the
close_device() op to check device->open_count so always leave it elevated
while calling the op.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that callers have been updated to use the vfio_device APIs the driver
facing group interface is no longer used, delete it:
- vfio_group_get_external_user_from_dev()
- vfio_group_pin_pages()
- vfio_group_unpin_pages()
- vfio_group_iommu_domain()
--
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Use the existing vfio_device versions of vfio_(un)pin_pages(). There is no
reason to use a group interface here, kvmgt has easy access to a
vfio_device.
Delete kvmgt_vdev::vfio_group since these calls were the last users.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/5-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Every caller has a readily available vfio_device pointer, use that instead
of passing in a generic struct device. Change vfio_dma_rw() to take in the
struct vfio_device and move the container users that would have been held
by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like
vfio_pin/unpin_pages().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Every caller has a readily available vfio_device pointer, use that instead
of passing in a generic struct device. The struct vfio_device already
contains the group we need so this avoids complexity, extra refcountings,
and a confusing lifecycle model.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The next patch wants the vfio_device instead. There is no reason to store
a pointer here since we can container_of back to the vfio_device.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
All callers have a struct vfio_device trivially available, pass it in
directly and avoid calling the expensive vfio_group_get_from_dev().
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
IOMMU groups have been mandatory for some time now, so a device without
one is necessarily a device without any usable IOMMU, therefore the
iommu_present() check is redundant (or at best unhelpful).
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
From Yishai:
This series improves mlx5 live migration driver in few aspects as of
below.
Refactor to enable running migration commands in parallel over the PF
command interface.
To achieve that we exposed from mlx5_core an API to let the VF be
notified before that the PF command interface goes down/up. (e.g. PF
reload upon health recovery).
Once having the above functionality in place mlx5 vfio doesn't need any
more to obtain the global PF lock upon using the command interface but
can rely on the above mechanism to be in sync with the PF.
This can enable parallel VFs migration over the PF command interface
from kernel driver point of view.
In addition,
Moved to use the PF async command mode for the SAVE state command.
This enables returning earlier to user space upon issuing successfully
the command and improve latency by let things run in parallel.
Alex, as this series touches mlx5_core we may need to send this in a
pull request format to VFIO to avoid conflicts before acceptance.
Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com
Signed-of-by: Leon Romanovsky <leonro@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCYntY3AAKCRAp8NhrnBAZ
scRWAP0QzEqg/Xqk/geUAQ3dliFrA2DZJm8v9B3x5tA5nEAazAD9HqC17MvDzY8T
6KBP7G37JNg2NCkxnKnt2gCIT+O4lgA=
=zwWT
-----END PGP SIGNATURE-----
Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next
Improve mlx5 live migration driver
From Yishai:
This series improves mlx5 live migration driver in few aspects as of
below.
Refactor to enable running migration commands in parallel over the PF
command interface.
To achieve that we exposed from mlx5_core an API to let the VF be
notified before that the PF command interface goes down/up. (e.g. PF
reload upon health recovery).
Once having the above functionality in place mlx5 vfio doesn't need any
more to obtain the global PF lock upon using the command interface but
can rely on the above mechanism to be in sync with the PF.
This can enable parallel VFs migration over the PF command interface
from kernel driver point of view.
In addition,
Moved to use the PF async command mode for the SAVE state command.
This enables returning earlier to user space upon issuing successfully
the command and improve latency by let things run in parallel.
Alex, as this series touches mlx5_core we may need to send this in a
pull request format to VFIO to avoid conflicts before acceptance.
Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com
Signed-of-by: Leon Romanovsky <leonro@nvidia.com>
Use the PF asynchronous command mode for the SAVE state command.
This enables returning earlier to user space upon issuing successfully
the command and improve latency by let things run in parallel.
Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Refactor to enable different VFs to run their commands over the PF
command interface in parallel and to not block one each other.
This is done by not using the global PF lock that was used before but
relying on the VF attach/detach mechanism to sync.
Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Manage the VF attach/detach callback from the PF.
This lets the driver to enable parallel VFs migration as will be
introduced in the next patch.
As part of this, reorganize the VF is migratable code to be in a
separate function and rename it to be set_migratable() to match its
functionality.
Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a
VF register to be notified for its enablement / disablement by the PF.
Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with
its notifier block and upon VF remove it will call
mlx5_sriov_blocking_notifier_unregister() to drop its registration.
This can give a VF the ability to clean some resources upon disable
before that the command interface goes down and on the other hand sets
some stuff before that it's enabled.
This may be used by a VF which is migration capable in few cases.(e.g.
PF load/unload upon an health recovery).
Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Some reverts of existing patches, which were necessary because of boot
issues due to wrong CPU clock handling and cache issues which led to
userspace segfaults with 32bit kernels.
Other than that just small updates and fixes, e.g. defconfig updates,
spelling fixes, a clocksource fix, boot topology fixes and a fix for
/proc/cpuinfo output to satisfy lscpu.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYngOJAAKCRD3ErUQojoP
X2+vAPwPf5JbrOkK9z7OM2xmRpfp1f0vuD5k6fxhc11+F5xpLQEAnxkLOX5//jGK
FmPVDub53u5+Wje+WFJQoqzJ4zyDQQQ=
=4UHA
-----END PGP SIGNATURE-----
Merge tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Some reverts of existing patches, which were necessary because of boot
issues due to wrong CPU clock handling and cache issues which led to
userspace segfaults with 32bit kernels. Dave has a whole bunch of
upcoming cache fixes which I then plan to push in the next merge
window.
Other than that just small updates and fixes, e.g. defconfig updates,
spelling fixes, a clocksource fix, boot topology fixes and a fix for
/proc/cpuinfo output to satisfy lscpu"
* tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
Revert "parisc: Increase parisc_cache_flush_threshold setting"
parisc: Mark cr16 clock unstable on all SMP machines
parisc: Fix typos in comments
parisc: Change MAX_ADDRESS to become unsigned long long
parisc: Merge model and model name into one line in /proc/cpuinfo
parisc: Re-enable GENERIC_CPU_DEVICES for !SMP
parisc: Update 32- and 64-bit defconfigs
parisc: Only list existing CPUs in cpu_possible_mask
Revert "parisc: Fix patch code locking and flushing"
Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"
- Fix the DWARF CFI in our VDSO time functions, allowing gdb to backtrace through them
correctly.
- Fix a buffer overflow in the papr_scm driver, only triggerable by hypervisor input.
- A fix in the recently added QoS handling for VAS (used for communicating with
coprocessors).
Thanks to: Alan Modra, Haren Myneni, Kajol Jain, Segher Boessenkool.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJ3r7kTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM1LEACU87gXiSJJW99qD5IvoPncR1nRjd+3
zTAch9Dk70Kt/1Zkn0bEGjkua6YwcOdlA7QXiHBR2HAYli85VWK/w9Vz/TLTbL9/
SrDvSfzwmqbvU61A2ZppvH487z8pfEBmviv3SCrmB3xZWtttYkatEj4A1EjdBJUI
+xq0JgrXj8rAayRpkJS5XEUpkw8eXJ85X1WXdx+peIGvcRKB+n46HyCYhsl3/YVv
pAn6jcnNnPqYzXeE0sQ4ZpybzCkzqVHC4SyCemkp8PWfqyL8LqgIvz2qtCvXAzij
SJikxmHUN3XvHCF4aOgJG6OTkMEz92cpH0hGsb59c4usPCNlRFMuqhJxuYYmNGT3
FtDUrrptsQ5ZRdWttQzi0RFjK0klro5VKvMJRv7uDWIlaPwl3Vi8DxvSrzCSdQPE
Q1XsJS7B/io1JCCzrhrJyRQkIt+Z9e1/3xm618ide1UKQkqVYApZcImJDk7DlDCV
QL1mtqHgasQjDLRkQq/fil/vyt2byw2uzCy6j0X/WQYrKJlUmbXKlcbdbbeCH2qZ
8NndD96wlw+dd/q2u/ZdngQ8f/68n6+a/Zxv7eAbb01JY04VFsCXgdgGHpzhBiJ5
YWqp5qzzuTcvhSYhyfdqFd9KCxxGRIag6jmhs6vbfQbMlZMoo9Jr4vaVm1aIW27t
MRhuEwi2KzQI2A==
=sVeX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix the DWARF CFI in our VDSO time functions, allowing gdb to
backtrace through them correctly.
- Fix a buffer overflow in the papr_scm driver, only triggerable by
hypervisor input.
- A fix in the recently added QoS handling for VAS (used for
communicating with coprocessors).
Thanks to Alan Modra, Haren Myneni, Kajol Jain, and Segher Boessenkool.
* tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
powerpc/vdso: Fix incorrect CFI in gettimeofday.S
powerpc/pseries/vas: Use QoS credits from the userspace
- Prevent FPU state corruption. The condition in irq_fpu_usable() grants
FPU usage when the FPU is not used in the kernel. That's just wrong as
it does not take the fpregs_lock()'ed regions into account. If FPU usage
happens within such a region from interrupt context, then the FPU state
gets corrupted. That's a long standing bug, which got unearthed by the
recent changes to the random code.
- Josh wants to use his kernel.org email address
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJ3sb0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRR9EACOcJAkO4ZjHvQf8RDw4ZaC/d0PgEC1
rEcxL7Tq9qAjdY+VmoRdzAia1FbKWrSNzENiBaTwdM2dxsZN0cl5fEQAy5ffHKXr
IadRIHICu6INKQ0iuf4VdOt8HuMC+Ams9sFoVDId1avRoejsjIHeCpgBen+0/LQf
D4i+nvUL9hMcZDsWiQW9mTe8J4fqr7rrg+p7tD0300DbZ6/PFx+zWP58TE8K7vQ8
dsmfMXxDrJW3d9FOHHvPQXa/Okdm2fHxXuxs3Quc+7HG6cMcwefCYugf8HK3E14F
q0O6IAOfiYzCL+8aNo4J3H5jPEGLMJ7JlY5Yoygc1mcx0uGyVraMbFOsK8WuRFvP
eAmx31Wh6EIYOwaboSG+74k/b3hPa6Hx3R7aQDS+SnQQI6I9fdi3ZZtQ+DGnZBZG
Ipq/f+EjaROh1atUwhE4zM80UKSU6RWEWAlMO4K07uO8a3RnR8qV7N8tl44i+Q7k
KZUbN5/aV4ccZNwMbazcpZ32fe3SB9cD4e/aLqpMp0uOl9TVxcOA3hIkQ0wflh94
6XO+gPdvr5VxWayc9tljMXUGPxwjTN4zDKUIlZP2EzYHt6SyZpdwi2+8moEfvU+a
qcIWPLeXb+972LaY+rTicT4cQxCKe0CZEXCOq1ns+Ni5f5TdKkvyxpeMIOrGtjYG
/4RqWncPKIyuEw==
=PpOB
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A fix and an email address update:
- Prevent FPU state corruption.
The condition in irq_fpu_usable() grants FPU usage when the FPU is
not used in the kernel. That's just wrong as it does not take the
fpregs_lock()'ed regions into account. If FPU usage happens within
such a region from interrupt context, then the FPU state gets
corrupted.
That's a long standing bug, which got unearthed by the recent
changes to the random code.
- Josh wants to use his kernel.org email address"
* tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Prevent FPU state corruption
MAINTAINERS: Update Josh Poimboeuf's email address
- Mark the NMI safe time accessors notrace to prevent tracer recursion
when they are selected as trace clocks.
- John Stultz has a new email address
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJ3sP0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeFiEADBGWhBZ04Rr87ZGwi7ZTq5Z4uTcRKg
9iXLAS8xG2eYwIdYDqpryx4ugacKTqWBiXPEqwHQlumIJ6LKJDDDJ7WLaRZNJiMg
MEZJ5qYnjDx52BwEL5tsVFv8OeYDneg4f8r7Vq7AdwyUDiNZ6QRsYXfXHdXqfsaQ
IEbvMSWdHiATuJfd3H57G3J9aHw58lcy/n56e1yz4uVDZYgPiw5rMuUV8Y0srOBq
2xPW/Ggq/Lzi8aM8Owu8dkfHpJ9beGLbx3COgIOcLkOkgspmK8D5w5i0AZaIX9LK
ec2uyyNXiay2LtvBjPULDAqGoeRA3rrww5ZC58bk0FIqoROD13nf6iw3R0tTPCk2
EHgZwxKUY1X21HVUeqy4RdTaASsGX6P6TzVSFvaqT89tHX4cSNKzLOSWJBf8NaQT
z1hbTAzuwpE1FTo1og3zxDovEufKv7svc6bblz3MSU3VgW5/F6AZxUQMAu+xCcl7
+nICjC5Xvasg4FLdNiuhrPocaHrNSt73YHC9j97RKcwn6WLSx5kVFt76BLEdW0nI
V6a3ZGs10Jg4+9OGwA/6oQGlqVSv1Fzz+ckBLPZsqMVLAkXgV2BrdmCJ9E8VRn99
0qJzfPHEXdm1JBa4BZUGXHToKUi3LTQxI2eXvauibcLryLPSSKZXCPsSvgbLewOU
/dC4/DkJeSbUQA==
=LX9Y
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A fix and an email address update:
- Mark the NMI safe time accessors notrace to prevent tracer
recursion when they are selected as trace clocks.
- John Stultz has a new email address"
* tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Mark NMI safe time accessors as notrace
MAINTAINERS: Update email address for John Stultz
request/free_irq() can result in a hang because the interrupt thread did
not reach the thread function and got stopped in the kthread core
already. That leaves a state active counter arround which makes a
invocation of synchronized_irq() on that interrupt hang forever. Ensure
that the thread reached the thread function in request_irq() to prevent
that.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJ3sG0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWi/D/wJ71FPiZa2nF+1FfobjQtUmp+YK3kc
6BRSUjrHQ/xhZCjBKEgDcxLC23qzBbjC3e/JfN08rV31iMYOv9onNKorDpO69XkA
FV+8kJsJcbxJMLHcn/tLgM97eIRYCWIzvAjfB2FaOR+QXPV3dc2sVB2E9mF1ZKfz
cEEmcBW9apv9gnX4vujsQTBToakUZpeMi0TYQq/48hXABIrian4uN2p+lv9kVIg7
Z6+GCm3t/uUVIn4DQe3+aP10UzpHdC/y0fhv62LOCNu4v/eGRdbQ3n2fefsANPRa
gaQTaBhuRYc5c123x5HHhYDzYPXWpJOSOtJj2Hk4Ags4Q5oiaute5Q0cZR8uUv49
yHm8UWTlaA3HrQRZbvnwXxUPucL+lcWTuItjseCMHXke8t2n6dKlt3IvwuFVl7XI
Mxgc5GD8dDjuYmDFdtxSAJnzzSJ610X5fzgO38cymnMi4Cf8c/N5SvFWnxAz7AnW
SCUUwwFNs3+gWSrIVfMS45N0tDF3CXIxpd/vMpVMJsEuJgMwTw+qccEtQHavu2GW
I0dO9JJMlTMXqJP3I++CWSUhRuIhaBwyf9wybLWj1WrBtQAcdhORSd9NU38CfdVC
9Zr3MzaqdjJJeZM4D7SlgBnU2bj6s2QlkaR2H0ZXAOe0ItANz0+tndJJ77C8+7ru
Q/s19KqojfIUsw==
=f7Ph
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A fix for the threaded interrupt core.
A quick sequence of request/free_irq() can result in a hang because
the interrupt thread did not reach the thread function and got stopped
in the kthread core already. That leaves a state active counter
arround which makes a invocation of synchronized_irq() on that
interrupt hang forever.
Ensure that the thread reached the thread function in request_irq() to
prevent that"
* tag 'irq-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Synchronize interrupt thread startup
The cr16 interval timers are not synchronized across CPUs, even with just
one dual-core CPU. This becomes visible if the machines have a longer
uptime.
Signed-off-by: Helge Deller <deller@gmx.de>
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Helge Deller <deller@gmx.de>
Dave noticed that for the 32-bit kernel MAX_ADDRESS should be a ULL,
otherwise this define would become 0:
MAX_ADDRESS (1UL << MAX_ADDRBITS)
It has no real effect on the kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
The Linux tool "lscpu" shows the double amount of CPUs if we have
"model" and "model name" in two different lines in /proc/cpuinfo.
This change combines the model and the model name into one line.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
In commit 62773112ac ("parisc: Switch from GENERIC_CPU_DEVICES to
GENERIC_ARCH_TOPOLOGY") GENERIC_CPU_DEVICES was unconditionally turned
off, but this triggers a warning in topology_add_dev(). Turning it back
on for the !SMP case avoids this warning.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 62773112ac ("parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY")
Signed-off-by: Helge Deller <deller@gmx.de>
Enable CONFIG_CGROUPS=y on 32-bit defconfig for systemd-support, and
enable CONFIG_NAMESPACES and CONFIG_USER_NS.
Signed-off-by: Helge Deller <deller@gmx.de>
The inventory knows which CPUs are in the system, so this bitmask should
be in cpu_possible_mask instead of the bitmask based on CONFIG_NR_CPUS.
Reset the cpu_possible_mask before scanning the system for CPUs, and
mark each existing CPU as possible during initialization of that CPU.
This avoids those warnings later on too:
register_cpu_capacity_sysctl: too early to get CPU4 device!
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
This reverts commit d97180ad68.
It triggers RCU stalls at boot with a 32-bit kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v5.15+
This reverts commit afdb4a5b1d.
It triggers RCU stalls at boot with a 32-bit kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v5.16+