diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index d16a4d2c3a41..e93606ecfb01 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -17,14 +17,23 @@ What is KUnit?
==============
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
-These tests are able to be run locally on a developer's workstation without a VM
-or special hardware.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
+KUnit consists of a kernel component, which provides a set of macros for easily
+writing unit tests. Tests written against KUnit will run on kernel boot if
+built-in, or when loaded if built as a module. These tests write out results to
+the kernel log in `TAP `_ format.
+
+To make running these tests (and reading the results) easier, KUnit offers
+:doc:`kunit_tool `, which builds a `User Mode Linux
+`_ kernel, runs it, and parses the test
+results. This provides a quick way of running KUnit tests during development,
+without requiring a virtual machine or separate hardware.
+
Get started now: :doc:`start`
Why KUnit?
@@ -36,21 +45,20 @@ allow all possible code paths to be tested in the code under test; this is only
possible if the code under test is very small and does not have any external
dependencies outside of the test's control like hardware.
-Outside of KUnit, there are no testing frameworks currently
-available for the kernel that do not require installing the kernel on a test
-machine or in a VM and all require tests to be written in userspace running on
-the kernel; this is true for Autotest, and kselftest, disqualifying
-any of them from being considered unit testing frameworks.
+KUnit provides a common framework for unit tests within the kernel.
-KUnit addresses the problem of being able to run tests without needing a virtual
-machine or actual hardware with User Mode Linux. User Mode Linux is a Linux
-architecture, like ARM or x86; however, unlike other architectures it compiles
-to a standalone program that can be run like any other program directly inside
-of a host operating system; to be clear, it does not require any virtualization
-support; it is just a regular program.
+KUnit tests can be run on most architectures, and most tests are architecture
+independent. All built-in KUnit tests run on kernel startup. Alternatively,
+KUnit and KUnit tests can be built as modules and tests will run when the test
+module is loaded.
-Alternatively, kunit and kunit tests can be built as modules and tests will
-run when the test module is loaded.
+.. note::
+
+ KUnit can also run tests without needing a virtual machine or actual
+ hardware under User Mode Linux. User Mode Linux is a Linux architecture,
+ like ARM or x86, which compiles the kernel as a Linux executable. KUnit
+ can be used with UML either by building with ``ARCH=um`` (like any other
+ architecture), or by using :doc:`kunit_tool `.
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
several dozen tests in only 10 to 20 seconds; this might not sound like a big
@@ -81,3 +89,5 @@ How do I use it?
* :doc:`start` - for new users of KUnit
* :doc:`usage` - for a more detailed explanation of KUnit features
* :doc:`api/index` - for the list of KUnit APIs used for testing
+* :doc:`kunit-tool` - for more information on the kunit_tool helper script
+* :doc:`faq` - for answers to some common questions about KUnit
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 50d46394e97e..949af2da81e5 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -12,6 +12,13 @@ the Linux kernel as UML (`User Mode Linux
`_), running KUnit tests, parsing
the test results and displaying them in a user friendly manner.
+kunit_tool addresses the problem of being able to run tests without needing a
+virtual machine or actual hardware with User Mode Linux. User Mode Linux is a
+Linux architecture, like ARM or x86; however, unlike other architectures it
+compiles the kernel as a standalone Linux executable that can be run like any
+other program directly inside of a host operating system. To be clear, it does
+not require any virtualization support: it is just a regular program.
+
What is a kunitconfig?
======================
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index 4e1d24db6b13..e1c5ce80ce12 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -9,11 +9,10 @@ Installing dependencies
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
-KUnit Wrapper
-=============
-Included with KUnit is a simple Python wrapper that helps format the output to
-easily use and read KUnit output. It handles building and running the kernel, as
-well as formatting the output.
+Running tests with the KUnit Wrapper
+====================================
+Included with KUnit is a simple Python wrapper which runs tests under User Mode
+Linux, and formats the test results.
The wrapper can be run with:
@@ -21,22 +20,42 @@ The wrapper can be run with:
./tools/testing/kunit/kunit.py run --defconfig
-For more information on this wrapper (also called kunit_tool) checkout the
+For more information on this wrapper (also called kunit_tool) check out the
:doc:`kunit-tool` page.
Creating a .kunitconfig
-=======================
-The Python script is a thin wrapper around Kbuild. As such, it needs to be
-configured with a ``.kunitconfig`` file. This file essentially contains the
-regular Kernel config, with the specific test targets as well.
+-----------------------
+If you want to run a specific set of tests (rather than those listed in the
+KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
+This file essentially contains the regular Kernel config, with the specific
+test targets as well. The ``.kunitconfig`` should also contain any other config
+options required by the tests.
+A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp arch/um/configs/kunit_defconfig .kunitconfig
-Verifying KUnit Works
----------------------
+You can then add any other Kconfig options you wish, e.g.:
+.. code-block:: none
+
+ CONFIG_LIST_KUNIT_TEST=y
+
+:doc:`kunit_tool ` will ensure that all config options set in
+``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
+It'll warn you if you haven't included the dependencies of the options you're
+using.
+
+.. note::
+ Note that removing something from the ``.kunitconfig`` will not trigger a
+ rebuild of the ``.config`` file: the configuration is only updated if the
+ ``.kunitconfig`` is not a subset of ``.config``. This means that you can use
+ other tools (such as make menuconfig) to adjust other config options.
+
+
+Running the tests
+-----------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
@@ -62,6 +81,41 @@ followed by a list of tests that are run. All of them should be passing.
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
+Running tests without the KUnit Wrapper
+=======================================
+
+If you'd rather not use the KUnit Wrapper (if, for example, you need to
+integrate with other systems, or use an architecture other than UML), KUnit can
+be included in any kernel, and the results read out and parsed manually.
+
+.. note::
+ KUnit is not designed for use in a production system, and it's possible that
+ tests may reduce the stability or security of the system.
+
+
+
+Configuring the kernel
+----------------------
+
+In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
+Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
+menuconfig). From there, you can enable any KUnit tests you want: they usually
+have config options ending in ``_KUNIT_TEST``.
+
+KUnit and KUnit tests can be compiled as modules: in this case the tests in a
+module will be run when the module is loaded.
+
+Running the tests
+-----------------
+
+Build and run your kernel as usual. Test output will be written to the kernel
+log in `TAP `_ format.
+
+.. note::
+ It's possible that there will be other lines and/or data interspersed in the
+ TAP output.
+
+
Writing your first test
=======================
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 607758a66a99..473a2361ec37 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -591,3 +591,17 @@ able to run one test case per invocation.
.. TODO(brendanhiggins@google.com): Add an actual example of an architecture
dependent KUnit test.
+
+KUnit debugfs representation
+============================
+When kunit test suites are initialized, they create an associated directory
+in /sys/kernel/debug/kunit/. The directory contains one file
+
+- results: "cat results" displays results of each test case and the results
+ of the entire suite for the last test run.
+
+The debugfs representation is primarily of use when kunit test suites are
+run in a native environment, either as modules or builtin. Having a way
+to display results like this is valuable as otherwise results can be
+intermixed with other events in dmesg output. The maximum size of each
+results file is KUNIT_LOG_SIZE bytes (defined in include/kunit/test.h).
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index a3216979298b..f46b05e9b96c 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -404,11 +404,8 @@ that is the "next" component in the pathname.
``int last_type``
~~~~~~~~~~~~~~~~~
-This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or
-``LAST_BIND``. The ``last`` field is only valid if the type is
-``LAST_NORM``. ``LAST_BIND`` is used when following a symlink and no
-components of the symlink have been processed yet. Others should be
-fairly self-explanatory.
+This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT`` or ``LAST_DOTDOT``.
+The ``last`` field is only valid if the type is ``LAST_NORM``.
``struct path root``
~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 95fec5968362..4e3e9362afeb 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -161,13 +161,11 @@ device must complete the update before the driver callback returns.
When the device driver wants to populate a range of virtual addresses, it can
use::
- long hmm_range_fault(struct hmm_range *range, unsigned int flags);
+ long hmm_range_fault(struct hmm_range *range);
-With the HMM_RANGE_SNAPSHOT flag, it will only fetch present CPU page table
-entries and will not trigger a page fault on missing or non-present entries.
-Without that flag, it does trigger a page fault on missing or read-only entries
-if write access is requested (see below). Page faults use the generic mm page
-fault code path just like a CPU page fault.
+It will trigger a page fault on missing or read-only entries if write access is
+requested (see below). Page faults use the generic mm page fault code path just
+like a CPU page fault.
Both functions copy CPU page table entries into their pfns array argument. Each
entry in that array corresponds to an address in the virtual range. HMM
@@ -197,7 +195,7 @@ The usage pattern is::
again:
range.notifier_seq = mmu_interval_read_begin(&interval_sub);
down_read(&mm->mmap_sem);
- ret = hmm_range_fault(&range, HMM_RANGE_SNAPSHOT);
+ ret = hmm_range_fault(&range);
if (ret) {
up_read(&mm->mmap_sem);
if (ret == -EBUSY)
diff --git a/MAINTAINERS b/MAINTAINERS
index aeb30bf3f429..7b2eb8a78d25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14195,6 +14195,7 @@ S: Supported
F: arch/x86/kernel/cpu/resctrl/
F: arch/x86/include/asm/resctrl_sched.h
F: Documentation/x86/resctrl*
+F: tools/testing/selftests/resctrl/
READ-COPY UPDATE (RCU)
M: "Paul E. McKenney"
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 79b1202b1c62..f44f6b27950f 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -563,6 +563,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
mig.end = end;
mig.src = &src_pfn;
mig.dst = &dst_pfn;
+ mig.src_owner = &kvmppc_uvmem_pgmap;
mutex_lock(&kvm->arch.uvmem_lock);
/* The requested page is already paged-out, nothing to do */
@@ -779,6 +780,8 @@ int kvmppc_uvmem_init(void)
kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE;
kvmppc_uvmem_pgmap.res = *res;
kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops;
+ /* just one global instance: */
+ kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap;
addr = memremap_pages(&kvmppc_uvmem_pgmap, NUMA_NO_NODE);
if (IS_ERR(addr)) {
ret = PTR_ERR(addr);
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index b80a1d616e4e..30575bd92975 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -36,6 +36,8 @@
#include "mconsole_kern.h"
#include
+static struct vfsmount *proc_mnt = NULL;
+
static int do_unlink_socket(struct notifier_block *notifier,
unsigned long what, void *data)
{
@@ -123,7 +125,7 @@ void mconsole_log(struct mc_request *req)
void mconsole_proc(struct mc_request *req)
{
- struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
+ struct vfsmount *mnt = proc_mnt;
char *buf;
int len;
struct file *file;
@@ -134,6 +136,10 @@ void mconsole_proc(struct mc_request *req)
ptr += strlen("proc");
ptr = skip_spaces(ptr);
+ if (!mnt) {
+ mconsole_reply(req, "Proc not available", 1, 0);
+ goto out;
+ }
file = file_open_root(mnt->mnt_root, mnt, ptr, O_RDONLY, 0);
if (IS_ERR(file)) {
mconsole_reply(req, "Failed to open file", 1, 0);
@@ -683,6 +689,24 @@ void mconsole_stack(struct mc_request *req)
with_console(req, stack_proc, to);
}
+static int __init mount_proc(void)
+{
+ struct file_system_type *proc_fs_type;
+ struct vfsmount *mnt;
+
+ proc_fs_type = get_fs_type("proc");
+ if (!proc_fs_type)
+ return -ENODEV;
+
+ mnt = kern_mount(proc_fs_type);
+ put_filesystem(proc_fs_type);
+ if (IS_ERR(mnt))
+ return PTR_ERR(mnt);
+
+ proc_mnt = mnt;
+ return 0;
+}
+
/*
* Changed by mconsole_setup, which is __setup, and called before SMP is
* active.
@@ -696,6 +720,8 @@ static int __init mconsole_init(void)
int err;
char file[UNIX_PATH_MAX];
+ mount_proc();
+
if (umid_file_name("mconsole", file, sizeof(file)))
return -1;
snprintf(mconsole_socket_name, sizeof(file), "%s", file);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9f44ba7d9d97..6309ff72bd78 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -770,7 +770,6 @@ struct amdgpu_ttm_tt {
static const uint64_t hmm_range_flags[HMM_PFN_FLAG_MAX] = {
(1 << 0), /* HMM_PFN_VALID */
(1 << 1), /* HMM_PFN_WRITE */
- 0 /* HMM_PFN_DEVICE_PRIVATE */
};
static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = {
@@ -851,7 +850,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
range->notifier_seq = mmu_interval_read_begin(&bo->notifier);
down_read(&mm->mmap_sem);
- r = hmm_range_fault(range, 0);
+ r = hmm_range_fault(range);
up_read(&mm->mmap_sem);
if (unlikely(r <= 0)) {
/*
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 0ad5d87b5a8e..ad89e09a0be3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -28,6 +28,7 @@
#include
#include
+#include
#include
#include
@@ -176,6 +177,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
.end = vmf->address + PAGE_SIZE,
.src = &src,
.dst = &dst,
+ .src_owner = drm->dev,
};
/*
@@ -526,6 +528,7 @@ nouveau_dmem_init(struct nouveau_drm *drm)
drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
drm->dmem->pagemap.res = *res;
drm->dmem->pagemap.ops = &nouveau_dmem_pagemap_ops;
+ drm->dmem->pagemap.owner = drm->dev;
if (IS_ERR(devm_memremap_pages(device, &drm->dmem->pagemap)))
goto out_free;
@@ -669,12 +672,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
return ret;
}
-static inline bool
-nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
-{
- return is_device_private_page(page) && drm->dmem == page_to_dmem(page);
-}
-
void
nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
struct hmm_range *range)
@@ -690,18 +687,12 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
if (page == NULL)
continue;
- if (!(range->pfns[i] & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
+ if (!is_device_private_page(page))
continue;
- }
-
- if (!nouveau_dmem_page(drm, page)) {
- WARN(1, "Some unknown device memory !\n");
- range->pfns[i] = 0;
- continue;
- }
addr = nouveau_dmem_page_addr(page);
range->pfns[i] &= ((1UL << range->pfn_shift) - 1);
range->pfns[i] |= (addr >> PAGE_SHIFT) << range->pfn_shift;
+ range->pfns[i] |= NVIF_VMM_PFNMAP_V0_VRAM;
}
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index df9bf1fd1bc0..e3797b2d4d17 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -367,7 +367,6 @@ static const u64
nouveau_svm_pfn_flags[HMM_PFN_FLAG_MAX] = {
[HMM_PFN_VALID ] = NVIF_VMM_PFNMAP_V0_V,
[HMM_PFN_WRITE ] = NVIF_VMM_PFNMAP_V0_W,
- [HMM_PFN_DEVICE_PRIVATE] = NVIF_VMM_PFNMAP_V0_VRAM,
};
static const u64
@@ -541,7 +540,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
range.default_flags = 0;
range.pfn_flags_mask = -1UL;
down_read(&mm->mmap_sem);
- ret = hmm_range_fault(&range, 0);
+ ret = hmm_range_fault(&range);
up_read(&mm->mmap_sem);
if (ret <= 0) {
if (ret == 0 || ret == -EBUSY)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 17bfedd24cc3..717b798cddad 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -46,7 +46,7 @@
struct ib_pkey_cache {
int table_len;
- u16 table[0];
+ u16 table[];
};
struct ib_update_work {
@@ -972,6 +972,23 @@ int rdma_query_gid(struct ib_device *device, u8 port_num,
}
EXPORT_SYMBOL(rdma_query_gid);
+/**
+ * rdma_read_gid_hw_context - Read the HW GID context from GID attribute
+ * @attr: Potinter to the GID attribute
+ *
+ * rdma_read_gid_hw_context() reads the drivers GID HW context corresponding
+ * to the SGID attr. Callers are required to already be holding the reference
+ * to an existing GID entry.
+ *
+ * Returns the HW GID context
+ *
+ */
+void *rdma_read_gid_hw_context(const struct ib_gid_attr *attr)
+{
+ return container_of(attr, struct ib_gid_table_entry, attr)->context;
+}
+EXPORT_SYMBOL(rdma_read_gid_hw_context);
+
/**
* rdma_find_gid - Returns SGID attributes if the matching GID is found.
* @device: The device to query.
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 15e99a888427..4794113ecd59 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -80,8 +80,19 @@ const char *__attribute_const__ ibcm_reject_msg(int reason)
}
EXPORT_SYMBOL(ibcm_reject_msg);
+struct cm_id_private;
static void cm_add_one(struct ib_device *device);
static void cm_remove_one(struct ib_device *device, void *client_data);
+static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
+ struct ib_cm_sidr_rep_param *param);
+static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
+ const void *private_data, u8 private_data_len);
+static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
+ void *private_data, u8 private_data_len);
+static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
+ enum ib_cm_rej_reason reason, void *ari,
+ u8 ari_length, const void *private_data,
+ u8 private_data_len);
static struct ib_client cm_client = {
.name = "cm",
@@ -197,7 +208,7 @@ struct cm_device {
struct ib_device *ib_device;
u8 ack_delay;
int going_down;
- struct cm_port *port[0];
+ struct cm_port *port[];
};
struct cm_av {
@@ -216,7 +227,7 @@ struct cm_work {
__be32 local_id; /* Established / timewait */
__be32 remote_id;
struct ib_cm_event cm_event;
- struct sa_path_rec path[0];
+ struct sa_path_rec path[];
};
struct cm_timewait_info {
@@ -261,7 +272,6 @@ struct cm_id_private {
__be16 pkey;
u8 private_data_len;
u8 max_cm_retries;
- u8 peer_to_peer;
u8 responder_resources;
u8 initiator_depth;
u8 retry_count;
@@ -572,18 +582,6 @@ static int cm_init_av_by_path(struct sa_path_rec *path,
return 0;
}
-static int cm_alloc_id(struct cm_id_private *cm_id_priv)
-{
- int err;
- u32 id;
-
- err = xa_alloc_cyclic_irq(&cm.local_id_table, &id, cm_id_priv,
- xa_limit_32b, &cm.local_id_next, GFP_KERNEL);
-
- cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
- return err;
-}
-
static u32 cm_local_id(__be32 local_id)
{
return (__force u32) (local_id ^ cm.random_id_operand);
@@ -633,22 +631,44 @@ static int be64_gt(__be64 a, __be64 b)
return (__force u64) a > (__force u64) b;
}
-static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
+/*
+ * Inserts a new cm_id_priv into the listen_service_table. Returns cm_id_priv
+ * if the new ID was inserted, NULL if it could not be inserted due to a
+ * collision, or the existing cm_id_priv ready for shared usage.
+ */
+static struct cm_id_private *cm_insert_listen(struct cm_id_private *cm_id_priv,
+ ib_cm_handler shared_handler)
{
struct rb_node **link = &cm.listen_service_table.rb_node;
struct rb_node *parent = NULL;
struct cm_id_private *cur_cm_id_priv;
__be64 service_id = cm_id_priv->id.service_id;
__be64 service_mask = cm_id_priv->id.service_mask;
+ unsigned long flags;
+ spin_lock_irqsave(&cm.lock, flags);
while (*link) {
parent = *link;
cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
service_node);
if ((cur_cm_id_priv->id.service_mask & service_id) ==
(service_mask & cur_cm_id_priv->id.service_id) &&
- (cm_id_priv->id.device == cur_cm_id_priv->id.device))
+ (cm_id_priv->id.device == cur_cm_id_priv->id.device)) {
+ /*
+ * Sharing an ib_cm_id with different handlers is not
+ * supported
+ */
+ if (cur_cm_id_priv->id.cm_handler != shared_handler ||
+ cur_cm_id_priv->id.context ||
+ WARN_ON(!cur_cm_id_priv->id.cm_handler)) {
+ spin_unlock_irqrestore(&cm.lock, flags);
+ return NULL;
+ }
+ refcount_inc(&cur_cm_id_priv->refcount);
+ cur_cm_id_priv->listen_sharecount++;
+ spin_unlock_irqrestore(&cm.lock, flags);
return cur_cm_id_priv;
+ }
if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
link = &(*link)->rb_left;
@@ -661,9 +681,11 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
else
link = &(*link)->rb_right;
}
+ cm_id_priv->listen_sharecount++;
rb_link_node(&cm_id_priv->service_node, parent, link);
rb_insert_color(&cm_id_priv->service_node, &cm.listen_service_table);
- return NULL;
+ spin_unlock_irqrestore(&cm.lock, flags);
+ return cm_id_priv;
}
static struct cm_id_private * cm_find_listen(struct ib_device *device,
@@ -810,21 +832,12 @@ static struct cm_id_private * cm_insert_remote_sidr(struct cm_id_private
return NULL;
}
-static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
- enum ib_cm_sidr_status status)
-{
- struct ib_cm_sidr_rep_param param;
-
- memset(¶m, 0, sizeof param);
- param.status = status;
- ib_send_cm_sidr_rep(&cm_id_priv->id, ¶m);
-}
-
-struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
- ib_cm_handler cm_handler,
- void *context)
+static struct cm_id_private *cm_alloc_id_priv(struct ib_device *device,
+ ib_cm_handler cm_handler,
+ void *context)
{
struct cm_id_private *cm_id_priv;
+ u32 id;
int ret;
cm_id_priv = kzalloc(sizeof *cm_id_priv, GFP_KERNEL);
@@ -836,10 +849,9 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
cm_id_priv->id.cm_handler = cm_handler;
cm_id_priv->id.context = context;
cm_id_priv->id.remote_cm_qpn = 1;
- ret = cm_alloc_id(cm_id_priv);
- if (ret)
- goto error;
+ RB_CLEAR_NODE(&cm_id_priv->service_node);
+ RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
spin_lock_init(&cm_id_priv->lock);
init_completion(&cm_id_priv->comp);
INIT_LIST_HEAD(&cm_id_priv->work_list);
@@ -847,11 +859,42 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
INIT_LIST_HEAD(&cm_id_priv->altr_list);
atomic_set(&cm_id_priv->work_count, -1);
refcount_set(&cm_id_priv->refcount, 1);
- return &cm_id_priv->id;
+
+ ret = xa_alloc_cyclic_irq(&cm.local_id_table, &id, NULL, xa_limit_32b,
+ &cm.local_id_next, GFP_KERNEL);
+ if (ret)
+ goto error;
+ cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
+
+ return cm_id_priv;
error:
kfree(cm_id_priv);
- return ERR_PTR(-ENOMEM);
+ return ERR_PTR(ret);
+}
+
+/*
+ * Make the ID visible to the MAD handlers and other threads that use the
+ * xarray.
+ */
+static void cm_finalize_id(struct cm_id_private *cm_id_priv)
+{
+ xa_store_irq(&cm.local_id_table, cm_local_id(cm_id_priv->id.local_id),
+ cm_id_priv, GFP_KERNEL);
+}
+
+struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
+ ib_cm_handler cm_handler,
+ void *context)
+{
+ struct cm_id_private *cm_id_priv;
+
+ cm_id_priv = cm_alloc_id_priv(device, cm_handler, context);
+ if (IS_ERR(cm_id_priv))
+ return ERR_CAST(cm_id_priv);
+
+ cm_finalize_id(cm_id_priv);
+ return &cm_id_priv->id;
}
EXPORT_SYMBOL(ib_create_cm_id);
@@ -932,6 +975,8 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
unsigned long flags;
struct cm_device *cm_dev;
+ lockdep_assert_held(&cm_id_priv->lock);
+
cm_dev = ib_get_client_data(cm_id_priv->id.device, &cm_client);
if (!cm_dev)
return;
@@ -963,6 +1008,8 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
{
unsigned long flags;
+ lockdep_assert_held(&cm_id_priv->lock);
+
cm_id_priv->id.state = IB_CM_IDLE;
if (cm_id_priv->timewait_info) {
spin_lock_irqsave(&cm.lock, flags);
@@ -979,54 +1026,51 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
struct cm_work *work;
cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-retest:
spin_lock_irq(&cm_id_priv->lock);
+retest:
switch (cm_id->state) {
case IB_CM_LISTEN:
- spin_unlock_irq(&cm_id_priv->lock);
-
- spin_lock_irq(&cm.lock);
+ spin_lock(&cm.lock);
if (--cm_id_priv->listen_sharecount > 0) {
/* The id is still shared. */
+ WARN_ON(refcount_read(&cm_id_priv->refcount) == 1);
+ spin_unlock(&cm.lock);
+ spin_unlock_irq(&cm_id_priv->lock);
cm_deref_id(cm_id_priv);
- spin_unlock_irq(&cm.lock);
return;
}
+ cm_id->state = IB_CM_IDLE;
rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
- spin_unlock_irq(&cm.lock);
+ RB_CLEAR_NODE(&cm_id_priv->service_node);
+ spin_unlock(&cm.lock);
break;
case IB_CM_SIDR_REQ_SENT:
cm_id->state = IB_CM_IDLE;
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
- spin_unlock_irq(&cm_id_priv->lock);
break;
case IB_CM_SIDR_REQ_RCVD:
- spin_unlock_irq(&cm_id_priv->lock);
- cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
- spin_lock_irq(&cm.lock);
- if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
- rb_erase(&cm_id_priv->sidr_id_node,
- &cm.remote_sidr_table);
- spin_unlock_irq(&cm.lock);
+ cm_send_sidr_rep_locked(cm_id_priv,
+ &(struct ib_cm_sidr_rep_param){
+ .status = IB_SIDR_REJECT });
+ /* cm_send_sidr_rep_locked will not move to IDLE if it fails */
+ cm_id->state = IB_CM_IDLE;
break;
case IB_CM_REQ_SENT:
case IB_CM_MRA_REQ_RCVD:
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
- spin_unlock_irq(&cm_id_priv->lock);
- ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
- &cm_id_priv->id.device->node_guid,
- sizeof cm_id_priv->id.device->node_guid,
- NULL, 0);
+ cm_send_rej_locked(cm_id_priv, IB_CM_REJ_TIMEOUT,
+ &cm_id_priv->id.device->node_guid,
+ sizeof(cm_id_priv->id.device->node_guid),
+ NULL, 0);
break;
case IB_CM_REQ_RCVD:
if (err == -ENOMEM) {
/* Do not reject to allow future retries. */
cm_reset_to_idle(cm_id_priv);
- spin_unlock_irq(&cm_id_priv->lock);
} else {
- spin_unlock_irq(&cm_id_priv->lock);
- ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
- NULL, 0, NULL, 0);
+ cm_send_rej_locked(cm_id_priv,
+ IB_CM_REJ_CONSUMER_DEFINED, NULL, 0,
+ NULL, 0);
}
break;
case IB_CM_REP_SENT:
@@ -1036,38 +1080,56 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
case IB_CM_MRA_REQ_SENT:
case IB_CM_REP_RCVD:
case IB_CM_MRA_REP_SENT:
- spin_unlock_irq(&cm_id_priv->lock);
- ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
- NULL, 0, NULL, 0);
+ cm_send_rej_locked(cm_id_priv, IB_CM_REJ_CONSUMER_DEFINED, NULL,
+ 0, NULL, 0);
break;
case IB_CM_ESTABLISHED:
- spin_unlock_irq(&cm_id_priv->lock);
- if (cm_id_priv->qp_type == IB_QPT_XRC_TGT)
+ if (cm_id_priv->qp_type == IB_QPT_XRC_TGT) {
+ cm_id->state = IB_CM_IDLE;
break;
- ib_send_cm_dreq(cm_id, NULL, 0);
+ }
+ cm_send_dreq_locked(cm_id_priv, NULL, 0);
goto retest;
case IB_CM_DREQ_SENT:
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
cm_enter_timewait(cm_id_priv);
- spin_unlock_irq(&cm_id_priv->lock);
- break;
+ goto retest;
case IB_CM_DREQ_RCVD:
- spin_unlock_irq(&cm_id_priv->lock);
- ib_send_cm_drep(cm_id, NULL, 0);
+ cm_send_drep_locked(cm_id_priv, NULL, 0);
+ WARN_ON(cm_id->state != IB_CM_TIMEWAIT);
+ goto retest;
+ case IB_CM_TIMEWAIT:
+ /*
+ * The cm_acquire_id in cm_timewait_handler will stop working
+ * once we do cm_free_id() below, so just move to idle here for
+ * consistency.
+ */
+ cm_id->state = IB_CM_IDLE;
break;
- default:
- spin_unlock_irq(&cm_id_priv->lock);
+ case IB_CM_IDLE:
break;
}
+ WARN_ON(cm_id->state != IB_CM_IDLE);
- spin_lock_irq(&cm.lock);
+ spin_lock(&cm.lock);
+ /* Required for cleanup paths related cm_req_handler() */
+ if (cm_id_priv->timewait_info) {
+ cm_cleanup_timewait(cm_id_priv->timewait_info);
+ kfree(cm_id_priv->timewait_info);
+ cm_id_priv->timewait_info = NULL;
+ }
if (!list_empty(&cm_id_priv->altr_list) &&
(!cm_id_priv->altr_send_port_not_ready))
list_del(&cm_id_priv->altr_list);
if (!list_empty(&cm_id_priv->prim_list) &&
(!cm_id_priv->prim_send_port_not_ready))
list_del(&cm_id_priv->prim_list);
- spin_unlock_irq(&cm.lock);
+ WARN_ON(cm_id_priv->listen_sharecount);
+ WARN_ON(!RB_EMPTY_NODE(&cm_id_priv->service_node));
+ if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
+ rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
+ spin_unlock(&cm.lock);
+ spin_unlock_irq(&cm_id_priv->lock);
cm_free_id(cm_id->local_id);
cm_deref_id(cm_id_priv);
@@ -1087,8 +1149,27 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
}
EXPORT_SYMBOL(ib_destroy_cm_id);
+static int cm_init_listen(struct cm_id_private *cm_id_priv, __be64 service_id,
+ __be64 service_mask)
+{
+ service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
+ service_id &= service_mask;
+ if ((service_id & IB_SERVICE_ID_AGN_MASK) == IB_CM_ASSIGN_SERVICE_ID &&
+ (service_id != IB_CM_ASSIGN_SERVICE_ID))
+ return -EINVAL;
+
+ if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
+ cm_id_priv->id.service_id = cpu_to_be64(cm.listen_service_id++);
+ cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+ } else {
+ cm_id_priv->id.service_id = service_id;
+ cm_id_priv->id.service_mask = service_mask;
+ }
+ return 0;
+}
+
/**
- * __ib_cm_listen - Initiates listening on the specified service ID for
+ * ib_cm_listen - Initiates listening on the specified service ID for
* connection and service ID resolution requests.
* @cm_id: Connection identifier associated with the listen request.
* @service_id: Service identifier matched against incoming connection
@@ -1100,51 +1181,33 @@ EXPORT_SYMBOL(ib_destroy_cm_id);
* exactly. This parameter is ignored if %service_id is set to
* IB_CM_ASSIGN_SERVICE_ID.
*/
-static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
- __be64 service_mask)
-{
- struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
- int ret = 0;
-
- service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
- service_id &= service_mask;
- if ((service_id & IB_SERVICE_ID_AGN_MASK) == IB_CM_ASSIGN_SERVICE_ID &&
- (service_id != IB_CM_ASSIGN_SERVICE_ID))
- return -EINVAL;
-
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
- if (cm_id->state != IB_CM_IDLE)
- return -EINVAL;
-
- cm_id->state = IB_CM_LISTEN;
- ++cm_id_priv->listen_sharecount;
-
- if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
- cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
- cm_id->service_mask = ~cpu_to_be64(0);
- } else {
- cm_id->service_id = service_id;
- cm_id->service_mask = service_mask;
- }
- cur_cm_id_priv = cm_insert_listen(cm_id_priv);
-
- if (cur_cm_id_priv) {
- cm_id->state = IB_CM_IDLE;
- --cm_id_priv->listen_sharecount;
- ret = -EBUSY;
- }
- return ret;
-}
-
int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
unsigned long flags;
int ret;
- spin_lock_irqsave(&cm.lock, flags);
- ret = __ib_cm_listen(cm_id, service_id, service_mask);
- spin_unlock_irqrestore(&cm.lock, flags);
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
+ if (cm_id_priv->id.state != IB_CM_IDLE) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = cm_init_listen(cm_id_priv, service_id, service_mask);
+ if (ret)
+ goto out;
+
+ if (!cm_insert_listen(cm_id_priv, NULL)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ cm_id_priv->id.state = IB_CM_LISTEN;
+ ret = 0;
+
+out:
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_cm_listen);
@@ -1169,51 +1232,38 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
ib_cm_handler cm_handler,
__be64 service_id)
{
+ struct cm_id_private *listen_id_priv;
struct cm_id_private *cm_id_priv;
- struct ib_cm_id *cm_id;
- unsigned long flags;
int err = 0;
/* Create an ID in advance, since the creation may sleep */
- cm_id = ib_create_cm_id(device, cm_handler, NULL);
- if (IS_ERR(cm_id))
- return cm_id;
+ cm_id_priv = cm_alloc_id_priv(device, cm_handler, NULL);
+ if (IS_ERR(cm_id_priv))
+ return ERR_CAST(cm_id_priv);
- spin_lock_irqsave(&cm.lock, flags);
-
- if (service_id == IB_CM_ASSIGN_SERVICE_ID)
- goto new_id;
-
- /* Find an existing ID */
- cm_id_priv = cm_find_listen(device, service_id);
- if (cm_id_priv) {
- if (cm_id->cm_handler != cm_handler || cm_id->context) {
- /* Sharing an ib_cm_id with different handlers is not
- * supported */
- spin_unlock_irqrestore(&cm.lock, flags);
- ib_destroy_cm_id(cm_id);
- return ERR_PTR(-EINVAL);
- }
- refcount_inc(&cm_id_priv->refcount);
- ++cm_id_priv->listen_sharecount;
- spin_unlock_irqrestore(&cm.lock, flags);
-
- ib_destroy_cm_id(cm_id);
- cm_id = &cm_id_priv->id;
- return cm_id;
- }
-
-new_id:
- /* Use newly created ID */
- err = __ib_cm_listen(cm_id, service_id, 0);
-
- spin_unlock_irqrestore(&cm.lock, flags);
-
- if (err) {
- ib_destroy_cm_id(cm_id);
+ err = cm_init_listen(cm_id_priv, service_id, 0);
+ if (err)
return ERR_PTR(err);
+
+ spin_lock_irq(&cm_id_priv->lock);
+ listen_id_priv = cm_insert_listen(cm_id_priv, cm_handler);
+ if (listen_id_priv != cm_id_priv) {
+ spin_unlock_irq(&cm_id_priv->lock);
+ ib_destroy_cm_id(&cm_id_priv->id);
+ if (!listen_id_priv)
+ return ERR_PTR(-EINVAL);
+ return &listen_id_priv->id;
}
- return cm_id;
+ cm_id_priv->id.state = IB_CM_LISTEN;
+ spin_unlock_irq(&cm_id_priv->lock);
+
+ /*
+ * A listen ID does not need to be in the xarray since it does not
+ * receive mads, is not placed in the remote_id or remote_qpn rbtree,
+ * and does not enter timewait.
+ */
+
+ return &cm_id_priv->id;
}
EXPORT_SYMBOL(ib_cm_insert_listen);
@@ -1381,10 +1431,6 @@ static void cm_format_req(struct cm_req_msg *req_msg,
static int cm_validate_req_param(struct ib_cm_req_param *param)
{
- /* peer-to-peer not supported */
- if (param->peer_to_peer)
- return -EINVAL;
-
if (!param->primary_path)
return -EINVAL;
@@ -1419,7 +1465,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
/* Verify that we're not in timewait. */
cm_id_priv = container_of(cm_id, struct cm_id_private, id);
spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id->state != IB_CM_IDLE) {
+ if (cm_id->state != IB_CM_IDLE || WARN_ON(cm_id_priv->timewait_info)) {
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
ret = -EINVAL;
goto out;
@@ -1437,12 +1483,12 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
param->ppath_sgid_attr, &cm_id_priv->av,
cm_id_priv);
if (ret)
- goto error1;
+ goto out;
if (param->alternate_path) {
ret = cm_init_av_by_path(param->alternate_path, NULL,
&cm_id_priv->alt_av, cm_id_priv);
if (ret)
- goto error1;
+ goto out;
}
cm_id->service_id = param->service_id;
cm_id->service_mask = ~cpu_to_be64(0);
@@ -1460,7 +1506,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
if (ret)
- goto error1;
+ goto out;
req_msg = (struct cm_req_msg *) cm_id_priv->msg->mad;
cm_format_req(req_msg, cm_id_priv, param);
@@ -1483,7 +1529,6 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
return 0;
error2: cm_free_msg(cm_id_priv->msg);
-error1: kfree(cm_id_priv->timewait_info);
out: return ret;
}
EXPORT_SYMBOL(ib_send_cm_req);
@@ -1789,6 +1834,8 @@ static void cm_format_rej(struct cm_rej_msg *rej_msg,
const void *private_data,
u8 private_data_len)
{
+ lockdep_assert_held(&cm_id_priv->lock);
+
cm_format_mad_hdr(&rej_msg->hdr, CM_REJ_ATTR_ID, cm_id_priv->tid);
IBA_SET(CM_REJ_REMOTE_COMM_ID, rej_msg,
be32_to_cpu(cm_id_priv->id.remote_id));
@@ -1838,8 +1885,12 @@ static void cm_dup_req_handler(struct cm_work *work,
counter[CM_REQ_COUNTER]);
/* Quick state check to discard duplicate REQs. */
- if (cm_id_priv->id.state == IB_CM_REQ_RCVD)
+ spin_lock_irq(&cm_id_priv->lock);
+ if (cm_id_priv->id.state == IB_CM_REQ_RCVD) {
+ spin_unlock_irq(&cm_id_priv->lock);
return;
+ }
+ spin_unlock_irq(&cm_id_priv->lock);
ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg);
if (ret)
@@ -1924,14 +1975,10 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
cm_issue_rej(work->port, work->mad_recv_wc,
IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ,
NULL, 0);
- goto out;
+ return NULL;
}
refcount_inc(&listen_cm_id_priv->refcount);
- refcount_inc(&cm_id_priv->refcount);
- cm_id_priv->id.state = IB_CM_REQ_RCVD;
- atomic_inc(&cm_id_priv->work_count);
spin_unlock_irq(&cm.lock);
-out:
return listen_cm_id_priv;
}
@@ -1973,7 +2020,6 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
static int cm_req_handler(struct cm_work *work)
{
- struct ib_cm_id *cm_id;
struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_req_msg *req_msg;
const struct ib_global_route *grh;
@@ -1982,13 +2028,33 @@ static int cm_req_handler(struct cm_work *work)
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
- cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
- if (IS_ERR(cm_id))
- return PTR_ERR(cm_id);
+ cm_id_priv =
+ cm_alloc_id_priv(work->port->cm_dev->ib_device, NULL, NULL);
+ if (IS_ERR(cm_id_priv))
+ return PTR_ERR(cm_id_priv);
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
cm_id_priv->id.remote_id =
cpu_to_be32(IBA_GET(CM_REQ_LOCAL_COMM_ID, req_msg));
+ cm_id_priv->id.service_id =
+ cpu_to_be64(IBA_GET(CM_REQ_SERVICE_ID, req_msg));
+ cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+ cm_id_priv->tid = req_msg->hdr.tid;
+ cm_id_priv->timeout_ms = cm_convert_to_ms(
+ IBA_GET(CM_REQ_LOCAL_CM_RESPONSE_TIMEOUT, req_msg));
+ cm_id_priv->max_cm_retries = IBA_GET(CM_REQ_MAX_CM_RETRIES, req_msg);
+ cm_id_priv->remote_qpn =
+ cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
+ cm_id_priv->initiator_depth =
+ IBA_GET(CM_REQ_RESPONDER_RESOURCES, req_msg);
+ cm_id_priv->responder_resources =
+ IBA_GET(CM_REQ_INITIATOR_DEPTH, req_msg);
+ cm_id_priv->path_mtu = IBA_GET(CM_REQ_PATH_PACKET_PAYLOAD_MTU, req_msg);
+ cm_id_priv->pkey = cpu_to_be16(IBA_GET(CM_REQ_PARTITION_KEY, req_msg));
+ cm_id_priv->sq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
+ cm_id_priv->retry_count = IBA_GET(CM_REQ_RETRY_COUNT, req_msg);
+ cm_id_priv->rnr_retry_count = IBA_GET(CM_REQ_RNR_RETRY_COUNT, req_msg);
+ cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
+
ret = cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
work->mad_recv_wc->recv_buf.grh,
&cm_id_priv->av);
@@ -2000,27 +2066,26 @@ static int cm_req_handler(struct cm_work *work)
ret = PTR_ERR(cm_id_priv->timewait_info);
goto destroy;
}
- cm_id_priv->timewait_info->work.remote_id =
- cpu_to_be32(IBA_GET(CM_REQ_LOCAL_COMM_ID, req_msg));
+ cm_id_priv->timewait_info->work.remote_id = cm_id_priv->id.remote_id;
cm_id_priv->timewait_info->remote_ca_guid =
cpu_to_be64(IBA_GET(CM_REQ_LOCAL_CA_GUID, req_msg));
- cm_id_priv->timewait_info->remote_qpn =
- cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
+ cm_id_priv->timewait_info->remote_qpn = cm_id_priv->remote_qpn;
+
+ /*
+ * Note that the ID pointer is not in the xarray at this point,
+ * so this set is only visible to the local thread.
+ */
+ cm_id_priv->id.state = IB_CM_REQ_RCVD;
listen_cm_id_priv = cm_match_req(work, cm_id_priv);
if (!listen_cm_id_priv) {
pr_debug("%s: local_id %d, no listen_cm_id_priv\n", __func__,
- be32_to_cpu(cm_id->local_id));
+ be32_to_cpu(cm_id_priv->id.local_id));
+ cm_id_priv->id.state = IB_CM_IDLE;
ret = -EINVAL;
- goto free_timeinfo;
+ goto destroy;
}
- cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
- cm_id_priv->id.context = listen_cm_id_priv->id.context;
- cm_id_priv->id.service_id =
- cpu_to_be64(IBA_GET(CM_REQ_SERVICE_ID, req_msg));
- cm_id_priv->id.service_mask = ~cpu_to_be64(0);
-
cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
memset(&work->path[0], 0, sizeof(work->path[0]));
@@ -2058,10 +2123,10 @@ static int cm_req_handler(struct cm_work *work)
work->port->port_num, 0,
&work->path[0].sgid);
if (err)
- ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+ ib_send_cm_rej(&cm_id_priv->id, IB_CM_REJ_INVALID_GID,
NULL, 0, NULL, 0);
else
- ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+ ib_send_cm_rej(&cm_id_priv->id, IB_CM_REJ_INVALID_GID,
&work->path[0].sgid,
sizeof(work->path[0].sgid),
NULL, 0);
@@ -2071,41 +2136,40 @@ static int cm_req_handler(struct cm_work *work)
ret = cm_init_av_by_path(&work->path[1], NULL,
&cm_id_priv->alt_av, cm_id_priv);
if (ret) {
- ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
+ ib_send_cm_rej(&cm_id_priv->id,
+ IB_CM_REJ_INVALID_ALT_GID,
&work->path[0].sgid,
sizeof(work->path[0].sgid), NULL, 0);
goto rejected;
}
}
- cm_id_priv->tid = req_msg->hdr.tid;
- cm_id_priv->timeout_ms = cm_convert_to_ms(
- IBA_GET(CM_REQ_LOCAL_CM_RESPONSE_TIMEOUT, req_msg));
- cm_id_priv->max_cm_retries = IBA_GET(CM_REQ_MAX_CM_RETRIES, req_msg);
- cm_id_priv->remote_qpn =
- cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
- cm_id_priv->initiator_depth =
- IBA_GET(CM_REQ_RESPONDER_RESOURCES, req_msg);
- cm_id_priv->responder_resources =
- IBA_GET(CM_REQ_INITIATOR_DEPTH, req_msg);
- cm_id_priv->path_mtu = IBA_GET(CM_REQ_PATH_PACKET_PAYLOAD_MTU, req_msg);
- cm_id_priv->pkey = cpu_to_be16(IBA_GET(CM_REQ_PARTITION_KEY, req_msg));
- cm_id_priv->sq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
- cm_id_priv->retry_count = IBA_GET(CM_REQ_RETRY_COUNT, req_msg);
- cm_id_priv->rnr_retry_count = IBA_GET(CM_REQ_RNR_RETRY_COUNT, req_msg);
- cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
+ cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
+ cm_id_priv->id.context = listen_cm_id_priv->id.context;
cm_format_req_event(work, cm_id_priv, &listen_cm_id_priv->id);
+
+ /* Now MAD handlers can see the new ID */
+ spin_lock_irq(&cm_id_priv->lock);
+ cm_finalize_id(cm_id_priv);
+
+ /* Refcount belongs to the event, pairs with cm_process_work() */
+ refcount_inc(&cm_id_priv->refcount);
+ atomic_inc(&cm_id_priv->work_count);
+ spin_unlock_irq(&cm_id_priv->lock);
cm_process_work(cm_id_priv, work);
+ /*
+ * Since this ID was just created and was not made visible to other MAD
+ * handlers until the cm_finalize_id() above we know that the
+ * cm_process_work() will deliver the event and the listen_cm_id
+ * embedded in the event can be derefed here.
+ */
cm_deref_id(listen_cm_id_priv);
return 0;
rejected:
- refcount_dec(&cm_id_priv->refcount);
cm_deref_id(listen_cm_id_priv);
-free_timeinfo:
- kfree(cm_id_priv->timewait_info);
destroy:
- ib_destroy_cm_id(cm_id);
+ ib_destroy_cm_id(&cm_id_priv->id);
return ret;
}
@@ -2189,6 +2253,9 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
cm_id_priv->initiator_depth = param->initiator_depth;
cm_id_priv->responder_resources = param->responder_resources;
cm_id_priv->rq_psn = cpu_to_be32(IBA_GET(CM_REP_STARTING_PSN, rep_msg));
+ WARN_ONCE(param->qp_num & 0xFF000000,
+ "IBTA declares QPN to be 24 bits, but it is 0x%X\n",
+ param->qp_num);
cm_id_priv->local_qpn = cpu_to_be32(param->qp_num & 0xFFFFFF);
out: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
@@ -2359,13 +2426,13 @@ static int cm_rep_handler(struct cm_work *work)
case IB_CM_MRA_REQ_RCVD:
break;
default:
- spin_unlock_irq(&cm_id_priv->lock);
ret = -EINVAL;
pr_debug(
"%s: cm_id_priv->id.state: %d, local_comm_id %d, remote_comm_id %d\n",
__func__, cm_id_priv->id.state,
IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
+ spin_unlock_irq(&cm_id_priv->lock);
goto error;
}
@@ -2434,8 +2501,6 @@ static int cm_rep_handler(struct cm_work *work)
cm_ack_timeout(cm_id_priv->target_ack_delay,
cm_id_priv->alt_av.timeout - 1);
- /* todo: handle peer_to_peer */
-
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
@@ -2546,35 +2611,32 @@ static void cm_format_dreq(struct cm_dreq_msg *dreq_msg,
private_data_len);
}
-int ib_send_cm_dreq(struct ib_cm_id *cm_id,
- const void *private_data,
- u8 private_data_len)
+static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
+ const void *private_data, u8 private_data_len)
{
- struct cm_id_private *cm_id_priv;
struct ib_mad_send_buf *msg;
- unsigned long flags;
int ret;
+ lockdep_assert_held(&cm_id_priv->lock);
+
if (private_data && private_data_len > IB_CM_DREQ_PRIVATE_DATA_SIZE)
return -EINVAL;
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
- spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id->state != IB_CM_ESTABLISHED) {
+ if (cm_id_priv->id.state != IB_CM_ESTABLISHED) {
pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
- be32_to_cpu(cm_id->local_id), cm_id->state);
- ret = -EINVAL;
- goto out;
+ be32_to_cpu(cm_id_priv->id.local_id),
+ cm_id_priv->id.state);
+ return -EINVAL;
}
- if (cm_id->lap_state == IB_CM_LAP_SENT ||
- cm_id->lap_state == IB_CM_MRA_LAP_RCVD)
+ if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT ||
+ cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD)
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = cm_alloc_msg(cm_id_priv, &msg);
if (ret) {
cm_enter_timewait(cm_id_priv);
- goto out;
+ return ret;
}
cm_format_dreq((struct cm_dreq_msg *) msg->mad, cm_id_priv,
@@ -2585,14 +2647,26 @@ int ib_send_cm_dreq(struct ib_cm_id *cm_id,
ret = ib_post_send_mad(msg, NULL);
if (ret) {
cm_enter_timewait(cm_id_priv);
- spin_unlock_irqrestore(&cm_id_priv->lock, flags);
cm_free_msg(msg);
return ret;
}
- cm_id->state = IB_CM_DREQ_SENT;
+ cm_id_priv->id.state = IB_CM_DREQ_SENT;
cm_id_priv->msg = msg;
-out: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ return 0;
+}
+
+int ib_send_cm_dreq(struct ib_cm_id *cm_id, const void *private_data,
+ u8 private_data_len)
+{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
+ ret = cm_send_dreq_locked(cm_id_priv, private_data, private_data_len);
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_send_cm_dreq);
@@ -2613,51 +2687,60 @@ static void cm_format_drep(struct cm_drep_msg *drep_msg,
private_data_len);
}
-int ib_send_cm_drep(struct ib_cm_id *cm_id,
- const void *private_data,
- u8 private_data_len)
+static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
+ void *private_data, u8 private_data_len)
{
- struct cm_id_private *cm_id_priv;
struct ib_mad_send_buf *msg;
- unsigned long flags;
- void *data;
int ret;
+ lockdep_assert_held(&cm_id_priv->lock);
+
if (private_data && private_data_len > IB_CM_DREP_PRIVATE_DATA_SIZE)
return -EINVAL;
- data = cm_copy_private_data(private_data, private_data_len);
- if (IS_ERR(data))
- return PTR_ERR(data);
-
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
- spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id->state != IB_CM_DREQ_RCVD) {
- spin_unlock_irqrestore(&cm_id_priv->lock, flags);
- kfree(data);
- pr_debug("%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
- __func__, be32_to_cpu(cm_id->local_id), cm_id->state);
+ if (cm_id_priv->id.state != IB_CM_DREQ_RCVD) {
+ pr_debug(
+ "%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
+ __func__, be32_to_cpu(cm_id_priv->id.local_id),
+ cm_id_priv->id.state);
+ kfree(private_data);
return -EINVAL;
}
- cm_set_private_data(cm_id_priv, data, private_data_len);
+ cm_set_private_data(cm_id_priv, private_data, private_data_len);
cm_enter_timewait(cm_id_priv);
ret = cm_alloc_msg(cm_id_priv, &msg);
if (ret)
- goto out;
+ return ret;
cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
private_data, private_data_len);
ret = ib_post_send_mad(msg, NULL);
if (ret) {
- spin_unlock_irqrestore(&cm_id_priv->lock, flags);
cm_free_msg(msg);
return ret;
}
+ return 0;
+}
-out: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+int ib_send_cm_drep(struct ib_cm_id *cm_id, const void *private_data,
+ u8 private_data_len)
+{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
+ unsigned long flags;
+ void *data;
+ int ret;
+
+ data = cm_copy_private_data(private_data, private_data_len);
+ if (IS_ERR(data))
+ return PTR_ERR(data);
+
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
+ ret = cm_send_drep_locked(cm_id_priv, data, private_data_len);
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_send_cm_drep);
@@ -2816,65 +2899,72 @@ static int cm_drep_handler(struct cm_work *work)
return -EINVAL;
}
-int ib_send_cm_rej(struct ib_cm_id *cm_id,
- enum ib_cm_rej_reason reason,
- void *ari,
- u8 ari_length,
- const void *private_data,
- u8 private_data_len)
+static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
+ enum ib_cm_rej_reason reason, void *ari,
+ u8 ari_length, const void *private_data,
+ u8 private_data_len)
{
- struct cm_id_private *cm_id_priv;
struct ib_mad_send_buf *msg;
- unsigned long flags;
int ret;
+ lockdep_assert_held(&cm_id_priv->lock);
+
if ((private_data && private_data_len > IB_CM_REJ_PRIVATE_DATA_SIZE) ||
(ari && ari_length > IB_CM_REJ_ARI_LENGTH))
return -EINVAL;
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-
- spin_lock_irqsave(&cm_id_priv->lock, flags);
- switch (cm_id->state) {
+ switch (cm_id_priv->id.state) {
case IB_CM_REQ_SENT:
case IB_CM_MRA_REQ_RCVD:
case IB_CM_REQ_RCVD:
case IB_CM_MRA_REQ_SENT:
case IB_CM_REP_RCVD:
case IB_CM_MRA_REP_SENT:
- ret = cm_alloc_msg(cm_id_priv, &msg);
- if (!ret)
- cm_format_rej((struct cm_rej_msg *) msg->mad,
- cm_id_priv, reason, ari, ari_length,
- private_data, private_data_len);
-
cm_reset_to_idle(cm_id_priv);
+ ret = cm_alloc_msg(cm_id_priv, &msg);
+ if (ret)
+ return ret;
+ cm_format_rej((struct cm_rej_msg *)msg->mad, cm_id_priv, reason,
+ ari, ari_length, private_data, private_data_len);
break;
case IB_CM_REP_SENT:
case IB_CM_MRA_REP_RCVD:
- ret = cm_alloc_msg(cm_id_priv, &msg);
- if (!ret)
- cm_format_rej((struct cm_rej_msg *) msg->mad,
- cm_id_priv, reason, ari, ari_length,
- private_data, private_data_len);
-
cm_enter_timewait(cm_id_priv);
+ ret = cm_alloc_msg(cm_id_priv, &msg);
+ if (ret)
+ return ret;
+ cm_format_rej((struct cm_rej_msg *)msg->mad, cm_id_priv, reason,
+ ari, ari_length, private_data, private_data_len);
break;
default:
pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
- be32_to_cpu(cm_id_priv->id.local_id), cm_id->state);
- ret = -EINVAL;
- goto out;
+ be32_to_cpu(cm_id_priv->id.local_id),
+ cm_id_priv->id.state);
+ return -EINVAL;
}
- if (ret)
- goto out;
-
ret = ib_post_send_mad(msg, NULL);
- if (ret)
+ if (ret) {
cm_free_msg(msg);
+ return ret;
+ }
-out: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ return 0;
+}
+
+int ib_send_cm_rej(struct ib_cm_id *cm_id, enum ib_cm_rej_reason reason,
+ void *ari, u8 ari_length, const void *private_data,
+ u8 private_data_len)
+{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
+ ret = cm_send_rej_locked(cm_id_priv, reason, ari, ari_length,
+ private_data, private_data_len);
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_send_cm_rej);
@@ -2972,10 +3062,10 @@ static int cm_rej_handler(struct cm_work *work)
}
/* fall through */
default:
- spin_unlock_irq(&cm_id_priv->lock);
pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
__func__, be32_to_cpu(cm_id_priv->id.local_id),
cm_id_priv->id.state);
+ spin_unlock_irq(&cm_id_priv->lock);
ret = -EINVAL;
goto out;
}
@@ -3502,20 +3592,27 @@ static void cm_format_sidr_req_event(struct cm_work *work,
static int cm_sidr_req_handler(struct cm_work *work)
{
- struct ib_cm_id *cm_id;
- struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
+ struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_sidr_req_msg *sidr_req_msg;
struct ib_wc *wc;
int ret;
- cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
- if (IS_ERR(cm_id))
- return PTR_ERR(cm_id);
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
+ cm_id_priv =
+ cm_alloc_id_priv(work->port->cm_dev->ib_device, NULL, NULL);
+ if (IS_ERR(cm_id_priv))
+ return PTR_ERR(cm_id_priv);
/* Record SGID/SLID and request ID for lookup. */
sidr_req_msg = (struct cm_sidr_req_msg *)
work->mad_recv_wc->recv_buf.mad;
+
+ cm_id_priv->id.remote_id =
+ cpu_to_be32(IBA_GET(CM_SIDR_REQ_REQUESTID, sidr_req_msg));
+ cm_id_priv->id.service_id =
+ cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg));
+ cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+ cm_id_priv->tid = sidr_req_msg->hdr.tid;
+
wc = work->mad_recv_wc->wc;
cm_id_priv->av.dgid.global.subnet_prefix = cpu_to_be64(wc->slid);
cm_id_priv->av.dgid.global.interface_id = 0;
@@ -3525,41 +3622,46 @@ static int cm_sidr_req_handler(struct cm_work *work)
if (ret)
goto out;
- cm_id_priv->id.remote_id =
- cpu_to_be32(IBA_GET(CM_SIDR_REQ_REQUESTID, sidr_req_msg));
- cm_id_priv->tid = sidr_req_msg->hdr.tid;
- atomic_inc(&cm_id_priv->work_count);
-
spin_lock_irq(&cm.lock);
- cur_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
- if (cur_cm_id_priv) {
+ listen_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
+ if (listen_cm_id_priv) {
spin_unlock_irq(&cm.lock);
atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
counter[CM_SIDR_REQ_COUNTER]);
goto out; /* Duplicate message. */
}
cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
- cur_cm_id_priv = cm_find_listen(
- cm_id->device,
- cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg)));
- if (!cur_cm_id_priv) {
+ listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
+ cm_id_priv->id.service_id);
+ if (!listen_cm_id_priv) {
spin_unlock_irq(&cm.lock);
- cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
+ ib_send_cm_sidr_rep(&cm_id_priv->id,
+ &(struct ib_cm_sidr_rep_param){
+ .status = IB_SIDR_UNSUPPORTED });
goto out; /* No match. */
}
- refcount_inc(&cur_cm_id_priv->refcount);
- refcount_inc(&cm_id_priv->refcount);
+ refcount_inc(&listen_cm_id_priv->refcount);
spin_unlock_irq(&cm.lock);
- cm_id_priv->id.cm_handler = cur_cm_id_priv->id.cm_handler;
- cm_id_priv->id.context = cur_cm_id_priv->id.context;
- cm_id_priv->id.service_id =
- cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg));
- cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+ cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
+ cm_id_priv->id.context = listen_cm_id_priv->id.context;
- cm_format_sidr_req_event(work, cm_id_priv, &cur_cm_id_priv->id);
- cm_process_work(cm_id_priv, work);
- cm_deref_id(cur_cm_id_priv);
+ /*
+ * A SIDR ID does not need to be in the xarray since it does not receive
+ * mads, is not placed in the remote_id or remote_qpn rbtree, and does
+ * not enter timewait.
+ */
+
+ cm_format_sidr_req_event(work, cm_id_priv, &listen_cm_id_priv->id);
+ ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->cm_event);
+ cm_free_work(work);
+ /*
+ * A pointer to the listen_cm_id is held in the event, so this deref
+ * must be after the event is delivered above.
+ */
+ cm_deref_id(listen_cm_id_priv);
+ if (ret)
+ cm_destroy_id(&cm_id_priv->id, ret);
return 0;
out:
ib_destroy_cm_id(&cm_id_priv->id);
@@ -3589,50 +3691,52 @@ static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
param->private_data, param->private_data_len);
}
-int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
- struct ib_cm_sidr_rep_param *param)
+static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
+ struct ib_cm_sidr_rep_param *param)
{
- struct cm_id_private *cm_id_priv;
struct ib_mad_send_buf *msg;
- unsigned long flags;
int ret;
+ lockdep_assert_held(&cm_id_priv->lock);
+
if ((param->info && param->info_length > IB_CM_SIDR_REP_INFO_LENGTH) ||
(param->private_data &&
param->private_data_len > IB_CM_SIDR_REP_PRIVATE_DATA_SIZE))
return -EINVAL;
- cm_id_priv = container_of(cm_id, struct cm_id_private, id);
- spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id->state != IB_CM_SIDR_REQ_RCVD) {
- ret = -EINVAL;
- goto error;
- }
+ if (cm_id_priv->id.state != IB_CM_SIDR_REQ_RCVD)
+ return -EINVAL;
ret = cm_alloc_msg(cm_id_priv, &msg);
if (ret)
- goto error;
+ return ret;
cm_format_sidr_rep((struct cm_sidr_rep_msg *) msg->mad, cm_id_priv,
param);
ret = ib_post_send_mad(msg, NULL);
if (ret) {
- spin_unlock_irqrestore(&cm_id_priv->lock, flags);
cm_free_msg(msg);
return ret;
}
- cm_id->state = IB_CM_IDLE;
- spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-
- spin_lock_irqsave(&cm.lock, flags);
+ cm_id_priv->id.state = IB_CM_IDLE;
if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) {
rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
}
- spin_unlock_irqrestore(&cm.lock, flags);
return 0;
+}
-error: spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
+ struct ib_cm_sidr_rep_param *param)
+{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
+ ret = cm_send_sidr_rep_locked(cm_id_priv, param);
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
EXPORT_SYMBOL(ib_send_cm_sidr_rep);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2dec3a02ab9f..26e6f7df247b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -199,7 +199,7 @@ struct cma_device {
struct list_head list;
struct ib_device *device;
struct completion comp;
- atomic_t refcount;
+ refcount_t refcount;
struct list_head id_list;
enum ib_gid_type *default_gid_type;
u8 *default_roce_tos;
@@ -247,9 +247,15 @@ enum {
CMA_OPTION_AFONLY,
};
-void cma_ref_dev(struct cma_device *cma_dev)
+void cma_dev_get(struct cma_device *cma_dev)
{
- atomic_inc(&cma_dev->refcount);
+ refcount_inc(&cma_dev->refcount);
+}
+
+void cma_dev_put(struct cma_device *cma_dev)
+{
+ if (refcount_dec_and_test(&cma_dev->refcount))
+ complete(&cma_dev->comp);
}
struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter,
@@ -267,7 +273,7 @@ struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter,
}
if (found_cma_dev)
- cma_ref_dev(found_cma_dev);
+ cma_dev_get(found_cma_dev);
mutex_unlock(&lock);
return found_cma_dev;
}
@@ -463,7 +469,7 @@ static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev)
{
- cma_ref_dev(cma_dev);
+ cma_dev_get(cma_dev);
id_priv->cma_dev = cma_dev;
id_priv->id.device = cma_dev->device;
id_priv->id.route.addr.dev_addr.transport =
@@ -484,12 +490,6 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
rdma_start_port(cma_dev->device)];
}
-void cma_deref_dev(struct cma_device *cma_dev)
-{
- if (atomic_dec_and_test(&cma_dev->refcount))
- complete(&cma_dev->comp);
-}
-
static inline void release_mc(struct kref *kref)
{
struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref);
@@ -502,7 +502,7 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
{
mutex_lock(&lock);
list_del(&id_priv->list);
- cma_deref_dev(id_priv->cma_dev);
+ cma_dev_put(id_priv->cma_dev);
id_priv->cma_dev = NULL;
mutex_unlock(&lock);
}
@@ -728,8 +728,8 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev;
enum ib_gid_type gid_type;
int ret = -ENODEV;
+ unsigned int port;
union ib_gid gid;
- u8 port;
if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
id_priv->id.ps == RDMA_PS_IPOIB)
@@ -753,7 +753,7 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
}
list_for_each_entry(cma_dev, &dev_list, list) {
- for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
+ rdma_for_each_port (cma_dev->device, port) {
if (listen_id_priv->cma_dev == cma_dev &&
listen_id_priv->id.port_num == port)
continue;
@@ -786,8 +786,8 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
struct cma_device *cma_dev, *cur_dev;
struct sockaddr_ib *addr;
union ib_gid gid, sgid, *dgid;
+ unsigned int p;
u16 pkey, index;
- u8 p;
enum ib_port_state port_state;
int i;
@@ -798,7 +798,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
mutex_lock(&lock);
list_for_each_entry(cur_dev, &dev_list, list) {
- for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+ rdma_for_each_port (cur_dev->device, p) {
if (!rdma_cap_af_ib(cur_dev->device, p))
continue;
@@ -840,9 +840,14 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
return 0;
}
-static void cma_deref_id(struct rdma_id_private *id_priv)
+static void cma_id_get(struct rdma_id_private *id_priv)
{
- if (atomic_dec_and_test(&id_priv->refcount))
+ refcount_inc(&id_priv->refcount);
+}
+
+static void cma_id_put(struct rdma_id_private *id_priv)
+{
+ if (refcount_dec_and_test(&id_priv->refcount))
complete(&id_priv->comp);
}
@@ -870,7 +875,7 @@ struct rdma_cm_id *__rdma_create_id(struct net *net,
spin_lock_init(&id_priv->lock);
mutex_init(&id_priv->qp_mutex);
init_completion(&id_priv->comp);
- atomic_set(&id_priv->refcount, 1);
+ refcount_set(&id_priv->refcount, 1);
mutex_init(&id_priv->handler_mutex);
INIT_LIST_HEAD(&id_priv->listen_list);
INIT_LIST_HEAD(&id_priv->mc_list);
@@ -1846,11 +1851,11 @@ void rdma_destroy_id(struct rdma_cm_id *id)
}
cma_release_port(id_priv);
- cma_deref_id(id_priv);
+ cma_id_put(id_priv);
wait_for_completion(&id_priv->comp);
if (id_priv->internal_id)
- cma_deref_id(id_priv->id.context);
+ cma_id_put(id_priv->id.context);
kfree(id_priv->id.route.path_rec);
@@ -2187,7 +2192,7 @@ static int cma_ib_req_handler(struct ib_cm_id *cm_id,
* Protect against the user destroying conn_id from another thread
* until we're done accessing it.
*/
- atomic_inc(&conn_id->refcount);
+ cma_id_get(conn_id);
ret = cma_cm_event_handler(conn_id, &event);
if (ret)
goto err3;
@@ -2204,13 +2209,13 @@ static int cma_ib_req_handler(struct ib_cm_id *cm_id,
mutex_unlock(&lock);
mutex_unlock(&conn_id->handler_mutex);
mutex_unlock(&listen_id->handler_mutex);
- cma_deref_id(conn_id);
+ cma_id_put(conn_id);
if (net_dev)
dev_put(net_dev);
return 0;
err3:
- cma_deref_id(conn_id);
+ cma_id_put(conn_id);
/* Destroy the CM ID by returning a non-zero value. */
conn_id->cm_id.ib = NULL;
err2:
@@ -2391,7 +2396,7 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
* Protect against the user destroying conn_id from another thread
* until we're done accessing it.
*/
- atomic_inc(&conn_id->refcount);
+ cma_id_get(conn_id);
ret = cma_cm_event_handler(conn_id, &event);
if (ret) {
/* User wants to destroy the CM ID */
@@ -2399,13 +2404,13 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
cma_exch(conn_id, RDMA_CM_DESTROYING);
mutex_unlock(&conn_id->handler_mutex);
mutex_unlock(&listen_id->handler_mutex);
- cma_deref_id(conn_id);
+ cma_id_put(conn_id);
rdma_destroy_id(&conn_id->id);
return ret;
}
mutex_unlock(&conn_id->handler_mutex);
- cma_deref_id(conn_id);
+ cma_id_put(conn_id);
out:
mutex_unlock(&listen_id->handler_mutex);
@@ -2492,7 +2497,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
_cma_attach_to_dev(dev_id_priv, cma_dev);
list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
- atomic_inc(&id_priv->refcount);
+ cma_id_get(id_priv);
dev_id_priv->internal_id = 1;
dev_id_priv->afonly = id_priv->afonly;
dev_id_priv->tos_set = id_priv->tos_set;
@@ -2647,7 +2652,7 @@ static void cma_work_handler(struct work_struct *_work)
}
out:
mutex_unlock(&id_priv->handler_mutex);
- cma_deref_id(id_priv);
+ cma_id_put(id_priv);
if (destroy)
rdma_destroy_id(&id_priv->id);
kfree(work);
@@ -2671,7 +2676,7 @@ static void cma_ndev_work_handler(struct work_struct *_work)
out:
mutex_unlock(&id_priv->handler_mutex);
- cma_deref_id(id_priv);
+ cma_id_put(id_priv);
if (destroy)
rdma_destroy_id(&id_priv->id);
kfree(work);
@@ -2687,14 +2692,19 @@ static void cma_init_resolve_route_work(struct cma_work *work,
work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED;
}
-static void cma_init_resolve_addr_work(struct cma_work *work,
- struct rdma_id_private *id_priv)
+static void enqueue_resolve_addr_work(struct cma_work *work,
+ struct rdma_id_private *id_priv)
{
+ /* Balances with cma_id_put() in cma_work_handler */
+ cma_id_get(id_priv);
+
work->id = id_priv;
INIT_WORK(&work->work, cma_work_handler);
work->old_state = RDMA_CM_ADDR_QUERY;
work->new_state = RDMA_CM_ADDR_RESOLVED;
work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
+
+ queue_work(cma_wq, &work->work);
}
static int cma_resolve_ib_route(struct rdma_id_private *id_priv,
@@ -2968,6 +2978,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
err2:
kfree(route->path_rec);
route->path_rec = NULL;
+ route->num_paths = 0;
err1:
kfree(work);
return ret;
@@ -2982,7 +2993,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, unsigned long timeout_ms)
if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY))
return -EINVAL;
- atomic_inc(&id_priv->refcount);
+ cma_id_get(id_priv);
if (rdma_cap_ib_sa(id->device, id->port_num))
ret = cma_resolve_ib_route(id_priv, timeout_ms);
else if (rdma_protocol_roce(id->device, id->port_num))
@@ -2998,7 +3009,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, unsigned long timeout_ms)
return 0;
err:
cma_comp_exch(id_priv, RDMA_CM_ROUTE_QUERY, RDMA_CM_ADDR_RESOLVED);
- cma_deref_id(id_priv);
+ cma_id_put(id_priv);
return ret;
}
EXPORT_SYMBOL(rdma_resolve_route);
@@ -3025,9 +3036,9 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
struct cma_device *cma_dev, *cur_dev;
union ib_gid gid;
enum ib_port_state port_state;
+ unsigned int p;
u16 pkey;
int ret;
- u8 p;
cma_dev = NULL;
mutex_lock(&lock);
@@ -3039,7 +3050,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
if (!cma_dev)
cma_dev = cur_dev;
- for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+ rdma_for_each_port (cur_dev->device, p) {
if (!ib_get_cached_port_state(cur_dev->device, p, &port_state) &&
port_state == IB_PORT_ACTIVE) {
cma_dev = cur_dev;
@@ -3148,9 +3159,7 @@ static int cma_resolve_loopback(struct rdma_id_private *id_priv)
rdma_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid);
- atomic_inc(&id_priv->refcount);
- cma_init_resolve_addr_work(work, id_priv);
- queue_work(cma_wq, &work->work);
+ enqueue_resolve_addr_work(work, id_priv);
return 0;
err:
kfree(work);
@@ -3175,9 +3184,7 @@ static int cma_resolve_ib_addr(struct rdma_id_private *id_priv)
rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, (union ib_gid *)
&(((struct sockaddr_ib *) &id_priv->id.route.addr.dst_addr)->sib_addr));
- atomic_inc(&id_priv->refcount);
- cma_init_resolve_addr_work(work, id_priv);
- queue_work(cma_wq, &work->work);
+ enqueue_resolve_addr_work(work, id_priv);
return 0;
err:
kfree(work);
@@ -4588,7 +4595,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
INIT_WORK(&work->work, cma_ndev_work_handler);
work->id = id_priv;
work->event.event = RDMA_CM_EVENT_ADDR_CHANGE;
- atomic_inc(&id_priv->refcount);
+ cma_id_get(id_priv);
queue_work(cma_wq, &work->work);
}
@@ -4663,7 +4670,7 @@ static void cma_add_one(struct ib_device *device)
}
init_completion(&cma_dev->comp);
- atomic_set(&cma_dev->refcount, 1);
+ refcount_set(&cma_dev->refcount, 1);
INIT_LIST_HEAD(&cma_dev->id_list);
ib_set_client_data(device, &cma_client, cma_dev);
@@ -4722,11 +4729,11 @@ static void cma_process_remove(struct cma_device *cma_dev)
list_del(&id_priv->listen_list);
list_del_init(&id_priv->list);
- atomic_inc(&id_priv->refcount);
+ cma_id_get(id_priv);
mutex_unlock(&lock);
ret = id_priv->internal_id ? 1 : cma_remove_id_dev(id_priv);
- cma_deref_id(id_priv);
+ cma_id_put(id_priv);
if (ret)
rdma_destroy_id(&id_priv->id);
@@ -4734,7 +4741,7 @@ static void cma_process_remove(struct cma_device *cma_dev)
}
mutex_unlock(&lock);
- cma_deref_dev(cma_dev);
+ cma_dev_put(cma_dev);
wait_for_completion(&cma_dev->comp);
}
@@ -4790,6 +4797,19 @@ static int __init cma_init(void)
{
int ret;
+ /*
+ * There is a rare lock ordering dependency in cma_netdev_callback()
+ * that only happens when bonding is enabled. Teach lockdep that rtnl
+ * must never be nested under lock so it can find these without having
+ * to test with bonding.
+ */
+ if (IS_ENABLED(CONFIG_LOCKDEP)) {
+ rtnl_lock();
+ mutex_lock(&lock);
+ mutex_unlock(&lock);
+ rtnl_unlock();
+ }
+
cma_wq = alloc_ordered_workqueue("rdma_cm", WQ_MEM_RECLAIM);
if (!cma_wq)
return -ENOMEM;
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
index 8b0b5ae22e4c..c672a4978bfd 100644
--- a/drivers/infiniband/core/cma_configfs.c
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -94,7 +94,7 @@ static int cma_configfs_params_get(struct config_item *item,
static void cma_configfs_params_put(struct cma_device *cma_dev)
{
- cma_deref_dev(cma_dev);
+ cma_dev_put(cma_dev);
}
static ssize_t default_roce_mode_show(struct config_item *item,
@@ -312,12 +312,12 @@ static struct config_group *make_cma_dev(struct config_group *group,
configfs_add_default_group(&cma_dev_group->ports_group,
&cma_dev_group->device_group);
- cma_deref_dev(cma_dev);
+ cma_dev_put(cma_dev);
return &cma_dev_group->device_group;
fail:
if (cma_dev)
- cma_deref_dev(cma_dev);
+ cma_dev_put(cma_dev);
kfree(cma_dev_group);
return ERR_PTR(err);
}
diff --git a/drivers/infiniband/core/cma_priv.h b/drivers/infiniband/core/cma_priv.h
index ca7307277518..5edcf44a9307 100644
--- a/drivers/infiniband/core/cma_priv.h
+++ b/drivers/infiniband/core/cma_priv.h
@@ -66,7 +66,7 @@ struct rdma_id_private {
struct mutex qp_mutex;
struct completion comp;
- atomic_t refcount;
+ refcount_t refcount;
struct mutex handler_mutex;
int backlog;
@@ -111,8 +111,8 @@ static inline void cma_configfs_exit(void)
}
#endif
-void cma_ref_dev(struct cma_device *dev);
-void cma_deref_dev(struct cma_device *dev);
+void cma_dev_get(struct cma_device *dev);
+void cma_dev_put(struct cma_device *dev);
typedef bool (*cma_device_filter)(struct ib_device *, void *);
struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter,
void *cookie);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 956b3a7dfed7..403d8673a2f9 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -79,13 +79,13 @@ struct ib_mad_private {
struct ib_mad_private_header header;
size_t mad_size;
struct ib_grh grh;
- u8 mad[0];
+ u8 mad[];
} __packed;
struct ib_rmpp_segment {
struct list_head list;
u32 num;
- u8 data[0];
+ u8 data[];
};
struct ib_mad_agent_private {
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index cd338ddc4a39..9c2d8b7f1af9 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -71,7 +71,7 @@ struct mcast_device {
struct ib_event_handler event_handler;
int start_port;
int end_port;
- struct mcast_port port[0];
+ struct mcast_port port[];
};
enum mcast_state {
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 06e5b6787443..557efbf29197 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -391,13 +391,13 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
return -EINVAL;
}
- ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+ ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
if (!ret)
return -ENOMEM;
sg_cnt = ret;
if (prot_sg_cnt) {
- ret = ib_dma_map_sg(dev, prot_sg, prot_sg_cnt, dir);
+ ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
if (!ret) {
ret = -ENOMEM;
goto out_unmap_sg;
@@ -466,9 +466,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
kfree(ctx->reg);
out_unmap_prot_sg:
if (prot_sg_cnt)
- ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+ rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
out_unmap_sg:
- ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
+ rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
return ret;
}
EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -628,9 +628,9 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
ib_mr_pool_put(qp, &qp->sig_mrs, ctx->reg->mr);
kfree(ctx->reg);
- ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
if (prot_sg_cnt)
- ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+ rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+ rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
}
EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30d4c126a2db..74e0058fcf9e 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -101,7 +101,7 @@ struct ib_sa_port {
struct ib_sa_device {
int start_port, end_port;
struct ib_event_handler event_handler;
- struct ib_sa_port port[0];
+ struct ib_sa_port port[];
};
struct ib_sa_query {
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 0274e9b704be..16b6cf57fa85 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -85,12 +85,13 @@ struct ucma_file {
struct ucma_context {
u32 id;
struct completion comp;
- atomic_t ref;
+ refcount_t ref;
int events_reported;
int backlog;
struct ucma_file *file;
struct rdma_cm_id *cm_id;
+ struct mutex mutex;
u64 uid;
struct list_head list;
@@ -152,7 +153,7 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
if (ctx->closing)
ctx = ERR_PTR(-EIO);
else
- atomic_inc(&ctx->ref);
+ refcount_inc(&ctx->ref);
}
xa_unlock(&ctx_table);
return ctx;
@@ -160,7 +161,7 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
static void ucma_put_ctx(struct ucma_context *ctx)
{
- if (atomic_dec_and_test(&ctx->ref))
+ if (refcount_dec_and_test(&ctx->ref))
complete(&ctx->comp);
}
@@ -212,10 +213,11 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
return NULL;
INIT_WORK(&ctx->close_work, ucma_close_id);
- atomic_set(&ctx->ref, 1);
+ refcount_set(&ctx->ref, 1);
init_completion(&ctx->comp);
INIT_LIST_HEAD(&ctx->mc_list);
ctx->file = file;
+ mutex_init(&ctx->mutex);
if (xa_alloc(&ctx_table, &ctx->id, ctx, xa_limit_32b, GFP_KERNEL))
goto error;
@@ -589,6 +591,7 @@ static int ucma_free_ctx(struct ucma_context *ctx)
}
events_reported = ctx->events_reported;
+ mutex_destroy(&ctx->mutex);
kfree(ctx);
return events_reported;
}
@@ -658,7 +661,10 @@ static ssize_t ucma_bind_ip(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+ mutex_unlock(&ctx->mutex);
+
ucma_put_ctx(ctx);
return ret;
}
@@ -681,7 +687,9 @@ static ssize_t ucma_bind(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -705,8 +713,10 @@ static ssize_t ucma_resolve_ip(struct ucma_file *file,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
(struct sockaddr *) &cmd.dst_addr, cmd.timeout_ms);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -731,8 +741,10 @@ static ssize_t ucma_resolve_addr(struct ucma_file *file,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
(struct sockaddr *) &cmd.dst_addr, cmd.timeout_ms);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -752,7 +764,9 @@ static ssize_t ucma_resolve_route(struct ucma_file *file,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -841,6 +855,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
memset(&resp, 0, sizeof resp);
addr = (struct sockaddr *) &ctx->cm_id->route.addr.src_addr;
memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ?
@@ -864,6 +879,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
ucma_copy_iw_route(&resp, &ctx->cm_id->route);
out:
+ mutex_unlock(&ctx->mutex);
if (copy_to_user(u64_to_user_ptr(cmd.response),
&resp, sizeof(resp)))
ret = -EFAULT;
@@ -1014,6 +1030,7 @@ static ssize_t ucma_query(struct ucma_file *file,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
switch (cmd.option) {
case RDMA_USER_CM_QUERY_ADDR:
ret = ucma_query_addr(ctx, response, out_len);
@@ -1028,6 +1045,7 @@ static ssize_t ucma_query(struct ucma_file *file,
ret = -ENOSYS;
break;
}
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
@@ -1045,7 +1063,7 @@ static void ucma_copy_conn_param(struct rdma_cm_id *id,
dst->retry_count = src->retry_count;
dst->rnr_retry_count = src->rnr_retry_count;
dst->srq = src->srq;
- dst->qp_num = src->qp_num;
+ dst->qp_num = src->qp_num & 0xFFFFFF;
dst->qkey = (id->route.addr.src_addr.ss_family == AF_IB) ? src->qkey : 0;
}
@@ -1068,7 +1086,9 @@ static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
return PTR_ERR(ctx);
ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
+ mutex_lock(&ctx->mutex);
ret = rdma_connect(ctx->cm_id, &conn_param);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -1089,7 +1109,9 @@ static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf,
ctx->backlog = cmd.backlog > 0 && cmd.backlog < max_backlog ?
cmd.backlog : max_backlog;
+ mutex_lock(&ctx->mutex);
ret = rdma_listen(ctx->cm_id, ctx->backlog);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -1112,13 +1134,17 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
if (cmd.conn_param.valid) {
ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
mutex_lock(&file->mut);
+ mutex_lock(&ctx->mutex);
ret = __rdma_accept(ctx->cm_id, &conn_param, NULL);
+ mutex_unlock(&ctx->mutex);
if (!ret)
ctx->uid = cmd.uid;
mutex_unlock(&file->mut);
- } else
+ } else {
+ mutex_lock(&ctx->mutex);
ret = __rdma_accept(ctx->cm_id, NULL, NULL);
-
+ mutex_unlock(&ctx->mutex);
+ }
ucma_put_ctx(ctx);
return ret;
}
@@ -1137,7 +1163,9 @@ static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -1156,7 +1184,9 @@ static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
ret = rdma_disconnect(ctx->cm_id);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
}
@@ -1187,7 +1217,9 @@ static ssize_t ucma_init_qp_attr(struct ucma_file *file,
resp.qp_attr_mask = 0;
memset(&qp_attr, 0, sizeof qp_attr);
qp_attr.qp_state = cmd.qp_state;
+ mutex_lock(&ctx->mutex);
ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask);
+ mutex_unlock(&ctx->mutex);
if (ret)
goto out;
@@ -1273,9 +1305,13 @@ static int ucma_set_ib_path(struct ucma_context *ctx,
struct sa_path_rec opa;
sa_convert_path_ib_to_opa(&opa, &sa_path);
+ mutex_lock(&ctx->mutex);
ret = rdma_set_ib_path(ctx->cm_id, &opa);
+ mutex_unlock(&ctx->mutex);
} else {
+ mutex_lock(&ctx->mutex);
ret = rdma_set_ib_path(ctx->cm_id, &sa_path);
+ mutex_unlock(&ctx->mutex);
}
if (ret)
return ret;
@@ -1308,7 +1344,9 @@ static int ucma_set_option_level(struct ucma_context *ctx, int level,
switch (level) {
case RDMA_OPTION_ID:
+ mutex_lock(&ctx->mutex);
ret = ucma_set_option_id(ctx, optname, optval, optlen);
+ mutex_unlock(&ctx->mutex);
break;
case RDMA_OPTION_IB:
ret = ucma_set_option_ib(ctx, optname, optval, optlen);
@@ -1368,8 +1406,10 @@ static ssize_t ucma_notify(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
+ mutex_lock(&ctx->mutex);
if (ctx->cm_id->device)
ret = rdma_notify(ctx->cm_id, (enum ib_event_type)cmd.event);
+ mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
@@ -1412,8 +1452,10 @@ static ssize_t ucma_process_join(struct ucma_file *file,
mc->join_state = join_state;
mc->uid = cmd->uid;
memcpy(&mc->addr, addr, cmd->addr_size);
+ mutex_lock(&ctx->mutex);
ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *)&mc->addr,
join_state, mc);
+ mutex_unlock(&ctx->mutex);
if (ret)
goto err2;
@@ -1502,7 +1544,7 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
mc = ERR_PTR(-ENOENT);
else if (mc->ctx->file != file)
mc = ERR_PTR(-EINVAL);
- else if (!atomic_inc_not_zero(&mc->ctx->ref))
+ else if (!refcount_inc_not_zero(&mc->ctx->ref))
mc = ERR_PTR(-ENXIO);
else
__xa_erase(&multicast_table, mc->id);
@@ -1513,7 +1555,10 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
goto out;
}
+ mutex_lock(&mc->ctx->mutex);
rdma_leave_multicast(mc->ctx->cm_id, (struct sockaddr *) &mc->addr);
+ mutex_unlock(&mc->ctx->mutex);
+
mutex_lock(&mc->ctx->file->mut);
ucma_cleanup_mc_events(mc);
list_del(&mc->list);
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 06b6125b5ae1..82455a1392f1 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -197,6 +197,7 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
unsigned long lock_limit;
unsigned long new_pinned;
unsigned long cur_base;
+ unsigned long dma_attr = 0;
struct mm_struct *mm;
unsigned long npages;
int ret;
@@ -278,10 +279,12 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
sg_mark_end(sg);
- umem->nmap = ib_dma_map_sg(device,
- umem->sg_head.sgl,
- umem->sg_nents,
- DMA_BIDIRECTIONAL);
+ if (access & IB_ACCESS_RELAXED_ORDERING)
+ dma_attr |= DMA_ATTR_WEAK_ORDERING;
+
+ umem->nmap =
+ ib_dma_map_sg_attrs(device, umem->sg_head.sgl, umem->sg_nents,
+ DMA_BIDIRECTIONAL, dma_attr);
if (!umem->nmap) {
ret = -ENOMEM;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e62c9dfc7837..56a71337112c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -54,8 +54,6 @@
#include "core_priv.h"
#include
-#include
-
static int ib_resolve_eth_dmac(struct ib_device *device,
struct rdma_ah_attr *ah_attr);
@@ -1127,8 +1125,7 @@ struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
EXPORT_SYMBOL(ib_open_qp);
static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
- struct ib_qp_init_attr *qp_init_attr,
- struct ib_udata *udata)
+ struct ib_qp_init_attr *qp_init_attr)
{
struct ib_qp *real_qp = qp;
@@ -1150,9 +1147,18 @@ static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
return qp;
}
-struct ib_qp *ib_create_qp_user(struct ib_pd *pd,
- struct ib_qp_init_attr *qp_init_attr,
- struct ib_udata *udata)
+/**
+ * ib_create_qp - Creates a kernel QP associated with the specified protection
+ * domain.
+ * @pd: The protection domain associated with the QP.
+ * @qp_init_attr: A list of initial attributes required to create the
+ * QP. If QP creation succeeds, then the attributes are updated to
+ * the actual capabilities of the created QP.
+ *
+ * NOTE: for user qp use ib_create_qp_user with valid udata!
+ */
+struct ib_qp *ib_create_qp(struct ib_pd *pd,
+ struct ib_qp_init_attr *qp_init_attr)
{
struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
struct ib_qp *qp;
@@ -1187,7 +1193,7 @@ struct ib_qp *ib_create_qp_user(struct ib_pd *pd,
if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
struct ib_qp *xrc_qp =
- create_xrc_qp_user(qp, qp_init_attr, udata);
+ create_xrc_qp_user(qp, qp_init_attr);
if (IS_ERR(xrc_qp)) {
ret = PTR_ERR(xrc_qp);
@@ -1243,7 +1249,7 @@ struct ib_qp *ib_create_qp_user(struct ib_pd *pd,
return ERR_PTR(ret);
}
-EXPORT_SYMBOL(ib_create_qp_user);
+EXPORT_SYMBOL(ib_create_qp);
static const struct {
int valid;
diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 725b2350e349..a300588634c5 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -89,6 +89,15 @@
#define BNXT_RE_DEFAULT_ACK_DELAY 16
+struct bnxt_re_ring_attr {
+ dma_addr_t *dma_arr;
+ int pages;
+ int type;
+ u32 depth;
+ u32 lrid; /* Logical ring id */
+ u8 mode;
+};
+
struct bnxt_re_work {
struct work_struct work;
unsigned long event;
@@ -104,6 +113,14 @@ struct bnxt_re_sqp_entries {
struct bnxt_re_qp *qp1_qp;
};
+#define BNXT_RE_MAX_GSI_SQP_ENTRIES 1024
+struct bnxt_re_gsi_context {
+ struct bnxt_re_qp *gsi_qp;
+ struct bnxt_re_qp *gsi_sqp;
+ struct bnxt_re_ah *gsi_sah;
+ struct bnxt_re_sqp_entries *sqp_tbl;
+};
+
#define BNXT_RE_MIN_MSIX 2
#define BNXT_RE_MAX_MSIX 9
#define BNXT_RE_AEQ_IDX 0
@@ -115,7 +132,6 @@ struct bnxt_re_dev {
struct list_head list;
unsigned long flags;
#define BNXT_RE_FLAG_NETDEV_REGISTERED 0
-#define BNXT_RE_FLAG_IBDEV_REGISTERED 1
#define BNXT_RE_FLAG_GOT_MSIX 2
#define BNXT_RE_FLAG_HAVE_L2_REF 3
#define BNXT_RE_FLAG_RCFW_CHANNEL_EN 4
@@ -125,7 +141,7 @@ struct bnxt_re_dev {
#define BNXT_RE_FLAG_ISSUE_ROCE_STATS 29
struct net_device *netdev;
unsigned int version, major, minor;
- struct bnxt_qplib_chip_ctx chip_ctx;
+ struct bnxt_qplib_chip_ctx *chip_ctx;
struct bnxt_en_dev *en_dev;
struct bnxt_msix_entry msix_entries[BNXT_RE_MAX_MSIX];
int num_msix;
@@ -160,15 +176,11 @@ struct bnxt_re_dev {
atomic_t srq_count;
atomic_t mr_count;
atomic_t mw_count;
- atomic_t sched_count;
/* Max of 2 lossless traffic class supported per port */
u16 cosq[2];
/* QP for for handling QP1 packets */
- u32 sqp_id;
- struct bnxt_re_qp *qp1_sqp;
- struct bnxt_re_ah *sqp_ah;
- struct bnxt_re_sqp_entries sqp_tbl[1024];
+ struct bnxt_re_gsi_context gsi_ctx;
atomic_t nq_alloc_cnt;
u32 is_virtfn;
u32 num_vfs;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 52b6a4d85460..95f6d493d1b9 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -312,9 +312,9 @@ int bnxt_re_del_gid(const struct ib_gid_attr *attr, void **context)
*/
if (ctx->idx == 0 &&
rdma_link_local_addr((struct in6_addr *)gid_to_del) &&
- ctx->refcnt == 1 && rdev->qp1_sqp) {
- dev_dbg(rdev_to_dev(rdev),
- "Trying to delete GID0 while QP1 is alive\n");
+ ctx->refcnt == 1 && rdev->gsi_ctx.gsi_sqp) {
+ ibdev_dbg(&rdev->ibdev,
+ "Trying to delete GID0 while QP1 is alive\n");
return -EFAULT;
}
ctx->refcnt--;
@@ -322,8 +322,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr *attr, void **context)
rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del,
vlan_id, true);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to remove GID: %#x", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to remove GID: %#x", rc);
} else {
ctx_tbl = sgid_tbl->ctx;
ctx_tbl[ctx->idx] = NULL;
@@ -360,7 +360,7 @@ int bnxt_re_add_gid(const struct ib_gid_attr *attr, void **context)
}
if (rc < 0) {
- dev_err(rdev_to_dev(rdev), "Failed to add GID: %#x", rc);
+ ibdev_err(&rdev->ibdev, "Failed to add GID: %#x", rc);
return rc;
}
@@ -423,12 +423,12 @@ static int bnxt_re_bind_fence_mw(struct bnxt_qplib_qp *qplib_qp)
wqe.bind.r_key = fence->bind_rkey;
fence->bind_rkey = ib_inc_rkey(fence->bind_rkey);
- dev_dbg(rdev_to_dev(qp->rdev),
- "Posting bind fence-WQE: rkey: %#x QP: %d PD: %p\n",
+ ibdev_dbg(&qp->rdev->ibdev,
+ "Posting bind fence-WQE: rkey: %#x QP: %d PD: %p\n",
wqe.bind.r_key, qp->qplib_qp.id, pd);
rc = bnxt_qplib_post_send(&qp->qplib_qp, &wqe);
if (rc) {
- dev_err(rdev_to_dev(qp->rdev), "Failed to bind fence-WQE\n");
+ ibdev_err(&qp->rdev->ibdev, "Failed to bind fence-WQE\n");
return rc;
}
bnxt_qplib_post_send_db(&qp->qplib_qp);
@@ -479,7 +479,7 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
DMA_BIDIRECTIONAL);
rc = dma_mapping_error(dev, dma_addr);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to dma-map fence-MR-mem\n");
+ ibdev_err(&rdev->ibdev, "Failed to dma-map fence-MR-mem\n");
rc = -EIO;
fence->dma_addr = 0;
goto fail;
@@ -499,7 +499,7 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
mr->qplib_mr.flags = __from_ib_access_flags(mr_access_flags);
rc = bnxt_qplib_alloc_mrw(&rdev->qplib_res, &mr->qplib_mr);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to alloc fence-HW-MR\n");
+ ibdev_err(&rdev->ibdev, "Failed to alloc fence-HW-MR\n");
goto fail;
}
@@ -511,7 +511,7 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
rc = bnxt_qplib_reg_mr(&rdev->qplib_res, &mr->qplib_mr, &pbl_tbl,
BNXT_RE_FENCE_PBL_SIZE, false, PAGE_SIZE);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to register fence-MR\n");
+ ibdev_err(&rdev->ibdev, "Failed to register fence-MR\n");
goto fail;
}
mr->ib_mr.rkey = mr->qplib_mr.rkey;
@@ -519,8 +519,8 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
/* Create a fence MW only for kernel consumers */
mw = bnxt_re_alloc_mw(&pd->ib_pd, IB_MW_TYPE_1, NULL);
if (IS_ERR(mw)) {
- dev_err(rdev_to_dev(rdev),
- "Failed to create fence-MW for PD: %p\n", pd);
+ ibdev_err(&rdev->ibdev,
+ "Failed to create fence-MW for PD: %p\n", pd);
rc = PTR_ERR(mw);
goto fail;
}
@@ -558,7 +558,7 @@ int bnxt_re_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
pd->rdev = rdev;
if (bnxt_qplib_alloc_pd(&rdev->qplib_res.pd_tbl, &pd->qplib_pd)) {
- dev_err(rdev_to_dev(rdev), "Failed to allocate HW PD");
+ ibdev_err(&rdev->ibdev, "Failed to allocate HW PD");
rc = -ENOMEM;
goto fail;
}
@@ -585,16 +585,16 @@ int bnxt_re_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to copy user response\n");
+ ibdev_err(&rdev->ibdev,
+ "Failed to copy user response\n");
goto dbfail;
}
}
if (!udata)
if (bnxt_re_create_fence_mr(pd))
- dev_warn(rdev_to_dev(rdev),
- "Failed to create Fence-MR\n");
+ ibdev_warn(&rdev->ibdev,
+ "Failed to create Fence-MR\n");
return 0;
dbfail:
bnxt_qplib_dealloc_pd(&rdev->qplib_res, &rdev->qplib_res.pd_tbl,
@@ -639,12 +639,13 @@ int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr,
const struct ib_global_route *grh = rdma_ah_read_grh(ah_attr);
struct bnxt_re_dev *rdev = pd->rdev;
const struct ib_gid_attr *sgid_attr;
+ struct bnxt_re_gid_ctx *ctx;
struct bnxt_re_ah *ah = container_of(ib_ah, struct bnxt_re_ah, ib_ah);
u8 nw_type;
int rc;
if (!(rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH)) {
- dev_err(rdev_to_dev(rdev), "Failed to alloc AH: GRH not set");
+ ibdev_err(&rdev->ibdev, "Failed to alloc AH: GRH not set");
return -EINVAL;
}
@@ -654,19 +655,18 @@ int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr,
/* Supply the configuration for the HW */
memcpy(ah->qplib_ah.dgid.data, grh->dgid.raw,
sizeof(union ib_gid));
- /*
- * If RoCE V2 is enabled, stack will have two entries for
- * each GID entry. Avoiding this duplicte entry in HW. Dividing
- * the GID index by 2 for RoCE V2
+ sgid_attr = grh->sgid_attr;
+ /* Get the HW context of the GID. The reference
+ * of GID table entry is already taken by the caller.
*/
- ah->qplib_ah.sgid_index = grh->sgid_index / 2;
+ ctx = rdma_read_gid_hw_context(sgid_attr);
+ ah->qplib_ah.sgid_index = ctx->idx;
ah->qplib_ah.host_sgid_index = grh->sgid_index;
ah->qplib_ah.traffic_class = grh->traffic_class;
ah->qplib_ah.flow_label = grh->flow_label;
ah->qplib_ah.hop_limit = grh->hop_limit;
ah->qplib_ah.sl = rdma_ah_get_sl(ah_attr);
- sgid_attr = grh->sgid_attr;
/* Get network header type for this GID */
nw_type = rdma_gid_attr_network_type(sgid_attr);
ah->qplib_ah.nw_type = bnxt_re_stack_to_dev_nw_type(nw_type);
@@ -675,7 +675,7 @@ int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr,
rc = bnxt_qplib_create_ah(&rdev->qplib_res, &ah->qplib_ah,
!(flags & RDMA_CREATE_AH_SLEEPABLE));
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to allocate HW AH");
+ ibdev_err(&rdev->ibdev, "Failed to allocate HW AH");
return rc;
}
@@ -742,6 +742,49 @@ void bnxt_re_unlock_cqs(struct bnxt_re_qp *qp,
spin_unlock_irqrestore(&qp->scq->cq_lock, flags);
}
+static int bnxt_re_destroy_gsi_sqp(struct bnxt_re_qp *qp)
+{
+ struct bnxt_re_qp *gsi_sqp;
+ struct bnxt_re_ah *gsi_sah;
+ struct bnxt_re_dev *rdev;
+ int rc = 0;
+
+ rdev = qp->rdev;
+ gsi_sqp = rdev->gsi_ctx.gsi_sqp;
+ gsi_sah = rdev->gsi_ctx.gsi_sah;
+
+ /* remove from active qp list */
+ mutex_lock(&rdev->qp_lock);
+ list_del(&gsi_sqp->list);
+ mutex_unlock(&rdev->qp_lock);
+ atomic_dec(&rdev->qp_count);
+
+ ibdev_dbg(&rdev->ibdev, "Destroy the shadow AH\n");
+ bnxt_qplib_destroy_ah(&rdev->qplib_res,
+ &gsi_sah->qplib_ah,
+ true);
+ bnxt_qplib_clean_qp(&qp->qplib_qp);
+
+ ibdev_dbg(&rdev->ibdev, "Destroy the shadow QP\n");
+ rc = bnxt_qplib_destroy_qp(&rdev->qplib_res, &gsi_sqp->qplib_qp);
+ if (rc) {
+ ibdev_err(&rdev->ibdev, "Destroy Shadow QP failed");
+ goto fail;
+ }
+ bnxt_qplib_free_qp_res(&rdev->qplib_res, &gsi_sqp->qplib_qp);
+
+ kfree(rdev->gsi_ctx.sqp_tbl);
+ kfree(gsi_sah);
+ kfree(gsi_sqp);
+ rdev->gsi_ctx.gsi_sqp = NULL;
+ rdev->gsi_ctx.gsi_sah = NULL;
+ rdev->gsi_ctx.sqp_tbl = NULL;
+
+ return 0;
+fail:
+ return rc;
+}
+
/* Queue Pairs */
int bnxt_re_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
{
@@ -750,10 +793,16 @@ int bnxt_re_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
unsigned int flags;
int rc;
+ mutex_lock(&rdev->qp_lock);
+ list_del(&qp->list);
+ mutex_unlock(&rdev->qp_lock);
+ atomic_dec(&rdev->qp_count);
+
bnxt_qplib_flush_cqn_wq(&qp->qplib_qp);
+
rc = bnxt_qplib_destroy_qp(&rdev->qplib_res, &qp->qplib_qp);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to destroy HW QP");
+ ibdev_err(&rdev->ibdev, "Failed to destroy HW QP");
return rc;
}
@@ -765,40 +814,19 @@ int bnxt_re_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
bnxt_qplib_free_qp_res(&rdev->qplib_res, &qp->qplib_qp);
- if (ib_qp->qp_type == IB_QPT_GSI && rdev->qp1_sqp) {
- bnxt_qplib_destroy_ah(&rdev->qplib_res, &rdev->sqp_ah->qplib_ah,
- false);
-
- bnxt_qplib_clean_qp(&qp->qplib_qp);
- rc = bnxt_qplib_destroy_qp(&rdev->qplib_res,
- &rdev->qp1_sqp->qplib_qp);
- if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to destroy Shadow QP");
- return rc;
- }
- bnxt_qplib_free_qp_res(&rdev->qplib_res,
- &rdev->qp1_sqp->qplib_qp);
- mutex_lock(&rdev->qp_lock);
- list_del(&rdev->qp1_sqp->list);
- atomic_dec(&rdev->qp_count);
- mutex_unlock(&rdev->qp_lock);
-
- kfree(rdev->sqp_ah);
- kfree(rdev->qp1_sqp);
- rdev->qp1_sqp = NULL;
- rdev->sqp_ah = NULL;
+ if (ib_qp->qp_type == IB_QPT_GSI && rdev->gsi_ctx.gsi_sqp) {
+ rc = bnxt_re_destroy_gsi_sqp(qp);
+ if (rc)
+ goto sh_fail;
}
ib_umem_release(qp->rumem);
ib_umem_release(qp->sumem);
- mutex_lock(&rdev->qp_lock);
- list_del(&qp->list);
- atomic_dec(&rdev->qp_count);
- mutex_unlock(&rdev->qp_lock);
kfree(qp);
return 0;
+sh_fail:
+ return rc;
}
static u8 __from_ib_qp_type(enum ib_qp_type type)
@@ -831,7 +859,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
bytes = (qplib_qp->sq.max_wqe * BNXT_QPLIB_MAX_SQE_ENTRY_SIZE);
/* Consider mapping PSN search memory only for RC QPs. */
if (qplib_qp->type == CMDQ_CREATE_QP_TYPE_RC) {
- psn_sz = bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx) ?
+ psn_sz = bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) ?
sizeof(struct sq_psn_search_ext) :
sizeof(struct sq_psn_search);
bytes += (qplib_qp->sq.max_wqe * psn_sz);
@@ -843,9 +871,11 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
return PTR_ERR(umem);
qp->sumem = umem;
- qplib_qp->sq.sg_info.sglist = umem->sg_head.sgl;
+ qplib_qp->sq.sg_info.sghead = umem->sg_head.sgl;
qplib_qp->sq.sg_info.npages = ib_umem_num_pages(umem);
qplib_qp->sq.sg_info.nmap = umem->nmap;
+ qplib_qp->sq.sg_info.pgsize = PAGE_SIZE;
+ qplib_qp->sq.sg_info.pgshft = PAGE_SHIFT;
qplib_qp->qp_handle = ureq.qp_handle;
if (!qp->qplib_qp.srq) {
@@ -856,9 +886,11 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
if (IS_ERR(umem))
goto rqfail;
qp->rumem = umem;
- qplib_qp->rq.sg_info.sglist = umem->sg_head.sgl;
+ qplib_qp->rq.sg_info.sghead = umem->sg_head.sgl;
qplib_qp->rq.sg_info.npages = ib_umem_num_pages(umem);
qplib_qp->rq.sg_info.nmap = umem->nmap;
+ qplib_qp->rq.sg_info.pgsize = PAGE_SIZE;
+ qplib_qp->rq.sg_info.pgshft = PAGE_SHIFT;
}
qplib_qp->dpi = &cntx->dpi;
@@ -906,8 +938,8 @@ static struct bnxt_re_ah *bnxt_re_create_shadow_qp_ah
rc = bnxt_qplib_create_ah(&rdev->qplib_res, &ah->qplib_ah, false);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to allocate HW AH for Shadow QP");
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate HW AH for Shadow QP");
goto fail;
}
@@ -948,6 +980,8 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
qp->qplib_qp.sq.max_sge = 2;
/* Q full delta can be 1 since it is internal QP */
qp->qplib_qp.sq.q_full_delta = 1;
+ qp->qplib_qp.sq.sg_info.pgsize = PAGE_SIZE;
+ qp->qplib_qp.sq.sg_info.pgshft = PAGE_SHIFT;
qp->qplib_qp.scq = qp1_qp->scq;
qp->qplib_qp.rcq = qp1_qp->rcq;
@@ -956,6 +990,8 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
qp->qplib_qp.rq.max_sge = qp1_qp->rq.max_sge;
/* Q full delta can be 1 since it is internal QP */
qp->qplib_qp.rq.q_full_delta = 1;
+ qp->qplib_qp.rq.sg_info.pgsize = PAGE_SIZE;
+ qp->qplib_qp.rq.sg_info.pgshft = PAGE_SHIFT;
qp->qplib_qp.mtu = qp1_qp->mtu;
@@ -967,8 +1003,6 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
if (rc)
goto fail;
- rdev->sqp_id = qp->qplib_qp.id;
-
spin_lock_init(&qp->sq_lock);
INIT_LIST_HEAD(&qp->list);
mutex_lock(&rdev->qp_lock);
@@ -981,6 +1015,316 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
return NULL;
}
+static int bnxt_re_init_rq_attr(struct bnxt_re_qp *qp,
+ struct ib_qp_init_attr *init_attr)
+{
+ struct bnxt_qplib_dev_attr *dev_attr;
+ struct bnxt_qplib_qp *qplqp;
+ struct bnxt_re_dev *rdev;
+ int entries;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+ dev_attr = &rdev->dev_attr;
+
+ if (init_attr->srq) {
+ struct bnxt_re_srq *srq;
+
+ srq = container_of(init_attr->srq, struct bnxt_re_srq, ib_srq);
+ if (!srq) {
+ ibdev_err(&rdev->ibdev, "SRQ not found");
+ return -EINVAL;
+ }
+ qplqp->srq = &srq->qplib_srq;
+ qplqp->rq.max_wqe = 0;
+ } else {
+ /* Allocate 1 more than what's provided so posting max doesn't
+ * mean empty.
+ */
+ entries = roundup_pow_of_two(init_attr->cap.max_recv_wr + 1);
+ qplqp->rq.max_wqe = min_t(u32, entries,
+ dev_attr->max_qp_wqes + 1);
+
+ qplqp->rq.q_full_delta = qplqp->rq.max_wqe -
+ init_attr->cap.max_recv_wr;
+ qplqp->rq.max_sge = init_attr->cap.max_recv_sge;
+ if (qplqp->rq.max_sge > dev_attr->max_qp_sges)
+ qplqp->rq.max_sge = dev_attr->max_qp_sges;
+ }
+ qplqp->rq.sg_info.pgsize = PAGE_SIZE;
+ qplqp->rq.sg_info.pgshft = PAGE_SHIFT;
+
+ return 0;
+}
+
+static void bnxt_re_adjust_gsi_rq_attr(struct bnxt_re_qp *qp)
+{
+ struct bnxt_qplib_dev_attr *dev_attr;
+ struct bnxt_qplib_qp *qplqp;
+ struct bnxt_re_dev *rdev;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+ dev_attr = &rdev->dev_attr;
+
+ qplqp->rq.max_sge = dev_attr->max_qp_sges;
+ if (qplqp->rq.max_sge > dev_attr->max_qp_sges)
+ qplqp->rq.max_sge = dev_attr->max_qp_sges;
+ qplqp->rq.max_sge = 6;
+}
+
+static void bnxt_re_init_sq_attr(struct bnxt_re_qp *qp,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata)
+{
+ struct bnxt_qplib_dev_attr *dev_attr;
+ struct bnxt_qplib_qp *qplqp;
+ struct bnxt_re_dev *rdev;
+ int entries;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+ dev_attr = &rdev->dev_attr;
+
+ qplqp->sq.max_sge = init_attr->cap.max_send_sge;
+ if (qplqp->sq.max_sge > dev_attr->max_qp_sges)
+ qplqp->sq.max_sge = dev_attr->max_qp_sges;
+ /*
+ * Change the SQ depth if user has requested minimum using
+ * configfs. Only supported for kernel consumers
+ */
+ entries = init_attr->cap.max_send_wr;
+ /* Allocate 128 + 1 more than what's provided */
+ entries = roundup_pow_of_two(entries + BNXT_QPLIB_RESERVED_QP_WRS + 1);
+ qplqp->sq.max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes +
+ BNXT_QPLIB_RESERVED_QP_WRS + 1);
+ qplqp->sq.q_full_delta = BNXT_QPLIB_RESERVED_QP_WRS + 1;
+ /*
+ * Reserving one slot for Phantom WQE. Application can
+ * post one extra entry in this case. But allowing this to avoid
+ * unexpected Queue full condition
+ */
+ qplqp->sq.q_full_delta -= 1;
+ qplqp->sq.sg_info.pgsize = PAGE_SIZE;
+ qplqp->sq.sg_info.pgshft = PAGE_SHIFT;
+}
+
+static void bnxt_re_adjust_gsi_sq_attr(struct bnxt_re_qp *qp,
+ struct ib_qp_init_attr *init_attr)
+{
+ struct bnxt_qplib_dev_attr *dev_attr;
+ struct bnxt_qplib_qp *qplqp;
+ struct bnxt_re_dev *rdev;
+ int entries;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+ dev_attr = &rdev->dev_attr;
+
+ entries = roundup_pow_of_two(init_attr->cap.max_send_wr + 1);
+ qplqp->sq.max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes + 1);
+ qplqp->sq.q_full_delta = qplqp->sq.max_wqe -
+ init_attr->cap.max_send_wr;
+ qplqp->sq.max_sge++; /* Need one extra sge to put UD header */
+ if (qplqp->sq.max_sge > dev_attr->max_qp_sges)
+ qplqp->sq.max_sge = dev_attr->max_qp_sges;
+}
+
+static int bnxt_re_init_qp_type(struct bnxt_re_dev *rdev,
+ struct ib_qp_init_attr *init_attr)
+{
+ struct bnxt_qplib_chip_ctx *chip_ctx;
+ int qptype;
+
+ chip_ctx = rdev->chip_ctx;
+
+ qptype = __from_ib_qp_type(init_attr->qp_type);
+ if (qptype == IB_QPT_MAX) {
+ ibdev_err(&rdev->ibdev, "QP type 0x%x not supported", qptype);
+ qptype = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (bnxt_qplib_is_chip_gen_p5(chip_ctx) &&
+ init_attr->qp_type == IB_QPT_GSI)
+ qptype = CMDQ_CREATE_QP_TYPE_GSI;
+out:
+ return qptype;
+}
+
+static int bnxt_re_init_qp_attr(struct bnxt_re_qp *qp, struct bnxt_re_pd *pd,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata)
+{
+ struct bnxt_qplib_dev_attr *dev_attr;
+ struct bnxt_qplib_qp *qplqp;
+ struct bnxt_re_dev *rdev;
+ struct bnxt_re_cq *cq;
+ int rc = 0, qptype;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+ dev_attr = &rdev->dev_attr;
+
+ /* Setup misc params */
+ ether_addr_copy(qplqp->smac, rdev->netdev->dev_addr);
+ qplqp->pd = &pd->qplib_pd;
+ qplqp->qp_handle = (u64)qplqp;
+ qplqp->max_inline_data = init_attr->cap.max_inline_data;
+ qplqp->sig_type = ((init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) ?
+ true : false);
+ qptype = bnxt_re_init_qp_type(rdev, init_attr);
+ if (qptype < 0) {
+ rc = qptype;
+ goto out;
+ }
+ qplqp->type = (u8)qptype;
+
+ if (init_attr->qp_type == IB_QPT_RC) {
+ qplqp->max_rd_atomic = dev_attr->max_qp_rd_atom;
+ qplqp->max_dest_rd_atomic = dev_attr->max_qp_init_rd_atom;
+ }
+ qplqp->mtu = ib_mtu_enum_to_int(iboe_get_mtu(rdev->netdev->mtu));
+ qplqp->dpi = &rdev->dpi_privileged; /* Doorbell page */
+ if (init_attr->create_flags)
+ ibdev_dbg(&rdev->ibdev,
+ "QP create flags 0x%x not supported",
+ init_attr->create_flags);
+
+ /* Setup CQs */
+ if (init_attr->send_cq) {
+ cq = container_of(init_attr->send_cq, struct bnxt_re_cq, ib_cq);
+ if (!cq) {
+ ibdev_err(&rdev->ibdev, "Send CQ not found");
+ rc = -EINVAL;
+ goto out;
+ }
+ qplqp->scq = &cq->qplib_cq;
+ qp->scq = cq;
+ }
+
+ if (init_attr->recv_cq) {
+ cq = container_of(init_attr->recv_cq, struct bnxt_re_cq, ib_cq);
+ if (!cq) {
+ ibdev_err(&rdev->ibdev, "Receive CQ not found");
+ rc = -EINVAL;
+ goto out;
+ }
+ qplqp->rcq = &cq->qplib_cq;
+ qp->rcq = cq;
+ }
+
+ /* Setup RQ/SRQ */
+ rc = bnxt_re_init_rq_attr(qp, init_attr);
+ if (rc)
+ goto out;
+ if (init_attr->qp_type == IB_QPT_GSI)
+ bnxt_re_adjust_gsi_rq_attr(qp);
+
+ /* Setup SQ */
+ bnxt_re_init_sq_attr(qp, init_attr, udata);
+ if (init_attr->qp_type == IB_QPT_GSI)
+ bnxt_re_adjust_gsi_sq_attr(qp, init_attr);
+
+ if (udata) /* This will update DPI and qp_handle */
+ rc = bnxt_re_init_user_qp(rdev, pd, qp, udata);
+out:
+ return rc;
+}
+
+static int bnxt_re_create_shadow_gsi(struct bnxt_re_qp *qp,
+ struct bnxt_re_pd *pd)
+{
+ struct bnxt_re_sqp_entries *sqp_tbl = NULL;
+ struct bnxt_re_dev *rdev;
+ struct bnxt_re_qp *sqp;
+ struct bnxt_re_ah *sah;
+ int rc = 0;
+
+ rdev = qp->rdev;
+ /* Create a shadow QP to handle the QP1 traffic */
+ sqp_tbl = kzalloc(sizeof(*sqp_tbl) * BNXT_RE_MAX_GSI_SQP_ENTRIES,
+ GFP_KERNEL);
+ if (!sqp_tbl)
+ return -ENOMEM;
+ rdev->gsi_ctx.sqp_tbl = sqp_tbl;
+
+ sqp = bnxt_re_create_shadow_qp(pd, &rdev->qplib_res, &qp->qplib_qp);
+ if (!sqp) {
+ rc = -ENODEV;
+ ibdev_err(&rdev->ibdev, "Failed to create Shadow QP for QP1");
+ goto out;
+ }
+ rdev->gsi_ctx.gsi_sqp = sqp;
+
+ sqp->rcq = qp->rcq;
+ sqp->scq = qp->scq;
+ sah = bnxt_re_create_shadow_qp_ah(pd, &rdev->qplib_res,
+ &qp->qplib_qp);
+ if (!sah) {
+ bnxt_qplib_destroy_qp(&rdev->qplib_res,
+ &sqp->qplib_qp);
+ rc = -ENODEV;
+ ibdev_err(&rdev->ibdev,
+ "Failed to create AH entry for ShadowQP");
+ goto out;
+ }
+ rdev->gsi_ctx.gsi_sah = sah;
+
+ return 0;
+out:
+ kfree(sqp_tbl);
+ return rc;
+}
+
+static int bnxt_re_create_gsi_qp(struct bnxt_re_qp *qp, struct bnxt_re_pd *pd,
+ struct ib_qp_init_attr *init_attr)
+{
+ struct bnxt_re_dev *rdev;
+ struct bnxt_qplib_qp *qplqp;
+ int rc = 0;
+
+ rdev = qp->rdev;
+ qplqp = &qp->qplib_qp;
+
+ qplqp->rq_hdr_buf_size = BNXT_QPLIB_MAX_QP1_RQ_HDR_SIZE_V2;
+ qplqp->sq_hdr_buf_size = BNXT_QPLIB_MAX_QP1_SQ_HDR_SIZE_V2;
+
+ rc = bnxt_qplib_create_qp1(&rdev->qplib_res, qplqp);
+ if (rc) {
+ ibdev_err(&rdev->ibdev, "create HW QP1 failed!");
+ goto out;
+ }
+
+ rc = bnxt_re_create_shadow_gsi(qp, pd);
+out:
+ return rc;
+}
+
+static bool bnxt_re_test_qp_limits(struct bnxt_re_dev *rdev,
+ struct ib_qp_init_attr *init_attr,
+ struct bnxt_qplib_dev_attr *dev_attr)
+{
+ bool rc = true;
+
+ if (init_attr->cap.max_send_wr > dev_attr->max_qp_wqes ||
+ init_attr->cap.max_recv_wr > dev_attr->max_qp_wqes ||
+ init_attr->cap.max_send_sge > dev_attr->max_qp_sges ||
+ init_attr->cap.max_recv_sge > dev_attr->max_qp_sges ||
+ init_attr->cap.max_inline_data > dev_attr->max_inline_data) {
+ ibdev_err(&rdev->ibdev,
+ "Create QP failed - max exceeded! 0x%x/0x%x 0x%x/0x%x 0x%x/0x%x 0x%x/0x%x 0x%x/0x%x",
+ init_attr->cap.max_send_wr, dev_attr->max_qp_wqes,
+ init_attr->cap.max_recv_wr, dev_attr->max_qp_wqes,
+ init_attr->cap.max_send_sge, dev_attr->max_qp_sges,
+ init_attr->cap.max_recv_sge, dev_attr->max_qp_sges,
+ init_attr->cap.max_inline_data,
+ dev_attr->max_inline_data);
+ rc = false;
+ }
+ return rc;
+}
+
struct ib_qp *bnxt_re_create_qp(struct ib_pd *ib_pd,
struct ib_qp_init_attr *qp_init_attr,
struct ib_udata *udata)
@@ -989,197 +1333,60 @@ struct ib_qp *bnxt_re_create_qp(struct ib_pd *ib_pd,
struct bnxt_re_dev *rdev = pd->rdev;
struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
struct bnxt_re_qp *qp;
- struct bnxt_re_cq *cq;
- struct bnxt_re_srq *srq;
- int rc, entries;
+ int rc;
- if ((qp_init_attr->cap.max_send_wr > dev_attr->max_qp_wqes) ||
- (qp_init_attr->cap.max_recv_wr > dev_attr->max_qp_wqes) ||
- (qp_init_attr->cap.max_send_sge > dev_attr->max_qp_sges) ||
- (qp_init_attr->cap.max_recv_sge > dev_attr->max_qp_sges) ||
- (qp_init_attr->cap.max_inline_data > dev_attr->max_inline_data))
- return ERR_PTR(-EINVAL);
+ rc = bnxt_re_test_qp_limits(rdev, qp_init_attr, dev_attr);
+ if (!rc) {
+ rc = -EINVAL;
+ goto exit;
+ }
qp = kzalloc(sizeof(*qp), GFP_KERNEL);
- if (!qp)
- return ERR_PTR(-ENOMEM);
-
+ if (!qp) {
+ rc = -ENOMEM;
+ goto exit;
+ }
qp->rdev = rdev;
- ether_addr_copy(qp->qplib_qp.smac, rdev->netdev->dev_addr);
- qp->qplib_qp.pd = &pd->qplib_pd;
- qp->qplib_qp.qp_handle = (u64)(unsigned long)(&qp->qplib_qp);
- qp->qplib_qp.type = __from_ib_qp_type(qp_init_attr->qp_type);
-
- if (qp_init_attr->qp_type == IB_QPT_GSI &&
- bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx))
- qp->qplib_qp.type = CMDQ_CREATE_QP_TYPE_GSI;
- if (qp->qplib_qp.type == IB_QPT_MAX) {
- dev_err(rdev_to_dev(rdev), "QP type 0x%x not supported",
- qp->qplib_qp.type);
- rc = -EINVAL;
+ rc = bnxt_re_init_qp_attr(qp, pd, qp_init_attr, udata);
+ if (rc)
goto fail;
- }
-
- qp->qplib_qp.max_inline_data = qp_init_attr->cap.max_inline_data;
- qp->qplib_qp.sig_type = ((qp_init_attr->sq_sig_type ==
- IB_SIGNAL_ALL_WR) ? true : false);
-
- qp->qplib_qp.sq.max_sge = qp_init_attr->cap.max_send_sge;
- if (qp->qplib_qp.sq.max_sge > dev_attr->max_qp_sges)
- qp->qplib_qp.sq.max_sge = dev_attr->max_qp_sges;
-
- if (qp_init_attr->send_cq) {
- cq = container_of(qp_init_attr->send_cq, struct bnxt_re_cq,
- ib_cq);
- if (!cq) {
- dev_err(rdev_to_dev(rdev), "Send CQ not found");
- rc = -EINVAL;
- goto fail;
- }
- qp->qplib_qp.scq = &cq->qplib_cq;
- qp->scq = cq;
- }
-
- if (qp_init_attr->recv_cq) {
- cq = container_of(qp_init_attr->recv_cq, struct bnxt_re_cq,
- ib_cq);
- if (!cq) {
- dev_err(rdev_to_dev(rdev), "Receive CQ not found");
- rc = -EINVAL;
- goto fail;
- }
- qp->qplib_qp.rcq = &cq->qplib_cq;
- qp->rcq = cq;
- }
-
- if (qp_init_attr->srq) {
- srq = container_of(qp_init_attr->srq, struct bnxt_re_srq,
- ib_srq);
- if (!srq) {
- dev_err(rdev_to_dev(rdev), "SRQ not found");
- rc = -EINVAL;
- goto fail;
- }
- qp->qplib_qp.srq = &srq->qplib_srq;
- qp->qplib_qp.rq.max_wqe = 0;
- } else {
- /* Allocate 1 more than what's provided so posting max doesn't
- * mean empty
- */
- entries = roundup_pow_of_two(qp_init_attr->cap.max_recv_wr + 1);
- qp->qplib_qp.rq.max_wqe = min_t(u32, entries,
- dev_attr->max_qp_wqes + 1);
-
- qp->qplib_qp.rq.q_full_delta = qp->qplib_qp.rq.max_wqe -
- qp_init_attr->cap.max_recv_wr;
-
- qp->qplib_qp.rq.max_sge = qp_init_attr->cap.max_recv_sge;
- if (qp->qplib_qp.rq.max_sge > dev_attr->max_qp_sges)
- qp->qplib_qp.rq.max_sge = dev_attr->max_qp_sges;
- }
-
- qp->qplib_qp.mtu = ib_mtu_enum_to_int(iboe_get_mtu(rdev->netdev->mtu));
if (qp_init_attr->qp_type == IB_QPT_GSI &&
- !(bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx))) {
- /* Allocate 1 more than what's provided */
- entries = roundup_pow_of_two(qp_init_attr->cap.max_send_wr + 1);
- qp->qplib_qp.sq.max_wqe = min_t(u32, entries,
- dev_attr->max_qp_wqes + 1);
- qp->qplib_qp.sq.q_full_delta = qp->qplib_qp.sq.max_wqe -
- qp_init_attr->cap.max_send_wr;
- qp->qplib_qp.rq.max_sge = dev_attr->max_qp_sges;
- if (qp->qplib_qp.rq.max_sge > dev_attr->max_qp_sges)
- qp->qplib_qp.rq.max_sge = dev_attr->max_qp_sges;
- qp->qplib_qp.sq.max_sge++;
- if (qp->qplib_qp.sq.max_sge > dev_attr->max_qp_sges)
- qp->qplib_qp.sq.max_sge = dev_attr->max_qp_sges;
-
- qp->qplib_qp.rq_hdr_buf_size =
- BNXT_QPLIB_MAX_QP1_RQ_HDR_SIZE_V2;
-
- qp->qplib_qp.sq_hdr_buf_size =
- BNXT_QPLIB_MAX_QP1_SQ_HDR_SIZE_V2;
- qp->qplib_qp.dpi = &rdev->dpi_privileged;
- rc = bnxt_qplib_create_qp1(&rdev->qplib_res, &qp->qplib_qp);
- if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to create HW QP1");
+ !(bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx))) {
+ rc = bnxt_re_create_gsi_qp(qp, pd, qp_init_attr);
+ if (rc == -ENODEV)
+ goto qp_destroy;
+ if (rc)
goto fail;
- }
- /* Create a shadow QP to handle the QP1 traffic */
- rdev->qp1_sqp = bnxt_re_create_shadow_qp(pd, &rdev->qplib_res,
- &qp->qplib_qp);
- if (!rdev->qp1_sqp) {
- rc = -EINVAL;
- dev_err(rdev_to_dev(rdev),
- "Failed to create Shadow QP for QP1");
- goto qp_destroy;
- }
- rdev->sqp_ah = bnxt_re_create_shadow_qp_ah(pd, &rdev->qplib_res,
- &qp->qplib_qp);
- if (!rdev->sqp_ah) {
- bnxt_qplib_destroy_qp(&rdev->qplib_res,
- &rdev->qp1_sqp->qplib_qp);
- rc = -EINVAL;
- dev_err(rdev_to_dev(rdev),
- "Failed to create AH entry for ShadowQP");
- goto qp_destroy;
- }
-
} else {
- /* Allocate 128 + 1 more than what's provided */
- entries = roundup_pow_of_two(qp_init_attr->cap.max_send_wr +
- BNXT_QPLIB_RESERVED_QP_WRS + 1);
- qp->qplib_qp.sq.max_wqe = min_t(u32, entries,
- dev_attr->max_qp_wqes +
- BNXT_QPLIB_RESERVED_QP_WRS + 1);
- qp->qplib_qp.sq.q_full_delta = BNXT_QPLIB_RESERVED_QP_WRS + 1;
-
- /*
- * Reserving one slot for Phantom WQE. Application can
- * post one extra entry in this case. But allowing this to avoid
- * unexpected Queue full condition
- */
-
- qp->qplib_qp.sq.q_full_delta -= 1;
-
- qp->qplib_qp.max_rd_atomic = dev_attr->max_qp_rd_atom;
- qp->qplib_qp.max_dest_rd_atomic = dev_attr->max_qp_init_rd_atom;
- if (udata) {
- rc = bnxt_re_init_user_qp(rdev, pd, qp, udata);
- if (rc)
- goto fail;
- } else {
- qp->qplib_qp.dpi = &rdev->dpi_privileged;
- }
-
rc = bnxt_qplib_create_qp(&rdev->qplib_res, &qp->qplib_qp);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to create HW QP");
+ ibdev_err(&rdev->ibdev, "Failed to create HW QP");
goto free_umem;
}
+ if (udata) {
+ struct bnxt_re_qp_resp resp;
+
+ resp.qpid = qp->qplib_qp.id;
+ resp.rsvd = 0;
+ rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
+ if (rc) {
+ ibdev_err(&rdev->ibdev, "Failed to copy QP udata");
+ goto qp_destroy;
+ }
+ }
}
qp->ib_qp.qp_num = qp->qplib_qp.id;
+ if (qp_init_attr->qp_type == IB_QPT_GSI)
+ rdev->gsi_ctx.gsi_qp = qp;
spin_lock_init(&qp->sq_lock);
spin_lock_init(&qp->rq_lock);
-
- if (udata) {
- struct bnxt_re_qp_resp resp;
-
- resp.qpid = qp->ib_qp.qp_num;
- resp.rsvd = 0;
- rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
- if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to copy QP udata");
- goto qp_destroy;
- }
- }
INIT_LIST_HEAD(&qp->list);
mutex_lock(&rdev->qp_lock);
list_add_tail(&qp->list, &rdev->qp_list);
- atomic_inc(&rdev->qp_count);
mutex_unlock(&rdev->qp_lock);
+ atomic_inc(&rdev->qp_count);
return &qp->ib_qp;
qp_destroy:
@@ -1189,6 +1396,7 @@ struct ib_qp *bnxt_re_create_qp(struct ib_pd *ib_pd,
ib_umem_release(qp->sumem);
fail:
kfree(qp);
+exit:
return ERR_PTR(rc);
}
@@ -1311,9 +1519,11 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,
return PTR_ERR(umem);
srq->umem = umem;
- qplib_srq->sg_info.sglist = umem->sg_head.sgl;
+ qplib_srq->sg_info.sghead = umem->sg_head.sgl;
qplib_srq->sg_info.npages = ib_umem_num_pages(umem);
qplib_srq->sg_info.nmap = umem->nmap;
+ qplib_srq->sg_info.pgsize = PAGE_SIZE;
+ qplib_srq->sg_info.pgshft = PAGE_SHIFT;
qplib_srq->srq_handle = ureq.srq_handle;
qplib_srq->dpi = &cntx->dpi;
@@ -1334,7 +1544,7 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq,
int rc, entries;
if (srq_init_attr->attr.max_wr >= dev_attr->max_srq_wqes) {
- dev_err(rdev_to_dev(rdev), "Create CQ failed - max exceeded");
+ ibdev_err(&rdev->ibdev, "Create CQ failed - max exceeded");
rc = -EINVAL;
goto exit;
}
@@ -1369,7 +1579,7 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq,
rc = bnxt_qplib_create_srq(&rdev->qplib_res, &srq->qplib_srq);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Create HW SRQ failed!");
+ ibdev_err(&rdev->ibdev, "Create HW SRQ failed!");
goto fail;
}
@@ -1379,7 +1589,7 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq,
resp.srqid = srq->qplib_srq.id;
rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (rc) {
- dev_err(rdev_to_dev(rdev), "SRQ copy to udata failed!");
+ ibdev_err(&rdev->ibdev, "SRQ copy to udata failed!");
bnxt_qplib_destroy_srq(&rdev->qplib_res,
&srq->qplib_srq);
goto fail;
@@ -1418,7 +1628,7 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
srq->qplib_srq.threshold = srq_attr->srq_limit;
rc = bnxt_qplib_modify_srq(&rdev->qplib_res, &srq->qplib_srq);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Modify HW SRQ failed!");
+ ibdev_err(&rdev->ibdev, "Modify HW SRQ failed!");
return rc;
}
/* On success, update the shadow */
@@ -1426,8 +1636,8 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
/* No need to Build and send response back to udata */
break;
default:
- dev_err(rdev_to_dev(rdev),
- "Unsupported srq_attr_mask 0x%x", srq_attr_mask);
+ ibdev_err(&rdev->ibdev,
+ "Unsupported srq_attr_mask 0x%x", srq_attr_mask);
return -EINVAL;
}
return 0;
@@ -1445,7 +1655,7 @@ int bnxt_re_query_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr)
tsrq.qplib_srq.id = srq->qplib_srq.id;
rc = bnxt_qplib_query_srq(&rdev->qplib_res, &tsrq.qplib_srq);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Query HW SRQ failed!");
+ ibdev_err(&rdev->ibdev, "Query HW SRQ failed!");
return rc;
}
srq_attr->max_wr = srq->qplib_srq.max_wqe;
@@ -1487,7 +1697,7 @@ static int bnxt_re_modify_shadow_qp(struct bnxt_re_dev *rdev,
struct bnxt_re_qp *qp1_qp,
int qp_attr_mask)
{
- struct bnxt_re_qp *qp = rdev->qp1_sqp;
+ struct bnxt_re_qp *qp = rdev->gsi_ctx.gsi_sqp;
int rc = 0;
if (qp_attr_mask & IB_QP_STATE) {
@@ -1511,8 +1721,7 @@ static int bnxt_re_modify_shadow_qp(struct bnxt_re_dev *rdev,
rc = bnxt_qplib_modify_qp(&rdev->qplib_res, &qp->qplib_qp);
if (rc)
- dev_err(rdev_to_dev(rdev),
- "Failed to modify Shadow QP for QP1");
+ ibdev_err(&rdev->ibdev, "Failed to modify Shadow QP for QP1");
return rc;
}
@@ -1533,15 +1742,15 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
new_qp_state = qp_attr->qp_state;
if (!ib_modify_qp_is_ok(curr_qp_state, new_qp_state,
ib_qp->qp_type, qp_attr_mask)) {
- dev_err(rdev_to_dev(rdev),
- "Invalid attribute mask: %#x specified ",
- qp_attr_mask);
- dev_err(rdev_to_dev(rdev),
- "for qpn: %#x type: %#x",
- ib_qp->qp_num, ib_qp->qp_type);
- dev_err(rdev_to_dev(rdev),
- "curr_qp_state=0x%x, new_qp_state=0x%x\n",
- curr_qp_state, new_qp_state);
+ ibdev_err(&rdev->ibdev,
+ "Invalid attribute mask: %#x specified ",
+ qp_attr_mask);
+ ibdev_err(&rdev->ibdev,
+ "for qpn: %#x type: %#x",
+ ib_qp->qp_num, ib_qp->qp_type);
+ ibdev_err(&rdev->ibdev,
+ "curr_qp_state=0x%x, new_qp_state=0x%x\n",
+ curr_qp_state, new_qp_state);
return -EINVAL;
}
qp->qplib_qp.modify_flags |= CMDQ_MODIFY_QP_MODIFY_MASK_STATE;
@@ -1549,18 +1758,16 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
if (!qp->sumem &&
qp->qplib_qp.state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
- dev_dbg(rdev_to_dev(rdev),
- "Move QP = %p to flush list\n",
- qp);
+ ibdev_dbg(&rdev->ibdev,
+ "Move QP = %p to flush list\n", qp);
flags = bnxt_re_lock_cqs(qp);
bnxt_qplib_add_flush_qp(&qp->qplib_qp);
bnxt_re_unlock_cqs(qp, flags);
}
if (!qp->sumem &&
qp->qplib_qp.state == CMDQ_MODIFY_QP_NEW_STATE_RESET) {
- dev_dbg(rdev_to_dev(rdev),
- "Move QP = %p out of flush list\n",
- qp);
+ ibdev_dbg(&rdev->ibdev,
+ "Move QP = %p out of flush list\n", qp);
flags = bnxt_re_lock_cqs(qp);
bnxt_qplib_clean_qp(&qp->qplib_qp);
bnxt_re_unlock_cqs(qp, flags);
@@ -1593,6 +1800,7 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
const struct ib_global_route *grh =
rdma_ah_read_grh(&qp_attr->ah_attr);
const struct ib_gid_attr *sgid_attr;
+ struct bnxt_re_gid_ctx *ctx;
qp->qplib_qp.modify_flags |= CMDQ_MODIFY_QP_MODIFY_MASK_DGID |
CMDQ_MODIFY_QP_MODIFY_MASK_FLOW_LABEL |
@@ -1604,11 +1812,12 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
memcpy(qp->qplib_qp.ah.dgid.data, grh->dgid.raw,
sizeof(qp->qplib_qp.ah.dgid.data));
qp->qplib_qp.ah.flow_label = grh->flow_label;
- /* If RoCE V2 is enabled, stack will have two entries for
- * each GID entry. Avoiding this duplicte entry in HW. Dividing
- * the GID index by 2 for RoCE V2
+ sgid_attr = grh->sgid_attr;
+ /* Get the HW context of the GID. The reference
+ * of GID table entry is already taken by the caller.
*/
- qp->qplib_qp.ah.sgid_index = grh->sgid_index / 2;
+ ctx = rdma_read_gid_hw_context(sgid_attr);
+ qp->qplib_qp.ah.sgid_index = ctx->idx;
qp->qplib_qp.ah.host_sgid_index = grh->sgid_index;
qp->qplib_qp.ah.hop_limit = grh->hop_limit;
qp->qplib_qp.ah.traffic_class = grh->traffic_class;
@@ -1616,7 +1825,6 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
ether_addr_copy(qp->qplib_qp.ah.dmac,
qp_attr->ah_attr.roce.dmac);
- sgid_attr = qp_attr->ah_attr.grh.sgid_attr;
rc = rdma_read_gid_l2_fields(sgid_attr, NULL,
&qp->qplib_qp.smac[0]);
if (rc)
@@ -1690,10 +1898,10 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
if (qp_attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
if (qp_attr->max_dest_rd_atomic >
dev_attr->max_qp_init_rd_atom) {
- dev_err(rdev_to_dev(rdev),
- "max_dest_rd_atomic requested%d is > dev_max%d",
- qp_attr->max_dest_rd_atomic,
- dev_attr->max_qp_init_rd_atom);
+ ibdev_err(&rdev->ibdev,
+ "max_dest_rd_atomic requested%d is > dev_max%d",
+ qp_attr->max_dest_rd_atomic,
+ dev_attr->max_qp_init_rd_atom);
return -EINVAL;
}
@@ -1714,8 +1922,8 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
(qp_attr->cap.max_recv_sge >= dev_attr->max_qp_sges) ||
(qp_attr->cap.max_inline_data >=
dev_attr->max_inline_data)) {
- dev_err(rdev_to_dev(rdev),
- "Create QP failed - max exceeded");
+ ibdev_err(&rdev->ibdev,
+ "Create QP failed - max exceeded");
return -EINVAL;
}
entries = roundup_pow_of_two(qp_attr->cap.max_send_wr);
@@ -1748,10 +1956,10 @@ int bnxt_re_modify_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
}
rc = bnxt_qplib_modify_qp(&rdev->qplib_res, &qp->qplib_qp);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to modify HW QP");
+ ibdev_err(&rdev->ibdev, "Failed to modify HW QP");
return rc;
}
- if (ib_qp->qp_type == IB_QPT_GSI && rdev->qp1_sqp)
+ if (ib_qp->qp_type == IB_QPT_GSI && rdev->gsi_ctx.gsi_sqp)
rc = bnxt_re_modify_shadow_qp(rdev, qp, qp_attr_mask);
return rc;
}
@@ -1773,7 +1981,7 @@ int bnxt_re_query_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr,
rc = bnxt_qplib_query_qp(&rdev->qplib_res, qplib_qp);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to query HW QP");
+ ibdev_err(&rdev->ibdev, "Failed to query HW QP");
goto out;
}
qp_attr->qp_state = __to_ib_qp_state(qplib_qp->state);
@@ -1978,7 +2186,7 @@ static int bnxt_re_build_qp1_send_v2(struct bnxt_re_qp *qp,
wqe->num_sge++;
} else {
- dev_err(rdev_to_dev(qp->rdev), "QP1 buffer is empty!");
+ ibdev_err(&qp->rdev->ibdev, "QP1 buffer is empty!");
rc = -ENOMEM;
}
return rc;
@@ -1995,9 +2203,12 @@ static int bnxt_re_build_qp1_shadow_qp_recv(struct bnxt_re_qp *qp,
struct bnxt_qplib_swqe *wqe,
int payload_size)
{
- struct bnxt_qplib_sge ref, sge;
- u32 rq_prod_index;
struct bnxt_re_sqp_entries *sqp_entry;
+ struct bnxt_qplib_sge ref, sge;
+ struct bnxt_re_dev *rdev;
+ u32 rq_prod_index;
+
+ rdev = qp->rdev;
rq_prod_index = bnxt_qplib_get_rq_prod_index(&qp->qplib_qp);
@@ -2012,7 +2223,7 @@ static int bnxt_re_build_qp1_shadow_qp_recv(struct bnxt_re_qp *qp,
ref.lkey = wqe->sg_list[0].lkey;
ref.size = wqe->sg_list[0].size;
- sqp_entry = &qp->rdev->sqp_tbl[rq_prod_index];
+ sqp_entry = &rdev->gsi_ctx.sqp_tbl[rq_prod_index];
/* SGE 1 */
wqe->sg_list[0].addr = sge.addr;
@@ -2164,7 +2375,7 @@ static int bnxt_re_build_reg_wqe(const struct ib_reg_wr *wr,
wqe->frmr.pbl_dma_ptr = qplib_frpl->hwq.pbl_dma_ptr[0];
wqe->frmr.page_list = mr->pages;
wqe->frmr.page_list_len = mr->npages;
- wqe->frmr.levels = qplib_frpl->hwq.level + 1;
+ wqe->frmr.levels = qplib_frpl->hwq.level;
wqe->type = BNXT_QPLIB_SWQE_TYPE_REG_MR;
/* Need unconditional fence for reg_mr
@@ -2211,8 +2422,8 @@ static int bnxt_re_copy_inline_data(struct bnxt_re_dev *rdev,
if ((sge_len + wqe->inline_len) >
BNXT_QPLIB_SWQE_MAX_INLINE_LENGTH) {
- dev_err(rdev_to_dev(rdev),
- "Inline data size requested > supported value");
+ ibdev_err(&rdev->ibdev,
+ "Inline data size requested > supported value");
return -EINVAL;
}
sge_len = wr->sg_list[i].length;
@@ -2259,21 +2470,18 @@ static int bnxt_re_post_send_shadow_qp(struct bnxt_re_dev *rdev,
struct bnxt_re_qp *qp,
const struct ib_send_wr *wr)
{
- struct bnxt_qplib_swqe wqe;
int rc = 0, payload_sz = 0;
unsigned long flags;
spin_lock_irqsave(&qp->sq_lock, flags);
- memset(&wqe, 0, sizeof(wqe));
while (wr) {
- /* House keeping */
- memset(&wqe, 0, sizeof(wqe));
+ struct bnxt_qplib_swqe wqe = {};
/* Common */
wqe.num_sge = wr->num_sge;
if (wr->num_sge > qp->qplib_qp.sq.max_sge) {
- dev_err(rdev_to_dev(rdev),
- "Limit exceeded for Send SGEs");
+ ibdev_err(&rdev->ibdev,
+ "Limit exceeded for Send SGEs");
rc = -EINVAL;
goto bad;
}
@@ -2292,9 +2500,9 @@ static int bnxt_re_post_send_shadow_qp(struct bnxt_re_dev *rdev,
rc = bnxt_qplib_post_send(&qp->qplib_qp, &wqe);
bad:
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Post send failed opcode = %#x rc = %d",
- wr->opcode, rc);
+ ibdev_err(&rdev->ibdev,
+ "Post send failed opcode = %#x rc = %d",
+ wr->opcode, rc);
break;
}
wr = wr->next;
@@ -2321,8 +2529,8 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
/* Common */
wqe.num_sge = wr->num_sge;
if (wr->num_sge > qp->qplib_qp.sq.max_sge) {
- dev_err(rdev_to_dev(qp->rdev),
- "Limit exceeded for Send SGEs");
+ ibdev_err(&qp->rdev->ibdev,
+ "Limit exceeded for Send SGEs");
rc = -EINVAL;
goto bad;
}
@@ -2367,8 +2575,8 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
rc = bnxt_re_build_atomic_wqe(wr, &wqe);
break;
case IB_WR_RDMA_READ_WITH_INV:
- dev_err(rdev_to_dev(qp->rdev),
- "RDMA Read with Invalidate is not supported");
+ ibdev_err(&qp->rdev->ibdev,
+ "RDMA Read with Invalidate is not supported");
rc = -EINVAL;
goto bad;
case IB_WR_LOCAL_INV:
@@ -2379,8 +2587,8 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
break;
default:
/* Unsupported WRs */
- dev_err(rdev_to_dev(qp->rdev),
- "WR (%#x) is not supported", wr->opcode);
+ ibdev_err(&qp->rdev->ibdev,
+ "WR (%#x) is not supported", wr->opcode);
rc = -EINVAL;
goto bad;
}
@@ -2388,9 +2596,9 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
rc = bnxt_qplib_post_send(&qp->qplib_qp, &wqe);
bad:
if (rc) {
- dev_err(rdev_to_dev(qp->rdev),
- "post_send failed op:%#x qps = %#x rc = %d\n",
- wr->opcode, qp->qplib_qp.state, rc);
+ ibdev_err(&qp->rdev->ibdev,
+ "post_send failed op:%#x qps = %#x rc = %d\n",
+ wr->opcode, qp->qplib_qp.state, rc);
*bad_wr = wr;
break;
}
@@ -2418,8 +2626,8 @@ static int bnxt_re_post_recv_shadow_qp(struct bnxt_re_dev *rdev,
/* Common */
wqe.num_sge = wr->num_sge;
if (wr->num_sge > qp->qplib_qp.rq.max_sge) {
- dev_err(rdev_to_dev(rdev),
- "Limit exceeded for Receive SGEs");
+ ibdev_err(&rdev->ibdev,
+ "Limit exceeded for Receive SGEs");
rc = -EINVAL;
break;
}
@@ -2455,8 +2663,8 @@ int bnxt_re_post_recv(struct ib_qp *ib_qp, const struct ib_recv_wr *wr,
/* Common */
wqe.num_sge = wr->num_sge;
if (wr->num_sge > qp->qplib_qp.rq.max_sge) {
- dev_err(rdev_to_dev(qp->rdev),
- "Limit exceeded for Receive SGEs");
+ ibdev_err(&qp->rdev->ibdev,
+ "Limit exceeded for Receive SGEs");
rc = -EINVAL;
*bad_wr = wr;
break;
@@ -2527,7 +2735,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
/* Validate CQ fields */
if (cqe < 1 || cqe > dev_attr->max_cq_wqes) {
- dev_err(rdev_to_dev(rdev), "Failed to create CQ -max exceeded");
+ ibdev_err(&rdev->ibdev, "Failed to create CQ -max exceeded");
return -EINVAL;
}
@@ -2538,6 +2746,8 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
if (entries > dev_attr->max_cq_wqes + 1)
entries = dev_attr->max_cq_wqes + 1;
+ cq->qplib_cq.sg_info.pgsize = PAGE_SIZE;
+ cq->qplib_cq.sg_info.pgshft = PAGE_SHIFT;
if (udata) {
struct bnxt_re_cq_req req;
struct bnxt_re_ucontext *uctx = rdma_udata_to_drv_context(
@@ -2554,7 +2764,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
rc = PTR_ERR(cq->umem);
goto fail;
}
- cq->qplib_cq.sg_info.sglist = cq->umem->sg_head.sgl;
+ cq->qplib_cq.sg_info.sghead = cq->umem->sg_head.sgl;
cq->qplib_cq.sg_info.npages = ib_umem_num_pages(cq->umem);
cq->qplib_cq.sg_info.nmap = cq->umem->nmap;
cq->qplib_cq.dpi = &uctx->dpi;
@@ -2581,7 +2791,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
rc = bnxt_qplib_create_cq(&rdev->qplib_res, &cq->qplib_cq);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to create HW CQ");
+ ibdev_err(&rdev->ibdev, "Failed to create HW CQ");
goto fail;
}
@@ -2601,7 +2811,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
resp.rsvd = 0;
rc = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to copy CQ udata");
+ ibdev_err(&rdev->ibdev, "Failed to copy CQ udata");
bnxt_qplib_destroy_cq(&rdev->qplib_res, &cq->qplib_cq);
goto c2fail;
}
@@ -2832,12 +3042,13 @@ static bool bnxt_re_is_loopback_packet(struct bnxt_re_dev *rdev,
return rc;
}
-static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *qp1_qp,
+static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *gsi_qp,
struct bnxt_qplib_cqe *cqe)
{
- struct bnxt_re_dev *rdev = qp1_qp->rdev;
+ struct bnxt_re_dev *rdev = gsi_qp->rdev;
struct bnxt_re_sqp_entries *sqp_entry = NULL;
- struct bnxt_re_qp *qp = rdev->qp1_sqp;
+ struct bnxt_re_qp *gsi_sqp = rdev->gsi_ctx.gsi_sqp;
+ struct bnxt_re_ah *gsi_sah;
struct ib_send_wr *swr;
struct ib_ud_wr udwr;
struct ib_recv_wr rwr;
@@ -2860,26 +3071,26 @@ static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *qp1_qp,
swr = &udwr.wr;
tbl_idx = cqe->wr_id;
- rq_hdr_buf = qp1_qp->qplib_qp.rq_hdr_buf +
- (tbl_idx * qp1_qp->qplib_qp.rq_hdr_buf_size);
- rq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&qp1_qp->qplib_qp,
+ rq_hdr_buf = gsi_qp->qplib_qp.rq_hdr_buf +
+ (tbl_idx * gsi_qp->qplib_qp.rq_hdr_buf_size);
+ rq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&gsi_qp->qplib_qp,
tbl_idx);
/* Shadow QP header buffer */
- shrq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&qp->qplib_qp,
+ shrq_hdr_buf_map = bnxt_qplib_get_qp_buf_from_index(&gsi_qp->qplib_qp,
tbl_idx);
- sqp_entry = &rdev->sqp_tbl[tbl_idx];
+ sqp_entry = &rdev->gsi_ctx.sqp_tbl[tbl_idx];
/* Store this cqe */
memcpy(&sqp_entry->cqe, cqe, sizeof(struct bnxt_qplib_cqe));
- sqp_entry->qp1_qp = qp1_qp;
+ sqp_entry->qp1_qp = gsi_qp;
/* Find packet type from the cqe */
pkt_type = bnxt_re_check_packet_type(cqe->raweth_qp1_flags,
cqe->raweth_qp1_flags2);
if (pkt_type < 0) {
- dev_err(rdev_to_dev(rdev), "Invalid packet\n");
+ ibdev_err(&rdev->ibdev, "Invalid packet\n");
return -EINVAL;
}
@@ -2926,10 +3137,10 @@ static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *qp1_qp,
rwr.wr_id = tbl_idx;
rwr.next = NULL;
- rc = bnxt_re_post_recv_shadow_qp(rdev, qp, &rwr);
+ rc = bnxt_re_post_recv_shadow_qp(rdev, gsi_sqp, &rwr);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to post Rx buffers to shadow QP");
+ ibdev_err(&rdev->ibdev,
+ "Failed to post Rx buffers to shadow QP");
return -ENOMEM;
}
@@ -2938,13 +3149,13 @@ static int bnxt_re_process_raw_qp_pkt_rx(struct bnxt_re_qp *qp1_qp,
swr->wr_id = tbl_idx;
swr->opcode = IB_WR_SEND;
swr->next = NULL;
-
- udwr.ah = &rdev->sqp_ah->ib_ah;
- udwr.remote_qpn = rdev->qp1_sqp->qplib_qp.id;
- udwr.remote_qkey = rdev->qp1_sqp->qplib_qp.qkey;
+ gsi_sah = rdev->gsi_ctx.gsi_sah;
+ udwr.ah = &gsi_sah->ib_ah;
+ udwr.remote_qpn = gsi_sqp->qplib_qp.id;
+ udwr.remote_qkey = gsi_sqp->qplib_qp.qkey;
/* post data received in the send queue */
- rc = bnxt_re_post_send_shadow_qp(rdev, qp, swr);
+ rc = bnxt_re_post_send_shadow_qp(rdev, gsi_sqp, swr);
return 0;
}
@@ -2998,12 +3209,12 @@ static void bnxt_re_process_res_rc_wc(struct ib_wc *wc,
wc->opcode = IB_WC_RECV_RDMA_WITH_IMM;
}
-static void bnxt_re_process_res_shadow_qp_wc(struct bnxt_re_qp *qp,
+static void bnxt_re_process_res_shadow_qp_wc(struct bnxt_re_qp *gsi_sqp,
struct ib_wc *wc,
struct bnxt_qplib_cqe *cqe)
{
- struct bnxt_re_dev *rdev = qp->rdev;
- struct bnxt_re_qp *qp1_qp = NULL;
+ struct bnxt_re_dev *rdev = gsi_sqp->rdev;
+ struct bnxt_re_qp *gsi_qp = NULL;
struct bnxt_qplib_cqe *orig_cqe = NULL;
struct bnxt_re_sqp_entries *sqp_entry = NULL;
int nw_type;
@@ -3013,13 +3224,13 @@ static void bnxt_re_process_res_shadow_qp_wc(struct bnxt_re_qp *qp,
tbl_idx = cqe->wr_id;
- sqp_entry = &rdev->sqp_tbl[tbl_idx];
- qp1_qp = sqp_entry->qp1_qp;
+ sqp_entry = &rdev->gsi_ctx.sqp_tbl[tbl_idx];
+ gsi_qp = sqp_entry->qp1_qp;
orig_cqe = &sqp_entry->cqe;
wc->wr_id = sqp_entry->wrid;
wc->byte_len = orig_cqe->length;
- wc->qp = &qp1_qp->ib_qp;
+ wc->qp = &gsi_qp->ib_qp;
wc->ex.imm_data = orig_cqe->immdata;
wc->src_qp = orig_cqe->src_qp;
@@ -3084,11 +3295,11 @@ static int send_phantom_wqe(struct bnxt_re_qp *qp)
rc = bnxt_re_bind_fence_mw(lib_qp);
if (!rc) {
lib_qp->sq.phantom_wqe_cnt++;
- dev_dbg(&lib_qp->sq.hwq.pdev->dev,
- "qp %#x sq->prod %#x sw_prod %#x phantom_wqe_cnt %d\n",
- lib_qp->id, lib_qp->sq.hwq.prod,
- HWQ_CMP(lib_qp->sq.hwq.prod, &lib_qp->sq.hwq),
- lib_qp->sq.phantom_wqe_cnt);
+ ibdev_dbg(&qp->rdev->ibdev,
+ "qp %#x sq->prod %#x sw_prod %#x phantom_wqe_cnt %d\n",
+ lib_qp->id, lib_qp->sq.hwq.prod,
+ HWQ_CMP(lib_qp->sq.hwq.prod, &lib_qp->sq.hwq),
+ lib_qp->sq.phantom_wqe_cnt);
}
spin_unlock_irqrestore(&qp->sq_lock, flags);
@@ -3098,7 +3309,7 @@ static int send_phantom_wqe(struct bnxt_re_qp *qp)
int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
{
struct bnxt_re_cq *cq = container_of(ib_cq, struct bnxt_re_cq, ib_cq);
- struct bnxt_re_qp *qp;
+ struct bnxt_re_qp *qp, *sh_qp;
struct bnxt_qplib_cqe *cqe;
int i, ncqe, budget;
struct bnxt_qplib_q *sq;
@@ -3111,7 +3322,7 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
budget = min_t(u32, num_entries, cq->max_cql);
num_entries = budget;
if (!cq->cql) {
- dev_err(rdev_to_dev(cq->rdev), "POLL CQ : no CQL to use");
+ ibdev_err(&cq->rdev->ibdev, "POLL CQ : no CQL to use");
goto exit;
}
cqe = &cq->cql[0];
@@ -3124,8 +3335,8 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
qp = container_of(lib_qp,
struct bnxt_re_qp, qplib_qp);
if (send_phantom_wqe(qp) == -ENOMEM)
- dev_err(rdev_to_dev(cq->rdev),
- "Phantom failed! Scheduled to send again\n");
+ ibdev_err(&cq->rdev->ibdev,
+ "Phantom failed! Scheduled to send again\n");
else
sq->send_phantom = false;
}
@@ -3149,8 +3360,7 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
(unsigned long)(cqe->qp_handle),
struct bnxt_re_qp, qplib_qp);
if (!qp) {
- dev_err(rdev_to_dev(cq->rdev),
- "POLL CQ : bad QP handle");
+ ibdev_err(&cq->rdev->ibdev, "POLL CQ : bad QP handle");
continue;
}
wc->qp = &qp->ib_qp;
@@ -3162,8 +3372,9 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
switch (cqe->opcode) {
case CQ_BASE_CQE_TYPE_REQ:
- if (qp->rdev->qp1_sqp && qp->qplib_qp.id ==
- qp->rdev->qp1_sqp->qplib_qp.id) {
+ sh_qp = qp->rdev->gsi_ctx.gsi_sqp;
+ if (sh_qp &&
+ qp->qplib_qp.id == sh_qp->qplib_qp.id) {
/* Handle this completion with
* the stored completion
*/
@@ -3189,7 +3400,7 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
* stored in the table
*/
tbl_idx = cqe->wr_id;
- sqp_entry = &cq->rdev->sqp_tbl[tbl_idx];
+ sqp_entry = &cq->rdev->gsi_ctx.sqp_tbl[tbl_idx];
wc->wr_id = sqp_entry->wrid;
bnxt_re_process_res_rawqp1_wc(wc, cqe);
break;
@@ -3197,8 +3408,9 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
bnxt_re_process_res_rc_wc(wc, cqe);
break;
case CQ_BASE_CQE_TYPE_RES_UD:
- if (qp->rdev->qp1_sqp && qp->qplib_qp.id ==
- qp->rdev->qp1_sqp->qplib_qp.id) {
+ sh_qp = qp->rdev->gsi_ctx.gsi_sqp;
+ if (sh_qp &&
+ qp->qplib_qp.id == sh_qp->qplib_qp.id) {
/* Handle this completion with
* the stored completion
*/
@@ -3213,9 +3425,9 @@ int bnxt_re_poll_cq(struct ib_cq *ib_cq, int num_entries, struct ib_wc *wc)
bnxt_re_process_res_ud_wc(qp, wc, cqe);
break;
default:
- dev_err(rdev_to_dev(cq->rdev),
- "POLL CQ : type 0x%x not handled",
- cqe->opcode);
+ ibdev_err(&cq->rdev->ibdev,
+ "POLL CQ : type 0x%x not handled",
+ cqe->opcode);
continue;
}
wc++;
@@ -3308,7 +3520,7 @@ int bnxt_re_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata)
rc = bnxt_qplib_free_mrw(&rdev->qplib_res, &mr->qplib_mr);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Dereg MR failed: %#x\n", rc);
+ ibdev_err(&rdev->ibdev, "Dereg MR failed: %#x\n", rc);
return rc;
}
@@ -3355,7 +3567,7 @@ struct ib_mr *bnxt_re_alloc_mr(struct ib_pd *ib_pd, enum ib_mr_type type,
int rc;
if (type != IB_MR_TYPE_MEM_REG) {
- dev_dbg(rdev_to_dev(rdev), "MR type 0x%x not supported", type);
+ ibdev_dbg(&rdev->ibdev, "MR type 0x%x not supported", type);
return ERR_PTR(-EINVAL);
}
if (max_num_sg > MAX_PBL_LVL_1_PGS)
@@ -3385,8 +3597,8 @@ struct ib_mr *bnxt_re_alloc_mr(struct ib_pd *ib_pd, enum ib_mr_type type,
rc = bnxt_qplib_alloc_fast_reg_page_list(&rdev->qplib_res,
&mr->qplib_frpl, max_num_sg);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to allocate HW FR page list");
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate HW FR page list");
goto fail_mr;
}
@@ -3421,7 +3633,7 @@ struct ib_mw *bnxt_re_alloc_mw(struct ib_pd *ib_pd, enum ib_mw_type type,
CMDQ_ALLOCATE_MRW_MRW_FLAGS_MW_TYPE2B);
rc = bnxt_qplib_alloc_mrw(&rdev->qplib_res, &mw->qplib_mw);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Allocate MW failed!");
+ ibdev_err(&rdev->ibdev, "Allocate MW failed!");
goto fail;
}
mw->ib_mw.rkey = mw->qplib_mw.rkey;
@@ -3442,7 +3654,7 @@ int bnxt_re_dealloc_mw(struct ib_mw *ib_mw)
rc = bnxt_qplib_free_mrw(&rdev->qplib_res, &mw->qplib_mw);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Free MW failed: %#x\n", rc);
+ ibdev_err(&rdev->ibdev, "Free MW failed: %#x\n", rc);
return rc;
}
@@ -3494,8 +3706,8 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
int umem_pgs, page_shift, rc;
if (length > BNXT_RE_MAX_MR_SIZE) {
- dev_err(rdev_to_dev(rdev), "MR Size: %lld > Max supported:%lld\n",
- length, BNXT_RE_MAX_MR_SIZE);
+ ibdev_err(&rdev->ibdev, "MR Size: %lld > Max supported:%lld\n",
+ length, BNXT_RE_MAX_MR_SIZE);
return ERR_PTR(-ENOMEM);
}
@@ -3510,7 +3722,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
rc = bnxt_qplib_alloc_mrw(&rdev->qplib_res, &mr->qplib_mr);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to allocate MR");
+ ibdev_err(&rdev->ibdev, "Failed to allocate MR");
goto free_mr;
}
/* The fixed portion of the rkey is the same as the lkey */
@@ -3518,7 +3730,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
umem = ib_umem_get(&rdev->ibdev, start, length, mr_access_flags);
if (IS_ERR(umem)) {
- dev_err(rdev_to_dev(rdev), "Failed to get umem");
+ ibdev_err(&rdev->ibdev, "Failed to get umem");
rc = -EFAULT;
goto free_mrw;
}
@@ -3527,7 +3739,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
mr->qplib_mr.va = virt_addr;
umem_pgs = ib_umem_page_count(umem);
if (!umem_pgs) {
- dev_err(rdev_to_dev(rdev), "umem is invalid!");
+ ibdev_err(&rdev->ibdev, "umem is invalid!");
rc = -EINVAL;
goto free_umem;
}
@@ -3544,15 +3756,15 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
virt_addr));
if (!bnxt_re_page_size_ok(page_shift)) {
- dev_err(rdev_to_dev(rdev), "umem page size unsupported!");
+ ibdev_err(&rdev->ibdev, "umem page size unsupported!");
rc = -EFAULT;
goto fail;
}
if (page_shift == BNXT_RE_PAGE_SHIFT_4K &&
length > BNXT_RE_MAX_MR_SIZE_LOW) {
- dev_err(rdev_to_dev(rdev), "Requested MR Sz:%llu Max sup:%llu",
- length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
+ ibdev_err(&rdev->ibdev, "Requested MR Sz:%llu Max sup:%llu",
+ length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
rc = -EINVAL;
goto fail;
}
@@ -3562,7 +3774,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
rc = bnxt_qplib_reg_mr(&rdev->qplib_res, &mr->qplib_mr, pbl_tbl,
umem_pgs, false, 1 << page_shift);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to register user MR");
+ ibdev_err(&rdev->ibdev, "Failed to register user MR");
goto fail;
}
@@ -3595,12 +3807,11 @@ int bnxt_re_alloc_ucontext(struct ib_ucontext *ctx, struct ib_udata *udata)
u32 chip_met_rev_num = 0;
int rc;
- dev_dbg(rdev_to_dev(rdev), "ABI version requested %u",
- ibdev->ops.uverbs_abi_ver);
+ ibdev_dbg(ibdev, "ABI version requested %u", ibdev->ops.uverbs_abi_ver);
if (ibdev->ops.uverbs_abi_ver != BNXT_RE_ABI_VERSION) {
- dev_dbg(rdev_to_dev(rdev), " is different from the device %d ",
- BNXT_RE_ABI_VERSION);
+ ibdev_dbg(ibdev, " is different from the device %d ",
+ BNXT_RE_ABI_VERSION);
return -EPERM;
}
@@ -3614,10 +3825,10 @@ int bnxt_re_alloc_ucontext(struct ib_ucontext *ctx, struct ib_udata *udata)
spin_lock_init(&uctx->sh_lock);
resp.comp_mask = BNXT_RE_UCNTX_CMASK_HAVE_CCTX;
- chip_met_rev_num = rdev->chip_ctx.chip_num;
- chip_met_rev_num |= ((u32)rdev->chip_ctx.chip_rev & 0xFF) <<
+ chip_met_rev_num = rdev->chip_ctx->chip_num;
+ chip_met_rev_num |= ((u32)rdev->chip_ctx->chip_rev & 0xFF) <<
BNXT_RE_CHIP_ID0_CHIP_REV_SFT;
- chip_met_rev_num |= ((u32)rdev->chip_ctx.chip_metal & 0xFF) <<
+ chip_met_rev_num |= ((u32)rdev->chip_ctx->chip_metal & 0xFF) <<
BNXT_RE_CHIP_ID0_CHIP_MET_SFT;
resp.chip_id0 = chip_met_rev_num;
/* Future extension of chip info */
@@ -3632,7 +3843,7 @@ int bnxt_re_alloc_ucontext(struct ib_ucontext *ctx, struct ib_udata *udata)
rc = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp)));
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to copy user context");
+ ibdev_err(ibdev, "Failed to copy user context");
rc = -EFAULT;
goto cfail;
}
@@ -3682,15 +3893,14 @@ int bnxt_re_mmap(struct ib_ucontext *ib_uctx, struct vm_area_struct *vma)
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
PAGE_SIZE, vma->vm_page_prot)) {
- dev_err(rdev_to_dev(rdev), "Failed to map DPI");
+ ibdev_err(&rdev->ibdev, "Failed to map DPI");
return -EAGAIN;
}
} else {
pfn = virt_to_phys(uctx->shpg) >> PAGE_SHIFT;
if (remap_pfn_range(vma, vma->vm_start,
pfn, PAGE_SIZE, vma->vm_page_prot)) {
- dev_err(rdev_to_dev(rdev),
- "Failed to map shared page");
+ ibdev_err(&rdev->ibdev, "Failed to map shared page");
return -EAGAIN;
}
}
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 793c97251588..b12fbc857f94 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -78,26 +78,43 @@ static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
/* Mutex to protect the list of bnxt_re devices added */
static DEFINE_MUTEX(bnxt_re_dev_lock);
static struct workqueue_struct *bnxt_re_wq;
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev);
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev);
+static void bnxt_re_stop_irq(void *handle);
static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
{
+ struct bnxt_qplib_chip_ctx *chip_ctx;
+
+ if (!rdev->chip_ctx)
+ return;
+ chip_ctx = rdev->chip_ctx;
+ rdev->chip_ctx = NULL;
rdev->rcfw.res = NULL;
rdev->qplib_res.cctx = NULL;
+ rdev->qplib_res.pdev = NULL;
+ rdev->qplib_res.netdev = NULL;
+ kfree(chip_ctx);
}
static int bnxt_re_setup_chip_ctx(struct bnxt_re_dev *rdev)
{
+ struct bnxt_qplib_chip_ctx *chip_ctx;
struct bnxt_en_dev *en_dev;
struct bnxt *bp;
en_dev = rdev->en_dev;
bp = netdev_priv(en_dev->net);
- rdev->chip_ctx.chip_num = bp->chip_num;
+ chip_ctx = kzalloc(sizeof(*chip_ctx), GFP_KERNEL);
+ if (!chip_ctx)
+ return -ENOMEM;
+ chip_ctx->chip_num = bp->chip_num;
+
+ rdev->chip_ctx = chip_ctx;
/* rest members to follow eventually */
- rdev->qplib_res.cctx = &rdev->chip_ctx;
+ rdev->qplib_res.cctx = rdev->chip_ctx;
rdev->rcfw.res = &rdev->qplib_res;
return 0;
@@ -136,9 +153,9 @@ static void bnxt_re_limit_pf_res(struct bnxt_re_dev *rdev)
ctx->srqc_count = min_t(u32, BNXT_RE_MAX_SRQC_COUNT,
attr->max_srq);
ctx->cq_count = min_t(u32, BNXT_RE_MAX_CQ_COUNT, attr->max_cq);
- if (!bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx))
+ if (!bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx))
for (i = 0; i < MAX_TQM_ALLOC_REQ; i++)
- rdev->qplib_ctx.tqm_count[i] =
+ rdev->qplib_ctx.tqm_ctx.qcount[i] =
rdev->dev_attr.tqm_alloc_reqs[i];
}
@@ -185,7 +202,7 @@ static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
memset(&rdev->qplib_ctx.vf_res, 0, sizeof(struct bnxt_qplib_vf_res));
bnxt_re_limit_pf_res(rdev);
- num_vfs = bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx) ?
+ num_vfs = bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) ?
BNXT_RE_GEN_P5_MAX_VF : rdev->num_vfs;
if (num_vfs)
bnxt_re_limit_vf_res(&rdev->qplib_ctx, num_vfs);
@@ -208,7 +225,7 @@ static void bnxt_re_sriov_config(void *p, int num_vfs)
return;
rdev->num_vfs = num_vfs;
- if (!bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx)) {
+ if (!bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx)) {
bnxt_re_set_resource_limits(rdev);
bnxt_qplib_set_func_resources(&rdev->qplib_res, &rdev->rcfw,
&rdev->qplib_ctx);
@@ -221,8 +238,10 @@ static void bnxt_re_shutdown(void *p)
if (!rdev)
return;
-
- bnxt_re_ib_unreg(rdev);
+ ASSERT_RTNL();
+ /* Release the MSIx vectors before queuing unregister */
+ bnxt_re_stop_irq(rdev);
+ ib_unregister_device_queued(&rdev->ibdev);
}
static void bnxt_re_stop_irq(void *handle)
@@ -254,7 +273,7 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
* to f/w will timeout and that will set the
* timeout bit.
*/
- dev_err(rdev_to_dev(rdev), "Failed to re-start IRQs\n");
+ ibdev_err(&rdev->ibdev, "Failed to re-start IRQs\n");
return;
}
@@ -271,8 +290,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
rc = bnxt_qplib_nq_start_irq(nq, indx - 1,
msix_ent[indx].vector, false);
if (rc)
- dev_warn(rdev_to_dev(rdev),
- "Failed to reinit NQ index %d\n", indx - 1);
+ ibdev_warn(&rdev->ibdev, "Failed to reinit NQ index %d\n",
+ indx - 1);
}
}
@@ -358,9 +377,9 @@ static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
goto done;
}
if (num_msix_got != num_msix_want) {
- dev_warn(rdev_to_dev(rdev),
- "Requested %d MSI-X vectors, got %d\n",
- num_msix_want, num_msix_got);
+ ibdev_warn(&rdev->ibdev,
+ "Requested %d MSI-X vectors, got %d\n",
+ num_msix_want, num_msix_got);
}
rdev->num_msix = num_msix_got;
done:
@@ -407,14 +426,14 @@ static int bnxt_re_net_ring_free(struct bnxt_re_dev *rdev,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
if (rc)
- dev_err(rdev_to_dev(rdev),
- "Failed to free HW ring:%d :%#x", req.ring_id, rc);
+ ibdev_err(&rdev->ibdev, "Failed to free HW ring:%d :%#x",
+ req.ring_id, rc);
return rc;
}
-static int bnxt_re_net_ring_alloc(struct bnxt_re_dev *rdev, dma_addr_t *dma_arr,
- int pages, int type, u32 ring_mask,
- u32 map_index, u16 *fw_ring_id)
+static int bnxt_re_net_ring_alloc(struct bnxt_re_dev *rdev,
+ struct bnxt_re_ring_attr *ring_attr,
+ u16 *fw_ring_id)
{
struct bnxt_en_dev *en_dev = rdev->en_dev;
struct hwrm_ring_alloc_input req = {0};
@@ -428,18 +447,18 @@ static int bnxt_re_net_ring_alloc(struct bnxt_re_dev *rdev, dma_addr_t *dma_arr,
memset(&fw_msg, 0, sizeof(fw_msg));
bnxt_re_init_hwrm_hdr(rdev, (void *)&req, HWRM_RING_ALLOC, -1, -1);
req.enables = 0;
- req.page_tbl_addr = cpu_to_le64(dma_arr[0]);
- if (pages > 1) {
+ req.page_tbl_addr = cpu_to_le64(ring_attr->dma_arr[0]);
+ if (ring_attr->pages > 1) {
/* Page size is in log2 units */
req.page_size = BNXT_PAGE_SHIFT;
req.page_tbl_depth = 1;
}
req.fbo = 0;
/* Association of ring index with doorbell index and MSIX number */
- req.logical_id = cpu_to_le16(map_index);
- req.length = cpu_to_le32(ring_mask + 1);
- req.ring_type = type;
- req.int_mode = RING_ALLOC_REQ_INT_MODE_MSIX;
+ req.logical_id = cpu_to_le16(ring_attr->lrid);
+ req.length = cpu_to_le32(ring_attr->depth + 1);
+ req.ring_type = ring_attr->type;
+ req.int_mode = ring_attr->mode;
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
@@ -468,8 +487,8 @@ static int bnxt_re_net_stats_ctx_free(struct bnxt_re_dev *rdev,
sizeof(req), DFLT_HWRM_CMD_TIMEOUT);
rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
if (rc)
- dev_err(rdev_to_dev(rdev),
- "Failed to free HW stats context %#x", rc);
+ ibdev_err(&rdev->ibdev, "Failed to free HW stats context %#x",
+ rc);
return rc;
}
@@ -524,17 +543,12 @@ static bool is_bnxt_re_dev(struct net_device *netdev)
static struct bnxt_re_dev *bnxt_re_from_netdev(struct net_device *netdev)
{
- struct bnxt_re_dev *rdev;
+ struct ib_device *ibdev =
+ ib_device_get_by_netdev(netdev, RDMA_DRIVER_BNXT_RE);
+ if (!ibdev)
+ return NULL;
- rcu_read_lock();
- list_for_each_entry_rcu(rdev, &bnxt_re_dev_list, list) {
- if (rdev->netdev == netdev) {
- rcu_read_unlock();
- return rdev;
- }
- }
- rcu_read_unlock();
- return NULL;
+ return container_of(ibdev, struct bnxt_re_dev, ibdev);
}
static void bnxt_re_dev_unprobe(struct net_device *netdev,
@@ -608,11 +622,6 @@ static const struct attribute_group bnxt_re_dev_attr_group = {
.attrs = bnxt_re_attributes,
};
-static void bnxt_re_unregister_ib(struct bnxt_re_dev *rdev)
-{
- ib_unregister_device(&rdev->ibdev);
-}
-
static const struct ib_device_ops bnxt_re_dev_ops = {
.owner = THIS_MODULE,
.driver_id = RDMA_DRIVER_BNXT_RE,
@@ -627,6 +636,7 @@ static const struct ib_device_ops bnxt_re_dev_ops = {
.create_cq = bnxt_re_create_cq,
.create_qp = bnxt_re_create_qp,
.create_srq = bnxt_re_create_srq,
+ .dealloc_driver = bnxt_re_dealloc_driver,
.dealloc_pd = bnxt_re_dealloc_pd,
.dealloc_ucontext = bnxt_re_dealloc_ucontext,
.del_gid = bnxt_re_del_gid,
@@ -723,15 +733,11 @@ static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
{
dev_put(rdev->netdev);
rdev->netdev = NULL;
-
mutex_lock(&bnxt_re_dev_lock);
list_del_rcu(&rdev->list);
mutex_unlock(&bnxt_re_dev_lock);
synchronize_rcu();
-
- ib_dealloc_device(&rdev->ibdev);
- /* rdev is gone */
}
static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
@@ -742,8 +748,8 @@ static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
/* Allocate bnxt_re_dev instance here */
rdev = ib_alloc_device(bnxt_re_dev, ibdev);
if (!rdev) {
- dev_err(NULL, "%s: bnxt_re_dev allocation failure!",
- ROCE_DRV_MODULE_NAME);
+ ibdev_err(NULL, "%s: bnxt_re_dev allocation failure!",
+ ROCE_DRV_MODULE_NAME);
return NULL;
}
/* Default values */
@@ -872,8 +878,8 @@ static int bnxt_re_srqn_handler(struct bnxt_qplib_nq *nq,
int rc = 0;
if (!srq) {
- dev_err(NULL, "%s: SRQ is NULL, SRQN not handled",
- ROCE_DRV_MODULE_NAME);
+ ibdev_err(NULL, "%s: SRQ is NULL, SRQN not handled",
+ ROCE_DRV_MODULE_NAME);
rc = -EINVAL;
goto done;
}
@@ -900,8 +906,8 @@ static int bnxt_re_cqn_handler(struct bnxt_qplib_nq *nq,
qplib_cq);
if (!cq) {
- dev_err(NULL, "%s: CQ is NULL, CQN not handled",
- ROCE_DRV_MODULE_NAME);
+ ibdev_err(NULL, "%s: CQ is NULL, CQN not handled",
+ ROCE_DRV_MODULE_NAME);
return -EINVAL;
}
if (cq->ib_cq.comp_handler) {
@@ -916,7 +922,7 @@ static int bnxt_re_cqn_handler(struct bnxt_qplib_nq *nq,
#define BNXT_RE_GEN_P5_VF_NQ_DB 0x4000
static u32 bnxt_re_get_nqdb_offset(struct bnxt_re_dev *rdev, u16 indx)
{
- return bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx) ?
+ return bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) ?
(rdev->is_virtfn ? BNXT_RE_GEN_P5_VF_NQ_DB :
BNXT_RE_GEN_P5_PF_NQ_DB) :
rdev->msix_entries[indx].db_offset;
@@ -948,8 +954,8 @@ static int bnxt_re_init_res(struct bnxt_re_dev *rdev)
db_offt, &bnxt_re_cqn_handler,
&bnxt_re_srqn_handler);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to enable NQ with rc = 0x%x", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to enable NQ with rc = 0x%x", rc);
goto fail;
}
num_vec_enabled++;
@@ -967,10 +973,10 @@ static void bnxt_re_free_nq_res(struct bnxt_re_dev *rdev)
int i;
for (i = 0; i < rdev->num_msix - 1; i++) {
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
bnxt_re_net_ring_free(rdev, rdev->nq[i].ring_id, type);
- rdev->nq[i].res = NULL;
bnxt_qplib_free_nq(&rdev->nq[i]);
+ rdev->nq[i].res = NULL;
}
}
@@ -991,10 +997,10 @@ static void bnxt_re_free_res(struct bnxt_re_dev *rdev)
static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
{
+ struct bnxt_re_ring_attr rattr = {};
+ struct bnxt_qplib_ctx *qplib_ctx;
int num_vec_created = 0;
- dma_addr_t *pg_map;
int rc = 0, i;
- int pages;
u8 type;
/* Configure and allocate resources for qplib */
@@ -1015,27 +1021,31 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
if (rc)
goto dealloc_res;
+ qplib_ctx = &rdev->qplib_ctx;
for (i = 0; i < rdev->num_msix - 1; i++) {
- rdev->nq[i].res = &rdev->qplib_res;
- rdev->nq[i].hwq.max_elements = BNXT_RE_MAX_CQ_COUNT +
- BNXT_RE_MAX_SRQC_COUNT + 2;
- rc = bnxt_qplib_alloc_nq(rdev->en_dev->pdev, &rdev->nq[i]);
+ struct bnxt_qplib_nq *nq;
+
+ nq = &rdev->nq[i];
+ nq->hwq.max_elements = (qplib_ctx->cq_count +
+ qplib_ctx->srqc_count + 2);
+ rc = bnxt_qplib_alloc_nq(&rdev->qplib_res, &rdev->nq[i]);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Alloc Failed NQ%d rc:%#x",
- i, rc);
+ ibdev_err(&rdev->ibdev, "Alloc Failed NQ%d rc:%#x",
+ i, rc);
goto free_nq;
}
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
- pg_map = rdev->nq[i].hwq.pbl[PBL_LVL_0].pg_map_arr;
- pages = rdev->nq[i].hwq.pbl[rdev->nq[i].hwq.level].pg_count;
- rc = bnxt_re_net_ring_alloc(rdev, pg_map, pages, type,
- BNXT_QPLIB_NQE_MAX_CNT - 1,
- rdev->msix_entries[i + 1].ring_idx,
- &rdev->nq[i].ring_id);
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
+ rattr.dma_arr = nq->hwq.pbl[PBL_LVL_0].pg_map_arr;
+ rattr.pages = nq->hwq.pbl[rdev->nq[i].hwq.level].pg_count;
+ rattr.type = type;
+ rattr.mode = RING_ALLOC_REQ_INT_MODE_MSIX;
+ rattr.depth = BNXT_QPLIB_NQE_MAX_CNT - 1;
+ rattr.lrid = rdev->msix_entries[i + 1].ring_idx;
+ rc = bnxt_re_net_ring_alloc(rdev, &rattr, &nq->ring_id);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to allocate NQ fw id with rc = 0x%x",
- rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate NQ fw id with rc = 0x%x",
+ rc);
bnxt_qplib_free_nq(&rdev->nq[i]);
goto free_nq;
}
@@ -1043,8 +1053,8 @@ static int bnxt_re_alloc_res(struct bnxt_re_dev *rdev)
}
return 0;
free_nq:
- for (i = num_vec_created; i >= 0; i--) {
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
+ for (i = num_vec_created - 1; i >= 0; i--) {
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
bnxt_re_net_ring_free(rdev, rdev->nq[i].ring_id, type);
bnxt_qplib_free_nq(&rdev->nq[i]);
}
@@ -1109,10 +1119,10 @@ static int bnxt_re_query_hwrm_pri2cos(struct bnxt_re_dev *rdev, u8 dir,
return rc;
if (resp.queue_cfg_info) {
- dev_warn(rdev_to_dev(rdev),
- "Asymmetric cos queue configuration detected");
- dev_warn(rdev_to_dev(rdev),
- " on device, QoS may not be fully functional\n");
+ ibdev_warn(&rdev->ibdev,
+ "Asymmetric cos queue configuration detected");
+ ibdev_warn(&rdev->ibdev,
+ " on device, QoS may not be fully functional\n");
}
qcfgmap = &resp.pri0_cos_queue_id;
tmp_map = (u8 *)cid_map;
@@ -1125,7 +1135,8 @@ static int bnxt_re_query_hwrm_pri2cos(struct bnxt_re_dev *rdev, u8 dir,
static bool bnxt_re_is_qp1_or_shadow_qp(struct bnxt_re_dev *rdev,
struct bnxt_re_qp *qp)
{
- return (qp->ib_qp.qp_type == IB_QPT_GSI) || (qp == rdev->qp1_sqp);
+ return (qp->ib_qp.qp_type == IB_QPT_GSI) ||
+ (qp == rdev->gsi_ctx.gsi_sqp);
}
static void bnxt_re_dev_stop(struct bnxt_re_dev *rdev)
@@ -1160,12 +1171,13 @@ static int bnxt_re_update_gid(struct bnxt_re_dev *rdev)
u16 gid_idx, index;
int rc = 0;
- if (!test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
+ if (!ib_device_try_get(&rdev->ibdev))
return 0;
if (!sgid_tbl) {
- dev_err(rdev_to_dev(rdev), "QPLIB: SGID table not allocated");
- return -EINVAL;
+ ibdev_err(&rdev->ibdev, "QPLIB: SGID table not allocated");
+ rc = -EINVAL;
+ goto out;
}
for (index = 0; index < sgid_tbl->active; index++) {
@@ -1185,7 +1197,8 @@ static int bnxt_re_update_gid(struct bnxt_re_dev *rdev)
rc = bnxt_qplib_update_sgid(sgid_tbl, &gid, gid_idx,
rdev->qplib_res.netdev->dev_addr);
}
-
+out:
+ ib_device_put(&rdev->ibdev);
return rc;
}
@@ -1241,7 +1254,7 @@ static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
/* Get cosq id for this priority */
rc = bnxt_re_query_hwrm_pri2cos(rdev, 0, &cid_map);
if (rc) {
- dev_warn(rdev_to_dev(rdev), "no cos for p_mask %x\n", prio_map);
+ ibdev_warn(&rdev->ibdev, "no cos for p_mask %x\n", prio_map);
return rc;
}
/* Parse CoS IDs for app priority */
@@ -1250,8 +1263,8 @@ static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
/* Config BONO. */
rc = bnxt_qplib_map_tc2cos(&rdev->qplib_res, rdev->cosq);
if (rc) {
- dev_warn(rdev_to_dev(rdev), "no tc for cos{%x, %x}\n",
- rdev->cosq[0], rdev->cosq[1]);
+ ibdev_warn(&rdev->ibdev, "no tc for cos{%x, %x}\n",
+ rdev->cosq[0], rdev->cosq[1]);
return rc;
}
@@ -1286,8 +1299,8 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to query HW version, rc = 0x%x", rc);
+ ibdev_err(&rdev->ibdev, "Failed to query HW version, rc = 0x%x",
+ rc);
return;
}
rdev->qplib_ctx.hwrm_intf_ver =
@@ -1297,15 +1310,35 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
le16_to_cpu(resp.hwrm_intf_patch);
}
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
+static int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
+{
+ int rc = 0;
+ u32 event;
+
+ /* Register ib dev */
+ rc = bnxt_re_register_ib(rdev);
+ if (rc) {
+ pr_err("Failed to register with IB: %#x\n", rc);
+ return rc;
+ }
+ dev_info(rdev_to_dev(rdev), "Device registered successfully");
+ ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
+ &rdev->active_width);
+ set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
+
+ event = netif_running(rdev->netdev) && netif_carrier_ok(rdev->netdev) ?
+ IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
+
+ bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, event);
+
+ return rc;
+}
+
+static void bnxt_re_dev_uninit(struct bnxt_re_dev *rdev)
{
u8 type;
int rc;
- if (test_and_clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
- /* Cleanup ib dev */
- bnxt_re_unregister_ib(rdev);
- }
if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
cancel_delayed_work_sync(&rdev->worker);
@@ -1318,28 +1351,28 @@ static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags)) {
rc = bnxt_qplib_deinit_rcfw(&rdev->rcfw);
if (rc)
- dev_warn(rdev_to_dev(rdev),
- "Failed to deinitialize RCFW: %#x", rc);
+ ibdev_warn(&rdev->ibdev,
+ "Failed to deinitialize RCFW: %#x", rc);
bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
- bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
+ bnxt_qplib_free_ctx(&rdev->qplib_res, &rdev->qplib_ctx);
bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
- bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id, type);
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
+ bnxt_re_net_ring_free(rdev, rdev->rcfw.creq.ring_id, type);
bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
}
if (test_and_clear_bit(BNXT_RE_FLAG_GOT_MSIX, &rdev->flags)) {
rc = bnxt_re_free_msix(rdev);
if (rc)
- dev_warn(rdev_to_dev(rdev),
- "Failed to free MSI-X vectors: %#x", rc);
+ ibdev_warn(&rdev->ibdev,
+ "Failed to free MSI-X vectors: %#x", rc);
}
bnxt_re_destroy_chip_ctx(rdev);
if (test_and_clear_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags)) {
rc = bnxt_re_unregister_netdev(rdev);
if (rc)
- dev_warn(rdev_to_dev(rdev),
- "Failed to unregister with netdev: %#x", rc);
+ ibdev_warn(&rdev->ibdev,
+ "Failed to unregister with netdev: %#x", rc);
}
}
@@ -1353,31 +1386,29 @@ static void bnxt_re_worker(struct work_struct *work)
schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
}
-static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
+static int bnxt_re_dev_init(struct bnxt_re_dev *rdev)
{
- dma_addr_t *pg_map;
- u32 db_offt, ridx;
- int pages, vid;
- bool locked;
+ struct bnxt_qplib_creq_ctx *creq;
+ struct bnxt_re_ring_attr rattr;
+ u32 db_offt;
+ int vid;
u8 type;
int rc;
- /* Acquire rtnl lock through out this function */
- rtnl_lock();
- locked = true;
-
/* Registered a new RoCE device instance to netdev */
+ memset(&rattr, 0, sizeof(rattr));
rc = bnxt_re_register_netdev(rdev);
if (rc) {
rtnl_unlock();
- pr_err("Failed to register with netedev: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to register with netedev: %#x\n", rc);
return -EINVAL;
}
set_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags);
rc = bnxt_re_setup_chip_ctx(rdev);
if (rc) {
- dev_err(rdev_to_dev(rdev), "Failed to get chip context\n");
+ ibdev_err(&rdev->ibdev, "Failed to get chip context\n");
return -EINVAL;
}
@@ -1386,7 +1417,8 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
rc = bnxt_re_request_msix(rdev);
if (rc) {
- pr_err("Failed to get MSI-X vectors: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to get MSI-X vectors: %#x\n", rc);
rc = -EINVAL;
goto fail;
}
@@ -1397,31 +1429,36 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
/* Establish RCFW Communication Channel to initialize the context
* memory for the function and all child VFs
*/
- rc = bnxt_qplib_alloc_rcfw_channel(rdev->en_dev->pdev, &rdev->rcfw,
+ rc = bnxt_qplib_alloc_rcfw_channel(&rdev->qplib_res, &rdev->rcfw,
&rdev->qplib_ctx,
BNXT_RE_MAX_QPC_COUNT);
if (rc) {
- pr_err("Failed to allocate RCFW Channel: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate RCFW Channel: %#x\n", rc);
goto fail;
}
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
- pg_map = rdev->rcfw.creq.pbl[PBL_LVL_0].pg_map_arr;
- pages = rdev->rcfw.creq.pbl[rdev->rcfw.creq.level].pg_count;
- ridx = rdev->msix_entries[BNXT_RE_AEQ_IDX].ring_idx;
- rc = bnxt_re_net_ring_alloc(rdev, pg_map, pages, type,
- BNXT_QPLIB_CREQE_MAX_CNT - 1,
- ridx, &rdev->rcfw.creq_ring_id);
+
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
+ creq = &rdev->rcfw.creq;
+ rattr.dma_arr = creq->hwq.pbl[PBL_LVL_0].pg_map_arr;
+ rattr.pages = creq->hwq.pbl[creq->hwq.level].pg_count;
+ rattr.type = type;
+ rattr.mode = RING_ALLOC_REQ_INT_MODE_MSIX;
+ rattr.depth = BNXT_QPLIB_CREQE_MAX_CNT - 1;
+ rattr.lrid = rdev->msix_entries[BNXT_RE_AEQ_IDX].ring_idx;
+ rc = bnxt_re_net_ring_alloc(rdev, &rattr, &creq->ring_id);
if (rc) {
- pr_err("Failed to allocate CREQ: %#x\n", rc);
+ ibdev_err(&rdev->ibdev, "Failed to allocate CREQ: %#x\n", rc);
goto free_rcfw;
}
db_offt = bnxt_re_get_nqdb_offset(rdev, BNXT_RE_AEQ_IDX);
vid = rdev->msix_entries[BNXT_RE_AEQ_IDX].vector;
- rc = bnxt_qplib_enable_rcfw_channel(rdev->en_dev->pdev, &rdev->rcfw,
+ rc = bnxt_qplib_enable_rcfw_channel(&rdev->rcfw,
vid, db_offt, rdev->is_virtfn,
&bnxt_re_aeq_handler);
if (rc) {
- pr_err("Failed to enable RCFW channel: %#x\n", rc);
+ ibdev_err(&rdev->ibdev, "Failed to enable RCFW channel: %#x\n",
+ rc);
goto free_ring;
}
@@ -1432,24 +1469,27 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
bnxt_re_set_resource_limits(rdev);
- rc = bnxt_qplib_alloc_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx, 0,
- bnxt_qplib_is_chip_gen_p5(&rdev->chip_ctx));
+ rc = bnxt_qplib_alloc_ctx(&rdev->qplib_res, &rdev->qplib_ctx, 0,
+ bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx));
if (rc) {
- pr_err("Failed to allocate QPLIB context: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate QPLIB context: %#x\n", rc);
goto disable_rcfw;
}
rc = bnxt_re_net_stats_ctx_alloc(rdev,
rdev->qplib_ctx.stats.dma_map,
&rdev->qplib_ctx.stats.fw_id);
if (rc) {
- pr_err("Failed to allocate stats context: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate stats context: %#x\n", rc);
goto free_ctx;
}
rc = bnxt_qplib_init_rcfw(&rdev->rcfw, &rdev->qplib_ctx,
rdev->is_virtfn);
if (rc) {
- pr_err("Failed to initialize RCFW: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to initialize RCFW: %#x\n", rc);
goto free_sctx;
}
set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags);
@@ -1457,13 +1497,15 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
/* Resources based on the 'new' device caps */
rc = bnxt_re_alloc_res(rdev);
if (rc) {
- pr_err("Failed to allocate resources: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to allocate resources: %#x\n", rc);
goto fail;
}
set_bit(BNXT_RE_FLAG_RESOURCES_ALLOCATED, &rdev->flags);
rc = bnxt_re_init_res(rdev);
if (rc) {
- pr_err("Failed to initialize resources: %#x\n", rc);
+ ibdev_err(&rdev->ibdev,
+ "Failed to initialize resources: %#x\n", rc);
goto fail;
}
@@ -1472,46 +1514,28 @@ static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
if (!rdev->is_virtfn) {
rc = bnxt_re_setup_qos(rdev);
if (rc)
- pr_info("RoCE priority not yet configured\n");
+ ibdev_info(&rdev->ibdev,
+ "RoCE priority not yet configured\n");
INIT_DELAYED_WORK(&rdev->worker, bnxt_re_worker);
set_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags);
schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
}
- rtnl_unlock();
- locked = false;
-
- /* Register ib dev */
- rc = bnxt_re_register_ib(rdev);
- if (rc) {
- pr_err("Failed to register with IB: %#x\n", rc);
- goto fail;
- }
- set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
- dev_info(rdev_to_dev(rdev), "Device registered successfully");
- ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
- &rdev->active_width);
- set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
- bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
-
return 0;
free_sctx:
bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
free_ctx:
- bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
+ bnxt_qplib_free_ctx(&rdev->qplib_res, &rdev->qplib_ctx);
disable_rcfw:
bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
free_ring:
- type = bnxt_qplib_get_ring_type(&rdev->chip_ctx);
- bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id, type);
+ type = bnxt_qplib_get_ring_type(rdev->chip_ctx);
+ bnxt_re_net_ring_free(rdev, rdev->rcfw.creq.ring_id, type);
free_rcfw:
bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
fail:
- if (!locked)
- rtnl_lock();
- bnxt_re_ib_unreg(rdev);
- rtnl_unlock();
+ bnxt_re_dev_uninit(rdev);
return rc;
}
@@ -1538,7 +1562,8 @@ static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
en_dev = bnxt_re_dev_probe(netdev);
if (IS_ERR(en_dev)) {
if (en_dev != ERR_PTR(-ENODEV))
- pr_err("%s: Failed to probe\n", ROCE_DRV_MODULE_NAME);
+ ibdev_err(&(*rdev)->ibdev, "%s: Failed to probe\n",
+ ROCE_DRV_MODULE_NAME);
rc = PTR_ERR(en_dev);
goto exit;
}
@@ -1552,9 +1577,47 @@ static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
return rc;
}
-static void bnxt_re_remove_one(struct bnxt_re_dev *rdev)
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev)
{
+ bnxt_re_dev_uninit(rdev);
pci_dev_put(rdev->en_dev->pdev);
+ bnxt_re_dev_unreg(rdev);
+}
+
+static int bnxt_re_add_device(struct bnxt_re_dev **rdev,
+ struct net_device *netdev)
+{
+ int rc;
+
+ rc = bnxt_re_dev_reg(rdev, netdev);
+ if (rc == -ENODEV)
+ return rc;
+ if (rc) {
+ pr_err("Failed to register with the device %s: %#x\n",
+ netdev->name, rc);
+ return rc;
+ }
+
+ pci_dev_get((*rdev)->en_dev->pdev);
+ rc = bnxt_re_dev_init(*rdev);
+ if (rc) {
+ pci_dev_put((*rdev)->en_dev->pdev);
+ bnxt_re_dev_unreg(*rdev);
+ }
+
+ return rc;
+}
+
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev)
+{
+ struct bnxt_re_dev *rdev =
+ container_of(ib_dev, struct bnxt_re_dev, ibdev);
+
+ dev_info(rdev_to_dev(rdev), "Unregistering Device");
+
+ rtnl_lock();
+ bnxt_re_remove_device(rdev);
+ rtnl_unlock();
}
/* Handle all deferred netevents tasks */
@@ -1567,21 +1630,23 @@ static void bnxt_re_task(struct work_struct *work)
re_work = container_of(work, struct bnxt_re_work, work);
rdev = re_work->rdev;
- if (re_work->event != NETDEV_REGISTER &&
- !test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
- return;
-
- switch (re_work->event) {
- case NETDEV_REGISTER:
- rc = bnxt_re_ib_reg(rdev);
+ if (re_work->event == NETDEV_REGISTER) {
+ rc = bnxt_re_ib_init(rdev);
if (rc) {
- dev_err(rdev_to_dev(rdev),
- "Failed to register with IB: %#x", rc);
- bnxt_re_remove_one(rdev);
- bnxt_re_dev_unreg(rdev);
+ ibdev_err(&rdev->ibdev,
+ "Failed to register with IB: %#x", rc);
+ rtnl_lock();
+ bnxt_re_remove_device(rdev);
+ rtnl_unlock();
goto exit;
}
- break;
+ goto exit;
+ }
+
+ if (!ib_device_try_get(&rdev->ibdev))
+ goto exit;
+
+ switch (re_work->event) {
case NETDEV_UP:
bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
IB_EVENT_PORT_ACTIVE);
@@ -1601,17 +1666,12 @@ static void bnxt_re_task(struct work_struct *work)
default:
break;
}
- smp_mb__before_atomic();
- atomic_dec(&rdev->sched_count);
+ ib_device_put(&rdev->ibdev);
exit:
+ put_device(&rdev->ibdev.dev);
kfree(re_work);
}
-static void bnxt_re_init_one(struct bnxt_re_dev *rdev)
-{
- pci_dev_get(rdev->en_dev->pdev);
-}
-
/*
* "Notifier chain callback can be invoked for the same chain from
* different CPUs at the same time".
@@ -1634,6 +1694,7 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
struct bnxt_re_dev *rdev;
int rc = 0;
bool sch_work = false;
+ bool release = true;
real_dev = rdma_vlan_dev_real_dev(netdev);
if (!real_dev)
@@ -1641,7 +1702,8 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
rdev = bnxt_re_from_netdev(real_dev);
if (!rdev && event != NETDEV_REGISTER)
- goto exit;
+ return NOTIFY_OK;
+
if (real_dev != netdev)
goto exit;
@@ -1649,27 +1711,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
case NETDEV_REGISTER:
if (rdev)
break;
- rc = bnxt_re_dev_reg(&rdev, real_dev);
- if (rc == -ENODEV)
- break;
- if (rc) {
- pr_err("Failed to register with the device %s: %#x\n",
- real_dev->name, rc);
- break;
- }
- bnxt_re_init_one(rdev);
- sch_work = true;
+ rc = bnxt_re_add_device(&rdev, real_dev);
+ if (!rc)
+ sch_work = true;
+ release = false;
break;
case NETDEV_UNREGISTER:
- /* netdev notifier will call NETDEV_UNREGISTER again later since
- * we are still holding the reference to the netdev
- */
- if (atomic_read(&rdev->sched_count) > 0)
- goto exit;
- bnxt_re_ib_unreg(rdev);
- bnxt_re_remove_one(rdev);
- bnxt_re_dev_unreg(rdev);
+ ib_unregister_device_queued(&rdev->ibdev);
break;
default:
@@ -1680,17 +1729,19 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
/* Allocate for the deferred task */
re_work = kzalloc(sizeof(*re_work), GFP_ATOMIC);
if (re_work) {
+ get_device(&rdev->ibdev.dev);
re_work->rdev = rdev;
re_work->event = event;
re_work->vlan_dev = (real_dev == netdev ?
NULL : netdev);
INIT_WORK(&re_work->work, bnxt_re_task);
- atomic_inc(&rdev->sched_count);
queue_work(bnxt_re_wq, &re_work->work);
}
}
exit:
+ if (rdev && release)
+ ib_device_put(&rdev->ibdev);
return NOTIFY_DONE;
}
@@ -1726,36 +1777,21 @@ static int __init bnxt_re_mod_init(void)
static void __exit bnxt_re_mod_exit(void)
{
- struct bnxt_re_dev *rdev, *next;
- LIST_HEAD(to_be_deleted);
+ struct bnxt_re_dev *rdev;
- mutex_lock(&bnxt_re_dev_lock);
- /* Free all adapter allocated resources */
- if (!list_empty(&bnxt_re_dev_list))
- list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
- mutex_unlock(&bnxt_re_dev_lock);
- /*
- * Cleanup the devices in reverse order so that the VF device
- * cleanup is done before PF cleanup
- */
- list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
- dev_info(rdev_to_dev(rdev), "Unregistering Device");
- /*
- * Flush out any scheduled tasks before destroying the
- * resources
- */
- flush_workqueue(bnxt_re_wq);
- bnxt_re_dev_stop(rdev);
- /* Acquire the rtnl_lock as the L2 resources are freed here */
- rtnl_lock();
- bnxt_re_ib_unreg(rdev);
- rtnl_unlock();
- bnxt_re_remove_one(rdev);
- bnxt_re_dev_unreg(rdev);
- }
unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
if (bnxt_re_wq)
destroy_workqueue(bnxt_re_wq);
+ list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
+ /* VF device removal should be called before the removal
+ * of PF device. Queue VFs unregister first, so that VFs
+ * shall be removed before the PF during the call of
+ * ib_unregister_driver.
+ */
+ if (rdev->is_virtfn)
+ ib_unregister_device(&rdev->ibdev);
+ }
+ ib_unregister_driver(RDMA_DRIVER_BNXT_RE);
}
module_init(bnxt_re_mod_init);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 020f70e6865e..899a5d2c100e 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -43,6 +43,7 @@
#include
#include
#include
+#include
#include
#include
@@ -53,9 +54,7 @@
#include "qplib_sp.h"
#include "qplib_fp.h"
-static void bnxt_qplib_arm_cq_enable(struct bnxt_qplib_cq *cq);
static void __clean_cq(struct bnxt_qplib_cq *cq, u64 qp);
-static void bnxt_qplib_arm_srq(struct bnxt_qplib_srq *srq, u32 arm_type);
static void bnxt_qplib_cancel_phantom_processing(struct bnxt_qplib_qp *qp)
{
@@ -233,6 +232,70 @@ static int bnxt_qplib_alloc_qp_hdr_buf(struct bnxt_qplib_res *res,
return rc;
}
+static void clean_nq(struct bnxt_qplib_nq *nq, struct bnxt_qplib_cq *cq)
+{
+ struct bnxt_qplib_hwq *hwq = &nq->hwq;
+ struct nq_base *nqe, **nq_ptr;
+ int budget = nq->budget;
+ u32 sw_cons, raw_cons;
+ uintptr_t q_handle;
+ u16 type;
+
+ spin_lock_bh(&hwq->lock);
+ /* Service the NQ until empty */
+ raw_cons = hwq->cons;
+ while (budget--) {
+ sw_cons = HWQ_CMP(raw_cons, hwq);
+ nq_ptr = (struct nq_base **)hwq->pbl_ptr;
+ nqe = &nq_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)];
+ if (!NQE_CMP_VALID(nqe, raw_cons, hwq->max_elements))
+ break;
+
+ /*
+ * The valid test of the entry must be done first before
+ * reading any further.
+ */
+ dma_rmb();
+
+ type = le16_to_cpu(nqe->info10_type) & NQ_BASE_TYPE_MASK;
+ switch (type) {
+ case NQ_BASE_TYPE_CQ_NOTIFICATION:
+ {
+ struct nq_cn *nqcne = (struct nq_cn *)nqe;
+
+ q_handle = le32_to_cpu(nqcne->cq_handle_low);
+ q_handle |= (u64)le32_to_cpu(nqcne->cq_handle_high)
+ << 32;
+ if ((unsigned long)cq == q_handle) {
+ nqcne->cq_handle_low = 0;
+ nqcne->cq_handle_high = 0;
+ cq->cnq_events++;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+ raw_cons++;
+ }
+ spin_unlock_bh(&hwq->lock);
+}
+
+/* Wait for receiving all NQEs for this CQ and clean the NQEs associated with
+ * this CQ.
+ */
+static void __wait_for_all_nqes(struct bnxt_qplib_cq *cq, u16 cnq_events)
+{
+ u32 retry_cnt = 100;
+
+ while (retry_cnt--) {
+ if (cnq_events == cq->cnq_events)
+ return;
+ usleep_range(50, 100);
+ clean_nq(cq->nq, cq);
+ }
+}
+
static void bnxt_qplib_service_nq(unsigned long data)
{
struct bnxt_qplib_nq *nq = (struct bnxt_qplib_nq *)data;
@@ -241,12 +304,12 @@ static void bnxt_qplib_service_nq(unsigned long data)
struct bnxt_qplib_cq *cq;
int num_cqne_processed = 0;
int num_srqne_processed = 0;
- u32 sw_cons, raw_cons;
- u16 type;
int budget = nq->budget;
+ u32 sw_cons, raw_cons;
uintptr_t q_handle;
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(nq->res->cctx);
+ u16 type;
+ spin_lock_bh(&hwq->lock);
/* Service the NQ until empty */
raw_cons = hwq->cons;
while (budget--) {
@@ -272,7 +335,10 @@ static void bnxt_qplib_service_nq(unsigned long data)
q_handle |= (u64)le32_to_cpu(nqcne->cq_handle_high)
<< 32;
cq = (struct bnxt_qplib_cq *)(unsigned long)q_handle;
- bnxt_qplib_arm_cq_enable(cq);
+ if (!cq)
+ break;
+ bnxt_qplib_armen_db(&cq->dbinfo,
+ DBC_DBC_TYPE_CQ_ARMENA);
spin_lock_bh(&cq->compl_lock);
atomic_set(&cq->arm_state, 0);
if (!nq->cqn_handler(nq, (cq)))
@@ -280,19 +346,22 @@ static void bnxt_qplib_service_nq(unsigned long data)
else
dev_warn(&nq->pdev->dev,
"cqn - type 0x%x not handled\n", type);
+ cq->cnq_events++;
spin_unlock_bh(&cq->compl_lock);
break;
}
case NQ_BASE_TYPE_SRQ_EVENT:
{
+ struct bnxt_qplib_srq *srq;
struct nq_srq_event *nqsrqe =
(struct nq_srq_event *)nqe;
q_handle = le32_to_cpu(nqsrqe->srq_handle_low);
q_handle |= (u64)le32_to_cpu(nqsrqe->srq_handle_high)
<< 32;
- bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle,
- DBC_DBC_TYPE_SRQ_ARMENA);
+ srq = (struct bnxt_qplib_srq *)q_handle;
+ bnxt_qplib_armen_db(&srq->dbinfo,
+ DBC_DBC_TYPE_SRQ_ARMENA);
if (!nq->srqn_handler(nq,
(struct bnxt_qplib_srq *)q_handle,
nqsrqe->event))
@@ -314,10 +383,9 @@ static void bnxt_qplib_service_nq(unsigned long data)
}
if (hwq->cons != raw_cons) {
hwq->cons = raw_cons;
- bnxt_qplib_ring_nq_db_rearm(nq->bar_reg_iomem, hwq->cons,
- hwq->max_elements, nq->ring_id,
- gen_p5);
+ bnxt_qplib_ring_nq_db(&nq->nq_db.dbinfo, nq->res->cctx, true);
}
+ spin_unlock_bh(&hwq->lock);
}
static irqreturn_t bnxt_qplib_nq_irq(int irq, void *dev_instance)
@@ -333,25 +401,23 @@ static irqreturn_t bnxt_qplib_nq_irq(int irq, void *dev_instance)
prefetch(&nq_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)]);
/* Fan out to CPU affinitized kthreads? */
- tasklet_schedule(&nq->worker);
+ tasklet_schedule(&nq->nq_tasklet);
return IRQ_HANDLED;
}
void bnxt_qplib_nq_stop_irq(struct bnxt_qplib_nq *nq, bool kill)
{
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(nq->res->cctx);
- tasklet_disable(&nq->worker);
+ tasklet_disable(&nq->nq_tasklet);
/* Mask h/w interrupt */
- bnxt_qplib_ring_nq_db(nq->bar_reg_iomem, nq->hwq.cons,
- nq->hwq.max_elements, nq->ring_id, gen_p5);
+ bnxt_qplib_ring_nq_db(&nq->nq_db.dbinfo, nq->res->cctx, false);
/* Sync with last running IRQ handler */
- synchronize_irq(nq->vector);
+ synchronize_irq(nq->msix_vec);
if (kill)
- tasklet_kill(&nq->worker);
+ tasklet_kill(&nq->nq_tasklet);
if (nq->requested) {
- irq_set_affinity_hint(nq->vector, NULL);
- free_irq(nq->vector, nq);
+ irq_set_affinity_hint(nq->msix_vec, NULL);
+ free_irq(nq->msix_vec, nq);
nq->requested = false;
}
}
@@ -364,89 +430,108 @@ void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
}
/* Make sure the HW is stopped! */
- if (nq->requested)
- bnxt_qplib_nq_stop_irq(nq, true);
+ bnxt_qplib_nq_stop_irq(nq, true);
- if (nq->bar_reg_iomem)
- iounmap(nq->bar_reg_iomem);
- nq->bar_reg_iomem = NULL;
+ if (nq->nq_db.reg.bar_reg) {
+ iounmap(nq->nq_db.reg.bar_reg);
+ nq->nq_db.reg.bar_reg = NULL;
+ }
nq->cqn_handler = NULL;
nq->srqn_handler = NULL;
- nq->vector = 0;
+ nq->msix_vec = 0;
}
int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int nq_indx,
int msix_vector, bool need_init)
{
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(nq->res->cctx);
int rc;
if (nq->requested)
return -EFAULT;
- nq->vector = msix_vector;
+ nq->msix_vec = msix_vector;
if (need_init)
- tasklet_init(&nq->worker, bnxt_qplib_service_nq,
+ tasklet_init(&nq->nq_tasklet, bnxt_qplib_service_nq,
(unsigned long)nq);
else
- tasklet_enable(&nq->worker);
+ tasklet_enable(&nq->nq_tasklet);
snprintf(nq->name, sizeof(nq->name), "bnxt_qplib_nq-%d", nq_indx);
- rc = request_irq(nq->vector, bnxt_qplib_nq_irq, 0, nq->name, nq);
+ rc = request_irq(nq->msix_vec, bnxt_qplib_nq_irq, 0, nq->name, nq);
if (rc)
return rc;
cpumask_clear(&nq->mask);
cpumask_set_cpu(nq_indx, &nq->mask);
- rc = irq_set_affinity_hint(nq->vector, &nq->mask);
+ rc = irq_set_affinity_hint(nq->msix_vec, &nq->mask);
if (rc) {
dev_warn(&nq->pdev->dev,
"set affinity failed; vector: %d nq_idx: %d\n",
- nq->vector, nq_indx);
+ nq->msix_vec, nq_indx);
}
nq->requested = true;
- bnxt_qplib_ring_nq_db_rearm(nq->bar_reg_iomem, nq->hwq.cons,
- nq->hwq.max_elements, nq->ring_id, gen_p5);
+ bnxt_qplib_ring_nq_db(&nq->nq_db.dbinfo, nq->res->cctx, true);
return rc;
}
+static int bnxt_qplib_map_nq_db(struct bnxt_qplib_nq *nq, u32 reg_offt)
+{
+ resource_size_t reg_base;
+ struct bnxt_qplib_nq_db *nq_db;
+ struct pci_dev *pdev;
+ int rc = 0;
+
+ pdev = nq->pdev;
+ nq_db = &nq->nq_db;
+
+ nq_db->reg.bar_id = NQ_CONS_PCI_BAR_REGION;
+ nq_db->reg.bar_base = pci_resource_start(pdev, nq_db->reg.bar_id);
+ if (!nq_db->reg.bar_base) {
+ dev_err(&pdev->dev, "QPLIB: NQ BAR region %d resc start is 0!",
+ nq_db->reg.bar_id);
+ rc = -ENOMEM;
+ goto fail;
+ }
+
+ reg_base = nq_db->reg.bar_base + reg_offt;
+ /* Unconditionally map 8 bytes to support 57500 series */
+ nq_db->reg.len = 8;
+ nq_db->reg.bar_reg = ioremap(reg_base, nq_db->reg.len);
+ if (!nq_db->reg.bar_reg) {
+ dev_err(&pdev->dev, "QPLIB: NQ BAR region %d mapping failed",
+ nq_db->reg.bar_id);
+ rc = -ENOMEM;
+ goto fail;
+ }
+
+ nq_db->dbinfo.db = nq_db->reg.bar_reg;
+ nq_db->dbinfo.hwq = &nq->hwq;
+ nq_db->dbinfo.xid = nq->ring_id;
+fail:
+ return rc;
+}
+
int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
int nq_idx, int msix_vector, int bar_reg_offset,
- int (*cqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_cq *),
- int (*srqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_srq *,
- u8 event))
+ cqn_handler_t cqn_handler,
+ srqn_handler_t srqn_handler)
{
- resource_size_t nq_base;
int rc = -1;
- if (cqn_handler)
- nq->cqn_handler = cqn_handler;
-
- if (srqn_handler)
- nq->srqn_handler = srqn_handler;
+ nq->pdev = pdev;
+ nq->cqn_handler = cqn_handler;
+ nq->srqn_handler = srqn_handler;
/* Have a task to schedule CQ notifiers in post send case */
nq->cqn_wq = create_singlethread_workqueue("bnxt_qplib_nq");
if (!nq->cqn_wq)
return -ENOMEM;
- nq->bar_reg = NQ_CONS_PCI_BAR_REGION;
- nq->bar_reg_off = bar_reg_offset;
- nq_base = pci_resource_start(pdev, nq->bar_reg);
- if (!nq_base) {
- rc = -ENOMEM;
+ rc = bnxt_qplib_map_nq_db(nq, bar_reg_offset);
+ if (rc)
goto fail;
- }
- /* Unconditionally map 8 bytes to support 57500 series */
- nq->bar_reg_iomem = ioremap(nq_base + nq->bar_reg_off, 8);
- if (!nq->bar_reg_iomem) {
- rc = -ENOMEM;
- goto fail;
- }
rc = bnxt_qplib_nq_start_irq(nq, nq_idx, msix_vector, true);
if (rc) {
@@ -464,49 +549,38 @@ int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
void bnxt_qplib_free_nq(struct bnxt_qplib_nq *nq)
{
if (nq->hwq.max_elements) {
- bnxt_qplib_free_hwq(nq->pdev, &nq->hwq);
+ bnxt_qplib_free_hwq(nq->res, &nq->hwq);
nq->hwq.max_elements = 0;
}
}
-int bnxt_qplib_alloc_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq)
+int bnxt_qplib_alloc_nq(struct bnxt_qplib_res *res, struct bnxt_qplib_nq *nq)
{
- u8 hwq_type;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
- nq->pdev = pdev;
+ nq->pdev = res->pdev;
+ nq->res = res;
if (!nq->hwq.max_elements ||
nq->hwq.max_elements > BNXT_QPLIB_NQE_MAX_CNT)
nq->hwq.max_elements = BNXT_QPLIB_NQE_MAX_CNT;
- hwq_type = bnxt_qplib_get_hwq_type(nq->res);
- if (bnxt_qplib_alloc_init_hwq(nq->pdev, &nq->hwq, NULL,
- &nq->hwq.max_elements,
- BNXT_QPLIB_MAX_NQE_ENTRY_SIZE, 0,
- PAGE_SIZE, hwq_type))
- return -ENOMEM;
+ sginfo.pgsize = PAGE_SIZE;
+ sginfo.pgshft = PAGE_SHIFT;
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.depth = nq->hwq.max_elements;
+ hwq_attr.stride = sizeof(struct nq_base);
+ hwq_attr.type = bnxt_qplib_get_hwq_type(nq->res);
+ if (bnxt_qplib_alloc_init_hwq(&nq->hwq, &hwq_attr)) {
+ dev_err(&nq->pdev->dev, "FP NQ allocation failed");
+ return -ENOMEM;
+ }
nq->budget = 8;
return 0;
}
/* SRQ */
-static void bnxt_qplib_arm_srq(struct bnxt_qplib_srq *srq, u32 arm_type)
-{
- struct bnxt_qplib_hwq *srq_hwq = &srq->hwq;
- void __iomem *db;
- u32 sw_prod;
- u64 val = 0;
-
- /* Ring DB */
- sw_prod = (arm_type == DBC_DBC_TYPE_SRQ_ARM) ?
- srq->threshold : HWQ_CMP(srq_hwq->prod, srq_hwq);
- db = (arm_type == DBC_DBC_TYPE_SRQ_ARMENA) ? srq->dbr_base :
- srq->dpi->dbr;
- val = ((srq->id << DBC_DBC_XID_SFT) & DBC_DBC_XID_MASK) | arm_type;
- val <<= 32;
- val |= (sw_prod << DBC_DBC_INDEX_SFT) & DBC_DBC_INDEX_MASK;
- writeq(val, db);
-}
-
void bnxt_qplib_destroy_srq(struct bnxt_qplib_res *res,
struct bnxt_qplib_srq *srq)
{
@@ -526,24 +600,26 @@ void bnxt_qplib_destroy_srq(struct bnxt_qplib_res *res,
kfree(srq->swq);
if (rc)
return;
- bnxt_qplib_free_hwq(res->pdev, &srq->hwq);
+ bnxt_qplib_free_hwq(res, &srq->hwq);
}
int bnxt_qplib_create_srq(struct bnxt_qplib_res *res,
struct bnxt_qplib_srq *srq)
{
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
- struct cmdq_create_srq req;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
struct creq_create_srq_resp resp;
+ struct cmdq_create_srq req;
struct bnxt_qplib_pbl *pbl;
u16 cmd_flags = 0;
int rc, idx;
- srq->hwq.max_elements = srq->max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &srq->hwq, &srq->sg_info,
- &srq->hwq.max_elements,
- BNXT_QPLIB_MAX_RQE_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &srq->sg_info;
+ hwq_attr.depth = srq->max_wqe;
+ hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ rc = bnxt_qplib_alloc_init_hwq(&srq->hwq, &hwq_attr);
if (rc)
goto exit;
@@ -595,14 +671,17 @@ int bnxt_qplib_create_srq(struct bnxt_qplib_res *res,
srq->swq[srq->last_idx].next_idx = -1;
srq->id = le32_to_cpu(resp.xid);
- srq->dbr_base = res->dpi_tbl.dbr_bar_reg_iomem;
+ srq->dbinfo.hwq = &srq->hwq;
+ srq->dbinfo.xid = srq->id;
+ srq->dbinfo.db = srq->dpi->dbr;
+ srq->dbinfo.priv_db = res->dpi_tbl.dbr_bar_reg_iomem;
if (srq->threshold)
- bnxt_qplib_arm_srq(srq, DBC_DBC_TYPE_SRQ_ARMENA);
+ bnxt_qplib_armen_db(&srq->dbinfo, DBC_DBC_TYPE_SRQ_ARMENA);
srq->arm_req = false;
return 0;
fail:
- bnxt_qplib_free_hwq(res->pdev, &srq->hwq);
+ bnxt_qplib_free_hwq(res, &srq->hwq);
kfree(srq->swq);
exit:
return rc;
@@ -621,7 +700,7 @@ int bnxt_qplib_modify_srq(struct bnxt_qplib_res *res,
srq_hwq->max_elements - sw_cons + sw_prod;
if (count > srq->threshold) {
srq->arm_req = false;
- bnxt_qplib_arm_srq(srq, DBC_DBC_TYPE_SRQ_ARM);
+ bnxt_qplib_srq_arm_db(&srq->dbinfo, srq->threshold);
} else {
/* Deferred arming */
srq->arm_req = true;
@@ -709,10 +788,10 @@ int bnxt_qplib_post_srq_recv(struct bnxt_qplib_srq *srq,
srq_hwq->max_elements - sw_cons + sw_prod;
spin_unlock(&srq_hwq->lock);
/* Ring DB */
- bnxt_qplib_arm_srq(srq, DBC_DBC_TYPE_SRQ);
+ bnxt_qplib_ring_prod_db(&srq->dbinfo, DBC_DBC_TYPE_SRQ);
if (srq->arm_req == true && count > srq->threshold) {
srq->arm_req = false;
- bnxt_qplib_arm_srq(srq, DBC_DBC_TYPE_SRQ_ARM);
+ bnxt_qplib_srq_arm_db(&srq->dbinfo, srq->threshold);
}
done:
return rc;
@@ -721,15 +800,16 @@ int bnxt_qplib_post_srq_recv(struct bnxt_qplib_srq *srq,
/* QP */
int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
{
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
- struct cmdq_create_qp1 req;
- struct creq_create_qp1_resp resp;
- struct bnxt_qplib_pbl *pbl;
struct bnxt_qplib_q *sq = &qp->sq;
struct bnxt_qplib_q *rq = &qp->rq;
- int rc;
+ struct creq_create_qp1_resp resp;
+ struct cmdq_create_qp1 req;
+ struct bnxt_qplib_pbl *pbl;
u16 cmd_flags = 0;
u32 qp_flags = 0;
+ int rc;
RCFW_CMD_PREP(req, CREATE_QP1, cmd_flags);
@@ -739,11 +819,12 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
req.qp_handle = cpu_to_le64(qp->qp_handle);
/* SQ */
- sq->hwq.max_elements = sq->max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &sq->hwq, NULL,
- &sq->hwq.max_elements,
- BNXT_QPLIB_MAX_SQE_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &sq->sg_info;
+ hwq_attr.depth = sq->max_wqe;
+ hwq_attr.stride = BNXT_QPLIB_MAX_SQE_ENTRY_SIZE;
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr);
if (rc)
goto exit;
@@ -778,11 +859,12 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
/* RQ */
if (rq->max_wqe) {
- rq->hwq.max_elements = qp->rq.max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &rq->hwq, NULL,
- &rq->hwq.max_elements,
- BNXT_QPLIB_MAX_RQE_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &rq->sg_info;
+ hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
+ hwq_attr.depth = qp->rq.max_wqe;
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr);
if (rc)
goto fail_sq;
@@ -840,6 +922,15 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
qp->id = le32_to_cpu(resp.xid);
qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET;
+ qp->cctx = res->cctx;
+ sq->dbinfo.hwq = &sq->hwq;
+ sq->dbinfo.xid = qp->id;
+ sq->dbinfo.db = qp->dpi->dbr;
+ if (rq->max_wqe) {
+ rq->dbinfo.hwq = &rq->hwq;
+ rq->dbinfo.xid = qp->id;
+ rq->dbinfo.db = qp->dpi->dbr;
+ }
rcfw->qp_tbl[qp->id].qp_id = qp->id;
rcfw->qp_tbl[qp->id].qp_handle = (void *)qp;
@@ -848,10 +939,10 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
fail:
bnxt_qplib_free_qp_hdr_buf(res, qp);
fail_rq:
- bnxt_qplib_free_hwq(res->pdev, &rq->hwq);
+ bnxt_qplib_free_hwq(res, &rq->hwq);
kfree(rq->swq);
fail_sq:
- bnxt_qplib_free_hwq(res->pdev, &sq->hwq);
+ bnxt_qplib_free_hwq(res, &sq->hwq);
kfree(sq->swq);
exit:
return rc;
@@ -860,7 +951,9 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
unsigned long int psn_search, poff = 0;
+ struct bnxt_qplib_sg_info sginfo = {};
struct sq_psn_search **psn_search_ptr;
struct bnxt_qplib_q *sq = &qp->sq;
struct bnxt_qplib_q *rq = &qp->rq;
@@ -887,12 +980,15 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
sizeof(struct sq_psn_search_ext) :
sizeof(struct sq_psn_search);
}
- sq->hwq.max_elements = sq->max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &sq->hwq, &sq->sg_info,
- &sq->hwq.max_elements,
- BNXT_QPLIB_MAX_SQE_ENTRY_SIZE,
- psn_sz,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &sq->sg_info;
+ hwq_attr.stride = BNXT_QPLIB_MAX_SQE_ENTRY_SIZE;
+ hwq_attr.depth = sq->max_wqe;
+ hwq_attr.aux_stride = psn_sz;
+ hwq_attr.aux_depth = hwq_attr.depth;
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr);
if (rc)
goto exit;
@@ -956,12 +1052,14 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
/* RQ */
if (rq->max_wqe) {
- rq->hwq.max_elements = rq->max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &rq->hwq,
- &rq->sg_info,
- &rq->hwq.max_elements,
- BNXT_QPLIB_MAX_RQE_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &rq->sg_info;
+ hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
+ hwq_attr.depth = rq->max_wqe;
+ hwq_attr.aux_stride = 0;
+ hwq_attr.aux_depth = 0;
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr);
if (rc)
goto fail_sq;
@@ -1029,10 +1127,17 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
req_size = xrrq->max_elements *
BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE + PAGE_SIZE - 1;
req_size &= ~(PAGE_SIZE - 1);
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, xrrq, NULL,
- &xrrq->max_elements,
- BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE,
- 0, req_size, HWQ_TYPE_CTX);
+ sginfo.pgsize = req_size;
+ sginfo.pgshft = PAGE_SHIFT;
+
+ hwq_attr.res = res;
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.depth = xrrq->max_elements;
+ hwq_attr.stride = BNXT_QPLIB_MAX_ORRQE_ENTRY_SIZE;
+ hwq_attr.aux_stride = 0;
+ hwq_attr.aux_depth = 0;
+ hwq_attr.type = HWQ_TYPE_CTX;
+ rc = bnxt_qplib_alloc_init_hwq(xrrq, &hwq_attr);
if (rc)
goto fail_buf_free;
pbl = &xrrq->pbl[PBL_LVL_0];
@@ -1044,11 +1149,10 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
req_size = xrrq->max_elements *
BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE + PAGE_SIZE - 1;
req_size &= ~(PAGE_SIZE - 1);
-
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, xrrq, NULL,
- &xrrq->max_elements,
- BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE,
- 0, req_size, HWQ_TYPE_CTX);
+ sginfo.pgsize = req_size;
+ hwq_attr.depth = xrrq->max_elements;
+ hwq_attr.stride = BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE;
+ rc = bnxt_qplib_alloc_init_hwq(xrrq, &hwq_attr);
if (rc)
goto fail_orrq;
@@ -1064,9 +1168,17 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
qp->id = le32_to_cpu(resp.xid);
qp->cur_qp_state = CMDQ_MODIFY_QP_NEW_STATE_RESET;
- qp->cctx = res->cctx;
INIT_LIST_HEAD(&qp->sq_flush);
INIT_LIST_HEAD(&qp->rq_flush);
+ qp->cctx = res->cctx;
+ sq->dbinfo.hwq = &sq->hwq;
+ sq->dbinfo.xid = qp->id;
+ sq->dbinfo.db = qp->dpi->dbr;
+ if (rq->max_wqe) {
+ rq->dbinfo.hwq = &rq->hwq;
+ rq->dbinfo.xid = qp->id;
+ rq->dbinfo.db = qp->dpi->dbr;
+ }
rcfw->qp_tbl[qp->id].qp_id = qp->id;
rcfw->qp_tbl[qp->id].qp_handle = (void *)qp;
@@ -1074,17 +1186,17 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
fail:
if (qp->irrq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &qp->irrq);
+ bnxt_qplib_free_hwq(res, &qp->irrq);
fail_orrq:
if (qp->orrq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &qp->orrq);
+ bnxt_qplib_free_hwq(res, &qp->orrq);
fail_buf_free:
bnxt_qplib_free_qp_hdr_buf(res, qp);
fail_rq:
- bnxt_qplib_free_hwq(res->pdev, &rq->hwq);
+ bnxt_qplib_free_hwq(res, &rq->hwq);
kfree(rq->swq);
fail_sq:
- bnxt_qplib_free_hwq(res->pdev, &sq->hwq);
+ bnxt_qplib_free_hwq(res, &sq->hwq);
kfree(sq->swq);
exit:
return rc;
@@ -1440,16 +1552,16 @@ void bnxt_qplib_free_qp_res(struct bnxt_qplib_res *res,
struct bnxt_qplib_qp *qp)
{
bnxt_qplib_free_qp_hdr_buf(res, qp);
- bnxt_qplib_free_hwq(res->pdev, &qp->sq.hwq);
+ bnxt_qplib_free_hwq(res, &qp->sq.hwq);
kfree(qp->sq.swq);
- bnxt_qplib_free_hwq(res->pdev, &qp->rq.hwq);
+ bnxt_qplib_free_hwq(res, &qp->rq.hwq);
kfree(qp->rq.swq);
if (qp->irrq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &qp->irrq);
+ bnxt_qplib_free_hwq(res, &qp->irrq);
if (qp->orrq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &qp->orrq);
+ bnxt_qplib_free_hwq(res, &qp->orrq);
}
@@ -1506,16 +1618,8 @@ void *bnxt_qplib_get_qp1_rq_buf(struct bnxt_qplib_qp *qp,
void bnxt_qplib_post_send_db(struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_q *sq = &qp->sq;
- u32 sw_prod;
- u64 val = 0;
- val = (((qp->id << DBC_DBC_XID_SFT) & DBC_DBC_XID_MASK) |
- DBC_DBC_TYPE_SQ);
- val <<= 32;
- sw_prod = HWQ_CMP(sq->hwq.prod, &sq->hwq);
- val |= (sw_prod << DBC_DBC_INDEX_SFT) & DBC_DBC_INDEX_MASK;
- /* Flush all the WQE writes to HW */
- writeq(val, qp->dpi->dbr);
+ bnxt_qplib_ring_prod_db(&sq->dbinfo, DBC_DBC_TYPE_SQ);
}
int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
@@ -1807,16 +1911,8 @@ int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
void bnxt_qplib_post_recv_db(struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_q *rq = &qp->rq;
- u32 sw_prod;
- u64 val = 0;
- val = (((qp->id << DBC_DBC_XID_SFT) & DBC_DBC_XID_MASK) |
- DBC_DBC_TYPE_RQ);
- val <<= 32;
- sw_prod = HWQ_CMP(rq->hwq.prod, &rq->hwq);
- val |= (sw_prod << DBC_DBC_INDEX_SFT) & DBC_DBC_INDEX_MASK;
- /* Flush the writes to HW Rx WQE before the ringing Rx DB */
- writeq(val, qp->dpi->dbr);
+ bnxt_qplib_ring_prod_db(&rq->dbinfo, DBC_DBC_TYPE_RQ);
}
int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
@@ -1896,48 +1992,22 @@ int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
}
/* CQ */
-
-/* Spinlock must be held */
-static void bnxt_qplib_arm_cq_enable(struct bnxt_qplib_cq *cq)
-{
- u64 val = 0;
-
- val = ((cq->id << DBC_DBC_XID_SFT) & DBC_DBC_XID_MASK) |
- DBC_DBC_TYPE_CQ_ARMENA;
- val <<= 32;
- /* Flush memory writes before enabling the CQ */
- writeq(val, cq->dbr_base);
-}
-
-static void bnxt_qplib_arm_cq(struct bnxt_qplib_cq *cq, u32 arm_type)
-{
- struct bnxt_qplib_hwq *cq_hwq = &cq->hwq;
- u32 sw_cons;
- u64 val = 0;
-
- /* Ring DB */
- val = ((cq->id << DBC_DBC_XID_SFT) & DBC_DBC_XID_MASK) | arm_type;
- val <<= 32;
- sw_cons = HWQ_CMP(cq_hwq->cons, cq_hwq);
- val |= (sw_cons << DBC_DBC_INDEX_SFT) & DBC_DBC_INDEX_MASK;
- /* flush memory writes before arming the CQ */
- writeq(val, cq->dpi->dbr);
-}
-
int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
{
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
- struct cmdq_create_cq req;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
struct creq_create_cq_resp resp;
+ struct cmdq_create_cq req;
struct bnxt_qplib_pbl *pbl;
u16 cmd_flags = 0;
int rc;
- cq->hwq.max_elements = cq->max_wqe;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &cq->hwq, &cq->sg_info,
- &cq->hwq.max_elements,
- BNXT_QPLIB_MAX_CQE_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_QUEUE);
+ hwq_attr.res = res;
+ hwq_attr.depth = cq->max_wqe;
+ hwq_attr.stride = sizeof(struct cq_base);
+ hwq_attr.type = HWQ_TYPE_QUEUE;
+ hwq_attr.sginfo = &cq->sg_info;
+ rc = bnxt_qplib_alloc_init_hwq(&cq->hwq, &hwq_attr);
if (rc)
goto exit;
@@ -1976,7 +2046,6 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
goto fail;
cq->id = le32_to_cpu(resp.xid);
- cq->dbr_base = res->dpi_tbl.dbr_bar_reg_iomem;
cq->period = BNXT_QPLIB_QUEUE_START_PERIOD;
init_waitqueue_head(&cq->waitq);
INIT_LIST_HEAD(&cq->sqf_head);
@@ -1984,11 +2053,17 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
spin_lock_init(&cq->compl_lock);
spin_lock_init(&cq->flush_lock);
- bnxt_qplib_arm_cq_enable(cq);
+ cq->dbinfo.hwq = &cq->hwq;
+ cq->dbinfo.xid = cq->id;
+ cq->dbinfo.db = cq->dpi->dbr;
+ cq->dbinfo.priv_db = res->dpi_tbl.dbr_bar_reg_iomem;
+
+ bnxt_qplib_armen_db(&cq->dbinfo, DBC_DBC_TYPE_CQ_ARMENA);
+
return 0;
fail:
- bnxt_qplib_free_hwq(res->pdev, &cq->hwq);
+ bnxt_qplib_free_hwq(res, &cq->hwq);
exit:
return rc;
}
@@ -1998,6 +2073,7 @@ int bnxt_qplib_destroy_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
struct cmdq_destroy_cq req;
struct creq_destroy_cq_resp resp;
+ u16 total_cnq_events;
u16 cmd_flags = 0;
int rc;
@@ -2008,7 +2084,9 @@ int bnxt_qplib_destroy_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
(void *)&resp, NULL, 0);
if (rc)
return rc;
- bnxt_qplib_free_hwq(res->pdev, &cq->hwq);
+ total_cnq_events = le16_to_cpu(resp.total_cnq_events);
+ __wait_for_all_nqes(cq, total_cnq_events);
+ bnxt_qplib_free_hwq(res, &cq->hwq);
return 0;
}
@@ -2141,8 +2219,7 @@ static int do_wa9060(struct bnxt_qplib_qp *qp, struct bnxt_qplib_cq *cq,
sq->send_phantom = true;
/* TODO: Only ARM if the previous SQE is ARMALL */
- bnxt_qplib_arm_cq(cq, DBC_DBC_TYPE_CQ_ARMALL);
-
+ bnxt_qplib_ring_db(&cq->dbinfo, DBC_DBC_TYPE_CQ_ARMALL);
rc = -EAGAIN;
goto out;
}
@@ -2426,7 +2503,7 @@ static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
}
cqe = *pcqe;
cqe->opcode = hwcqe->cqe_type_toggle & CQ_BASE_CQE_TYPE_MASK;
- cqe->length = (u32)le16_to_cpu(hwcqe->length);
+ cqe->length = le16_to_cpu(hwcqe->length) & CQ_RES_UD_LENGTH_MASK;
cqe->cfa_meta = le16_to_cpu(hwcqe->cfa_metadata);
cqe->invrkey = le32_to_cpu(hwcqe->imm_data);
cqe->flags = le16_to_cpu(hwcqe->flags);
@@ -2812,7 +2889,7 @@ int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
}
if (cq->hwq.cons != raw_cons) {
cq->hwq.cons = raw_cons;
- bnxt_qplib_arm_cq(cq, DBC_DBC_TYPE_CQ);
+ bnxt_qplib_ring_db(&cq->dbinfo, DBC_DBC_TYPE_CQ);
}
exit:
return num_cqes - budget;
@@ -2821,7 +2898,7 @@ int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type)
{
if (arm_type)
- bnxt_qplib_arm_cq(cq, arm_type);
+ bnxt_qplib_ring_db(&cq->dbinfo, arm_type);
/* Using cq->arm_state variable to track whether to issue cq handler */
atomic_set(&cq->arm_state, 1);
}
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.h b/drivers/infiniband/hw/bnxt_re/qplib_fp.h
index 99e0a13cbefa..7edb70b6bb16 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.h
@@ -42,7 +42,7 @@
struct bnxt_qplib_srq {
struct bnxt_qplib_pd *pd;
struct bnxt_qplib_dpi *dpi;
- void __iomem *dbr_base;
+ struct bnxt_qplib_db_info dbinfo;
u64 srq_handle;
u32 id;
u32 max_wqe;
@@ -236,6 +236,7 @@ struct bnxt_qplib_swqe {
struct bnxt_qplib_q {
struct bnxt_qplib_hwq hwq;
struct bnxt_qplib_swq *swq;
+ struct bnxt_qplib_db_info dbinfo;
struct bnxt_qplib_sg_info sg_info;
u32 max_wqe;
u16 q_full_delta;
@@ -370,7 +371,7 @@ struct bnxt_qplib_cqe {
#define BNXT_QPLIB_QUEUE_START_PERIOD 0x01
struct bnxt_qplib_cq {
struct bnxt_qplib_dpi *dpi;
- void __iomem *dbr_base;
+ struct bnxt_qplib_db_info dbinfo;
u32 max_wqe;
u32 id;
u16 count;
@@ -401,6 +402,7 @@ struct bnxt_qplib_cq {
* of the same QP while manipulating the flush list.
*/
spinlock_t flush_lock; /* QP flush management */
+ u16 cnq_events;
};
#define BNXT_QPLIB_MAX_IRRQE_ENTRY_SIZE sizeof(struct xrrq_irrq)
@@ -433,66 +435,32 @@ struct bnxt_qplib_cq {
NQ_DB_IDX_VALID | \
NQ_DB_IRQ_DIS)
-static inline void bnxt_qplib_ring_nq_db64(void __iomem *db, u32 index,
- u32 xid, bool arm)
-{
- u64 val;
+struct bnxt_qplib_nq_db {
+ struct bnxt_qplib_reg_desc reg;
+ struct bnxt_qplib_db_info dbinfo;
+};
- val = xid & DBC_DBC_XID_MASK;
- val |= DBC_DBC_PATH_ROCE;
- val |= arm ? DBC_DBC_TYPE_NQ_ARM : DBC_DBC_TYPE_NQ;
- val <<= 32;
- val |= index & DBC_DBC_INDEX_MASK;
- writeq(val, db);
-}
-
-static inline void bnxt_qplib_ring_nq_db_rearm(void __iomem *db, u32 raw_cons,
- u32 max_elements, u32 xid,
- bool gen_p5)
-{
- u32 index = raw_cons & (max_elements - 1);
-
- if (gen_p5)
- bnxt_qplib_ring_nq_db64(db, index, xid, true);
- else
- writel(NQ_DB_CP_FLAGS_REARM | (index & DBC_DBC32_XID_MASK), db);
-}
-
-static inline void bnxt_qplib_ring_nq_db(void __iomem *db, u32 raw_cons,
- u32 max_elements, u32 xid,
- bool gen_p5)
-{
- u32 index = raw_cons & (max_elements - 1);
-
- if (gen_p5)
- bnxt_qplib_ring_nq_db64(db, index, xid, false);
- else
- writel(NQ_DB_CP_FLAGS | (index & DBC_DBC32_XID_MASK), db);
-}
+typedef int (*cqn_handler_t)(struct bnxt_qplib_nq *nq,
+ struct bnxt_qplib_cq *cq);
+typedef int (*srqn_handler_t)(struct bnxt_qplib_nq *nq,
+ struct bnxt_qplib_srq *srq, u8 event);
struct bnxt_qplib_nq {
- struct pci_dev *pdev;
- struct bnxt_qplib_res *res;
+ struct pci_dev *pdev;
+ struct bnxt_qplib_res *res;
+ char name[32];
+ struct bnxt_qplib_hwq hwq;
+ struct bnxt_qplib_nq_db nq_db;
+ u16 ring_id;
+ int msix_vec;
+ cpumask_t mask;
+ struct tasklet_struct nq_tasklet;
+ bool requested;
+ int budget;
- int vector;
- cpumask_t mask;
- int budget;
- bool requested;
- struct tasklet_struct worker;
- struct bnxt_qplib_hwq hwq;
-
- u16 bar_reg;
- u32 bar_reg_off;
- u16 ring_id;
- void __iomem *bar_reg_iomem;
-
- int (*cqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_cq *cq);
- int (*srqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_srq *srq,
- u8 event);
- struct workqueue_struct *cqn_wq;
- char name[32];
+ cqn_handler_t cqn_handler;
+ srqn_handler_t srqn_handler;
+ struct workqueue_struct *cqn_wq;
};
struct bnxt_qplib_nq_work {
@@ -507,11 +475,8 @@ int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int nq_indx,
int msix_vector, bool need_init);
int bnxt_qplib_enable_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq,
int nq_idx, int msix_vector, int bar_reg_offset,
- int (*cqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_cq *cq),
- int (*srqn_handler)(struct bnxt_qplib_nq *nq,
- struct bnxt_qplib_srq *srq,
- u8 event));
+ cqn_handler_t cqn_handler,
+ srqn_handler_t srq_handler);
int bnxt_qplib_create_srq(struct bnxt_qplib_res *res,
struct bnxt_qplib_srq *srq);
int bnxt_qplib_modify_srq(struct bnxt_qplib_res *res,
@@ -550,7 +515,7 @@ int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
bool bnxt_qplib_is_cq_empty(struct bnxt_qplib_cq *cq);
void bnxt_qplib_req_notify_cq(struct bnxt_qplib_cq *cq, u32 arm_type);
void bnxt_qplib_free_nq(struct bnxt_qplib_nq *nq);
-int bnxt_qplib_alloc_nq(struct pci_dev *pdev, struct bnxt_qplib_nq *nq);
+int bnxt_qplib_alloc_nq(struct bnxt_qplib_res *res, struct bnxt_qplib_nq *nq);
void bnxt_qplib_add_flush_qp(struct bnxt_qplib_qp *qp);
void bnxt_qplib_acquire_cq_locks(struct bnxt_qplib_qp *qp,
unsigned long *flags);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 1291b12287a5..f01e864bb611 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -55,12 +55,14 @@ static void bnxt_qplib_service_creq(unsigned long data);
/* Hardware communication channel */
static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie)
{
+ struct bnxt_qplib_cmdq_ctx *cmdq;
u16 cbit;
int rc;
+ cmdq = &rcfw->cmdq;
cbit = cookie % rcfw->cmdq_depth;
- rc = wait_event_timeout(rcfw->waitq,
- !test_bit(cbit, rcfw->cmdq_bitmap),
+ rc = wait_event_timeout(cmdq->waitq,
+ !test_bit(cbit, cmdq->cmdq_bitmap),
msecs_to_jiffies(RCFW_CMD_WAIT_TIME_MS));
return rc ? 0 : -ETIMEDOUT;
};
@@ -68,15 +70,17 @@ static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie)
static int __block_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie)
{
u32 count = RCFW_BLOCKED_CMD_WAIT_COUNT;
+ struct bnxt_qplib_cmdq_ctx *cmdq;
u16 cbit;
+ cmdq = &rcfw->cmdq;
cbit = cookie % rcfw->cmdq_depth;
- if (!test_bit(cbit, rcfw->cmdq_bitmap))
+ if (!test_bit(cbit, cmdq->cmdq_bitmap))
goto done;
do {
mdelay(1); /* 1m sec */
bnxt_qplib_service_creq((unsigned long)rcfw);
- } while (test_bit(cbit, rcfw->cmdq_bitmap) && --count);
+ } while (test_bit(cbit, cmdq->cmdq_bitmap) && --count);
done:
return count ? 0 : -ETIMEDOUT;
};
@@ -84,56 +88,60 @@ static int __block_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie)
static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
struct creq_base *resp, void *sb, u8 is_block)
{
- struct bnxt_qplib_cmdqe *cmdqe, **cmdq_ptr;
- struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
+ struct bnxt_qplib_cmdq_ctx *cmdq = &rcfw->cmdq;
+ struct bnxt_qplib_cmdqe *cmdqe, **hwq_ptr;
+ struct bnxt_qplib_hwq *hwq = &cmdq->hwq;
+ struct bnxt_qplib_crsqe *crsqe;
u32 cmdq_depth = rcfw->cmdq_depth;
- struct bnxt_qplib_crsq *crsqe;
u32 sw_prod, cmdq_prod;
+ struct pci_dev *pdev;
unsigned long flags;
u32 size, opcode;
u16 cookie, cbit;
u8 *preq;
+ pdev = rcfw->pdev;
+
opcode = req->opcode;
- if (!test_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->flags) &&
+ if (!test_bit(FIRMWARE_INITIALIZED_FLAG, &cmdq->flags) &&
(opcode != CMDQ_BASE_OPCODE_QUERY_FUNC &&
opcode != CMDQ_BASE_OPCODE_INITIALIZE_FW &&
opcode != CMDQ_BASE_OPCODE_QUERY_VERSION)) {
- dev_err(&rcfw->pdev->dev,
+ dev_err(&pdev->dev,
"RCFW not initialized, reject opcode 0x%x\n", opcode);
return -EINVAL;
}
- if (test_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->flags) &&
+ if (test_bit(FIRMWARE_INITIALIZED_FLAG, &cmdq->flags) &&
opcode == CMDQ_BASE_OPCODE_INITIALIZE_FW) {
- dev_err(&rcfw->pdev->dev, "RCFW already initialized!\n");
+ dev_err(&pdev->dev, "RCFW already initialized!\n");
return -EINVAL;
}
- if (test_bit(FIRMWARE_TIMED_OUT, &rcfw->flags))
+ if (test_bit(FIRMWARE_TIMED_OUT, &cmdq->flags))
return -ETIMEDOUT;
/* Cmdq are in 16-byte units, each request can consume 1 or more
* cmdqe
*/
- spin_lock_irqsave(&cmdq->lock, flags);
- if (req->cmd_size >= HWQ_FREE_SLOTS(cmdq)) {
- dev_err(&rcfw->pdev->dev, "RCFW: CMDQ is full!\n");
- spin_unlock_irqrestore(&cmdq->lock, flags);
+ spin_lock_irqsave(&hwq->lock, flags);
+ if (req->cmd_size >= HWQ_FREE_SLOTS(hwq)) {
+ dev_err(&pdev->dev, "RCFW: CMDQ is full!\n");
+ spin_unlock_irqrestore(&hwq->lock, flags);
return -EAGAIN;
}
- cookie = rcfw->seq_num & RCFW_MAX_COOKIE_VALUE;
+ cookie = cmdq->seq_num & RCFW_MAX_COOKIE_VALUE;
cbit = cookie % rcfw->cmdq_depth;
if (is_block)
cookie |= RCFW_CMD_IS_BLOCKING;
- set_bit(cbit, rcfw->cmdq_bitmap);
+ set_bit(cbit, cmdq->cmdq_bitmap);
req->cookie = cpu_to_le16(cookie);
crsqe = &rcfw->crsqe_tbl[cbit];
if (crsqe->resp) {
- spin_unlock_irqrestore(&cmdq->lock, flags);
+ spin_unlock_irqrestore(&hwq->lock, flags);
return -EBUSY;
}
@@ -155,15 +163,15 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
BNXT_QPLIB_CMDQE_UNITS;
}
- cmdq_ptr = (struct bnxt_qplib_cmdqe **)cmdq->pbl_ptr;
+ hwq_ptr = (struct bnxt_qplib_cmdqe **)hwq->pbl_ptr;
preq = (u8 *)req;
do {
/* Locate the next cmdq slot */
- sw_prod = HWQ_CMP(cmdq->prod, cmdq);
- cmdqe = &cmdq_ptr[get_cmdq_pg(sw_prod, cmdq_depth)]
+ sw_prod = HWQ_CMP(hwq->prod, hwq);
+ cmdqe = &hwq_ptr[get_cmdq_pg(sw_prod, cmdq_depth)]
[get_cmdq_idx(sw_prod, cmdq_depth)];
if (!cmdqe) {
- dev_err(&rcfw->pdev->dev,
+ dev_err(&pdev->dev,
"RCFW request failed with no cmdqe!\n");
goto done;
}
@@ -172,31 +180,27 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
memcpy(cmdqe, preq, min_t(u32, size, sizeof(*cmdqe)));
preq += min_t(u32, size, sizeof(*cmdqe));
size -= min_t(u32, size, sizeof(*cmdqe));
- cmdq->prod++;
- rcfw->seq_num++;
+ hwq->prod++;
} while (size > 0);
+ cmdq->seq_num++;
- rcfw->seq_num++;
-
- cmdq_prod = cmdq->prod;
- if (test_bit(FIRMWARE_FIRST_FLAG, &rcfw->flags)) {
+ cmdq_prod = hwq->prod;
+ if (test_bit(FIRMWARE_FIRST_FLAG, &cmdq->flags)) {
/* The very first doorbell write
* is required to set this flag
* which prompts the FW to reset
* its internal pointers
*/
cmdq_prod |= BIT(FIRMWARE_FIRST_FLAG);
- clear_bit(FIRMWARE_FIRST_FLAG, &rcfw->flags);
+ clear_bit(FIRMWARE_FIRST_FLAG, &cmdq->flags);
}
/* ring CMDQ DB */
wmb();
- writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
- rcfw->cmdq_bar_reg_prod_off);
- writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
- rcfw->cmdq_bar_reg_trig_off);
+ writel(cmdq_prod, cmdq->cmdq_mbox.prod);
+ writel(RCFW_CMDQ_TRIG_VAL, cmdq->cmdq_mbox.db);
done:
- spin_unlock_irqrestore(&cmdq->lock, flags);
+ spin_unlock_irqrestore(&hwq->lock, flags);
/* Return the CREQ response pointer */
return 0;
}
@@ -236,7 +240,7 @@ int bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
/* timed out */
dev_err(&rcfw->pdev->dev, "cmdq[%#x]=%#x timedout (%d)msec\n",
cookie, opcode, RCFW_CMD_WAIT_TIME_MS);
- set_bit(FIRMWARE_TIMED_OUT, &rcfw->flags);
+ set_bit(FIRMWARE_TIMED_OUT, &rcfw->cmdq.flags);
return rc;
}
@@ -253,6 +257,8 @@ int bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw,
struct creq_func_event *func_event)
{
+ int rc;
+
switch (func_event->event) {
case CREQ_FUNC_EVENT_EVENT_TX_WQE_ERROR:
break;
@@ -286,37 +292,41 @@ static int bnxt_qplib_process_func_event(struct bnxt_qplib_rcfw *rcfw,
default:
return -EINVAL;
}
- return 0;
+
+ rc = rcfw->creq.aeq_handler(rcfw, (void *)func_event, NULL);
+ return rc;
}
static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
struct creq_qp_event *qp_event)
{
- struct bnxt_qplib_hwq *cmdq = &rcfw->cmdq;
struct creq_qp_error_notification *err_event;
- struct bnxt_qplib_crsq *crsqe;
- unsigned long flags;
+ struct bnxt_qplib_hwq *hwq = &rcfw->cmdq.hwq;
+ struct bnxt_qplib_crsqe *crsqe;
struct bnxt_qplib_qp *qp;
u16 cbit, blocked = 0;
- u16 cookie;
+ struct pci_dev *pdev;
+ unsigned long flags;
__le16 mcookie;
+ u16 cookie;
+ int rc = 0;
u32 qp_id;
+ pdev = rcfw->pdev;
switch (qp_event->event) {
case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
err_event = (struct creq_qp_error_notification *)qp_event;
qp_id = le32_to_cpu(err_event->xid);
qp = rcfw->qp_tbl[qp_id].qp_handle;
- dev_dbg(&rcfw->pdev->dev,
- "Received QP error notification\n");
- dev_dbg(&rcfw->pdev->dev,
+ dev_dbg(&pdev->dev, "Received QP error notification\n");
+ dev_dbg(&pdev->dev,
"qpid 0x%x, req_err=0x%x, resp_err=0x%x\n",
qp_id, err_event->req_err_state_reason,
err_event->res_err_state_reason);
if (!qp)
break;
bnxt_qplib_mark_qp_error(qp);
- rcfw->aeq_handler(rcfw, qp_event, qp);
+ rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp);
break;
default:
/*
@@ -328,7 +338,7 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
*
*/
- spin_lock_irqsave_nested(&cmdq->lock, flags,
+ spin_lock_irqsave_nested(&hwq->lock, flags,
SINGLE_DEPTH_NESTING);
cookie = le16_to_cpu(qp_event->cookie);
mcookie = qp_event->cookie;
@@ -342,44 +352,44 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw,
crsqe->resp = NULL;
} else {
if (crsqe->resp && crsqe->resp->cookie)
- dev_err(&rcfw->pdev->dev,
+ dev_err(&pdev->dev,
"CMD %s cookie sent=%#x, recd=%#x\n",
crsqe->resp ? "mismatch" : "collision",
crsqe->resp ? crsqe->resp->cookie : 0,
mcookie);
}
- if (!test_and_clear_bit(cbit, rcfw->cmdq_bitmap))
- dev_warn(&rcfw->pdev->dev,
+ if (!test_and_clear_bit(cbit, rcfw->cmdq.cmdq_bitmap))
+ dev_warn(&pdev->dev,
"CMD bit %d was not requested\n", cbit);
- cmdq->cons += crsqe->req_size;
+ hwq->cons += crsqe->req_size;
crsqe->req_size = 0;
if (!blocked)
- wake_up(&rcfw->waitq);
- spin_unlock_irqrestore(&cmdq->lock, flags);
+ wake_up(&rcfw->cmdq.waitq);
+ spin_unlock_irqrestore(&hwq->lock, flags);
}
- return 0;
+ return rc;
}
/* SP - CREQ Completion handlers */
static void bnxt_qplib_service_creq(unsigned long data)
{
struct bnxt_qplib_rcfw *rcfw = (struct bnxt_qplib_rcfw *)data;
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(rcfw->res->cctx);
- struct bnxt_qplib_hwq *creq = &rcfw->creq;
+ struct bnxt_qplib_creq_ctx *creq = &rcfw->creq;
u32 type, budget = CREQ_ENTRY_POLL_BUDGET;
- struct creq_base *creqe, **creq_ptr;
+ struct bnxt_qplib_hwq *hwq = &creq->hwq;
+ struct creq_base *creqe, **hwq_ptr;
u32 sw_cons, raw_cons;
unsigned long flags;
/* Service the CREQ until budget is over */
- spin_lock_irqsave(&creq->lock, flags);
- raw_cons = creq->cons;
+ spin_lock_irqsave(&hwq->lock, flags);
+ raw_cons = hwq->cons;
while (budget > 0) {
- sw_cons = HWQ_CMP(raw_cons, creq);
- creq_ptr = (struct creq_base **)creq->pbl_ptr;
- creqe = &creq_ptr[get_creq_pg(sw_cons)][get_creq_idx(sw_cons)];
- if (!CREQ_CMP_VALID(creqe, raw_cons, creq->max_elements))
+ sw_cons = HWQ_CMP(raw_cons, hwq);
+ hwq_ptr = (struct creq_base **)hwq->pbl_ptr;
+ creqe = &hwq_ptr[get_creq_pg(sw_cons)][get_creq_idx(sw_cons)];
+ if (!CREQ_CMP_VALID(creqe, raw_cons, hwq->max_elements))
break;
/* The valid test of the entry must be done first before
* reading any further.
@@ -391,12 +401,12 @@ static void bnxt_qplib_service_creq(unsigned long data)
case CREQ_BASE_TYPE_QP_EVENT:
bnxt_qplib_process_qp_event
(rcfw, (struct creq_qp_event *)creqe);
- rcfw->creq_qp_event_processed++;
+ creq->stats.creq_qp_event_processed++;
break;
case CREQ_BASE_TYPE_FUNC_EVENT:
if (!bnxt_qplib_process_func_event
(rcfw, (struct creq_func_event *)creqe))
- rcfw->creq_func_event_processed++;
+ creq->stats.creq_func_event_processed++;
else
dev_warn(&rcfw->pdev->dev,
"aeqe:%#x Not handled\n", type);
@@ -412,28 +422,30 @@ static void bnxt_qplib_service_creq(unsigned long data)
budget--;
}
- if (creq->cons != raw_cons) {
- creq->cons = raw_cons;
- bnxt_qplib_ring_creq_db_rearm(rcfw->creq_bar_reg_iomem,
- raw_cons, creq->max_elements,
- rcfw->creq_ring_id, gen_p5);
+ if (hwq->cons != raw_cons) {
+ hwq->cons = raw_cons;
+ bnxt_qplib_ring_nq_db(&creq->creq_db.dbinfo,
+ rcfw->res->cctx, true);
}
- spin_unlock_irqrestore(&creq->lock, flags);
+ spin_unlock_irqrestore(&hwq->lock, flags);
}
static irqreturn_t bnxt_qplib_creq_irq(int irq, void *dev_instance)
{
struct bnxt_qplib_rcfw *rcfw = dev_instance;
- struct bnxt_qplib_hwq *creq = &rcfw->creq;
+ struct bnxt_qplib_creq_ctx *creq;
struct creq_base **creq_ptr;
+ struct bnxt_qplib_hwq *hwq;
u32 sw_cons;
+ creq = &rcfw->creq;
+ hwq = &creq->hwq;
/* Prefetch the CREQ element */
- sw_cons = HWQ_CMP(creq->cons, creq);
- creq_ptr = (struct creq_base **)rcfw->creq.pbl_ptr;
+ sw_cons = HWQ_CMP(hwq->cons, hwq);
+ creq_ptr = (struct creq_base **)creq->hwq.pbl_ptr;
prefetch(&creq_ptr[get_creq_pg(sw_cons)][get_creq_idx(sw_cons)]);
- tasklet_schedule(&rcfw->worker);
+ tasklet_schedule(&creq->creq_tasklet);
return IRQ_HANDLED;
}
@@ -452,7 +464,7 @@ int bnxt_qplib_deinit_rcfw(struct bnxt_qplib_rcfw *rcfw)
if (rc)
return rc;
- clear_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->flags);
+ clear_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->cmdq.flags);
return 0;
}
@@ -520,9 +532,10 @@ int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
level = ctx->tim_tbl.level;
req.tim_pg_size_tim_lvl = (level << CMDQ_INITIALIZE_FW_TIM_LVL_SFT) |
__get_pbl_pg_idx(&ctx->tim_tbl.pbl[level]);
- level = ctx->tqm_pde_level;
- req.tqm_pg_size_tqm_lvl = (level << CMDQ_INITIALIZE_FW_TQM_LVL_SFT) |
- __get_pbl_pg_idx(&ctx->tqm_pde.pbl[level]);
+ level = ctx->tqm_ctx.pde.level;
+ req.tqm_pg_size_tqm_lvl =
+ (level << CMDQ_INITIALIZE_FW_TQM_LVL_SFT) |
+ __get_pbl_pg_idx(&ctx->tqm_ctx.pde.pbl[level]);
req.qpc_page_dir =
cpu_to_le64(ctx->qpc_tbl.pbl[PBL_LVL_0].pg_map_arr[0]);
@@ -535,7 +548,7 @@ int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
req.tim_page_dir =
cpu_to_le64(ctx->tim_tbl.pbl[PBL_LVL_0].pg_map_arr[0]);
req.tqm_page_dir =
- cpu_to_le64(ctx->tqm_pde.pbl[PBL_LVL_0].pg_map_arr[0]);
+ cpu_to_le64(ctx->tqm_ctx.pde.pbl[PBL_LVL_0].pg_map_arr[0]);
req.number_of_qp = cpu_to_le32(ctx->qpc_tbl.max_elements);
req.number_of_mrw = cpu_to_le32(ctx->mrw_tbl.max_elements);
@@ -555,33 +568,46 @@ int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
NULL, 0);
if (rc)
return rc;
- set_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->flags);
+ set_bit(FIRMWARE_INITIALIZED_FLAG, &rcfw->cmdq.flags);
return 0;
}
void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw)
{
+ kfree(rcfw->cmdq.cmdq_bitmap);
kfree(rcfw->qp_tbl);
kfree(rcfw->crsqe_tbl);
- bnxt_qplib_free_hwq(rcfw->pdev, &rcfw->cmdq);
- bnxt_qplib_free_hwq(rcfw->pdev, &rcfw->creq);
+ bnxt_qplib_free_hwq(rcfw->res, &rcfw->cmdq.hwq);
+ bnxt_qplib_free_hwq(rcfw->res, &rcfw->creq.hwq);
rcfw->pdev = NULL;
}
-int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
+int bnxt_qplib_alloc_rcfw_channel(struct bnxt_qplib_res *res,
struct bnxt_qplib_rcfw *rcfw,
struct bnxt_qplib_ctx *ctx,
int qp_tbl_sz)
{
- u8 hwq_type;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
+ struct bnxt_qplib_cmdq_ctx *cmdq;
+ struct bnxt_qplib_creq_ctx *creq;
+ u32 bmap_size = 0;
- rcfw->pdev = pdev;
- rcfw->creq.max_elements = BNXT_QPLIB_CREQE_MAX_CNT;
- hwq_type = bnxt_qplib_get_hwq_type(rcfw->res);
- if (bnxt_qplib_alloc_init_hwq(rcfw->pdev, &rcfw->creq, NULL,
- &rcfw->creq.max_elements,
- BNXT_QPLIB_CREQE_UNITS,
- 0, PAGE_SIZE, hwq_type)) {
+ rcfw->pdev = res->pdev;
+ cmdq = &rcfw->cmdq;
+ creq = &rcfw->creq;
+ rcfw->res = res;
+
+ sginfo.pgsize = PAGE_SIZE;
+ sginfo.pgshft = PAGE_SHIFT;
+
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.res = rcfw->res;
+ hwq_attr.depth = BNXT_QPLIB_CREQE_MAX_CNT;
+ hwq_attr.stride = BNXT_QPLIB_CREQE_UNITS;
+ hwq_attr.type = bnxt_qplib_get_hwq_type(res);
+
+ if (bnxt_qplib_alloc_init_hwq(&creq->hwq, &hwq_attr)) {
dev_err(&rcfw->pdev->dev,
"HW channel CREQ allocation failed\n");
goto fail;
@@ -591,23 +617,28 @@ int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
else
rcfw->cmdq_depth = BNXT_QPLIB_CMDQE_MAX_CNT_8192;
- rcfw->cmdq.max_elements = rcfw->cmdq_depth;
- if (bnxt_qplib_alloc_init_hwq
- (rcfw->pdev, &rcfw->cmdq, NULL,
- &rcfw->cmdq.max_elements,
- BNXT_QPLIB_CMDQE_UNITS, 0,
- bnxt_qplib_cmdqe_page_size(rcfw->cmdq_depth),
- HWQ_TYPE_CTX)) {
+ sginfo.pgsize = bnxt_qplib_cmdqe_page_size(rcfw->cmdq_depth);
+ hwq_attr.depth = rcfw->cmdq_depth;
+ hwq_attr.stride = BNXT_QPLIB_CMDQE_UNITS;
+ hwq_attr.type = HWQ_TYPE_CTX;
+ if (bnxt_qplib_alloc_init_hwq(&cmdq->hwq, &hwq_attr)) {
dev_err(&rcfw->pdev->dev,
"HW channel CMDQ allocation failed\n");
goto fail;
}
- rcfw->crsqe_tbl = kcalloc(rcfw->cmdq.max_elements,
+ rcfw->crsqe_tbl = kcalloc(cmdq->hwq.max_elements,
sizeof(*rcfw->crsqe_tbl), GFP_KERNEL);
if (!rcfw->crsqe_tbl)
goto fail;
+ bmap_size = BITS_TO_LONGS(rcfw->cmdq_depth) * sizeof(unsigned long);
+ cmdq->cmdq_bitmap = kzalloc(bmap_size, GFP_KERNEL);
+ if (!cmdq->cmdq_bitmap)
+ goto fail;
+
+ cmdq->bmap_size = bmap_size;
+
rcfw->qp_tbl_size = qp_tbl_sz;
rcfw->qp_tbl = kcalloc(qp_tbl_sz, sizeof(struct bnxt_qplib_qp_node),
GFP_KERNEL);
@@ -623,137 +654,199 @@ int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
void bnxt_qplib_rcfw_stop_irq(struct bnxt_qplib_rcfw *rcfw, bool kill)
{
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(rcfw->res->cctx);
+ struct bnxt_qplib_creq_ctx *creq;
- tasklet_disable(&rcfw->worker);
+ creq = &rcfw->creq;
+ tasklet_disable(&creq->creq_tasklet);
/* Mask h/w interrupts */
- bnxt_qplib_ring_creq_db(rcfw->creq_bar_reg_iomem, rcfw->creq.cons,
- rcfw->creq.max_elements, rcfw->creq_ring_id,
- gen_p5);
+ bnxt_qplib_ring_nq_db(&creq->creq_db.dbinfo, rcfw->res->cctx, false);
/* Sync with last running IRQ-handler */
- synchronize_irq(rcfw->vector);
+ synchronize_irq(creq->msix_vec);
if (kill)
- tasklet_kill(&rcfw->worker);
+ tasklet_kill(&creq->creq_tasklet);
- if (rcfw->requested) {
- free_irq(rcfw->vector, rcfw);
- rcfw->requested = false;
+ if (creq->requested) {
+ free_irq(creq->msix_vec, rcfw);
+ creq->requested = false;
}
}
void bnxt_qplib_disable_rcfw_channel(struct bnxt_qplib_rcfw *rcfw)
{
+ struct bnxt_qplib_creq_ctx *creq;
+ struct bnxt_qplib_cmdq_ctx *cmdq;
unsigned long indx;
+ creq = &rcfw->creq;
+ cmdq = &rcfw->cmdq;
+ /* Make sure the HW channel is stopped! */
bnxt_qplib_rcfw_stop_irq(rcfw, true);
- iounmap(rcfw->cmdq_bar_reg_iomem);
- iounmap(rcfw->creq_bar_reg_iomem);
+ iounmap(cmdq->cmdq_mbox.reg.bar_reg);
+ iounmap(creq->creq_db.reg.bar_reg);
- indx = find_first_bit(rcfw->cmdq_bitmap, rcfw->bmap_size);
- if (indx != rcfw->bmap_size)
+ indx = find_first_bit(cmdq->cmdq_bitmap, cmdq->bmap_size);
+ if (indx != cmdq->bmap_size)
dev_err(&rcfw->pdev->dev,
"disabling RCFW with pending cmd-bit %lx\n", indx);
- kfree(rcfw->cmdq_bitmap);
- rcfw->bmap_size = 0;
- rcfw->cmdq_bar_reg_iomem = NULL;
- rcfw->creq_bar_reg_iomem = NULL;
- rcfw->aeq_handler = NULL;
- rcfw->vector = 0;
+ cmdq->cmdq_mbox.reg.bar_reg = NULL;
+ creq->creq_db.reg.bar_reg = NULL;
+ creq->aeq_handler = NULL;
+ creq->msix_vec = 0;
}
int bnxt_qplib_rcfw_start_irq(struct bnxt_qplib_rcfw *rcfw, int msix_vector,
bool need_init)
{
- bool gen_p5 = bnxt_qplib_is_chip_gen_p5(rcfw->res->cctx);
+ struct bnxt_qplib_creq_ctx *creq;
int rc;
- if (rcfw->requested)
+ creq = &rcfw->creq;
+
+ if (creq->requested)
return -EFAULT;
- rcfw->vector = msix_vector;
+ creq->msix_vec = msix_vector;
if (need_init)
- tasklet_init(&rcfw->worker,
+ tasklet_init(&creq->creq_tasklet,
bnxt_qplib_service_creq, (unsigned long)rcfw);
else
- tasklet_enable(&rcfw->worker);
- rc = request_irq(rcfw->vector, bnxt_qplib_creq_irq, 0,
+ tasklet_enable(&creq->creq_tasklet);
+ rc = request_irq(creq->msix_vec, bnxt_qplib_creq_irq, 0,
"bnxt_qplib_creq", rcfw);
if (rc)
return rc;
- rcfw->requested = true;
- bnxt_qplib_ring_creq_db_rearm(rcfw->creq_bar_reg_iomem,
- rcfw->creq.cons, rcfw->creq.max_elements,
- rcfw->creq_ring_id, gen_p5);
+ creq->requested = true;
+
+ bnxt_qplib_ring_nq_db(&creq->creq_db.dbinfo, rcfw->res->cctx, true);
return 0;
}
-int bnxt_qplib_enable_rcfw_channel(struct pci_dev *pdev,
- struct bnxt_qplib_rcfw *rcfw,
+static int bnxt_qplib_map_cmdq_mbox(struct bnxt_qplib_rcfw *rcfw, bool is_vf)
+{
+ struct bnxt_qplib_cmdq_mbox *mbox;
+ resource_size_t bar_reg;
+ struct pci_dev *pdev;
+ u16 prod_offt;
+ int rc = 0;
+
+ pdev = rcfw->pdev;
+ mbox = &rcfw->cmdq.cmdq_mbox;
+
+ mbox->reg.bar_id = RCFW_COMM_PCI_BAR_REGION;
+ mbox->reg.len = RCFW_COMM_SIZE;
+ mbox->reg.bar_base = pci_resource_start(pdev, mbox->reg.bar_id);
+ if (!mbox->reg.bar_base) {
+ dev_err(&pdev->dev,
+ "QPLIB: CMDQ BAR region %d resc start is 0!\n",
+ mbox->reg.bar_id);
+ return -ENOMEM;
+ }
+
+ bar_reg = mbox->reg.bar_base + RCFW_COMM_BASE_OFFSET;
+ mbox->reg.len = RCFW_COMM_SIZE;
+ mbox->reg.bar_reg = ioremap(bar_reg, mbox->reg.len);
+ if (!mbox->reg.bar_reg) {
+ dev_err(&pdev->dev,
+ "QPLIB: CMDQ BAR region %d mapping failed\n",
+ mbox->reg.bar_id);
+ return -ENOMEM;
+ }
+
+ prod_offt = is_vf ? RCFW_VF_COMM_PROD_OFFSET :
+ RCFW_PF_COMM_PROD_OFFSET;
+ mbox->prod = (void __iomem *)(mbox->reg.bar_reg + prod_offt);
+ mbox->db = (void __iomem *)(mbox->reg.bar_reg + RCFW_COMM_TRIG_OFFSET);
+ return rc;
+}
+
+static int bnxt_qplib_map_creq_db(struct bnxt_qplib_rcfw *rcfw, u32 reg_offt)
+{
+ struct bnxt_qplib_creq_db *creq_db;
+ resource_size_t bar_reg;
+ struct pci_dev *pdev;
+
+ pdev = rcfw->pdev;
+ creq_db = &rcfw->creq.creq_db;
+
+ creq_db->reg.bar_id = RCFW_COMM_CONS_PCI_BAR_REGION;
+ creq_db->reg.bar_base = pci_resource_start(pdev, creq_db->reg.bar_id);
+ if (!creq_db->reg.bar_id)
+ dev_err(&pdev->dev,
+ "QPLIB: CREQ BAR region %d resc start is 0!",
+ creq_db->reg.bar_id);
+
+ bar_reg = creq_db->reg.bar_base + reg_offt;
+ /* Unconditionally map 8 bytes to support 57500 series */
+ creq_db->reg.len = 8;
+ creq_db->reg.bar_reg = ioremap(bar_reg, creq_db->reg.len);
+ if (!creq_db->reg.bar_reg) {
+ dev_err(&pdev->dev,
+ "QPLIB: CREQ BAR region %d mapping failed",
+ creq_db->reg.bar_id);
+ return -ENOMEM;
+ }
+ creq_db->dbinfo.db = creq_db->reg.bar_reg;
+ creq_db->dbinfo.hwq = &rcfw->creq.hwq;
+ creq_db->dbinfo.xid = rcfw->creq.ring_id;
+ return 0;
+}
+
+static void bnxt_qplib_start_rcfw(struct bnxt_qplib_rcfw *rcfw)
+{
+ struct bnxt_qplib_cmdq_ctx *cmdq;
+ struct bnxt_qplib_creq_ctx *creq;
+ struct bnxt_qplib_cmdq_mbox *mbox;
+ struct cmdq_init init = {0};
+
+ cmdq = &rcfw->cmdq;
+ creq = &rcfw->creq;
+ mbox = &cmdq->cmdq_mbox;
+
+ init.cmdq_pbl = cpu_to_le64(cmdq->hwq.pbl[PBL_LVL_0].pg_map_arr[0]);
+ init.cmdq_size_cmdq_lvl =
+ cpu_to_le16(((rcfw->cmdq_depth <<
+ CMDQ_INIT_CMDQ_SIZE_SFT) &
+ CMDQ_INIT_CMDQ_SIZE_MASK) |
+ ((cmdq->hwq.level <<
+ CMDQ_INIT_CMDQ_LVL_SFT) &
+ CMDQ_INIT_CMDQ_LVL_MASK));
+ init.creq_ring_id = cpu_to_le16(creq->ring_id);
+ /* Write to the Bono mailbox register */
+ __iowrite32_copy(mbox->reg.bar_reg, &init, sizeof(init) / 4);
+}
+
+int bnxt_qplib_enable_rcfw_channel(struct bnxt_qplib_rcfw *rcfw,
int msix_vector,
int cp_bar_reg_off, int virt_fn,
- int (*aeq_handler)(struct bnxt_qplib_rcfw *,
- void *, void *))
+ aeq_handler_t aeq_handler)
{
- resource_size_t res_base;
- struct cmdq_init init;
- u16 bmap_size;
+ struct bnxt_qplib_cmdq_ctx *cmdq;
+ struct bnxt_qplib_creq_ctx *creq;
int rc;
- /* General */
- rcfw->seq_num = 0;
- set_bit(FIRMWARE_FIRST_FLAG, &rcfw->flags);
- bmap_size = BITS_TO_LONGS(rcfw->cmdq_depth) * sizeof(unsigned long);
- rcfw->cmdq_bitmap = kzalloc(bmap_size, GFP_KERNEL);
- if (!rcfw->cmdq_bitmap)
- return -ENOMEM;
- rcfw->bmap_size = bmap_size;
+ cmdq = &rcfw->cmdq;
+ creq = &rcfw->creq;
- /* CMDQ */
- rcfw->cmdq_bar_reg = RCFW_COMM_PCI_BAR_REGION;
- res_base = pci_resource_start(pdev, rcfw->cmdq_bar_reg);
- if (!res_base)
- return -ENOMEM;
+ /* Clear to defaults */
- rcfw->cmdq_bar_reg_iomem = ioremap(res_base +
- RCFW_COMM_BASE_OFFSET,
- RCFW_COMM_SIZE);
- if (!rcfw->cmdq_bar_reg_iomem) {
- dev_err(&rcfw->pdev->dev, "CMDQ BAR region %d mapping failed\n",
- rcfw->cmdq_bar_reg);
- return -ENOMEM;
- }
+ cmdq->seq_num = 0;
+ set_bit(FIRMWARE_FIRST_FLAG, &cmdq->flags);
+ init_waitqueue_head(&cmdq->waitq);
- rcfw->cmdq_bar_reg_prod_off = virt_fn ? RCFW_VF_COMM_PROD_OFFSET :
- RCFW_PF_COMM_PROD_OFFSET;
+ creq->stats.creq_qp_event_processed = 0;
+ creq->stats.creq_func_event_processed = 0;
+ creq->aeq_handler = aeq_handler;
- rcfw->cmdq_bar_reg_trig_off = RCFW_COMM_TRIG_OFFSET;
+ rc = bnxt_qplib_map_cmdq_mbox(rcfw, virt_fn);
+ if (rc)
+ return rc;
- /* CREQ */
- rcfw->creq_bar_reg = RCFW_COMM_CONS_PCI_BAR_REGION;
- res_base = pci_resource_start(pdev, rcfw->creq_bar_reg);
- if (!res_base)
- dev_err(&rcfw->pdev->dev,
- "CREQ BAR region %d resc start is 0!\n",
- rcfw->creq_bar_reg);
- /* Unconditionally map 8 bytes to support 57500 series */
- rcfw->creq_bar_reg_iomem = ioremap(res_base + cp_bar_reg_off,
- 8);
- if (!rcfw->creq_bar_reg_iomem) {
- dev_err(&rcfw->pdev->dev, "CREQ BAR region %d mapping failed\n",
- rcfw->creq_bar_reg);
- iounmap(rcfw->cmdq_bar_reg_iomem);
- rcfw->cmdq_bar_reg_iomem = NULL;
- return -ENOMEM;
- }
- rcfw->creq_qp_event_processed = 0;
- rcfw->creq_func_event_processed = 0;
-
- if (aeq_handler)
- rcfw->aeq_handler = aeq_handler;
- init_waitqueue_head(&rcfw->waitq);
+ rc = bnxt_qplib_map_creq_db(rcfw, cp_bar_reg_off);
+ if (rc)
+ return rc;
rc = bnxt_qplib_rcfw_start_irq(rcfw, msix_vector, true);
if (rc) {
@@ -763,16 +856,8 @@ int bnxt_qplib_enable_rcfw_channel(struct pci_dev *pdev,
return rc;
}
- init.cmdq_pbl = cpu_to_le64(rcfw->cmdq.pbl[PBL_LVL_0].pg_map_arr[0]);
- init.cmdq_size_cmdq_lvl = cpu_to_le16(
- ((rcfw->cmdq_depth << CMDQ_INIT_CMDQ_SIZE_SFT) &
- CMDQ_INIT_CMDQ_SIZE_MASK) |
- ((rcfw->cmdq.level << CMDQ_INIT_CMDQ_LVL_SFT) &
- CMDQ_INIT_CMDQ_LVL_MASK));
- init.creq_ring_id = cpu_to_le16(rcfw->creq_ring_id);
+ bnxt_qplib_start_rcfw(rcfw);
- /* Write to the Bono mailbox register */
- __iowrite32_copy(rcfw->cmdq_bar_reg_iomem, &init, sizeof(init) / 4);
return 0;
}
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
index dfeadc192e17..411fce3493b6 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
@@ -206,8 +206,9 @@ static inline void bnxt_qplib_ring_creq_db(void __iomem *db, u32 raw_cons,
#define CREQ_ENTRY_POLL_BUDGET 0x100
/* HWQ */
+typedef int (*aeq_handler_t)(struct bnxt_qplib_rcfw *, void *, void *);
-struct bnxt_qplib_crsq {
+struct bnxt_qplib_crsqe {
struct creq_qp_event *resp;
u32 req_size;
};
@@ -225,41 +226,53 @@ struct bnxt_qplib_qp_node {
#define BNXT_QPLIB_OOS_COUNT_MASK 0xFFFFFFFF
+#define FIRMWARE_INITIALIZED_FLAG (0)
+#define FIRMWARE_FIRST_FLAG (31)
+#define FIRMWARE_TIMED_OUT (3)
+struct bnxt_qplib_cmdq_mbox {
+ struct bnxt_qplib_reg_desc reg;
+ void __iomem *prod;
+ void __iomem *db;
+};
+
+struct bnxt_qplib_cmdq_ctx {
+ struct bnxt_qplib_hwq hwq;
+ struct bnxt_qplib_cmdq_mbox cmdq_mbox;
+ wait_queue_head_t waitq;
+ unsigned long flags;
+ unsigned long *cmdq_bitmap;
+ u32 bmap_size;
+ u32 seq_num;
+};
+
+struct bnxt_qplib_creq_db {
+ struct bnxt_qplib_reg_desc reg;
+ struct bnxt_qplib_db_info dbinfo;
+};
+
+struct bnxt_qplib_creq_stat {
+ u64 creq_qp_event_processed;
+ u64 creq_func_event_processed;
+};
+
+struct bnxt_qplib_creq_ctx {
+ struct bnxt_qplib_hwq hwq;
+ struct bnxt_qplib_creq_db creq_db;
+ struct bnxt_qplib_creq_stat stats;
+ struct tasklet_struct creq_tasklet;
+ aeq_handler_t aeq_handler;
+ u16 ring_id;
+ int msix_vec;
+ bool requested; /*irq handler installed */
+};
+
/* RCFW Communication Channels */
struct bnxt_qplib_rcfw {
struct pci_dev *pdev;
struct bnxt_qplib_res *res;
- int vector;
- struct tasklet_struct worker;
- bool requested;
- unsigned long *cmdq_bitmap;
- u32 bmap_size;
- unsigned long flags;
-#define FIRMWARE_INITIALIZED_FLAG 0
-#define FIRMWARE_FIRST_FLAG 31
-#define FIRMWARE_TIMED_OUT 3
- wait_queue_head_t waitq;
- int (*aeq_handler)(struct bnxt_qplib_rcfw *,
- void *, void *);
- u32 seq_num;
-
- /* Bar region info */
- void __iomem *cmdq_bar_reg_iomem;
- u16 cmdq_bar_reg;
- u16 cmdq_bar_reg_prod_off;
- u16 cmdq_bar_reg_trig_off;
- u16 creq_ring_id;
- u16 creq_bar_reg;
- void __iomem *creq_bar_reg_iomem;
-
- /* Cmd-Resp and Async Event notification queue */
- struct bnxt_qplib_hwq creq;
- u64 creq_qp_event_processed;
- u64 creq_func_event_processed;
-
- /* Actual Cmd and Resp Queues */
- struct bnxt_qplib_hwq cmdq;
- struct bnxt_qplib_crsq *crsqe_tbl;
+ struct bnxt_qplib_cmdq_ctx cmdq;
+ struct bnxt_qplib_creq_ctx creq;
+ struct bnxt_qplib_crsqe *crsqe_tbl;
int qp_tbl_size;
struct bnxt_qplib_qp_node *qp_tbl;
u64 oos_prev;
@@ -268,7 +281,7 @@ struct bnxt_qplib_rcfw {
};
void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);
-int bnxt_qplib_alloc_rcfw_channel(struct pci_dev *pdev,
+int bnxt_qplib_alloc_rcfw_channel(struct bnxt_qplib_res *res,
struct bnxt_qplib_rcfw *rcfw,
struct bnxt_qplib_ctx *ctx,
int qp_tbl_sz);
@@ -276,12 +289,10 @@ void bnxt_qplib_rcfw_stop_irq(struct bnxt_qplib_rcfw *rcfw, bool kill);
void bnxt_qplib_disable_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);
int bnxt_qplib_rcfw_start_irq(struct bnxt_qplib_rcfw *rcfw, int msix_vector,
bool need_init);
-int bnxt_qplib_enable_rcfw_channel(struct pci_dev *pdev,
- struct bnxt_qplib_rcfw *rcfw,
+int bnxt_qplib_enable_rcfw_channel(struct bnxt_qplib_rcfw *rcfw,
int msix_vector,
int cp_bar_reg_off, int virt_fn,
- int (*aeq_handler)(struct bnxt_qplib_rcfw *,
- void *aeqe, void *obj));
+ aeq_handler_t aeq_handler);
struct bnxt_qplib_rcfw_sbuf *bnxt_qplib_rcfw_alloc_sbuf(
struct bnxt_qplib_rcfw *rcfw,
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 60ea1b924b67..cab1adf1fed9 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -44,6 +44,7 @@
#include
#include
#include
+#include
#include "roce_hsi.h"
#include "qplib_res.h"
#include "qplib_sp.h"
@@ -55,9 +56,10 @@ static int bnxt_qplib_alloc_stats_ctx(struct pci_dev *pdev,
struct bnxt_qplib_stats *stats);
/* PBL */
-static void __free_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
+static void __free_pbl(struct bnxt_qplib_res *res, struct bnxt_qplib_pbl *pbl,
bool is_umem)
{
+ struct pci_dev *pdev = res->pdev;
int i;
if (!is_umem) {
@@ -74,35 +76,56 @@ static void __free_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
pbl->pg_arr[i] = NULL;
}
}
- kfree(pbl->pg_arr);
+ vfree(pbl->pg_arr);
pbl->pg_arr = NULL;
- kfree(pbl->pg_map_arr);
+ vfree(pbl->pg_map_arr);
pbl->pg_map_arr = NULL;
pbl->pg_count = 0;
pbl->pg_size = 0;
}
-static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
- struct scatterlist *sghead, u32 pages,
- u32 nmaps, u32 pg_size)
+static void bnxt_qplib_fill_user_dma_pages(struct bnxt_qplib_pbl *pbl,
+ struct bnxt_qplib_sg_info *sginfo)
{
+ struct scatterlist *sghead = sginfo->sghead;
struct sg_dma_page_iter sg_iter;
+ int i = 0;
+
+ for_each_sg_dma_page(sghead, &sg_iter, sginfo->nmap, 0) {
+ pbl->pg_map_arr[i] = sg_page_iter_dma_address(&sg_iter);
+ pbl->pg_arr[i] = NULL;
+ pbl->pg_count++;
+ i++;
+ }
+}
+
+static int __alloc_pbl(struct bnxt_qplib_res *res,
+ struct bnxt_qplib_pbl *pbl,
+ struct bnxt_qplib_sg_info *sginfo)
+{
+ struct pci_dev *pdev = res->pdev;
+ struct scatterlist *sghead;
bool is_umem = false;
+ u32 pages;
int i;
+ if (sginfo->nopte)
+ return 0;
+ pages = sginfo->npages;
+ sghead = sginfo->sghead;
/* page ptr arrays */
- pbl->pg_arr = kcalloc(pages, sizeof(void *), GFP_KERNEL);
+ pbl->pg_arr = vmalloc(pages * sizeof(void *));
if (!pbl->pg_arr)
return -ENOMEM;
- pbl->pg_map_arr = kcalloc(pages, sizeof(dma_addr_t), GFP_KERNEL);
+ pbl->pg_map_arr = vmalloc(pages * sizeof(dma_addr_t));
if (!pbl->pg_map_arr) {
- kfree(pbl->pg_arr);
+ vfree(pbl->pg_arr);
pbl->pg_arr = NULL;
return -ENOMEM;
}
pbl->pg_count = 0;
- pbl->pg_size = pg_size;
+ pbl->pg_size = sginfo->pgsize;
if (!sghead) {
for (i = 0; i < pages; i++) {
@@ -115,25 +138,19 @@ static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
pbl->pg_count++;
}
} else {
- i = 0;
is_umem = true;
- for_each_sg_dma_page(sghead, &sg_iter, nmaps, 0) {
- pbl->pg_map_arr[i] = sg_page_iter_dma_address(&sg_iter);
- pbl->pg_arr[i] = NULL;
- pbl->pg_count++;
- i++;
- }
+ bnxt_qplib_fill_user_dma_pages(pbl, sginfo);
}
return 0;
-
fail:
- __free_pbl(pdev, pbl, is_umem);
+ __free_pbl(res, pbl, is_umem);
return -ENOMEM;
}
/* HWQ */
-void bnxt_qplib_free_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq)
+void bnxt_qplib_free_hwq(struct bnxt_qplib_res *res,
+ struct bnxt_qplib_hwq *hwq)
{
int i;
@@ -144,9 +161,9 @@ void bnxt_qplib_free_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq)
for (i = 0; i < hwq->level + 1; i++) {
if (i == hwq->level)
- __free_pbl(pdev, &hwq->pbl[i], hwq->is_user);
+ __free_pbl(res, &hwq->pbl[i], hwq->is_user);
else
- __free_pbl(pdev, &hwq->pbl[i], false);
+ __free_pbl(res, &hwq->pbl[i], false);
}
hwq->level = PBL_LVL_MAX;
@@ -158,79 +175,113 @@ void bnxt_qplib_free_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq)
}
/* All HWQs are power of 2 in size */
-int bnxt_qplib_alloc_init_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq,
- struct bnxt_qplib_sg_info *sg_info,
- u32 *elements, u32 element_size, u32 aux,
- u32 pg_size, enum bnxt_qplib_hwq_type hwq_type)
+
+int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq,
+ struct bnxt_qplib_hwq_attr *hwq_attr)
{
- u32 pages, maps, slots, size, aux_pages = 0, aux_size = 0;
+ u32 npages, aux_slots, pg_size, aux_pages = 0, aux_size = 0;
+ struct bnxt_qplib_sg_info sginfo = {};
+ u32 depth, stride, npbl, npde;
dma_addr_t *src_phys_ptr, **dst_virt_ptr;
struct scatterlist *sghead = NULL;
- int i, rc;
+ struct bnxt_qplib_res *res;
+ struct pci_dev *pdev;
+ int i, rc, lvl;
+ res = hwq_attr->res;
+ pdev = res->pdev;
+ sghead = hwq_attr->sginfo->sghead;
+ pg_size = hwq_attr->sginfo->pgsize;
hwq->level = PBL_LVL_MAX;
- slots = roundup_pow_of_two(*elements);
- if (aux) {
- aux_size = roundup_pow_of_two(aux);
- aux_pages = (slots * aux_size) / pg_size;
- if ((slots * aux_size) % pg_size)
+ depth = roundup_pow_of_two(hwq_attr->depth);
+ stride = roundup_pow_of_two(hwq_attr->stride);
+ if (hwq_attr->aux_depth) {
+ aux_slots = hwq_attr->aux_depth;
+ aux_size = roundup_pow_of_two(hwq_attr->aux_stride);
+ aux_pages = (aux_slots * aux_size) / pg_size;
+ if ((aux_slots * aux_size) % pg_size)
aux_pages++;
}
- size = roundup_pow_of_two(element_size);
-
- if (sg_info)
- sghead = sg_info->sglist;
if (!sghead) {
hwq->is_user = false;
- pages = (slots * size) / pg_size + aux_pages;
- if ((slots * size) % pg_size)
- pages++;
- if (!pages)
+ npages = (depth * stride) / pg_size + aux_pages;
+ if ((depth * stride) % pg_size)
+ npages++;
+ if (!npages)
return -EINVAL;
- maps = 0;
+ hwq_attr->sginfo->npages = npages;
} else {
hwq->is_user = true;
- pages = sg_info->npages;
- maps = sg_info->nmap;
+ npages = hwq_attr->sginfo->npages;
+ npages = (npages * PAGE_SIZE) /
+ BIT_ULL(hwq_attr->sginfo->pgshft);
+ if ((hwq_attr->sginfo->npages * PAGE_SIZE) %
+ BIT_ULL(hwq_attr->sginfo->pgshft))
+ if (!npages)
+ npages++;
}
- /* Alloc the 1st memory block; can be a PDL/PTL/PBL */
- if (sghead && (pages == MAX_PBL_LVL_0_PGS))
- rc = __alloc_pbl(pdev, &hwq->pbl[PBL_LVL_0], sghead,
- pages, maps, pg_size);
- else
- rc = __alloc_pbl(pdev, &hwq->pbl[PBL_LVL_0], NULL,
- 1, 0, pg_size);
- if (rc)
- goto fail;
+ if (npages == MAX_PBL_LVL_0_PGS) {
+ /* This request is Level 0, map PTE */
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_0], hwq_attr->sginfo);
+ if (rc)
+ goto fail;
+ hwq->level = PBL_LVL_0;
+ }
- hwq->level = PBL_LVL_0;
-
- if (pages > MAX_PBL_LVL_0_PGS) {
- if (pages > MAX_PBL_LVL_1_PGS) {
+ if (npages > MAX_PBL_LVL_0_PGS) {
+ if (npages > MAX_PBL_LVL_1_PGS) {
+ u32 flag = (hwq_attr->type == HWQ_TYPE_L2_CMPL) ?
+ 0 : PTU_PTE_VALID;
/* 2 levels of indirection */
- rc = __alloc_pbl(pdev, &hwq->pbl[PBL_LVL_1], NULL,
- MAX_PBL_LVL_1_PGS_FOR_LVL_2,
- 0, pg_size);
+ npbl = npages >> MAX_PBL_LVL_1_PGS_SHIFT;
+ if (npages % BIT(MAX_PBL_LVL_1_PGS_SHIFT))
+ npbl++;
+ npde = npbl >> MAX_PDL_LVL_SHIFT;
+ if (npbl % BIT(MAX_PDL_LVL_SHIFT))
+ npde++;
+ /* Alloc PDE pages */
+ sginfo.pgsize = npde * pg_size;
+ sginfo.npages = 1;
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_0], &sginfo);
+
+ /* Alloc PBL pages */
+ sginfo.npages = npbl;
+ sginfo.pgsize = PAGE_SIZE;
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_1], &sginfo);
if (rc)
goto fail;
- /* Fill in lvl0 PBL */
+ /* Fill PDL with PBL page pointers */
dst_virt_ptr =
(dma_addr_t **)hwq->pbl[PBL_LVL_0].pg_arr;
src_phys_ptr = hwq->pbl[PBL_LVL_1].pg_map_arr;
- for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count; i++)
- dst_virt_ptr[PTR_PG(i)][PTR_IDX(i)] =
- src_phys_ptr[i] | PTU_PDE_VALID;
- hwq->level = PBL_LVL_1;
-
- rc = __alloc_pbl(pdev, &hwq->pbl[PBL_LVL_2], sghead,
- pages, maps, pg_size);
+ if (hwq_attr->type == HWQ_TYPE_MR) {
+ /* For MR it is expected that we supply only 1 contigous
+ * page i.e only 1 entry in the PDL that will contain
+ * all the PBLs for the user supplied memory region
+ */
+ for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count;
+ i++)
+ dst_virt_ptr[0][i] = src_phys_ptr[i] |
+ flag;
+ } else {
+ for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count;
+ i++)
+ dst_virt_ptr[PTR_PG(i)][PTR_IDX(i)] =
+ src_phys_ptr[i] |
+ PTU_PDE_VALID;
+ }
+ /* Alloc or init PTEs */
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_2],
+ hwq_attr->sginfo);
if (rc)
goto fail;
-
- /* Fill in lvl1 PBL */
+ hwq->level = PBL_LVL_2;
+ if (hwq_attr->sginfo->nopte)
+ goto done;
+ /* Fill PBLs with PTE pointers */
dst_virt_ptr =
(dma_addr_t **)hwq->pbl[PBL_LVL_1].pg_arr;
src_phys_ptr = hwq->pbl[PBL_LVL_2].pg_map_arr;
@@ -238,7 +289,7 @@ int bnxt_qplib_alloc_init_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq,
dst_virt_ptr[PTR_PG(i)][PTR_IDX(i)] =
src_phys_ptr[i] | PTU_PTE_VALID;
}
- if (hwq_type == HWQ_TYPE_QUEUE) {
+ if (hwq_attr->type == HWQ_TYPE_QUEUE) {
/* Find the last pg of the size */
i = hwq->pbl[PBL_LVL_2].pg_count;
dst_virt_ptr[PTR_PG(i - 1)][PTR_IDX(i - 1)] |=
@@ -248,25 +299,36 @@ int bnxt_qplib_alloc_init_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq,
[PTR_IDX(i - 2)] |=
PTU_PTE_NEXT_TO_LAST;
}
- hwq->level = PBL_LVL_2;
- } else {
- u32 flag = hwq_type == HWQ_TYPE_L2_CMPL ? 0 :
- PTU_PTE_VALID;
+ } else { /* pages < 512 npbl = 1, npde = 0 */
+ u32 flag = (hwq_attr->type == HWQ_TYPE_L2_CMPL) ?
+ 0 : PTU_PTE_VALID;
/* 1 level of indirection */
- rc = __alloc_pbl(pdev, &hwq->pbl[PBL_LVL_1], sghead,
- pages, maps, pg_size);
+ npbl = npages >> MAX_PBL_LVL_1_PGS_SHIFT;
+ if (npages % BIT(MAX_PBL_LVL_1_PGS_SHIFT))
+ npbl++;
+ sginfo.npages = npbl;
+ sginfo.pgsize = PAGE_SIZE;
+ /* Alloc PBL page */
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_0], &sginfo);
if (rc)
goto fail;
- /* Fill in lvl0 PBL */
+ /* Alloc or init PTEs */
+ rc = __alloc_pbl(res, &hwq->pbl[PBL_LVL_1],
+ hwq_attr->sginfo);
+ if (rc)
+ goto fail;
+ hwq->level = PBL_LVL_1;
+ if (hwq_attr->sginfo->nopte)
+ goto done;
+ /* Fill PBL with PTE pointers */
dst_virt_ptr =
(dma_addr_t **)hwq->pbl[PBL_LVL_0].pg_arr;
src_phys_ptr = hwq->pbl[PBL_LVL_1].pg_map_arr;
- for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count; i++) {
+ for (i = 0; i < hwq->pbl[PBL_LVL_1].pg_count; i++)
dst_virt_ptr[PTR_PG(i)][PTR_IDX(i)] =
src_phys_ptr[i] | flag;
- }
- if (hwq_type == HWQ_TYPE_QUEUE) {
+ if (hwq_attr->type == HWQ_TYPE_QUEUE) {
/* Find the last pg of the size */
i = hwq->pbl[PBL_LVL_1].pg_count;
dst_virt_ptr[PTR_PG(i - 1)][PTR_IDX(i - 1)] |=
@@ -276,42 +338,141 @@ int bnxt_qplib_alloc_init_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq,
[PTR_IDX(i - 2)] |=
PTU_PTE_NEXT_TO_LAST;
}
- hwq->level = PBL_LVL_1;
}
}
- hwq->pdev = pdev;
- spin_lock_init(&hwq->lock);
+done:
hwq->prod = 0;
hwq->cons = 0;
- *elements = hwq->max_elements = slots;
- hwq->element_size = size;
-
+ hwq->pdev = pdev;
+ hwq->depth = hwq_attr->depth;
+ hwq->max_elements = depth;
+ hwq->element_size = stride;
/* For direct access to the elements */
- hwq->pbl_ptr = hwq->pbl[hwq->level].pg_arr;
- hwq->pbl_dma_ptr = hwq->pbl[hwq->level].pg_map_arr;
+ lvl = hwq->level;
+ if (hwq_attr->sginfo->nopte && hwq->level)
+ lvl = hwq->level - 1;
+ hwq->pbl_ptr = hwq->pbl[lvl].pg_arr;
+ hwq->pbl_dma_ptr = hwq->pbl[lvl].pg_map_arr;
+ spin_lock_init(&hwq->lock);
return 0;
-
fail:
- bnxt_qplib_free_hwq(pdev, hwq);
+ bnxt_qplib_free_hwq(res, hwq);
return -ENOMEM;
}
/* Context Tables */
-void bnxt_qplib_free_ctx(struct pci_dev *pdev,
+void bnxt_qplib_free_ctx(struct bnxt_qplib_res *res,
struct bnxt_qplib_ctx *ctx)
{
int i;
- bnxt_qplib_free_hwq(pdev, &ctx->qpc_tbl);
- bnxt_qplib_free_hwq(pdev, &ctx->mrw_tbl);
- bnxt_qplib_free_hwq(pdev, &ctx->srqc_tbl);
- bnxt_qplib_free_hwq(pdev, &ctx->cq_tbl);
- bnxt_qplib_free_hwq(pdev, &ctx->tim_tbl);
+ bnxt_qplib_free_hwq(res, &ctx->qpc_tbl);
+ bnxt_qplib_free_hwq(res, &ctx->mrw_tbl);
+ bnxt_qplib_free_hwq(res, &ctx->srqc_tbl);
+ bnxt_qplib_free_hwq(res, &ctx->cq_tbl);
+ bnxt_qplib_free_hwq(res, &ctx->tim_tbl);
for (i = 0; i < MAX_TQM_ALLOC_REQ; i++)
- bnxt_qplib_free_hwq(pdev, &ctx->tqm_tbl[i]);
- bnxt_qplib_free_hwq(pdev, &ctx->tqm_pde);
- bnxt_qplib_free_stats_ctx(pdev, &ctx->stats);
+ bnxt_qplib_free_hwq(res, &ctx->tqm_ctx.qtbl[i]);
+ /* restore original pde level before destroy */
+ ctx->tqm_ctx.pde.level = ctx->tqm_ctx.pde_level;
+ bnxt_qplib_free_hwq(res, &ctx->tqm_ctx.pde);
+ bnxt_qplib_free_stats_ctx(res->pdev, &ctx->stats);
+}
+
+static int bnxt_qplib_alloc_tqm_rings(struct bnxt_qplib_res *res,
+ struct bnxt_qplib_ctx *ctx)
+{
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
+ struct bnxt_qplib_tqm_ctx *tqmctx;
+ int rc = 0;
+ int i;
+
+ tqmctx = &ctx->tqm_ctx;
+
+ sginfo.pgsize = PAGE_SIZE;
+ sginfo.pgshft = PAGE_SHIFT;
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.res = res;
+ hwq_attr.type = HWQ_TYPE_CTX;
+ hwq_attr.depth = 512;
+ hwq_attr.stride = sizeof(u64);
+ /* Alloc pdl buffer */
+ rc = bnxt_qplib_alloc_init_hwq(&tqmctx->pde, &hwq_attr);
+ if (rc)
+ goto out;
+ /* Save original pdl level */
+ tqmctx->pde_level = tqmctx->pde.level;
+
+ hwq_attr.stride = 1;
+ for (i = 0; i < MAX_TQM_ALLOC_REQ; i++) {
+ if (!tqmctx->qcount[i])
+ continue;
+ hwq_attr.depth = ctx->qpc_count * tqmctx->qcount[i];
+ rc = bnxt_qplib_alloc_init_hwq(&tqmctx->qtbl[i], &hwq_attr);
+ if (rc)
+ goto out;
+ }
+out:
+ return rc;
+}
+
+static void bnxt_qplib_map_tqm_pgtbl(struct bnxt_qplib_tqm_ctx *ctx)
+{
+ struct bnxt_qplib_hwq *tbl;
+ dma_addr_t *dma_ptr;
+ __le64 **pbl_ptr, *ptr;
+ int i, j, k;
+ int fnz_idx = -1;
+ int pg_count;
+
+ pbl_ptr = (__le64 **)ctx->pde.pbl_ptr;
+
+ for (i = 0, j = 0; i < MAX_TQM_ALLOC_REQ;
+ i++, j += MAX_TQM_ALLOC_BLK_SIZE) {
+ tbl = &ctx->qtbl[i];
+ if (!tbl->max_elements)
+ continue;
+ if (fnz_idx == -1)
+ fnz_idx = i; /* first non-zero index */
+ switch (tbl->level) {
+ case PBL_LVL_2:
+ pg_count = tbl->pbl[PBL_LVL_1].pg_count;
+ for (k = 0; k < pg_count; k++) {
+ ptr = &pbl_ptr[PTR_PG(j + k)][PTR_IDX(j + k)];
+ dma_ptr = &tbl->pbl[PBL_LVL_1].pg_map_arr[k];
+ *ptr = cpu_to_le64(*dma_ptr | PTU_PTE_VALID);
+ }
+ break;
+ case PBL_LVL_1:
+ case PBL_LVL_0:
+ default:
+ ptr = &pbl_ptr[PTR_PG(j)][PTR_IDX(j)];
+ *ptr = cpu_to_le64(tbl->pbl[PBL_LVL_0].pg_map_arr[0] |
+ PTU_PTE_VALID);
+ break;
+ }
+ }
+ if (fnz_idx == -1)
+ fnz_idx = 0;
+ /* update pde level as per page table programming */
+ ctx->pde.level = (ctx->qtbl[fnz_idx].level == PBL_LVL_2) ? PBL_LVL_2 :
+ ctx->qtbl[fnz_idx].level + 1;
+}
+
+static int bnxt_qplib_setup_tqm_rings(struct bnxt_qplib_res *res,
+ struct bnxt_qplib_ctx *ctx)
+{
+ int rc = 0;
+
+ rc = bnxt_qplib_alloc_tqm_rings(res, ctx);
+ if (rc)
+ goto fail;
+
+ bnxt_qplib_map_tqm_pgtbl(&ctx->tqm_ctx);
+fail:
+ return rc;
}
/*
@@ -335,120 +496,72 @@ void bnxt_qplib_free_ctx(struct pci_dev *pdev,
* Returns:
* 0 if success, else -ERRORS
*/
-int bnxt_qplib_alloc_ctx(struct pci_dev *pdev,
+int bnxt_qplib_alloc_ctx(struct bnxt_qplib_res *res,
struct bnxt_qplib_ctx *ctx,
bool virt_fn, bool is_p5)
{
- int i, j, k, rc = 0;
- int fnz_idx = -1;
- __le64 **pbl_ptr;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
+ int rc = 0;
if (virt_fn || is_p5)
goto stats_alloc;
/* QPC Tables */
- ctx->qpc_tbl.max_elements = ctx->qpc_count;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->qpc_tbl, NULL,
- &ctx->qpc_tbl.max_elements,
- BNXT_QPLIB_MAX_QP_CTX_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_CTX);
+ sginfo.pgsize = PAGE_SIZE;
+ sginfo.pgshft = PAGE_SHIFT;
+ hwq_attr.sginfo = &sginfo;
+
+ hwq_attr.res = res;
+ hwq_attr.depth = ctx->qpc_count;
+ hwq_attr.stride = BNXT_QPLIB_MAX_QP_CTX_ENTRY_SIZE;
+ hwq_attr.type = HWQ_TYPE_CTX;
+ rc = bnxt_qplib_alloc_init_hwq(&ctx->qpc_tbl, &hwq_attr);
if (rc)
goto fail;
/* MRW Tables */
- ctx->mrw_tbl.max_elements = ctx->mrw_count;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->mrw_tbl, NULL,
- &ctx->mrw_tbl.max_elements,
- BNXT_QPLIB_MAX_MRW_CTX_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_CTX);
+ hwq_attr.depth = ctx->mrw_count;
+ hwq_attr.stride = BNXT_QPLIB_MAX_MRW_CTX_ENTRY_SIZE;
+ rc = bnxt_qplib_alloc_init_hwq(&ctx->mrw_tbl, &hwq_attr);
if (rc)
goto fail;
/* SRQ Tables */
- ctx->srqc_tbl.max_elements = ctx->srqc_count;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->srqc_tbl, NULL,
- &ctx->srqc_tbl.max_elements,
- BNXT_QPLIB_MAX_SRQ_CTX_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_CTX);
+ hwq_attr.depth = ctx->srqc_count;
+ hwq_attr.stride = BNXT_QPLIB_MAX_SRQ_CTX_ENTRY_SIZE;
+ rc = bnxt_qplib_alloc_init_hwq(&ctx->srqc_tbl, &hwq_attr);
if (rc)
goto fail;
/* CQ Tables */
- ctx->cq_tbl.max_elements = ctx->cq_count;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->cq_tbl, NULL,
- &ctx->cq_tbl.max_elements,
- BNXT_QPLIB_MAX_CQ_CTX_ENTRY_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_CTX);
+ hwq_attr.depth = ctx->cq_count;
+ hwq_attr.stride = BNXT_QPLIB_MAX_CQ_CTX_ENTRY_SIZE;
+ rc = bnxt_qplib_alloc_init_hwq(&ctx->cq_tbl, &hwq_attr);
if (rc)
goto fail;
/* TQM Buffer */
- ctx->tqm_pde.max_elements = 512;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->tqm_pde, NULL,
- &ctx->tqm_pde.max_elements, sizeof(u64),
- 0, PAGE_SIZE, HWQ_TYPE_CTX);
+ rc = bnxt_qplib_setup_tqm_rings(res, ctx);
if (rc)
goto fail;
-
- for (i = 0; i < MAX_TQM_ALLOC_REQ; i++) {
- if (!ctx->tqm_count[i])
- continue;
- ctx->tqm_tbl[i].max_elements = ctx->qpc_count *
- ctx->tqm_count[i];
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->tqm_tbl[i], NULL,
- &ctx->tqm_tbl[i].max_elements, 1,
- 0, PAGE_SIZE, HWQ_TYPE_CTX);
- if (rc)
- goto fail;
- }
- pbl_ptr = (__le64 **)ctx->tqm_pde.pbl_ptr;
- for (i = 0, j = 0; i < MAX_TQM_ALLOC_REQ;
- i++, j += MAX_TQM_ALLOC_BLK_SIZE) {
- if (!ctx->tqm_tbl[i].max_elements)
- continue;
- if (fnz_idx == -1)
- fnz_idx = i;
- switch (ctx->tqm_tbl[i].level) {
- case PBL_LVL_2:
- for (k = 0; k < ctx->tqm_tbl[i].pbl[PBL_LVL_1].pg_count;
- k++)
- pbl_ptr[PTR_PG(j + k)][PTR_IDX(j + k)] =
- cpu_to_le64(
- ctx->tqm_tbl[i].pbl[PBL_LVL_1].pg_map_arr[k]
- | PTU_PTE_VALID);
- break;
- case PBL_LVL_1:
- case PBL_LVL_0:
- default:
- pbl_ptr[PTR_PG(j)][PTR_IDX(j)] = cpu_to_le64(
- ctx->tqm_tbl[i].pbl[PBL_LVL_0].pg_map_arr[0] |
- PTU_PTE_VALID);
- break;
- }
- }
- if (fnz_idx == -1)
- fnz_idx = 0;
- ctx->tqm_pde_level = ctx->tqm_tbl[fnz_idx].level == PBL_LVL_2 ?
- PBL_LVL_2 : ctx->tqm_tbl[fnz_idx].level + 1;
-
/* TIM Buffer */
ctx->tim_tbl.max_elements = ctx->qpc_count * 16;
- rc = bnxt_qplib_alloc_init_hwq(pdev, &ctx->tim_tbl, NULL,
- &ctx->tim_tbl.max_elements, 1,
- 0, PAGE_SIZE, HWQ_TYPE_CTX);
+ hwq_attr.depth = ctx->qpc_count * 16;
+ hwq_attr.stride = 1;
+ rc = bnxt_qplib_alloc_init_hwq(&ctx->tim_tbl, &hwq_attr);
if (rc)
goto fail;
-
stats_alloc:
/* Stats */
- rc = bnxt_qplib_alloc_stats_ctx(pdev, &ctx->stats);
+ rc = bnxt_qplib_alloc_stats_ctx(res->pdev, &ctx->stats);
if (rc)
goto fail;
return 0;
fail:
- bnxt_qplib_free_ctx(pdev, ctx);
+ bnxt_qplib_free_ctx(res, ctx);
return rc;
}
@@ -808,9 +921,6 @@ void bnxt_qplib_free_res(struct bnxt_qplib_res *res)
bnxt_qplib_free_sgid_tbl(res, &res->sgid_tbl);
bnxt_qplib_free_pd_tbl(&res->pd_tbl);
bnxt_qplib_free_dpi_tbl(res, &res->dpi_tbl);
-
- res->netdev = NULL;
- res->pdev = NULL;
}
int bnxt_qplib_alloc_res(struct bnxt_qplib_res *res, struct pci_dev *pdev,
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.h b/drivers/infiniband/hw/bnxt_re/qplib_res.h
index aaa76d792185..95b645dbbc2d 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.h
@@ -55,7 +55,8 @@ extern const struct bnxt_qplib_gid bnxt_qplib_gid_zero;
enum bnxt_qplib_hwq_type {
HWQ_TYPE_CTX,
HWQ_TYPE_QUEUE,
- HWQ_TYPE_L2_CMPL
+ HWQ_TYPE_L2_CMPL,
+ HWQ_TYPE_MR
};
#define MAX_PBL_LVL_0_PGS 1
@@ -63,6 +64,7 @@ enum bnxt_qplib_hwq_type {
#define MAX_PBL_LVL_1_PGS_SHIFT 9
#define MAX_PBL_LVL_1_PGS_FOR_LVL_2 256
#define MAX_PBL_LVL_2_PGS (256 * 512)
+#define MAX_PDL_LVL_SHIFT 9
enum bnxt_qplib_pbl_lvl {
PBL_LVL_0,
@@ -78,6 +80,13 @@ enum bnxt_qplib_pbl_lvl {
#define ROCE_PG_SIZE_8M (8 * 1024 * 1024)
#define ROCE_PG_SIZE_1G (1024 * 1024 * 1024)
+struct bnxt_qplib_reg_desc {
+ u8 bar_id;
+ resource_size_t bar_base;
+ void __iomem *bar_reg;
+ size_t len;
+};
+
struct bnxt_qplib_pbl {
u32 pg_count;
u32 pg_size;
@@ -85,17 +94,37 @@ struct bnxt_qplib_pbl {
dma_addr_t *pg_map_arr;
};
+struct bnxt_qplib_sg_info {
+ struct scatterlist *sghead;
+ u32 nmap;
+ u32 npages;
+ u32 pgshft;
+ u32 pgsize;
+ bool nopte;
+};
+
+struct bnxt_qplib_hwq_attr {
+ struct bnxt_qplib_res *res;
+ struct bnxt_qplib_sg_info *sginfo;
+ enum bnxt_qplib_hwq_type type;
+ u32 depth;
+ u32 stride;
+ u32 aux_stride;
+ u32 aux_depth;
+};
+
struct bnxt_qplib_hwq {
struct pci_dev *pdev;
/* lock to protect qplib_hwq */
spinlock_t lock;
- struct bnxt_qplib_pbl pbl[PBL_LVL_MAX];
+ struct bnxt_qplib_pbl pbl[PBL_LVL_MAX + 1];
enum bnxt_qplib_pbl_lvl level; /* 0, 1, or 2 */
/* ptr for easy access to the PBL entries */
void **pbl_ptr;
/* ptr for easy access to the dma_addr */
dma_addr_t *pbl_dma_ptr;
u32 max_elements;
+ u32 depth;
u16 element_size; /* Size of each entry */
u32 prod; /* raw */
@@ -104,6 +133,13 @@ struct bnxt_qplib_hwq {
u8 is_user;
};
+struct bnxt_qplib_db_info {
+ void __iomem *db;
+ void __iomem *priv_db;
+ struct bnxt_qplib_hwq *hwq;
+ u32 xid;
+};
+
/* Tables */
struct bnxt_qplib_pd_tbl {
unsigned long *tbl;
@@ -159,6 +195,15 @@ struct bnxt_qplib_vf_res {
#define BNXT_QPLIB_MAX_CQ_CTX_ENTRY_SIZE 64
#define BNXT_QPLIB_MAX_MRW_CTX_ENTRY_SIZE 128
+#define MAX_TQM_ALLOC_REQ 48
+#define MAX_TQM_ALLOC_BLK_SIZE 8
+struct bnxt_qplib_tqm_ctx {
+ struct bnxt_qplib_hwq pde;
+ u8 pde_level; /* Original level */
+ struct bnxt_qplib_hwq qtbl[MAX_TQM_ALLOC_REQ];
+ u8 qcount[MAX_TQM_ALLOC_REQ];
+};
+
struct bnxt_qplib_ctx {
u32 qpc_count;
struct bnxt_qplib_hwq qpc_tbl;
@@ -169,12 +214,7 @@ struct bnxt_qplib_ctx {
u32 cq_count;
struct bnxt_qplib_hwq cq_tbl;
struct bnxt_qplib_hwq tim_tbl;
-#define MAX_TQM_ALLOC_REQ 48
-#define MAX_TQM_ALLOC_BLK_SIZE 8
- u8 tqm_count[MAX_TQM_ALLOC_REQ];
- struct bnxt_qplib_hwq tqm_pde;
- u32 tqm_pde_level;
- struct bnxt_qplib_hwq tqm_tbl[MAX_TQM_ALLOC_REQ];
+ struct bnxt_qplib_tqm_ctx tqm_ctx;
struct bnxt_qplib_stats stats;
struct bnxt_qplib_vf_res vf_res;
u64 hwrm_intf_ver;
@@ -223,11 +263,6 @@ static inline u8 bnxt_qplib_get_ring_type(struct bnxt_qplib_chip_ctx *cctx)
RING_ALLOC_REQ_RING_TYPE_ROCE_CMPL;
}
-struct bnxt_qplib_sg_info {
- struct scatterlist *sglist;
- u32 nmap;
- u32 npages;
-};
#define to_bnxt_qplib(ptr, type, member) \
container_of(ptr, type, member)
@@ -235,11 +270,10 @@ struct bnxt_qplib_sg_info {
struct bnxt_qplib_pd;
struct bnxt_qplib_dev_attr;
-void bnxt_qplib_free_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq);
-int bnxt_qplib_alloc_init_hwq(struct pci_dev *pdev, struct bnxt_qplib_hwq *hwq,
- struct bnxt_qplib_sg_info *sg_info, u32 *elements,
- u32 elements_per_page, u32 aux, u32 pg_size,
- enum bnxt_qplib_hwq_type hwq_type);
+void bnxt_qplib_free_hwq(struct bnxt_qplib_res *res,
+ struct bnxt_qplib_hwq *hwq);
+int bnxt_qplib_alloc_init_hwq(struct bnxt_qplib_hwq *hwq,
+ struct bnxt_qplib_hwq_attr *hwq_attr);
void bnxt_qplib_get_guid(u8 *dev_addr, u8 *guid);
int bnxt_qplib_alloc_pd(struct bnxt_qplib_pd_tbl *pd_tbl,
struct bnxt_qplib_pd *pd);
@@ -258,9 +292,80 @@ void bnxt_qplib_free_res(struct bnxt_qplib_res *res);
int bnxt_qplib_alloc_res(struct bnxt_qplib_res *res, struct pci_dev *pdev,
struct net_device *netdev,
struct bnxt_qplib_dev_attr *dev_attr);
-void bnxt_qplib_free_ctx(struct pci_dev *pdev,
+void bnxt_qplib_free_ctx(struct bnxt_qplib_res *res,
struct bnxt_qplib_ctx *ctx);
-int bnxt_qplib_alloc_ctx(struct pci_dev *pdev,
+int bnxt_qplib_alloc_ctx(struct bnxt_qplib_res *res,
struct bnxt_qplib_ctx *ctx,
bool virt_fn, bool is_p5);
+
+static inline void bnxt_qplib_ring_db32(struct bnxt_qplib_db_info *info,
+ bool arm)
+{
+ u32 key;
+
+ key = info->hwq->cons & (info->hwq->max_elements - 1);
+ key |= (CMPL_DOORBELL_IDX_VALID |
+ (CMPL_DOORBELL_KEY_CMPL & CMPL_DOORBELL_KEY_MASK));
+ if (!arm)
+ key |= CMPL_DOORBELL_MASK;
+ writel(key, info->db);
+}
+
+static inline void bnxt_qplib_ring_db(struct bnxt_qplib_db_info *info,
+ u32 type)
+{
+ u64 key = 0;
+
+ key = (info->xid & DBC_DBC_XID_MASK) | DBC_DBC_PATH_ROCE | type;
+ key <<= 32;
+ key |= (info->hwq->cons & (info->hwq->max_elements - 1)) &
+ DBC_DBC_INDEX_MASK;
+ writeq(key, info->db);
+}
+
+static inline void bnxt_qplib_ring_prod_db(struct bnxt_qplib_db_info *info,
+ u32 type)
+{
+ u64 key = 0;
+
+ key = (info->xid & DBC_DBC_XID_MASK) | DBC_DBC_PATH_ROCE | type;
+ key <<= 32;
+ key |= (info->hwq->prod & (info->hwq->max_elements - 1)) &
+ DBC_DBC_INDEX_MASK;
+ writeq(key, info->db);
+}
+
+static inline void bnxt_qplib_armen_db(struct bnxt_qplib_db_info *info,
+ u32 type)
+{
+ u64 key = 0;
+
+ key = (info->xid & DBC_DBC_XID_MASK) | DBC_DBC_PATH_ROCE | type;
+ key <<= 32;
+ writeq(key, info->priv_db);
+}
+
+static inline void bnxt_qplib_srq_arm_db(struct bnxt_qplib_db_info *info,
+ u32 th)
+{
+ u64 key = 0;
+
+ key = (info->xid & DBC_DBC_XID_MASK) | DBC_DBC_PATH_ROCE | th;
+ key <<= 32;
+ key |= th & DBC_DBC_INDEX_MASK;
+ writeq(key, info->priv_db);
+}
+
+static inline void bnxt_qplib_ring_nq_db(struct bnxt_qplib_db_info *info,
+ struct bnxt_qplib_chip_ctx *cctx,
+ bool arm)
+{
+ u32 type;
+
+ type = arm ? DBC_DBC_TYPE_NQ_ARM : DBC_DBC_TYPE_NQ;
+ if (bnxt_qplib_is_chip_gen_p5(cctx))
+ bnxt_qplib_ring_db(info, type);
+ else
+ bnxt_qplib_ring_db32(info, arm);
+}
#endif /* __BNXT_QPLIB_RES_H__ */
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 40296b97d21e..66954ff6a2f2 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -585,7 +585,7 @@ int bnxt_qplib_free_mrw(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mrw)
/* Free the qplib's MRW memory */
if (mrw->hwq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &mrw->hwq);
+ bnxt_qplib_free_hwq(res, &mrw->hwq);
return 0;
}
@@ -646,7 +646,7 @@ int bnxt_qplib_dereg_mrw(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mrw,
if (mrw->hwq.max_elements) {
mrw->va = 0;
mrw->total_size = 0;
- bnxt_qplib_free_hwq(res->pdev, &mrw->hwq);
+ bnxt_qplib_free_hwq(res, &mrw->hwq);
}
return 0;
@@ -656,10 +656,12 @@ int bnxt_qplib_reg_mr(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mr,
u64 *pbl_tbl, int num_pbls, bool block, u32 buf_pg_size)
{
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
- struct cmdq_register_mr req;
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
struct creq_register_mr_resp resp;
- u16 cmd_flags = 0, level;
+ struct cmdq_register_mr req;
int pg_ptrs, pages, i, rc;
+ u16 cmd_flags = 0, level;
dma_addr_t **pbl_ptr;
u32 pg_size;
@@ -674,20 +676,23 @@ int bnxt_qplib_reg_mr(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mr,
if (pages > MAX_PBL_LVL_1_PGS) {
dev_err(&res->pdev->dev,
- "SP: Reg MR pages requested (0x%x) exceeded max (0x%x)\n",
+ "SP: Reg MR: pages requested (0x%x) exceeded max (0x%x)\n",
pages, MAX_PBL_LVL_1_PGS);
return -ENOMEM;
}
/* Free the hwq if it already exist, must be a rereg */
if (mr->hwq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &mr->hwq);
-
- mr->hwq.max_elements = pages;
+ bnxt_qplib_free_hwq(res, &mr->hwq);
/* Use system PAGE_SIZE */
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &mr->hwq, NULL,
- &mr->hwq.max_elements,
- PAGE_SIZE, 0, PAGE_SIZE,
- HWQ_TYPE_CTX);
+ hwq_attr.res = res;
+ hwq_attr.depth = pages;
+ hwq_attr.stride = PAGE_SIZE;
+ hwq_attr.type = HWQ_TYPE_MR;
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.sginfo->npages = pages;
+ hwq_attr.sginfo->pgsize = PAGE_SIZE;
+ hwq_attr.sginfo->pgshft = PAGE_SHIFT;
+ rc = bnxt_qplib_alloc_init_hwq(&mr->hwq, &hwq_attr);
if (rc) {
dev_err(&res->pdev->dev,
"SP: Reg MR memory allocation failed\n");
@@ -734,7 +739,7 @@ int bnxt_qplib_reg_mr(struct bnxt_qplib_res *res, struct bnxt_qplib_mrw *mr,
fail:
if (mr->hwq.max_elements)
- bnxt_qplib_free_hwq(res->pdev, &mr->hwq);
+ bnxt_qplib_free_hwq(res, &mr->hwq);
return rc;
}
@@ -742,6 +747,8 @@ int bnxt_qplib_alloc_fast_reg_page_list(struct bnxt_qplib_res *res,
struct bnxt_qplib_frpl *frpl,
int max_pg_ptrs)
{
+ struct bnxt_qplib_hwq_attr hwq_attr = {};
+ struct bnxt_qplib_sg_info sginfo = {};
int pg_ptrs, pages, rc;
/* Re-calculate the max to fit the HWQ allocation model */
@@ -753,10 +760,15 @@ int bnxt_qplib_alloc_fast_reg_page_list(struct bnxt_qplib_res *res,
if (pages > MAX_PBL_LVL_1_PGS)
return -ENOMEM;
- frpl->hwq.max_elements = pages;
- rc = bnxt_qplib_alloc_init_hwq(res->pdev, &frpl->hwq, NULL,
- &frpl->hwq.max_elements, PAGE_SIZE, 0,
- PAGE_SIZE, HWQ_TYPE_CTX);
+ sginfo.pgsize = PAGE_SIZE;
+ sginfo.nopte = true;
+
+ hwq_attr.res = res;
+ hwq_attr.depth = pg_ptrs;
+ hwq_attr.stride = PAGE_SIZE;
+ hwq_attr.sginfo = &sginfo;
+ hwq_attr.type = HWQ_TYPE_CTX;
+ rc = bnxt_qplib_alloc_init_hwq(&frpl->hwq, &hwq_attr);
if (!rc)
frpl->max_pg_ptrs = pg_ptrs;
@@ -766,7 +778,7 @@ int bnxt_qplib_alloc_fast_reg_page_list(struct bnxt_qplib_res *res,
int bnxt_qplib_free_fast_reg_page_list(struct bnxt_qplib_res *res,
struct bnxt_qplib_frpl *frpl)
{
- bnxt_qplib_free_hwq(res->pdev, &frpl->hwq);
+ bnxt_qplib_free_hwq(res, &frpl->hwq);
return 0;
}
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 7d06b0f8d49a..e8e11bd95e42 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -707,7 +707,7 @@ struct mpa_message {
u8 flags;
u8 revision;
__be16 private_data_size;
- u8 private_data[0];
+ u8 private_data[];
};
struct mpa_v2_conn_params {
@@ -719,7 +719,7 @@ struct terminate_message {
u8 layer_etype;
u8 ecode;
__be16 hdrct_rsvd;
- u8 len_hdrs[0];
+ u8 len_hdrs[];
};
#define TERM_MAX_LENGTH (sizeof(struct terminate_message) + 2 + 18 + 28)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 89ac2f9ae6dd..ac48012c992f 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -2127,7 +2127,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
pr_debug("ib_pd %p\n", pd);
if (attrs->qp_type != IB_QPT_RC)
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
php = to_c4iw_pd(pd);
rhp = php->rhp;
diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
index cbdb300a4794..a2f5e29ef226 100644
--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -123,7 +123,7 @@ struct fw_ri_dsgl {
__be32 len0;
__be64 addr0;
#ifndef C99_NOT_SUPPORTED
- struct fw_ri_dsge_pair sge[0];
+ struct fw_ri_dsge_pair sge[];
#endif
};
@@ -139,7 +139,7 @@ struct fw_ri_isgl {
__be16 nsge;
__be32 r2;
#ifndef C99_NOT_SUPPORTED
- struct fw_ri_sge sge[0];
+ struct fw_ri_sge sge[];
#endif
};
@@ -149,7 +149,7 @@ struct fw_ri_immd {
__be16 r2;
__be32 immdlen;
#ifndef C99_NOT_SUPPORTED
- __u8 data[0];
+ __u8 data[];
#endif
};
@@ -321,7 +321,7 @@ struct fw_ri_res_wr {
__be32 len16_pkd;
__u64 cookie;
#ifndef C99_NOT_SUPPORTED
- struct fw_ri_res res[0];
+ struct fw_ri_res res[];
#endif
};
diff --git a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
index 74b787a90660..96b104ab5415 100644
--- a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
+++ b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_ADMIN_CMDS_H_
@@ -801,21 +801,16 @@ struct efa_admin_mmio_req_read_less_resp {
/* create_qp_cmd */
#define EFA_ADMIN_CREATE_QP_CMD_SQ_VIRT_MASK BIT(0)
-#define EFA_ADMIN_CREATE_QP_CMD_RQ_VIRT_SHIFT 1
#define EFA_ADMIN_CREATE_QP_CMD_RQ_VIRT_MASK BIT(1)
/* reg_mr_cmd */
#define EFA_ADMIN_REG_MR_CMD_PHYS_PAGE_SIZE_SHIFT_MASK GENMASK(4, 0)
-#define EFA_ADMIN_REG_MR_CMD_MEM_ADDR_PHY_MODE_EN_SHIFT 7
#define EFA_ADMIN_REG_MR_CMD_MEM_ADDR_PHY_MODE_EN_MASK BIT(7)
#define EFA_ADMIN_REG_MR_CMD_LOCAL_WRITE_ENABLE_MASK BIT(0)
-#define EFA_ADMIN_REG_MR_CMD_REMOTE_READ_ENABLE_SHIFT 2
#define EFA_ADMIN_REG_MR_CMD_REMOTE_READ_ENABLE_MASK BIT(2)
/* create_cq_cmd */
-#define EFA_ADMIN_CREATE_CQ_CMD_INTERRUPT_MODE_ENABLED_SHIFT 5
#define EFA_ADMIN_CREATE_CQ_CMD_INTERRUPT_MODE_ENABLED_MASK BIT(5)
-#define EFA_ADMIN_CREATE_CQ_CMD_VIRT_SHIFT 6
#define EFA_ADMIN_CREATE_CQ_CMD_VIRT_MASK BIT(6)
#define EFA_ADMIN_CREATE_CQ_CMD_CQ_ENTRY_SIZE_WORDS_MASK GENMASK(4, 0)
diff --git a/drivers/infiniband/hw/efa/efa_admin_defs.h b/drivers/infiniband/hw/efa/efa_admin_defs.h
index c8e0c8b905be..29d53ed63b3e 100644
--- a/drivers/infiniband/hw/efa/efa_admin_defs.h
+++ b/drivers/infiniband/hw/efa/efa_admin_defs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_ADMIN_H_
@@ -121,9 +121,7 @@ struct efa_admin_aenq_entry {
/* aq_common_desc */
#define EFA_ADMIN_AQ_COMMON_DESC_COMMAND_ID_MASK GENMASK(11, 0)
#define EFA_ADMIN_AQ_COMMON_DESC_PHASE_MASK BIT(0)
-#define EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_SHIFT 1
#define EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_MASK BIT(1)
-#define EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT_SHIFT 2
#define EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT_MASK BIT(2)
/* acq_common_desc */
diff --git a/drivers/infiniband/hw/efa/efa_com.c b/drivers/infiniband/hw/efa/efa_com.c
index 0778f4f7dccd..7fce69f5568f 100644
--- a/drivers/infiniband/hw/efa/efa_com.c
+++ b/drivers/infiniband/hw/efa/efa_com.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include "efa_com.h"
@@ -16,26 +16,13 @@
#define EFA_ASYNC_QUEUE_DEPTH 16
#define EFA_ADMIN_QUEUE_DEPTH 32
-#define MIN_EFA_VER\
- ((EFA_ADMIN_API_VERSION_MAJOR << EFA_REGS_VERSION_MAJOR_VERSION_SHIFT) | \
- (EFA_ADMIN_API_VERSION_MINOR & EFA_REGS_VERSION_MINOR_VERSION_MASK))
-
#define EFA_CTRL_MAJOR 0
#define EFA_CTRL_MINOR 0
#define EFA_CTRL_SUB_MINOR 1
-#define MIN_EFA_CTRL_VER \
- (((EFA_CTRL_MAJOR) << \
- (EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_SHIFT)) | \
- ((EFA_CTRL_MINOR) << \
- (EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_SHIFT)) | \
- (EFA_CTRL_SUB_MINOR))
-
#define EFA_DMA_ADDR_TO_UINT32_LOW(x) ((u32)((u64)(x)))
#define EFA_DMA_ADDR_TO_UINT32_HIGH(x) ((u32)(((u64)(x)) >> 32))
-#define EFA_REGS_ADMIN_INTR_MASK 1
-
enum efa_cmd_status {
EFA_CMD_SUBMITTED,
EFA_CMD_COMPLETED,
@@ -84,7 +71,7 @@ static u32 efa_com_reg_read32(struct efa_com_dev *edev, u16 offset)
struct efa_com_mmio_read *mmio_read = &edev->mmio_read;
struct efa_admin_mmio_req_read_less_resp *read_resp;
unsigned long exp_time;
- u32 mmio_read_reg;
+ u32 mmio_read_reg = 0;
u32 err;
read_resp = mmio_read->read_resp;
@@ -94,10 +81,9 @@ static u32 efa_com_reg_read32(struct efa_com_dev *edev, u16 offset)
/* trash DMA req_id to identify when hardware is done */
read_resp->req_id = mmio_read->seq_num + 0x9aL;
- mmio_read_reg = (offset << EFA_REGS_MMIO_REG_READ_REG_OFF_SHIFT) &
- EFA_REGS_MMIO_REG_READ_REG_OFF_MASK;
- mmio_read_reg |= mmio_read->seq_num &
- EFA_REGS_MMIO_REG_READ_REQ_ID_MASK;
+ EFA_SET(&mmio_read_reg, EFA_REGS_MMIO_REG_READ_REG_OFF, offset);
+ EFA_SET(&mmio_read_reg, EFA_REGS_MMIO_REG_READ_REQ_ID,
+ mmio_read->seq_num);
writel(mmio_read_reg, edev->reg_bar + EFA_REGS_MMIO_REG_READ_OFF);
@@ -137,9 +123,9 @@ static int efa_com_admin_init_sq(struct efa_com_dev *edev)
struct efa_com_admin_queue *aq = &edev->aq;
struct efa_com_admin_sq *sq = &aq->sq;
u16 size = aq->depth * sizeof(*sq->entries);
+ u32 aq_caps = 0;
u32 addr_high;
u32 addr_low;
- u32 aq_caps;
sq->entries =
dma_alloc_coherent(aq->dmadev, size, &sq->dma_addr, GFP_KERNEL);
@@ -160,10 +146,9 @@ static int efa_com_admin_init_sq(struct efa_com_dev *edev)
writel(addr_low, edev->reg_bar + EFA_REGS_AQ_BASE_LO_OFF);
writel(addr_high, edev->reg_bar + EFA_REGS_AQ_BASE_HI_OFF);
- aq_caps = aq->depth & EFA_REGS_AQ_CAPS_AQ_DEPTH_MASK;
- aq_caps |= (sizeof(struct efa_admin_aq_entry) <<
- EFA_REGS_AQ_CAPS_AQ_ENTRY_SIZE_SHIFT) &
- EFA_REGS_AQ_CAPS_AQ_ENTRY_SIZE_MASK;
+ EFA_SET(&aq_caps, EFA_REGS_AQ_CAPS_AQ_DEPTH, aq->depth);
+ EFA_SET(&aq_caps, EFA_REGS_AQ_CAPS_AQ_ENTRY_SIZE,
+ sizeof(struct efa_admin_aq_entry));
writel(aq_caps, edev->reg_bar + EFA_REGS_AQ_CAPS_OFF);
@@ -175,9 +160,9 @@ static int efa_com_admin_init_cq(struct efa_com_dev *edev)
struct efa_com_admin_queue *aq = &edev->aq;
struct efa_com_admin_cq *cq = &aq->cq;
u16 size = aq->depth * sizeof(*cq->entries);
+ u32 acq_caps = 0;
u32 addr_high;
u32 addr_low;
- u32 acq_caps;
cq->entries =
dma_alloc_coherent(aq->dmadev, size, &cq->dma_addr, GFP_KERNEL);
@@ -195,13 +180,11 @@ static int efa_com_admin_init_cq(struct efa_com_dev *edev)
writel(addr_low, edev->reg_bar + EFA_REGS_ACQ_BASE_LO_OFF);
writel(addr_high, edev->reg_bar + EFA_REGS_ACQ_BASE_HI_OFF);
- acq_caps = aq->depth & EFA_REGS_ACQ_CAPS_ACQ_DEPTH_MASK;
- acq_caps |= (sizeof(struct efa_admin_acq_entry) <<
- EFA_REGS_ACQ_CAPS_ACQ_ENTRY_SIZE_SHIFT) &
- EFA_REGS_ACQ_CAPS_ACQ_ENTRY_SIZE_MASK;
- acq_caps |= (aq->msix_vector_idx <<
- EFA_REGS_ACQ_CAPS_ACQ_MSIX_VECTOR_SHIFT) &
- EFA_REGS_ACQ_CAPS_ACQ_MSIX_VECTOR_MASK;
+ EFA_SET(&acq_caps, EFA_REGS_ACQ_CAPS_ACQ_DEPTH, aq->depth);
+ EFA_SET(&acq_caps, EFA_REGS_ACQ_CAPS_ACQ_ENTRY_SIZE,
+ sizeof(struct efa_admin_acq_entry));
+ EFA_SET(&acq_caps, EFA_REGS_ACQ_CAPS_ACQ_MSIX_VECTOR,
+ aq->msix_vector_idx);
writel(acq_caps, edev->reg_bar + EFA_REGS_ACQ_CAPS_OFF);
@@ -212,7 +195,8 @@ static int efa_com_admin_init_aenq(struct efa_com_dev *edev,
struct efa_aenq_handlers *aenq_handlers)
{
struct efa_com_aenq *aenq = &edev->aenq;
- u32 addr_low, addr_high, aenq_caps;
+ u32 addr_low, addr_high;
+ u32 aenq_caps = 0;
u16 size;
if (!aenq_handlers) {
@@ -237,13 +221,11 @@ static int efa_com_admin_init_aenq(struct efa_com_dev *edev,
writel(addr_low, edev->reg_bar + EFA_REGS_AENQ_BASE_LO_OFF);
writel(addr_high, edev->reg_bar + EFA_REGS_AENQ_BASE_HI_OFF);
- aenq_caps = aenq->depth & EFA_REGS_AENQ_CAPS_AENQ_DEPTH_MASK;
- aenq_caps |= (sizeof(struct efa_admin_aenq_entry) <<
- EFA_REGS_AENQ_CAPS_AENQ_ENTRY_SIZE_SHIFT) &
- EFA_REGS_AENQ_CAPS_AENQ_ENTRY_SIZE_MASK;
- aenq_caps |= (aenq->msix_vector_idx
- << EFA_REGS_AENQ_CAPS_AENQ_MSIX_VECTOR_SHIFT) &
- EFA_REGS_AENQ_CAPS_AENQ_MSIX_VECTOR_MASK;
+ EFA_SET(&aenq_caps, EFA_REGS_AENQ_CAPS_AENQ_DEPTH, aenq->depth);
+ EFA_SET(&aenq_caps, EFA_REGS_AENQ_CAPS_AENQ_ENTRY_SIZE,
+ sizeof(struct efa_admin_aenq_entry));
+ EFA_SET(&aenq_caps, EFA_REGS_AENQ_CAPS_AENQ_MSIX_VECTOR,
+ aenq->msix_vector_idx);
writel(aenq_caps, edev->reg_bar + EFA_REGS_AENQ_CAPS_OFF);
/*
@@ -280,8 +262,8 @@ static void efa_com_dealloc_ctx_id(struct efa_com_admin_queue *aq,
static inline void efa_com_put_comp_ctx(struct efa_com_admin_queue *aq,
struct efa_comp_ctx *comp_ctx)
{
- u16 cmd_id = comp_ctx->user_cqe->acq_common_descriptor.command &
- EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID_MASK;
+ u16 cmd_id = EFA_GET(&comp_ctx->user_cqe->acq_common_descriptor.command,
+ EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID);
u16 ctx_id = cmd_id & (aq->depth - 1);
ibdev_dbg(aq->efa_dev, "Put completion command_id %#x\n", cmd_id);
@@ -335,8 +317,8 @@ static struct efa_comp_ctx *__efa_com_submit_admin_cmd(struct efa_com_admin_queu
cmd_id &= EFA_ADMIN_AQ_COMMON_DESC_COMMAND_ID_MASK;
cmd->aq_common_descriptor.command_id = cmd_id;
- cmd->aq_common_descriptor.flags |= aq->sq.phase &
- EFA_ADMIN_AQ_COMMON_DESC_PHASE_MASK;
+ EFA_SET(&cmd->aq_common_descriptor.flags,
+ EFA_ADMIN_AQ_COMMON_DESC_PHASE, aq->sq.phase);
comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, true);
if (!comp_ctx) {
@@ -427,8 +409,8 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
struct efa_comp_ctx *comp_ctx;
u16 cmd_id;
- cmd_id = cqe->acq_common_descriptor.command &
- EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID_MASK;
+ cmd_id = EFA_GET(&cqe->acq_common_descriptor.command,
+ EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID);
comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, false);
if (!comp_ctx) {
@@ -705,7 +687,7 @@ void efa_com_set_admin_polling_mode(struct efa_com_dev *edev, bool polling)
u32 mask_value = 0;
if (polling)
- mask_value = EFA_REGS_ADMIN_INTR_MASK;
+ EFA_SET(&mask_value, EFA_REGS_INTR_MASK_EN, 1);
writel(mask_value, edev->reg_bar + EFA_REGS_INTR_MASK_OFF);
if (polling)
@@ -743,7 +725,7 @@ int efa_com_admin_init(struct efa_com_dev *edev,
int err;
dev_sts = efa_com_reg_read32(edev, EFA_REGS_DEV_STS_OFF);
- if (!(dev_sts & EFA_REGS_DEV_STS_READY_MASK)) {
+ if (!EFA_GET(&dev_sts, EFA_REGS_DEV_STS_READY)) {
ibdev_err(edev->efa_dev,
"Device isn't ready, abort com init %#x\n", dev_sts);
return -ENODEV;
@@ -778,8 +760,7 @@ int efa_com_admin_init(struct efa_com_dev *edev,
goto err_destroy_cq;
cap = efa_com_reg_read32(edev, EFA_REGS_CAPS_OFF);
- timeout = (cap & EFA_REGS_CAPS_ADMIN_CMD_TO_MASK) >>
- EFA_REGS_CAPS_ADMIN_CMD_TO_SHIFT;
+ timeout = EFA_GET(&cap, EFA_REGS_CAPS_ADMIN_CMD_TO);
if (timeout)
/* the resolution of timeout reg is 100ms */
aq->completion_timeout = timeout * 100000;
@@ -940,7 +921,9 @@ void efa_com_mmio_reg_read_destroy(struct efa_com_dev *edev)
int efa_com_validate_version(struct efa_com_dev *edev)
{
+ u32 min_ctrl_ver = 0;
u32 ctrl_ver_masked;
+ u32 min_ver = 0;
u32 ctrl_ver;
u32 ver;
@@ -953,33 +936,42 @@ int efa_com_validate_version(struct efa_com_dev *edev)
EFA_REGS_CONTROLLER_VERSION_OFF);
ibdev_dbg(edev->efa_dev, "efa device version: %d.%d\n",
- (ver & EFA_REGS_VERSION_MAJOR_VERSION_MASK) >>
- EFA_REGS_VERSION_MAJOR_VERSION_SHIFT,
- ver & EFA_REGS_VERSION_MINOR_VERSION_MASK);
+ EFA_GET(&ver, EFA_REGS_VERSION_MAJOR_VERSION),
+ EFA_GET(&ver, EFA_REGS_VERSION_MINOR_VERSION));
- if (ver < MIN_EFA_VER) {
+ EFA_SET(&min_ver, EFA_REGS_VERSION_MAJOR_VERSION,
+ EFA_ADMIN_API_VERSION_MAJOR);
+ EFA_SET(&min_ver, EFA_REGS_VERSION_MINOR_VERSION,
+ EFA_ADMIN_API_VERSION_MINOR);
+ if (ver < min_ver) {
ibdev_err(edev->efa_dev,
"EFA version is lower than the minimal version the driver supports\n");
return -EOPNOTSUPP;
}
- ibdev_dbg(edev->efa_dev,
- "efa controller version: %d.%d.%d implementation version %d\n",
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_MASK) >>
- EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_SHIFT,
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_MASK) >>
- EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_SHIFT,
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION_MASK),
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_IMPL_ID_MASK) >>
- EFA_REGS_CONTROLLER_VERSION_IMPL_ID_SHIFT);
+ ibdev_dbg(
+ edev->efa_dev,
+ "efa controller version: %d.%d.%d implementation version %d\n",
+ EFA_GET(&ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION),
+ EFA_GET(&ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION),
+ EFA_GET(&ctrl_ver,
+ EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION),
+ EFA_GET(&ctrl_ver, EFA_REGS_CONTROLLER_VERSION_IMPL_ID));
ctrl_ver_masked =
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_MASK) |
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_MASK) |
- (ctrl_ver & EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION_MASK);
+ EFA_GET(&ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION) |
+ EFA_GET(&ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION) |
+ EFA_GET(&ctrl_ver,
+ EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION);
+ EFA_SET(&min_ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION,
+ EFA_CTRL_MAJOR);
+ EFA_SET(&min_ctrl_ver, EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION,
+ EFA_CTRL_MINOR);
+ EFA_SET(&min_ctrl_ver, EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION,
+ EFA_CTRL_SUB_MINOR);
/* Validate the ctrl version without the implementation ID */
- if (ctrl_ver_masked < MIN_EFA_CTRL_VER) {
+ if (ctrl_ver_masked < min_ctrl_ver) {
ibdev_err(edev->efa_dev,
"EFA ctrl version is lower than the minimal ctrl version the driver supports\n");
return -EOPNOTSUPP;
@@ -1002,8 +994,7 @@ int efa_com_get_dma_width(struct efa_com_dev *edev)
u32 caps = efa_com_reg_read32(edev, EFA_REGS_CAPS_OFF);
int width;
- width = (caps & EFA_REGS_CAPS_DMA_ADDR_WIDTH_MASK) >>
- EFA_REGS_CAPS_DMA_ADDR_WIDTH_SHIFT;
+ width = EFA_GET(&caps, EFA_REGS_CAPS_DMA_ADDR_WIDTH);
ibdev_dbg(edev->efa_dev, "DMA width: %d\n", width);
@@ -1017,16 +1008,14 @@ int efa_com_get_dma_width(struct efa_com_dev *edev)
return width;
}
-static int wait_for_reset_state(struct efa_com_dev *edev, u32 timeout,
- u16 exp_state)
+static int wait_for_reset_state(struct efa_com_dev *edev, u32 timeout, int on)
{
u32 val, i;
for (i = 0; i < timeout; i++) {
val = efa_com_reg_read32(edev, EFA_REGS_DEV_STS_OFF);
- if ((val & EFA_REGS_DEV_STS_RESET_IN_PROGRESS_MASK) ==
- exp_state)
+ if (EFA_GET(&val, EFA_REGS_DEV_STS_RESET_IN_PROGRESS) == on)
return 0;
ibdev_dbg(edev->efa_dev, "Reset indication val %d\n", val);
@@ -1046,36 +1035,34 @@ static int wait_for_reset_state(struct efa_com_dev *edev, u32 timeout,
int efa_com_dev_reset(struct efa_com_dev *edev,
enum efa_regs_reset_reason_types reset_reason)
{
- u32 stat, timeout, cap, reset_val;
+ u32 stat, timeout, cap;
+ u32 reset_val = 0;
int err;
stat = efa_com_reg_read32(edev, EFA_REGS_DEV_STS_OFF);
cap = efa_com_reg_read32(edev, EFA_REGS_CAPS_OFF);
- if (!(stat & EFA_REGS_DEV_STS_READY_MASK)) {
+ if (!EFA_GET(&stat, EFA_REGS_DEV_STS_READY)) {
ibdev_err(edev->efa_dev,
"Device isn't ready, can't reset device\n");
return -EINVAL;
}
- timeout = (cap & EFA_REGS_CAPS_RESET_TIMEOUT_MASK) >>
- EFA_REGS_CAPS_RESET_TIMEOUT_SHIFT;
+ timeout = EFA_GET(&cap, EFA_REGS_CAPS_RESET_TIMEOUT);
if (!timeout) {
ibdev_err(edev->efa_dev, "Invalid timeout value\n");
return -EINVAL;
}
/* start reset */
- reset_val = EFA_REGS_DEV_CTL_DEV_RESET_MASK;
- reset_val |= (reset_reason << EFA_REGS_DEV_CTL_RESET_REASON_SHIFT) &
- EFA_REGS_DEV_CTL_RESET_REASON_MASK;
+ EFA_SET(&reset_val, EFA_REGS_DEV_CTL_DEV_RESET, 1);
+ EFA_SET(&reset_val, EFA_REGS_DEV_CTL_RESET_REASON, reset_reason);
writel(reset_val, edev->reg_bar + EFA_REGS_DEV_CTL_OFF);
/* reset clears the mmio readless address, restore it */
efa_com_mmio_reg_read_resp_addr_init(edev);
- err = wait_for_reset_state(edev, timeout,
- EFA_REGS_DEV_STS_RESET_IN_PROGRESS_MASK);
+ err = wait_for_reset_state(edev, timeout, 1);
if (err) {
ibdev_err(edev->efa_dev, "Reset indication didn't turn on\n");
return err;
@@ -1089,8 +1076,7 @@ int efa_com_dev_reset(struct efa_com_dev *edev,
return err;
}
- timeout = (cap & EFA_REGS_CAPS_ADMIN_CMD_TO_MASK) >>
- EFA_REGS_CAPS_ADMIN_CMD_TO_SHIFT;
+ timeout = EFA_GET(&cap, EFA_REGS_CAPS_ADMIN_CMD_TO);
if (timeout)
/* the resolution of timeout reg is 100ms */
edev->aq.completion_timeout = timeout * 100000;
diff --git a/drivers/infiniband/hw/efa/efa_com_cmd.c b/drivers/infiniband/hw/efa/efa_com_cmd.c
index e20bd84a1014..eea5574a62e8 100644
--- a/drivers/infiniband/hw/efa/efa_com_cmd.c
+++ b/drivers/infiniband/hw/efa/efa_com_cmd.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include "efa_com.h"
@@ -161,8 +161,9 @@ int efa_com_create_cq(struct efa_com_dev *edev,
int err;
create_cmd.aq_common_desc.opcode = EFA_ADMIN_CREATE_CQ;
- create_cmd.cq_caps_2 = (params->entry_size_in_bytes / 4) &
- EFA_ADMIN_CREATE_CQ_CMD_CQ_ENTRY_SIZE_WORDS_MASK;
+ EFA_SET(&create_cmd.cq_caps_2,
+ EFA_ADMIN_CREATE_CQ_CMD_CQ_ENTRY_SIZE_WORDS,
+ params->entry_size_in_bytes / 4);
create_cmd.cq_depth = params->cq_depth;
create_cmd.num_sub_cqs = params->num_sub_cqs;
create_cmd.uar = params->uarn;
@@ -227,8 +228,8 @@ int efa_com_register_mr(struct efa_com_dev *edev,
mr_cmd.aq_common_desc.opcode = EFA_ADMIN_REG_MR;
mr_cmd.pd = params->pd;
mr_cmd.mr_length = params->mr_length_in_bytes;
- mr_cmd.flags |= params->page_shift &
- EFA_ADMIN_REG_MR_CMD_PHYS_PAGE_SIZE_SHIFT_MASK;
+ EFA_SET(&mr_cmd.flags, EFA_ADMIN_REG_MR_CMD_PHYS_PAGE_SIZE_SHIFT,
+ params->page_shift);
mr_cmd.iova = params->iova;
mr_cmd.permissions = params->permissions;
@@ -242,11 +243,11 @@ int efa_com_register_mr(struct efa_com_dev *edev,
params->pbl.pbl.address.mem_addr_low;
mr_cmd.pbl.pbl.address.mem_addr_high =
params->pbl.pbl.address.mem_addr_high;
- mr_cmd.aq_common_desc.flags |=
- EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_MASK;
+ EFA_SET(&mr_cmd.aq_common_desc.flags,
+ EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA, 1);
if (params->indirect)
- mr_cmd.aq_common_desc.flags |=
- EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT_MASK;
+ EFA_SET(&mr_cmd.aq_common_desc.flags,
+ EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT, 1);
}
err = efa_com_cmd_exec(aq,
@@ -386,9 +387,8 @@ static int efa_com_get_feature_ex(struct efa_com_dev *edev,
get_cmd.aq_common_descriptor.opcode = EFA_ADMIN_GET_FEATURE;
if (control_buff_size)
- get_cmd.aq_common_descriptor.flags =
- EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT_MASK;
-
+ EFA_SET(&get_cmd.aq_common_descriptor.flags,
+ EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT, 1);
efa_com_set_dma_addr(control_buf_dma_addr,
&get_cmd.control_buffer.address.mem_addr_high,
@@ -538,8 +538,9 @@ static int efa_com_set_feature_ex(struct efa_com_dev *edev,
set_cmd->aq_common_descriptor.opcode = EFA_ADMIN_SET_FEATURE;
if (control_buff_size) {
- set_cmd->aq_common_descriptor.flags =
- EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT_MASK;
+ set_cmd->aq_common_descriptor.flags = 0;
+ EFA_SET(&set_cmd->aq_common_descriptor.flags,
+ EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT, 1);
efa_com_set_dma_addr(control_buf_dma_addr,
&set_cmd->control_buffer.address.mem_addr_high,
&set_cmd->control_buffer.address.mem_addr_low);
diff --git a/drivers/infiniband/hw/efa/efa_common_defs.h b/drivers/infiniband/hw/efa/efa_common_defs.h
index c559ec08898e..90af1c82c9c6 100644
--- a/drivers/infiniband/hw/efa/efa_common_defs.h
+++ b/drivers/infiniband/hw/efa/efa_common_defs.h
@@ -1,14 +1,25 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_COMMON_H_
#define _EFA_COMMON_H_
+#include
+
#define EFA_COMMON_SPEC_VERSION_MAJOR 2
#define EFA_COMMON_SPEC_VERSION_MINOR 0
+#define EFA_GET(ptr, mask) FIELD_GET(mask##_MASK, *(ptr))
+
+#define EFA_SET(ptr, mask, value) \
+ ({ \
+ typeof(ptr) _ptr = ptr; \
+ *_ptr = (*_ptr & ~(mask##_MASK)) | \
+ FIELD_PREP(mask##_MASK, value); \
+ })
+
struct efa_common_mem_addr {
u32 mem_addr_low;
diff --git a/drivers/infiniband/hw/efa/efa_regs_defs.h b/drivers/infiniband/hw/efa/efa_regs_defs.h
index bb9cad3d6a15..4017982fe13b 100644
--- a/drivers/infiniband/hw/efa/efa_regs_defs.h
+++ b/drivers/infiniband/hw/efa/efa_regs_defs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_REGS_H_
@@ -45,69 +45,52 @@ enum efa_regs_reset_reason_types {
/* version register */
#define EFA_REGS_VERSION_MINOR_VERSION_MASK 0xff
-#define EFA_REGS_VERSION_MAJOR_VERSION_SHIFT 8
#define EFA_REGS_VERSION_MAJOR_VERSION_MASK 0xff00
/* controller_version register */
#define EFA_REGS_CONTROLLER_VERSION_SUBMINOR_VERSION_MASK 0xff
-#define EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_SHIFT 8
#define EFA_REGS_CONTROLLER_VERSION_MINOR_VERSION_MASK 0xff00
-#define EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_SHIFT 16
#define EFA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_MASK 0xff0000
-#define EFA_REGS_CONTROLLER_VERSION_IMPL_ID_SHIFT 24
#define EFA_REGS_CONTROLLER_VERSION_IMPL_ID_MASK 0xff000000
/* caps register */
#define EFA_REGS_CAPS_CONTIGUOUS_QUEUE_REQUIRED_MASK 0x1
-#define EFA_REGS_CAPS_RESET_TIMEOUT_SHIFT 1
#define EFA_REGS_CAPS_RESET_TIMEOUT_MASK 0x3e
-#define EFA_REGS_CAPS_DMA_ADDR_WIDTH_SHIFT 8
#define EFA_REGS_CAPS_DMA_ADDR_WIDTH_MASK 0xff00
-#define EFA_REGS_CAPS_ADMIN_CMD_TO_SHIFT 16
#define EFA_REGS_CAPS_ADMIN_CMD_TO_MASK 0xf0000
/* aq_caps register */
#define EFA_REGS_AQ_CAPS_AQ_DEPTH_MASK 0xffff
-#define EFA_REGS_AQ_CAPS_AQ_ENTRY_SIZE_SHIFT 16
#define EFA_REGS_AQ_CAPS_AQ_ENTRY_SIZE_MASK 0xffff0000
/* acq_caps register */
#define EFA_REGS_ACQ_CAPS_ACQ_DEPTH_MASK 0xffff
-#define EFA_REGS_ACQ_CAPS_ACQ_ENTRY_SIZE_SHIFT 16
#define EFA_REGS_ACQ_CAPS_ACQ_ENTRY_SIZE_MASK 0xff0000
-#define EFA_REGS_ACQ_CAPS_ACQ_MSIX_VECTOR_SHIFT 24
#define EFA_REGS_ACQ_CAPS_ACQ_MSIX_VECTOR_MASK 0xff000000
/* aenq_caps register */
#define EFA_REGS_AENQ_CAPS_AENQ_DEPTH_MASK 0xffff
-#define EFA_REGS_AENQ_CAPS_AENQ_ENTRY_SIZE_SHIFT 16
#define EFA_REGS_AENQ_CAPS_AENQ_ENTRY_SIZE_MASK 0xff0000
-#define EFA_REGS_AENQ_CAPS_AENQ_MSIX_VECTOR_SHIFT 24
#define EFA_REGS_AENQ_CAPS_AENQ_MSIX_VECTOR_MASK 0xff000000
+/* intr_mask register */
+#define EFA_REGS_INTR_MASK_EN_MASK 0x1
+
/* dev_ctl register */
#define EFA_REGS_DEV_CTL_DEV_RESET_MASK 0x1
-#define EFA_REGS_DEV_CTL_AQ_RESTART_SHIFT 1
#define EFA_REGS_DEV_CTL_AQ_RESTART_MASK 0x2
-#define EFA_REGS_DEV_CTL_RESET_REASON_SHIFT 28
#define EFA_REGS_DEV_CTL_RESET_REASON_MASK 0xf0000000
/* dev_sts register */
#define EFA_REGS_DEV_STS_READY_MASK 0x1
-#define EFA_REGS_DEV_STS_AQ_RESTART_IN_PROGRESS_SHIFT 1
#define EFA_REGS_DEV_STS_AQ_RESTART_IN_PROGRESS_MASK 0x2
-#define EFA_REGS_DEV_STS_AQ_RESTART_FINISHED_SHIFT 2
#define EFA_REGS_DEV_STS_AQ_RESTART_FINISHED_MASK 0x4
-#define EFA_REGS_DEV_STS_RESET_IN_PROGRESS_SHIFT 3
#define EFA_REGS_DEV_STS_RESET_IN_PROGRESS_MASK 0x8
-#define EFA_REGS_DEV_STS_RESET_FINISHED_SHIFT 4
#define EFA_REGS_DEV_STS_RESET_FINISHED_MASK 0x10
-#define EFA_REGS_DEV_STS_FATAL_ERROR_SHIFT 5
#define EFA_REGS_DEV_STS_FATAL_ERROR_MASK 0x20
/* mmio_reg_read register */
#define EFA_REGS_MMIO_REG_READ_REQ_ID_MASK 0xffff
-#define EFA_REGS_MMIO_REG_READ_REG_OFF_SHIFT 16
#define EFA_REGS_MMIO_REG_READ_REG_OFF_MASK 0xffff0000
#endif /* _EFA_REGS_H_ */
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index ec5545870554..5c57098a4aee 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
- * Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include
@@ -144,9 +144,6 @@ static inline bool is_rdma_read_cap(struct efa_dev *dev)
return dev->dev_attr.device_caps & EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_RDMA_READ_MASK;
}
-#define field_avail(x, fld, sz) (offsetof(typeof(x), fld) + \
- sizeof_field(typeof(x), fld) <= (sz))
-
#define is_reserved_cleared(reserved) \
!memchr_inv(reserved, 0, sizeof(reserved))
@@ -169,6 +166,14 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
return addr;
}
+static void efa_free_mapped(struct efa_dev *dev, void *cpu_addr,
+ dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir)
+{
+ dma_unmap_single(&dev->pdev->dev, dma_addr, size, dir);
+ free_pages_exact(cpu_addr, size);
+}
+
int efa_query_device(struct ib_device *ibdev,
struct ib_device_attr *props,
struct ib_udata *udata)
@@ -402,6 +407,9 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
int err;
ibdev_dbg(&dev->ibdev, "Destroy qp[%u]\n", ibqp->qp_num);
+
+ efa_qp_user_mmap_entries_remove(qp);
+
err = efa_destroy_qp_handle(dev, qp->qp_handle);
if (err)
return err;
@@ -411,11 +419,10 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
"qp->cpu_addr[0x%p] freed: size[%lu], dma[%pad]\n",
qp->rq_cpu_addr, qp->rq_size,
&qp->rq_dma_addr);
- dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
- DMA_TO_DEVICE);
+ efa_free_mapped(dev, qp->rq_cpu_addr, qp->rq_dma_addr,
+ qp->rq_size, DMA_TO_DEVICE);
}
- efa_qp_user_mmap_entries_remove(qp);
kfree(qp);
return 0;
}
@@ -599,7 +606,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
if (err)
goto err_out;
- if (!field_avail(cmd, driver_qp_type, udata->inlen)) {
+ if (offsetofend(typeof(cmd), driver_qp_type) > udata->inlen) {
ibdev_dbg(&dev->ibdev,
"Incompatible ABI params, no input udata\n");
err = -EINVAL;
@@ -720,13 +727,9 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
err_destroy_qp:
efa_destroy_qp_handle(dev, create_qp_resp.qp_handle);
err_free_mapped:
- if (qp->rq_size) {
- dma_unmap_single(&dev->pdev->dev, qp->rq_dma_addr, qp->rq_size,
- DMA_TO_DEVICE);
-
- if (!qp->rq_mmap_entry)
- free_pages_exact(qp->rq_cpu_addr, qp->rq_size);
- }
+ if (qp->rq_size)
+ efa_free_mapped(dev, qp->rq_cpu_addr, qp->rq_dma_addr,
+ qp->rq_size, DMA_TO_DEVICE);
err_free_qp:
kfree(qp);
err_out:
@@ -845,10 +848,10 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
"Destroy cq[%d] virt[0x%p] freed: size[%lu], dma[%pad]\n",
cq->cq_idx, cq->cpu_addr, cq->size, &cq->dma_addr);
- efa_destroy_cq_idx(dev, cq->cq_idx);
- dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
- DMA_FROM_DEVICE);
rdma_user_mmap_entry_remove(cq->mmap_entry);
+ efa_destroy_cq_idx(dev, cq->cq_idx);
+ efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size,
+ DMA_FROM_DEVICE);
}
static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
@@ -890,7 +893,7 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
goto err_out;
}
- if (!field_avail(cmd, num_sub_cqs, udata->inlen)) {
+ if (offsetofend(typeof(cmd), num_sub_cqs) > udata->inlen) {
ibdev_dbg(ibdev,
"Incompatible ABI params, no input udata\n");
err = -EINVAL;
@@ -985,10 +988,8 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
err_destroy_cq:
efa_destroy_cq_idx(dev, cq->cq_idx);
err_free_mapped:
- dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size,
- DMA_FROM_DEVICE);
- if (!cq->mmap_entry)
- free_pages_exact(cq->cpu_addr, cq->size);
+ efa_free_mapped(dev, cq->cpu_addr, cq->dma_addr, cq->size,
+ DMA_FROM_DEVICE);
err_out:
atomic64_inc(&dev->stats.sw_stats.create_cq_err);
@@ -1550,10 +1551,6 @@ void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
{
struct efa_user_mmap_entry *entry = to_emmap(rdma_entry);
- /* DMA mapping is already gone, now free the pages */
- if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
- free_pages_exact(phys_to_virt(entry->address),
- entry->rdma_entry.npages * PAGE_SIZE);
kfree(entry);
}
diff --git a/drivers/infiniband/hw/hfi1/fault.c b/drivers/infiniband/hw/hfi1/fault.c
index 986c12153e62..0dfbcfb048ca 100644
--- a/drivers/infiniband/hw/hfi1/fault.c
+++ b/drivers/infiniband/hw/hfi1/fault.c
@@ -222,11 +222,11 @@ static ssize_t fault_opcodes_read(struct file *file, char __user *buf,
while (bit < bitsize) {
zero = find_next_zero_bit(fault->opcodes, bitsize, bit);
if (zero - 1 != bit)
- size += snprintf(data + size,
+ size += scnprintf(data + size,
datalen - size - 1,
"0x%lx-0x%lx,", bit, zero - 1);
else
- size += snprintf(data + size,
+ size += scnprintf(data + size,
datalen - size - 1, "0x%lx,",
bit);
bit = find_next_bit(fault->opcodes, bitsize, zero);
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 259115886d35..e7fdd70c6e78 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -209,7 +209,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
fd->mm = current->mm;
mmgrab(fd->mm);
fd->dd = dd;
- kobject_get(&fd->dd->kobj);
fp->private_data = fd;
return 0;
nomem:
@@ -713,7 +712,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
mmdrop(fdata->mm);
- kobject_put(&dd->kobj);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
@@ -1696,7 +1694,7 @@ static int user_add(struct hfi1_devdata *dd)
snprintf(name, sizeof(name), "%s_%d", class_name(), dd->unit);
ret = hfi1_cdev_init(dd->unit, name, &hfi1_file_ops,
&dd->user_cdev, &dd->user_device,
- true, &dd->kobj);
+ true, &dd->verbs_dev.rdi.ibdev.dev.kobj);
if (ret)
user_remove(dd);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index cae12f416ca0..b06c2594105a 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1413,8 +1413,6 @@ struct hfi1_devdata {
bool aspm_enabled; /* ASPM state: enabled/disabled */
struct rhashtable *sdma_rht;
- struct kobject kobj;
-
/* vnic data */
struct hfi1_vnic_data vnic;
/* Lock to protect IRQ SRC register access */
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index e3acda7a0800..3759d9233a1c 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1198,13 +1198,13 @@ static void finalize_asic_data(struct hfi1_devdata *dd,
}
/**
- * hfi1_clean_devdata - cleans up per-unit data structure
+ * hfi1_free_devdata - cleans up and frees per-unit data structure
* @dd: pointer to a valid devdata structure
*
- * It cleans up all data structures set up by
+ * It cleans up and frees all data structures set up by
* by hfi1_alloc_devdata().
*/
-static void hfi1_clean_devdata(struct hfi1_devdata *dd)
+void hfi1_free_devdata(struct hfi1_devdata *dd)
{
struct hfi1_asic_data *ad;
unsigned long flags;
@@ -1231,23 +1231,6 @@ static void hfi1_clean_devdata(struct hfi1_devdata *dd)
rvt_dealloc_device(&dd->verbs_dev.rdi);
}
-static void __hfi1_free_devdata(struct kobject *kobj)
-{
- struct hfi1_devdata *dd =
- container_of(kobj, struct hfi1_devdata, kobj);
-
- hfi1_clean_devdata(dd);
-}
-
-static struct kobj_type hfi1_devdata_type = {
- .release = __hfi1_free_devdata,
-};
-
-void hfi1_free_devdata(struct hfi1_devdata *dd)
-{
- kobject_put(&dd->kobj);
-}
-
/**
* hfi1_alloc_devdata - Allocate our primary per-unit data structure.
* @pdev: Valid PCI device
@@ -1333,11 +1316,10 @@ static struct hfi1_devdata *hfi1_alloc_devdata(struct pci_dev *pdev,
goto bail;
}
- kobject_init(&dd->kobj, &hfi1_devdata_type);
return dd;
bail:
- hfi1_clean_devdata(dd);
+ hfi1_free_devdata(dd);
return ERR_PTR(ret);
}
diff --git a/drivers/infiniband/hw/hfi1/mad.c b/drivers/infiniband/hw/hfi1/mad.c
index a51bcd2b4391..7073f237a949 100644
--- a/drivers/infiniband/hw/hfi1/mad.c
+++ b/drivers/infiniband/hw/hfi1/mad.c
@@ -2381,7 +2381,7 @@ struct opa_port_status_rsp {
__be64 port_vl_rcv_bubble;
__be64 port_vl_mark_fecn;
__be64 port_vl_xmit_discards;
- } vls[0]; /* real array size defined by # bits set in vl_select_mask */
+ } vls[]; /* real array size defined by # bits set in vl_select_mask */
};
enum counter_selects {
@@ -2423,7 +2423,7 @@ struct opa_aggregate {
__be16 attr_id;
__be16 err_reqlength; /* 1 bit, 8 res, 7 bit */
__be32 attr_mod;
- u8 data[0];
+ u8 data[];
};
#define MSK_LLI 0x000000f0
diff --git a/drivers/infiniband/hw/hfi1/mad.h b/drivers/infiniband/hw/hfi1/mad.h
index 2f48e6953629..889e63d3f2cc 100644
--- a/drivers/infiniband/hw/hfi1/mad.h
+++ b/drivers/infiniband/hw/hfi1/mad.h
@@ -165,7 +165,7 @@ struct opa_mad_notice_attr {
} __packed ntc_2048;
};
- u8 class_data[0];
+ u8 class_data[];
};
#define IB_VLARB_LOWPRI_0_31 1
diff --git a/drivers/infiniband/hw/hfi1/pio.h b/drivers/infiniband/hw/hfi1/pio.h
index c9a58b642bdd..0102262343c0 100644
--- a/drivers/infiniband/hw/hfi1/pio.h
+++ b/drivers/infiniband/hw/hfi1/pio.h
@@ -243,7 +243,7 @@ struct sc_config_sizes {
*/
struct pio_map_elem {
u32 mask;
- struct send_context *ksc[0];
+ struct send_context *ksc[];
};
/*
@@ -263,7 +263,7 @@ struct pio_vl_map {
u32 mask;
u8 actual_vls;
u8 vls;
- struct pio_map_elem *map[0];
+ struct pio_map_elem *map[];
};
int pio_map_init(struct hfi1_devdata *dd, u8 port, u8 num_vls,
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index a51525647ac8..c93ea021cf49 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -833,7 +833,7 @@ struct sdma_engine *sdma_select_engine_sc(
struct sdma_rht_map_elem {
u32 mask;
u8 ctr;
- struct sdma_engine *sde[0];
+ struct sdma_engine *sde[];
};
struct sdma_rht_node {
diff --git a/drivers/infiniband/hw/hfi1/sdma.h b/drivers/infiniband/hw/hfi1/sdma.h
index 1e2e40f79cb2..7a851191f987 100644
--- a/drivers/infiniband/hw/hfi1/sdma.h
+++ b/drivers/infiniband/hw/hfi1/sdma.h
@@ -1002,7 +1002,7 @@ void sdma_engine_interrupt(struct sdma_engine *sde, u64 status);
*/
struct sdma_map_elem {
u32 mask;
- struct sdma_engine *sde[0];
+ struct sdma_engine *sde[];
};
/**
@@ -1024,7 +1024,7 @@ struct sdma_vl_map {
u32 mask;
u8 actual_vls;
u8 vls;
- struct sdma_map_elem *map[0];
+ struct sdma_map_elem *map[];
};
int sdma_map_init(
diff --git a/drivers/infiniband/hw/hfi1/sysfs.c b/drivers/infiniband/hw/hfi1/sysfs.c
index 90f62c4bddba..074ec71772d2 100644
--- a/drivers/infiniband/hw/hfi1/sysfs.c
+++ b/drivers/infiniband/hw/hfi1/sysfs.c
@@ -674,7 +674,11 @@ int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
dd_dev_err(dd,
"Skipping sc2vl sysfs info, (err %d) port %u\n",
ret, port_num);
- goto bail;
+ /*
+ * Based on the documentation for kobject_init_and_add(), the
+ * caller should call kobject_put even if this call fails.
+ */
+ goto bail_sc2vl;
}
kobject_uevent(&ppd->sc2vl_kobj, KOBJ_ADD);
@@ -684,7 +688,7 @@ int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
dd_dev_err(dd,
"Skipping sl2sc sysfs info, (err %d) port %u\n",
ret, port_num);
- goto bail_sc2vl;
+ goto bail_sl2sc;
}
kobject_uevent(&ppd->sl2sc_kobj, KOBJ_ADD);
@@ -694,7 +698,7 @@ int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
dd_dev_err(dd,
"Skipping vl2mtu sysfs info, (err %d) port %u\n",
ret, port_num);
- goto bail_sl2sc;
+ goto bail_vl2mtu;
}
kobject_uevent(&ppd->vl2mtu_kobj, KOBJ_ADD);
@@ -704,7 +708,7 @@ int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
dd_dev_err(dd,
"Skipping Congestion Control sysfs info, (err %d) port %u\n",
ret, port_num);
- goto bail_vl2mtu;
+ goto bail_cc;
}
kobject_uevent(&ppd->pport_cc_kobj, KOBJ_ADD);
@@ -742,7 +746,6 @@ int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
kobject_put(&ppd->sl2sc_kobj);
bail_sc2vl:
kobject_put(&ppd->sc2vl_kobj);
-bail:
return ret;
}
@@ -853,8 +856,13 @@ int hfi1_verbs_register_sysfs(struct hfi1_devdata *dd)
return 0;
bail:
- for (i = 0; i < dd->num_sdma; i++)
- kobject_del(&dd->per_sdma[i].kobj);
+ /*
+ * The function kobject_put() will call kobject_del() if the kobject
+ * has been added successfully. The sysfs files created under the
+ * kobject directory will also be removed during the process.
+ */
+ for (; i >= 0; i--)
+ kobject_put(&dd->per_sdma[i].kobj);
return ret;
}
@@ -867,6 +875,10 @@ void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd)
struct hfi1_pportdata *ppd;
int i;
+ /* Unwind operations in hfi1_verbs_register_sysfs() */
+ for (i = 0; i < dd->num_sdma; i++)
+ kobject_put(&dd->per_sdma[i].kobj);
+
for (i = 0; i < dd->num_pports; i++) {
ppd = &dd->pport[i];
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 6257eee083a1..332abb446861 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -73,7 +73,7 @@ struct tid_rb_node {
dma_addr_t dma_addr;
bool freed;
unsigned int npages;
- struct page *pages[0];
+ struct page *pages[];
};
static inline int num_user_pages(unsigned long addr,
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 5ffe4c996ed3..5bfb52ffd590 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -257,8 +257,8 @@ static int create_user_cq(struct hns_roce_dev *hr_dev,
return ret;
}
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
- (udata->outlen >= sizeof(*resp))) {
+ if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB &&
+ udata->outlen >= offsetofend(typeof(*resp), cap_flags)) {
ret = hns_roce_db_map_user(context, udata, ucmd.db_addr,
&hr_cq->db);
if (ret) {
@@ -321,8 +321,8 @@ static void destroy_user_cq(struct hns_roce_dev *hr_dev,
struct hns_roce_ucontext *context = rdma_udata_to_drv_context(
udata, struct hns_roce_ucontext, ibucontext);
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
- (udata->outlen >= sizeof(*resp)))
+ if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB &&
+ udata->outlen >= offsetofend(typeof(*resp), cap_flags))
hns_roce_db_unmap_user(context, &hr_cq->db);
hns_roce_mtt_cleanup(hr_dev, &hr_cq->mtt);
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index a7c4ff975c28..f6b3cf6b95d6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -641,6 +641,19 @@ struct hns_roce_rinl_buf {
u32 wqe_cnt;
};
+enum {
+ HNS_ROCE_FLUSH_FLAG = 0,
+};
+
+struct hns_roce_work {
+ struct hns_roce_dev *hr_dev;
+ struct work_struct work;
+ u32 qpn;
+ u32 cqn;
+ int event_type;
+ int sub_type;
+};
+
struct hns_roce_qp {
struct ib_qp ibqp;
struct hns_roce_buf hr_buf;
@@ -656,11 +669,6 @@ struct hns_roce_qp {
struct ib_umem *umem;
struct hns_roce_mtt mtt;
struct hns_roce_mtr mtr;
-
- /* this define must less than HNS_ROCE_MAX_BT_REGION */
-#define HNS_ROCE_WQE_REGION_MAX 3
- struct hns_roce_buf_region regions[HNS_ROCE_WQE_REGION_MAX];
- int region_cnt;
int wqe_bt_pg_shift;
u32 buff_size;
@@ -684,6 +692,9 @@ struct hns_roce_qp {
struct hns_roce_sge sge;
u32 next_sge;
+ /* 0: flush needed, 1: unneeded */
+ unsigned long flush_flag;
+ struct hns_roce_work flush_work;
struct hns_roce_rinl_buf rq_inl_buf;
struct list_head node; /* all qps are on a list */
struct list_head rq_node; /* all recv qps are on a list */
@@ -762,14 +773,8 @@ struct hns_roce_eq {
int eqe_ba_pg_sz;
int eqe_buf_pg_sz;
int hop_num;
- u64 *bt_l0; /* Base address table for L0 */
- u64 **bt_l1; /* Base address table for L1 */
- u64 **buf;
- dma_addr_t l0_dma;
- dma_addr_t *l1_dma;
- dma_addr_t *buf_dma;
- u32 l0_last_num; /* L0 last chunk num */
- u32 l1_last_num; /* L1 last chunk num */
+ struct hns_roce_mtr mtr;
+ struct hns_roce_buf buf;
int eq_max_cnt;
int eq_period;
int shift;
@@ -881,7 +886,7 @@ struct hns_roce_caps {
u32 cqc_timer_ba_pg_sz;
u32 cqc_timer_buf_pg_sz;
u32 cqc_timer_hop_num;
- u32 cqe_ba_pg_sz;
+ u32 cqe_ba_pg_sz; /* page_size = 4K*(2^cqe_ba_pg_sz) */
u32 cqe_buf_pg_sz;
u32 cqe_hop_num;
u32 srqwqe_ba_pg_sz;
@@ -906,15 +911,6 @@ struct hns_roce_caps {
u16 default_ceq_arm_st;
};
-struct hns_roce_work {
- struct hns_roce_dev *hr_dev;
- struct work_struct work;
- u32 qpn;
- u32 cqn;
- int event_type;
- int sub_type;
-};
-
struct hns_roce_dfx_hw {
int (*query_cqc_info)(struct hns_roce_dev *hr_dev, u32 cqn,
int *buffer);
@@ -1237,9 +1233,10 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *ib_pd,
struct ib_udata *udata);
int hns_roce_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
int attr_mask, struct ib_udata *udata);
-void *get_recv_wqe(struct hns_roce_qp *hr_qp, int n);
-void *get_send_wqe(struct hns_roce_qp *hr_qp, int n);
-void *get_send_extend_sge(struct hns_roce_qp *hr_qp, int n);
+void init_flush_work(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
+void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, int n);
+void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, int n);
+void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, int n);
bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, int nreq,
struct ib_cq *ib_cq);
enum hns_roce_qp_state to_hns_roce_state(enum ib_qp_state state);
@@ -1248,9 +1245,8 @@ void hns_roce_lock_cqs(struct hns_roce_cq *send_cq,
void hns_roce_unlock_cqs(struct hns_roce_cq *send_cq,
struct hns_roce_cq *recv_cq);
void hns_roce_qp_remove(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
-void hns_roce_qp_free(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
-void hns_roce_release_range_qp(struct hns_roce_dev *hr_dev, int base_qpn,
- int cnt);
+void hns_roce_qp_destroy(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_udata *udata);
__be32 send_ieth(const struct ib_send_wr *wr);
int to_hr_qp_type(int qp_type);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c b/drivers/infiniband/hw/hns/hns_roce_hem.c
index e82215774032..263338b90d7a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -39,6 +39,16 @@
#define DMA_ADDR_T_SHIFT 12
#define BT_BA_SHIFT 32
+#define HEM_INDEX_BUF BIT(0)
+#define HEM_INDEX_L0 BIT(1)
+#define HEM_INDEX_L1 BIT(2)
+struct hns_roce_hem_index {
+ u64 buf;
+ u64 l0;
+ u64 l1;
+ u32 inited; /* indicate which index is available */
+};
+
bool hns_roce_check_whether_mhop(struct hns_roce_dev *hr_dev, u32 type)
{
int hop_num = 0;
@@ -84,25 +94,27 @@ bool hns_roce_check_whether_mhop(struct hns_roce_dev *hr_dev, u32 type)
return hop_num ? true : false;
}
-static bool hns_roce_check_hem_null(struct hns_roce_hem **hem, u64 start_idx,
- u32 bt_chunk_num, u64 hem_max_num)
+static bool hns_roce_check_hem_null(struct hns_roce_hem **hem, u64 hem_idx,
+ u32 bt_chunk_num, u64 hem_max_num)
{
+ u64 start_idx = round_down(hem_idx, bt_chunk_num);
u64 check_max_num = start_idx + bt_chunk_num;
u64 i;
for (i = start_idx; (i < check_max_num) && (i < hem_max_num); i++)
- if (hem[i])
+ if (i != hem_idx && hem[i])
return false;
return true;
}
-static bool hns_roce_check_bt_null(u64 **bt, u64 start_idx, u32 bt_chunk_num)
+static bool hns_roce_check_bt_null(u64 **bt, u64 ba_idx, u32 bt_chunk_num)
{
+ u64 start_idx = round_down(ba_idx, bt_chunk_num);
int i;
for (i = 0; i < bt_chunk_num; i++)
- if (bt[start_idx + i])
+ if (i != ba_idx && bt[start_idx + i])
return false;
return true;
@@ -434,178 +446,235 @@ static int hns_roce_set_hem(struct hns_roce_dev *hr_dev,
return ret;
}
-static int hns_roce_table_mhop_get(struct hns_roce_dev *hr_dev,
- struct hns_roce_hem_table *table,
- unsigned long obj)
+static int calc_hem_config(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table, unsigned long obj,
+ struct hns_roce_hem_mhop *mhop,
+ struct hns_roce_hem_index *index)
{
- struct device *dev = hr_dev->dev;
- struct hns_roce_hem_mhop mhop;
- struct hns_roce_hem_iter iter;
- u32 buf_chunk_size;
- u32 bt_chunk_size;
- u32 chunk_ba_num;
- u32 hop_num;
- u32 size;
- u32 bt_num;
- u64 hem_idx;
- u64 bt_l1_idx = 0;
- u64 bt_l0_idx = 0;
- u64 bt_ba;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
unsigned long mhop_obj = obj;
- int bt_l1_allocated = 0;
- int bt_l0_allocated = 0;
- int step_idx;
+ u32 l0_idx, l1_idx, l2_idx;
+ u32 chunk_ba_num;
+ u32 bt_num;
int ret;
- ret = hns_roce_calc_hem_mhop(hr_dev, table, &mhop_obj, &mhop);
+ ret = hns_roce_calc_hem_mhop(hr_dev, table, &mhop_obj, mhop);
if (ret)
return ret;
- buf_chunk_size = mhop.buf_chunk_size;
- bt_chunk_size = mhop.bt_chunk_size;
- hop_num = mhop.hop_num;
- chunk_ba_num = bt_chunk_size / BA_BYTE_LEN;
-
- bt_num = hns_roce_get_bt_num(table->type, hop_num);
+ l0_idx = mhop->l0_idx;
+ l1_idx = mhop->l1_idx;
+ l2_idx = mhop->l2_idx;
+ chunk_ba_num = mhop->bt_chunk_size / BA_BYTE_LEN;
+ bt_num = hns_roce_get_bt_num(table->type, mhop->hop_num);
switch (bt_num) {
case 3:
- hem_idx = mhop.l0_idx * chunk_ba_num * chunk_ba_num +
- mhop.l1_idx * chunk_ba_num + mhop.l2_idx;
- bt_l1_idx = mhop.l0_idx * chunk_ba_num + mhop.l1_idx;
- bt_l0_idx = mhop.l0_idx;
+ index->l1 = l0_idx * chunk_ba_num + l1_idx;
+ index->l0 = l0_idx;
+ index->buf = l0_idx * chunk_ba_num * chunk_ba_num +
+ l1_idx * chunk_ba_num + l2_idx;
break;
case 2:
- hem_idx = mhop.l0_idx * chunk_ba_num + mhop.l1_idx;
- bt_l0_idx = mhop.l0_idx;
+ index->l0 = l0_idx;
+ index->buf = l0_idx * chunk_ba_num + l1_idx;
break;
case 1:
- hem_idx = mhop.l0_idx;
+ index->buf = l0_idx;
break;
default:
- dev_err(dev, "Table %d not support hop_num = %d!\n",
- table->type, hop_num);
+ ibdev_err(ibdev, "Table %d not support mhop.hop_num = %d!\n",
+ table->type, mhop->hop_num);
return -EINVAL;
}
- if (unlikely(hem_idx >= table->num_hem)) {
- dev_err(dev, "Table %d exceed hem limt idx = %llu,max = %lu!\n",
- table->type, hem_idx, table->num_hem);
+ if (unlikely(index->buf >= table->num_hem)) {
+ ibdev_err(ibdev, "Table %d exceed hem limt idx %llu,max %lu!\n",
+ table->type, index->buf, table->num_hem);
return -EINVAL;
}
- mutex_lock(&table->mutex);
+ return 0;
+}
- if (table->hem[hem_idx]) {
- ++table->hem[hem_idx]->refcount;
- goto out;
+static void free_mhop_hem(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table,
+ struct hns_roce_hem_mhop *mhop,
+ struct hns_roce_hem_index *index)
+{
+ u32 bt_size = mhop->bt_chunk_size;
+ struct device *dev = hr_dev->dev;
+
+ if (index->inited & HEM_INDEX_BUF) {
+ hns_roce_free_hem(hr_dev, table->hem[index->buf]);
+ table->hem[index->buf] = NULL;
}
+ if (index->inited & HEM_INDEX_L1) {
+ dma_free_coherent(dev, bt_size, table->bt_l1[index->l1],
+ table->bt_l1_dma_addr[index->l1]);
+ table->bt_l1[index->l1] = NULL;
+ }
+
+ if (index->inited & HEM_INDEX_L0) {
+ dma_free_coherent(dev, bt_size, table->bt_l0[index->l0],
+ table->bt_l0_dma_addr[index->l0]);
+ table->bt_l0[index->l0] = NULL;
+ }
+}
+
+static int alloc_mhop_hem(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table,
+ struct hns_roce_hem_mhop *mhop,
+ struct hns_roce_hem_index *index)
+{
+ u32 bt_size = mhop->bt_chunk_size;
+ struct device *dev = hr_dev->dev;
+ struct hns_roce_hem_iter iter;
+ gfp_t flag;
+ u64 bt_ba;
+ u32 size;
+ int ret;
+
/* alloc L1 BA's chunk */
- if ((check_whether_bt_num_3(table->type, hop_num) ||
- check_whether_bt_num_2(table->type, hop_num)) &&
- !table->bt_l0[bt_l0_idx]) {
- table->bt_l0[bt_l0_idx] = dma_alloc_coherent(dev, bt_chunk_size,
- &(table->bt_l0_dma_addr[bt_l0_idx]),
+ if ((check_whether_bt_num_3(table->type, mhop->hop_num) ||
+ check_whether_bt_num_2(table->type, mhop->hop_num)) &&
+ !table->bt_l0[index->l0]) {
+ table->bt_l0[index->l0] = dma_alloc_coherent(dev, bt_size,
+ &table->bt_l0_dma_addr[index->l0],
GFP_KERNEL);
- if (!table->bt_l0[bt_l0_idx]) {
+ if (!table->bt_l0[index->l0]) {
ret = -ENOMEM;
goto out;
}
- bt_l0_allocated = 1;
-
- /* set base address to hardware */
- if (table->type < HEM_TYPE_MTT) {
- step_idx = 0;
- if (hr_dev->hw->set_hem(hr_dev, table, obj, step_idx)) {
- ret = -ENODEV;
- dev_err(dev, "set HEM base address to HW failed!\n");
- goto err_dma_alloc_l1;
- }
- }
+ index->inited |= HEM_INDEX_L0;
}
/* alloc L2 BA's chunk */
- if (check_whether_bt_num_3(table->type, hop_num) &&
- !table->bt_l1[bt_l1_idx]) {
- table->bt_l1[bt_l1_idx] = dma_alloc_coherent(dev, bt_chunk_size,
- &(table->bt_l1_dma_addr[bt_l1_idx]),
+ if (check_whether_bt_num_3(table->type, mhop->hop_num) &&
+ !table->bt_l1[index->l1]) {
+ table->bt_l1[index->l1] = dma_alloc_coherent(dev, bt_size,
+ &table->bt_l1_dma_addr[index->l1],
GFP_KERNEL);
- if (!table->bt_l1[bt_l1_idx]) {
+ if (!table->bt_l1[index->l1]) {
ret = -ENOMEM;
- goto err_dma_alloc_l1;
- }
- bt_l1_allocated = 1;
- *(table->bt_l0[bt_l0_idx] + mhop.l1_idx) =
- table->bt_l1_dma_addr[bt_l1_idx];
-
- /* set base address to hardware */
- step_idx = 1;
- if (hr_dev->hw->set_hem(hr_dev, table, obj, step_idx)) {
- ret = -ENODEV;
- dev_err(dev, "set HEM base address to HW failed!\n");
- goto err_alloc_hem_buf;
+ goto err_alloc_hem;
}
+ index->inited |= HEM_INDEX_L1;
+ *(table->bt_l0[index->l0] + mhop->l1_idx) =
+ table->bt_l1_dma_addr[index->l1];
}
/*
* alloc buffer space chunk for QPC/MTPT/CQC/SRQC/SCCC.
* alloc bt space chunk for MTT/CQE.
*/
- size = table->type < HEM_TYPE_MTT ? buf_chunk_size : bt_chunk_size;
- table->hem[hem_idx] = hns_roce_alloc_hem(hr_dev,
- size >> PAGE_SHIFT,
- size,
- (table->lowmem ? GFP_KERNEL :
- GFP_HIGHUSER) | __GFP_NOWARN);
- if (!table->hem[hem_idx]) {
+ size = table->type < HEM_TYPE_MTT ? mhop->buf_chunk_size : bt_size;
+ flag = (table->lowmem ? GFP_KERNEL : GFP_HIGHUSER) | __GFP_NOWARN;
+ table->hem[index->buf] = hns_roce_alloc_hem(hr_dev, size >> PAGE_SHIFT,
+ size, flag);
+ if (!table->hem[index->buf]) {
ret = -ENOMEM;
- goto err_alloc_hem_buf;
+ goto err_alloc_hem;
}
- hns_roce_hem_first(table->hem[hem_idx], &iter);
+ index->inited |= HEM_INDEX_BUF;
+ hns_roce_hem_first(table->hem[index->buf], &iter);
bt_ba = hns_roce_hem_addr(&iter);
-
if (table->type < HEM_TYPE_MTT) {
- if (hop_num == 2) {
- *(table->bt_l1[bt_l1_idx] + mhop.l2_idx) = bt_ba;
- step_idx = 2;
- } else if (hop_num == 1) {
- *(table->bt_l0[bt_l0_idx] + mhop.l1_idx) = bt_ba;
- step_idx = 1;
- } else if (hop_num == HNS_ROCE_HOP_NUM_0) {
- step_idx = 0;
- } else {
- ret = -EINVAL;
- goto err_dma_alloc_l1;
- }
-
- /* set HEM base address to hardware */
- if (hr_dev->hw->set_hem(hr_dev, table, obj, step_idx)) {
- ret = -ENODEV;
- dev_err(dev, "set HEM base address to HW failed!\n");
- goto err_alloc_hem_buf;
- }
- } else if (hop_num == 2) {
- *(table->bt_l0[bt_l0_idx] + mhop.l1_idx) = bt_ba;
+ if (mhop->hop_num == 2)
+ *(table->bt_l1[index->l1] + mhop->l2_idx) = bt_ba;
+ else if (mhop->hop_num == 1)
+ *(table->bt_l0[index->l0] + mhop->l1_idx) = bt_ba;
+ } else if (mhop->hop_num == 2) {
+ *(table->bt_l0[index->l0] + mhop->l1_idx) = bt_ba;
}
- ++table->hem[hem_idx]->refcount;
+ return 0;
+err_alloc_hem:
+ free_mhop_hem(hr_dev, table, mhop, index);
+out:
+ return ret;
+}
+
+static int set_mhop_hem(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table, unsigned long obj,
+ struct hns_roce_hem_mhop *mhop,
+ struct hns_roce_hem_index *index)
+{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ int step_idx;
+ int ret = 0;
+
+ if (index->inited & HEM_INDEX_L0) {
+ ret = hr_dev->hw->set_hem(hr_dev, table, obj, 0);
+ if (ret) {
+ ibdev_err(ibdev, "set HEM step 0 failed!\n");
+ goto out;
+ }
+ }
+
+ if (index->inited & HEM_INDEX_L1) {
+ ret = hr_dev->hw->set_hem(hr_dev, table, obj, 1);
+ if (ret) {
+ ibdev_err(ibdev, "set HEM step 1 failed!\n");
+ goto out;
+ }
+ }
+
+ if (index->inited & HEM_INDEX_BUF) {
+ if (mhop->hop_num == HNS_ROCE_HOP_NUM_0)
+ step_idx = 0;
+ else
+ step_idx = mhop->hop_num;
+ ret = hr_dev->hw->set_hem(hr_dev, table, obj, step_idx);
+ if (ret)
+ ibdev_err(ibdev, "set HEM step last failed!\n");
+ }
+out:
+ return ret;
+}
+
+static int hns_roce_table_mhop_get(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table,
+ unsigned long obj)
+{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ struct hns_roce_hem_index index = {};
+ struct hns_roce_hem_mhop mhop = {};
+ int ret;
+
+ ret = calc_hem_config(hr_dev, table, obj, &mhop, &index);
+ if (ret) {
+ ibdev_err(ibdev, "calc hem config failed!\n");
+ return ret;
+ }
+
+ mutex_lock(&table->mutex);
+ if (table->hem[index.buf]) {
+ ++table->hem[index.buf]->refcount;
+ goto out;
+ }
+
+ ret = alloc_mhop_hem(hr_dev, table, &mhop, &index);
+ if (ret) {
+ ibdev_err(ibdev, "alloc mhop hem failed!\n");
+ goto out;
+ }
+
+ /* set HEM base address to hardware */
+ if (table->type < HEM_TYPE_MTT) {
+ ret = set_mhop_hem(hr_dev, table, obj, &mhop, &index);
+ if (ret) {
+ ibdev_err(ibdev, "set HEM address to HW failed!\n");
+ goto err_alloc;
+ }
+ }
+
+ ++table->hem[index.buf]->refcount;
goto out;
-err_alloc_hem_buf:
- if (bt_l1_allocated) {
- dma_free_coherent(dev, bt_chunk_size, table->bt_l1[bt_l1_idx],
- table->bt_l1_dma_addr[bt_l1_idx]);
- table->bt_l1[bt_l1_idx] = NULL;
- }
-
-err_dma_alloc_l1:
- if (bt_l0_allocated) {
- dma_free_coherent(dev, bt_chunk_size, table->bt_l0[bt_l0_idx],
- table->bt_l0_dma_addr[bt_l0_idx]);
- table->bt_l0[bt_l0_idx] = NULL;
- }
-
+err_alloc:
+ free_mhop_hem(hr_dev, table, &mhop, &index);
out:
mutex_unlock(&table->mutex);
return ret;
@@ -656,116 +725,75 @@ int hns_roce_table_get(struct hns_roce_dev *hr_dev,
return ret;
}
+static void clear_mhop_hem(struct hns_roce_dev *hr_dev,
+ struct hns_roce_hem_table *table, unsigned long obj,
+ struct hns_roce_hem_mhop *mhop,
+ struct hns_roce_hem_index *index)
+{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ u32 hop_num = mhop->hop_num;
+ u32 chunk_ba_num;
+ int step_idx;
+
+ index->inited = HEM_INDEX_BUF;
+ chunk_ba_num = mhop->bt_chunk_size / BA_BYTE_LEN;
+ if (check_whether_bt_num_2(table->type, hop_num)) {
+ if (hns_roce_check_hem_null(table->hem, index->buf,
+ chunk_ba_num, table->num_hem))
+ index->inited |= HEM_INDEX_L0;
+ } else if (check_whether_bt_num_3(table->type, hop_num)) {
+ if (hns_roce_check_hem_null(table->hem, index->buf,
+ chunk_ba_num, table->num_hem)) {
+ index->inited |= HEM_INDEX_L1;
+ if (hns_roce_check_bt_null(table->bt_l1, index->l1,
+ chunk_ba_num))
+ index->inited |= HEM_INDEX_L0;
+ }
+ }
+
+ if (table->type < HEM_TYPE_MTT) {
+ if (hop_num == HNS_ROCE_HOP_NUM_0)
+ step_idx = 0;
+ else
+ step_idx = hop_num;
+
+ if (hr_dev->hw->clear_hem(hr_dev, table, obj, step_idx))
+ ibdev_warn(ibdev, "Clear hop%d HEM failed.\n", hop_num);
+
+ if (index->inited & HEM_INDEX_L1)
+ if (hr_dev->hw->clear_hem(hr_dev, table, obj, 1))
+ ibdev_warn(ibdev, "Clear HEM step 1 failed.\n");
+
+ if (index->inited & HEM_INDEX_L0)
+ if (hr_dev->hw->clear_hem(hr_dev, table, obj, 0))
+ ibdev_warn(ibdev, "Clear HEM step 0 failed.\n");
+ }
+}
+
static void hns_roce_table_mhop_put(struct hns_roce_dev *hr_dev,
struct hns_roce_hem_table *table,
unsigned long obj,
int check_refcount)
{
- struct device *dev = hr_dev->dev;
- struct hns_roce_hem_mhop mhop;
- unsigned long mhop_obj = obj;
- u32 bt_chunk_size;
- u32 chunk_ba_num;
- u32 hop_num;
- u32 start_idx;
- u32 bt_num;
- u64 hem_idx;
- u64 bt_l1_idx = 0;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ struct hns_roce_hem_index index = {};
+ struct hns_roce_hem_mhop mhop = {};
int ret;
- ret = hns_roce_calc_hem_mhop(hr_dev, table, &mhop_obj, &mhop);
- if (ret)
- return;
-
- bt_chunk_size = mhop.bt_chunk_size;
- hop_num = mhop.hop_num;
- chunk_ba_num = bt_chunk_size / BA_BYTE_LEN;
-
- bt_num = hns_roce_get_bt_num(table->type, hop_num);
- switch (bt_num) {
- case 3:
- hem_idx = mhop.l0_idx * chunk_ba_num * chunk_ba_num +
- mhop.l1_idx * chunk_ba_num + mhop.l2_idx;
- bt_l1_idx = mhop.l0_idx * chunk_ba_num + mhop.l1_idx;
- break;
- case 2:
- hem_idx = mhop.l0_idx * chunk_ba_num + mhop.l1_idx;
- break;
- case 1:
- hem_idx = mhop.l0_idx;
- break;
- default:
- dev_err(dev, "Table %d not support hop_num = %d!\n",
- table->type, hop_num);
+ ret = calc_hem_config(hr_dev, table, obj, &mhop, &index);
+ if (ret) {
+ ibdev_err(ibdev, "calc hem config failed!\n");
return;
}
mutex_lock(&table->mutex);
-
- if (check_refcount && (--table->hem[hem_idx]->refcount > 0)) {
+ if (check_refcount && (--table->hem[index.buf]->refcount > 0)) {
mutex_unlock(&table->mutex);
return;
}
- if (table->type < HEM_TYPE_MTT && hop_num == 1) {
- if (hr_dev->hw->clear_hem(hr_dev, table, obj, 1))
- dev_warn(dev, "Clear HEM base address failed.\n");
- } else if (table->type < HEM_TYPE_MTT && hop_num == 2) {
- if (hr_dev->hw->clear_hem(hr_dev, table, obj, 2))
- dev_warn(dev, "Clear HEM base address failed.\n");
- } else if (table->type < HEM_TYPE_MTT &&
- hop_num == HNS_ROCE_HOP_NUM_0) {
- if (hr_dev->hw->clear_hem(hr_dev, table, obj, 0))
- dev_warn(dev, "Clear HEM base address failed.\n");
- }
-
- /*
- * free buffer space chunk for QPC/MTPT/CQC/SRQC/SCCC.
- * free bt space chunk for MTT/CQE.
- */
- hns_roce_free_hem(hr_dev, table->hem[hem_idx]);
- table->hem[hem_idx] = NULL;
-
- if (check_whether_bt_num_2(table->type, hop_num)) {
- start_idx = mhop.l0_idx * chunk_ba_num;
- if (hns_roce_check_hem_null(table->hem, start_idx,
- chunk_ba_num, table->num_hem)) {
- if (table->type < HEM_TYPE_MTT &&
- hr_dev->hw->clear_hem(hr_dev, table, obj, 0))
- dev_warn(dev, "Clear HEM base address failed.\n");
-
- dma_free_coherent(dev, bt_chunk_size,
- table->bt_l0[mhop.l0_idx],
- table->bt_l0_dma_addr[mhop.l0_idx]);
- table->bt_l0[mhop.l0_idx] = NULL;
- }
- } else if (check_whether_bt_num_3(table->type, hop_num)) {
- start_idx = mhop.l0_idx * chunk_ba_num * chunk_ba_num +
- mhop.l1_idx * chunk_ba_num;
- if (hns_roce_check_hem_null(table->hem, start_idx,
- chunk_ba_num, table->num_hem)) {
- if (hr_dev->hw->clear_hem(hr_dev, table, obj, 1))
- dev_warn(dev, "Clear HEM base address failed.\n");
-
- dma_free_coherent(dev, bt_chunk_size,
- table->bt_l1[bt_l1_idx],
- table->bt_l1_dma_addr[bt_l1_idx]);
- table->bt_l1[bt_l1_idx] = NULL;
-
- start_idx = mhop.l0_idx * chunk_ba_num;
- if (hns_roce_check_bt_null(table->bt_l1, start_idx,
- chunk_ba_num)) {
- if (hr_dev->hw->clear_hem(hr_dev, table, obj,
- 0))
- dev_warn(dev, "Clear HEM base address failed.\n");
-
- dma_free_coherent(dev, bt_chunk_size,
- table->bt_l0[mhop.l0_idx],
- table->bt_l0_dma_addr[mhop.l0_idx]);
- table->bt_l0[mhop.l0_idx] = NULL;
- }
- }
- }
+ clear_mhop_hem(hr_dev, table, obj, &mhop, &index);
+ free_mhop_hem(hr_dev, table, &mhop, &index);
mutex_unlock(&table->mutex);
}
@@ -1383,6 +1411,7 @@ static int hem_list_alloc_root_bt(struct hns_roce_dev *hr_dev,
void *cpu_base;
u64 phy_base;
int ret = 0;
+ int ba_num;
int offset;
int total;
int step;
@@ -1393,12 +1422,16 @@ static int hem_list_alloc_root_bt(struct hns_roce_dev *hr_dev,
if (root_hem)
return 0;
+ ba_num = hns_roce_hem_list_calc_root_ba(regions, region_cnt, unit);
+ if (ba_num < 1)
+ return -ENOMEM;
+
INIT_LIST_HEAD(&temp_root);
- total = r->offset;
+ offset = r->offset;
/* indicate to last region */
r = ®ions[region_cnt - 1];
- root_hem = hem_list_alloc_item(hr_dev, total, r->offset + r->count - 1,
- unit, true, 0);
+ root_hem = hem_list_alloc_item(hr_dev, offset, r->offset + r->count - 1,
+ ba_num, true, 0);
if (!root_hem)
return -ENOMEM;
list_add(&root_hem->list, &temp_root);
@@ -1410,7 +1443,7 @@ static int hem_list_alloc_root_bt(struct hns_roce_dev *hr_dev,
INIT_LIST_HEAD(&temp_list[i]);
total = 0;
- for (i = 0; i < region_cnt && total < unit; i++) {
+ for (i = 0; i < region_cnt && total < ba_num; i++) {
r = ®ions[i];
if (!r->count)
continue;
@@ -1443,7 +1476,8 @@ static int hem_list_alloc_root_bt(struct hns_roce_dev *hr_dev,
/* if exist mid bt, link L1 to L0 */
list_for_each_entry_safe(hem, temp_hem,
&hem_list->mid_bt[i][1], list) {
- offset = hem->start / step * BA_BYTE_LEN;
+ offset = (hem->start - r->offset) / step *
+ BA_BYTE_LEN;
hem_list_link_bt(hr_dev, cpu_base + offset,
hem->dma_addr);
total++;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index c6e66586e533..5ff028d77be3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -69,7 +69,7 @@ static int hns_roce_v1_post_send(struct ib_qp *ibqp,
struct hns_roce_wqe_data_seg *dseg = NULL;
struct hns_roce_qp *qp = to_hr_qp(ibqp);
struct device *dev = &hr_dev->pdev->dev;
- struct hns_roce_sq_db sq_db;
+ struct hns_roce_sq_db sq_db = {};
int ps_opcode = 0, i = 0;
unsigned long flags = 0;
void *wqe = NULL;
@@ -106,7 +106,7 @@ static int hns_roce_v1_post_send(struct ib_qp *ibqp,
goto out;
}
- wqe = get_send_wqe(qp, wqe_idx);
+ wqe = hns_roce_get_send_wqe(qp, wqe_idx);
qp->sq.wrid[wqe_idx] = wr->wr_id;
/* Corresponding to the RC and RD type wqe process separately */
@@ -318,8 +318,6 @@ static int hns_roce_v1_post_send(struct ib_qp *ibqp,
/* Memory barrier */
wmb();
- sq_db.u32_4 = 0;
- sq_db.u32_8 = 0;
roce_set_field(sq_db.u32_4, SQ_DOORBELL_U32_4_SQ_HEAD_M,
SQ_DOORBELL_U32_4_SQ_HEAD_S,
(qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1)));
@@ -351,7 +349,7 @@ static int hns_roce_v1_post_recv(struct ib_qp *ibqp,
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct device *dev = &hr_dev->pdev->dev;
- struct hns_roce_rq_db rq_db;
+ struct hns_roce_rq_db rq_db = {};
__le32 doorbell[2] = {0};
unsigned long flags = 0;
unsigned int wqe_idx;
@@ -380,7 +378,7 @@ static int hns_roce_v1_post_recv(struct ib_qp *ibqp,
goto out;
}
- ctrl = get_recv_wqe(hr_qp, wqe_idx);
+ ctrl = hns_roce_get_recv_wqe(hr_qp, wqe_idx);
roce_set_field(ctrl->rwqe_byte_12,
RQ_WQE_CTRL_RWQE_BYTE_12_RWQE_SGE_NUM_M,
@@ -418,9 +416,6 @@ static int hns_roce_v1_post_recv(struct ib_qp *ibqp,
ROCEE_QP1C_CFG3_0_REG +
QP1C_CFGN_OFFSET * hr_qp->phy_port, reg_val);
} else {
- rq_db.u32_4 = 0;
- rq_db.u32_8 = 0;
-
roce_set_field(rq_db.u32_4, RQ_DOORBELL_U32_4_RQ_HEAD_M,
RQ_DOORBELL_U32_4_RQ_HEAD_S,
hr_qp->rq.head);
@@ -2289,9 +2284,10 @@ static int hns_roce_v1_poll_one(struct hns_roce_cq *hr_cq,
if (is_send) {
/* SQ conrespond to CQE */
- sq_wqe = get_send_wqe(*cur_qp, roce_get_field(cqe->cqe_byte_4,
+ sq_wqe = hns_roce_get_send_wqe(*cur_qp,
+ roce_get_field(cqe->cqe_byte_4,
CQE_BYTE_4_WQE_INDEX_M,
- CQE_BYTE_4_WQE_INDEX_S)&
+ CQE_BYTE_4_WQE_INDEX_S) &
((*cur_qp)->sq.wqe_cnt-1));
switch (le32_to_cpu(sq_wqe->flag) & HNS_ROCE_WQE_OPCODE_MASK) {
case HNS_ROCE_WQE_OPCODE_SEND:
@@ -3623,26 +3619,11 @@ int hns_roce_v1_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
if (send_cq && send_cq != recv_cq)
__hns_roce_v1_cq_clean(send_cq, hr_qp->qpn, NULL);
}
+ hns_roce_qp_remove(hr_dev, hr_qp);
hns_roce_unlock_cqs(send_cq, recv_cq);
- hns_roce_qp_remove(hr_dev, hr_qp);
- hns_roce_qp_free(hr_dev, hr_qp);
+ hns_roce_qp_destroy(hr_dev, hr_qp, udata);
- /* RC QP, release QPN */
- if (hr_qp->ibqp.qp_type == IB_QPT_RC)
- hns_roce_release_range_qp(hr_dev, hr_qp->qpn, 1);
-
- hns_roce_mtt_cleanup(hr_dev, &hr_qp->mtt);
-
- ib_umem_release(hr_qp->umem);
- if (!udata) {
- kfree(hr_qp->sq.wrid);
- kfree(hr_qp->rq.wrid);
-
- hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
- }
-
- kfree(hr_qp);
return 0;
}
@@ -3954,10 +3935,8 @@ static int hns_roce_v1_aeq_int(struct hns_roce_dev *hr_dev,
eq->cons_index++;
aeqes_found = 1;
- if (eq->cons_index > 2 * hr_dev->caps.aeqe_depth - 1) {
- dev_warn(dev, "cons_index overflow, set back to 0.\n");
+ if (eq->cons_index > 2 * hr_dev->caps.aeqe_depth - 1)
eq->cons_index = 0;
- }
}
set_eq_cons_index_v1(eq, 0);
@@ -4007,11 +3986,8 @@ static int hns_roce_v1_ceq_int(struct hns_roce_dev *hr_dev,
ceqes_found = 1;
if (eq->cons_index >
- EQ_DEPTH_COEFF * hr_dev->caps.ceqe_depth - 1) {
- dev_warn(&eq->hr_dev->pdev->dev,
- "cons_index overflow, set back to 0.\n");
+ EQ_DEPTH_COEFF * hr_dev->caps.ceqe_depth - 1)
eq->cons_index = 0;
- }
}
set_eq_cons_index_v1(eq, 0);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 12c4cd8e9378..c3316672b70e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -56,11 +56,45 @@ static void set_data_seg_v2(struct hns_roce_v2_wqe_data_seg *dseg,
dseg->len = cpu_to_le32(sg->length);
}
+/*
+ * mapped-value = 1 + real-value
+ * The hns wr opcode real value is start from 0, In order to distinguish between
+ * initialized and uninitialized map values, we plus 1 to the actual value when
+ * defining the mapping, so that the validity can be identified by checking the
+ * mapped value is greater than 0.
+ */
+#define HR_OPC_MAP(ib_key, hr_key) \
+ [IB_WR_ ## ib_key] = 1 + HNS_ROCE_V2_WQE_OP_ ## hr_key
+
+static const u32 hns_roce_op_code[] = {
+ HR_OPC_MAP(RDMA_WRITE, RDMA_WRITE),
+ HR_OPC_MAP(RDMA_WRITE_WITH_IMM, RDMA_WRITE_WITH_IMM),
+ HR_OPC_MAP(SEND, SEND),
+ HR_OPC_MAP(SEND_WITH_IMM, SEND_WITH_IMM),
+ HR_OPC_MAP(RDMA_READ, RDMA_READ),
+ HR_OPC_MAP(ATOMIC_CMP_AND_SWP, ATOM_CMP_AND_SWAP),
+ HR_OPC_MAP(ATOMIC_FETCH_AND_ADD, ATOM_FETCH_AND_ADD),
+ HR_OPC_MAP(SEND_WITH_INV, SEND_WITH_INV),
+ HR_OPC_MAP(LOCAL_INV, LOCAL_INV),
+ HR_OPC_MAP(MASKED_ATOMIC_CMP_AND_SWP, ATOM_MSK_CMP_AND_SWAP),
+ HR_OPC_MAP(MASKED_ATOMIC_FETCH_AND_ADD, ATOM_MSK_FETCH_AND_ADD),
+ HR_OPC_MAP(REG_MR, FAST_REG_PMR),
+};
+
+static u32 to_hr_opcode(u32 ib_opcode)
+{
+ if (ib_opcode >= ARRAY_SIZE(hns_roce_op_code))
+ return HNS_ROCE_V2_WQE_OP_MASK;
+
+ return hns_roce_op_code[ib_opcode] ? hns_roce_op_code[ib_opcode] - 1 :
+ HNS_ROCE_V2_WQE_OP_MASK;
+}
+
static void set_frmr_seg(struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
- struct hns_roce_wqe_frmr_seg *fseg,
- const struct ib_reg_wr *wr)
+ void *wqe, const struct ib_reg_wr *wr)
{
struct hns_roce_mr *mr = to_hr_mr(wr->mr);
+ struct hns_roce_wqe_frmr_seg *fseg = wqe;
/* use ib_access_flags */
roce_set_bit(rc_sq_wqe->byte_4, V2_RC_FRMR_WQE_BYTE_4_BIND_EN_S,
@@ -92,16 +126,26 @@ static void set_frmr_seg(struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
V2_RC_FRMR_WQE_BYTE_40_BLK_MODE_S, 0);
}
-static void set_atomic_seg(struct hns_roce_wqe_atomic_seg *aseg,
- const struct ib_atomic_wr *wr)
+static void set_atomic_seg(const struct ib_send_wr *wr, void *wqe,
+ struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
+ int valid_num_sge)
{
- if (wr->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
- aseg->fetchadd_swap_data = cpu_to_le64(wr->swap);
- aseg->cmp_data = cpu_to_le64(wr->compare_add);
+ struct hns_roce_wqe_atomic_seg *aseg;
+
+ set_data_seg_v2(wqe, wr->sg_list);
+ aseg = wqe + sizeof(struct hns_roce_v2_wqe_data_seg);
+
+ if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
+ aseg->fetchadd_swap_data = cpu_to_le64(atomic_wr(wr)->swap);
+ aseg->cmp_data = cpu_to_le64(atomic_wr(wr)->compare_add);
} else {
- aseg->fetchadd_swap_data = cpu_to_le64(wr->compare_add);
+ aseg->fetchadd_swap_data =
+ cpu_to_le64(atomic_wr(wr)->compare_add);
aseg->cmp_data = 0;
}
+
+ roce_set_field(rc_sq_wqe->byte_16, V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M,
+ V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge);
}
static void set_extend_sge(struct hns_roce_qp *qp, const struct ib_send_wr *wr,
@@ -127,7 +171,7 @@ static void set_extend_sge(struct hns_roce_qp *qp, const struct ib_send_wr *wr,
* should calculate how many sges in the first page and the second
* page.
*/
- dseg = get_send_extend_sge(qp, (*sge_ind) & (qp->sge.sge_cnt - 1));
+ dseg = hns_roce_get_extend_sge(qp, (*sge_ind) & (qp->sge.sge_cnt - 1));
fi_sge_num = (round_up((uintptr_t)dseg, 1 << shift) -
(uintptr_t)dseg) /
sizeof(struct hns_roce_v2_wqe_data_seg);
@@ -137,7 +181,7 @@ static void set_extend_sge(struct hns_roce_qp *qp, const struct ib_send_wr *wr,
set_data_seg_v2(dseg++, sg + i);
(*sge_ind)++;
}
- dseg = get_send_extend_sge(qp,
+ dseg = hns_roce_get_extend_sge(qp,
(*sge_ind) & (qp->sge.sge_cnt - 1));
for (i = 0; i < se_sge_num; i++) {
set_data_seg_v2(dseg++, sg + fi_sge_num + i);
@@ -154,11 +198,11 @@ static void set_extend_sge(struct hns_roce_qp *qp, const struct ib_send_wr *wr,
static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr,
struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
void *wqe, unsigned int *sge_ind,
- int valid_num_sge,
- const struct ib_send_wr **bad_wr)
+ int valid_num_sge)
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_v2_wqe_data_seg *dseg = wqe;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct hns_roce_qp *qp = to_hr_qp(ibqp);
int j = 0;
int i;
@@ -166,15 +210,14 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr,
if (wr->send_flags & IB_SEND_INLINE && valid_num_sge) {
if (le32_to_cpu(rc_sq_wqe->msg_len) >
hr_dev->caps.max_sq_inline) {
- *bad_wr = wr;
- dev_err(hr_dev->dev, "inline len(1-%d)=%d, illegal",
- rc_sq_wqe->msg_len, hr_dev->caps.max_sq_inline);
+ ibdev_err(ibdev, "inline len(1-%d)=%d, illegal",
+ rc_sq_wqe->msg_len,
+ hr_dev->caps.max_sq_inline);
return -EINVAL;
}
if (wr->opcode == IB_WR_RDMA_READ) {
- *bad_wr = wr;
- dev_err(hr_dev->dev, "Not support inline data!\n");
+ ibdev_err(ibdev, "Not support inline data!\n");
return -EINVAL;
}
@@ -220,62 +263,287 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr,
return 0;
}
-static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
- const struct ib_qp_attr *attr,
- int attr_mask, enum ib_qp_state cur_state,
- enum ib_qp_state new_state);
-
static int check_send_valid(struct hns_roce_dev *hr_dev,
struct hns_roce_qp *hr_qp)
{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct ib_qp *ibqp = &hr_qp->ibqp;
- struct device *dev = hr_dev->dev;
if (unlikely(ibqp->qp_type != IB_QPT_RC &&
ibqp->qp_type != IB_QPT_GSI &&
ibqp->qp_type != IB_QPT_UD)) {
- dev_err(dev, "Not supported QP(0x%x)type!\n", ibqp->qp_type);
+ ibdev_err(ibdev, "Not supported QP(0x%x)type!\n",
+ ibqp->qp_type);
return -EOPNOTSUPP;
} else if (unlikely(hr_qp->state == IB_QPS_RESET ||
hr_qp->state == IB_QPS_INIT ||
hr_qp->state == IB_QPS_RTR)) {
- dev_err(dev, "Post WQE fail, QP state %d!\n", hr_qp->state);
+ ibdev_err(ibdev, "failed to post WQE, QP state %d!\n",
+ hr_qp->state);
return -EINVAL;
} else if (unlikely(hr_dev->state >= HNS_ROCE_DEVICE_STATE_RST_DOWN)) {
- dev_err(dev, "Post WQE fail, dev state %d!\n", hr_dev->state);
+ ibdev_err(ibdev, "failed to post WQE, dev state %d!\n",
+ hr_dev->state);
return -EIO;
}
return 0;
}
+static inline int calc_wr_sge_num(const struct ib_send_wr *wr, u32 *sge_len)
+{
+ int valid_num = 0;
+ u32 len = 0;
+ int i;
+
+ for (i = 0; i < wr->num_sge; i++) {
+ if (likely(wr->sg_list[i].length)) {
+ len += wr->sg_list[i].length;
+ valid_num++;
+ }
+ }
+
+ *sge_len = len;
+ return valid_num;
+}
+
+static inline int set_ud_wqe(struct hns_roce_qp *qp,
+ const struct ib_send_wr *wr,
+ void *wqe, unsigned int *sge_idx,
+ unsigned int owner_bit)
+{
+ struct hns_roce_dev *hr_dev = to_hr_dev(qp->ibqp.device);
+ struct hns_roce_ah *ah = to_hr_ah(ud_wr(wr)->ah);
+ struct hns_roce_v2_ud_send_wqe *ud_sq_wqe = wqe;
+ unsigned int curr_idx = *sge_idx;
+ int valid_num_sge;
+ u32 msg_len = 0;
+ bool loopback;
+ u8 *smac;
+
+ valid_num_sge = calc_wr_sge_num(wr, &msg_len);
+ memset(ud_sq_wqe, 0, sizeof(*ud_sq_wqe));
+
+ roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_0_M,
+ V2_UD_SEND_WQE_DMAC_0_S, ah->av.mac[0]);
+ roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_1_M,
+ V2_UD_SEND_WQE_DMAC_1_S, ah->av.mac[1]);
+ roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_2_M,
+ V2_UD_SEND_WQE_DMAC_2_S, ah->av.mac[2]);
+ roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_3_M,
+ V2_UD_SEND_WQE_DMAC_3_S, ah->av.mac[3]);
+ roce_set_field(ud_sq_wqe->byte_48, V2_UD_SEND_WQE_BYTE_48_DMAC_4_M,
+ V2_UD_SEND_WQE_BYTE_48_DMAC_4_S, ah->av.mac[4]);
+ roce_set_field(ud_sq_wqe->byte_48, V2_UD_SEND_WQE_BYTE_48_DMAC_5_M,
+ V2_UD_SEND_WQE_BYTE_48_DMAC_5_S, ah->av.mac[5]);
+
+ /* MAC loopback */
+ smac = (u8 *)hr_dev->dev_addr[qp->port];
+ loopback = ether_addr_equal_unaligned(ah->av.mac, smac) ? 1 : 0;
+
+ roce_set_bit(ud_sq_wqe->byte_40,
+ V2_UD_SEND_WQE_BYTE_40_LBI_S, loopback);
+
+ roce_set_field(ud_sq_wqe->byte_4,
+ V2_UD_SEND_WQE_BYTE_4_OPCODE_M,
+ V2_UD_SEND_WQE_BYTE_4_OPCODE_S,
+ HNS_ROCE_V2_WQE_OP_SEND);
+
+ ud_sq_wqe->msg_len = cpu_to_le32(msg_len);
+
+ switch (wr->opcode) {
+ case IB_WR_SEND_WITH_IMM:
+ case IB_WR_RDMA_WRITE_WITH_IMM:
+ ud_sq_wqe->immtdata = cpu_to_le32(be32_to_cpu(wr->ex.imm_data));
+ break;
+ default:
+ ud_sq_wqe->immtdata = 0;
+ break;
+ }
+
+ /* Set sig attr */
+ roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_CQE_S,
+ (wr->send_flags & IB_SEND_SIGNALED) ? 1 : 0);
+
+ /* Set se attr */
+ roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_SE_S,
+ (wr->send_flags & IB_SEND_SOLICITED) ? 1 : 0);
+
+ roce_set_bit(ud_sq_wqe->byte_4, V2_UD_SEND_WQE_BYTE_4_OWNER_S,
+ owner_bit);
+
+ roce_set_field(ud_sq_wqe->byte_16, V2_UD_SEND_WQE_BYTE_16_PD_M,
+ V2_UD_SEND_WQE_BYTE_16_PD_S, to_hr_pd(qp->ibqp.pd)->pdn);
+
+ roce_set_field(ud_sq_wqe->byte_16, V2_UD_SEND_WQE_BYTE_16_SGE_NUM_M,
+ V2_UD_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge);
+
+ roce_set_field(ud_sq_wqe->byte_20,
+ V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M,
+ V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S,
+ curr_idx & (qp->sge.sge_cnt - 1));
+
+ roce_set_field(ud_sq_wqe->byte_24, V2_UD_SEND_WQE_BYTE_24_UDPSPN_M,
+ V2_UD_SEND_WQE_BYTE_24_UDPSPN_S, 0);
+ ud_sq_wqe->qkey = cpu_to_le32(ud_wr(wr)->remote_qkey & 0x80000000 ?
+ qp->qkey : ud_wr(wr)->remote_qkey);
+ roce_set_field(ud_sq_wqe->byte_32, V2_UD_SEND_WQE_BYTE_32_DQPN_M,
+ V2_UD_SEND_WQE_BYTE_32_DQPN_S, ud_wr(wr)->remote_qpn);
+
+ roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_VLAN_M,
+ V2_UD_SEND_WQE_BYTE_36_VLAN_S, ah->av.vlan_id);
+ roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_M,
+ V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_S, ah->av.hop_limit);
+ roce_set_field(ud_sq_wqe->byte_36, V2_UD_SEND_WQE_BYTE_36_TCLASS_M,
+ V2_UD_SEND_WQE_BYTE_36_TCLASS_S, ah->av.tclass);
+ roce_set_field(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_M,
+ V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_S, ah->av.flowlabel);
+ roce_set_field(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_SL_M,
+ V2_UD_SEND_WQE_BYTE_40_SL_S, ah->av.sl);
+ roce_set_field(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_PORTN_M,
+ V2_UD_SEND_WQE_BYTE_40_PORTN_S, qp->port);
+
+ roce_set_bit(ud_sq_wqe->byte_40, V2_UD_SEND_WQE_BYTE_40_UD_VLAN_EN_S,
+ ah->av.vlan_en ? 1 : 0);
+ roce_set_field(ud_sq_wqe->byte_48, V2_UD_SEND_WQE_BYTE_48_SGID_INDX_M,
+ V2_UD_SEND_WQE_BYTE_48_SGID_INDX_S, ah->av.gid_index);
+
+ memcpy(&ud_sq_wqe->dgid[0], &ah->av.dgid[0], GID_LEN_V2);
+
+ set_extend_sge(qp, wr, &curr_idx, valid_num_sge);
+
+ *sge_idx = curr_idx;
+
+ return 0;
+}
+
+static inline int set_rc_wqe(struct hns_roce_qp *qp,
+ const struct ib_send_wr *wr,
+ void *wqe, unsigned int *sge_idx,
+ unsigned int owner_bit)
+{
+ struct hns_roce_v2_rc_send_wqe *rc_sq_wqe = wqe;
+ unsigned int curr_idx = *sge_idx;
+ int valid_num_sge;
+ u32 msg_len = 0;
+ int ret = 0;
+
+ valid_num_sge = calc_wr_sge_num(wr, &msg_len);
+ memset(rc_sq_wqe, 0, sizeof(*rc_sq_wqe));
+
+ rc_sq_wqe->msg_len = cpu_to_le32(msg_len);
+
+ switch (wr->opcode) {
+ case IB_WR_SEND_WITH_IMM:
+ case IB_WR_RDMA_WRITE_WITH_IMM:
+ rc_sq_wqe->immtdata = cpu_to_le32(be32_to_cpu(wr->ex.imm_data));
+ break;
+ case IB_WR_SEND_WITH_INV:
+ rc_sq_wqe->inv_key = cpu_to_le32(wr->ex.invalidate_rkey);
+ break;
+ default:
+ rc_sq_wqe->immtdata = 0;
+ break;
+ }
+
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_FENCE_S,
+ (wr->send_flags & IB_SEND_FENCE) ? 1 : 0);
+
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_SE_S,
+ (wr->send_flags & IB_SEND_SOLICITED) ? 1 : 0);
+
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_CQE_S,
+ (wr->send_flags & IB_SEND_SIGNALED) ? 1 : 0);
+
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_OWNER_S,
+ owner_bit);
+
+ wqe += sizeof(struct hns_roce_v2_rc_send_wqe);
+ switch (wr->opcode) {
+ case IB_WR_RDMA_READ:
+ case IB_WR_RDMA_WRITE:
+ case IB_WR_RDMA_WRITE_WITH_IMM:
+ rc_sq_wqe->rkey = cpu_to_le32(rdma_wr(wr)->rkey);
+ rc_sq_wqe->va = cpu_to_le64(rdma_wr(wr)->remote_addr);
+ break;
+ case IB_WR_LOCAL_INV:
+ roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_SO_S, 1);
+ rc_sq_wqe->inv_key = cpu_to_le32(wr->ex.invalidate_rkey);
+ break;
+ case IB_WR_REG_MR:
+ set_frmr_seg(rc_sq_wqe, wqe, reg_wr(wr));
+ break;
+ case IB_WR_ATOMIC_CMP_AND_SWP:
+ case IB_WR_ATOMIC_FETCH_AND_ADD:
+ rc_sq_wqe->rkey = cpu_to_le32(atomic_wr(wr)->rkey);
+ rc_sq_wqe->va = cpu_to_le64(atomic_wr(wr)->remote_addr);
+ break;
+ default:
+ break;
+ }
+
+ roce_set_field(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_OPCODE_M,
+ V2_RC_SEND_WQE_BYTE_4_OPCODE_S,
+ to_hr_opcode(wr->opcode));
+
+ if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+ wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+ set_atomic_seg(wr, wqe, rc_sq_wqe, valid_num_sge);
+ else if (wr->opcode != IB_WR_REG_MR)
+ ret = set_rwqe_data_seg(&qp->ibqp, wr, rc_sq_wqe,
+ wqe, &curr_idx, valid_num_sge);
+
+ *sge_idx = curr_idx;
+
+ return ret;
+}
+
+static inline void update_sq_db(struct hns_roce_dev *hr_dev,
+ struct hns_roce_qp *qp)
+{
+ /*
+ * Hip08 hardware cannot flush the WQEs in SQ if the QP state
+ * gets into errored mode. Hence, as a workaround to this
+ * hardware limitation, driver needs to assist in flushing. But
+ * the flushing operation uses mailbox to convey the QP state to
+ * the hardware and which can sleep due to the mutex protection
+ * around the mailbox calls. Hence, use the deferred flush for
+ * now.
+ */
+ if (qp->state == IB_QPS_ERR) {
+ if (!test_and_set_bit(HNS_ROCE_FLUSH_FLAG, &qp->flush_flag))
+ init_flush_work(hr_dev, qp);
+ } else {
+ struct hns_roce_v2_db sq_db = {};
+
+ roce_set_field(sq_db.byte_4, V2_DB_BYTE_4_TAG_M,
+ V2_DB_BYTE_4_TAG_S, qp->doorbell_qpn);
+ roce_set_field(sq_db.byte_4, V2_DB_BYTE_4_CMD_M,
+ V2_DB_BYTE_4_CMD_S, HNS_ROCE_V2_SQ_DB);
+ roce_set_field(sq_db.parameter, V2_DB_PARAMETER_IDX_M,
+ V2_DB_PARAMETER_IDX_S,
+ qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
+ roce_set_field(sq_db.parameter, V2_DB_PARAMETER_SL_M,
+ V2_DB_PARAMETER_SL_S, qp->sl);
+
+ hns_roce_write64(hr_dev, (__le32 *)&sq_db, qp->sq.db_reg_l);
+ }
+}
+
static int hns_roce_v2_post_send(struct ib_qp *ibqp,
const struct ib_send_wr *wr,
const struct ib_send_wr **bad_wr)
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
- struct hns_roce_ah *ah = to_hr_ah(ud_wr(wr)->ah);
- struct hns_roce_v2_ud_send_wqe *ud_sq_wqe;
- struct hns_roce_v2_rc_send_wqe *rc_sq_wqe;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct hns_roce_qp *qp = to_hr_qp(ibqp);
- struct hns_roce_wqe_frmr_seg *fseg;
- struct device *dev = hr_dev->dev;
- struct hns_roce_v2_db sq_db;
- struct ib_qp_attr attr;
+ unsigned long flags = 0;
unsigned int owner_bit;
unsigned int sge_idx;
unsigned int wqe_idx;
- unsigned long flags;
- int valid_num_sge;
void *wqe = NULL;
- bool loopback;
- int attr_mask;
- u32 tmp_len;
- u32 hr_op;
- u8 *smac;
int nreq;
int ret;
- int i;
spin_lock_irqsave(&qp->sq.lock, flags);
@@ -298,327 +566,37 @@ static int hns_roce_v2_post_send(struct ib_qp *ibqp,
wqe_idx = (qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1);
if (unlikely(wr->num_sge > qp->sq.max_gs)) {
- dev_err(dev, "num_sge=%d > qp->sq.max_gs=%d\n",
- wr->num_sge, qp->sq.max_gs);
+ ibdev_err(ibdev, "num_sge=%d > qp->sq.max_gs=%d\n",
+ wr->num_sge, qp->sq.max_gs);
ret = -EINVAL;
*bad_wr = wr;
goto out;
}
- wqe = get_send_wqe(qp, wqe_idx);
+ wqe = hns_roce_get_send_wqe(qp, wqe_idx);
qp->sq.wrid[wqe_idx] = wr->wr_id;
owner_bit =
~(((qp->sq.head + nreq) >> ilog2(qp->sq.wqe_cnt)) & 0x1);
- valid_num_sge = 0;
- tmp_len = 0;
-
- for (i = 0; i < wr->num_sge; i++) {
- if (likely(wr->sg_list[i].length)) {
- tmp_len += wr->sg_list[i].length;
- valid_num_sge++;
- }
- }
/* Corresponding to the QP type, wqe process separately */
- if (ibqp->qp_type == IB_QPT_GSI) {
- ud_sq_wqe = wqe;
- memset(ud_sq_wqe, 0, sizeof(*ud_sq_wqe));
+ if (ibqp->qp_type == IB_QPT_GSI)
+ ret = set_ud_wqe(qp, wr, wqe, &sge_idx, owner_bit);
+ else if (ibqp->qp_type == IB_QPT_RC)
+ ret = set_rc_wqe(qp, wr, wqe, &sge_idx, owner_bit);
- roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_0_M,
- V2_UD_SEND_WQE_DMAC_0_S, ah->av.mac[0]);
- roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_1_M,
- V2_UD_SEND_WQE_DMAC_1_S, ah->av.mac[1]);
- roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_2_M,
- V2_UD_SEND_WQE_DMAC_2_S, ah->av.mac[2]);
- roce_set_field(ud_sq_wqe->dmac, V2_UD_SEND_WQE_DMAC_3_M,
- V2_UD_SEND_WQE_DMAC_3_S, ah->av.mac[3]);
- roce_set_field(ud_sq_wqe->byte_48,
- V2_UD_SEND_WQE_BYTE_48_DMAC_4_M,
- V2_UD_SEND_WQE_BYTE_48_DMAC_4_S,
- ah->av.mac[4]);
- roce_set_field(ud_sq_wqe->byte_48,
- V2_UD_SEND_WQE_BYTE_48_DMAC_5_M,
- V2_UD_SEND_WQE_BYTE_48_DMAC_5_S,
- ah->av.mac[5]);
-
- /* MAC loopback */
- smac = (u8 *)hr_dev->dev_addr[qp->port];
- loopback = ether_addr_equal_unaligned(ah->av.mac,
- smac) ? 1 : 0;
-
- roce_set_bit(ud_sq_wqe->byte_40,
- V2_UD_SEND_WQE_BYTE_40_LBI_S, loopback);
-
- roce_set_field(ud_sq_wqe->byte_4,
- V2_UD_SEND_WQE_BYTE_4_OPCODE_M,
- V2_UD_SEND_WQE_BYTE_4_OPCODE_S,
- HNS_ROCE_V2_WQE_OP_SEND);
-
- ud_sq_wqe->msg_len =
- cpu_to_le32(le32_to_cpu(ud_sq_wqe->msg_len) + tmp_len);
-
- switch (wr->opcode) {
- case IB_WR_SEND_WITH_IMM:
- case IB_WR_RDMA_WRITE_WITH_IMM:
- ud_sq_wqe->immtdata =
- cpu_to_le32(be32_to_cpu(wr->ex.imm_data));
- break;
- default:
- ud_sq_wqe->immtdata = 0;
- break;
- }
-
- /* Set sig attr */
- roce_set_bit(ud_sq_wqe->byte_4,
- V2_UD_SEND_WQE_BYTE_4_CQE_S,
- (wr->send_flags & IB_SEND_SIGNALED) ? 1 : 0);
-
- /* Set se attr */
- roce_set_bit(ud_sq_wqe->byte_4,
- V2_UD_SEND_WQE_BYTE_4_SE_S,
- (wr->send_flags & IB_SEND_SOLICITED) ? 1 : 0);
-
- roce_set_bit(ud_sq_wqe->byte_4,
- V2_UD_SEND_WQE_BYTE_4_OWNER_S, owner_bit);
-
- roce_set_field(ud_sq_wqe->byte_16,
- V2_UD_SEND_WQE_BYTE_16_PD_M,
- V2_UD_SEND_WQE_BYTE_16_PD_S,
- to_hr_pd(ibqp->pd)->pdn);
-
- roce_set_field(ud_sq_wqe->byte_16,
- V2_UD_SEND_WQE_BYTE_16_SGE_NUM_M,
- V2_UD_SEND_WQE_BYTE_16_SGE_NUM_S,
- valid_num_sge);
-
- roce_set_field(ud_sq_wqe->byte_20,
- V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M,
- V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S,
- sge_idx & (qp->sge.sge_cnt - 1));
-
- roce_set_field(ud_sq_wqe->byte_24,
- V2_UD_SEND_WQE_BYTE_24_UDPSPN_M,
- V2_UD_SEND_WQE_BYTE_24_UDPSPN_S, 0);
- ud_sq_wqe->qkey =
- cpu_to_le32(ud_wr(wr)->remote_qkey & 0x80000000 ?
- qp->qkey : ud_wr(wr)->remote_qkey);
- roce_set_field(ud_sq_wqe->byte_32,
- V2_UD_SEND_WQE_BYTE_32_DQPN_M,
- V2_UD_SEND_WQE_BYTE_32_DQPN_S,
- ud_wr(wr)->remote_qpn);
-
- roce_set_field(ud_sq_wqe->byte_36,
- V2_UD_SEND_WQE_BYTE_36_VLAN_M,
- V2_UD_SEND_WQE_BYTE_36_VLAN_S,
- ah->av.vlan_id);
- roce_set_field(ud_sq_wqe->byte_36,
- V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_M,
- V2_UD_SEND_WQE_BYTE_36_HOPLIMIT_S,
- ah->av.hop_limit);
- roce_set_field(ud_sq_wqe->byte_36,
- V2_UD_SEND_WQE_BYTE_36_TCLASS_M,
- V2_UD_SEND_WQE_BYTE_36_TCLASS_S,
- ah->av.tclass);
- roce_set_field(ud_sq_wqe->byte_40,
- V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_M,
- V2_UD_SEND_WQE_BYTE_40_FLOW_LABEL_S,
- ah->av.flowlabel);
- roce_set_field(ud_sq_wqe->byte_40,
- V2_UD_SEND_WQE_BYTE_40_SL_M,
- V2_UD_SEND_WQE_BYTE_40_SL_S,
- ah->av.sl);
- roce_set_field(ud_sq_wqe->byte_40,
- V2_UD_SEND_WQE_BYTE_40_PORTN_M,
- V2_UD_SEND_WQE_BYTE_40_PORTN_S,
- qp->port);
-
- roce_set_bit(ud_sq_wqe->byte_40,
- V2_UD_SEND_WQE_BYTE_40_UD_VLAN_EN_S,
- ah->av.vlan_en ? 1 : 0);
- roce_set_field(ud_sq_wqe->byte_48,
- V2_UD_SEND_WQE_BYTE_48_SGID_INDX_M,
- V2_UD_SEND_WQE_BYTE_48_SGID_INDX_S,
- hns_get_gid_index(hr_dev, qp->phy_port,
- ah->av.gid_index));
-
- memcpy(&ud_sq_wqe->dgid[0], &ah->av.dgid[0],
- GID_LEN_V2);
-
- set_extend_sge(qp, wr, &sge_idx, valid_num_sge);
- } else if (ibqp->qp_type == IB_QPT_RC) {
- rc_sq_wqe = wqe;
- memset(rc_sq_wqe, 0, sizeof(*rc_sq_wqe));
-
- rc_sq_wqe->msg_len =
- cpu_to_le32(le32_to_cpu(rc_sq_wqe->msg_len) + tmp_len);
-
- switch (wr->opcode) {
- case IB_WR_SEND_WITH_IMM:
- case IB_WR_RDMA_WRITE_WITH_IMM:
- rc_sq_wqe->immtdata =
- cpu_to_le32(be32_to_cpu(wr->ex.imm_data));
- break;
- case IB_WR_SEND_WITH_INV:
- rc_sq_wqe->inv_key =
- cpu_to_le32(wr->ex.invalidate_rkey);
- break;
- default:
- rc_sq_wqe->immtdata = 0;
- break;
- }
-
- roce_set_bit(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_FENCE_S,
- (wr->send_flags & IB_SEND_FENCE) ? 1 : 0);
-
- roce_set_bit(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_SE_S,
- (wr->send_flags & IB_SEND_SOLICITED) ? 1 : 0);
-
- roce_set_bit(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_CQE_S,
- (wr->send_flags & IB_SEND_SIGNALED) ? 1 : 0);
-
- roce_set_bit(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_OWNER_S, owner_bit);
-
- wqe += sizeof(struct hns_roce_v2_rc_send_wqe);
- switch (wr->opcode) {
- case IB_WR_RDMA_READ:
- hr_op = HNS_ROCE_V2_WQE_OP_RDMA_READ;
- rc_sq_wqe->rkey =
- cpu_to_le32(rdma_wr(wr)->rkey);
- rc_sq_wqe->va =
- cpu_to_le64(rdma_wr(wr)->remote_addr);
- break;
- case IB_WR_RDMA_WRITE:
- hr_op = HNS_ROCE_V2_WQE_OP_RDMA_WRITE;
- rc_sq_wqe->rkey =
- cpu_to_le32(rdma_wr(wr)->rkey);
- rc_sq_wqe->va =
- cpu_to_le64(rdma_wr(wr)->remote_addr);
- break;
- case IB_WR_RDMA_WRITE_WITH_IMM:
- hr_op = HNS_ROCE_V2_WQE_OP_RDMA_WRITE_WITH_IMM;
- rc_sq_wqe->rkey =
- cpu_to_le32(rdma_wr(wr)->rkey);
- rc_sq_wqe->va =
- cpu_to_le64(rdma_wr(wr)->remote_addr);
- break;
- case IB_WR_SEND:
- hr_op = HNS_ROCE_V2_WQE_OP_SEND;
- break;
- case IB_WR_SEND_WITH_INV:
- hr_op = HNS_ROCE_V2_WQE_OP_SEND_WITH_INV;
- break;
- case IB_WR_SEND_WITH_IMM:
- hr_op = HNS_ROCE_V2_WQE_OP_SEND_WITH_IMM;
- break;
- case IB_WR_LOCAL_INV:
- hr_op = HNS_ROCE_V2_WQE_OP_LOCAL_INV;
- roce_set_bit(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_SO_S, 1);
- rc_sq_wqe->inv_key =
- cpu_to_le32(wr->ex.invalidate_rkey);
- break;
- case IB_WR_REG_MR:
- hr_op = HNS_ROCE_V2_WQE_OP_FAST_REG_PMR;
- fseg = wqe;
- set_frmr_seg(rc_sq_wqe, fseg, reg_wr(wr));
- break;
- case IB_WR_ATOMIC_CMP_AND_SWP:
- hr_op = HNS_ROCE_V2_WQE_OP_ATOM_CMP_AND_SWAP;
- rc_sq_wqe->rkey =
- cpu_to_le32(atomic_wr(wr)->rkey);
- rc_sq_wqe->va =
- cpu_to_le64(atomic_wr(wr)->remote_addr);
- break;
- case IB_WR_ATOMIC_FETCH_AND_ADD:
- hr_op = HNS_ROCE_V2_WQE_OP_ATOM_FETCH_AND_ADD;
- rc_sq_wqe->rkey =
- cpu_to_le32(atomic_wr(wr)->rkey);
- rc_sq_wqe->va =
- cpu_to_le64(atomic_wr(wr)->remote_addr);
- break;
- case IB_WR_MASKED_ATOMIC_CMP_AND_SWP:
- hr_op =
- HNS_ROCE_V2_WQE_OP_ATOM_MSK_CMP_AND_SWAP;
- break;
- case IB_WR_MASKED_ATOMIC_FETCH_AND_ADD:
- hr_op =
- HNS_ROCE_V2_WQE_OP_ATOM_MSK_FETCH_AND_ADD;
- break;
- default:
- hr_op = HNS_ROCE_V2_WQE_OP_MASK;
- break;
- }
-
- roce_set_field(rc_sq_wqe->byte_4,
- V2_RC_SEND_WQE_BYTE_4_OPCODE_M,
- V2_RC_SEND_WQE_BYTE_4_OPCODE_S, hr_op);
-
- if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
- wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD) {
- struct hns_roce_v2_wqe_data_seg *dseg;
-
- dseg = wqe;
- set_data_seg_v2(dseg, wr->sg_list);
- wqe += sizeof(struct hns_roce_v2_wqe_data_seg);
- set_atomic_seg(wqe, atomic_wr(wr));
- roce_set_field(rc_sq_wqe->byte_16,
- V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M,
- V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S,
- valid_num_sge);
- } else if (wr->opcode != IB_WR_REG_MR) {
- ret = set_rwqe_data_seg(ibqp, wr, rc_sq_wqe,
- wqe, &sge_idx,
- valid_num_sge, bad_wr);
- if (ret)
- goto out;
- }
- } else {
- dev_err(dev, "Illegal qp_type(0x%x)\n", ibqp->qp_type);
- spin_unlock_irqrestore(&qp->sq.lock, flags);
+ if (ret) {
*bad_wr = wr;
- return -EOPNOTSUPP;
+ goto out;
}
}
out:
if (likely(nreq)) {
qp->sq.head += nreq;
+ qp->next_sge = sge_idx;
/* Memory barrier */
wmb();
-
- sq_db.byte_4 = 0;
- sq_db.parameter = 0;
-
- roce_set_field(sq_db.byte_4, V2_DB_BYTE_4_TAG_M,
- V2_DB_BYTE_4_TAG_S, qp->doorbell_qpn);
- roce_set_field(sq_db.byte_4, V2_DB_BYTE_4_CMD_M,
- V2_DB_BYTE_4_CMD_S, HNS_ROCE_V2_SQ_DB);
- roce_set_field(sq_db.parameter, V2_DB_PARAMETER_IDX_M,
- V2_DB_PARAMETER_IDX_S,
- qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
- roce_set_field(sq_db.parameter, V2_DB_PARAMETER_SL_M,
- V2_DB_PARAMETER_SL_S, qp->sl);
-
- hns_roce_write64(hr_dev, (__le32 *)&sq_db, qp->sq.db_reg_l);
-
- qp->next_sge = sge_idx;
-
- if (qp->state == IB_QPS_ERR) {
- attr_mask = IB_QP_STATE;
- attr.qp_state = IB_QPS_ERR;
-
- ret = hns_roce_v2_modify_qp(&qp->ibqp, &attr, attr_mask,
- qp->state, IB_QPS_ERR);
- if (ret) {
- spin_unlock_irqrestore(&qp->sq.lock, flags);
- *bad_wr = wr;
- return ret;
- }
- }
+ update_sq_db(hr_dev, qp);
}
spin_unlock_irqrestore(&qp->sq.lock, flags);
@@ -643,13 +621,11 @@ static int hns_roce_v2_post_recv(struct ib_qp *ibqp,
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct hns_roce_v2_wqe_data_seg *dseg;
struct hns_roce_rinl_sge *sge_list;
- struct device *dev = hr_dev->dev;
- struct ib_qp_attr attr;
unsigned long flags;
void *wqe = NULL;
- int attr_mask;
u32 wqe_idx;
int nreq;
int ret;
@@ -675,14 +651,14 @@ static int hns_roce_v2_post_recv(struct ib_qp *ibqp,
wqe_idx = (hr_qp->rq.head + nreq) & (hr_qp->rq.wqe_cnt - 1);
if (unlikely(wr->num_sge > hr_qp->rq.max_gs)) {
- dev_err(dev, "rq:num_sge=%d > qp->sq.max_gs=%d\n",
- wr->num_sge, hr_qp->rq.max_gs);
+ ibdev_err(ibdev, "rq:num_sge=%d >= qp->sq.max_gs=%d\n",
+ wr->num_sge, hr_qp->rq.max_gs);
ret = -EINVAL;
*bad_wr = wr;
goto out;
}
- wqe = get_recv_wqe(hr_qp, wqe_idx);
+ wqe = hns_roce_get_recv_wqe(hr_qp, wqe_idx);
dseg = (struct hns_roce_v2_wqe_data_seg *)wqe;
for (i = 0; i < wr->num_sge; i++) {
if (!wr->sg_list[i].length)
@@ -717,20 +693,21 @@ static int hns_roce_v2_post_recv(struct ib_qp *ibqp,
/* Memory barrier */
wmb();
- *hr_qp->rdb.db_record = hr_qp->rq.head & 0xffff;
-
+ /*
+ * Hip08 hardware cannot flush the WQEs in RQ if the QP state
+ * gets into errored mode. Hence, as a workaround to this
+ * hardware limitation, driver needs to assist in flushing. But
+ * the flushing operation uses mailbox to convey the QP state to
+ * the hardware and which can sleep due to the mutex protection
+ * around the mailbox calls. Hence, use the deferred flush for
+ * now.
+ */
if (hr_qp->state == IB_QPS_ERR) {
- attr_mask = IB_QP_STATE;
- attr.qp_state = IB_QPS_ERR;
-
- ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, &attr,
- attr_mask, hr_qp->state,
- IB_QPS_ERR);
- if (ret) {
- spin_unlock_irqrestore(&hr_qp->rq.lock, flags);
- *bad_wr = wr;
- return ret;
- }
+ if (!test_and_set_bit(HNS_ROCE_FLUSH_FLAG,
+ &hr_qp->flush_flag))
+ init_flush_work(hr_dev, hr_qp);
+ } else {
+ *hr_qp->rdb.db_record = hr_qp->rq.head & 0xffff;
}
}
spin_unlock_irqrestore(&hr_qp->rq.lock, flags);
@@ -1448,82 +1425,63 @@ static int hns_roce_alloc_vf_resource(struct hns_roce_dev *hr_dev)
desc[i].flag |= cpu_to_le16(HNS_ROCE_CMD_FLAG_NEXT);
else
desc[i].flag &= ~cpu_to_le16(HNS_ROCE_CMD_FLAG_NEXT);
-
- if (i == 0) {
- roce_set_field(req_a->vf_qpc_bt_idx_num,
- VF_RES_A_DATA_1_VF_QPC_BT_IDX_M,
- VF_RES_A_DATA_1_VF_QPC_BT_IDX_S, 0);
- roce_set_field(req_a->vf_qpc_bt_idx_num,
- VF_RES_A_DATA_1_VF_QPC_BT_NUM_M,
- VF_RES_A_DATA_1_VF_QPC_BT_NUM_S,
- HNS_ROCE_VF_QPC_BT_NUM);
-
- roce_set_field(req_a->vf_srqc_bt_idx_num,
- VF_RES_A_DATA_2_VF_SRQC_BT_IDX_M,
- VF_RES_A_DATA_2_VF_SRQC_BT_IDX_S, 0);
- roce_set_field(req_a->vf_srqc_bt_idx_num,
- VF_RES_A_DATA_2_VF_SRQC_BT_NUM_M,
- VF_RES_A_DATA_2_VF_SRQC_BT_NUM_S,
- HNS_ROCE_VF_SRQC_BT_NUM);
-
- roce_set_field(req_a->vf_cqc_bt_idx_num,
- VF_RES_A_DATA_3_VF_CQC_BT_IDX_M,
- VF_RES_A_DATA_3_VF_CQC_BT_IDX_S, 0);
- roce_set_field(req_a->vf_cqc_bt_idx_num,
- VF_RES_A_DATA_3_VF_CQC_BT_NUM_M,
- VF_RES_A_DATA_3_VF_CQC_BT_NUM_S,
- HNS_ROCE_VF_CQC_BT_NUM);
-
- roce_set_field(req_a->vf_mpt_bt_idx_num,
- VF_RES_A_DATA_4_VF_MPT_BT_IDX_M,
- VF_RES_A_DATA_4_VF_MPT_BT_IDX_S, 0);
- roce_set_field(req_a->vf_mpt_bt_idx_num,
- VF_RES_A_DATA_4_VF_MPT_BT_NUM_M,
- VF_RES_A_DATA_4_VF_MPT_BT_NUM_S,
- HNS_ROCE_VF_MPT_BT_NUM);
-
- roce_set_field(req_a->vf_eqc_bt_idx_num,
- VF_RES_A_DATA_5_VF_EQC_IDX_M,
- VF_RES_A_DATA_5_VF_EQC_IDX_S, 0);
- roce_set_field(req_a->vf_eqc_bt_idx_num,
- VF_RES_A_DATA_5_VF_EQC_NUM_M,
- VF_RES_A_DATA_5_VF_EQC_NUM_S,
- HNS_ROCE_VF_EQC_NUM);
- } else {
- roce_set_field(req_b->vf_smac_idx_num,
- VF_RES_B_DATA_1_VF_SMAC_IDX_M,
- VF_RES_B_DATA_1_VF_SMAC_IDX_S, 0);
- roce_set_field(req_b->vf_smac_idx_num,
- VF_RES_B_DATA_1_VF_SMAC_NUM_M,
- VF_RES_B_DATA_1_VF_SMAC_NUM_S,
- HNS_ROCE_VF_SMAC_NUM);
-
- roce_set_field(req_b->vf_sgid_idx_num,
- VF_RES_B_DATA_2_VF_SGID_IDX_M,
- VF_RES_B_DATA_2_VF_SGID_IDX_S, 0);
- roce_set_field(req_b->vf_sgid_idx_num,
- VF_RES_B_DATA_2_VF_SGID_NUM_M,
- VF_RES_B_DATA_2_VF_SGID_NUM_S,
- HNS_ROCE_VF_SGID_NUM);
-
- roce_set_field(req_b->vf_qid_idx_sl_num,
- VF_RES_B_DATA_3_VF_QID_IDX_M,
- VF_RES_B_DATA_3_VF_QID_IDX_S, 0);
- roce_set_field(req_b->vf_qid_idx_sl_num,
- VF_RES_B_DATA_3_VF_SL_NUM_M,
- VF_RES_B_DATA_3_VF_SL_NUM_S,
- HNS_ROCE_VF_SL_NUM);
-
- roce_set_field(req_b->vf_sccc_idx_num,
- VF_RES_B_DATA_4_VF_SCCC_BT_IDX_M,
- VF_RES_B_DATA_4_VF_SCCC_BT_IDX_S, 0);
- roce_set_field(req_b->vf_sccc_idx_num,
- VF_RES_B_DATA_4_VF_SCCC_BT_NUM_M,
- VF_RES_B_DATA_4_VF_SCCC_BT_NUM_S,
- HNS_ROCE_VF_SCCC_BT_NUM);
- }
}
+ roce_set_field(req_a->vf_qpc_bt_idx_num,
+ VF_RES_A_DATA_1_VF_QPC_BT_IDX_M,
+ VF_RES_A_DATA_1_VF_QPC_BT_IDX_S, 0);
+ roce_set_field(req_a->vf_qpc_bt_idx_num,
+ VF_RES_A_DATA_1_VF_QPC_BT_NUM_M,
+ VF_RES_A_DATA_1_VF_QPC_BT_NUM_S, HNS_ROCE_VF_QPC_BT_NUM);
+
+ roce_set_field(req_a->vf_srqc_bt_idx_num,
+ VF_RES_A_DATA_2_VF_SRQC_BT_IDX_M,
+ VF_RES_A_DATA_2_VF_SRQC_BT_IDX_S, 0);
+ roce_set_field(req_a->vf_srqc_bt_idx_num,
+ VF_RES_A_DATA_2_VF_SRQC_BT_NUM_M,
+ VF_RES_A_DATA_2_VF_SRQC_BT_NUM_S,
+ HNS_ROCE_VF_SRQC_BT_NUM);
+
+ roce_set_field(req_a->vf_cqc_bt_idx_num,
+ VF_RES_A_DATA_3_VF_CQC_BT_IDX_M,
+ VF_RES_A_DATA_3_VF_CQC_BT_IDX_S, 0);
+ roce_set_field(req_a->vf_cqc_bt_idx_num,
+ VF_RES_A_DATA_3_VF_CQC_BT_NUM_M,
+ VF_RES_A_DATA_3_VF_CQC_BT_NUM_S, HNS_ROCE_VF_CQC_BT_NUM);
+
+ roce_set_field(req_a->vf_mpt_bt_idx_num,
+ VF_RES_A_DATA_4_VF_MPT_BT_IDX_M,
+ VF_RES_A_DATA_4_VF_MPT_BT_IDX_S, 0);
+ roce_set_field(req_a->vf_mpt_bt_idx_num,
+ VF_RES_A_DATA_4_VF_MPT_BT_NUM_M,
+ VF_RES_A_DATA_4_VF_MPT_BT_NUM_S, HNS_ROCE_VF_MPT_BT_NUM);
+
+ roce_set_field(req_a->vf_eqc_bt_idx_num, VF_RES_A_DATA_5_VF_EQC_IDX_M,
+ VF_RES_A_DATA_5_VF_EQC_IDX_S, 0);
+ roce_set_field(req_a->vf_eqc_bt_idx_num, VF_RES_A_DATA_5_VF_EQC_NUM_M,
+ VF_RES_A_DATA_5_VF_EQC_NUM_S, HNS_ROCE_VF_EQC_NUM);
+
+ roce_set_field(req_b->vf_smac_idx_num, VF_RES_B_DATA_1_VF_SMAC_IDX_M,
+ VF_RES_B_DATA_1_VF_SMAC_IDX_S, 0);
+ roce_set_field(req_b->vf_smac_idx_num, VF_RES_B_DATA_1_VF_SMAC_NUM_M,
+ VF_RES_B_DATA_1_VF_SMAC_NUM_S, HNS_ROCE_VF_SMAC_NUM);
+
+ roce_set_field(req_b->vf_sgid_idx_num, VF_RES_B_DATA_2_VF_SGID_IDX_M,
+ VF_RES_B_DATA_2_VF_SGID_IDX_S, 0);
+ roce_set_field(req_b->vf_sgid_idx_num, VF_RES_B_DATA_2_VF_SGID_NUM_M,
+ VF_RES_B_DATA_2_VF_SGID_NUM_S, HNS_ROCE_VF_SGID_NUM);
+
+ roce_set_field(req_b->vf_qid_idx_sl_num, VF_RES_B_DATA_3_VF_QID_IDX_M,
+ VF_RES_B_DATA_3_VF_QID_IDX_S, 0);
+ roce_set_field(req_b->vf_qid_idx_sl_num, VF_RES_B_DATA_3_VF_SL_NUM_M,
+ VF_RES_B_DATA_3_VF_SL_NUM_S, HNS_ROCE_VF_SL_NUM);
+
+ roce_set_field(req_b->vf_sccc_idx_num, VF_RES_B_DATA_4_VF_SCCC_BT_IDX_M,
+ VF_RES_B_DATA_4_VF_SCCC_BT_IDX_S, 0);
+ roce_set_field(req_b->vf_sccc_idx_num, VF_RES_B_DATA_4_VF_SCCC_BT_NUM_M,
+ VF_RES_B_DATA_4_VF_SCCC_BT_NUM_S,
+ HNS_ROCE_VF_SCCC_BT_NUM);
+
return hns_roce_cmq_send(hr_dev, desc, 2);
}
@@ -1691,7 +1649,7 @@ static void set_default_caps(struct hns_roce_dev *hr_dev)
caps->max_srq_wrs = HNS_ROCE_V2_MAX_SRQ_WR;
caps->max_srq_sges = HNS_ROCE_V2_MAX_SRQ_SGE;
- if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
caps->flags |= HNS_ROCE_CAP_FLAG_ATOMIC | HNS_ROCE_CAP_FLAG_MW |
HNS_ROCE_CAP_FLAG_SRQ | HNS_ROCE_CAP_FLAG_FRMR |
HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL;
@@ -1939,7 +1897,7 @@ static int hns_roce_query_pf_caps(struct hns_roce_dev *hr_dev)
caps->srqc_bt_num, &caps->srqc_buf_pg_sz,
&caps->srqc_ba_pg_sz, HEM_TYPE_SRQC);
- if (hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_B) {
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
caps->sccc_hop_num = ctx_hop_num;
caps->qpc_timer_hop_num = HNS_ROCE_HOP_NUM_0;
caps->cqc_timer_hop_num = HNS_ROCE_HOP_NUM_0;
@@ -1999,7 +1957,7 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
return ret;
}
- if (hr_dev->pci_dev->revision == 0x21) {
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B) {
ret = hns_roce_query_pf_timer_resource(hr_dev);
if (ret) {
dev_err(hr_dev->dev,
@@ -2007,16 +1965,7 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
ret);
return ret;
}
- }
- ret = hns_roce_alloc_vf_resource(hr_dev);
- if (ret) {
- dev_err(hr_dev->dev, "Allocate vf resource fail, ret = %d.\n",
- ret);
- return ret;
- }
-
- if (hr_dev->pci_dev->revision == 0x21) {
ret = hns_roce_set_vf_switch_param(hr_dev, 0);
if (ret) {
dev_err(hr_dev->dev,
@@ -2046,6 +1995,13 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
if (ret)
set_default_caps(hr_dev);
+ ret = hns_roce_alloc_vf_resource(hr_dev);
+ if (ret) {
+ dev_err(hr_dev->dev, "Allocate vf resource fail, ret = %d.\n",
+ ret);
+ return ret;
+ }
+
ret = hns_roce_v2_set_bt(hr_dev);
if (ret)
dev_err(hr_dev->dev, "Configure bt attribute fail, ret = %d.\n",
@@ -2298,7 +2254,7 @@ static void hns_roce_v2_exit(struct hns_roce_dev *hr_dev)
{
struct hns_roce_v2_priv *priv = hr_dev->priv;
- if (hr_dev->pci_dev->revision == 0x21)
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B)
hns_roce_function_clear(hr_dev);
hns_roce_free_link_table(hr_dev, &priv->tpq);
@@ -2461,7 +2417,9 @@ static int hns_roce_v2_set_gid(struct hns_roce_dev *hr_dev, u8 port,
ret = hns_roce_config_sgid_table(hr_dev, gid_index, gid, sgid_type);
if (ret)
- dev_err(hr_dev->dev, "Configure sgid table failed(%d)!\n", ret);
+ ibdev_err(&hr_dev->ib_dev,
+ "failed to configure sgid table, ret = %d!\n",
+ ret);
return ret;
}
@@ -2757,7 +2715,7 @@ static void hns_roce_free_srq_wqe(struct hns_roce_srq *srq, int wqe_index)
static void hns_roce_v2_cq_set_ci(struct hns_roce_cq *hr_cq, u32 cons_index)
{
- *hr_cq->set_ci_db = cons_index & 0xffffff;
+ *hr_cq->set_ci_db = cons_index & V2_CQ_DB_PARAMETER_CONS_IDX_M;
}
static void __hns_roce_v2_cq_clean(struct hns_roce_cq *hr_cq, u32 qpn,
@@ -2942,7 +2900,7 @@ static int hns_roce_handle_recv_inl_wqe(struct hns_roce_v2_cqe *cqe,
sge_list = (*cur_qp)->rq_inl_buf.wqe_list[wr_cnt].sg_list;
sge_num = (*cur_qp)->rq_inl_buf.wqe_list[wr_cnt].sge_cnt;
- wqe_buf = get_recv_wqe(*cur_qp, wr_cnt);
+ wqe_buf = hns_roce_get_recv_wqe(*cur_qp, wr_cnt);
data_len = wc->byte_len;
for (sge_cnt = 0; (sge_cnt < sge_num) && (data_len); sge_cnt++) {
@@ -3013,13 +2971,11 @@ static int hns_roce_v2_sw_poll_cq(struct hns_roce_cq *hr_cq, int num_entries,
static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq,
struct hns_roce_qp **cur_qp, struct ib_wc *wc)
{
+ struct hns_roce_dev *hr_dev = to_hr_dev(hr_cq->ib_cq.device);
struct hns_roce_srq *srq = NULL;
- struct hns_roce_dev *hr_dev;
struct hns_roce_v2_cqe *cqe;
struct hns_roce_qp *hr_qp;
struct hns_roce_wq *wq;
- struct ib_qp_attr attr;
- int attr_mask;
int is_send;
u16 wqe_ctr;
u32 opcode;
@@ -3043,16 +2999,17 @@ static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq,
V2_CQE_BYTE_16_LCL_QPN_S);
if (!*cur_qp || (qpn & HNS_ROCE_V2_CQE_QPN_MASK) != (*cur_qp)->qpn) {
- hr_dev = to_hr_dev(hr_cq->ib_cq.device);
hr_qp = __hns_roce_qp_lookup(hr_dev, qpn);
if (unlikely(!hr_qp)) {
- dev_err(hr_dev->dev, "CQ %06lx with entry for unknown QPN %06x\n",
- hr_cq->cqn, (qpn & HNS_ROCE_V2_CQE_QPN_MASK));
+ ibdev_err(&hr_dev->ib_dev,
+ "CQ %06lx with entry for unknown QPN %06x\n",
+ hr_cq->cqn, qpn & HNS_ROCE_V2_CQE_QPN_MASK);
return -EINVAL;
}
*cur_qp = hr_qp;
}
+ hr_qp = *cur_qp;
wc->qp = &(*cur_qp)->ibqp;
wc->vendor_err = 0;
@@ -3137,14 +3094,24 @@ static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq,
break;
}
- /* flush cqe if wc status is error, excluding flush error */
- if ((wc->status != IB_WC_SUCCESS) &&
- (wc->status != IB_WC_WR_FLUSH_ERR)) {
- attr_mask = IB_QP_STATE;
- attr.qp_state = IB_QPS_ERR;
- return hns_roce_v2_modify_qp(&(*cur_qp)->ibqp,
- &attr, attr_mask,
- (*cur_qp)->state, IB_QPS_ERR);
+ /*
+ * Hip08 hardware cannot flush the WQEs in SQ/RQ if the QP state gets
+ * into errored mode. Hence, as a workaround to this hardware
+ * limitation, driver needs to assist in flushing. But the flushing
+ * operation uses mailbox to convey the QP state to the hardware and
+ * which can sleep due to the mutex protection around the mailbox calls.
+ * Hence, use the deferred flush for now. Once wc error detected, the
+ * flushing operation is needed.
+ */
+ if (wc->status != IB_WC_SUCCESS &&
+ wc->status != IB_WC_WR_FLUSH_ERR) {
+ ibdev_err(&hr_dev->ib_dev, "error cqe status is: 0x%x\n",
+ status & HNS_ROCE_V2_CQE_STATUS_MASK);
+
+ if (!test_and_set_bit(HNS_ROCE_FLUSH_FLAG, &hr_qp->flush_flag))
+ init_flush_work(hr_dev, hr_qp);
+
+ return 0;
}
if (wc->status == IB_WC_WR_FLUSH_ERR)
@@ -3262,14 +3229,7 @@ static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq,
wc->port_num = roce_get_field(cqe->byte_32,
V2_CQE_BYTE_32_PORTN_M, V2_CQE_BYTE_32_PORTN_S);
wc->pkey_index = 0;
- memcpy(wc->smac, cqe->smac, 4);
- wc->smac[4] = roce_get_field(cqe->byte_28,
- V2_CQE_BYTE_28_SMAC_4_M,
- V2_CQE_BYTE_28_SMAC_4_S);
- wc->smac[5] = roce_get_field(cqe->byte_28,
- V2_CQE_BYTE_28_SMAC_5_M,
- V2_CQE_BYTE_28_SMAC_5_S);
- wc->wc_flags |= IB_WC_WITH_SMAC;
+
if (roce_get_bit(cqe->byte_28, V2_CQE_BYTE_28_VID_VLD_S)) {
wc->vlan_id = (u16)roce_get_field(cqe->byte_28,
V2_CQE_BYTE_28_VID_M,
@@ -3567,14 +3527,9 @@ static void set_qpc_wqe_cnt(struct hns_roce_qp *hr_qp,
HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE ?
ilog2((unsigned int)hr_qp->sge.sge_cnt) : 0);
- roce_set_field(qpc_mask->byte_4_sqpn_tst, V2_QPC_BYTE_4_SGE_SHIFT_M,
- V2_QPC_BYTE_4_SGE_SHIFT_S, 0);
-
roce_set_field(context->byte_20_smac_sgid_idx,
V2_QPC_BYTE_20_SQ_SHIFT_M, V2_QPC_BYTE_20_SQ_SHIFT_S,
ilog2((unsigned int)hr_qp->sq.wqe_cnt));
- roce_set_field(qpc_mask->byte_20_smac_sgid_idx,
- V2_QPC_BYTE_20_SQ_SHIFT_M, V2_QPC_BYTE_20_SQ_SHIFT_S, 0);
roce_set_field(context->byte_20_smac_sgid_idx,
V2_QPC_BYTE_20_RQ_SHIFT_M, V2_QPC_BYTE_20_RQ_SHIFT_S,
@@ -3582,9 +3537,6 @@ static void set_qpc_wqe_cnt(struct hns_roce_qp *hr_qp,
hr_qp->ibqp.qp_type == IB_QPT_XRC_TGT ||
hr_qp->ibqp.srq) ? 0 :
ilog2((unsigned int)hr_qp->rq.wqe_cnt));
-
- roce_set_field(qpc_mask->byte_20_smac_sgid_idx,
- V2_QPC_BYTE_20_RQ_SHIFT_M, V2_QPC_BYTE_20_RQ_SHIFT_S, 0);
}
static void modify_qp_reset_to_init(struct ib_qp *ibqp,
@@ -3604,280 +3556,53 @@ static void modify_qp_reset_to_init(struct ib_qp *ibqp,
*/
roce_set_field(context->byte_4_sqpn_tst, V2_QPC_BYTE_4_TST_M,
V2_QPC_BYTE_4_TST_S, to_hr_qp_type(hr_qp->ibqp.qp_type));
- roce_set_field(qpc_mask->byte_4_sqpn_tst, V2_QPC_BYTE_4_TST_M,
- V2_QPC_BYTE_4_TST_S, 0);
roce_set_field(context->byte_4_sqpn_tst, V2_QPC_BYTE_4_SQPN_M,
V2_QPC_BYTE_4_SQPN_S, hr_qp->qpn);
- roce_set_field(qpc_mask->byte_4_sqpn_tst, V2_QPC_BYTE_4_SQPN_M,
- V2_QPC_BYTE_4_SQPN_S, 0);
roce_set_field(context->byte_16_buf_ba_pg_sz, V2_QPC_BYTE_16_PD_M,
V2_QPC_BYTE_16_PD_S, to_hr_pd(ibqp->pd)->pdn);
- roce_set_field(qpc_mask->byte_16_buf_ba_pg_sz, V2_QPC_BYTE_16_PD_M,
- V2_QPC_BYTE_16_PD_S, 0);
roce_set_field(context->byte_20_smac_sgid_idx, V2_QPC_BYTE_20_RQWS_M,
V2_QPC_BYTE_20_RQWS_S, ilog2(hr_qp->rq.max_gs));
- roce_set_field(qpc_mask->byte_20_smac_sgid_idx, V2_QPC_BYTE_20_RQWS_M,
- V2_QPC_BYTE_20_RQWS_S, 0);
set_qpc_wqe_cnt(hr_qp, context, qpc_mask);
/* No VLAN need to set 0xFFF */
roce_set_field(context->byte_24_mtu_tc, V2_QPC_BYTE_24_VLAN_ID_M,
V2_QPC_BYTE_24_VLAN_ID_S, 0xfff);
- roce_set_field(qpc_mask->byte_24_mtu_tc, V2_QPC_BYTE_24_VLAN_ID_M,
- V2_QPC_BYTE_24_VLAN_ID_S, 0);
- /*
- * Set some fields in context to zero, Because the default values
- * of all fields in context are zero, we need not set them to 0 again.
- * but we should set the relevant fields of context mask to 0.
- */
- roce_set_bit(qpc_mask->byte_56_dqpn_err, V2_QPC_BYTE_56_SQ_TX_ERR_S, 0);
- roce_set_bit(qpc_mask->byte_56_dqpn_err, V2_QPC_BYTE_56_SQ_RX_ERR_S, 0);
- roce_set_bit(qpc_mask->byte_56_dqpn_err, V2_QPC_BYTE_56_RQ_TX_ERR_S, 0);
- roce_set_bit(qpc_mask->byte_56_dqpn_err, V2_QPC_BYTE_56_RQ_RX_ERR_S, 0);
-
- roce_set_field(qpc_mask->byte_60_qpst_tempid, V2_QPC_BYTE_60_TEMPID_M,
- V2_QPC_BYTE_60_TEMPID_S, 0);
-
- roce_set_field(qpc_mask->byte_60_qpst_tempid,
- V2_QPC_BYTE_60_SCC_TOKEN_M, V2_QPC_BYTE_60_SCC_TOKEN_S,
- 0);
- roce_set_bit(qpc_mask->byte_60_qpst_tempid,
- V2_QPC_BYTE_60_SQ_DB_DOING_S, 0);
- roce_set_bit(qpc_mask->byte_60_qpst_tempid,
- V2_QPC_BYTE_60_RQ_DB_DOING_S, 0);
- roce_set_bit(qpc_mask->byte_28_at_fl, V2_QPC_BYTE_28_CNP_TX_FLAG_S, 0);
- roce_set_bit(qpc_mask->byte_28_at_fl, V2_QPC_BYTE_28_CE_FLAG_S, 0);
-
- if (hr_qp->rdb_en) {
+ if (hr_qp->rdb_en)
roce_set_bit(context->byte_68_rq_db,
V2_QPC_BYTE_68_RQ_RECORD_EN_S, 1);
- roce_set_bit(qpc_mask->byte_68_rq_db,
- V2_QPC_BYTE_68_RQ_RECORD_EN_S, 0);
- }
roce_set_field(context->byte_68_rq_db,
V2_QPC_BYTE_68_RQ_DB_RECORD_ADDR_M,
V2_QPC_BYTE_68_RQ_DB_RECORD_ADDR_S,
((u32)hr_qp->rdb.dma) >> 1);
- roce_set_field(qpc_mask->byte_68_rq_db,
- V2_QPC_BYTE_68_RQ_DB_RECORD_ADDR_M,
- V2_QPC_BYTE_68_RQ_DB_RECORD_ADDR_S, 0);
context->rq_db_record_addr = cpu_to_le32(hr_qp->rdb.dma >> 32);
- qpc_mask->rq_db_record_addr = 0;
roce_set_bit(context->byte_76_srqn_op_en, V2_QPC_BYTE_76_RQIE_S,
(hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) ? 1 : 0);
- roce_set_bit(qpc_mask->byte_76_srqn_op_en, V2_QPC_BYTE_76_RQIE_S, 0);
roce_set_field(context->byte_80_rnr_rx_cqn, V2_QPC_BYTE_80_RX_CQN_M,
V2_QPC_BYTE_80_RX_CQN_S, to_hr_cq(ibqp->recv_cq)->cqn);
- roce_set_field(qpc_mask->byte_80_rnr_rx_cqn, V2_QPC_BYTE_80_RX_CQN_M,
- V2_QPC_BYTE_80_RX_CQN_S, 0);
if (ibqp->srq) {
roce_set_field(context->byte_76_srqn_op_en,
V2_QPC_BYTE_76_SRQN_M, V2_QPC_BYTE_76_SRQN_S,
to_hr_srq(ibqp->srq)->srqn);
- roce_set_field(qpc_mask->byte_76_srqn_op_en,
- V2_QPC_BYTE_76_SRQN_M, V2_QPC_BYTE_76_SRQN_S, 0);
roce_set_bit(context->byte_76_srqn_op_en,
V2_QPC_BYTE_76_SRQ_EN_S, 1);
- roce_set_bit(qpc_mask->byte_76_srqn_op_en,
- V2_QPC_BYTE_76_SRQ_EN_S, 0);
}
- roce_set_field(qpc_mask->byte_84_rq_ci_pi,
- V2_QPC_BYTE_84_RQ_PRODUCER_IDX_M,
- V2_QPC_BYTE_84_RQ_PRODUCER_IDX_S, 0);
- roce_set_field(qpc_mask->byte_84_rq_ci_pi,
- V2_QPC_BYTE_84_RQ_CONSUMER_IDX_M,
- V2_QPC_BYTE_84_RQ_CONSUMER_IDX_S, 0);
-
- roce_set_field(qpc_mask->byte_92_srq_info, V2_QPC_BYTE_92_SRQ_INFO_M,
- V2_QPC_BYTE_92_SRQ_INFO_S, 0);
-
- roce_set_field(qpc_mask->byte_96_rx_reqmsn, V2_QPC_BYTE_96_RX_REQ_MSN_M,
- V2_QPC_BYTE_96_RX_REQ_MSN_S, 0);
-
- roce_set_field(qpc_mask->byte_104_rq_sge,
- V2_QPC_BYTE_104_RQ_CUR_WQE_SGE_NUM_M,
- V2_QPC_BYTE_104_RQ_CUR_WQE_SGE_NUM_S, 0);
-
- roce_set_bit(qpc_mask->byte_108_rx_reqepsn,
- V2_QPC_BYTE_108_RX_REQ_PSN_ERR_S, 0);
- roce_set_field(qpc_mask->byte_108_rx_reqepsn,
- V2_QPC_BYTE_108_RX_REQ_LAST_OPTYPE_M,
- V2_QPC_BYTE_108_RX_REQ_LAST_OPTYPE_S, 0);
- roce_set_bit(qpc_mask->byte_108_rx_reqepsn,
- V2_QPC_BYTE_108_RX_REQ_RNR_S, 0);
-
- qpc_mask->rq_rnr_timer = 0;
- qpc_mask->rx_msg_len = 0;
- qpc_mask->rx_rkey_pkt_info = 0;
- qpc_mask->rx_va = 0;
-
- roce_set_field(qpc_mask->byte_132_trrl, V2_QPC_BYTE_132_TRRL_HEAD_MAX_M,
- V2_QPC_BYTE_132_TRRL_HEAD_MAX_S, 0);
- roce_set_field(qpc_mask->byte_132_trrl, V2_QPC_BYTE_132_TRRL_TAIL_MAX_M,
- V2_QPC_BYTE_132_TRRL_TAIL_MAX_S, 0);
-
- roce_set_bit(qpc_mask->byte_140_raq, V2_QPC_BYTE_140_RQ_RTY_WAIT_DO_S,
- 0);
- roce_set_field(qpc_mask->byte_140_raq, V2_QPC_BYTE_140_RAQ_TRRL_HEAD_M,
- V2_QPC_BYTE_140_RAQ_TRRL_HEAD_S, 0);
- roce_set_field(qpc_mask->byte_140_raq, V2_QPC_BYTE_140_RAQ_TRRL_TAIL_M,
- V2_QPC_BYTE_140_RAQ_TRRL_TAIL_S, 0);
-
- roce_set_field(qpc_mask->byte_144_raq,
- V2_QPC_BYTE_144_RAQ_RTY_INI_PSN_M,
- V2_QPC_BYTE_144_RAQ_RTY_INI_PSN_S, 0);
- roce_set_field(qpc_mask->byte_144_raq, V2_QPC_BYTE_144_RAQ_CREDIT_M,
- V2_QPC_BYTE_144_RAQ_CREDIT_S, 0);
- roce_set_bit(qpc_mask->byte_144_raq, V2_QPC_BYTE_144_RESP_RTY_FLG_S, 0);
-
- roce_set_field(qpc_mask->byte_148_raq, V2_QPC_BYTE_148_RQ_MSN_M,
- V2_QPC_BYTE_148_RQ_MSN_S, 0);
- roce_set_field(qpc_mask->byte_148_raq, V2_QPC_BYTE_148_RAQ_SYNDROME_M,
- V2_QPC_BYTE_148_RAQ_SYNDROME_S, 0);
-
- roce_set_field(qpc_mask->byte_152_raq, V2_QPC_BYTE_152_RAQ_PSN_M,
- V2_QPC_BYTE_152_RAQ_PSN_S, 0);
- roce_set_field(qpc_mask->byte_152_raq,
- V2_QPC_BYTE_152_RAQ_TRRL_RTY_HEAD_M,
- V2_QPC_BYTE_152_RAQ_TRRL_RTY_HEAD_S, 0);
-
- roce_set_field(qpc_mask->byte_156_raq, V2_QPC_BYTE_156_RAQ_USE_PKTN_M,
- V2_QPC_BYTE_156_RAQ_USE_PKTN_S, 0);
-
- roce_set_field(qpc_mask->byte_160_sq_ci_pi,
- V2_QPC_BYTE_160_SQ_PRODUCER_IDX_M,
- V2_QPC_BYTE_160_SQ_PRODUCER_IDX_S, 0);
- roce_set_field(qpc_mask->byte_160_sq_ci_pi,
- V2_QPC_BYTE_160_SQ_CONSUMER_IDX_M,
- V2_QPC_BYTE_160_SQ_CONSUMER_IDX_S, 0);
-
- roce_set_bit(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_POLL_DB_WAIT_DO_S, 0);
- roce_set_bit(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_SCC_TOKEN_FORBID_SQ_DEQ_S, 0);
- roce_set_bit(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_WAIT_ACK_TIMEOUT_S, 0);
- roce_set_bit(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_MSG_RTY_LP_FLG_S, 0);
- roce_set_bit(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_SQ_INVLD_FLG_S, 0);
- roce_set_field(qpc_mask->byte_168_irrl_idx,
- V2_QPC_BYTE_168_IRRL_IDX_LSB_M,
- V2_QPC_BYTE_168_IRRL_IDX_LSB_S, 0);
-
roce_set_field(context->byte_172_sq_psn, V2_QPC_BYTE_172_ACK_REQ_FREQ_M,
V2_QPC_BYTE_172_ACK_REQ_FREQ_S, 4);
- roce_set_field(qpc_mask->byte_172_sq_psn,
- V2_QPC_BYTE_172_ACK_REQ_FREQ_M,
- V2_QPC_BYTE_172_ACK_REQ_FREQ_S, 0);
-
- roce_set_bit(qpc_mask->byte_172_sq_psn, V2_QPC_BYTE_172_MSG_RNR_FLG_S,
- 0);
roce_set_bit(context->byte_172_sq_psn, V2_QPC_BYTE_172_FRE_S, 1);
- roce_set_bit(qpc_mask->byte_172_sq_psn, V2_QPC_BYTE_172_FRE_S, 0);
-
- roce_set_field(qpc_mask->byte_176_msg_pktn,
- V2_QPC_BYTE_176_MSG_USE_PKTN_M,
- V2_QPC_BYTE_176_MSG_USE_PKTN_S, 0);
- roce_set_field(qpc_mask->byte_176_msg_pktn,
- V2_QPC_BYTE_176_IRRL_HEAD_PRE_M,
- V2_QPC_BYTE_176_IRRL_HEAD_PRE_S, 0);
-
- roce_set_field(qpc_mask->byte_184_irrl_idx,
- V2_QPC_BYTE_184_IRRL_IDX_MSB_M,
- V2_QPC_BYTE_184_IRRL_IDX_MSB_S, 0);
-
- qpc_mask->cur_sge_offset = 0;
-
- roce_set_field(qpc_mask->byte_192_ext_sge,
- V2_QPC_BYTE_192_CUR_SGE_IDX_M,
- V2_QPC_BYTE_192_CUR_SGE_IDX_S, 0);
- roce_set_field(qpc_mask->byte_192_ext_sge,
- V2_QPC_BYTE_192_EXT_SGE_NUM_LEFT_M,
- V2_QPC_BYTE_192_EXT_SGE_NUM_LEFT_S, 0);
-
- roce_set_field(qpc_mask->byte_196_sq_psn, V2_QPC_BYTE_196_IRRL_HEAD_M,
- V2_QPC_BYTE_196_IRRL_HEAD_S, 0);
-
- roce_set_field(qpc_mask->byte_200_sq_max, V2_QPC_BYTE_200_SQ_MAX_IDX_M,
- V2_QPC_BYTE_200_SQ_MAX_IDX_S, 0);
- roce_set_field(qpc_mask->byte_200_sq_max,
- V2_QPC_BYTE_200_LCL_OPERATED_CNT_M,
- V2_QPC_BYTE_200_LCL_OPERATED_CNT_S, 0);
-
- roce_set_bit(qpc_mask->byte_208_irrl, V2_QPC_BYTE_208_PKT_RNR_FLG_S, 0);
- roce_set_bit(qpc_mask->byte_208_irrl, V2_QPC_BYTE_208_PKT_RTY_FLG_S, 0);
-
- roce_set_field(qpc_mask->byte_212_lsn, V2_QPC_BYTE_212_CHECK_FLG_M,
- V2_QPC_BYTE_212_CHECK_FLG_S, 0);
-
- qpc_mask->sq_timer = 0;
-
- roce_set_field(qpc_mask->byte_220_retry_psn_msn,
- V2_QPC_BYTE_220_RETRY_MSG_MSN_M,
- V2_QPC_BYTE_220_RETRY_MSG_MSN_S, 0);
- roce_set_field(qpc_mask->byte_232_irrl_sge,
- V2_QPC_BYTE_232_IRRL_SGE_IDX_M,
- V2_QPC_BYTE_232_IRRL_SGE_IDX_S, 0);
-
- roce_set_bit(qpc_mask->byte_232_irrl_sge, V2_QPC_BYTE_232_SO_LP_VLD_S,
- 0);
- roce_set_bit(qpc_mask->byte_232_irrl_sge,
- V2_QPC_BYTE_232_FENCE_LP_VLD_S, 0);
- roce_set_bit(qpc_mask->byte_232_irrl_sge, V2_QPC_BYTE_232_IRRL_LP_VLD_S,
- 0);
-
- qpc_mask->irrl_cur_sge_offset = 0;
-
- roce_set_field(qpc_mask->byte_240_irrl_tail,
- V2_QPC_BYTE_240_IRRL_TAIL_REAL_M,
- V2_QPC_BYTE_240_IRRL_TAIL_REAL_S, 0);
- roce_set_field(qpc_mask->byte_240_irrl_tail,
- V2_QPC_BYTE_240_IRRL_TAIL_RD_M,
- V2_QPC_BYTE_240_IRRL_TAIL_RD_S, 0);
- roce_set_field(qpc_mask->byte_240_irrl_tail,
- V2_QPC_BYTE_240_RX_ACK_MSN_M,
- V2_QPC_BYTE_240_RX_ACK_MSN_S, 0);
-
- roce_set_field(qpc_mask->byte_248_ack_psn, V2_QPC_BYTE_248_IRRL_PSN_M,
- V2_QPC_BYTE_248_IRRL_PSN_S, 0);
- roce_set_bit(qpc_mask->byte_248_ack_psn, V2_QPC_BYTE_248_ACK_PSN_ERR_S,
- 0);
- roce_set_field(qpc_mask->byte_248_ack_psn,
- V2_QPC_BYTE_248_ACK_LAST_OPTYPE_M,
- V2_QPC_BYTE_248_ACK_LAST_OPTYPE_S, 0);
- roce_set_bit(qpc_mask->byte_248_ack_psn, V2_QPC_BYTE_248_IRRL_PSN_VLD_S,
- 0);
- roce_set_bit(qpc_mask->byte_248_ack_psn,
- V2_QPC_BYTE_248_RNR_RETRY_FLAG_S, 0);
- roce_set_bit(qpc_mask->byte_248_ack_psn, V2_QPC_BYTE_248_CQ_ERR_IND_S,
- 0);
hr_qp->access_flags = attr->qp_access_flags;
roce_set_field(context->byte_252_err_txcqn, V2_QPC_BYTE_252_TX_CQN_M,
V2_QPC_BYTE_252_TX_CQN_S, to_hr_cq(ibqp->send_cq)->cqn);
- roce_set_field(qpc_mask->byte_252_err_txcqn, V2_QPC_BYTE_252_TX_CQN_M,
- V2_QPC_BYTE_252_TX_CQN_S, 0);
-
- roce_set_field(qpc_mask->byte_252_err_txcqn, V2_QPC_BYTE_252_ERR_TYPE_M,
- V2_QPC_BYTE_252_ERR_TYPE_S, 0);
-
- roce_set_field(qpc_mask->byte_256_sqflush_rqcqe,
- V2_QPC_BYTE_256_RQ_CQE_IDX_M,
- V2_QPC_BYTE_256_RQ_CQE_IDX_S, 0);
- roce_set_field(qpc_mask->byte_256_sqflush_rqcqe,
- V2_QPC_BYTE_256_SQ_FLUSH_IDX_M,
- V2_QPC_BYTE_256_SQ_FLUSH_IDX_S, 0);
}
static void modify_qp_init_to_init(struct ib_qp *ibqp,
@@ -3987,21 +3712,22 @@ static bool check_wqe_rq_mtt_count(struct hns_roce_dev *hr_dev,
struct hns_roce_qp *hr_qp, int mtt_cnt,
u32 page_size)
{
- struct device *dev = hr_dev->dev;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
if (hr_qp->rq.wqe_cnt < 1)
return true;
if (mtt_cnt < 1) {
- dev_err(dev, "qp(0x%lx) rqwqe buf ba find failed\n",
- hr_qp->qpn);
+ ibdev_err(ibdev, "failed to find RQWQE buf ba of QP(0x%lx)\n",
+ hr_qp->qpn);
return false;
}
if (mtt_cnt < MTT_MIN_COUNT &&
(hr_qp->rq.offset + page_size) < hr_qp->buff_size) {
- dev_err(dev, "qp(0x%lx) next rqwqe buf ba find failed\n",
- hr_qp->qpn);
+ ibdev_err(ibdev,
+ "failed to find next RQWQE buf ba of QP(0x%lx)\n",
+ hr_qp->qpn);
return false;
}
@@ -4016,7 +3742,7 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
const struct ib_global_route *grh = rdma_ah_read_grh(&attr->ah_attr);
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
- struct device *dev = hr_dev->dev;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
u64 mtts[MTT_MIN_COUNT] = { 0 };
dma_addr_t dma_handle_3;
dma_addr_t dma_handle_2;
@@ -4043,7 +3769,7 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
mtts_2 = hns_roce_table_find(hr_dev, &hr_dev->qp_table.irrl_table,
hr_qp->qpn, &dma_handle_2);
if (!mtts_2) {
- dev_err(dev, "qp irrl_table find failed\n");
+ ibdev_err(ibdev, "failed to find QP irrl_table\n");
return -EINVAL;
}
@@ -4051,12 +3777,13 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
mtts_3 = hns_roce_table_find(hr_dev, &hr_dev->qp_table.trrl_table,
hr_qp->qpn, &dma_handle_3);
if (!mtts_3) {
- dev_err(dev, "qp trrl_table find failed\n");
+ ibdev_err(ibdev, "failed to find QP trrl_table\n");
return -EINVAL;
}
if (attr_mask & IB_QP_ALT_PATH) {
- dev_err(dev, "INIT2RTR attr_mask (0x%x) error\n", attr_mask);
+ ibdev_err(ibdev, "INIT2RTR attr_mask (0x%x) error\n",
+ attr_mask);
return -EINVAL;
}
@@ -4201,7 +3928,7 @@ static int modify_qp_init_to_rtr(struct ib_qp *ibqp,
/* mtu*(2^LP_PKTN_INI) should not bigger than 1 message length 64kb */
roce_set_field(context->byte_56_dqpn_err, V2_QPC_BYTE_56_LP_PKTN_INI_M,
- V2_QPC_BYTE_56_LP_PKTN_INI_S, 4);
+ V2_QPC_BYTE_56_LP_PKTN_INI_S, 0);
roce_set_field(qpc_mask->byte_56_dqpn_err, V2_QPC_BYTE_56_LP_PKTN_INI_M,
V2_QPC_BYTE_56_LP_PKTN_INI_S, 0);
@@ -4259,7 +3986,7 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
- struct device *dev = hr_dev->dev;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
u64 sge_cur_blk = 0;
u64 sq_cur_blk = 0;
u32 page_size;
@@ -4268,7 +3995,8 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
/* Search qp buf's mtts */
count = hns_roce_mtr_find(hr_dev, &hr_qp->mtr, 0, &sq_cur_blk, 1, NULL);
if (count < 1) {
- dev_err(dev, "qp(0x%lx) buf pa find failed\n", hr_qp->qpn);
+ ibdev_err(ibdev, "failed to find buf pa of QP(0x%lx)\n",
+ hr_qp->qpn);
return -EINVAL;
}
@@ -4278,16 +4006,15 @@ static int modify_qp_rtr_to_rts(struct ib_qp *ibqp,
hr_qp->sge.offset / page_size,
&sge_cur_blk, 1, NULL);
if (count < 1) {
- dev_err(dev, "qp(0x%lx) sge pa find failed\n",
- hr_qp->qpn);
+ ibdev_err(ibdev, "failed to find sge pa of QP(0x%lx)\n",
+ hr_qp->qpn);
return -EINVAL;
}
}
/* Not support alternate path and path migration */
- if ((attr_mask & IB_QP_ALT_PATH) ||
- (attr_mask & IB_QP_PATH_MIG_STATE)) {
- dev_err(dev, "RTR2RTS attr_mask (0x%x)error\n", attr_mask);
+ if (attr_mask & (IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE)) {
+ ibdev_err(ibdev, "RTR2RTS attr_mask (0x%x)error\n", attr_mask);
return -EINVAL;
}
@@ -4405,6 +4132,7 @@ static int hns_roce_v2_set_path(struct ib_qp *ibqp,
const struct ib_global_route *grh = rdma_ah_read_grh(&attr->ah_attr);
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+ struct ib_device *ibdev = &hr_dev->ib_dev;
const struct ib_gid_attr *gid_attr = NULL;
int is_roce_protocol;
u16 vlan_id = 0xffff;
@@ -4446,13 +4174,13 @@ static int hns_roce_v2_set_path(struct ib_qp *ibqp,
V2_QPC_BYTE_24_VLAN_ID_S, 0);
if (grh->sgid_index >= hr_dev->caps.gid_table_len[hr_port]) {
- dev_err(hr_dev->dev, "sgid_index(%u) too large. max is %d\n",
- grh->sgid_index, hr_dev->caps.gid_table_len[hr_port]);
+ ibdev_err(ibdev, "sgid_index(%u) too large. max is %d\n",
+ grh->sgid_index, hr_dev->caps.gid_table_len[hr_port]);
return -EINVAL;
}
if (attr->ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE) {
- dev_err(hr_dev->dev, "ah attr is not RDMA roce type\n");
+ ibdev_err(ibdev, "ah attr is not RDMA roce type\n");
return -EINVAL;
}
@@ -4475,7 +4203,7 @@ static int hns_roce_v2_set_path(struct ib_qp *ibqp,
roce_set_field(qpc_mask->byte_24_mtu_tc, V2_QPC_BYTE_24_HOP_LIMIT_M,
V2_QPC_BYTE_24_HOP_LIMIT_S, 0);
- if (hr_dev->pci_dev->revision == 0x21 && is_udp)
+ if (hr_dev->pci_dev->revision >= PCI_REVISION_ID_HIP08_B && is_udp)
roce_set_field(context->byte_24_mtu_tc, V2_QPC_BYTE_24_TC_M,
V2_QPC_BYTE_24_TC_S, grh->traffic_class >> 2);
else
@@ -4530,7 +4258,7 @@ static int hns_roce_v2_set_abs_fields(struct ib_qp *ibqp,
/* Nothing */
;
} else {
- dev_err(hr_dev->dev, "Illegal state for QP!\n");
+ ibdev_err(&hr_dev->ib_dev, "Illegal state for QP!\n");
ret = -EINVAL;
goto out;
}
@@ -4565,8 +4293,8 @@ static int hns_roce_v2_set_opt_fields(struct ib_qp *ibqp,
V2_QPC_BYTE_28_AT_M, V2_QPC_BYTE_28_AT_S,
0);
} else {
- dev_warn(hr_dev->dev,
- "Local ACK timeout shall be 0 to 30.\n");
+ ibdev_warn(&hr_dev->ib_dev,
+ "Local ACK timeout shall be 0 to 30.\n");
}
}
@@ -4734,7 +4462,9 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
struct hns_roce_v2_qp_context ctx[2];
struct hns_roce_v2_qp_context *context = ctx;
struct hns_roce_v2_qp_context *qpc_mask = ctx + 1;
- struct device *dev = hr_dev->dev;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ unsigned long sq_flag = 0;
+ unsigned long rq_flag = 0;
int ret;
/*
@@ -4752,6 +4482,8 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
/* When QP state is err, SQ and RQ WQE should be flushed */
if (new_state == IB_QPS_ERR) {
+ spin_lock_irqsave(&hr_qp->sq.lock, sq_flag);
+ hr_qp->state = IB_QPS_ERR;
roce_set_field(context->byte_160_sq_ci_pi,
V2_QPC_BYTE_160_SQ_PRODUCER_IDX_M,
V2_QPC_BYTE_160_SQ_PRODUCER_IDX_S,
@@ -4759,8 +4491,10 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
roce_set_field(qpc_mask->byte_160_sq_ci_pi,
V2_QPC_BYTE_160_SQ_PRODUCER_IDX_M,
V2_QPC_BYTE_160_SQ_PRODUCER_IDX_S, 0);
+ spin_unlock_irqrestore(&hr_qp->sq.lock, sq_flag);
if (!ibqp->srq) {
+ spin_lock_irqsave(&hr_qp->rq.lock, rq_flag);
roce_set_field(context->byte_84_rq_ci_pi,
V2_QPC_BYTE_84_RQ_PRODUCER_IDX_M,
V2_QPC_BYTE_84_RQ_PRODUCER_IDX_S,
@@ -4768,6 +4502,7 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
roce_set_field(qpc_mask->byte_84_rq_ci_pi,
V2_QPC_BYTE_84_RQ_PRODUCER_IDX_M,
V2_QPC_BYTE_84_RQ_PRODUCER_IDX_S, 0);
+ spin_unlock_irqrestore(&hr_qp->rq.lock, rq_flag);
}
}
@@ -4791,7 +4526,7 @@ static int hns_roce_v2_modify_qp(struct ib_qp *ibqp,
/* SW pass context to HW */
ret = hns_roce_v2_qp_modify(hr_dev, ctx, hr_qp);
if (ret) {
- dev_err(dev, "hns_roce_qp_modify failed(%d)\n", ret);
+ ibdev_err(ibdev, "failed to modify QP, ret = %d\n", ret);
goto out;
}
@@ -4848,10 +4583,8 @@ static int hns_roce_v2_query_qpc(struct hns_roce_dev *hr_dev,
ret = hns_roce_cmd_mbox(hr_dev, 0, mailbox->dma, hr_qp->qpn, 0,
HNS_ROCE_CMD_QUERY_QPC,
HNS_ROCE_CMD_TIMEOUT_MSECS);
- if (ret) {
- dev_err(hr_dev->dev, "QUERY QP cmd process error\n");
+ if (ret)
goto out;
- }
memcpy(hr_context, mailbox->buf, sizeof(*hr_context));
@@ -4867,7 +4600,7 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
struct hns_roce_v2_qp_context context = {};
- struct device *dev = hr_dev->dev;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
int tmp_qp_state;
int state;
int ret;
@@ -4885,7 +4618,7 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
ret = hns_roce_v2_query_qpc(hr_dev, hr_qp, &context);
if (ret) {
- dev_err(dev, "query qpc error\n");
+ ibdev_err(ibdev, "failed to query QPC, ret = %d\n", ret);
ret = -EINVAL;
goto out;
}
@@ -4894,7 +4627,7 @@ static int hns_roce_v2_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
V2_QPC_BYTE_60_QP_ST_M, V2_QPC_BYTE_60_QP_ST_S);
tmp_qp_state = to_ib_qp_st((enum hns_roce_v2_qp_state)state);
if (tmp_qp_state == -1) {
- dev_err(dev, "Illegal ib_qp_state\n");
+ ibdev_err(ibdev, "Illegal ib_qp_state\n");
ret = -EINVAL;
goto out;
}
@@ -4992,8 +4725,8 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
struct hns_roce_qp *hr_qp,
struct ib_udata *udata)
{
- struct hns_roce_cq *send_cq, *recv_cq;
struct ib_device *ibdev = &hr_dev->ib_dev;
+ struct hns_roce_cq *send_cq, *recv_cq;
unsigned long flags;
int ret = 0;
@@ -5002,7 +4735,9 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
hr_qp->state, IB_QPS_RESET);
if (ret)
- ibdev_err(ibdev, "modify QP to Reset failed.\n");
+ ibdev_err(ibdev,
+ "failed to modify QP to RST, ret = %d\n",
+ ret);
}
send_cq = hr_qp->ibqp.send_cq ? to_hr_cq(hr_qp->ibqp.send_cq) : NULL;
@@ -5011,10 +4746,6 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
spin_lock_irqsave(&hr_dev->qp_list_lock, flags);
hns_roce_lock_cqs(send_cq, recv_cq);
- list_del(&hr_qp->node);
- list_del(&hr_qp->sq_node);
- list_del(&hr_qp->rq_node);
-
if (!udata) {
if (recv_cq)
__hns_roce_v2_cq_clean(recv_cq, hr_qp->qpn,
@@ -5032,43 +4763,6 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
hns_roce_unlock_cqs(send_cq, recv_cq);
spin_unlock_irqrestore(&hr_dev->qp_list_lock, flags);
- hns_roce_qp_free(hr_dev, hr_qp);
-
- /* Not special_QP, free their QPN */
- if ((hr_qp->ibqp.qp_type == IB_QPT_RC) ||
- (hr_qp->ibqp.qp_type == IB_QPT_UC) ||
- (hr_qp->ibqp.qp_type == IB_QPT_UD))
- hns_roce_release_range_qp(hr_dev, hr_qp->qpn, 1);
-
- hns_roce_mtr_cleanup(hr_dev, &hr_qp->mtr);
-
- if (udata) {
- struct hns_roce_ucontext *context =
- rdma_udata_to_drv_context(
- udata,
- struct hns_roce_ucontext,
- ibucontext);
-
- if (hr_qp->sq.wqe_cnt && (hr_qp->sdb_en == 1))
- hns_roce_db_unmap_user(context, &hr_qp->sdb);
-
- if (hr_qp->rq.wqe_cnt && (hr_qp->rdb_en == 1))
- hns_roce_db_unmap_user(context, &hr_qp->rdb);
- } else {
- kfree(hr_qp->sq.wrid);
- kfree(hr_qp->rq.wrid);
- hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
- if (hr_qp->rq.wqe_cnt)
- hns_roce_free_db(hr_dev, &hr_qp->rdb);
- }
- ib_umem_release(hr_qp->umem);
-
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) &&
- hr_qp->rq.wqe_cnt) {
- kfree(hr_qp->rq_inl_buf.wqe_list[0].sg_list);
- kfree(hr_qp->rq_inl_buf.wqe_list);
- }
-
return ret;
}
@@ -5080,17 +4774,19 @@ static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
if (ret)
- ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx failed(%d)\n",
+ ibdev_err(&hr_dev->ib_dev,
+ "failed to destroy QP 0x%06lx, ret = %d\n",
hr_qp->qpn, ret);
- kfree(hr_qp);
+ hns_roce_qp_destroy(hr_dev, hr_qp, udata);
return 0;
}
static int hns_roce_v2_qp_flow_control_init(struct hns_roce_dev *hr_dev,
- struct hns_roce_qp *hr_qp)
+ struct hns_roce_qp *hr_qp)
{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct hns_roce_sccc_clr_done *resp;
struct hns_roce_sccc_clr *clr;
struct hns_roce_cmq_desc desc;
@@ -5102,7 +4798,7 @@ static int hns_roce_v2_qp_flow_control_init(struct hns_roce_dev *hr_dev,
hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_RESET_SCCC, false);
ret = hns_roce_cmq_send(hr_dev, &desc, 1);
if (ret) {
- dev_err(hr_dev->dev, "Reset SCC ctx failed(%d)\n", ret);
+ ibdev_err(ibdev, "failed to reset SCC ctx, ret = %d\n", ret);
goto out;
}
@@ -5112,7 +4808,7 @@ static int hns_roce_v2_qp_flow_control_init(struct hns_roce_dev *hr_dev,
clr->qpn = cpu_to_le32(hr_qp->qpn);
ret = hns_roce_cmq_send(hr_dev, &desc, 1);
if (ret) {
- dev_err(hr_dev->dev, "Clear SCC ctx failed(%d)\n", ret);
+ ibdev_err(ibdev, "failed to clear SCC ctx, ret = %d\n", ret);
goto out;
}
@@ -5123,7 +4819,8 @@ static int hns_roce_v2_qp_flow_control_init(struct hns_roce_dev *hr_dev,
HNS_ROCE_OPC_QUERY_SCCC, true);
ret = hns_roce_cmq_send(hr_dev, &desc, 1);
if (ret) {
- dev_err(hr_dev->dev, "Query clr cmq failed(%d)\n", ret);
+ ibdev_err(ibdev, "failed to query clr cmq, ret = %d\n",
+ ret);
goto out;
}
@@ -5133,7 +4830,7 @@ static int hns_roce_v2_qp_flow_control_init(struct hns_roce_dev *hr_dev,
msleep(20);
}
- dev_err(hr_dev->dev, "Query SCC clr done flag overtime.\n");
+ ibdev_err(ibdev, "Query SCC clr done flag overtime.\n");
ret = -ETIMEDOUT;
out:
@@ -5177,99 +4874,65 @@ static int hns_roce_v2_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
HNS_ROCE_CMD_TIMEOUT_MSECS);
hns_roce_free_cmd_mailbox(hr_dev, mailbox);
if (ret)
- dev_err(hr_dev->dev, "MODIFY CQ Failed to cmd mailbox.\n");
+ ibdev_err(&hr_dev->ib_dev,
+ "failed to process cmd when modifying CQ, ret = %d\n",
+ ret);
return ret;
}
-static void hns_roce_set_qps_to_err(struct hns_roce_dev *hr_dev, u32 qpn)
-{
- struct hns_roce_qp *hr_qp;
- struct ib_qp_attr attr;
- int attr_mask;
- int ret;
-
- hr_qp = __hns_roce_qp_lookup(hr_dev, qpn);
- if (!hr_qp) {
- dev_warn(hr_dev->dev, "no hr_qp can be found!\n");
- return;
- }
-
- if (hr_qp->ibqp.uobject) {
- if (hr_qp->sdb_en == 1) {
- hr_qp->sq.head = *(int *)(hr_qp->sdb.virt_addr);
- if (hr_qp->rdb_en == 1)
- hr_qp->rq.head = *(int *)(hr_qp->rdb.virt_addr);
- } else {
- dev_warn(hr_dev->dev, "flush cqe is unsupported in userspace!\n");
- return;
- }
- }
-
- attr_mask = IB_QP_STATE;
- attr.qp_state = IB_QPS_ERR;
- ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, &attr, attr_mask,
- hr_qp->state, IB_QPS_ERR);
- if (ret)
- dev_err(hr_dev->dev, "failed to modify qp %d to err state.\n",
- qpn);
-}
-
static void hns_roce_irq_work_handle(struct work_struct *work)
{
struct hns_roce_work *irq_work =
container_of(work, struct hns_roce_work, work);
- struct device *dev = irq_work->hr_dev->dev;
+ struct ib_device *ibdev = &irq_work->hr_dev->ib_dev;
u32 qpn = irq_work->qpn;
u32 cqn = irq_work->cqn;
switch (irq_work->event_type) {
case HNS_ROCE_EVENT_TYPE_PATH_MIG:
- dev_info(dev, "Path migrated succeeded.\n");
+ ibdev_info(ibdev, "Path migrated succeeded.\n");
break;
case HNS_ROCE_EVENT_TYPE_PATH_MIG_FAILED:
- dev_warn(dev, "Path migration failed.\n");
+ ibdev_warn(ibdev, "Path migration failed.\n");
break;
case HNS_ROCE_EVENT_TYPE_COMM_EST:
break;
case HNS_ROCE_EVENT_TYPE_SQ_DRAINED:
- dev_warn(dev, "Send queue drained.\n");
+ ibdev_warn(ibdev, "Send queue drained.\n");
break;
case HNS_ROCE_EVENT_TYPE_WQ_CATAS_ERROR:
- dev_err(dev, "Local work queue 0x%x catas error, sub_type:%d\n",
- qpn, irq_work->sub_type);
- hns_roce_set_qps_to_err(irq_work->hr_dev, qpn);
+ ibdev_err(ibdev, "Local work queue 0x%x catast error, sub_event type is: %d\n",
+ qpn, irq_work->sub_type);
break;
case HNS_ROCE_EVENT_TYPE_INV_REQ_LOCAL_WQ_ERROR:
- dev_err(dev, "Invalid request local work queue 0x%x error.\n",
- qpn);
- hns_roce_set_qps_to_err(irq_work->hr_dev, qpn);
+ ibdev_err(ibdev, "Invalid request local work queue 0x%x error.\n",
+ qpn);
break;
case HNS_ROCE_EVENT_TYPE_LOCAL_WQ_ACCESS_ERROR:
- dev_err(dev, "Local access violation work queue 0x%x error, sub_type:%d\n",
- qpn, irq_work->sub_type);
- hns_roce_set_qps_to_err(irq_work->hr_dev, qpn);
+ ibdev_err(ibdev, "Local access violation work queue 0x%x error, sub_event type is: %d\n",
+ qpn, irq_work->sub_type);
break;
case HNS_ROCE_EVENT_TYPE_SRQ_LIMIT_REACH:
- dev_warn(dev, "SRQ limit reach.\n");
+ ibdev_warn(ibdev, "SRQ limit reach.\n");
break;
case HNS_ROCE_EVENT_TYPE_SRQ_LAST_WQE_REACH:
- dev_warn(dev, "SRQ last wqe reach.\n");
+ ibdev_warn(ibdev, "SRQ last wqe reach.\n");
break;
case HNS_ROCE_EVENT_TYPE_SRQ_CATAS_ERROR:
- dev_err(dev, "SRQ catas error.\n");
+ ibdev_err(ibdev, "SRQ catas error.\n");
break;
case HNS_ROCE_EVENT_TYPE_CQ_ACCESS_ERROR:
- dev_err(dev, "CQ 0x%x access err.\n", cqn);
+ ibdev_err(ibdev, "CQ 0x%x access err.\n", cqn);
break;
case HNS_ROCE_EVENT_TYPE_CQ_OVERFLOW:
- dev_warn(dev, "CQ 0x%x overflow\n", cqn);
+ ibdev_warn(ibdev, "CQ 0x%x overflow\n", cqn);
break;
case HNS_ROCE_EVENT_TYPE_DB_OVERFLOW:
- dev_warn(dev, "DB overflow.\n");
+ ibdev_warn(ibdev, "DB overflow.\n");
break;
case HNS_ROCE_EVENT_TYPE_FLR:
- dev_warn(dev, "Function level reset.\n");
+ ibdev_warn(ibdev, "Function level reset.\n");
break;
default:
break;
@@ -5326,44 +4989,24 @@ static void set_eq_cons_index_v2(struct hns_roce_eq *eq)
hns_roce_write64(hr_dev, doorbell, eq->doorbell);
}
-static struct hns_roce_aeqe *get_aeqe_v2(struct hns_roce_eq *eq, u32 entry)
+static inline void *get_eqe_buf(struct hns_roce_eq *eq, unsigned long offset)
{
u32 buf_chk_sz;
- unsigned long off;
buf_chk_sz = 1 << (eq->eqe_buf_pg_sz + PAGE_SHIFT);
- off = (entry & (eq->entries - 1)) * HNS_ROCE_AEQ_ENTRY_SIZE;
-
- return (struct hns_roce_aeqe *)((char *)(eq->buf_list->buf) +
- off % buf_chk_sz);
-}
-
-static struct hns_roce_aeqe *mhop_get_aeqe(struct hns_roce_eq *eq, u32 entry)
-{
- u32 buf_chk_sz;
- unsigned long off;
-
- buf_chk_sz = 1 << (eq->eqe_buf_pg_sz + PAGE_SHIFT);
-
- off = (entry & (eq->entries - 1)) * HNS_ROCE_AEQ_ENTRY_SIZE;
-
- if (eq->hop_num == HNS_ROCE_HOP_NUM_0)
- return (struct hns_roce_aeqe *)((u8 *)(eq->bt_l0) +
- off % buf_chk_sz);
+ if (eq->buf.nbufs == 1)
+ return eq->buf.direct.buf + offset % buf_chk_sz;
else
- return (struct hns_roce_aeqe *)((u8 *)
- (eq->buf[off / buf_chk_sz]) + off % buf_chk_sz);
+ return eq->buf.page_list[offset / buf_chk_sz].buf +
+ offset % buf_chk_sz;
}
static struct hns_roce_aeqe *next_aeqe_sw_v2(struct hns_roce_eq *eq)
{
struct hns_roce_aeqe *aeqe;
- if (!eq->hop_num)
- aeqe = get_aeqe_v2(eq, eq->cons_index);
- else
- aeqe = mhop_get_aeqe(eq, eq->cons_index);
-
+ aeqe = get_eqe_buf(eq, (eq->cons_index & (eq->entries - 1)) *
+ HNS_ROCE_AEQ_ENTRY_SIZE);
return (roce_get_bit(aeqe->asyn, HNS_ROCE_V2_AEQ_AEQE_OWNER_S) ^
!!(eq->cons_index & eq->entries)) ? aeqe : NULL;
}
@@ -5456,44 +5099,12 @@ static int hns_roce_v2_aeq_int(struct hns_roce_dev *hr_dev,
return aeqe_found;
}
-static struct hns_roce_ceqe *get_ceqe_v2(struct hns_roce_eq *eq, u32 entry)
-{
- u32 buf_chk_sz;
- unsigned long off;
-
- buf_chk_sz = 1 << (eq->eqe_buf_pg_sz + PAGE_SHIFT);
- off = (entry & (eq->entries - 1)) * HNS_ROCE_CEQ_ENTRY_SIZE;
-
- return (struct hns_roce_ceqe *)((char *)(eq->buf_list->buf) +
- off % buf_chk_sz);
-}
-
-static struct hns_roce_ceqe *mhop_get_ceqe(struct hns_roce_eq *eq, u32 entry)
-{
- u32 buf_chk_sz;
- unsigned long off;
-
- buf_chk_sz = 1 << (eq->eqe_buf_pg_sz + PAGE_SHIFT);
-
- off = (entry & (eq->entries - 1)) * HNS_ROCE_CEQ_ENTRY_SIZE;
-
- if (eq->hop_num == HNS_ROCE_HOP_NUM_0)
- return (struct hns_roce_ceqe *)((u8 *)(eq->bt_l0) +
- off % buf_chk_sz);
- else
- return (struct hns_roce_ceqe *)((u8 *)(eq->buf[off /
- buf_chk_sz]) + off % buf_chk_sz);
-}
-
static struct hns_roce_ceqe *next_ceqe_sw_v2(struct hns_roce_eq *eq)
{
struct hns_roce_ceqe *ceqe;
- if (!eq->hop_num)
- ceqe = get_ceqe_v2(eq, eq->cons_index);
- else
- ceqe = mhop_get_ceqe(eq, eq->cons_index);
-
+ ceqe = get_eqe_buf(eq, (eq->cons_index & (eq->entries - 1)) *
+ HNS_ROCE_CEQ_ENTRY_SIZE);
return (!!(roce_get_bit(ceqe->comp, HNS_ROCE_V2_CEQ_CEQE_OWNER_S))) ^
(!!(eq->cons_index & eq->entries)) ? ceqe : NULL;
}
@@ -5501,7 +5112,6 @@ static struct hns_roce_ceqe *next_ceqe_sw_v2(struct hns_roce_eq *eq)
static int hns_roce_v2_ceq_int(struct hns_roce_dev *hr_dev,
struct hns_roce_eq *eq)
{
- struct device *dev = hr_dev->dev;
struct hns_roce_ceqe *ceqe = next_ceqe_sw_v2(eq);
int ceqe_found = 0;
u32 cqn;
@@ -5520,10 +5130,8 @@ static int hns_roce_v2_ceq_int(struct hns_roce_dev *hr_dev,
++eq->cons_index;
ceqe_found = 1;
- if (eq->cons_index > (EQ_DEPTH_COEFF * eq->entries - 1)) {
- dev_warn(dev, "cons_index overflow, set back to 0.\n");
+ if (eq->cons_index > (EQ_DEPTH_COEFF * eq->entries - 1))
eq->cons_index = 0;
- }
ceqe = next_ceqe_sw_v2(eq);
}
@@ -5653,90 +5261,11 @@ static void hns_roce_v2_destroy_eqc(struct hns_roce_dev *hr_dev, int eqn)
dev_err(dev, "[mailbox cmd] destroy eqc(%d) failed.\n", eqn);
}
-static void hns_roce_mhop_free_eq(struct hns_roce_dev *hr_dev,
- struct hns_roce_eq *eq)
+static void free_eq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq)
{
- struct device *dev = hr_dev->dev;
- u64 idx;
- u64 size;
- u32 buf_chk_sz;
- u32 bt_chk_sz;
- u32 mhop_num;
- int eqe_alloc;
- int i = 0;
- int j = 0;
-
- mhop_num = hr_dev->caps.eqe_hop_num;
- buf_chk_sz = 1 << (hr_dev->caps.eqe_buf_pg_sz + PAGE_SHIFT);
- bt_chk_sz = 1 << (hr_dev->caps.eqe_ba_pg_sz + PAGE_SHIFT);
-
- if (mhop_num == HNS_ROCE_HOP_NUM_0) {
- dma_free_coherent(dev, (unsigned int)(eq->entries *
- eq->eqe_size), eq->bt_l0, eq->l0_dma);
- return;
- }
-
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l0, eq->l0_dma);
- if (mhop_num == 1) {
- for (i = 0; i < eq->l0_last_num; i++) {
- if (i == eq->l0_last_num - 1) {
- eqe_alloc = i * (buf_chk_sz / eq->eqe_size);
- size = (eq->entries - eqe_alloc) * eq->eqe_size;
- dma_free_coherent(dev, size, eq->buf[i],
- eq->buf_dma[i]);
- break;
- }
- dma_free_coherent(dev, buf_chk_sz, eq->buf[i],
- eq->buf_dma[i]);
- }
- } else if (mhop_num == 2) {
- for (i = 0; i < eq->l0_last_num; i++) {
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l1[i],
- eq->l1_dma[i]);
-
- for (j = 0; j < bt_chk_sz / BA_BYTE_LEN; j++) {
- idx = i * (bt_chk_sz / BA_BYTE_LEN) + j;
- if ((i == eq->l0_last_num - 1)
- && j == eq->l1_last_num - 1) {
- eqe_alloc = (buf_chk_sz / eq->eqe_size)
- * idx;
- size = (eq->entries - eqe_alloc)
- * eq->eqe_size;
- dma_free_coherent(dev, size,
- eq->buf[idx],
- eq->buf_dma[idx]);
- break;
- }
- dma_free_coherent(dev, buf_chk_sz, eq->buf[idx],
- eq->buf_dma[idx]);
- }
- }
- }
- kfree(eq->buf_dma);
- kfree(eq->buf);
- kfree(eq->l1_dma);
- kfree(eq->bt_l1);
- eq->buf_dma = NULL;
- eq->buf = NULL;
- eq->l1_dma = NULL;
- eq->bt_l1 = NULL;
-}
-
-static void hns_roce_v2_free_eq(struct hns_roce_dev *hr_dev,
- struct hns_roce_eq *eq)
-{
- u32 buf_chk_sz;
-
- buf_chk_sz = 1 << (eq->eqe_buf_pg_sz + PAGE_SHIFT);
-
- if (hr_dev->caps.eqe_hop_num) {
- hns_roce_mhop_free_eq(hr_dev, eq);
- return;
- }
-
- dma_free_coherent(hr_dev->dev, buf_chk_sz, eq->buf_list->buf,
- eq->buf_list->map);
- kfree(eq->buf_list);
+ if (!eq->hop_num || eq->hop_num == HNS_ROCE_HOP_NUM_0)
+ hns_roce_mtr_cleanup(hr_dev, &eq->mtr);
+ hns_roce_buf_free(hr_dev, eq->buf.size, &eq->buf);
}
static void hns_roce_config_eqc(struct hns_roce_dev *hr_dev,
@@ -5744,6 +5273,8 @@ static void hns_roce_config_eqc(struct hns_roce_dev *hr_dev,
void *mb_buf)
{
struct hns_roce_eq_context *eqc;
+ u64 ba[MTT_MIN_COUNT] = { 0 };
+ int count;
eqc = mb_buf;
memset(eqc, 0, sizeof(struct hns_roce_eq_context));
@@ -5759,10 +5290,23 @@ static void hns_roce_config_eqc(struct hns_roce_dev *hr_dev,
eq->eqe_buf_pg_sz = hr_dev->caps.eqe_buf_pg_sz;
eq->shift = ilog2((unsigned int)eq->entries);
- if (!eq->hop_num)
- eq->eqe_ba = eq->buf_list->map;
- else
- eq->eqe_ba = eq->l0_dma;
+ /* if not muti-hop, eqe buffer only use one trunk */
+ if (!eq->hop_num || eq->hop_num == HNS_ROCE_HOP_NUM_0) {
+ eq->eqe_ba = eq->buf.direct.map;
+ eq->cur_eqe_ba = eq->eqe_ba;
+ if (eq->buf.npages > 1)
+ eq->nxt_eqe_ba = eq->eqe_ba + (1 << eq->eqe_buf_pg_sz);
+ else
+ eq->nxt_eqe_ba = eq->eqe_ba;
+ } else {
+ count = hns_roce_mtr_find(hr_dev, &eq->mtr, 0, ba,
+ MTT_MIN_COUNT, &eq->eqe_ba);
+ eq->cur_eqe_ba = ba[0];
+ if (count > 1)
+ eq->nxt_eqe_ba = ba[1];
+ else
+ eq->nxt_eqe_ba = ba[0];
+ }
/* set eqc state */
roce_set_field(eqc->byte_4, HNS_ROCE_EQC_EQ_ST_M, HNS_ROCE_EQC_EQ_ST_S,
@@ -5860,220 +5404,97 @@ static void hns_roce_config_eqc(struct hns_roce_dev *hr_dev,
HNS_ROCE_EQC_NXT_EQE_BA_H_S, eq->nxt_eqe_ba >> 44);
}
-static int hns_roce_mhop_alloc_eq(struct hns_roce_dev *hr_dev,
- struct hns_roce_eq *eq)
+static int map_eq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq,
+ u32 page_shift)
{
- struct device *dev = hr_dev->dev;
- int eq_alloc_done = 0;
- int eq_buf_cnt = 0;
- int eqe_alloc;
- u32 buf_chk_sz;
- u32 bt_chk_sz;
- u32 mhop_num;
- u64 size;
- u64 idx;
+ struct hns_roce_buf_region region = {};
+ dma_addr_t *buf_list = NULL;
int ba_num;
- int bt_num;
- int record_i;
- int record_j;
- int i = 0;
- int j = 0;
-
- mhop_num = hr_dev->caps.eqe_hop_num;
- buf_chk_sz = 1 << (hr_dev->caps.eqe_buf_pg_sz + PAGE_SHIFT);
- bt_chk_sz = 1 << (hr_dev->caps.eqe_ba_pg_sz + PAGE_SHIFT);
+ int ret;
ba_num = DIV_ROUND_UP(PAGE_ALIGN(eq->entries * eq->eqe_size),
- buf_chk_sz);
- bt_num = DIV_ROUND_UP(ba_num, bt_chk_sz / BA_BYTE_LEN);
+ 1 << page_shift);
+ hns_roce_init_buf_region(®ion, hr_dev->caps.eqe_hop_num, 0, ba_num);
- if (mhop_num == HNS_ROCE_HOP_NUM_0) {
- if (eq->entries > buf_chk_sz / eq->eqe_size) {
- dev_err(dev, "eq entries %d is larger than buf_pg_sz!",
- eq->entries);
- return -EINVAL;
- }
- eq->bt_l0 = dma_alloc_coherent(dev, eq->entries * eq->eqe_size,
- &(eq->l0_dma), GFP_KERNEL);
- if (!eq->bt_l0)
- return -ENOMEM;
-
- eq->cur_eqe_ba = eq->l0_dma;
- eq->nxt_eqe_ba = 0;
-
- return 0;
+ /* alloc a tmp list for storing eq buf address */
+ ret = hns_roce_alloc_buf_list(®ion, &buf_list, 1);
+ if (ret) {
+ dev_err(hr_dev->dev, "alloc eq buf_list error\n");
+ return ret;
}
- eq->buf_dma = kcalloc(ba_num, sizeof(*eq->buf_dma), GFP_KERNEL);
- if (!eq->buf_dma)
- return -ENOMEM;
- eq->buf = kcalloc(ba_num, sizeof(*eq->buf), GFP_KERNEL);
- if (!eq->buf)
- goto err_kcalloc_buf;
-
- if (mhop_num == 2) {
- eq->l1_dma = kcalloc(bt_num, sizeof(*eq->l1_dma), GFP_KERNEL);
- if (!eq->l1_dma)
- goto err_kcalloc_l1_dma;
-
- eq->bt_l1 = kcalloc(bt_num, sizeof(*eq->bt_l1), GFP_KERNEL);
- if (!eq->bt_l1)
- goto err_kcalloc_bt_l1;
+ ba_num = hns_roce_get_kmem_bufs(hr_dev, buf_list, region.count,
+ region.offset, &eq->buf);
+ if (ba_num != region.count) {
+ dev_err(hr_dev->dev, "get eqe buf err,expect %d,ret %d.\n",
+ region.count, ba_num);
+ ret = -ENOBUFS;
+ goto done;
}
- /* alloc L0 BT */
- eq->bt_l0 = dma_alloc_coherent(dev, bt_chk_sz, &eq->l0_dma, GFP_KERNEL);
- if (!eq->bt_l0)
- goto err_dma_alloc_l0;
+ hns_roce_mtr_init(&eq->mtr, PAGE_SHIFT + hr_dev->caps.eqe_ba_pg_sz,
+ page_shift);
+ ret = hns_roce_mtr_attach(hr_dev, &eq->mtr, &buf_list, ®ion, 1);
+ if (ret)
+ dev_err(hr_dev->dev, "mtr attach error for eqe\n");
- if (mhop_num == 1) {
- if (ba_num > (bt_chk_sz / BA_BYTE_LEN))
- dev_err(dev, "ba_num %d is too large for 1 hop\n",
- ba_num);
+ goto done;
- /* alloc buf */
- for (i = 0; i < bt_chk_sz / BA_BYTE_LEN; i++) {
- if (eq_buf_cnt + 1 < ba_num) {
- size = buf_chk_sz;
- } else {
- eqe_alloc = i * (buf_chk_sz / eq->eqe_size);
- size = (eq->entries - eqe_alloc) * eq->eqe_size;
- }
- eq->buf[i] = dma_alloc_coherent(dev, size,
- &(eq->buf_dma[i]),
- GFP_KERNEL);
- if (!eq->buf[i])
- goto err_dma_alloc_buf;
+ hns_roce_mtr_cleanup(hr_dev, &eq->mtr);
+done:
+ hns_roce_free_buf_list(&buf_list, 1);
- *(eq->bt_l0 + i) = eq->buf_dma[i];
+ return ret;
+}
- eq_buf_cnt++;
- if (eq_buf_cnt >= ba_num)
- break;
- }
- eq->cur_eqe_ba = eq->buf_dma[0];
- if (ba_num > 1)
- eq->nxt_eqe_ba = eq->buf_dma[1];
+static int alloc_eq_buf(struct hns_roce_dev *hr_dev, struct hns_roce_eq *eq)
+{
+ struct hns_roce_buf *buf = &eq->buf;
+ bool is_mhop = false;
+ u32 page_shift;
+ u32 mhop_num;
+ u32 max_size;
+ int ret;
- } else if (mhop_num == 2) {
- /* alloc L1 BT and buf */
- for (i = 0; i < bt_chk_sz / BA_BYTE_LEN; i++) {
- eq->bt_l1[i] = dma_alloc_coherent(dev, bt_chk_sz,
- &(eq->l1_dma[i]),
- GFP_KERNEL);
- if (!eq->bt_l1[i])
- goto err_dma_alloc_l1;
- *(eq->bt_l0 + i) = eq->l1_dma[i];
-
- for (j = 0; j < bt_chk_sz / BA_BYTE_LEN; j++) {
- idx = i * bt_chk_sz / BA_BYTE_LEN + j;
- if (eq_buf_cnt + 1 < ba_num) {
- size = buf_chk_sz;
- } else {
- eqe_alloc = (buf_chk_sz / eq->eqe_size)
- * idx;
- size = (eq->entries - eqe_alloc)
- * eq->eqe_size;
- }
- eq->buf[idx] = dma_alloc_coherent(dev, size,
- &(eq->buf_dma[idx]),
- GFP_KERNEL);
- if (!eq->buf[idx])
- goto err_dma_alloc_buf;
-
- *(eq->bt_l1[i] + j) = eq->buf_dma[idx];
-
- eq_buf_cnt++;
- if (eq_buf_cnt >= ba_num) {
- eq_alloc_done = 1;
- break;
- }
- }
-
- if (eq_alloc_done)
- break;
- }
- eq->cur_eqe_ba = eq->buf_dma[0];
- if (ba_num > 1)
- eq->nxt_eqe_ba = eq->buf_dma[1];
+ page_shift = PAGE_SHIFT + hr_dev->caps.eqe_buf_pg_sz;
+ mhop_num = hr_dev->caps.eqe_hop_num;
+ if (!mhop_num) {
+ max_size = 1 << page_shift;
+ buf->size = max_size;
+ } else if (mhop_num == HNS_ROCE_HOP_NUM_0) {
+ max_size = eq->entries * eq->eqe_size;
+ buf->size = max_size;
+ } else {
+ max_size = 1 << page_shift;
+ buf->size = PAGE_ALIGN(eq->entries * eq->eqe_size);
+ is_mhop = true;
}
- eq->l0_last_num = i + 1;
- if (mhop_num == 2)
- eq->l1_last_num = j + 1;
+ ret = hns_roce_buf_alloc(hr_dev, buf->size, max_size, buf, page_shift);
+ if (ret) {
+ dev_err(hr_dev->dev, "alloc eq buf error\n");
+ return ret;
+ }
+
+ if (is_mhop) {
+ ret = map_eq_buf(hr_dev, eq, page_shift);
+ if (ret) {
+ dev_err(hr_dev->dev, "map roce buf error\n");
+ goto err_alloc;
+ }
+ }
return 0;
-
-err_dma_alloc_l1:
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l0, eq->l0_dma);
- eq->bt_l0 = NULL;
- eq->l0_dma = 0;
- for (i -= 1; i >= 0; i--) {
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l1[i],
- eq->l1_dma[i]);
-
- for (j = 0; j < bt_chk_sz / BA_BYTE_LEN; j++) {
- idx = i * bt_chk_sz / BA_BYTE_LEN + j;
- dma_free_coherent(dev, buf_chk_sz, eq->buf[idx],
- eq->buf_dma[idx]);
- }
- }
- goto err_dma_alloc_l0;
-
-err_dma_alloc_buf:
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l0, eq->l0_dma);
- eq->bt_l0 = NULL;
- eq->l0_dma = 0;
-
- if (mhop_num == 1)
- for (i -= 1; i >= 0; i--)
- dma_free_coherent(dev, buf_chk_sz, eq->buf[i],
- eq->buf_dma[i]);
- else if (mhop_num == 2) {
- record_i = i;
- record_j = j;
- for (; i >= 0; i--) {
- dma_free_coherent(dev, bt_chk_sz, eq->bt_l1[i],
- eq->l1_dma[i]);
-
- for (j = 0; j < bt_chk_sz / BA_BYTE_LEN; j++) {
- if (i == record_i && j >= record_j)
- break;
-
- idx = i * bt_chk_sz / BA_BYTE_LEN + j;
- dma_free_coherent(dev, buf_chk_sz,
- eq->buf[idx],
- eq->buf_dma[idx]);
- }
- }
- }
-
-err_dma_alloc_l0:
- kfree(eq->bt_l1);
- eq->bt_l1 = NULL;
-
-err_kcalloc_bt_l1:
- kfree(eq->l1_dma);
- eq->l1_dma = NULL;
-
-err_kcalloc_l1_dma:
- kfree(eq->buf);
- eq->buf = NULL;
-
-err_kcalloc_buf:
- kfree(eq->buf_dma);
- eq->buf_dma = NULL;
-
- return -ENOMEM;
+err_alloc:
+ hns_roce_buf_free(hr_dev, buf->size, buf);
+ return ret;
}
static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
struct hns_roce_eq *eq,
unsigned int eq_cmd)
{
- struct device *dev = hr_dev->dev;
struct hns_roce_cmd_mailbox *mailbox;
- u32 buf_chk_sz = 0;
int ret;
/* Allocate mailbox memory */
@@ -6081,38 +5502,17 @@ static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
if (IS_ERR(mailbox))
return PTR_ERR(mailbox);
- if (!hr_dev->caps.eqe_hop_num) {
- buf_chk_sz = 1 << (hr_dev->caps.eqe_buf_pg_sz + PAGE_SHIFT);
-
- eq->buf_list = kzalloc(sizeof(struct hns_roce_buf_list),
- GFP_KERNEL);
- if (!eq->buf_list) {
- ret = -ENOMEM;
- goto free_cmd_mbox;
- }
-
- eq->buf_list->buf = dma_alloc_coherent(dev, buf_chk_sz,
- &(eq->buf_list->map),
- GFP_KERNEL);
- if (!eq->buf_list->buf) {
- ret = -ENOMEM;
- goto err_alloc_buf;
- }
-
- } else {
- ret = hns_roce_mhop_alloc_eq(hr_dev, eq);
- if (ret) {
- ret = -ENOMEM;
- goto free_cmd_mbox;
- }
+ ret = alloc_eq_buf(hr_dev, eq);
+ if (ret) {
+ ret = -ENOMEM;
+ goto free_cmd_mbox;
}
-
hns_roce_config_eqc(hr_dev, eq, mailbox->buf);
ret = hns_roce_cmd_mbox(hr_dev, mailbox->dma, 0, eq->eqn, 0,
eq_cmd, HNS_ROCE_CMD_TIMEOUT_MSECS);
if (ret) {
- dev_err(dev, "[mailbox cmd] create eqc failed.\n");
+ dev_err(hr_dev->dev, "[mailbox cmd] create eqc failed.\n");
goto err_cmd_mbox;
}
@@ -6121,16 +5521,7 @@ static int hns_roce_v2_create_eq(struct hns_roce_dev *hr_dev,
return 0;
err_cmd_mbox:
- if (!hr_dev->caps.eqe_hop_num)
- dma_free_coherent(dev, buf_chk_sz, eq->buf_list->buf,
- eq->buf_list->map);
- else {
- hns_roce_mhop_free_eq(hr_dev, eq);
- goto free_cmd_mbox;
- }
-
-err_alloc_buf:
- kfree(eq->buf_list);
+ free_eq_buf(hr_dev, eq);
free_cmd_mbox:
hns_roce_free_cmd_mailbox(hr_dev, mailbox);
@@ -6292,8 +5683,7 @@ static int hns_roce_v2_init_eq_table(struct hns_roce_dev *hr_dev)
goto err_request_irq_fail;
}
- hr_dev->irq_workq =
- create_singlethread_workqueue("hns_roce_irq_workqueue");
+ hr_dev->irq_workq = alloc_ordered_workqueue("hns_roce_irq_workq", 0);
if (!hr_dev->irq_workq) {
dev_err(dev, "Create irq workqueue failed!\n");
ret = -ENOMEM;
@@ -6310,7 +5700,7 @@ static int hns_roce_v2_init_eq_table(struct hns_roce_dev *hr_dev)
err_create_eq_fail:
for (i -= 1; i >= 0; i--)
- hns_roce_v2_free_eq(hr_dev, &eq_table->eq[i]);
+ free_eq_buf(hr_dev, &eq_table->eq[i]);
kfree(eq_table->eq);
return ret;
@@ -6332,7 +5722,7 @@ static void hns_roce_v2_cleanup_eq_table(struct hns_roce_dev *hr_dev)
for (i = 0; i < eq_num; i++) {
hns_roce_v2_destroy_eqc(hr_dev, i);
- hns_roce_v2_free_eq(hr_dev, &eq_table->eq[i]);
+ free_eq_buf(hr_dev, &eq_table->eq[i]);
}
kfree(eq_table->eq);
@@ -6472,8 +5862,9 @@ static int hns_roce_v2_modify_srq(struct ib_srq *ibsrq,
HNS_ROCE_CMD_TIMEOUT_MSECS);
hns_roce_free_cmd_mailbox(hr_dev, mailbox);
if (ret) {
- dev_err(hr_dev->dev,
- "MODIFY SRQ Failed to cmd mailbox.\n");
+ ibdev_err(&hr_dev->ib_dev,
+ "failed to process cmd when modifying SRQ, ret = %d\n",
+ ret);
return ret;
}
}
@@ -6499,7 +5890,9 @@ static int hns_roce_v2_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr)
HNS_ROCE_CMD_QUERY_SRQC,
HNS_ROCE_CMD_TIMEOUT_MSECS);
if (ret) {
- dev_err(hr_dev->dev, "QUERY SRQ cmd process error\n");
+ ibdev_err(&hr_dev->ib_dev,
+ "failed to process cmd when querying SRQ, ret = %d\n",
+ ret);
goto out;
}
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 2a117ff6a6be..82dd9f6f4845 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -50,15 +50,14 @@
#define HNS_ROCE_V2_MAX_WQE_NUM 0x8000
#define HNS_ROCE_V2_MAX_SRQ 0x100000
#define HNS_ROCE_V2_MAX_SRQ_WR 0x8000
-#define HNS_ROCE_V2_MAX_SRQ_SGE 0x100
+#define HNS_ROCE_V2_MAX_SRQ_SGE 64
#define HNS_ROCE_V2_MAX_CQ_NUM 0x100000
#define HNS_ROCE_V2_MAX_CQC_TIMER_NUM 0x100
#define HNS_ROCE_V2_MAX_SRQ_NUM 0x100000
#define HNS_ROCE_V2_MAX_CQE_NUM 0x400000
#define HNS_ROCE_V2_MAX_SRQWQE_NUM 0x8000
-#define HNS_ROCE_V2_MAX_RQ_SGE_NUM 0x100
-#define HNS_ROCE_V2_MAX_SQ_SGE_NUM 0xff
-#define HNS_ROCE_V2_MAX_SRQ_SGE_NUM 0x100
+#define HNS_ROCE_V2_MAX_RQ_SGE_NUM 64
+#define HNS_ROCE_V2_MAX_SQ_SGE_NUM 64
#define HNS_ROCE_V2_MAX_EXTEND_SGE_NUM 0x200000
#define HNS_ROCE_V2_MAX_SQ_INLINE 0x20
#define HNS_ROCE_V2_UAR_NUM 256
@@ -163,7 +162,7 @@ enum {
#define GID_LEN_V2 16
-#define HNS_ROCE_V2_CQE_QPN_MASK 0x3ffff
+#define HNS_ROCE_V2_CQE_QPN_MASK 0xfffff
enum {
HNS_ROCE_V2_WQE_OP_SEND = 0x0,
@@ -460,8 +459,8 @@ enum hns_roce_v2_qp_state {
HNS_ROCE_QP_ST_INIT,
HNS_ROCE_QP_ST_RTR,
HNS_ROCE_QP_ST_RTS,
- HNS_ROCE_QP_ST_SQER,
HNS_ROCE_QP_ST_SQD,
+ HNS_ROCE_QP_ST_SQER,
HNS_ROCE_QP_ST_ERR,
HNS_ROCE_QP_ST_SQ_DRAINING,
HNS_ROCE_QP_NUM_ST
@@ -1056,11 +1055,6 @@ struct hns_roce_v2_mpt_entry {
#define V2_DB_PARAMETER_SL_S 16
#define V2_DB_PARAMETER_SL_M GENMASK(18, 16)
-struct hns_roce_v2_cq_db {
- __le32 byte_4;
- __le32 parameter;
-};
-
#define V2_CQ_DB_BYTE_4_TAG_S 0
#define V2_CQ_DB_BYTE_4_TAG_M GENMASK(23, 0)
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index b9898e71655a..176f34692f88 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -243,7 +243,7 @@ int hns_roce_mtt_init(struct hns_roce_dev *hr_dev, int npages, int page_shift,
/* Allocate MTT entry */
ret = hns_roce_alloc_mtt_range(hr_dev, mtt->order, &mtt->first_seg,
mtt->mtt_type);
- if (ret == -1)
+ if (ret)
return -ENOMEM;
return 0;
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c
index 780c780fdb22..b10c50b8736e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_pd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
@@ -60,14 +60,12 @@ void hns_roce_cleanup_pd_table(struct hns_roce_dev *hr_dev)
int hns_roce_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
{
struct ib_device *ib_dev = ibpd->device;
- struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
- struct device *dev = hr_dev->dev;
struct hns_roce_pd *pd = to_hr_pd(ibpd);
int ret;
ret = hns_roce_pd_alloc(to_hr_dev(ib_dev), &pd->pdn);
if (ret) {
- dev_err(dev, "[alloc_pd]hns_roce_pd_alloc failed!\n");
+ ibdev_err(ib_dev, "failed to alloc pd, ret = %d\n", ret);
return ret;
}
@@ -76,7 +74,7 @@ int hns_roce_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
if (ib_copy_to_udata(udata, &uresp, sizeof(uresp))) {
hns_roce_pd_free(to_hr_dev(ib_dev), pd->pdn);
- dev_err(dev, "[alloc_pd]ib_copy_to_udata failed!\n");
+ ibdev_err(ib_dev, "failed to copy to udata\n");
return -EFAULT;
}
}
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 3257ad11be48..6317901c4b4f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -43,6 +43,45 @@
#define SQP_NUM (2 * HNS_ROCE_MAX_PORTS)
+static void flush_work_handle(struct work_struct *work)
+{
+ struct hns_roce_work *flush_work = container_of(work,
+ struct hns_roce_work, work);
+ struct hns_roce_qp *hr_qp = container_of(flush_work,
+ struct hns_roce_qp, flush_work);
+ struct device *dev = flush_work->hr_dev->dev;
+ struct ib_qp_attr attr;
+ int attr_mask;
+ int ret;
+
+ attr_mask = IB_QP_STATE;
+ attr.qp_state = IB_QPS_ERR;
+
+ if (test_and_clear_bit(HNS_ROCE_FLUSH_FLAG, &hr_qp->flush_flag)) {
+ ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
+ if (ret)
+ dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
+ ret);
+ }
+
+ /*
+ * make sure we signal QP destroy leg that flush QP was completed
+ * so that it can safely proceed ahead now and destroy QP
+ */
+ if (atomic_dec_and_test(&hr_qp->refcount))
+ complete(&hr_qp->free);
+}
+
+void init_flush_work(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+{
+ struct hns_roce_work *flush_work = &hr_qp->flush_work;
+
+ flush_work->hr_dev = hr_dev;
+ INIT_WORK(&flush_work->work, flush_work_handle);
+ atomic_inc(&hr_qp->refcount);
+ queue_work(hr_dev->irq_workq, &flush_work->work);
+}
+
void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type)
{
struct device *dev = hr_dev->dev;
@@ -59,6 +98,15 @@ void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type)
return;
}
+ if (hr_dev->hw_rev != HNS_ROCE_HW_VER1 &&
+ (event_type == HNS_ROCE_EVENT_TYPE_WQ_CATAS_ERROR ||
+ event_type == HNS_ROCE_EVENT_TYPE_INV_REQ_LOCAL_WQ_ERROR ||
+ event_type == HNS_ROCE_EVENT_TYPE_LOCAL_WQ_ACCESS_ERROR)) {
+ qp->state = IB_QPS_ERR;
+ if (!test_and_set_bit(HNS_ROCE_FLUSH_FLAG, &qp->flush_flag))
+ init_flush_work(hr_dev, qp);
+ }
+
qp->event(qp, (enum hns_roce_event)event_type);
if (atomic_dec_and_test(&qp->refcount))
@@ -108,15 +156,34 @@ static void hns_roce_ib_qp_event(struct hns_roce_qp *hr_qp,
}
}
-static int hns_roce_reserve_range_qp(struct hns_roce_dev *hr_dev, int cnt,
- int align, unsigned long *base)
+static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
{
- struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
+ unsigned long num = 0;
+ int ret;
- return hns_roce_bitmap_alloc_range(&qp_table->bitmap, cnt, align,
- base) ?
- -ENOMEM :
- 0;
+ if (hr_qp->ibqp.qp_type == IB_QPT_GSI) {
+ /* when hw version is v1, the sqpn is allocated */
+ if (hr_dev->hw_rev == HNS_ROCE_HW_VER1)
+ num = HNS_ROCE_MAX_PORTS +
+ hr_dev->iboe.phy_port[hr_qp->port];
+ else
+ num = 1;
+
+ hr_qp->doorbell_qpn = 1;
+ } else {
+ ret = hns_roce_bitmap_alloc_range(&hr_dev->qp_table.bitmap,
+ 1, 1, &num);
+ if (ret) {
+ ibdev_err(&hr_dev->ib_dev, "Failed to alloc bitmap\n");
+ return -ENOMEM;
+ }
+
+ hr_qp->doorbell_qpn = (u32)num;
+ }
+
+ hr_qp->qpn = num;
+
+ return 0;
}
enum hns_roce_qp_state to_hns_roce_state(enum ib_qp_state state)
@@ -139,50 +206,75 @@ enum hns_roce_qp_state to_hns_roce_state(enum ib_qp_state state)
}
}
-static int hns_roce_gsi_qp_alloc(struct hns_roce_dev *hr_dev, unsigned long qpn,
- struct hns_roce_qp *hr_qp)
+static void add_qp_to_list(struct hns_roce_dev *hr_dev,
+ struct hns_roce_qp *hr_qp,
+ struct ib_cq *send_cq, struct ib_cq *recv_cq)
+{
+ struct hns_roce_cq *hr_send_cq, *hr_recv_cq;
+ unsigned long flags;
+
+ hr_send_cq = send_cq ? to_hr_cq(send_cq) : NULL;
+ hr_recv_cq = recv_cq ? to_hr_cq(recv_cq) : NULL;
+
+ spin_lock_irqsave(&hr_dev->qp_list_lock, flags);
+ hns_roce_lock_cqs(hr_send_cq, hr_recv_cq);
+
+ list_add_tail(&hr_qp->node, &hr_dev->qp_list);
+ if (hr_send_cq)
+ list_add_tail(&hr_qp->sq_node, &hr_send_cq->sq_list);
+ if (hr_recv_cq)
+ list_add_tail(&hr_qp->rq_node, &hr_recv_cq->rq_list);
+
+ hns_roce_unlock_cqs(hr_send_cq, hr_recv_cq);
+ spin_unlock_irqrestore(&hr_dev->qp_list_lock, flags);
+}
+
+static int hns_roce_qp_store(struct hns_roce_dev *hr_dev,
+ struct hns_roce_qp *hr_qp,
+ struct ib_qp_init_attr *init_attr)
{
struct xarray *xa = &hr_dev->qp_table_xa;
int ret;
- if (!qpn)
+ if (!hr_qp->qpn)
return -EINVAL;
- hr_qp->qpn = qpn;
- atomic_set(&hr_qp->refcount, 1);
- init_completion(&hr_qp->free);
-
- ret = xa_err(xa_store_irq(xa, hr_qp->qpn & (hr_dev->caps.num_qps - 1),
- hr_qp, GFP_KERNEL));
+ ret = xa_err(xa_store_irq(xa, hr_qp->qpn, hr_qp, GFP_KERNEL));
if (ret)
- dev_err(hr_dev->dev, "QPC xa_store failed\n");
+ dev_err(hr_dev->dev, "Failed to xa store for QPC\n");
+ else
+ /* add QP to device's QP list for softwc */
+ add_qp_to_list(hr_dev, hr_qp, init_attr->send_cq,
+ init_attr->recv_cq);
return ret;
}
-static int hns_roce_qp_alloc(struct hns_roce_dev *hr_dev, unsigned long qpn,
- struct hns_roce_qp *hr_qp)
+static int alloc_qpc(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
{
struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
struct device *dev = hr_dev->dev;
int ret;
- if (!qpn)
+ if (!hr_qp->qpn)
return -EINVAL;
- hr_qp->qpn = qpn;
+ /* In v1 engine, GSI QP context is saved in the RoCE hw's register */
+ if (hr_qp->ibqp.qp_type == IB_QPT_GSI &&
+ hr_dev->hw_rev == HNS_ROCE_HW_VER1)
+ return 0;
/* Alloc memory for QPC */
ret = hns_roce_table_get(hr_dev, &qp_table->qp_table, hr_qp->qpn);
if (ret) {
- dev_err(dev, "QPC table get failed\n");
+ dev_err(dev, "Failed to get QPC table\n");
goto err_out;
}
/* Alloc memory for IRRL */
ret = hns_roce_table_get(hr_dev, &qp_table->irrl_table, hr_qp->qpn);
if (ret) {
- dev_err(dev, "IRRL table get failed\n");
+ dev_err(dev, "Failed to get IRRL table\n");
goto err_put_qp;
}
@@ -191,7 +283,7 @@ static int hns_roce_qp_alloc(struct hns_roce_dev *hr_dev, unsigned long qpn,
ret = hns_roce_table_get(hr_dev, &qp_table->trrl_table,
hr_qp->qpn);
if (ret) {
- dev_err(dev, "TRRL table get failed\n");
+ dev_err(dev, "Failed to get TRRL table\n");
goto err_put_irrl;
}
}
@@ -201,22 +293,13 @@ static int hns_roce_qp_alloc(struct hns_roce_dev *hr_dev, unsigned long qpn,
ret = hns_roce_table_get(hr_dev, &qp_table->sccc_table,
hr_qp->qpn);
if (ret) {
- dev_err(dev, "SCC CTX table get failed\n");
+ dev_err(dev, "Failed to get SCC CTX table\n");
goto err_put_trrl;
}
}
- ret = hns_roce_gsi_qp_alloc(hr_dev, qpn, hr_qp);
- if (ret)
- goto err_put_sccc;
-
return 0;
-err_put_sccc:
- if (hr_dev->caps.sccc_entry_sz)
- hns_roce_table_put(hr_dev, &qp_table->sccc_table,
- hr_qp->qpn);
-
err_put_trrl:
if (hr_dev->caps.trrl_entry_sz)
hns_roce_table_put(hr_dev, &qp_table->trrl_table, hr_qp->qpn);
@@ -236,88 +319,84 @@ void hns_roce_qp_remove(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
struct xarray *xa = &hr_dev->qp_table_xa;
unsigned long flags;
+ list_del(&hr_qp->node);
+ list_del(&hr_qp->sq_node);
+ list_del(&hr_qp->rq_node);
+
xa_lock_irqsave(xa, flags);
__xa_erase(xa, hr_qp->qpn & (hr_dev->caps.num_qps - 1));
xa_unlock_irqrestore(xa, flags);
}
-void hns_roce_qp_free(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+static void free_qpc(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
{
struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
- if (atomic_dec_and_test(&hr_qp->refcount))
- complete(&hr_qp->free);
- wait_for_completion(&hr_qp->free);
-
- if ((hr_qp->ibqp.qp_type) != IB_QPT_GSI) {
- if (hr_dev->caps.trrl_entry_sz)
- hns_roce_table_put(hr_dev, &qp_table->trrl_table,
- hr_qp->qpn);
- hns_roce_table_put(hr_dev, &qp_table->irrl_table, hr_qp->qpn);
- }
-}
-
-void hns_roce_release_range_qp(struct hns_roce_dev *hr_dev, int base_qpn,
- int cnt)
-{
- struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
-
- if (base_qpn < hr_dev->caps.reserved_qps)
+ /* In v1 engine, GSI QP context is saved in the RoCE hw's register */
+ if (hr_qp->ibqp.qp_type == IB_QPT_GSI &&
+ hr_dev->hw_rev == HNS_ROCE_HW_VER1)
return;
- hns_roce_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt, BITMAP_RR);
+ if (hr_dev->caps.trrl_entry_sz)
+ hns_roce_table_put(hr_dev, &qp_table->trrl_table, hr_qp->qpn);
+ hns_roce_table_put(hr_dev, &qp_table->irrl_table, hr_qp->qpn);
}
-static int hns_roce_set_rq_size(struct hns_roce_dev *hr_dev,
+static void free_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+{
+ struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
+
+ if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
+ return;
+
+ if (hr_qp->qpn < hr_dev->caps.reserved_qps)
+ return;
+
+ hns_roce_bitmap_free_range(&qp_table->bitmap, hr_qp->qpn, 1, BITMAP_RR);
+}
+
+static int set_rq_size(struct hns_roce_dev *hr_dev,
struct ib_qp_cap *cap, bool is_user, int has_rq,
struct hns_roce_qp *hr_qp)
{
- struct device *dev = hr_dev->dev;
u32 max_cnt;
- /* Check the validity of QP support capacity */
- if (cap->max_recv_wr > hr_dev->caps.max_wqes ||
- cap->max_recv_sge > hr_dev->caps.max_rq_sg) {
- dev_err(dev, "RQ WR or sge error!max_recv_wr=%d max_recv_sge=%d\n",
- cap->max_recv_wr, cap->max_recv_sge);
- return -EINVAL;
- }
-
/* If srq exist, set zero for relative number of rq */
if (!has_rq) {
hr_qp->rq.wqe_cnt = 0;
hr_qp->rq.max_gs = 0;
cap->max_recv_wr = 0;
cap->max_recv_sge = 0;
- } else {
- if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) {
- dev_err(dev, "user space no need config max_recv_wr max_recv_sge\n");
- return -EINVAL;
- }
- if (hr_dev->caps.min_wqes)
- max_cnt = max(cap->max_recv_wr, hr_dev->caps.min_wqes);
- else
- max_cnt = cap->max_recv_wr;
-
- hr_qp->rq.wqe_cnt = roundup_pow_of_two(max_cnt);
-
- if ((u32)hr_qp->rq.wqe_cnt > hr_dev->caps.max_wqes) {
- dev_err(dev, "while setting rq size, rq.wqe_cnt too large\n");
- return -EINVAL;
- }
-
- max_cnt = max(1U, cap->max_recv_sge);
- hr_qp->rq.max_gs = roundup_pow_of_two(max_cnt);
- if (hr_dev->caps.max_rq_sg <= 2)
- hr_qp->rq.wqe_shift =
- ilog2(hr_dev->caps.max_rq_desc_sz);
- else
- hr_qp->rq.wqe_shift =
- ilog2(hr_dev->caps.max_rq_desc_sz
- * hr_qp->rq.max_gs);
+ return 0;
}
+ /* Check the validity of QP support capacity */
+ if (!cap->max_recv_wr || cap->max_recv_wr > hr_dev->caps.max_wqes ||
+ cap->max_recv_sge > hr_dev->caps.max_rq_sg) {
+ ibdev_err(&hr_dev->ib_dev, "RQ config error, depth=%u, sge=%d\n",
+ cap->max_recv_wr, cap->max_recv_sge);
+ return -EINVAL;
+ }
+
+ max_cnt = max(cap->max_recv_wr, hr_dev->caps.min_wqes);
+
+ hr_qp->rq.wqe_cnt = roundup_pow_of_two(max_cnt);
+ if ((u32)hr_qp->rq.wqe_cnt > hr_dev->caps.max_wqes) {
+ ibdev_err(&hr_dev->ib_dev, "rq depth %u too large\n",
+ cap->max_recv_wr);
+ return -EINVAL;
+ }
+
+ max_cnt = max(1U, cap->max_recv_sge);
+ hr_qp->rq.max_gs = roundup_pow_of_two(max_cnt);
+
+ if (hr_dev->caps.max_rq_sg <= HNS_ROCE_SGE_IN_WQE)
+ hr_qp->rq.wqe_shift = ilog2(hr_dev->caps.max_rq_desc_sz);
+ else
+ hr_qp->rq.wqe_shift = ilog2(hr_dev->caps.max_rq_desc_sz *
+ hr_qp->rq.max_gs);
+
cap->max_recv_wr = hr_qp->rq.wqe_cnt;
cap->max_recv_sge = hr_qp->rq.max_gs;
@@ -334,12 +413,12 @@ static int check_sq_size_with_integrity(struct hns_roce_dev *hr_dev,
/* Sanity check SQ size before proceeding */
if (ucmd->log_sq_stride > max_sq_stride ||
ucmd->log_sq_stride < HNS_ROCE_IB_MIN_SQ_STRIDE) {
- ibdev_err(&hr_dev->ib_dev, "check SQ size error!\n");
+ ibdev_err(&hr_dev->ib_dev, "Failed to check SQ stride size\n");
return -EINVAL;
}
if (cap->max_send_sge > hr_dev->caps.max_sq_sg) {
- ibdev_err(&hr_dev->ib_dev, "SQ sge error! max_send_sge=%d\n",
+ ibdev_err(&hr_dev->ib_dev, "Failed to check SQ SGE size %d\n",
cap->max_send_sge);
return -EINVAL;
}
@@ -347,10 +426,9 @@ static int check_sq_size_with_integrity(struct hns_roce_dev *hr_dev,
return 0;
}
-static int hns_roce_set_user_sq_size(struct hns_roce_dev *hr_dev,
- struct ib_qp_cap *cap,
- struct hns_roce_qp *hr_qp,
- struct hns_roce_ib_create_qp *ucmd)
+static int set_user_sq_size(struct hns_roce_dev *hr_dev,
+ struct ib_qp_cap *cap, struct hns_roce_qp *hr_qp,
+ struct hns_roce_ib_create_qp *ucmd)
{
u32 ex_sge_num;
u32 page_size;
@@ -363,27 +441,28 @@ static int hns_roce_set_user_sq_size(struct hns_roce_dev *hr_dev,
ret = check_sq_size_with_integrity(hr_dev, cap, ucmd);
if (ret) {
- ibdev_err(&hr_dev->ib_dev, "Sanity check sq size failed\n");
+ ibdev_err(&hr_dev->ib_dev, "Failed to check user SQ size limit\n");
return ret;
}
hr_qp->sq.wqe_shift = ucmd->log_sq_stride;
max_cnt = max(1U, cap->max_send_sge);
- if (hr_dev->caps.max_sq_sg <= 2)
+ if (hr_dev->hw_rev == HNS_ROCE_HW_VER1)
hr_qp->sq.max_gs = roundup_pow_of_two(max_cnt);
else
hr_qp->sq.max_gs = max_cnt;
- if (hr_qp->sq.max_gs > 2)
+ if (hr_qp->sq.max_gs > HNS_ROCE_SGE_IN_WQE)
hr_qp->sge.sge_cnt = roundup_pow_of_two(hr_qp->sq.wqe_cnt *
(hr_qp->sq.max_gs - 2));
- if ((hr_qp->sq.max_gs > 2) && (hr_dev->pci_dev->revision == 0x20)) {
+ if (hr_qp->sq.max_gs > HNS_ROCE_SGE_IN_WQE &&
+ hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_A) {
if (hr_qp->sge.sge_cnt > hr_dev->caps.max_extend_sg) {
- dev_err(hr_dev->dev,
- "The extended sge cnt error! sge_cnt=%d\n",
- hr_qp->sge.sge_cnt);
+ ibdev_err(&hr_dev->ib_dev,
+ "Failed to check extended SGE size limit %d\n",
+ hr_qp->sge.sge_cnt);
return -EINVAL;
}
}
@@ -392,7 +471,7 @@ static int hns_roce_set_user_sq_size(struct hns_roce_dev *hr_dev,
ex_sge_num = hr_qp->sge.sge_cnt;
/* Get buf size, SQ and RQ are aligned to page_szie */
- if (hr_dev->caps.max_sq_sg <= 2) {
+ if (hr_dev->hw_rev == HNS_ROCE_HW_VER1) {
hr_qp->buff_size = round_up((hr_qp->rq.wqe_cnt <<
hr_qp->rq.wqe_shift), PAGE_SIZE) +
round_up((hr_qp->sq.wqe_cnt <<
@@ -492,30 +571,6 @@ static int split_wqe_buf_region(struct hns_roce_dev *hr_dev,
return region_cnt;
}
-static int calc_wqe_bt_page_shift(struct hns_roce_dev *hr_dev,
- struct hns_roce_buf_region *regions,
- int region_cnt)
-{
- int bt_pg_shift;
- int ba_num;
- int ret;
-
- bt_pg_shift = PAGE_SHIFT + hr_dev->caps.mtt_ba_pg_sz;
-
- /* all root ba entries must in one bt page */
- do {
- ba_num = (1 << bt_pg_shift) / BA_BYTE_LEN;
- ret = hns_roce_hem_list_calc_root_ba(regions, region_cnt,
- ba_num);
- if (ret <= ba_num)
- break;
-
- bt_pg_shift++;
- } while (ret > ba_num);
-
- return bt_pg_shift - PAGE_SHIFT;
-}
-
static int set_extend_sge_param(struct hns_roce_dev *hr_dev,
struct hns_roce_qp *hr_qp)
{
@@ -528,13 +583,15 @@ static int set_extend_sge_param(struct hns_roce_dev *hr_dev,
}
/* ud sqwqe's sge use extend sge */
- if (hr_dev->caps.max_sq_sg > 2 && hr_qp->ibqp.qp_type == IB_QPT_GSI) {
+ if (hr_dev->hw_rev != HNS_ROCE_HW_VER1 &&
+ hr_qp->ibqp.qp_type == IB_QPT_GSI) {
hr_qp->sge.sge_cnt = roundup_pow_of_two(hr_qp->sq.wqe_cnt *
hr_qp->sq.max_gs);
hr_qp->sge.sge_shift = 4;
}
- if ((hr_qp->sq.max_gs > 2) && hr_dev->pci_dev->revision == 0x20) {
+ if (hr_qp->sq.max_gs > 2 &&
+ hr_dev->pci_dev->revision == PCI_REVISION_ID_HIP08_A) {
if (hr_qp->sge.sge_cnt > hr_dev->caps.max_extend_sg) {
dev_err(dev, "The extended sge cnt error! sge_cnt=%d\n",
hr_qp->sge.sge_cnt);
@@ -545,46 +602,43 @@ static int set_extend_sge_param(struct hns_roce_dev *hr_dev,
return 0;
}
-static int hns_roce_set_kernel_sq_size(struct hns_roce_dev *hr_dev,
- struct ib_qp_cap *cap,
- struct hns_roce_qp *hr_qp)
+static int set_kernel_sq_size(struct hns_roce_dev *hr_dev,
+ struct ib_qp_cap *cap, struct hns_roce_qp *hr_qp)
{
- struct device *dev = hr_dev->dev;
u32 page_size;
u32 max_cnt;
int size;
int ret;
- if (cap->max_send_wr > hr_dev->caps.max_wqes ||
+ if (!cap->max_send_wr || cap->max_send_wr > hr_dev->caps.max_wqes ||
cap->max_send_sge > hr_dev->caps.max_sq_sg ||
cap->max_inline_data > hr_dev->caps.max_sq_inline) {
- dev_err(dev, "SQ WR or sge or inline data error!\n");
+ ibdev_err(&hr_dev->ib_dev,
+ "SQ WR or sge or inline data error!\n");
return -EINVAL;
}
hr_qp->sq.wqe_shift = ilog2(hr_dev->caps.max_sq_desc_sz);
- if (hr_dev->caps.min_wqes)
- max_cnt = max(cap->max_send_wr, hr_dev->caps.min_wqes);
- else
- max_cnt = cap->max_send_wr;
+ max_cnt = max(cap->max_send_wr, hr_dev->caps.min_wqes);
hr_qp->sq.wqe_cnt = roundup_pow_of_two(max_cnt);
if ((u32)hr_qp->sq.wqe_cnt > hr_dev->caps.max_wqes) {
- dev_err(dev, "while setting kernel sq size, sq.wqe_cnt too large\n");
+ ibdev_err(&hr_dev->ib_dev,
+ "while setting kernel sq size, sq.wqe_cnt too large\n");
return -EINVAL;
}
/* Get data_seg numbers */
max_cnt = max(1U, cap->max_send_sge);
- if (hr_dev->caps.max_sq_sg <= 2)
+ if (hr_dev->hw_rev == HNS_ROCE_HW_VER1)
hr_qp->sq.max_gs = roundup_pow_of_two(max_cnt);
else
hr_qp->sq.max_gs = max_cnt;
ret = set_extend_sge_param(hr_dev, hr_qp);
if (ret) {
- dev_err(dev, "set extend sge parameters fail\n");
+ ibdev_err(&hr_dev->ib_dev, "set extend sge parameters fail\n");
return ret;
}
@@ -593,7 +647,7 @@ static int hns_roce_set_kernel_sq_size(struct hns_roce_dev *hr_dev,
hr_qp->sq.offset = 0;
size = round_up(hr_qp->sq.wqe_cnt << hr_qp->sq.wqe_shift, page_size);
- if (hr_dev->caps.max_sq_sg > 2 && hr_qp->sge.sge_cnt) {
+ if (hr_dev->hw_rev != HNS_ROCE_HW_VER1 && hr_qp->sge.sge_cnt) {
hr_qp->sge.sge_cnt = max(page_size/(1 << hr_qp->sge.sge_shift),
(u32)hr_qp->sge.sge_cnt);
hr_qp->sge.offset = size;
@@ -677,53 +731,292 @@ static void free_rq_inline_buf(struct hns_roce_qp *hr_qp)
kfree(hr_qp->rq_inl_buf.wqe_list);
}
-static void add_qp_to_list(struct hns_roce_dev *hr_dev,
- struct hns_roce_qp *hr_qp,
- struct ib_cq *send_cq, struct ib_cq *recv_cq)
+static int map_wqe_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ u32 page_shift, bool is_user)
{
- struct hns_roce_cq *hr_send_cq, *hr_recv_cq;
- unsigned long flags;
-
- hr_send_cq = send_cq ? to_hr_cq(send_cq) : NULL;
- hr_recv_cq = recv_cq ? to_hr_cq(recv_cq) : NULL;
-
- spin_lock_irqsave(&hr_dev->qp_list_lock, flags);
- hns_roce_lock_cqs(hr_send_cq, hr_recv_cq);
-
- list_add_tail(&hr_qp->node, &hr_dev->qp_list);
- if (hr_send_cq)
- list_add_tail(&hr_qp->sq_node, &hr_send_cq->sq_list);
- if (hr_recv_cq)
- list_add_tail(&hr_qp->rq_node, &hr_recv_cq->rq_list);
-
- hns_roce_unlock_cqs(hr_send_cq, hr_recv_cq);
- spin_unlock_irqrestore(&hr_dev->qp_list_lock, flags);
-}
-
-static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
- struct ib_pd *ib_pd,
- struct ib_qp_init_attr *init_attr,
- struct ib_udata *udata, unsigned long sqpn,
- struct hns_roce_qp *hr_qp)
-{
- dma_addr_t *buf_list[ARRAY_SIZE(hr_qp->regions)] = { NULL };
- struct device *dev = hr_dev->dev;
- struct hns_roce_ib_create_qp ucmd;
- struct hns_roce_ib_create_qp_resp resp = {};
- struct hns_roce_ucontext *uctx = rdma_udata_to_drv_context(
- udata, struct hns_roce_ucontext, ibucontext);
+/* WQE buffer include 3 parts: SQ, extend SGE and RQ. */
+#define HNS_ROCE_WQE_REGION_MAX 3
+ struct hns_roce_buf_region regions[HNS_ROCE_WQE_REGION_MAX] = {};
+ dma_addr_t *buf_list[HNS_ROCE_WQE_REGION_MAX] = {};
+ struct ib_device *ibdev = &hr_dev->ib_dev;
struct hns_roce_buf_region *r;
- unsigned long qpn = 0;
- u32 page_shift;
+ int region_count;
int buf_count;
int ret;
int i;
- mutex_init(&hr_qp->mutex);
- spin_lock_init(&hr_qp->sq.lock);
- spin_lock_init(&hr_qp->rq.lock);
+ region_count = split_wqe_buf_region(hr_dev, hr_qp, regions,
+ ARRAY_SIZE(regions), page_shift);
- hr_qp->state = IB_QPS_RESET;
+ /* alloc a tmp list to store WQE buffers address */
+ ret = hns_roce_alloc_buf_list(regions, buf_list, region_count);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc WQE buffer list\n");
+ return ret;
+ }
+
+ for (i = 0; i < region_count; i++) {
+ r = ®ions[i];
+ if (is_user)
+ buf_count = hns_roce_get_umem_bufs(hr_dev, buf_list[i],
+ r->count, r->offset, hr_qp->umem,
+ page_shift);
+ else
+ buf_count = hns_roce_get_kmem_bufs(hr_dev, buf_list[i],
+ r->count, r->offset, &hr_qp->hr_buf);
+
+ if (buf_count != r->count) {
+ ibdev_err(ibdev, "Failed to get %s WQE buf, expect %d = %d.\n",
+ is_user ? "user" : "kernel",
+ r->count, buf_count);
+ ret = -ENOBUFS;
+ goto done;
+ }
+ }
+
+ hr_qp->wqe_bt_pg_shift = hr_dev->caps.mtt_ba_pg_sz;
+ hns_roce_mtr_init(&hr_qp->mtr, PAGE_SHIFT + hr_qp->wqe_bt_pg_shift,
+ page_shift);
+ ret = hns_roce_mtr_attach(hr_dev, &hr_qp->mtr, buf_list, regions,
+ region_count);
+ if (ret)
+ ibdev_err(ibdev, "Failed to attach WQE's mtr\n");
+
+ goto done;
+
+ hns_roce_mtr_cleanup(hr_dev, &hr_qp->mtr);
+done:
+ hns_roce_free_buf_list(buf_list, region_count);
+
+ return ret;
+}
+
+static int alloc_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata, unsigned long addr)
+{
+ u32 page_shift = PAGE_SHIFT + hr_dev->caps.mtt_buf_pg_sz;
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ bool is_rq_buf_inline;
+ int ret;
+
+ is_rq_buf_inline = (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) &&
+ hns_roce_qp_has_rq(init_attr);
+ if (is_rq_buf_inline) {
+ ret = alloc_rq_inline_buf(hr_qp, init_attr);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc inline RQ buffer\n");
+ return ret;
+ }
+ }
+
+ if (udata) {
+ hr_qp->umem = ib_umem_get(ibdev, addr, hr_qp->buff_size, 0);
+ if (IS_ERR(hr_qp->umem)) {
+ ret = PTR_ERR(hr_qp->umem);
+ goto err_inline;
+ }
+ } else {
+ ret = hns_roce_buf_alloc(hr_dev, hr_qp->buff_size,
+ (1 << page_shift) * 2,
+ &hr_qp->hr_buf, page_shift);
+ if (ret)
+ goto err_inline;
+ }
+
+ ret = map_wqe_buf(hr_dev, hr_qp, page_shift, udata);
+ if (ret)
+ goto err_alloc;
+
+ return 0;
+
+err_inline:
+ if (is_rq_buf_inline)
+ free_rq_inline_buf(hr_qp);
+
+err_alloc:
+ if (udata) {
+ ib_umem_release(hr_qp->umem);
+ hr_qp->umem = NULL;
+ } else {
+ hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
+ }
+
+ ibdev_err(ibdev, "Failed to alloc WQE buffer, ret %d.\n", ret);
+
+ return ret;
+}
+
+static void free_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
+{
+ hns_roce_mtr_cleanup(hr_dev, &hr_qp->mtr);
+ if (hr_qp->umem) {
+ ib_umem_release(hr_qp->umem);
+ hr_qp->umem = NULL;
+ }
+
+ if (hr_qp->hr_buf.nbufs > 0)
+ hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
+
+ if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) &&
+ hr_qp->rq.wqe_cnt)
+ free_rq_inline_buf(hr_qp);
+}
+
+static inline bool user_qp_has_sdb(struct hns_roce_dev *hr_dev,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata,
+ struct hns_roce_ib_create_qp_resp *resp,
+ struct hns_roce_ib_create_qp *ucmd)
+{
+ return ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_SQ_RECORD_DB) &&
+ udata->outlen >= offsetofend(typeof(*resp), cap_flags) &&
+ hns_roce_qp_has_sq(init_attr) &&
+ udata->inlen >= offsetofend(typeof(*ucmd), sdb_addr));
+}
+
+static inline bool user_qp_has_rdb(struct hns_roce_dev *hr_dev,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata,
+ struct hns_roce_ib_create_qp_resp *resp)
+{
+ return ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
+ udata->outlen >= offsetofend(typeof(*resp), cap_flags) &&
+ hns_roce_qp_has_rq(init_attr));
+}
+
+static inline bool kernel_qp_has_rdb(struct hns_roce_dev *hr_dev,
+ struct ib_qp_init_attr *init_attr)
+{
+ return ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
+ hns_roce_qp_has_rq(init_attr));
+}
+
+static int alloc_qp_db(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata,
+ struct hns_roce_ib_create_qp *ucmd,
+ struct hns_roce_ib_create_qp_resp *resp)
+{
+ struct hns_roce_ucontext *uctx = rdma_udata_to_drv_context(
+ udata, struct hns_roce_ucontext, ibucontext);
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ int ret;
+
+ if (udata) {
+ if (user_qp_has_sdb(hr_dev, init_attr, udata, resp, ucmd)) {
+ ret = hns_roce_db_map_user(uctx, udata, ucmd->sdb_addr,
+ &hr_qp->sdb);
+ if (ret) {
+ ibdev_err(ibdev,
+ "Failed to map user SQ doorbell\n");
+ goto err_out;
+ }
+ hr_qp->sdb_en = 1;
+ resp->cap_flags |= HNS_ROCE_SUPPORT_SQ_RECORD_DB;
+ }
+
+ if (user_qp_has_rdb(hr_dev, init_attr, udata, resp)) {
+ ret = hns_roce_db_map_user(uctx, udata, ucmd->db_addr,
+ &hr_qp->rdb);
+ if (ret) {
+ ibdev_err(ibdev,
+ "Failed to map user RQ doorbell\n");
+ goto err_sdb;
+ }
+ hr_qp->rdb_en = 1;
+ resp->cap_flags |= HNS_ROCE_SUPPORT_RQ_RECORD_DB;
+ }
+ } else {
+ /* QP doorbell register address */
+ hr_qp->sq.db_reg_l = hr_dev->reg_base + hr_dev->sdb_offset +
+ DB_REG_OFFSET * hr_dev->priv_uar.index;
+ hr_qp->rq.db_reg_l = hr_dev->reg_base + hr_dev->odb_offset +
+ DB_REG_OFFSET * hr_dev->priv_uar.index;
+
+ if (kernel_qp_has_rdb(hr_dev, init_attr)) {
+ ret = hns_roce_alloc_db(hr_dev, &hr_qp->rdb, 0);
+ if (ret) {
+ ibdev_err(ibdev,
+ "Failed to alloc kernel RQ doorbell\n");
+ goto err_out;
+ }
+ *hr_qp->rdb.db_record = 0;
+ hr_qp->rdb_en = 1;
+ }
+ }
+
+ return 0;
+err_sdb:
+ if (udata && hr_qp->sdb_en)
+ hns_roce_db_unmap_user(uctx, &hr_qp->sdb);
+err_out:
+ return ret;
+}
+
+static void free_qp_db(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_udata *udata)
+{
+ struct hns_roce_ucontext *uctx = rdma_udata_to_drv_context(
+ udata, struct hns_roce_ucontext, ibucontext);
+
+ if (udata) {
+ if (hr_qp->rdb_en)
+ hns_roce_db_unmap_user(uctx, &hr_qp->rdb);
+ if (hr_qp->sdb_en)
+ hns_roce_db_unmap_user(uctx, &hr_qp->sdb);
+ } else {
+ if (hr_qp->rdb_en)
+ hns_roce_free_db(hr_dev, &hr_qp->rdb);
+ }
+}
+
+static int alloc_kernel_wrid(struct hns_roce_dev *hr_dev,
+ struct hns_roce_qp *hr_qp)
+{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ u64 *sq_wrid = NULL;
+ u64 *rq_wrid = NULL;
+ int ret;
+
+ sq_wrid = kcalloc(hr_qp->sq.wqe_cnt, sizeof(u64), GFP_KERNEL);
+ if (ZERO_OR_NULL_PTR(sq_wrid)) {
+ ibdev_err(ibdev, "Failed to alloc SQ wrid\n");
+ return -ENOMEM;
+ }
+
+ if (hr_qp->rq.wqe_cnt) {
+ rq_wrid = kcalloc(hr_qp->rq.wqe_cnt, sizeof(u64), GFP_KERNEL);
+ if (ZERO_OR_NULL_PTR(rq_wrid)) {
+ ibdev_err(ibdev, "Failed to alloc RQ wrid\n");
+ ret = -ENOMEM;
+ goto err_sq;
+ }
+ }
+
+ hr_qp->sq.wrid = sq_wrid;
+ hr_qp->rq.wrid = rq_wrid;
+ return 0;
+err_sq:
+ kfree(sq_wrid);
+
+ return ret;
+}
+
+static void free_kernel_wrid(struct hns_roce_dev *hr_dev,
+ struct hns_roce_qp *hr_qp)
+{
+ kfree(hr_qp->rq.wrid);
+ kfree(hr_qp->sq.wrid);
+}
+
+static int set_qp_param(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata,
+ struct hns_roce_ib_create_qp *ucmd)
+{
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ int ret;
hr_qp->ibqp.qp_type = init_attr->qp_type;
@@ -732,309 +1025,157 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
else
hr_qp->sq_signal_bits = IB_SIGNAL_REQ_WR;
- ret = hns_roce_set_rq_size(hr_dev, &init_attr->cap, udata,
- hns_roce_qp_has_rq(init_attr), hr_qp);
+ ret = set_rq_size(hr_dev, &init_attr->cap, udata,
+ hns_roce_qp_has_rq(init_attr), hr_qp);
if (ret) {
- dev_err(dev, "hns_roce_set_rq_size failed\n");
- goto err_out;
+ ibdev_err(ibdev, "Failed to set user RQ size\n");
+ return ret;
}
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) &&
- hns_roce_qp_has_rq(init_attr)) {
- ret = alloc_rq_inline_buf(hr_qp, init_attr);
- if (ret) {
- dev_err(dev, "allocate receive inline buffer failed\n");
- goto err_out;
- }
- }
-
- page_shift = PAGE_SHIFT + hr_dev->caps.mtt_buf_pg_sz;
if (udata) {
- if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
- dev_err(dev, "ib_copy_from_udata error for create qp\n");
- ret = -EFAULT;
- goto err_alloc_rq_inline_buf;
+ if (ib_copy_from_udata(ucmd, udata, sizeof(*ucmd))) {
+ ibdev_err(ibdev, "Failed to copy QP ucmd\n");
+ return -EFAULT;
}
- ret = hns_roce_set_user_sq_size(hr_dev, &init_attr->cap, hr_qp,
- &ucmd);
- if (ret) {
- dev_err(dev, "hns_roce_set_user_sq_size error for create qp\n");
- goto err_alloc_rq_inline_buf;
- }
-
- hr_qp->umem = ib_umem_get(ib_pd->device, ucmd.buf_addr,
- hr_qp->buff_size, 0);
- if (IS_ERR(hr_qp->umem)) {
- dev_err(dev, "ib_umem_get error for create qp\n");
- ret = PTR_ERR(hr_qp->umem);
- goto err_alloc_rq_inline_buf;
- }
- hr_qp->region_cnt = split_wqe_buf_region(hr_dev, hr_qp,
- hr_qp->regions, ARRAY_SIZE(hr_qp->regions),
- page_shift);
- ret = hns_roce_alloc_buf_list(hr_qp->regions, buf_list,
- hr_qp->region_cnt);
- if (ret) {
- dev_err(dev, "alloc buf_list error for create qp\n");
- goto err_alloc_list;
- }
-
- for (i = 0; i < hr_qp->region_cnt; i++) {
- r = &hr_qp->regions[i];
- buf_count = hns_roce_get_umem_bufs(hr_dev,
- buf_list[i], r->count, r->offset,
- hr_qp->umem, page_shift);
- if (buf_count != r->count) {
- dev_err(dev,
- "get umem buf err, expect %d,ret %d.\n",
- r->count, buf_count);
- ret = -ENOBUFS;
- goto err_get_bufs;
- }
- }
-
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_SQ_RECORD_DB) &&
- (udata->inlen >= sizeof(ucmd)) &&
- (udata->outlen >= sizeof(resp)) &&
- hns_roce_qp_has_sq(init_attr)) {
- ret = hns_roce_db_map_user(uctx, udata, ucmd.sdb_addr,
- &hr_qp->sdb);
- if (ret) {
- dev_err(dev, "sq record doorbell map failed!\n");
- goto err_get_bufs;
- }
-
- /* indicate kernel supports sq record db */
- resp.cap_flags |= HNS_ROCE_SUPPORT_SQ_RECORD_DB;
- hr_qp->sdb_en = 1;
- }
-
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
- (udata->outlen >= sizeof(resp)) &&
- hns_roce_qp_has_rq(init_attr)) {
- ret = hns_roce_db_map_user(uctx, udata, ucmd.db_addr,
- &hr_qp->rdb);
- if (ret) {
- dev_err(dev, "rq record doorbell map failed!\n");
- goto err_sq_dbmap;
- }
-
- /* indicate kernel supports rq record db */
- resp.cap_flags |= HNS_ROCE_SUPPORT_RQ_RECORD_DB;
- hr_qp->rdb_en = 1;
- }
+ ret = set_user_sq_size(hr_dev, &init_attr->cap, hr_qp, ucmd);
+ if (ret)
+ ibdev_err(ibdev, "Failed to set user SQ size\n");
} else {
if (init_attr->create_flags &
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) {
- dev_err(dev, "init_attr->create_flags error!\n");
- ret = -EINVAL;
- goto err_alloc_rq_inline_buf;
+ ibdev_err(ibdev, "Failed to check multicast loopback\n");
+ return -EINVAL;
}
if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) {
- dev_err(dev, "init_attr->create_flags error!\n");
- ret = -EINVAL;
- goto err_alloc_rq_inline_buf;
+ ibdev_err(ibdev, "Failed to check ipoib ud lso\n");
+ return -EINVAL;
}
- /* Set SQ size */
- ret = hns_roce_set_kernel_sq_size(hr_dev, &init_attr->cap,
- hr_qp);
- if (ret) {
- dev_err(dev, "hns_roce_set_kernel_sq_size error!\n");
- goto err_alloc_rq_inline_buf;
- }
-
- /* QP doorbell register address */
- hr_qp->sq.db_reg_l = hr_dev->reg_base + hr_dev->sdb_offset +
- DB_REG_OFFSET * hr_dev->priv_uar.index;
- hr_qp->rq.db_reg_l = hr_dev->reg_base + hr_dev->odb_offset +
- DB_REG_OFFSET * hr_dev->priv_uar.index;
-
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
- hns_roce_qp_has_rq(init_attr)) {
- ret = hns_roce_alloc_db(hr_dev, &hr_qp->rdb, 0);
- if (ret) {
- dev_err(dev, "rq record doorbell alloc failed!\n");
- goto err_alloc_rq_inline_buf;
- }
- *hr_qp->rdb.db_record = 0;
- hr_qp->rdb_en = 1;
- }
-
- /* Allocate QP buf */
- if (hns_roce_buf_alloc(hr_dev, hr_qp->buff_size,
- (1 << page_shift) * 2,
- &hr_qp->hr_buf, page_shift)) {
- dev_err(dev, "hns_roce_buf_alloc error!\n");
- ret = -ENOMEM;
- goto err_db;
- }
- hr_qp->region_cnt = split_wqe_buf_region(hr_dev, hr_qp,
- hr_qp->regions, ARRAY_SIZE(hr_qp->regions),
- page_shift);
- ret = hns_roce_alloc_buf_list(hr_qp->regions, buf_list,
- hr_qp->region_cnt);
- if (ret) {
- dev_err(dev, "alloc buf_list error for create qp!\n");
- goto err_alloc_list;
- }
-
- for (i = 0; i < hr_qp->region_cnt; i++) {
- r = &hr_qp->regions[i];
- buf_count = hns_roce_get_kmem_bufs(hr_dev,
- buf_list[i], r->count, r->offset,
- &hr_qp->hr_buf);
- if (buf_count != r->count) {
- dev_err(dev,
- "get kmem buf err, expect %d,ret %d.\n",
- r->count, buf_count);
- ret = -ENOBUFS;
- goto err_get_bufs;
- }
- }
-
- hr_qp->sq.wrid = kcalloc(hr_qp->sq.wqe_cnt, sizeof(u64),
- GFP_KERNEL);
- if (ZERO_OR_NULL_PTR(hr_qp->sq.wrid)) {
- ret = -ENOMEM;
- goto err_get_bufs;
- }
-
- if (hr_qp->rq.wqe_cnt) {
- hr_qp->rq.wrid = kcalloc(hr_qp->rq.wqe_cnt, sizeof(u64),
- GFP_KERNEL);
- if (ZERO_OR_NULL_PTR(hr_qp->rq.wrid)) {
- ret = -ENOMEM;
- goto err_sq_wrid;
- }
- }
+ ret = set_kernel_sq_size(hr_dev, &init_attr->cap, hr_qp);
+ if (ret)
+ ibdev_err(ibdev, "Failed to set kernel SQ size\n");
}
- if (sqpn) {
- qpn = sqpn;
- } else {
- /* Get QPN */
- ret = hns_roce_reserve_range_qp(hr_dev, 1, 1, &qpn);
- if (ret) {
- dev_err(dev, "hns_roce_reserve_range_qp alloc qpn error\n");
- goto err_wrid;
- }
- }
+ return ret;
+}
- hr_qp->wqe_bt_pg_shift = calc_wqe_bt_page_shift(hr_dev, hr_qp->regions,
- hr_qp->region_cnt);
- hns_roce_mtr_init(&hr_qp->mtr, PAGE_SHIFT + hr_qp->wqe_bt_pg_shift,
- page_shift);
- ret = hns_roce_mtr_attach(hr_dev, &hr_qp->mtr, buf_list,
- hr_qp->regions, hr_qp->region_cnt);
+static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
+ struct ib_pd *ib_pd,
+ struct ib_qp_init_attr *init_attr,
+ struct ib_udata *udata,
+ struct hns_roce_qp *hr_qp)
+{
+ struct hns_roce_ib_create_qp_resp resp = {};
+ struct ib_device *ibdev = &hr_dev->ib_dev;
+ struct hns_roce_ib_create_qp ucmd;
+ int ret;
+
+ mutex_init(&hr_qp->mutex);
+ spin_lock_init(&hr_qp->sq.lock);
+ spin_lock_init(&hr_qp->rq.lock);
+
+ hr_qp->state = IB_QPS_RESET;
+ hr_qp->flush_flag = 0;
+
+ ret = set_qp_param(hr_dev, hr_qp, init_attr, udata, &ucmd);
if (ret) {
- dev_err(dev, "mtr attach error for create qp\n");
- goto err_mtr;
+ ibdev_err(ibdev, "Failed to set QP param\n");
+ return ret;
}
- if (init_attr->qp_type == IB_QPT_GSI &&
- hr_dev->hw_rev == HNS_ROCE_HW_VER1) {
- /* In v1 engine, GSI QP context in RoCE engine's register */
- ret = hns_roce_gsi_qp_alloc(hr_dev, qpn, hr_qp);
+ if (!udata) {
+ ret = alloc_kernel_wrid(hr_dev, hr_qp);
if (ret) {
- dev_err(dev, "hns_roce_qp_alloc failed!\n");
- goto err_qpn;
- }
- } else {
- ret = hns_roce_qp_alloc(hr_dev, qpn, hr_qp);
- if (ret) {
- dev_err(dev, "hns_roce_qp_alloc failed!\n");
- goto err_qpn;
+ ibdev_err(ibdev, "Failed to alloc wrid\n");
+ return ret;
}
}
- if (sqpn)
- hr_qp->doorbell_qpn = 1;
- else
- hr_qp->doorbell_qpn = (u32)hr_qp->qpn;
+ ret = alloc_qp_db(hr_dev, hr_qp, init_attr, udata, &ucmd, &resp);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc QP doorbell\n");
+ goto err_wrid;
+ }
+
+ ret = alloc_qp_buf(hr_dev, hr_qp, init_attr, udata, ucmd.buf_addr);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc QP buffer\n");
+ goto err_db;
+ }
+
+ ret = alloc_qpn(hr_dev, hr_qp);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc QPN\n");
+ goto err_buf;
+ }
+
+ ret = alloc_qpc(hr_dev, hr_qp);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to alloc QP context\n");
+ goto err_qpn;
+ }
+
+ ret = hns_roce_qp_store(hr_dev, hr_qp, init_attr);
+ if (ret) {
+ ibdev_err(ibdev, "Failed to store QP\n");
+ goto err_qpc;
+ }
if (udata) {
ret = ib_copy_to_udata(udata, &resp,
min(udata->outlen, sizeof(resp)));
- if (ret)
- goto err_qp;
+ if (ret) {
+ ibdev_err(ibdev, "copy qp resp failed!\n");
+ goto err_store;
+ }
}
if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL) {
ret = hr_dev->hw->qp_flow_control_init(hr_dev, hr_qp);
if (ret)
- goto err_qp;
+ goto err_store;
}
+ hr_qp->ibqp.qp_num = hr_qp->qpn;
hr_qp->event = hns_roce_ib_qp_event;
-
- add_qp_to_list(hr_dev, hr_qp, init_attr->send_cq, init_attr->recv_cq);
-
- hns_roce_free_buf_list(buf_list, hr_qp->region_cnt);
+ atomic_set(&hr_qp->refcount, 1);
+ init_completion(&hr_qp->free);
return 0;
-err_qp:
- if (init_attr->qp_type == IB_QPT_GSI &&
- hr_dev->hw_rev == HNS_ROCE_HW_VER1)
- hns_roce_qp_remove(hr_dev, hr_qp);
- else
- hns_roce_qp_free(hr_dev, hr_qp);
-
+err_store:
+ hns_roce_qp_remove(hr_dev, hr_qp);
+err_qpc:
+ free_qpc(hr_dev, hr_qp);
err_qpn:
- if (!sqpn)
- hns_roce_release_range_qp(hr_dev, qpn, 1);
-
-err_mtr:
- hns_roce_mtr_cleanup(hr_dev, &hr_qp->mtr);
-
-err_wrid:
- if (udata) {
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB) &&
- (udata->outlen >= sizeof(resp)) &&
- hns_roce_qp_has_rq(init_attr))
- hns_roce_db_unmap_user(uctx, &hr_qp->rdb);
- } else {
- if (hr_qp->rq.wqe_cnt)
- kfree(hr_qp->rq.wrid);
- }
-
-err_sq_dbmap:
- if (udata)
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_SQ_RECORD_DB) &&
- (udata->inlen >= sizeof(ucmd)) &&
- (udata->outlen >= sizeof(resp)) &&
- hns_roce_qp_has_sq(init_attr))
- hns_roce_db_unmap_user(uctx, &hr_qp->sdb);
-
-err_sq_wrid:
- if (!udata)
- kfree(hr_qp->sq.wrid);
-
-err_get_bufs:
- hns_roce_free_buf_list(buf_list, hr_qp->region_cnt);
-
-err_alloc_list:
- if (!hr_qp->umem)
- hns_roce_buf_free(hr_dev, hr_qp->buff_size, &hr_qp->hr_buf);
- ib_umem_release(hr_qp->umem);
-
+ free_qpn(hr_dev, hr_qp);
+err_buf:
+ free_qp_buf(hr_dev, hr_qp);
err_db:
- if (!udata && hns_roce_qp_has_rq(init_attr) &&
- (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RECORD_DB))
- hns_roce_free_db(hr_dev, &hr_qp->rdb);
-
-err_alloc_rq_inline_buf:
- if ((hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) &&
- hns_roce_qp_has_rq(init_attr))
- free_rq_inline_buf(hr_qp);
-
-err_out:
+ free_qp_db(hr_dev, hr_qp, udata);
+err_wrid:
+ free_kernel_wrid(hr_dev, hr_qp);
return ret;
}
+void hns_roce_qp_destroy(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp,
+ struct ib_udata *udata)
+{
+ if (atomic_dec_and_test(&hr_qp->refcount))
+ complete(&hr_qp->free);
+ wait_for_completion(&hr_qp->free);
+
+ free_qpc(hr_dev, hr_qp);
+ free_qpn(hr_dev, hr_qp);
+ free_qp_buf(hr_dev, hr_qp);
+ free_kernel_wrid(hr_dev, hr_qp);
+ free_qp_db(hr_dev, hr_qp, udata);
+
+ kfree(hr_qp);
+}
+
struct ib_qp *hns_roce_create_qp(struct ib_pd *pd,
struct ib_qp_init_attr *init_attr,
struct ib_udata *udata)
@@ -1050,7 +1191,7 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *pd,
if (!hr_qp)
return ERR_PTR(-ENOMEM);
- ret = hns_roce_create_qp_common(hr_dev, pd, init_attr, udata, 0,
+ ret = hns_roce_create_qp_common(hr_dev, pd, init_attr, udata,
hr_qp);
if (ret) {
ibdev_err(ibdev, "Create QP 0x%06lx failed(%d)\n",
@@ -1059,8 +1200,6 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *pd,
return ERR_PTR(ret);
}
- hr_qp->ibqp.qp_num = hr_qp->qpn;
-
break;
}
case IB_QPT_GSI: {
@@ -1077,15 +1216,8 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *pd,
hr_qp->port = init_attr->port_num - 1;
hr_qp->phy_port = hr_dev->iboe.phy_port[hr_qp->port];
- /* when hw version is v1, the sqpn is allocated */
- if (hr_dev->caps.max_sq_sg <= 2)
- hr_qp->ibqp.qp_num = HNS_ROCE_MAX_PORTS +
- hr_dev->iboe.phy_port[hr_qp->port];
- else
- hr_qp->ibqp.qp_num = 1;
-
ret = hns_roce_create_qp_common(hr_dev, pd, init_attr, udata,
- hr_qp->ibqp.qp_num, hr_qp);
+ hr_qp);
if (ret) {
ibdev_err(ibdev, "Create GSI QP failed!\n");
kfree(hr_qp);
@@ -1097,7 +1229,7 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *pd,
default:{
ibdev_err(ibdev, "not support QP type %d\n",
init_attr->qp_type);
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
}
@@ -1230,11 +1362,10 @@ int hns_roce_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
goto out;
if (cur_state == new_state && cur_state == IB_QPS_RESET) {
- if (hr_dev->caps.min_wqes) {
+ if (hr_dev->hw_rev == HNS_ROCE_HW_VER1) {
ret = -EPERM;
ibdev_err(&hr_dev->ib_dev,
- "cur_state=%d new_state=%d\n", cur_state,
- new_state);
+ "RST2RST state is not supported\n");
} else {
ret = 0;
}
@@ -1306,17 +1437,17 @@ static void *get_wqe(struct hns_roce_qp *hr_qp, int offset)
return hns_roce_buf_offset(&hr_qp->hr_buf, offset);
}
-void *get_recv_wqe(struct hns_roce_qp *hr_qp, int n)
+void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, int n)
{
return get_wqe(hr_qp, hr_qp->rq.offset + (n << hr_qp->rq.wqe_shift));
}
-void *get_send_wqe(struct hns_roce_qp *hr_qp, int n)
+void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, int n)
{
return get_wqe(hr_qp, hr_qp->sq.offset + (n << hr_qp->sq.wqe_shift));
}
-void *get_send_extend_sge(struct hns_roce_qp *hr_qp, int n)
+void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, int n)
{
return hns_roce_buf_offset(&hr_qp->hr_buf, hr_qp->sge.offset +
(n << hr_qp->sge.sge_shift));
diff --git a/drivers/infiniband/hw/hns/hns_roce_srq.c b/drivers/infiniband/hw/hns/hns_roce_srq.c
index c6d5f06f9cde..5b3dd1a337d4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_srq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_srq.c
@@ -381,7 +381,8 @@ int hns_roce_create_srq(struct ib_srq *ib_srq,
srq->wqe_cnt = roundup_pow_of_two(init_attr->attr.max_wr + 1);
srq->max_gs = init_attr->attr.max_sge;
- srq_desc_size = roundup_pow_of_two(max(16, 16 * srq->max_gs));
+ srq_desc_size = roundup_pow_of_two(max(HNS_ROCE_SGE_SIZE,
+ HNS_ROCE_SGE_SIZE * srq->max_gs));
srq->wqe_shift = ilog2(srq_desc_size);
diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
index 8feec35f95a7..3c62c9327a9c 100644
--- a/drivers/infiniband/hw/i40iw/i40iw.h
+++ b/drivers/infiniband/hw/i40iw/i40iw.h
@@ -67,7 +67,7 @@
#include "i40iw_user.h"
#include "i40iw_puda.h"
-#define I40IW_FW_VERSION 2
+#define I40IW_FW_VER_DEFAULT 2
#define I40IW_HW_VERSION 2
#define I40IW_ARP_ADD 1
@@ -325,6 +325,26 @@ struct i40iw_handler {
struct i40e_info ldev;
};
+/**
+ * i40iw_fw_major_ver - get firmware major version
+ * @dev: iwarp device
+ **/
+static inline u64 i40iw_fw_major_ver(struct i40iw_sc_dev *dev)
+{
+ return RS_64(dev->feature_info[I40IW_FEATURE_FW_INFO],
+ I40IW_FW_VER_MAJOR);
+}
+
+/**
+ * i40iw_fw_minor_ver - get firmware minor version
+ * @dev: iwarp device
+ **/
+static inline u64 i40iw_fw_minor_ver(struct i40iw_sc_dev *dev)
+{
+ return RS_64(dev->feature_info[I40IW_FEATURE_FW_INFO],
+ I40IW_FW_VER_MINOR);
+}
+
/**
* to_iwdev - get device
* @ibdev: ib device
diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.h b/drivers/infiniband/hw/i40iw/i40iw_cm.h
index 66dc1ba03389..6e43e4d730f4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_cm.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.h
@@ -85,7 +85,7 @@ struct ietf_mpa_v1 {
u8 flags;
u8 rev;
__be16 priv_data_len;
- u8 priv_data[0];
+ u8 priv_data[];
};
#define ietf_mpa_req_resp_frame ietf_mpa_frame
@@ -101,7 +101,7 @@ struct ietf_mpa_v2 {
u8 rev;
__be16 priv_data_len;
struct ietf_rtr_msg rtr_msg;
- u8 priv_data[0];
+ u8 priv_data[];
};
struct i40iw_cm_node;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index 4d841a3c68f3..e8b4b3743661 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -1021,6 +1021,95 @@ static enum i40iw_status_code i40iw_sc_commit_fpm_values(
return ret_code;
}
+/**
+ * i40iw_sc_query_rdma_features_done - poll cqp for query features done
+ * @cqp: struct for cqp hw
+ */
+static enum i40iw_status_code
+i40iw_sc_query_rdma_features_done(struct i40iw_sc_cqp *cqp)
+{
+ return i40iw_sc_poll_for_cqp_op_done(
+ cqp, I40IW_CQP_OP_QUERY_RDMA_FEATURES, NULL);
+}
+
+/**
+ * i40iw_sc_query_rdma_features - query rdma features
+ * @cqp: struct for cqp hw
+ * @feat_mem: holds PA for HW to use
+ * @scratch: u64 saved to be used during cqp completion
+ */
+static enum i40iw_status_code
+i40iw_sc_query_rdma_features(struct i40iw_sc_cqp *cqp,
+ struct i40iw_dma_mem *feat_mem, u64 scratch)
+{
+ u64 *wqe;
+ u64 header;
+
+ wqe = i40iw_sc_cqp_get_next_send_wqe(cqp, scratch);
+ if (wqe)
+ return I40IW_ERR_RING_FULL;
+
+ set_64bit_val(wqe, 32, feat_mem->pa);
+
+ header = LS_64(I40IW_CQP_OP_QUERY_RDMA_FEATURES, I40IW_CQPSQ_OPCODE) |
+ LS_64(cqp->polarity, I40IW_CQPSQ_WQEVALID) | feat_mem->size;
+
+ i40iw_insert_wqe_hdr(wqe, header);
+
+ i40iw_debug_buf(cqp->dev, I40IW_DEBUG_WQE, "QUERY RDMA FEATURES WQE",
+ wqe, I40IW_CQP_WQE_SIZE * 8);
+
+ i40iw_sc_cqp_post_sq(cqp);
+
+ return 0;
+}
+
+/**
+ * i40iw_get_rdma_features - get RDMA features
+ * @dev - sc device struct
+ */
+enum i40iw_status_code i40iw_get_rdma_features(struct i40iw_sc_dev *dev)
+{
+ enum i40iw_status_code ret_code;
+ struct i40iw_dma_mem feat_buf;
+ u64 temp;
+ u16 byte_idx, feat_type, feat_cnt;
+
+ ret_code = i40iw_allocate_dma_mem(dev->hw, &feat_buf,
+ I40IW_FEATURE_BUF_SIZE,
+ I40IW_FEATURE_BUF_ALIGNMENT);
+
+ if (ret_code)
+ return I40IW_ERR_NO_MEMORY;
+
+ ret_code = i40iw_sc_query_rdma_features(dev->cqp, &feat_buf, 0);
+ if (!ret_code)
+ ret_code = i40iw_sc_query_rdma_features_done(dev->cqp);
+
+ if (ret_code)
+ goto exit;
+
+ get_64bit_val(feat_buf.va, 0, &temp);
+ feat_cnt = RS_64(temp, I40IW_FEATURE_CNT);
+ if (feat_cnt < I40IW_MAX_FEATURES) {
+ ret_code = I40IW_ERR_INVALID_FEAT_CNT;
+ goto exit;
+ } else if (feat_cnt > I40IW_MAX_FEATURES) {
+ i40iw_debug(dev, I40IW_DEBUG_CQP,
+ "features buf size insufficient\n");
+ }
+
+ for (byte_idx = 0, feat_type = 0; feat_type < I40IW_MAX_FEATURES;
+ feat_type++, byte_idx += 8) {
+ get_64bit_val((u64 *)feat_buf.va, byte_idx, &temp);
+ dev->feature_info[feat_type] = RS_64(temp, I40IW_FEATURE_INFO);
+ }
+exit:
+ i40iw_free_dma_mem(dev->hw, &feat_buf);
+
+ return ret_code;
+}
+
/**
* i40iw_sc_query_fpm_values_done - poll for cqp wqe completion for query fpm
* @cqp: struct for cqp hw
@@ -4265,6 +4354,13 @@ static enum i40iw_status_code i40iw_exec_cqp_cmd(struct i40iw_sc_dev *dev,
true,
I40IW_CQP_WAIT_EVENT);
break;
+ case OP_QUERY_RDMA_FEATURES:
+ values_mem.pa = pcmdinfo->in.u.query_rdma_features.cap_pa;
+ values_mem.va = pcmdinfo->in.u.query_rdma_features.cap_va;
+ status = i40iw_sc_query_rdma_features(
+ pcmdinfo->in.u.query_rdma_features.cqp, &values_mem,
+ pcmdinfo->in.u.query_rdma_features.scratch);
+ break;
default:
status = I40IW_NOT_SUPPORTED;
break;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_d.h b/drivers/infiniband/hw/i40iw/i40iw_d.h
index 6ddaeec87d2f..e8367d67575d 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_d.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_d.h
@@ -403,7 +403,7 @@
#define I40IW_CQP_OP_MANAGE_ARP 0x0f
#define I40IW_CQP_OP_MANAGE_VF_PBLE_BP 0x10
#define I40IW_CQP_OP_MANAGE_PUSH_PAGES 0x11
-#define I40IW_CQP_OP_MANAGE_PE_TEAM 0x12
+#define I40IW_CQP_OP_QUERY_RDMA_FEATURES 0x12
#define I40IW_CQP_OP_UPLOAD_CONTEXT 0x13
#define I40IW_CQP_OP_ALLOCATE_LOC_MAC_IP_TABLE_ENTRY 0x14
#define I40IW_CQP_OP_MANAGE_HMC_PM_FUNC_TABLE 0x15
@@ -431,6 +431,24 @@
#define I40IW_CQP_OP_SHMC_PAGES_ALLOCATED 0x2b
#define I40IW_CQP_OP_SET_HMC_RESOURCE_PROFILE 0x2d
+#define I40IW_FEATURE_BUF_SIZE (8 * I40IW_MAX_FEATURES)
+
+#define I40IW_FW_VER_MINOR_SHIFT 0
+#define I40IW_FW_VER_MINOR_MASK \
+ (0xffffULL << I40IW_FW_VER_MINOR_SHIFT)
+
+#define I40IW_FW_VER_MAJOR_SHIFT 16
+#define I40IW_FW_VER_MAJOR_MASK \
+ (0xffffULL << I40IW_FW_VER_MAJOR_SHIFT)
+
+#define I40IW_FEATURE_INFO_SHIFT 0
+#define I40IW_FEATURE_INFO_MASK \
+ (0xffffULL << I40IW_FEATURE_INFO_SHIFT)
+
+#define I40IW_FEATURE_CNT_SHIFT 32
+#define I40IW_FEATURE_CNT_MASK \
+ (0xffffULL << I40IW_FEATURE_CNT_SHIFT)
+
#define I40IW_UDA_QPSQ_NEXT_HEADER_SHIFT 16
#define I40IW_UDA_QPSQ_NEXT_HEADER_MASK ((u64)0xff << I40IW_UDA_QPSQ_NEXT_HEADER_SHIFT)
@@ -1529,7 +1547,8 @@ enum i40iw_alignment {
I40IW_AEQ_ALIGNMENT = 0x100,
I40IW_CEQ_ALIGNMENT = 0x100,
I40IW_CQ0_ALIGNMENT = 0x100,
- I40IW_SD_BUF_ALIGNMENT = 0x80
+ I40IW_SD_BUF_ALIGNMENT = 0x80,
+ I40IW_FEATURE_BUF_ALIGNMENT = 0x8
};
#define I40IW_WQE_SIZE_64 64
@@ -1732,6 +1751,7 @@ enum i40iw_alignment {
#define OP_REQUESTED_COMMANDS 31
#define OP_COMPLETED_COMMANDS 32
#define OP_GEN_AE 33
-#define OP_SIZE_CQP_STAT_ARRAY 34
+#define OP_QUERY_RDMA_FEATURES 34
+#define OP_SIZE_CQP_STAT_ARRAY 35
#endif
diff --git a/drivers/infiniband/hw/i40iw/i40iw_main.c b/drivers/infiniband/hw/i40iw/i40iw_main.c
index 238614370927..9c96ece5e7f3 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_main.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_main.c
@@ -1212,22 +1212,19 @@ static void i40iw_add_ipv4_addr(struct i40iw_device *iwdev)
{
struct net_device *dev;
struct in_device *idev;
- bool got_lock = true;
u32 ip_addr;
- if (!rtnl_trylock())
- got_lock = false;
-
- for_each_netdev(&init_net, dev) {
+ rcu_read_lock();
+ for_each_netdev_rcu(&init_net, dev) {
if ((((rdma_vlan_dev_vlan_id(dev) < 0xFFFF) &&
(rdma_vlan_dev_real_dev(dev) == iwdev->netdev)) ||
- (dev == iwdev->netdev)) && (dev->flags & IFF_UP)) {
+ (dev == iwdev->netdev)) && (READ_ONCE(dev->flags) & IFF_UP)) {
const struct in_ifaddr *ifa;
- idev = in_dev_get(dev);
+ idev = __in_dev_get_rcu(dev);
if (!idev)
continue;
- in_dev_for_each_ifa_rtnl(ifa, idev) {
+ in_dev_for_each_ifa_rcu(ifa, idev) {
i40iw_debug(&iwdev->sc_dev, I40IW_DEBUG_CM,
"IP=%pI4, vlan_id=%d, MAC=%pM\n", &ifa->ifa_address,
rdma_vlan_dev_vlan_id(dev), dev->dev_addr);
@@ -1239,12 +1236,9 @@ static void i40iw_add_ipv4_addr(struct i40iw_device *iwdev)
true,
I40IW_ARP_ADD);
}
-
- in_dev_put(idev);
}
}
- if (got_lock)
- rtnl_unlock();
+ rcu_read_unlock();
}
/**
@@ -1689,6 +1683,12 @@ static int i40iw_open(struct i40e_info *ldev, struct i40e_client *client)
status = i40iw_setup_ceqs(iwdev, ldev);
if (status)
break;
+
+ status = i40iw_get_rdma_features(dev);
+ if (status)
+ dev->feature_info[I40IW_FEATURE_FW_INFO] =
+ I40IW_FW_VER_DEFAULT;
+
iwdev->init_state = CEQ_CREATED;
status = i40iw_initialize_hw_resources(iwdev);
if (status)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_p.h b/drivers/infiniband/hw/i40iw/i40iw_p.h
index 11d3a2a72100..4c429567bbb4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_p.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_p.h
@@ -105,6 +105,7 @@ enum i40iw_status_code i40iw_sc_static_hmc_pages_allocated(struct i40iw_sc_cqp *
bool poll_registers);
enum i40iw_status_code i40iw_config_fpm_values(struct i40iw_sc_dev *dev, u32 qp_count);
+enum i40iw_status_code i40iw_get_rdma_features(struct i40iw_sc_dev *dev);
void free_sd_mem(struct i40iw_sc_dev *dev);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_status.h b/drivers/infiniband/hw/i40iw/i40iw_status.h
index f7013f11d808..d1c5855bd8c3 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_status.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_status.h
@@ -95,7 +95,8 @@ enum i40iw_status_code {
I40IW_ERR_INVALID_MAC_ADDR = -65,
I40IW_ERR_BAD_STAG = -66,
I40IW_ERR_CQ_COMPL_ERROR = -67,
- I40IW_ERR_QUEUE_DESTROYED = -68
+ I40IW_ERR_QUEUE_DESTROYED = -68,
+ I40IW_ERR_INVALID_FEAT_CNT = -69
};
#endif
diff --git a/drivers/infiniband/hw/i40iw/i40iw_type.h b/drivers/infiniband/hw/i40iw/i40iw_type.h
index adc8d2ec523d..54c323c40d96 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_type.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_type.h
@@ -234,6 +234,11 @@ enum i40iw_hw_stats_index_64b {
I40IW_HW_STAT_INDEX_MAX_64
};
+enum i40iw_feature_type {
+ I40IW_FEATURE_FW_INFO = 0,
+ I40IW_MAX_FEATURES
+};
+
struct i40iw_dev_hw_stats_offsets {
u32 stats_offset_32[I40IW_HW_STAT_INDEX_MAX_32];
u32 stats_offset_64[I40IW_HW_STAT_INDEX_MAX_64];
@@ -501,6 +506,7 @@ struct i40iw_sc_dev {
const struct i40iw_vf_cqp_ops *iw_vf_cqp_ops;
struct i40iw_hmc_fpm_misc hmc_fpm_misc;
+ u64 feature_info[I40IW_MAX_FEATURES];
u32 debug_mask;
u8 hmc_fn_id;
bool is_pf;
@@ -1340,6 +1346,12 @@ struct cqp_info {
struct i40iw_sc_qp *qp;
u64 scratch;
} suspend_resume;
+ struct {
+ struct i40iw_sc_cqp *cqp;
+ void *cap_va;
+ u64 cap_pa;
+ u64 scratch;
+ } query_rdma_features;
} u;
};
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index c335de91508f..1b6fb1380961 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -64,7 +64,8 @@ static int i40iw_query_device(struct ib_device *ibdev,
return -EINVAL;
memset(props, 0, sizeof(*props));
ether_addr_copy((u8 *)&props->sys_image_guid, iwdev->netdev->dev_addr);
- props->fw_ver = I40IW_FW_VERSION;
+ props->fw_ver = i40iw_fw_major_ver(&iwdev->sc_dev) << 32 |
+ i40iw_fw_minor_ver(&iwdev->sc_dev);
props->device_cap_flags = iwdev->device_cap_flags;
props->vendor_id = iwdev->ldev->pcidev->vendor;
props->vendor_part_id = iwdev->ldev->pcidev->device;
@@ -617,7 +618,7 @@ static struct ib_qp *i40iw_create_qp(struct ib_pd *ibpd,
iwqp->ctx_info.qp_compl_ctx = (uintptr_t)qp;
if (init_attr->qp_type != IB_QPT_RC) {
- err_code = -EINVAL;
+ err_code = -EOPNOTSUPP;
goto error;
}
if (iwdev->push_mode)
@@ -2534,10 +2535,11 @@ static const char * const i40iw_hw_stat_names[] = {
static void i40iw_get_dev_fw_str(struct ib_device *dev, char *str)
{
- u32 firmware_version = I40IW_FW_VERSION;
+ struct i40iw_device *iwdev = to_iwdev(dev);
- snprintf(str, IB_FW_VERSION_NAME_MAX, "%u.%u", firmware_version,
- (firmware_version & 0x000000ff));
+ snprintf(str, IB_FW_VERSION_NAME_MAX, "%llu.%llu",
+ i40iw_fw_major_ver(&iwdev->sc_dev),
+ i40iw_fw_minor_ver(&iwdev->sc_dev));
}
/**
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 2f5d9b181848..a66518a5c938 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -434,9 +434,6 @@ int mlx4_ib_gid_index_to_real_index(struct mlx4_ib_dev *ibdev,
return real_index;
}
-#define field_avail(type, fld, sz) (offsetof(type, fld) + \
- sizeof(((type *)0)->fld) <= (sz))
-
static int mlx4_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props,
struct ib_udata *uhw)
@@ -447,7 +444,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
int err;
int have_ib_ports;
struct mlx4_uverbs_ex_query_device cmd;
- struct mlx4_uverbs_ex_query_device_resp resp = {.comp_mask = 0};
+ struct mlx4_uverbs_ex_query_device_resp resp = {};
struct mlx4_clock_params clock_params;
if (uhw->inlen) {
@@ -602,7 +599,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
sizeof(struct mlx4_wqe_data_seg);
}
- if (field_avail(typeof(resp), rss_caps, uhw->outlen)) {
+ if (offsetofend(typeof(resp), rss_caps) <= uhw->outlen) {
if (props->rss_caps.supported_qpts) {
resp.rss_caps.rx_hash_function =
MLX4_IB_RX_HASH_FUNC_TOEPLITZ;
@@ -626,7 +623,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
sizeof(resp.rss_caps);
}
- if (field_avail(typeof(resp), tso_caps, uhw->outlen)) {
+ if (offsetofend(typeof(resp), tso_caps) <= uhw->outlen) {
if (dev->dev->caps.max_gso_sz &&
((mlx4_ib_port_link_layer(ibdev, 1) ==
IB_LINK_LAYER_ETHERNET) ||
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 26425dd2d960..2f9f78912267 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1636,7 +1636,7 @@ static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
}
default:
/* Don't support raw QPs */
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
return &qp->ibqp;
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index d0a043ccbe58..2a334800f109 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -8,3 +8,4 @@ mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
mlx5_ib-$(CONFIG_MLX5_ESWITCH) += ib_rep.o
mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += devx.o
mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += flow.o
+mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += qos.o
diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index 8ba439fabf7f..de4da92b81a6 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -47,6 +47,7 @@ static const char * const mlx5_ib_dbg_cc_name[] = {
"rp_byte_reset",
"rp_threshold",
"rp_ai_rate",
+ "rp_max_rate",
"rp_hai_rate",
"rp_min_dec_fac",
"rp_min_rate",
@@ -56,6 +57,7 @@ static const char * const mlx5_ib_dbg_cc_name[] = {
"rp_rate_reduce_monitor_period",
"rp_initial_alpha_value",
"rp_gd",
+ "np_min_time_between_cnps",
"np_cnp_dscp",
"np_cnp_prio_mode",
"np_cnp_prio",
@@ -66,6 +68,7 @@ static const char * const mlx5_ib_dbg_cc_name[] = {
#define MLX5_IB_RP_TIME_RESET_ATTR BIT(3)
#define MLX5_IB_RP_BYTE_RESET_ATTR BIT(4)
#define MLX5_IB_RP_THRESHOLD_ATTR BIT(5)
+#define MLX5_IB_RP_MAX_RATE_ATTR BIT(6)
#define MLX5_IB_RP_AI_RATE_ATTR BIT(7)
#define MLX5_IB_RP_HAI_RATE_ATTR BIT(8)
#define MLX5_IB_RP_MIN_DEC_FAC_ATTR BIT(9)
@@ -77,6 +80,7 @@ static const char * const mlx5_ib_dbg_cc_name[] = {
#define MLX5_IB_RP_INITIAL_ALPHA_VALUE_ATTR BIT(15)
#define MLX5_IB_RP_GD_ATTR BIT(16)
+#define MLX5_IB_NP_MIN_TIME_BETWEEN_CNPS_ATTR BIT(2)
#define MLX5_IB_NP_CNP_DSCP_ATTR BIT(3)
#define MLX5_IB_NP_CNP_PRIO_MODE_ATTR BIT(4)
@@ -111,6 +115,9 @@ static u32 mlx5_get_cc_param_val(void *field, int offset)
case MLX5_IB_DBG_CC_RP_AI_RATE:
return MLX5_GET(cong_control_r_roce_ecn_rp, field,
rpg_ai_rate);
+ case MLX5_IB_DBG_CC_RP_MAX_RATE:
+ return MLX5_GET(cong_control_r_roce_ecn_rp, field,
+ rpg_max_rate);
case MLX5_IB_DBG_CC_RP_HAI_RATE:
return MLX5_GET(cong_control_r_roce_ecn_rp, field,
rpg_hai_rate);
@@ -138,6 +145,9 @@ static u32 mlx5_get_cc_param_val(void *field, int offset)
case MLX5_IB_DBG_CC_RP_GD:
return MLX5_GET(cong_control_r_roce_ecn_rp, field,
rpg_gd);
+ case MLX5_IB_DBG_CC_NP_MIN_TIME_BETWEEN_CNPS:
+ return MLX5_GET(cong_control_r_roce_ecn_np, field,
+ min_time_between_cnps);
case MLX5_IB_DBG_CC_NP_CNP_DSCP:
return MLX5_GET(cong_control_r_roce_ecn_np, field,
cnp_dscp);
@@ -186,6 +196,11 @@ static void mlx5_ib_set_cc_param_mask_val(void *field, int offset,
MLX5_SET(cong_control_r_roce_ecn_rp, field,
rpg_ai_rate, var);
break;
+ case MLX5_IB_DBG_CC_RP_MAX_RATE:
+ *attr_mask |= MLX5_IB_RP_MAX_RATE_ATTR;
+ MLX5_SET(cong_control_r_roce_ecn_rp, field,
+ rpg_max_rate, var);
+ break;
case MLX5_IB_DBG_CC_RP_HAI_RATE:
*attr_mask |= MLX5_IB_RP_HAI_RATE_ATTR;
MLX5_SET(cong_control_r_roce_ecn_rp, field,
@@ -231,6 +246,11 @@ static void mlx5_ib_set_cc_param_mask_val(void *field, int offset,
MLX5_SET(cong_control_r_roce_ecn_rp, field,
rpg_gd, var);
break;
+ case MLX5_IB_DBG_CC_NP_MIN_TIME_BETWEEN_CNPS:
+ *attr_mask |= MLX5_IB_NP_MIN_TIME_BETWEEN_CNPS_ATTR;
+ MLX5_SET(cong_control_r_roce_ecn_np, field,
+ min_time_between_cnps, var);
+ break;
case MLX5_IB_DBG_CC_NP_CNP_DSCP:
*attr_mask |= MLX5_IB_NP_CNP_DSCP_ATTR;
MLX5_SET(cong_control_r_roce_ecn_np, field, cnp_dscp, var);
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 3dec3de903b7..146ba2966744 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -715,17 +715,19 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
struct mlx5_ib_ucontext *context = rdma_udata_to_drv_context(
udata, struct mlx5_ib_ucontext, ibucontext);
- ucmdlen = udata->inlen < sizeof(ucmd) ?
- (sizeof(ucmd) - sizeof(ucmd.flags)) : sizeof(ucmd);
+ ucmdlen = min(udata->inlen, sizeof(ucmd));
+ if (ucmdlen < offsetof(struct mlx5_ib_create_cq, flags))
+ return -EINVAL;
if (ib_copy_from_udata(&ucmd, udata, ucmdlen))
return -EFAULT;
- if (ucmdlen == sizeof(ucmd) &&
- (ucmd.flags & ~(MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD)))
+ if ((ucmd.flags & ~(MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD |
+ MLX5_IB_CREATE_CQ_FLAGS_UAR_PAGE_INDEX)))
return -EINVAL;
- if (ucmd.cqe_size != 64 && ucmd.cqe_size != 128)
+ if ((ucmd.cqe_size != 64 && ucmd.cqe_size != 128) ||
+ ucmd.reserved0 || ucmd.reserved1)
return -EINVAL;
*cqe_size = ucmd.cqe_size;
@@ -762,7 +764,14 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
page_shift - MLX5_ADAPTER_PAGE_SHIFT);
- *index = context->bfregi.sys_pages[0];
+ if (ucmd.flags & MLX5_IB_CREATE_CQ_FLAGS_UAR_PAGE_INDEX) {
+ *index = ucmd.uar_page_index;
+ } else if (context->bfregi.lib_uar_dyn) {
+ err = -EINVAL;
+ goto err_cqb;
+ } else {
+ *index = context->bfregi.sys_pages[0];
+ }
if (ucmd.cqe_comp_en == 1) {
int mini_cqe_format;
diff --git a/drivers/infiniband/hw/mlx5/flow.c b/drivers/infiniband/hw/mlx5/flow.c
index dbee17d22d50..862b7bf3e646 100644
--- a/drivers/infiniband/hw/mlx5/flow.c
+++ b/drivers/infiniband/hw/mlx5/flow.c
@@ -35,6 +35,9 @@ mlx5_ib_ft_type_to_namespace(enum mlx5_ib_uapi_flow_table_type table_type,
case MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_RX:
*namespace = MLX5_FLOW_NAMESPACE_RDMA_RX;
break;
+ case MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TX:
+ *namespace = MLX5_FLOW_NAMESPACE_RDMA_TX;
+ break;
default:
return -EINVAL;
}
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3efa7493456b..6679756506e6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -39,9 +39,6 @@
#include
#include
#include
-#if defined(CONFIG_X86)
-#include
-#endif
#include
#include
#include
@@ -898,7 +895,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->raw_packet_caps |=
IB_RAW_PACKET_CAP_CVLAN_STRIPPING;
- if (field_avail(typeof(resp), tso_caps, uhw_outlen)) {
+ if (offsetofend(typeof(resp), tso_caps) <= uhw_outlen) {
max_tso = MLX5_CAP_ETH(mdev, max_lso_cap);
if (max_tso) {
resp.tso_caps.max_tso = 1 << max_tso;
@@ -908,7 +905,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
}
- if (field_avail(typeof(resp), rss_caps, uhw_outlen)) {
+ if (offsetofend(typeof(resp), rss_caps) <= uhw_outlen) {
resp.rss_caps.rx_hash_function =
MLX5_RX_HASH_FUNC_TOEPLITZ;
resp.rss_caps.rx_hash_fields_mask =
@@ -928,9 +925,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.response_length += sizeof(resp.rss_caps);
}
} else {
- if (field_avail(typeof(resp), tso_caps, uhw_outlen))
+ if (offsetofend(typeof(resp), tso_caps) <= uhw_outlen)
resp.response_length += sizeof(resp.tso_caps);
- if (field_avail(typeof(resp), rss_caps, uhw_outlen))
+ if (offsetofend(typeof(resp), rss_caps) <= uhw_outlen)
resp.response_length += sizeof(resp.rss_caps);
}
@@ -1072,7 +1069,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
MLX5_MAX_CQ_PERIOD;
}
- if (field_avail(typeof(resp), cqe_comp_caps, uhw_outlen)) {
+ if (offsetofend(typeof(resp), cqe_comp_caps) <= uhw_outlen) {
resp.response_length += sizeof(resp.cqe_comp_caps);
if (MLX5_CAP_GEN(dev->mdev, cqe_compression)) {
@@ -1090,7 +1087,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
}
- if (field_avail(typeof(resp), packet_pacing_caps, uhw_outlen) &&
+ if (offsetofend(typeof(resp), packet_pacing_caps) <= uhw_outlen &&
raw_support) {
if (MLX5_CAP_QOS(mdev, packet_pacing) &&
MLX5_CAP_GEN(mdev, qos)) {
@@ -1108,8 +1105,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.response_length += sizeof(resp.packet_pacing_caps);
}
- if (field_avail(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes,
- uhw_outlen)) {
+ if (offsetofend(typeof(resp), mlx5_ib_support_multi_pkt_send_wqes) <=
+ uhw_outlen) {
if (MLX5_CAP_ETH(mdev, multi_pkt_send_wqe))
resp.mlx5_ib_support_multi_pkt_send_wqes =
MLX5_IB_ALLOW_MPW;
@@ -1122,7 +1119,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
sizeof(resp.mlx5_ib_support_multi_pkt_send_wqes);
}
- if (field_avail(typeof(resp), flags, uhw_outlen)) {
+ if (offsetofend(typeof(resp), flags) <= uhw_outlen) {
resp.response_length += sizeof(resp.flags);
if (MLX5_CAP_GEN(mdev, cqe_compression_128))
@@ -1138,7 +1135,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.flags |= MLX5_IB_QUERY_DEV_RESP_FLAGS_SCAT2CQE_DCT;
}
- if (field_avail(typeof(resp), sw_parsing_caps, uhw_outlen)) {
+ if (offsetofend(typeof(resp), sw_parsing_caps) <= uhw_outlen) {
resp.response_length += sizeof(resp.sw_parsing_caps);
if (MLX5_CAP_ETH(mdev, swp)) {
resp.sw_parsing_caps.sw_parsing_offloads |=
@@ -1158,7 +1155,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
}
- if (field_avail(typeof(resp), striding_rq_caps, uhw_outlen) &&
+ if (offsetofend(typeof(resp), striding_rq_caps) <= uhw_outlen &&
raw_support) {
resp.response_length += sizeof(resp.striding_rq_caps);
if (MLX5_CAP_GEN(mdev, striding_rq)) {
@@ -1181,7 +1178,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
}
- if (field_avail(typeof(resp), tunnel_offloads_caps, uhw_outlen)) {
+ if (offsetofend(typeof(resp), tunnel_offloads_caps) <= uhw_outlen) {
resp.response_length += sizeof(resp.tunnel_offloads_caps);
if (MLX5_CAP_ETH(mdev, tunnel_stateless_vxlan))
resp.tunnel_offloads_caps |=
@@ -1192,12 +1189,10 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
if (MLX5_CAP_ETH(mdev, tunnel_stateless_gre))
resp.tunnel_offloads_caps |=
MLX5_IB_TUNNELED_OFFLOADS_GRE;
- if (MLX5_CAP_GEN(mdev, flex_parser_protocols) &
- MLX5_FLEX_PROTO_CW_MPLS_GRE)
+ if (MLX5_CAP_ETH(mdev, tunnel_stateless_mpls_over_gre))
resp.tunnel_offloads_caps |=
MLX5_IB_TUNNELED_OFFLOADS_MPLS_GRE;
- if (MLX5_CAP_GEN(mdev, flex_parser_protocols) &
- MLX5_FLEX_PROTO_CW_MPLS_UDP)
+ if (MLX5_CAP_ETH(mdev, tunnel_stateless_mpls_over_udp))
resp.tunnel_offloads_caps |=
MLX5_IB_TUNNELED_OFFLOADS_MPLS_UDP;
}
@@ -1791,6 +1786,7 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
max_cqe_version);
u32 dump_fill_mkey;
bool lib_uar_4k;
+ bool lib_uar_dyn;
if (!dev->ib_active)
return -EAGAIN;
@@ -1849,8 +1845,14 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
}
lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR;
+ lib_uar_dyn = req.lib_caps & MLX5_LIB_CAP_DYN_UAR;
bfregi = &context->bfregi;
+ if (lib_uar_dyn) {
+ bfregi->lib_uar_dyn = lib_uar_dyn;
+ goto uar_done;
+ }
+
/* updates req->total_num_bfregs */
err = calc_total_bfregs(dev, lib_uar_4k, &req, bfregi);
if (err)
@@ -1877,6 +1879,7 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
if (err)
goto out_sys_pages;
+uar_done:
if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) {
err = mlx5_ib_devx_create(dev, true);
if (err < 0)
@@ -1898,19 +1901,19 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
INIT_LIST_HEAD(&context->db_page_list);
mutex_init(&context->db_page_mutex);
- resp.tot_bfregs = req.total_num_bfregs;
+ resp.tot_bfregs = lib_uar_dyn ? 0 : req.total_num_bfregs;
resp.num_ports = dev->num_ports;
- if (field_avail(typeof(resp), cqe_version, udata->outlen))
+ if (offsetofend(typeof(resp), cqe_version) <= udata->outlen)
resp.response_length += sizeof(resp.cqe_version);
- if (field_avail(typeof(resp), cmds_supp_uhw, udata->outlen)) {
+ if (offsetofend(typeof(resp), cmds_supp_uhw) <= udata->outlen) {
resp.cmds_supp_uhw |= MLX5_USER_CMDS_SUPP_UHW_QUERY_DEVICE |
MLX5_USER_CMDS_SUPP_UHW_CREATE_AH;
resp.response_length += sizeof(resp.cmds_supp_uhw);
}
- if (field_avail(typeof(resp), eth_min_inline, udata->outlen)) {
+ if (offsetofend(typeof(resp), eth_min_inline) <= udata->outlen) {
if (mlx5_ib_port_link_layer(ibdev, 1) == IB_LINK_LAYER_ETHERNET) {
mlx5_query_min_inline(dev->mdev, &resp.eth_min_inline);
resp.eth_min_inline++;
@@ -1918,7 +1921,7 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
resp.response_length += sizeof(resp.eth_min_inline);
}
- if (field_avail(typeof(resp), clock_info_versions, udata->outlen)) {
+ if (offsetofend(typeof(resp), clock_info_versions) <= udata->outlen) {
if (mdev->clock_info)
resp.clock_info_versions = BIT(MLX5_IB_CLOCK_INFO_V1);
resp.response_length += sizeof(resp.clock_info_versions);
@@ -1930,7 +1933,7 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
* pretend we don't support reading the HCA's core clock. This is also
* forced by mmap function.
*/
- if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+ if (offsetofend(typeof(resp), hca_core_clock_offset) <= udata->outlen) {
if (PAGE_SIZE <= 4096) {
resp.comp_mask |=
MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
@@ -1940,18 +1943,18 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
resp.response_length += sizeof(resp.hca_core_clock_offset);
}
- if (field_avail(typeof(resp), log_uar_size, udata->outlen))
+ if (offsetofend(typeof(resp), log_uar_size) <= udata->outlen)
resp.response_length += sizeof(resp.log_uar_size);
- if (field_avail(typeof(resp), num_uars_per_page, udata->outlen))
+ if (offsetofend(typeof(resp), num_uars_per_page) <= udata->outlen)
resp.response_length += sizeof(resp.num_uars_per_page);
- if (field_avail(typeof(resp), num_dyn_bfregs, udata->outlen)) {
+ if (offsetofend(typeof(resp), num_dyn_bfregs) <= udata->outlen) {
resp.num_dyn_bfregs = bfregi->num_dyn_bfregs;
resp.response_length += sizeof(resp.num_dyn_bfregs);
}
- if (field_avail(typeof(resp), dump_fill_mkey, udata->outlen)) {
+ if (offsetofend(typeof(resp), dump_fill_mkey) <= udata->outlen) {
if (MLX5_CAP_GEN(dev->mdev, dump_fill_mkey)) {
resp.dump_fill_mkey = dump_fill_mkey;
resp.comp_mask |=
@@ -2026,6 +2029,17 @@ static phys_addr_t uar_index2pfn(struct mlx5_ib_dev *dev,
return (dev->mdev->bar_addr >> PAGE_SHIFT) + uar_idx / fw_uars_per_page;
}
+static u64 uar_index2paddress(struct mlx5_ib_dev *dev,
+ int uar_idx)
+{
+ unsigned int fw_uars_per_page;
+
+ fw_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+ MLX5_UARS_IN_PAGE : 1;
+
+ return (dev->mdev->bar_addr + (uar_idx / fw_uars_per_page) * PAGE_SIZE);
+}
+
static int get_command(unsigned long offset)
{
return (offset >> MLX5_IB_MMAP_CMD_SHIFT) & MLX5_IB_MMAP_CMD_MASK;
@@ -2110,6 +2124,11 @@ static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
mutex_unlock(&var_table->bitmap_lock);
kfree(mentry);
break;
+ case MLX5_IB_MMAP_TYPE_UAR_WC:
+ case MLX5_IB_MMAP_TYPE_UAR_NC:
+ mlx5_cmd_free_uar(dev->mdev, mentry->page_idx);
+ kfree(mentry);
+ break;
default:
WARN_ON(true);
}
@@ -2130,6 +2149,9 @@ static int uar_mmap(struct mlx5_ib_dev *dev, enum mlx5_ib_mmap_cmd cmd,
int max_valid_idx = dyn_uar ? bfregi->num_sys_pages :
bfregi->num_static_sys_pages;
+ if (bfregi->lib_uar_dyn)
+ return -EINVAL;
+
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
@@ -2147,14 +2169,6 @@ static int uar_mmap(struct mlx5_ib_dev *dev, enum mlx5_ib_mmap_cmd cmd,
switch (cmd) {
case MLX5_IB_MMAP_WC_PAGE:
case MLX5_IB_MMAP_ALLOC_WC:
-/* Some architectures don't support WC memory */
-#if defined(CONFIG_X86)
- if (!pat_enabled())
- return -EPERM;
-#elif !(defined(CONFIG_PPC) || (defined(CONFIG_ARM) && defined(CONFIG_MMU)))
- return -EPERM;
-#endif
- /* fall through */
case MLX5_IB_MMAP_REGULAR_PAGE:
/* For MLX5_IB_MMAP_REGULAR_PAGE do the best effort to get WC */
prot = pgprot_writecombine(vma->vm_page_prot);
@@ -2269,7 +2283,8 @@ static int mlx5_ib_mmap_offset(struct mlx5_ib_dev *dev,
mentry = to_mmmap(entry);
pfn = (mentry->address >> PAGE_SHIFT);
- if (mentry->mmap_flag == MLX5_IB_MMAP_TYPE_VAR)
+ if (mentry->mmap_flag == MLX5_IB_MMAP_TYPE_VAR ||
+ mentry->mmap_flag == MLX5_IB_MMAP_TYPE_UAR_NC)
prot = pgprot_noncached(vma->vm_page_prot);
else
prot = pgprot_writecombine(vma->vm_page_prot);
@@ -2300,9 +2315,12 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vm
command = get_command(vma->vm_pgoff);
switch (command) {
case MLX5_IB_MMAP_WC_PAGE:
+ case MLX5_IB_MMAP_ALLOC_WC:
+ if (!dev->wc_support)
+ return -EPERM;
+ fallthrough;
case MLX5_IB_MMAP_NC_PAGE:
case MLX5_IB_MMAP_REGULAR_PAGE:
- case MLX5_IB_MMAP_ALLOC_WC:
return uar_mmap(dev, command, vma, context);
case MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES:
@@ -4046,6 +4064,11 @@ _get_flow_table(struct mlx5_ib_dev *dev,
BIT(MLX5_CAP_FLOWTABLE_RDMA_RX(dev->mdev,
log_max_ft_size));
priority = fs_matcher->priority;
+ } else if (fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX) {
+ max_table_size =
+ BIT(MLX5_CAP_FLOWTABLE_RDMA_TX(dev->mdev,
+ log_max_ft_size));
+ priority = fs_matcher->priority;
}
max_table_size = min_t(int, max_table_size, MLX5_FS_MAX_ENTRIES);
@@ -4062,6 +4085,8 @@ _get_flow_table(struct mlx5_ib_dev *dev,
prio = &dev->flow_db->fdb;
else if (fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_RX)
prio = &dev->flow_db->rdma_rx[priority];
+ else if (fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX)
+ prio = &dev->flow_db->rdma_tx[priority];
if (!prio)
return ERR_PTR(-EINVAL);
@@ -6090,9 +6115,9 @@ static void mlx5_ib_cleanup_multiport_master(struct mlx5_ib_dev *dev)
mlx5_nic_vport_disable_roce(dev->mdev);
}
-static int var_obj_cleanup(struct ib_uobject *uobject,
- enum rdma_remove_reason why,
- struct uverbs_attr_bundle *attrs)
+static int mmap_obj_cleanup(struct ib_uobject *uobject,
+ enum rdma_remove_reason why,
+ struct uverbs_attr_bundle *attrs)
{
struct mlx5_user_mmap_entry *obj = uobject->object;
@@ -6100,6 +6125,16 @@ static int var_obj_cleanup(struct ib_uobject *uobject,
return 0;
}
+static int mlx5_rdma_user_mmap_entry_insert(struct mlx5_ib_ucontext *c,
+ struct mlx5_user_mmap_entry *entry,
+ size_t length)
+{
+ return rdma_user_mmap_entry_insert_range(
+ &c->ibucontext, &entry->rdma_entry, length,
+ (MLX5_IB_MMAP_OFFSET_START << 16),
+ ((MLX5_IB_MMAP_OFFSET_END << 16) + (1UL << 16) - 1));
+}
+
static struct mlx5_user_mmap_entry *
alloc_var_entry(struct mlx5_ib_ucontext *c)
{
@@ -6130,10 +6165,8 @@ alloc_var_entry(struct mlx5_ib_ucontext *c)
entry->page_idx = page_idx;
entry->mmap_flag = MLX5_IB_MMAP_TYPE_VAR;
- err = rdma_user_mmap_entry_insert_range(
- &c->ibucontext, &entry->rdma_entry, var_table->stride_size,
- MLX5_IB_MMAP_OFFSET_START << 16,
- (MLX5_IB_MMAP_OFFSET_END << 16) + (1UL << 16) - 1);
+ err = mlx5_rdma_user_mmap_entry_insert(c, entry,
+ var_table->stride_size);
if (err)
goto err_insert;
@@ -6217,7 +6250,7 @@ DECLARE_UVERBS_NAMED_METHOD_DESTROY(
UA_MANDATORY));
DECLARE_UVERBS_NAMED_OBJECT(MLX5_IB_OBJECT_VAR,
- UVERBS_TYPE_ALLOC_IDR(var_obj_cleanup),
+ UVERBS_TYPE_ALLOC_IDR(mmap_obj_cleanup),
&UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_ALLOC),
&UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_DESTROY));
@@ -6229,6 +6262,134 @@ static bool var_is_supported(struct ib_device *device)
MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q);
}
+static struct mlx5_user_mmap_entry *
+alloc_uar_entry(struct mlx5_ib_ucontext *c,
+ enum mlx5_ib_uapi_uar_alloc_type alloc_type)
+{
+ struct mlx5_user_mmap_entry *entry;
+ struct mlx5_ib_dev *dev;
+ u32 uar_index;
+ int err;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return ERR_PTR(-ENOMEM);
+
+ dev = to_mdev(c->ibucontext.device);
+ err = mlx5_cmd_alloc_uar(dev->mdev, &uar_index);
+ if (err)
+ goto end;
+
+ entry->page_idx = uar_index;
+ entry->address = uar_index2paddress(dev, uar_index);
+ if (alloc_type == MLX5_IB_UAPI_UAR_ALLOC_TYPE_BF)
+ entry->mmap_flag = MLX5_IB_MMAP_TYPE_UAR_WC;
+ else
+ entry->mmap_flag = MLX5_IB_MMAP_TYPE_UAR_NC;
+
+ err = mlx5_rdma_user_mmap_entry_insert(c, entry, PAGE_SIZE);
+ if (err)
+ goto err_insert;
+
+ return entry;
+
+err_insert:
+ mlx5_cmd_free_uar(dev->mdev, uar_index);
+end:
+ kfree(entry);
+ return ERR_PTR(err);
+}
+
+static int UVERBS_HANDLER(MLX5_IB_METHOD_UAR_OBJ_ALLOC)(
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_uobject *uobj = uverbs_attr_get_uobject(
+ attrs, MLX5_IB_ATTR_UAR_OBJ_ALLOC_HANDLE);
+ enum mlx5_ib_uapi_uar_alloc_type alloc_type;
+ struct mlx5_ib_ucontext *c;
+ struct mlx5_user_mmap_entry *entry;
+ u64 mmap_offset;
+ u32 length;
+ int err;
+
+ c = to_mucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(c))
+ return PTR_ERR(c);
+
+ err = uverbs_get_const(&alloc_type, attrs,
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_TYPE);
+ if (err)
+ return err;
+
+ if (alloc_type != MLX5_IB_UAPI_UAR_ALLOC_TYPE_BF &&
+ alloc_type != MLX5_IB_UAPI_UAR_ALLOC_TYPE_NC)
+ return -EOPNOTSUPP;
+
+ if (!to_mdev(c->ibucontext.device)->wc_support &&
+ alloc_type == MLX5_IB_UAPI_UAR_ALLOC_TYPE_BF)
+ return -EOPNOTSUPP;
+
+ entry = alloc_uar_entry(c, alloc_type);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);
+
+ mmap_offset = mlx5_entry_to_mmap_offset(entry);
+ length = entry->rdma_entry.npages * PAGE_SIZE;
+ uobj->object = entry;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_OFFSET,
+ &mmap_offset, sizeof(mmap_offset));
+ if (err)
+ goto err;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_UAR_OBJ_ALLOC_PAGE_ID,
+ &entry->page_idx, sizeof(entry->page_idx));
+ if (err)
+ goto err;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_LENGTH,
+ &length, sizeof(length));
+ if (err)
+ goto err;
+
+ return 0;
+
+err:
+ rdma_user_mmap_entry_remove(&entry->rdma_entry);
+ return err;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ MLX5_IB_METHOD_UAR_OBJ_ALLOC,
+ UVERBS_ATTR_IDR(MLX5_IB_ATTR_UAR_OBJ_ALLOC_HANDLE,
+ MLX5_IB_OBJECT_UAR,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_CONST_IN(MLX5_IB_ATTR_UAR_OBJ_ALLOC_TYPE,
+ enum mlx5_ib_uapi_uar_alloc_type,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_UAR_OBJ_ALLOC_PAGE_ID,
+ UVERBS_ATTR_TYPE(u32),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_LENGTH,
+ UVERBS_ATTR_TYPE(u32),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_OFFSET,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+ MLX5_IB_METHOD_UAR_OBJ_DESTROY,
+ UVERBS_ATTR_IDR(MLX5_IB_ATTR_UAR_OBJ_DESTROY_HANDLE,
+ MLX5_IB_OBJECT_UAR,
+ UVERBS_ACCESS_DESTROY,
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(MLX5_IB_OBJECT_UAR,
+ UVERBS_TYPE_ALLOC_IDR(mmap_obj_cleanup),
+ &UVERBS_METHOD(MLX5_IB_METHOD_UAR_OBJ_ALLOC),
+ &UVERBS_METHOD(MLX5_IB_METHOD_UAR_OBJ_DESTROY));
+
ADD_UVERBS_ATTRIBUTES_SIMPLE(
mlx5_ib_dm,
UVERBS_OBJECT_DM,
@@ -6253,12 +6414,14 @@ ADD_UVERBS_ATTRIBUTES_SIMPLE(
static const struct uapi_definition mlx5_ib_defs[] = {
UAPI_DEF_CHAIN(mlx5_ib_devx_defs),
UAPI_DEF_CHAIN(mlx5_ib_flow_defs),
+ UAPI_DEF_CHAIN(mlx5_ib_qos_defs),
UAPI_DEF_CHAIN_OBJ_TREE(UVERBS_OBJECT_FLOW_ACTION,
&mlx5_ib_flow_action),
UAPI_DEF_CHAIN_OBJ_TREE(UVERBS_OBJECT_DM, &mlx5_ib_dm),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(MLX5_IB_OBJECT_VAR,
UAPI_DEF_IS_OBJ_SUPPORTED(var_is_supported)),
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(MLX5_IB_OBJECT_UAR),
{}
};
@@ -6392,7 +6555,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
spin_lock_init(&dev->reset_flow_resource_lock);
xa_init(&dev->odp_mkeys);
xa_init(&dev->sig_mrs);
- spin_lock_init(&dev->mkey_lock);
+ atomic_set(&dev->mkey_var, 0);
spin_lock_init(&dev->dm.lock);
dev->dm.dev = mdev;
@@ -6548,7 +6711,8 @@ static int mlx5_ib_init_var_table(struct mlx5_ib_dev *dev)
doorbell_bar_offset);
bar_size = (1ULL << log_doorbell_bar_size) * 4096;
var_table->stride_size = 1ULL << log_doorbell_stride;
- var_table->num_var_hw_entries = div64_u64(bar_size, var_table->stride_size);
+ var_table->num_var_hw_entries = div_u64(bar_size,
+ var_table->stride_size);
mutex_init(&var_table->bitmap_lock);
var_table->bitmap = bitmap_zalloc(var_table->num_var_hw_entries,
GFP_KERNEL);
@@ -7080,6 +7244,9 @@ const struct mlx5_ib_profile raw_eth_profile = {
STAGE_CREATE(MLX5_IB_STAGE_COUNTERS,
mlx5_ib_stage_counters_init,
mlx5_ib_stage_counters_cleanup),
+ STAGE_CREATE(MLX5_IB_STAGE_CONG_DEBUGFS,
+ mlx5_ib_stage_cong_debugfs_init,
+ mlx5_ib_stage_cong_debugfs_cleanup),
STAGE_CREATE(MLX5_IB_STAGE_UAR,
mlx5_ib_stage_uar_init,
mlx5_ib_stage_uar_cleanup),
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index b90a3649e7d1..c19ec9fd8a63 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -316,7 +316,7 @@ int mlx5_ib_test_wc(struct mlx5_ib_dev *dev)
if (!dev->mdev->roce.roce_en &&
port_type_cap == MLX5_CAP_PORT_TYPE_ETH) {
if (mlx5_core_is_pf(dev->mdev))
- dev->wc_support = true;
+ dev->wc_support = arch_can_pci_mmap_wc();
return 0;
}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index fc19dc1cf2e1..a4e522385de0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -64,8 +64,6 @@
dev_warn(&(_dev)->ib_dev.dev, "%s:%d:(pid %d): " format, __func__, \
__LINE__, current->pid, ##arg)
-#define field_avail(type, fld, sz) (offsetof(type, fld) + \
- sizeof(((type *)0)->fld) <= (sz))
#define MLX5_IB_DEFAULT_UIDX 0xffffff
#define MLX5_USER_ASSIGNED_UIDX_MASK __mlx5_mask(qpc, user_index)
@@ -126,11 +124,27 @@ enum {
enum mlx5_ib_mmap_type {
MLX5_IB_MMAP_TYPE_MEMIC = 1,
MLX5_IB_MMAP_TYPE_VAR = 2,
+ MLX5_IB_MMAP_TYPE_UAR_WC = 3,
+ MLX5_IB_MMAP_TYPE_UAR_NC = 4,
};
-#define MLX5_LOG_SW_ICM_BLOCK_SIZE(dev) \
- (MLX5_CAP_DEV_MEM(dev, log_sw_icm_alloc_granularity))
-#define MLX5_SW_ICM_BLOCK_SIZE(dev) (1 << MLX5_LOG_SW_ICM_BLOCK_SIZE(dev))
+struct mlx5_bfreg_info {
+ u32 *sys_pages;
+ int num_low_latency_bfregs;
+ unsigned int *count;
+
+ /*
+ * protect bfreg allocation data structs
+ */
+ struct mutex lock;
+ u32 ver;
+ u8 lib_uar_4k : 1;
+ u8 lib_uar_dyn : 1;
+ u32 num_sys_pages;
+ u32 num_static_sys_pages;
+ u32 total_num_bfregs;
+ u32 num_dyn_bfregs;
+};
struct mlx5_ib_ucontext {
struct ib_ucontext ibucontext;
@@ -203,6 +217,11 @@ struct mlx5_ib_flow_matcher {
u8 match_criteria_enable;
};
+struct mlx5_ib_pp {
+ u16 index;
+ struct mlx5_core_dev *mdev;
+};
+
struct mlx5_ib_flow_db {
struct mlx5_ib_flow_prio prios[MLX5_IB_NUM_FLOW_FT];
struct mlx5_ib_flow_prio egress_prios[MLX5_IB_NUM_FLOW_FT];
@@ -210,6 +229,7 @@ struct mlx5_ib_flow_db {
struct mlx5_ib_flow_prio egress[MLX5_IB_NUM_EGRESS_FTS];
struct mlx5_ib_flow_prio fdb;
struct mlx5_ib_flow_prio rdma_rx[MLX5_IB_NUM_FLOW_FT];
+ struct mlx5_ib_flow_prio rdma_tx[MLX5_IB_NUM_FLOW_FT];
struct mlx5_flow_table *lag_demux_ft;
/* Protect flow steering bypass flow tables
* when add/del flow rules.
@@ -618,8 +638,8 @@ struct mlx5_ib_mr {
struct ib_umem *umem;
struct mlx5_shared_mr_info *smr_info;
struct list_head list;
- int order;
- bool allocated_from_cache;
+ unsigned int order;
+ struct mlx5_cache_ent *cache_ent;
int npages;
struct mlx5_ib_dev *dev;
u32 out[MLX5_ST_SZ_DW(create_mkey_out)];
@@ -701,22 +721,34 @@ struct mlx5_cache_ent {
u32 access_mode;
u32 page;
- u32 size;
- u32 cur;
+ u8 disabled:1;
+ u8 fill_to_high_water:1;
+
+ /*
+ * - available_mrs is the length of list head, ie the number of MRs
+ * available for immediate allocation.
+ * - total_mrs is available_mrs plus all in use MRs that could be
+ * returned to the cache.
+ * - limit is the low water mark for available_mrs, 2* limit is the
+ * upper water mark.
+ * - pending is the number of MRs currently being created
+ */
+ u32 total_mrs;
+ u32 available_mrs;
+ u32 limit;
+ u32 pending;
+
+ /* Statistics */
u32 miss;
- u32 limit;
struct mlx5_ib_dev *dev;
struct work_struct work;
struct delayed_work dwork;
- int pending;
- struct completion compl;
};
struct mlx5_mr_cache {
struct workqueue_struct *wq;
struct mlx5_cache_ent ent[MAX_MR_CACHE_ENTRIES];
- int stopped;
struct dentry *root;
unsigned long last_add;
};
@@ -794,6 +826,7 @@ enum mlx5_ib_dbg_cc_types {
MLX5_IB_DBG_CC_RP_BYTE_RESET,
MLX5_IB_DBG_CC_RP_THRESHOLD,
MLX5_IB_DBG_CC_RP_AI_RATE,
+ MLX5_IB_DBG_CC_RP_MAX_RATE,
MLX5_IB_DBG_CC_RP_HAI_RATE,
MLX5_IB_DBG_CC_RP_MIN_DEC_FAC,
MLX5_IB_DBG_CC_RP_MIN_RATE,
@@ -803,6 +836,7 @@ enum mlx5_ib_dbg_cc_types {
MLX5_IB_DBG_CC_RP_RATE_REDUCE_MONITOR_PERIOD,
MLX5_IB_DBG_CC_RP_INITIAL_ALPHA_VALUE,
MLX5_IB_DBG_CC_RP_GD,
+ MLX5_IB_DBG_CC_NP_MIN_TIME_BETWEEN_CNPS,
MLX5_IB_DBG_CC_NP_CNP_DSCP,
MLX5_IB_DBG_CC_NP_CNP_PRIO_MODE,
MLX5_IB_DBG_CC_NP_CNP_PRIO,
@@ -986,19 +1020,16 @@ struct mlx5_ib_dev {
*/
struct mutex cap_mask_mutex;
u8 ib_active:1;
- u8 fill_delay:1;
u8 is_rep:1;
u8 lag_active:1;
u8 wc_support:1;
+ u8 fill_delay;
struct umr_common umrc;
/* sync used page count stats
*/
struct mlx5_ib_resources devr;
- /* protect mkey key part */
- spinlock_t mkey_lock;
- u8 mkey_key;
-
+ atomic_t mkey_var;
struct mlx5_mr_cache cache;
struct timer_list delay_timer;
/* Prevents soft lock on massive reg MRs */
@@ -1268,7 +1299,8 @@ int mlx5_ib_get_cqe_size(struct ib_cq *ibcq);
int mlx5_mr_cache_init(struct mlx5_ib_dev *dev);
int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev);
-struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, int entry);
+struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
+ unsigned int entry);
void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr);
int mlx5_mr_cache_invalidate(struct mlx5_ib_mr *mr);
@@ -1388,6 +1420,7 @@ int mlx5_ib_fill_stat_entry(struct sk_buff *msg,
extern const struct uapi_definition mlx5_ib_devx_defs[];
extern const struct uapi_definition mlx5_ib_flow_defs[];
+extern const struct uapi_definition mlx5_ib_qos_defs[];
#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
@@ -1477,12 +1510,11 @@ static inline int get_qp_user_index(struct mlx5_ib_ucontext *ucontext,
{
u8 cqe_version = ucontext->cqe_version;
- if (field_avail(struct mlx5_ib_create_qp, uidx, inlen) &&
- !cqe_version && (ucmd->uidx == MLX5_IB_DEFAULT_UIDX))
+ if ((offsetofend(typeof(*ucmd), uidx) <= inlen) && !cqe_version &&
+ (ucmd->uidx == MLX5_IB_DEFAULT_UIDX))
return 0;
- if (!!(field_avail(struct mlx5_ib_create_qp, uidx, inlen) !=
- !!cqe_version))
+ if ((offsetofend(typeof(*ucmd), uidx) <= inlen) != !!cqe_version)
return -EINVAL;
return verify_assign_uidx(cqe_version, ucmd->uidx, user_index);
@@ -1495,12 +1527,11 @@ static inline int get_srq_user_index(struct mlx5_ib_ucontext *ucontext,
{
u8 cqe_version = ucontext->cqe_version;
- if (field_avail(struct mlx5_ib_create_srq, uidx, inlen) &&
- !cqe_version && (ucmd->uidx == MLX5_IB_DEFAULT_UIDX))
+ if ((offsetofend(typeof(*ucmd), uidx) <= inlen) && !cqe_version &&
+ (ucmd->uidx == MLX5_IB_DEFAULT_UIDX))
return 0;
- if (!!(field_avail(struct mlx5_ib_create_srq, uidx, inlen) !=
- !!cqe_version))
+ if ((offsetofend(typeof(*ucmd), uidx) <= inlen) != !!cqe_version)
return -EINVAL;
return verify_assign_uidx(cqe_version, ucmd->uidx, user_index);
@@ -1539,7 +1570,9 @@ static inline bool mlx5_ib_can_use_umr(struct mlx5_ib_dev *dev,
MLX5_CAP_GEN(dev->mdev, umr_modify_atomic_disabled))
return false;
- if (access_flags & IB_ACCESS_RELAXED_ORDERING)
+ if (access_flags & IB_ACCESS_RELAXED_ORDERING &&
+ (MLX5_CAP_GEN(dev->mdev, relaxed_ordering_write) ||
+ MLX5_CAP_GEN(dev->mdev, relaxed_ordering_read)))
return false;
return true;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 8508af500972..a401931189b7 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -54,12 +54,8 @@ static void
assign_mkey_variant(struct mlx5_ib_dev *dev, struct mlx5_core_mkey *mkey,
u32 *in)
{
+ u8 key = atomic_inc_return(&dev->mkey_var);
void *mkc;
- u8 key;
-
- spin_lock_irq(&dev->mkey_lock);
- key = dev->mkey_key++;
- spin_unlock_irq(&dev->mkey_lock);
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
MLX5_SET(mkc, mkc, mkey_7_0, key);
@@ -90,6 +86,7 @@ mlx5_ib_create_mkey_cb(struct mlx5_ib_dev *dev,
static void clean_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr);
static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr);
static int mr_cache_max_order(struct mlx5_ib_dev *dev);
+static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent);
static bool umr_can_use_indirect_mkey(struct mlx5_ib_dev *dev)
{
@@ -103,16 +100,6 @@ static int destroy_mkey(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
return mlx5_core_destroy_mkey(dev->mdev, &mr->mmkey);
}
-static int order2idx(struct mlx5_ib_dev *dev, int order)
-{
- struct mlx5_mr_cache *cache = &dev->cache;
-
- if (order < cache->ent[0].order)
- return 0;
- else
- return order - cache->ent[0].order;
-}
-
static bool use_umr_mtt_update(struct mlx5_ib_mr *mr, u64 start, u64 length)
{
return ((u64)1 << mr->order) * MLX5_ADAPTER_PAGE_SIZE >=
@@ -124,18 +111,16 @@ static void create_mkey_callback(int status, struct mlx5_async_work *context)
struct mlx5_ib_mr *mr =
container_of(context, struct mlx5_ib_mr, cb_work);
struct mlx5_ib_dev *dev = mr->dev;
- struct mlx5_mr_cache *cache = &dev->cache;
- int c = order2idx(dev, mr->order);
- struct mlx5_cache_ent *ent = &cache->ent[c];
+ struct mlx5_cache_ent *ent = mr->cache_ent;
unsigned long flags;
- spin_lock_irqsave(&ent->lock, flags);
- ent->pending--;
- spin_unlock_irqrestore(&ent->lock, flags);
if (status) {
mlx5_ib_warn(dev, "async reg mr failed. status %d\n", status);
kfree(mr);
- dev->fill_delay = 1;
+ spin_lock_irqsave(&ent->lock, flags);
+ ent->pending--;
+ WRITE_ONCE(dev->fill_delay, 1);
+ spin_unlock_irqrestore(&ent->lock, flags);
mod_timer(&dev->delay_timer, jiffies + HZ);
return;
}
@@ -144,23 +129,44 @@ static void create_mkey_callback(int status, struct mlx5_async_work *context)
mr->mmkey.key |= mlx5_idx_to_mkey(
MLX5_GET(create_mkey_out, mr->out, mkey_index));
- cache->last_add = jiffies;
+ WRITE_ONCE(dev->cache.last_add, jiffies);
spin_lock_irqsave(&ent->lock, flags);
list_add_tail(&mr->list, &ent->head);
- ent->cur++;
- ent->size++;
+ ent->available_mrs++;
+ ent->total_mrs++;
+ /* If we are doing fill_to_high_water then keep going. */
+ queue_adjust_cache_locked(ent);
+ ent->pending--;
spin_unlock_irqrestore(&ent->lock, flags);
-
- if (!completion_done(&ent->compl))
- complete(&ent->compl);
}
-static int add_keys(struct mlx5_ib_dev *dev, int c, int num)
+static struct mlx5_ib_mr *alloc_cache_mr(struct mlx5_cache_ent *ent, void *mkc)
{
- struct mlx5_mr_cache *cache = &dev->cache;
- struct mlx5_cache_ent *ent = &cache->ent[c];
- int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
+ struct mlx5_ib_mr *mr;
+
+ mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+ if (!mr)
+ return NULL;
+ mr->order = ent->order;
+ mr->cache_ent = ent;
+ mr->dev = ent->dev;
+
+ MLX5_SET(mkc, mkc, free, 1);
+ MLX5_SET(mkc, mkc, umr_en, 1);
+ MLX5_SET(mkc, mkc, access_mode_1_0, ent->access_mode & 0x3);
+ MLX5_SET(mkc, mkc, access_mode_4_2, (ent->access_mode >> 2) & 0x7);
+
+ MLX5_SET(mkc, mkc, qpn, 0xffffff);
+ MLX5_SET(mkc, mkc, translations_octword_size, ent->xlt);
+ MLX5_SET(mkc, mkc, log_page_size, ent->page);
+ return mr;
+}
+
+/* Asynchronously schedule new MRs to be populated in the cache. */
+static int add_keys(struct mlx5_cache_ent *ent, unsigned int num)
+{
+ size_t inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
struct mlx5_ib_mr *mr;
void *mkc;
u32 *in;
@@ -173,42 +179,29 @@ static int add_keys(struct mlx5_ib_dev *dev, int c, int num)
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
for (i = 0; i < num; i++) {
- if (ent->pending >= MAX_PENDING_REG_MR) {
- err = -EAGAIN;
- break;
- }
-
- mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+ mr = alloc_cache_mr(ent, mkc);
if (!mr) {
err = -ENOMEM;
break;
}
- mr->order = ent->order;
- mr->allocated_from_cache = true;
- mr->dev = dev;
-
- MLX5_SET(mkc, mkc, free, 1);
- MLX5_SET(mkc, mkc, umr_en, 1);
- MLX5_SET(mkc, mkc, access_mode_1_0, ent->access_mode & 0x3);
- MLX5_SET(mkc, mkc, access_mode_4_2,
- (ent->access_mode >> 2) & 0x7);
-
- MLX5_SET(mkc, mkc, qpn, 0xffffff);
- MLX5_SET(mkc, mkc, translations_octword_size, ent->xlt);
- MLX5_SET(mkc, mkc, log_page_size, ent->page);
-
spin_lock_irq(&ent->lock);
+ if (ent->pending >= MAX_PENDING_REG_MR) {
+ err = -EAGAIN;
+ spin_unlock_irq(&ent->lock);
+ kfree(mr);
+ break;
+ }
ent->pending++;
spin_unlock_irq(&ent->lock);
- err = mlx5_ib_create_mkey_cb(dev, &mr->mmkey,
- &dev->async_ctx, in, inlen,
- mr->out, sizeof(mr->out),
- &mr->cb_work);
+ err = mlx5_ib_create_mkey_cb(ent->dev, &mr->mmkey,
+ &ent->dev->async_ctx, in, inlen,
+ mr->out, sizeof(mr->out),
+ &mr->cb_work);
if (err) {
spin_lock_irq(&ent->lock);
ent->pending--;
spin_unlock_irq(&ent->lock);
- mlx5_ib_warn(dev, "create mkey failed %d\n", err);
+ mlx5_ib_warn(ent->dev, "create mkey failed %d\n", err);
kfree(mr);
break;
}
@@ -218,32 +211,89 @@ static int add_keys(struct mlx5_ib_dev *dev, int c, int num)
return err;
}
-static void remove_keys(struct mlx5_ib_dev *dev, int c, int num)
+/* Synchronously create a MR in the cache */
+static struct mlx5_ib_mr *create_cache_mr(struct mlx5_cache_ent *ent)
{
- struct mlx5_mr_cache *cache = &dev->cache;
- struct mlx5_cache_ent *ent = &cache->ent[c];
- struct mlx5_ib_mr *tmp_mr;
+ size_t inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
struct mlx5_ib_mr *mr;
- LIST_HEAD(del_list);
- int i;
+ void *mkc;
+ u32 *in;
+ int err;
- for (i = 0; i < num; i++) {
- spin_lock_irq(&ent->lock);
- if (list_empty(&ent->head)) {
- spin_unlock_irq(&ent->lock);
- break;
- }
- mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
- list_move(&mr->list, &del_list);
- ent->cur--;
- ent->size--;
- spin_unlock_irq(&ent->lock);
- mlx5_core_destroy_mkey(dev->mdev, &mr->mmkey);
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return ERR_PTR(-ENOMEM);
+ mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+
+ mr = alloc_cache_mr(ent, mkc);
+ if (!mr) {
+ err = -ENOMEM;
+ goto free_in;
}
- list_for_each_entry_safe(mr, tmp_mr, &del_list, list) {
- list_del(&mr->list);
- kfree(mr);
+ err = mlx5_core_create_mkey(ent->dev->mdev, &mr->mmkey, in, inlen);
+ if (err)
+ goto free_mr;
+
+ mr->mmkey.type = MLX5_MKEY_MR;
+ WRITE_ONCE(ent->dev->cache.last_add, jiffies);
+ spin_lock_irq(&ent->lock);
+ ent->total_mrs++;
+ spin_unlock_irq(&ent->lock);
+ kfree(in);
+ return mr;
+free_mr:
+ kfree(mr);
+free_in:
+ kfree(in);
+ return ERR_PTR(err);
+}
+
+static void remove_cache_mr_locked(struct mlx5_cache_ent *ent)
+{
+ struct mlx5_ib_mr *mr;
+
+ lockdep_assert_held(&ent->lock);
+ if (list_empty(&ent->head))
+ return;
+ mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
+ list_del(&mr->list);
+ ent->available_mrs--;
+ ent->total_mrs--;
+ spin_unlock_irq(&ent->lock);
+ mlx5_core_destroy_mkey(ent->dev->mdev, &mr->mmkey);
+ kfree(mr);
+ spin_lock_irq(&ent->lock);
+}
+
+static int resize_available_mrs(struct mlx5_cache_ent *ent, unsigned int target,
+ bool limit_fill)
+{
+ int err;
+
+ lockdep_assert_held(&ent->lock);
+
+ while (true) {
+ if (limit_fill)
+ target = ent->limit * 2;
+ if (target == ent->available_mrs + ent->pending)
+ return 0;
+ if (target > ent->available_mrs + ent->pending) {
+ u32 todo = target - (ent->available_mrs + ent->pending);
+
+ spin_unlock_irq(&ent->lock);
+ err = add_keys(ent, todo);
+ if (err == -EAGAIN)
+ usleep_range(3000, 5000);
+ spin_lock_irq(&ent->lock);
+ if (err) {
+ if (err != -EAGAIN)
+ return err;
+ } else
+ return 0;
+ } else {
+ remove_cache_mr_locked(ent);
+ }
}
}
@@ -251,37 +301,38 @@ static ssize_t size_write(struct file *filp, const char __user *buf,
size_t count, loff_t *pos)
{
struct mlx5_cache_ent *ent = filp->private_data;
- struct mlx5_ib_dev *dev = ent->dev;
- char lbuf[20] = {0};
- u32 var;
+ u32 target;
int err;
- int c;
- count = min(count, sizeof(lbuf) - 1);
- if (copy_from_user(lbuf, buf, count))
- return -EFAULT;
+ err = kstrtou32_from_user(buf, count, 0, &target);
+ if (err)
+ return err;
- c = order2idx(dev, ent->order);
-
- if (sscanf(lbuf, "%u", &var) != 1)
- return -EINVAL;
-
- if (var < ent->limit)
- return -EINVAL;
-
- if (var > ent->size) {
- do {
- err = add_keys(dev, c, var - ent->size);
- if (err && err != -EAGAIN)
- return err;
-
- usleep_range(3000, 5000);
- } while (err);
- } else if (var < ent->size) {
- remove_keys(dev, c, ent->size - var);
+ /*
+ * Target is the new value of total_mrs the user requests, however we
+ * cannot free MRs that are in use. Compute the target value for
+ * available_mrs.
+ */
+ spin_lock_irq(&ent->lock);
+ if (target < ent->total_mrs - ent->available_mrs) {
+ err = -EINVAL;
+ goto err_unlock;
}
+ target = target - (ent->total_mrs - ent->available_mrs);
+ if (target < ent->limit || target > ent->limit*2) {
+ err = -EINVAL;
+ goto err_unlock;
+ }
+ err = resize_available_mrs(ent, target, false);
+ if (err)
+ goto err_unlock;
+ spin_unlock_irq(&ent->lock);
return count;
+
+err_unlock:
+ spin_unlock_irq(&ent->lock);
+ return err;
}
static ssize_t size_read(struct file *filp, char __user *buf, size_t count,
@@ -291,7 +342,7 @@ static ssize_t size_read(struct file *filp, char __user *buf, size_t count,
char lbuf[20];
int err;
- err = snprintf(lbuf, sizeof(lbuf), "%d\n", ent->size);
+ err = snprintf(lbuf, sizeof(lbuf), "%d\n", ent->total_mrs);
if (err < 0)
return err;
@@ -309,32 +360,23 @@ static ssize_t limit_write(struct file *filp, const char __user *buf,
size_t count, loff_t *pos)
{
struct mlx5_cache_ent *ent = filp->private_data;
- struct mlx5_ib_dev *dev = ent->dev;
- char lbuf[20] = {0};
u32 var;
int err;
- int c;
- count = min(count, sizeof(lbuf) - 1);
- if (copy_from_user(lbuf, buf, count))
- return -EFAULT;
-
- c = order2idx(dev, ent->order);
-
- if (sscanf(lbuf, "%u", &var) != 1)
- return -EINVAL;
-
- if (var > ent->size)
- return -EINVAL;
+ err = kstrtou32_from_user(buf, count, 0, &var);
+ if (err)
+ return err;
+ /*
+ * Upon set we immediately fill the cache to high water mark implied by
+ * the limit.
+ */
+ spin_lock_irq(&ent->lock);
ent->limit = var;
-
- if (ent->cur < ent->limit) {
- err = add_keys(dev, c, 2 * ent->limit - ent->cur);
- if (err)
- return err;
- }
-
+ err = resize_available_mrs(ent, 0, true);
+ spin_unlock_irq(&ent->lock);
+ if (err)
+ return err;
return count;
}
@@ -359,68 +401,119 @@ static const struct file_operations limit_fops = {
.read = limit_read,
};
-static int someone_adding(struct mlx5_mr_cache *cache)
+static bool someone_adding(struct mlx5_mr_cache *cache)
{
- int i;
+ unsigned int i;
for (i = 0; i < MAX_MR_CACHE_ENTRIES; i++) {
- if (cache->ent[i].cur < cache->ent[i].limit)
- return 1;
- }
+ struct mlx5_cache_ent *ent = &cache->ent[i];
+ bool ret;
- return 0;
+ spin_lock_irq(&ent->lock);
+ ret = ent->available_mrs < ent->limit;
+ spin_unlock_irq(&ent->lock);
+ if (ret)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Check if the bucket is outside the high/low water mark and schedule an async
+ * update. The cache refill has hysteresis, once the low water mark is hit it is
+ * refilled up to the high mark.
+ */
+static void queue_adjust_cache_locked(struct mlx5_cache_ent *ent)
+{
+ lockdep_assert_held(&ent->lock);
+
+ if (ent->disabled || READ_ONCE(ent->dev->fill_delay))
+ return;
+ if (ent->available_mrs < ent->limit) {
+ ent->fill_to_high_water = true;
+ queue_work(ent->dev->cache.wq, &ent->work);
+ } else if (ent->fill_to_high_water &&
+ ent->available_mrs + ent->pending < 2 * ent->limit) {
+ /*
+ * Once we start populating due to hitting a low water mark
+ * continue until we pass the high water mark.
+ */
+ queue_work(ent->dev->cache.wq, &ent->work);
+ } else if (ent->available_mrs == 2 * ent->limit) {
+ ent->fill_to_high_water = false;
+ } else if (ent->available_mrs > 2 * ent->limit) {
+ /* Queue deletion of excess entries */
+ ent->fill_to_high_water = false;
+ if (ent->pending)
+ queue_delayed_work(ent->dev->cache.wq, &ent->dwork,
+ msecs_to_jiffies(1000));
+ else
+ queue_work(ent->dev->cache.wq, &ent->work);
+ }
}
static void __cache_work_func(struct mlx5_cache_ent *ent)
{
struct mlx5_ib_dev *dev = ent->dev;
struct mlx5_mr_cache *cache = &dev->cache;
- int i = order2idx(dev, ent->order);
int err;
- if (cache->stopped)
- return;
+ spin_lock_irq(&ent->lock);
+ if (ent->disabled)
+ goto out;
- ent = &dev->cache.ent[i];
- if (ent->cur < 2 * ent->limit && !dev->fill_delay) {
- err = add_keys(dev, i, 1);
- if (ent->cur < 2 * ent->limit) {
- if (err == -EAGAIN) {
- mlx5_ib_dbg(dev, "returned eagain, order %d\n",
- i + 2);
- queue_delayed_work(cache->wq, &ent->dwork,
- msecs_to_jiffies(3));
- } else if (err) {
- mlx5_ib_warn(dev, "command failed order %d, err %d\n",
- i + 2, err);
+ if (ent->fill_to_high_water &&
+ ent->available_mrs + ent->pending < 2 * ent->limit &&
+ !READ_ONCE(dev->fill_delay)) {
+ spin_unlock_irq(&ent->lock);
+ err = add_keys(ent, 1);
+ spin_lock_irq(&ent->lock);
+ if (ent->disabled)
+ goto out;
+ if (err) {
+ /*
+ * EAGAIN only happens if pending is positive, so we
+ * will be rescheduled from reg_mr_callback(). The only
+ * failure path here is ENOMEM.
+ */
+ if (err != -EAGAIN) {
+ mlx5_ib_warn(
+ dev,
+ "command failed order %d, err %d\n",
+ ent->order, err);
queue_delayed_work(cache->wq, &ent->dwork,
msecs_to_jiffies(1000));
- } else {
- queue_work(cache->wq, &ent->work);
}
}
- } else if (ent->cur > 2 * ent->limit) {
+ } else if (ent->available_mrs > 2 * ent->limit) {
+ bool need_delay;
+
/*
- * The remove_keys() logic is performed as garbage collection
- * task. Such task is intended to be run when no other active
- * processes are running.
+ * The remove_cache_mr() logic is performed as garbage
+ * collection task. Such task is intended to be run when no
+ * other active processes are running.
*
* The need_resched() will return TRUE if there are user tasks
* to be activated in near future.
*
- * In such case, we don't execute remove_keys() and postpone
- * the garbage collection work to try to run in next cycle,
- * in order to free CPU resources to other tasks.
+ * In such case, we don't execute remove_cache_mr() and postpone
+ * the garbage collection work to try to run in next cycle, in
+ * order to free CPU resources to other tasks.
*/
- if (!need_resched() && !someone_adding(cache) &&
- time_after(jiffies, cache->last_add + 300 * HZ)) {
- remove_keys(dev, i, 1);
- if (ent->cur > ent->limit)
- queue_work(cache->wq, &ent->work);
- } else {
+ spin_unlock_irq(&ent->lock);
+ need_delay = need_resched() || someone_adding(cache) ||
+ time_after(jiffies,
+ READ_ONCE(cache->last_add) + 300 * HZ);
+ spin_lock_irq(&ent->lock);
+ if (ent->disabled)
+ goto out;
+ if (need_delay)
queue_delayed_work(cache->wq, &ent->dwork, 300 * HZ);
- }
+ remove_cache_mr_locked(ent);
+ queue_adjust_cache_locked(ent);
}
+out:
+ spin_unlock_irq(&ent->lock);
}
static void delayed_cache_work_func(struct work_struct *work)
@@ -439,117 +532,95 @@ static void cache_work_func(struct work_struct *work)
__cache_work_func(ent);
}
-struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, int entry)
+/* Allocate a special entry from the cache */
+struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
+ unsigned int entry)
{
struct mlx5_mr_cache *cache = &dev->cache;
struct mlx5_cache_ent *ent;
struct mlx5_ib_mr *mr;
- int err;
- if (entry < 0 || entry >= MAX_MR_CACHE_ENTRIES) {
- mlx5_ib_err(dev, "cache entry %d is out of range\n", entry);
+ if (WARN_ON(entry <= MR_CACHE_LAST_STD_ENTRY ||
+ entry >= ARRAY_SIZE(cache->ent)))
return ERR_PTR(-EINVAL);
- }
ent = &cache->ent[entry];
- while (1) {
- spin_lock_irq(&ent->lock);
- if (list_empty(&ent->head)) {
- spin_unlock_irq(&ent->lock);
-
- err = add_keys(dev, entry, 1);
- if (err && err != -EAGAIN)
- return ERR_PTR(err);
-
- wait_for_completion(&ent->compl);
- } else {
- mr = list_first_entry(&ent->head, struct mlx5_ib_mr,
- list);
- list_del(&mr->list);
- ent->cur--;
- spin_unlock_irq(&ent->lock);
- if (ent->cur < ent->limit)
- queue_work(cache->wq, &ent->work);
+ spin_lock_irq(&ent->lock);
+ if (list_empty(&ent->head)) {
+ spin_unlock_irq(&ent->lock);
+ mr = create_cache_mr(ent);
+ if (IS_ERR(mr))
return mr;
- }
+ } else {
+ mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
+ list_del(&mr->list);
+ ent->available_mrs--;
+ queue_adjust_cache_locked(ent);
+ spin_unlock_irq(&ent->lock);
}
+ return mr;
}
-static struct mlx5_ib_mr *alloc_cached_mr(struct mlx5_ib_dev *dev, int order)
+/* Return a MR already available in the cache */
+static struct mlx5_ib_mr *get_cache_mr(struct mlx5_cache_ent *req_ent)
{
- struct mlx5_mr_cache *cache = &dev->cache;
+ struct mlx5_ib_dev *dev = req_ent->dev;
struct mlx5_ib_mr *mr = NULL;
- struct mlx5_cache_ent *ent;
- int last_umr_cache_entry;
- int c;
- int i;
+ struct mlx5_cache_ent *ent = req_ent;
- c = order2idx(dev, order);
- last_umr_cache_entry = order2idx(dev, mr_cache_max_order(dev));
- if (c < 0 || c > last_umr_cache_entry) {
- mlx5_ib_warn(dev, "order %d, cache index %d\n", order, c);
- return NULL;
- }
-
- for (i = c; i <= last_umr_cache_entry; i++) {
- ent = &cache->ent[i];
-
- mlx5_ib_dbg(dev, "order %d, cache index %d\n", ent->order, i);
+ /* Try larger MR pools from the cache to satisfy the allocation */
+ for (; ent != &dev->cache.ent[MR_CACHE_LAST_STD_ENTRY + 1]; ent++) {
+ mlx5_ib_dbg(dev, "order %u, cache index %zu\n", ent->order,
+ ent - dev->cache.ent);
spin_lock_irq(&ent->lock);
if (!list_empty(&ent->head)) {
mr = list_first_entry(&ent->head, struct mlx5_ib_mr,
list);
list_del(&mr->list);
- ent->cur--;
+ ent->available_mrs--;
+ queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock);
- if (ent->cur < ent->limit)
- queue_work(cache->wq, &ent->work);
break;
}
+ queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock);
-
- queue_work(cache->wq, &ent->work);
}
if (!mr)
- cache->ent[c].miss++;
+ req_ent->miss++;
return mr;
}
+static void detach_mr_from_cache(struct mlx5_ib_mr *mr)
+{
+ struct mlx5_cache_ent *ent = mr->cache_ent;
+
+ mr->cache_ent = NULL;
+ spin_lock_irq(&ent->lock);
+ ent->total_mrs--;
+ spin_unlock_irq(&ent->lock);
+}
+
void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
{
- struct mlx5_mr_cache *cache = &dev->cache;
- struct mlx5_cache_ent *ent;
- int shrink = 0;
- int c;
+ struct mlx5_cache_ent *ent = mr->cache_ent;
- if (!mr->allocated_from_cache)
+ if (!ent)
return;
- c = order2idx(dev, mr->order);
- WARN_ON(c < 0 || c >= MAX_MR_CACHE_ENTRIES);
-
if (mlx5_mr_cache_invalidate(mr)) {
- mr->allocated_from_cache = false;
+ detach_mr_from_cache(mr);
destroy_mkey(dev, mr);
- ent = &cache->ent[c];
- if (ent->cur < ent->limit)
- queue_work(cache->wq, &ent->work);
return;
}
- ent = &cache->ent[c];
spin_lock_irq(&ent->lock);
list_add_tail(&mr->list, &ent->head);
- ent->cur++;
- if (ent->cur > 2 * ent->limit)
- shrink = 1;
+ ent->available_mrs++;
+ queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock);
-
- if (shrink)
- queue_work(cache->wq, &ent->work);
}
static void clean_keys(struct mlx5_ib_dev *dev, int c)
@@ -569,8 +640,8 @@ static void clean_keys(struct mlx5_ib_dev *dev, int c)
}
mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
list_move(&mr->list, &del_list);
- ent->cur--;
- ent->size--;
+ ent->available_mrs--;
+ ent->total_mrs--;
spin_unlock_irq(&ent->lock);
mlx5_core_destroy_mkey(dev->mdev, &mr->mmkey);
}
@@ -608,7 +679,7 @@ static void mlx5_mr_cache_debugfs_init(struct mlx5_ib_dev *dev)
dir = debugfs_create_dir(ent->name, cache->root);
debugfs_create_file("size", 0600, dir, ent, &size_fops);
debugfs_create_file("limit", 0600, dir, ent, &limit_fops);
- debugfs_create_u32("cur", 0400, dir, &ent->cur);
+ debugfs_create_u32("cur", 0400, dir, &ent->available_mrs);
debugfs_create_u32("miss", 0600, dir, &ent->miss);
}
}
@@ -617,7 +688,7 @@ static void delay_time_func(struct timer_list *t)
{
struct mlx5_ib_dev *dev = from_timer(dev, t, delay_timer);
- dev->fill_delay = 0;
+ WRITE_ONCE(dev->fill_delay, 0);
}
int mlx5_mr_cache_init(struct mlx5_ib_dev *dev)
@@ -643,7 +714,6 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev)
ent->dev = dev;
ent->limit = 0;
- init_completion(&ent->compl);
INIT_WORK(&ent->work, cache_work_func);
INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
@@ -665,7 +735,9 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev)
ent->limit = dev->mdev->profile->mr_cache[i].limit;
else
ent->limit = 0;
- queue_work(cache->wq, &ent->work);
+ spin_lock_irq(&ent->lock);
+ queue_adjust_cache_locked(ent);
+ spin_unlock_irq(&ent->lock);
}
mlx5_mr_cache_debugfs_init(dev);
@@ -675,13 +747,20 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev)
int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev)
{
- int i;
+ unsigned int i;
if (!dev->cache.wq)
return 0;
- dev->cache.stopped = 1;
- flush_workqueue(dev->cache.wq);
+ for (i = 0; i < MAX_MR_CACHE_ENTRIES; i++) {
+ struct mlx5_cache_ent *ent = &dev->cache.ent[i];
+
+ spin_lock_irq(&ent->lock);
+ ent->disabled = true;
+ spin_unlock_irq(&ent->lock);
+ cancel_work_sync(&ent->work);
+ cancel_delayed_work_sync(&ent->dwork);
+ }
mlx5_mr_cache_debugfs_cleanup(dev);
mlx5_cmd_cleanup_async_ctx(&dev->async_ctx);
@@ -876,31 +955,37 @@ static int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
return err;
}
-static struct mlx5_ib_mr *alloc_mr_from_cache(
- struct ib_pd *pd, struct ib_umem *umem,
- u64 virt_addr, u64 len, int npages,
- int page_shift, int order, int access_flags)
+static struct mlx5_cache_ent *mr_cache_ent_from_order(struct mlx5_ib_dev *dev,
+ unsigned int order)
+{
+ struct mlx5_mr_cache *cache = &dev->cache;
+
+ if (order < cache->ent[0].order)
+ return &cache->ent[0];
+ order = order - cache->ent[0].order;
+ if (order > MR_CACHE_LAST_STD_ENTRY)
+ return NULL;
+ return &cache->ent[order];
+}
+
+static struct mlx5_ib_mr *
+alloc_mr_from_cache(struct ib_pd *pd, struct ib_umem *umem, u64 virt_addr,
+ u64 len, int npages, int page_shift, unsigned int order,
+ int access_flags)
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
+ struct mlx5_cache_ent *ent = mr_cache_ent_from_order(dev, order);
struct mlx5_ib_mr *mr;
- int err = 0;
- int i;
- for (i = 0; i < 1; i++) {
- mr = alloc_cached_mr(dev, order);
- if (mr)
- break;
-
- err = add_keys(dev, order2idx(dev, order), 1);
- if (err && err != -EAGAIN) {
- mlx5_ib_warn(dev, "add_keys failed, err %d\n", err);
- break;
- }
+ if (!ent)
+ return ERR_PTR(-E2BIG);
+ mr = get_cache_mr(ent);
+ if (!mr) {
+ mr = create_cache_mr(ent);
+ if (IS_ERR(mr))
+ return mr;
}
- if (!mr)
- return ERR_PTR(-EAGAIN);
-
mr->ibmr.pd = pd;
mr->umem = umem;
mr->access_flags = access_flags;
@@ -1474,10 +1559,9 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
/*
* UMR can't be used - MKey needs to be replaced.
*/
- if (mr->allocated_from_cache)
- err = mlx5_mr_cache_invalidate(mr);
- else
- err = destroy_mkey(dev, mr);
+ if (mr->cache_ent)
+ detach_mr_from_cache(mr);
+ err = destroy_mkey(dev, mr);
if (err)
goto err;
@@ -1489,8 +1573,6 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
mr = to_mmr(ib_mr);
goto err;
}
-
- mr->allocated_from_cache = false;
} else {
/*
* Send a UMR WQE
@@ -1577,8 +1659,6 @@ mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
static void clean_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
{
- int allocated_from_cache = mr->allocated_from_cache;
-
if (mr->sig) {
if (mlx5_core_destroy_psv(dev->mdev,
mr->sig->psv_memory.psv_idx))
@@ -1593,7 +1673,7 @@ static void clean_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
mr->sig = NULL;
}
- if (!allocated_from_cache) {
+ if (!mr->cache_ent) {
destroy_mkey(dev, mr);
mlx5_free_priv_descs(mr);
}
@@ -1610,7 +1690,7 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
else
clean_mr(dev, mr);
- if (mr->allocated_from_cache)
+ if (mr->cache_ent)
mlx5_mr_cache_free(dev, mr);
else
kfree(mr);
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index bf50cd91f472..3de7606d4a1a 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -197,7 +197,7 @@ static void dma_fence_odp_mr(struct mlx5_ib_mr *mr)
odp->private = NULL;
mutex_unlock(&odp->umem_mutex);
- if (!mr->allocated_from_cache) {
+ if (!mr->cache_ent) {
mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
WARN_ON(mr->descs);
}
diff --git a/drivers/infiniband/hw/mlx5/qos.c b/drivers/infiniband/hw/mlx5/qos.c
new file mode 100644
index 000000000000..cac878a70edb
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/qos.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2020, Mellanox Technologies inc. All rights reserved.
+ */
+
+#include
+#include
+#include
+#include
+#include "mlx5_ib.h"
+
+#define UVERBS_MODULE_NAME mlx5_ib
+#include
+
+static bool pp_is_supported(struct ib_device *device)
+{
+ struct mlx5_ib_dev *dev = to_mdev(device);
+
+ return (MLX5_CAP_GEN(dev->mdev, qos) &&
+ MLX5_CAP_QOS(dev->mdev, packet_pacing) &&
+ MLX5_CAP_QOS(dev->mdev, packet_pacing_uid));
+}
+
+static int UVERBS_HANDLER(MLX5_IB_METHOD_PP_OBJ_ALLOC)(
+ struct uverbs_attr_bundle *attrs)
+{
+ u8 rl_raw[MLX5_ST_SZ_BYTES(set_pp_rate_limit_context)] = {};
+ struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_HANDLE);
+ struct mlx5_ib_dev *dev;
+ struct mlx5_ib_ucontext *c;
+ struct mlx5_ib_pp *pp_entry;
+ void *in_ctx;
+ u16 uid;
+ int inlen;
+ u32 flags;
+ int err;
+
+ c = to_mucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(c))
+ return PTR_ERR(c);
+
+ /* The allocated entry can be used only by a DEVX context */
+ if (!c->devx_uid)
+ return -EINVAL;
+
+ dev = to_mdev(c->ibucontext.device);
+ pp_entry = kzalloc(sizeof(*pp_entry), GFP_KERNEL);
+ if (!pp_entry)
+ return -ENOMEM;
+
+ in_ctx = uverbs_attr_get_alloced_ptr(attrs,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_CTX);
+ inlen = uverbs_attr_get_len(attrs,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_CTX);
+ memcpy(rl_raw, in_ctx, inlen);
+ err = uverbs_get_flags32(&flags, attrs,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_FLAGS,
+ MLX5_IB_UAPI_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+ if (err)
+ goto err;
+
+ uid = (flags & MLX5_IB_UAPI_PP_ALLOC_FLAGS_DEDICATED_INDEX) ?
+ c->devx_uid : MLX5_SHARED_RESOURCE_UID;
+
+ err = mlx5_rl_add_rate_raw(dev->mdev, rl_raw, uid,
+ (flags & MLX5_IB_UAPI_PP_ALLOC_FLAGS_DEDICATED_INDEX),
+ &pp_entry->index);
+ if (err)
+ goto err;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_PP_OBJ_ALLOC_INDEX,
+ &pp_entry->index, sizeof(pp_entry->index));
+ if (err)
+ goto clean;
+
+ pp_entry->mdev = dev->mdev;
+ uobj->object = pp_entry;
+ return 0;
+
+clean:
+ mlx5_rl_remove_rate_raw(dev->mdev, pp_entry->index);
+err:
+ kfree(pp_entry);
+ return err;
+}
+
+static int pp_obj_cleanup(struct ib_uobject *uobject,
+ enum rdma_remove_reason why,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct mlx5_ib_pp *pp_entry = uobject->object;
+
+ mlx5_rl_remove_rate_raw(pp_entry->mdev, pp_entry->index);
+ kfree(pp_entry);
+ return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ MLX5_IB_METHOD_PP_OBJ_ALLOC,
+ UVERBS_ATTR_IDR(MLX5_IB_ATTR_PP_OBJ_ALLOC_HANDLE,
+ MLX5_IB_OBJECT_PP,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_CTX,
+ UVERBS_ATTR_SIZE(1,
+ MLX5_ST_SZ_BYTES(set_pp_rate_limit_context)),
+ UA_MANDATORY,
+ UA_ALLOC_AND_COPY),
+ UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_PP_OBJ_ALLOC_FLAGS,
+ enum mlx5_ib_uapi_pp_alloc_flags,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_PP_OBJ_ALLOC_INDEX,
+ UVERBS_ATTR_TYPE(u16),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+ MLX5_IB_METHOD_PP_OBJ_DESTROY,
+ UVERBS_ATTR_IDR(MLX5_IB_ATTR_PP_OBJ_DESTROY_HANDLE,
+ MLX5_IB_OBJECT_PP,
+ UVERBS_ACCESS_DESTROY,
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(MLX5_IB_OBJECT_PP,
+ UVERBS_TYPE_ALLOC_IDR(pp_obj_cleanup),
+ &UVERBS_METHOD(MLX5_IB_METHOD_PP_OBJ_ALLOC),
+ &UVERBS_METHOD(MLX5_IB_METHOD_PP_OBJ_DESTROY));
+
+
+const struct uapi_definition mlx5_ib_qos_defs[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(
+ MLX5_IB_OBJECT_PP,
+ UAPI_DEF_IS_OBJ_SUPPORTED(pp_is_supported)),
+ {},
+};
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 8fe149e808af..1456db4b6295 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -697,6 +697,9 @@ static int alloc_bfreg(struct mlx5_ib_dev *dev,
{
int bfregn = -ENOMEM;
+ if (bfregi->lib_uar_dyn)
+ return -EINVAL;
+
mutex_lock(&bfregi->lock);
if (bfregi->ver >= 2) {
bfregn = alloc_high_class_bfreg(dev, bfregi);
@@ -768,6 +771,9 @@ int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
u32 index_of_sys_page;
u32 offset;
+ if (bfregi->lib_uar_dyn)
+ return -EINVAL;
+
bfregs_per_sys_page = get_uars_per_sys_page(dev, bfregi->lib_uar_4k) *
MLX5_NON_FP_BFREGS_PER_UAR;
index_of_sys_page = bfregn / bfregs_per_sys_page;
@@ -919,6 +925,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
void *qpc;
int err;
u16 uid;
+ u32 uar_flags;
err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd));
if (err) {
@@ -928,24 +935,29 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
context = rdma_udata_to_drv_context(udata, struct mlx5_ib_ucontext,
ibucontext);
- if (ucmd.flags & MLX5_QP_FLAG_BFREG_INDEX) {
+ uar_flags = ucmd.flags & (MLX5_QP_FLAG_UAR_PAGE_INDEX |
+ MLX5_QP_FLAG_BFREG_INDEX);
+ switch (uar_flags) {
+ case MLX5_QP_FLAG_UAR_PAGE_INDEX:
+ uar_index = ucmd.bfreg_index;
+ bfregn = MLX5_IB_INVALID_BFREG;
+ break;
+ case MLX5_QP_FLAG_BFREG_INDEX:
uar_index = bfregn_to_uar_index(dev, &context->bfregi,
ucmd.bfreg_index, true);
if (uar_index < 0)
return uar_index;
-
bfregn = MLX5_IB_INVALID_BFREG;
- } else if (qp->flags & MLX5_IB_QP_CROSS_CHANNEL) {
- /*
- * TBD: should come from the verbs when we have the API
- */
- /* In CROSS_CHANNEL CQ and QP must use the same UAR */
- bfregn = MLX5_CROSS_CHANNEL_BFREG;
- }
- else {
+ break;
+ case 0:
+ if (qp->flags & MLX5_IB_QP_CROSS_CHANNEL)
+ return -EINVAL;
bfregn = alloc_bfreg(dev, &context->bfregi);
if (bfregn < 0)
return bfregn;
+ break;
+ default:
+ return -EINVAL;
}
mlx5_ib_dbg(dev, "bfregn 0x%x, uar_index 0x%x\n", bfregn, uar_index);
@@ -2100,6 +2112,7 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC |
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC |
MLX5_QP_FLAG_TUNNEL_OFFLOADS |
+ MLX5_QP_FLAG_UAR_PAGE_INDEX |
MLX5_QP_FLAG_TYPE_DCI |
MLX5_QP_FLAG_TYPE_DCT))
return -EINVAL;
@@ -2789,7 +2802,7 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd,
mlx5_ib_dbg(dev, "unsupported qp type %d\n",
init_attr->qp_type);
/* Don't support raw QPs */
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
if (verbs_init_attr->qp_type == IB_QPT_DRIVER)
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 78a48aea3faf..fa808582b08b 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -58,7 +58,7 @@ struct mthca_user_db_table {
u64 uvirt;
struct scatterlist mem;
int refcount;
- } page[0];
+ } page[];
};
static void mthca_free_icm_pages(struct mthca_dev *dev, struct mthca_icm_chunk *chunk)
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.h b/drivers/infiniband/hw/mthca/mthca_memfree.h
index da9b8f9b884f..f9a2e65e2ff5 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.h
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.h
@@ -68,7 +68,7 @@ struct mthca_icm_table {
int lowmem;
int coherent;
struct mutex mutex;
- struct mthca_icm *icm[0];
+ struct mthca_icm *icm[];
};
struct mthca_icm_iter {
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index ac19d57803b5..69a3e4f62fb1 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -561,7 +561,7 @@ static struct ib_qp *mthca_create_qp(struct ib_pd *pd,
}
default:
/* Don't support raw QPs */
- return ERR_PTR(-ENOSYS);
+ return ERR_PTR(-EOPNOTSUPP);
}
if (err) {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index d47ea675734b..10e343894595 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1111,7 +1111,7 @@ static int ocrdma_check_qp_params(struct ib_pd *ibpd, struct ocrdma_dev *dev,
(attrs->qp_type != IB_QPT_UD)) {
pr_err("%s(%d) unsupported qp type=0x%x requested\n",
__func__, dev->id, attrs->qp_type);
- return -EINVAL;
+ return -EOPNOTSUPP;
}
/* Skip the check for QP1 to support CM size of 128 */
if ((attrs->qp_type != IB_QPT_GSI) &&
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 484b555150e0..a5bd3adaf90a 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1186,7 +1186,7 @@ static int qedr_check_qp_attrs(struct ib_pd *ibpd, struct qedr_dev *dev,
DP_DEBUG(dev, QEDR_MSG_QP,
"create qp: unsupported qp type=0x%x requested\n",
attrs->qp_type);
- return -EINVAL;
+ return -EOPNOTSUPP;
}
if (attrs->cap.max_send_wr > qattr->max_sqe) {
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 5ef93f8f17a1..7508abb6a0fa 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -39,7 +39,6 @@
#include
#include
#include
-#include
#include
#include
@@ -1503,7 +1502,6 @@ int qib_register_ib_device(struct qib_devdata *dd)
unsigned i, ctxt;
int ret;
- get_random_bytes(&dev->qp_rnd, sizeof(dev->qp_rnd));
for (i = 0; i < dd->num_pports; i++)
init_ibport(ppd + i);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 8bf414b47b96..dc0e81f3b6f4 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -177,7 +177,6 @@ struct qib_ibdev {
struct timer_list mem_timer;
struct qib_pio_header *pio_hdrs;
dma_addr_t pio_hdrs_phys;
- u32 qp_rnd; /* random bytes for hash */
u32 n_piowait;
u32 n_txwait;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 556b8e44a51c..71f82339446c 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -504,7 +504,7 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd,
if (init_attr->qp_type != IB_QPT_UD) {
usnic_err("%s asked to make a non-UD QP: %d\n",
dev_name(&us_ibdev->ib_dev.dev), init_attr->qp_type);
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
trans_spec = cmd.spec;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.h b/drivers/infiniband/hw/usnic/usnic_uiom.h
index 70be49b1ca05..7ec8991ace67 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.h
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.h
@@ -77,7 +77,7 @@ struct usnic_uiom_reg {
struct usnic_uiom_chunk {
struct list_head list;
int nents;
- struct scatterlist page_list[0];
+ struct scatterlist page_list[];
};
struct usnic_uiom_pd *usnic_uiom_alloc_pd(void);
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
index 9de1281f9a3b..afcc2abcf55c 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c
@@ -217,7 +217,7 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
init_attr->qp_type != IB_QPT_GSI) {
dev_warn(&dev->pdev->dev, "queuepair type %d not supported\n",
init_attr->qp_type);
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
if (is_srq && !dev->dsr->caps.max_srq) {
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 7858d499db03..0e1b291d2cec 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1220,7 +1220,7 @@ struct ib_qp *rvt_create_qp(struct ib_pd *ibpd,
default:
/* Don't support raw QPs */
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EOPNOTSUPP);
}
init_attr->cap.max_inline_data = 0;
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 986265ad6e79..72b031ab7092 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -284,12 +284,6 @@ static int rvt_query_gid(struct ib_device *ibdev, u8 port_num,
&gid->global.interface_id);
}
-static inline struct rvt_ucontext *to_iucontext(struct ib_ucontext
- *ibucontext)
-{
- return container_of(ibucontext, struct rvt_ucontext, ibucontext);
-}
-
/**
* rvt_alloc_ucontext - Allocate a user context
* @uctx: Verbs context
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 0946a301a5c5..4afdd2e20883 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -103,6 +103,8 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
rxe->attr.max_fast_reg_page_list_len = RXE_MAX_FMR_PAGE_LIST_LEN;
rxe->attr.max_pkeys = RXE_MAX_PKEYS;
rxe->attr.local_ca_ack_delay = RXE_LOCAL_CA_ACK_DELAY;
+ addrconf_addr_eui48((unsigned char *)&rxe->attr.sys_image_guid,
+ rxe->ndev->dev_addr);
rxe->max_ucontext = RXE_MAX_UCONTEXT;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index ec21f616ac98..6c11c3aeeca6 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -590,15 +590,16 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
int err;
if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
- int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
+ int max_rd_atomic = attr->max_rd_atomic ?
+ roundup_pow_of_two(attr->max_rd_atomic) : 0;
qp->attr.max_rd_atomic = max_rd_atomic;
atomic_set(&qp->req.rd_atomic, max_rd_atomic);
}
if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
- int max_dest_rd_atomic =
- __roundup_pow_of_two(attr->max_dest_rd_atomic);
+ int max_dest_rd_atomic = attr->max_dest_rd_atomic ?
+ roundup_pow_of_two(attr->max_dest_rd_atomic) : 0;
qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.h b/drivers/infiniband/sw/rxe/rxe_queue.h
index acd0a925481c..8ef17d617022 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.h
+++ b/drivers/infiniband/sw/rxe/rxe_queue.h
@@ -63,7 +63,7 @@ struct rxe_queue_buf {
__u32 pad_2[31];
__u32 consumer_index;
__u32 pad_3[31];
- __u8 data[0];
+ __u8 data[];
};
struct rxe_queue {
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index c5651a96b196..559e5fd3bad8 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1769,14 +1769,23 @@ int siw_reject(struct iw_cm_id *id, const void *pdata, u8 pd_len)
return 0;
}
-static int siw_listen_address(struct iw_cm_id *id, int backlog,
- struct sockaddr *laddr, int addr_family)
+/*
+ * siw_create_listen - Create resources for a listener's IWCM ID @id
+ *
+ * Starts listen on the socket address id->local_addr.
+ *
+ */
+int siw_create_listen(struct iw_cm_id *id, int backlog)
{
struct socket *s;
struct siw_cep *cep = NULL;
struct siw_device *sdev = to_siw_dev(id->device);
+ int addr_family = id->local_addr.ss_family;
int rv = 0, s_val;
+ if (addr_family != AF_INET && addr_family != AF_INET6)
+ return -EAFNOSUPPORT;
+
rv = sock_create(addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
if (rv < 0)
return rv;
@@ -1791,9 +1800,25 @@ static int siw_listen_address(struct iw_cm_id *id, int backlog,
siw_dbg(id->device, "setsockopt error: %d\n", rv);
goto error;
}
- rv = s->ops->bind(s, laddr, addr_family == AF_INET ?
- sizeof(struct sockaddr_in) :
- sizeof(struct sockaddr_in6));
+ if (addr_family == AF_INET) {
+ struct sockaddr_in *laddr = &to_sockaddr_in(id->local_addr);
+
+ /* For wildcard addr, limit binding to current device only */
+ if (ipv4_is_zeronet(laddr->sin_addr.s_addr))
+ s->sk->sk_bound_dev_if = sdev->netdev->ifindex;
+
+ rv = s->ops->bind(s, (struct sockaddr *)laddr,
+ sizeof(struct sockaddr_in));
+ } else {
+ struct sockaddr_in6 *laddr = &to_sockaddr_in6(id->local_addr);
+
+ /* For wildcard addr, limit binding to current device only */
+ if (ipv6_addr_any(&laddr->sin6_addr))
+ s->sk->sk_bound_dev_if = sdev->netdev->ifindex;
+
+ rv = s->ops->bind(s, (struct sockaddr *)laddr,
+ sizeof(struct sockaddr_in6));
+ }
if (rv) {
siw_dbg(id->device, "socket bind error: %d\n", rv);
goto error;
@@ -1852,7 +1877,7 @@ static int siw_listen_address(struct iw_cm_id *id, int backlog,
list_add_tail(&cep->listenq, (struct list_head *)id->provider_data);
cep->state = SIW_EPSTATE_LISTENING;
- siw_dbg(id->device, "Listen at laddr %pISp\n", laddr);
+ siw_dbg(id->device, "Listen at laddr %pISp\n", &id->local_addr);
return 0;
@@ -1910,106 +1935,6 @@ static void siw_drop_listeners(struct iw_cm_id *id)
}
}
-/*
- * siw_create_listen - Create resources for a listener's IWCM ID @id
- *
- * Listens on the socket address id->local_addr.
- *
- * If the listener's @id provides a specific local IP address, at most one
- * listening socket is created and associated with @id.
- *
- * If the listener's @id provides the wildcard (zero) local IP address,
- * a separate listen is performed for each local IP address of the device
- * by creating a listening socket and binding to that local IP address.
- *
- */
-int siw_create_listen(struct iw_cm_id *id, int backlog)
-{
- struct net_device *dev = to_siw_dev(id->device)->netdev;
- int rv = 0, listeners = 0;
-
- siw_dbg(id->device, "backlog %d\n", backlog);
-
- /*
- * For each attached address of the interface, create a
- * listening socket, if id->local_addr is the wildcard
- * IP address or matches the IP address.
- */
- if (id->local_addr.ss_family == AF_INET) {
- struct in_device *in_dev = in_dev_get(dev);
- struct sockaddr_in s_laddr;
- const struct in_ifaddr *ifa;
-
- if (!in_dev) {
- rv = -ENODEV;
- goto out;
- }
- memcpy(&s_laddr, &id->local_addr, sizeof(s_laddr));
-
- siw_dbg(id->device, "laddr %pISp\n", &s_laddr);
-
- rtnl_lock();
- in_dev_for_each_ifa_rtnl(ifa, in_dev) {
- if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
- s_laddr.sin_addr.s_addr == ifa->ifa_address) {
- s_laddr.sin_addr.s_addr = ifa->ifa_address;
-
- rv = siw_listen_address(id, backlog,
- (struct sockaddr *)&s_laddr,
- AF_INET);
- if (!rv)
- listeners++;
- }
- }
- rtnl_unlock();
- in_dev_put(in_dev);
- } else if (id->local_addr.ss_family == AF_INET6) {
- struct inet6_dev *in6_dev = in6_dev_get(dev);
- struct inet6_ifaddr *ifp;
- struct sockaddr_in6 *s_laddr = &to_sockaddr_in6(id->local_addr);
-
- if (!in6_dev) {
- rv = -ENODEV;
- goto out;
- }
- siw_dbg(id->device, "laddr %pISp\n", &s_laddr);
-
- rtnl_lock();
- list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
- if (ifp->flags & (IFA_F_TENTATIVE | IFA_F_DEPRECATED))
- continue;
- if (ipv6_addr_any(&s_laddr->sin6_addr) ||
- ipv6_addr_equal(&s_laddr->sin6_addr, &ifp->addr)) {
- struct sockaddr_in6 bind_addr = {
- .sin6_family = AF_INET6,
- .sin6_port = s_laddr->sin6_port,
- .sin6_flowinfo = 0,
- .sin6_addr = ifp->addr,
- .sin6_scope_id = dev->ifindex };
-
- rv = siw_listen_address(id, backlog,
- (struct sockaddr *)&bind_addr,
- AF_INET6);
- if (!rv)
- listeners++;
- }
- }
- rtnl_unlock();
- in6_dev_put(in6_dev);
- } else {
- rv = -EAFNOSUPPORT;
- }
-out:
- if (listeners)
- rv = 0;
- else if (!rv)
- rv = -EINVAL;
-
- siw_dbg(id->device, "%s\n", rv ? "FAIL" : "OK");
-
- return rv;
-}
-
int siw_destroy_listen(struct iw_cm_id *id)
{
if (!id->provider_data) {
diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index 9ccce2909ac4..650520244ed0 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -332,7 +332,7 @@ static struct siw_wqe *siw_rqe_get(struct siw_qp *qp)
struct siw_srq *srq;
struct siw_wqe *wqe = NULL;
bool srq_event = false;
- unsigned long flags;
+ unsigned long uninitialized_var(flags);
srq = qp->srq;
if (srq) {
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 07e30138aaa1..aeb842bc7a1e 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -165,15 +165,16 @@ int siw_query_port(struct ib_device *base_dev, u8 port,
struct ib_port_attr *attr)
{
struct siw_device *sdev = to_siw_dev(base_dev);
+ int rv;
memset(attr, 0, sizeof(*attr));
- attr->active_mtu = attr->max_mtu;
- attr->active_speed = 2;
- attr->active_width = 2;
+ rv = ib_get_eth_speed(base_dev, port, &attr->active_speed,
+ &attr->active_width);
attr->gid_tbl_len = 1;
attr->max_msg_sz = -1;
attr->max_mtu = ib_mtu_int_to_enum(sdev->netdev->mtu);
+ attr->active_mtu = ib_mtu_int_to_enum(sdev->netdev->mtu);
attr->phys_state = sdev->state == IB_PORT_ACTIVE ?
IB_PORT_PHYS_STATE_LINK_UP : IB_PORT_PHYS_STATE_DISABLED;
attr->pkey_tbl_len = 1;
@@ -192,7 +193,7 @@ int siw_query_port(struct ib_device *base_dev, u8 port,
* attr->subnet_timeout = 0;
* attr->init_type_repy = 0;
*/
- return 0;
+ return rv;
}
int siw_get_port_immutable(struct ib_device *base_dev, u8 port,
@@ -322,7 +323,7 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
}
if (attrs->qp_type != IB_QPT_RC) {
siw_dbg(base_dev, "only RC QP's supported\n");
- rv = -EINVAL;
+ rv = -EOPNOTSUPP;
goto err_out;
}
if ((attrs->cap.max_send_wr > SIW_MAX_QP_WR) ||
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2aa3457a30ce..e188a95984b5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -838,6 +838,4 @@ extern int ipoib_debug_level;
#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
-extern const char ipoib_driver_version[];
-
#endif /* _IPOIB_H */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
index a10a0c2ca2da..67a21fdf5367 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
@@ -68,9 +68,6 @@ static void ipoib_get_drvinfo(struct net_device *netdev,
strlcpy(drvinfo->bus_info, dev_name(priv->ca->dev.parent),
sizeof(drvinfo->bus_info));
- strlcpy(drvinfo->version, ipoib_driver_version,
- sizeof(drvinfo->version));
-
strlcpy(drvinfo->driver, "ib_ipoib", sizeof(drvinfo->driver));
}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 4a0d3a9e72e1..81b8227214f1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -52,10 +52,6 @@
#include
#include
-#define DRV_VERSION "1.0.0"
-
-const char ipoib_driver_version[] = DRV_VERSION;
-
MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("IP-over-InfiniBand net driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 7a8f24de3631..999ef7cdd05e 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -292,12 +292,27 @@ void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
{
struct iser_device *device = iser_task->iser_conn->ib_conn.device;
struct iser_mem_reg *reg = &iser_task->rdma_reg[cmd_dir];
+ struct iser_fr_desc *desc;
+ struct ib_mr_status mr_status;
- if (!reg->mem_h)
+ desc = reg->mem_h;
+ if (!desc)
return;
- device->reg_ops->reg_desc_put(&iser_task->iser_conn->ib_conn,
- reg->mem_h);
+ /*
+ * The signature MR cannot be invalidated and reused without checking.
+ * libiscsi calls the check_protection transport handler only if
+ * SCSI-Response is received. And the signature MR is not checked if
+ * the task is completed for some other reason like a timeout or error
+ * handling. That's why we must check the signature MR here before
+ * putting it to the free pool.
+ */
+ if (unlikely(desc->sig_protected)) {
+ desc->sig_protected = false;
+ ib_check_mr_status(desc->rsc.sig_mr, IB_MR_CHECK_SIG_STATUS,
+ &mr_status);
+ }
+ device->reg_ops->reg_desc_put(&iser_task->iser_conn->ib_conn, desc);
reg->mem_h = NULL;
}
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h
index 4480092c68e0..d324312a373c 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h
@@ -239,7 +239,7 @@ struct opa_veswport_mactable_entry {
* @offset: mac table starting offset
* @num_entries: Number of entries to get or set
* @mac_tbl_digest: mac table digest
- * @tbl_entries[]: Array of table entries
+ * @tbl_entries: Array of table entries
*
* The EM sends down this structure in a MAD indicating
* the starting offset in the forwarding table that this
@@ -258,7 +258,7 @@ struct opa_veswport_mactable {
__be16 offset;
__be16 num_entries;
__be32 mac_tbl_digest;
- struct opa_veswport_mactable_entry tbl_entries[0];
+ struct opa_veswport_mactable_entry tbl_entries[];
} __packed;
/**
@@ -440,7 +440,7 @@ struct opa_veswport_iface_macs {
__be16 num_macs_in_msg;
__be16 tot_macs_in_lst;
__be16 gen_count;
- struct opa_vnic_iface_mac_entry entry[0];
+ struct opa_vnic_iface_mac_entry entry[];
} __packed;
/**
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_ethtool.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_ethtool.c
index 8ad7da989a0e..42d557dff19d 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_ethtool.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_ethtool.c
@@ -125,8 +125,6 @@ static void vnic_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *drvinfo)
{
strlcpy(drvinfo->driver, opa_vnic_driver_name, sizeof(drvinfo->driver));
- strlcpy(drvinfo->version, opa_vnic_driver_version,
- sizeof(drvinfo->version));
strlcpy(drvinfo->bus_info, dev_name(netdev->dev.parent),
sizeof(drvinfo->bus_info));
}
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
index 6dbc08e1a6a6..dd942dd642bd 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
@@ -292,7 +292,6 @@ struct opa_vnic_mac_tbl_node {
hlist_for_each_entry(obj, &name[bkt], member)
extern char opa_vnic_driver_name[];
-extern const char opa_vnic_driver_version[];
struct opa_vnic_adapter *opa_vnic_add_netdev(struct ib_device *ibdev,
u8 port_num, u8 vport_num);
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
index be5befd92d16..6e8d650c17c7 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
@@ -59,9 +59,7 @@
#include "opa_vnic_internal.h"
-#define DRV_VERSION "1.0"
char opa_vnic_driver_name[] = "opa_vnic";
-const char opa_vnic_driver_version[] = DRV_VERSION;
/*
* The trap service level is kept in bits 3 to 7 in the trap_sl_rsvd
@@ -1041,9 +1039,6 @@ static int __init opa_vnic_init(void)
{
int rc;
- pr_info("OPA Virtual Network Driver - v%s\n",
- opa_vnic_driver_version);
-
rc = ib_register_client(&opa_vnic_client);
if (rc)
pr_err("VNIC driver register failed %d\n", rc);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 5359ece561ca..6fabcc2faf1f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -309,7 +309,7 @@ struct srp_fr_pool {
int max_page_list_len;
spinlock_t lock;
struct list_head free_list;
- struct srp_fr_desc desc[0];
+ struct srp_fr_desc desc[];
};
/**
diff --git a/drivers/md/dm-clone-metadata.c b/drivers/md/dm-clone-metadata.c
index c05b12110456..17712456fa63 100644
--- a/drivers/md/dm-clone-metadata.c
+++ b/drivers/md/dm-clone-metadata.c
@@ -656,7 +656,7 @@ bool dm_clone_is_range_hydrated(struct dm_clone_metadata *cmd,
return (bit >= (start + nr_regions));
}
-unsigned long dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd)
+unsigned int dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd)
{
return bitmap_weight(cmd->region_map, cmd->nr_regions);
}
@@ -850,6 +850,12 @@ int dm_clone_set_region_hydrated(struct dm_clone_metadata *cmd, unsigned long re
struct dirty_map *dmap;
unsigned long word, flags;
+ if (unlikely(region_nr >= cmd->nr_regions)) {
+ DMERR("Region %lu out of range (total number of regions %lu)",
+ region_nr, cmd->nr_regions);
+ return -ERANGE;
+ }
+
word = region_nr / BITS_PER_LONG;
spin_lock_irqsave(&cmd->bitmap_lock, flags);
@@ -879,6 +885,13 @@ int dm_clone_cond_set_range(struct dm_clone_metadata *cmd, unsigned long start,
struct dirty_map *dmap;
unsigned long word, region_nr;
+ if (unlikely(start >= cmd->nr_regions || (start + nr_regions) < start ||
+ (start + nr_regions) > cmd->nr_regions)) {
+ DMERR("Invalid region range: start %lu, nr_regions %lu (total number of regions %lu)",
+ start, nr_regions, cmd->nr_regions);
+ return -ERANGE;
+ }
+
spin_lock_irq(&cmd->bitmap_lock);
if (cmd->read_only) {
diff --git a/drivers/md/dm-clone-metadata.h b/drivers/md/dm-clone-metadata.h
index 14af1ebd853f..d848b8799c07 100644
--- a/drivers/md/dm-clone-metadata.h
+++ b/drivers/md/dm-clone-metadata.h
@@ -156,7 +156,7 @@ bool dm_clone_is_range_hydrated(struct dm_clone_metadata *cmd,
/*
* Returns the number of hydrated regions.
*/
-unsigned long dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd);
+unsigned int dm_clone_nr_of_hydrated_regions(struct dm_clone_metadata *cmd);
/*
* Returns the first unhydrated region with region_nr >= @start
diff --git a/drivers/md/dm-clone-target.c b/drivers/md/dm-clone-target.c
index d1e1b5b56b1b..5ce96ddf1ce1 100644
--- a/drivers/md/dm-clone-target.c
+++ b/drivers/md/dm-clone-target.c
@@ -282,7 +282,7 @@ static bool bio_triggers_commit(struct clone *clone, struct bio *bio)
/* Get the address of the region in sectors */
static inline sector_t region_to_sector(struct clone *clone, unsigned long region_nr)
{
- return (region_nr << clone->region_shift);
+ return ((sector_t)region_nr << clone->region_shift);
}
/* Get the region number of the bio */
@@ -293,10 +293,17 @@ static inline unsigned long bio_to_region(struct clone *clone, struct bio *bio)
/* Get the region range covered by the bio */
static void bio_region_range(struct clone *clone, struct bio *bio,
- unsigned long *rs, unsigned long *re)
+ unsigned long *rs, unsigned long *nr_regions)
{
+ unsigned long end;
+
*rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size);
- *re = bio_end_sector(bio) >> clone->region_shift;
+ end = bio_end_sector(bio) >> clone->region_shift;
+
+ if (*rs >= end)
+ *nr_regions = 0;
+ else
+ *nr_regions = end - *rs;
}
/* Check whether a bio overwrites a region */
@@ -454,7 +461,7 @@ static void trim_bio(struct bio *bio, sector_t sector, unsigned int len)
static void complete_discard_bio(struct clone *clone, struct bio *bio, bool success)
{
- unsigned long rs, re;
+ unsigned long rs, nr_regions;
/*
* If the destination device supports discards, remap and trim the
@@ -463,9 +470,9 @@ static void complete_discard_bio(struct clone *clone, struct bio *bio, bool succ
*/
if (test_bit(DM_CLONE_DISCARD_PASSDOWN, &clone->flags) && success) {
remap_to_dest(clone, bio);
- bio_region_range(clone, bio, &rs, &re);
- trim_bio(bio, rs << clone->region_shift,
- (re - rs) << clone->region_shift);
+ bio_region_range(clone, bio, &rs, &nr_regions);
+ trim_bio(bio, region_to_sector(clone, rs),
+ nr_regions << clone->region_shift);
generic_make_request(bio);
} else
bio_endio(bio);
@@ -473,12 +480,21 @@ static void complete_discard_bio(struct clone *clone, struct bio *bio, bool succ
static void process_discard_bio(struct clone *clone, struct bio *bio)
{
- unsigned long rs, re;
+ unsigned long rs, nr_regions;
- bio_region_range(clone, bio, &rs, &re);
- BUG_ON(re > clone->nr_regions);
+ bio_region_range(clone, bio, &rs, &nr_regions);
+ if (!nr_regions) {
+ bio_endio(bio);
+ return;
+ }
- if (unlikely(rs == re)) {
+ if (WARN_ON(rs >= clone->nr_regions || (rs + nr_regions) < rs ||
+ (rs + nr_regions) > clone->nr_regions)) {
+ DMERR("%s: Invalid range (%lu + %lu, total regions %lu) for discard (%llu + %u)",
+ clone_device_name(clone), rs, nr_regions,
+ clone->nr_regions,
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio_sectors(bio));
bio_endio(bio);
return;
}
@@ -487,7 +503,7 @@ static void process_discard_bio(struct clone *clone, struct bio *bio)
* The covered regions are already hydrated so we just need to pass
* down the discard.
*/
- if (dm_clone_is_range_hydrated(clone->cmd, rs, re - rs)) {
+ if (dm_clone_is_range_hydrated(clone->cmd, rs, nr_regions)) {
complete_discard_bio(clone, bio, true);
return;
}
@@ -788,11 +804,14 @@ static void hydration_copy(struct dm_clone_region_hydration *hd, unsigned int nr
struct dm_io_region from, to;
struct clone *clone = hd->clone;
+ if (WARN_ON(!nr_regions))
+ return;
+
region_size = clone->region_size;
region_start = hd->region_nr;
region_end = region_start + nr_regions - 1;
- total_size = (nr_regions - 1) << clone->region_shift;
+ total_size = region_to_sector(clone, nr_regions - 1);
if (region_end == clone->nr_regions - 1) {
/*
@@ -1169,7 +1188,7 @@ static void process_deferred_discards(struct clone *clone)
int r = -EPERM;
struct bio *bio;
struct blk_plug plug;
- unsigned long rs, re;
+ unsigned long rs, nr_regions;
struct bio_list discards = BIO_EMPTY_LIST;
spin_lock_irq(&clone->lock);
@@ -1185,14 +1204,13 @@ static void process_deferred_discards(struct clone *clone)
/* Update the metadata */
bio_list_for_each(bio, &discards) {
- bio_region_range(clone, bio, &rs, &re);
+ bio_region_range(clone, bio, &rs, &nr_regions);
/*
* A discard request might cover regions that have been already
* hydrated. There is no need to update the metadata for these
* regions.
*/
- r = dm_clone_cond_set_range(clone->cmd, rs, re - rs);
-
+ r = dm_clone_cond_set_range(clone->cmd, rs, nr_regions);
if (unlikely(r))
break;
}
@@ -1455,7 +1473,7 @@ static void clone_status(struct dm_target *ti, status_type_t type,
goto error;
}
- DMEMIT("%u %llu/%llu %llu %lu/%lu %u ",
+ DMEMIT("%u %llu/%llu %llu %u/%lu %u ",
DM_CLONE_METADATA_BLOCK_SIZE,
(unsigned long long)(nr_metadata_blocks - nr_free_metadata_blocks),
(unsigned long long)nr_metadata_blocks,
@@ -1775,6 +1793,7 @@ static int copy_ctr_args(struct clone *clone, int argc, const char **argv, char
static int clone_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
int r;
+ sector_t nr_regions;
struct clone *clone;
struct dm_arg_set as;
@@ -1816,7 +1835,16 @@ static int clone_ctr(struct dm_target *ti, unsigned int argc, char **argv)
goto out_with_source_dev;
clone->region_shift = __ffs(clone->region_size);
- clone->nr_regions = dm_sector_div_up(ti->len, clone->region_size);
+ nr_regions = dm_sector_div_up(ti->len, clone->region_size);
+
+ /* Check for overflow */
+ if (nr_regions != (unsigned long)nr_regions) {
+ ti->error = "Too many regions. Consider increasing the region size";
+ r = -EOVERFLOW;
+ goto out_with_source_dev;
+ }
+
+ clone->nr_regions = nr_regions;
r = validate_nr_regions(clone->nr_regions, &ti->error);
if (r)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index c6a529873d0f..3df90daba89e 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -230,6 +230,8 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io);
static struct scatterlist *crypt_get_sg_data(struct crypt_config *cc,
struct scatterlist *sg);
+static bool crypt_integrity_aead(struct crypt_config *cc);
+
/*
* Use this to access cipher attributes that are independent of the key.
*/
@@ -346,7 +348,7 @@ static int crypt_iv_benbi_ctr(struct crypt_config *cc, struct dm_target *ti,
unsigned bs;
int log;
- if (test_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags))
+ if (crypt_integrity_aead(cc))
bs = crypto_aead_blocksize(any_tfm_aead(cc));
else
bs = crypto_skcipher_blocksize(any_tfm(cc));
@@ -712,7 +714,7 @@ static int crypt_iv_random_gen(struct crypt_config *cc, u8 *iv,
static int crypt_iv_eboiv_ctr(struct crypt_config *cc, struct dm_target *ti,
const char *opts)
{
- if (test_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags)) {
+ if (crypt_integrity_aead(cc)) {
ti->error = "AEAD transforms not supported for EBOIV";
return -EINVAL;
}
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 2f03fecd312d..b989d109d55d 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -39,6 +39,7 @@
#define RECALC_WRITE_SUPER 16
#define BITMAP_BLOCK_SIZE 4096 /* don't change it */
#define BITMAP_FLUSH_INTERVAL (10 * HZ)
+#define DISCARD_FILLER 0xf6
/*
* Warning - DEBUG_PRINT prints security-sensitive data to the log,
@@ -257,6 +258,7 @@ struct dm_integrity_c {
bool just_formatted;
bool recalculate_flag;
bool fix_padding;
+ bool discard;
struct alg_spec internal_hash_alg;
struct alg_spec journal_crypt_alg;
@@ -284,7 +286,7 @@ struct dm_integrity_io {
struct work_struct work;
struct dm_integrity_c *ic;
- bool write;
+ enum req_opf op;
bool fua;
struct dm_integrity_range range;
@@ -510,8 +512,8 @@ static bool block_bitmap_op(struct dm_integrity_c *ic, struct page_list *bitmap,
if (unlikely(((sector | n_sectors) & ((1 << ic->sb->log2_sectors_per_block) - 1)) != 0)) {
DMCRIT("invalid bitmap access (%llx,%llx,%d,%d,%d)",
- (unsigned long long)sector,
- (unsigned long long)n_sectors,
+ sector,
+ n_sectors,
ic->sb->log2_sectors_per_block,
ic->log2_blocks_per_bitmap_bit,
mode);
@@ -1299,6 +1301,11 @@ static bool find_newer_committed_node(struct dm_integrity_c *ic, struct journal_
static int dm_integrity_rw_tag(struct dm_integrity_c *ic, unsigned char *tag, sector_t *metadata_block,
unsigned *metadata_offset, unsigned total_size, int op)
{
+#define MAY_BE_FILLER 1
+#define MAY_BE_HASH 2
+ unsigned hash_offset = 0;
+ unsigned may_be = MAY_BE_HASH | (ic->discard ? MAY_BE_FILLER : 0);
+
do {
unsigned char *data, *dp;
struct dm_buffer *b;
@@ -1320,18 +1327,35 @@ static int dm_integrity_rw_tag(struct dm_integrity_c *ic, unsigned char *tag, se
} else if (op == TAG_WRITE) {
memcpy(dp, tag, to_copy);
dm_bufio_mark_partial_buffer_dirty(b, *metadata_offset, *metadata_offset + to_copy);
- } else {
+ } else {
/* e.g.: op == TAG_CMP */
- if (unlikely(memcmp(dp, tag, to_copy))) {
- unsigned i;
- for (i = 0; i < to_copy; i++) {
- if (dp[i] != tag[i])
- break;
- total_size--;
+ if (likely(is_power_of_2(ic->tag_size))) {
+ if (unlikely(memcmp(dp, tag, to_copy)))
+ if (unlikely(!ic->discard) ||
+ unlikely(!memchr_inv(dp, DISCARD_FILLER, to_copy))) {
+ goto thorough_test;
+ }
+ } else {
+ unsigned i, ts;
+thorough_test:
+ ts = total_size;
+
+ for (i = 0; i < to_copy; i++, ts--) {
+ if (unlikely(dp[i] != tag[i]))
+ may_be &= ~MAY_BE_HASH;
+ if (likely(dp[i] != DISCARD_FILLER))
+ may_be &= ~MAY_BE_FILLER;
+ hash_offset++;
+ if (unlikely(hash_offset == ic->tag_size)) {
+ if (unlikely(!may_be)) {
+ dm_bufio_release(b);
+ return ts;
+ }
+ hash_offset = 0;
+ may_be = MAY_BE_HASH | (ic->discard ? MAY_BE_FILLER : 0);
+ }
}
- dm_bufio_release(b);
- return total_size;
}
}
dm_bufio_release(b);
@@ -1342,10 +1366,17 @@ static int dm_integrity_rw_tag(struct dm_integrity_c *ic, unsigned char *tag, se
(*metadata_block)++;
*metadata_offset = 0;
}
+
+ if (unlikely(!is_power_of_2(ic->tag_size))) {
+ hash_offset = (hash_offset + to_copy) % ic->tag_size;
+ }
+
total_size -= to_copy;
} while (unlikely(total_size));
return 0;
+#undef MAY_BE_FILLER
+#undef MAY_BE_HASH
}
static void dm_integrity_flush_buffers(struct dm_integrity_c *ic)
@@ -1428,7 +1459,7 @@ static void dec_in_flight(struct dm_integrity_io *dio)
remove_range(ic, &dio->range);
- if (unlikely(dio->write))
+ if (dio->op == REQ_OP_WRITE || unlikely(dio->op == REQ_OP_DISCARD))
schedule_autocommit(ic);
bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
@@ -1519,15 +1550,20 @@ static void integrity_metadata(struct work_struct *w)
struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
char *checksums;
unsigned extra_space = unlikely(digest_size > ic->tag_size) ? digest_size - ic->tag_size : 0;
- char checksums_onstack[HASH_MAX_DIGESTSIZE];
- unsigned sectors_to_process = dio->range.n_sectors;
- sector_t sector = dio->range.logical_sector;
+ char checksums_onstack[max((size_t)HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
+ sector_t sector;
+ unsigned sectors_to_process;
+ sector_t save_metadata_block;
+ unsigned save_metadata_offset;
if (unlikely(ic->mode == 'R'))
goto skip_io;
- checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
- GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+ if (likely(dio->op != REQ_OP_DISCARD))
+ checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
+ GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+ else
+ checksums = kmalloc(PAGE_SIZE, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
if (!checksums) {
checksums = checksums_onstack;
if (WARN_ON(extra_space &&
@@ -1537,6 +1573,43 @@ static void integrity_metadata(struct work_struct *w)
}
}
+ if (unlikely(dio->op == REQ_OP_DISCARD)) {
+ sector_t bi_sector = dio->bio_details.bi_iter.bi_sector;
+ unsigned bi_size = dio->bio_details.bi_iter.bi_size;
+ unsigned max_size = likely(checksums != checksums_onstack) ? PAGE_SIZE : HASH_MAX_DIGESTSIZE;
+ unsigned max_blocks = max_size / ic->tag_size;
+ memset(checksums, DISCARD_FILLER, max_size);
+
+ while (bi_size) {
+ unsigned this_step_blocks = bi_size >> (SECTOR_SHIFT + ic->sb->log2_sectors_per_block);
+ this_step_blocks = min(this_step_blocks, max_blocks);
+ r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
+ this_step_blocks * ic->tag_size, TAG_WRITE);
+ if (unlikely(r)) {
+ if (likely(checksums != checksums_onstack))
+ kfree(checksums);
+ goto error;
+ }
+
+ /*if (bi_size < this_step_blocks << (SECTOR_SHIFT + ic->sb->log2_sectors_per_block)) {
+ printk("BUGG: bi_sector: %llx, bi_size: %u\n", bi_sector, bi_size);
+ printk("BUGG: this_step_blocks: %u\n", this_step_blocks);
+ BUG();
+ }*/
+ bi_size -= this_step_blocks << (SECTOR_SHIFT + ic->sb->log2_sectors_per_block);
+ bi_sector += this_step_blocks << ic->sb->log2_sectors_per_block;
+ }
+
+ if (likely(checksums != checksums_onstack))
+ kfree(checksums);
+ goto skip_io;
+ }
+
+ save_metadata_block = dio->metadata_block;
+ save_metadata_offset = dio->metadata_offset;
+ sector = dio->range.logical_sector;
+ sectors_to_process = dio->range.n_sectors;
+
__bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
unsigned pos;
char *mem, *checksums_ptr;
@@ -1555,11 +1628,12 @@ static void integrity_metadata(struct work_struct *w)
kunmap_atomic(mem);
r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
- checksums_ptr - checksums, !dio->write ? TAG_CMP : TAG_WRITE);
+ checksums_ptr - checksums, dio->op == REQ_OP_READ ? TAG_CMP : TAG_WRITE);
if (unlikely(r)) {
if (r > 0) {
- DMERR_LIMIT("Checksum failed at sector 0x%llx",
- (unsigned long long)(sector - ((r + ic->tag_size - 1) / ic->tag_size)));
+ char b[BDEVNAME_SIZE];
+ DMERR_LIMIT("%s: Checksum failed at sector 0x%llx", bio_devname(bio, b),
+ (sector - ((r + ic->tag_size - 1) / ic->tag_size)));
r = -EILSEQ;
atomic64_inc(&ic->number_of_mismatches);
}
@@ -1598,7 +1672,7 @@ static void integrity_metadata(struct work_struct *w)
tag = lowmem_page_address(biv.bv_page) + biv.bv_offset;
this_len = min(biv.bv_len, data_to_process);
r = dm_integrity_rw_tag(ic, tag, &dio->metadata_block, &dio->metadata_offset,
- this_len, !dio->write ? TAG_READ : TAG_WRITE);
+ this_len, dio->op == REQ_OP_READ ? TAG_READ : TAG_WRITE);
if (unlikely(r))
goto error;
data_to_process -= this_len;
@@ -1625,6 +1699,20 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio)
dio->ic = ic;
dio->bi_status = 0;
+ dio->op = bio_op(bio);
+
+ if (unlikely(dio->op == REQ_OP_DISCARD)) {
+ if (ti->max_io_len) {
+ sector_t sec = dm_target_offset(ti, bio->bi_iter.bi_sector);
+ unsigned log2_max_io_len = __fls(ti->max_io_len);
+ sector_t start_boundary = sec >> log2_max_io_len;
+ sector_t end_boundary = (sec + bio_sectors(bio) - 1) >> log2_max_io_len;
+ if (start_boundary < end_boundary) {
+ sector_t len = ti->max_io_len - (sec & (ti->max_io_len - 1));
+ dm_accept_partial_bio(bio, len);
+ }
+ }
+ }
if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
submit_flush_bio(ic, dio);
@@ -1632,8 +1720,7 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio)
}
dio->range.logical_sector = dm_target_offset(ti, bio->bi_iter.bi_sector);
- dio->write = bio_op(bio) == REQ_OP_WRITE;
- dio->fua = dio->write && bio->bi_opf & REQ_FUA;
+ dio->fua = dio->op == REQ_OP_WRITE && bio->bi_opf & REQ_FUA;
if (unlikely(dio->fua)) {
/*
* Don't pass down the FUA flag because we have to flush
@@ -1643,18 +1730,18 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio)
}
if (unlikely(dio->range.logical_sector + bio_sectors(bio) > ic->provided_data_sectors)) {
DMERR("Too big sector number: 0x%llx + 0x%x > 0x%llx",
- (unsigned long long)dio->range.logical_sector, bio_sectors(bio),
- (unsigned long long)ic->provided_data_sectors);
+ dio->range.logical_sector, bio_sectors(bio),
+ ic->provided_data_sectors);
return DM_MAPIO_KILL;
}
if (unlikely((dio->range.logical_sector | bio_sectors(bio)) & (unsigned)(ic->sectors_per_block - 1))) {
DMERR("Bio not aligned on %u sectors: 0x%llx, 0x%x",
ic->sectors_per_block,
- (unsigned long long)dio->range.logical_sector, bio_sectors(bio));
+ dio->range.logical_sector, bio_sectors(bio));
return DM_MAPIO_KILL;
}
- if (ic->sectors_per_block > 1) {
+ if (ic->sectors_per_block > 1 && likely(dio->op != REQ_OP_DISCARD)) {
struct bvec_iter iter;
struct bio_vec bv;
bio_for_each_segment(bv, bio, iter) {
@@ -1687,7 +1774,7 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio)
}
}
- if (unlikely(ic->mode == 'R') && unlikely(dio->write))
+ if (unlikely(ic->mode == 'R') && unlikely(dio->op != REQ_OP_READ))
return DM_MAPIO_KILL;
get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
@@ -1717,13 +1804,13 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
bio_advance_iter(bio, &bio->bi_iter, bv.bv_len);
retry_kmap:
mem = kmap_atomic(bv.bv_page);
- if (likely(dio->write))
+ if (likely(dio->op == REQ_OP_WRITE))
flush_dcache_page(bv.bv_page);
do {
struct journal_entry *je = access_journal_entry(ic, journal_section, journal_entry);
- if (unlikely(!dio->write)) {
+ if (unlikely(dio->op == REQ_OP_READ)) {
struct journal_sector *js;
char *mem_ptr;
unsigned s;
@@ -1748,12 +1835,12 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
} while (++s < ic->sectors_per_block);
#ifdef INTERNAL_VERIFY
if (ic->internal_hash) {
- char checksums_onstack[max(HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
+ char checksums_onstack[max((size_t)HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
integrity_sector_checksum(ic, logical_sector, mem + bv.bv_offset, checksums_onstack);
if (unlikely(memcmp(checksums_onstack, journal_entry_tag(ic, je), ic->tag_size))) {
DMERR_LIMIT("Checksum failed when reading from journal, at sector 0x%llx",
- (unsigned long long)logical_sector);
+ logical_sector);
}
}
#endif
@@ -1770,7 +1857,7 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
char *tag_addr;
BUG_ON(PageHighMem(biv.bv_page));
tag_addr = lowmem_page_address(biv.bv_page) + biv.bv_offset;
- if (likely(dio->write))
+ if (likely(dio->op == REQ_OP_WRITE))
memcpy(tag_ptr, tag_addr, tag_now);
else
memcpy(tag_addr, tag_ptr, tag_now);
@@ -1778,12 +1865,12 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
tag_ptr += tag_now;
tag_todo -= tag_now;
} while (unlikely(tag_todo)); else {
- if (likely(dio->write))
+ if (likely(dio->op == REQ_OP_WRITE))
memset(tag_ptr, 0, tag_todo);
}
}
- if (likely(dio->write)) {
+ if (likely(dio->op == REQ_OP_WRITE)) {
struct journal_sector *js;
unsigned s;
@@ -1819,12 +1906,12 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
bv.bv_offset += ic->sectors_per_block << SECTOR_SHIFT;
} while (bv.bv_len -= ic->sectors_per_block << SECTOR_SHIFT);
- if (unlikely(!dio->write))
+ if (unlikely(dio->op == REQ_OP_READ))
flush_dcache_page(bv.bv_page);
kunmap_atomic(mem);
} while (n_sectors);
- if (likely(dio->write)) {
+ if (likely(dio->op == REQ_OP_WRITE)) {
smp_mb();
if (unlikely(waitqueue_active(&ic->copy_to_journal_wait)))
wake_up(&ic->copy_to_journal_wait);
@@ -1856,7 +1943,10 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map
unsigned journal_section, journal_entry;
unsigned journal_read_pos;
struct completion read_comp;
- bool need_sync_io = ic->internal_hash && !dio->write;
+ bool discard_retried = false;
+ bool need_sync_io = ic->internal_hash && dio->op == REQ_OP_READ;
+ if (unlikely(dio->op == REQ_OP_DISCARD) && ic->mode != 'D')
+ need_sync_io = true;
if (need_sync_io && from_map) {
INIT_WORK(&dio->work, integrity_bio_wait);
@@ -1874,8 +1964,8 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map
}
dio->range.n_sectors = bio_sectors(bio);
journal_read_pos = NOT_FOUND;
- if (likely(ic->mode == 'J')) {
- if (dio->write) {
+ if (ic->mode == 'J' && likely(dio->op != REQ_OP_DISCARD)) {
+ if (dio->op == REQ_OP_WRITE) {
unsigned next_entry, i, pos;
unsigned ws, we, range_sectors;
@@ -1970,6 +2060,21 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map
}
}
}
+ if (ic->mode == 'J' && likely(dio->op == REQ_OP_DISCARD) && !discard_retried) {
+ sector_t next_sector;
+ unsigned new_pos = find_journal_node(ic, dio->range.logical_sector, &next_sector);
+ if (unlikely(new_pos != NOT_FOUND) ||
+ unlikely(next_sector < dio->range.logical_sector - dio->range.n_sectors)) {
+ remove_range_unlocked(ic, &dio->range);
+ spin_unlock_irq(&ic->endio_wait.lock);
+ queue_work(ic->commit_wq, &ic->commit_work);
+ flush_workqueue(ic->commit_wq);
+ queue_work(ic->writer_wq, &ic->writer_work);
+ flush_workqueue(ic->writer_wq);
+ discard_retried = true;
+ goto lock_retry;
+ }
+ }
spin_unlock_irq(&ic->endio_wait.lock);
if (unlikely(journal_read_pos != NOT_FOUND)) {
@@ -1978,7 +2083,7 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map
goto journal_read_write;
}
- if (ic->mode == 'B' && dio->write) {
+ if (ic->mode == 'B' && (dio->op == REQ_OP_WRITE || unlikely(dio->op == REQ_OP_DISCARD))) {
if (!block_bitmap_op(ic, ic->may_write_bitmap, dio->range.logical_sector,
dio->range.n_sectors, BITMAP_OP_TEST_ALL_SET)) {
struct bitmap_block_status *bbs;
@@ -2007,6 +2112,18 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map
bio->bi_end_io = integrity_end_io;
bio->bi_iter.bi_size = dio->range.n_sectors << SECTOR_SHIFT;
+ if (unlikely(dio->op == REQ_OP_DISCARD) && likely(ic->mode != 'D')) {
+ integrity_metadata(&dio->work);
+ dm_integrity_flush_buffers(ic);
+
+ dio->in_flight = (atomic_t)ATOMIC_INIT(1);
+ dio->completion = NULL;
+
+ generic_make_request(bio);
+
+ return;
+ }
+
generic_make_request(bio);
if (need_sync_io) {
@@ -2193,6 +2310,8 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned write_start,
sec &= ~(sector_t)(ic->sectors_per_block - 1);
}
}
+ if (unlikely(sec >= ic->provided_data_sectors))
+ continue;
get_area_and_offset(ic, sec, &area, &offset);
restore_last_bytes(ic, access_journal_data(ic, i, j), je);
for (k = j + 1; k < ic->journal_section_entries; k++) {
@@ -2202,6 +2321,8 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned write_start,
break;
BUG_ON(unlikely(journal_entry_is_inprogress(je2)) && !from_replay);
sec2 = journal_entry_get_sector(je2);
+ if (unlikely(sec2 >= ic->provided_data_sectors))
+ break;
get_area_and_offset(ic, sec2, &area2, &offset2);
if (area2 != area || offset2 != offset + ((k - j) << ic->sb->log2_sectors_per_block))
break;
@@ -2404,7 +2525,7 @@ static void integrity_recalc(struct work_struct *w)
get_area_and_offset(ic, logical_sector, &area, &offset);
}
- DEBUG_print("recalculating: %lx, %lx\n", logical_sector, n_sectors);
+ DEBUG_print("recalculating: %llx, %llx\n", logical_sector, n_sectors);
if (unlikely(++super_counter == RECALC_WRITE_SUPER)) {
recalc_write_super(ic);
@@ -2828,9 +2949,29 @@ static void dm_integrity_postsuspend(struct dm_target *ti)
static void dm_integrity_resume(struct dm_target *ti)
{
struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private;
+ __u64 old_provided_data_sectors = le64_to_cpu(ic->sb->provided_data_sectors);
int r;
+
DEBUG_print("resume\n");
+ if (ic->provided_data_sectors != old_provided_data_sectors) {
+ if (ic->provided_data_sectors > old_provided_data_sectors &&
+ ic->mode == 'B' &&
+ ic->sb->log2_blocks_per_bitmap_bit == ic->log2_blocks_per_bitmap_bit) {
+ rw_journal_sectors(ic, REQ_OP_READ, 0, 0,
+ ic->n_bitmap_blocks * (BITMAP_BLOCK_SIZE >> SECTOR_SHIFT), NULL);
+ block_bitmap_op(ic, ic->journal, old_provided_data_sectors,
+ ic->provided_data_sectors - old_provided_data_sectors, BITMAP_OP_SET);
+ rw_journal_sectors(ic, REQ_OP_WRITE, REQ_FUA | REQ_SYNC, 0,
+ ic->n_bitmap_blocks * (BITMAP_BLOCK_SIZE >> SECTOR_SHIFT), NULL);
+ }
+
+ ic->sb->provided_data_sectors = cpu_to_le64(ic->provided_data_sectors);
+ r = sync_rw_sb(ic, REQ_OP_WRITE, REQ_FUA);
+ if (unlikely(r))
+ dm_integrity_io_error(ic, "writing superblock", r);
+ }
+
if (ic->sb->flags & cpu_to_le32(SB_FLAG_DIRTY_BITMAP)) {
DEBUG_print("resume dirty_bitmap\n");
rw_journal_sectors(ic, REQ_OP_READ, 0, 0,
@@ -2898,7 +3039,7 @@ static void dm_integrity_resume(struct dm_target *ti)
DEBUG_print("testing recalc: %x\n", ic->sb->flags);
if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING)) {
__u64 recalc_pos = le64_to_cpu(ic->sb->recalc_sector);
- DEBUG_print("recalc pos: %lx / %lx\n", (long)recalc_pos, ic->provided_data_sectors);
+ DEBUG_print("recalc pos: %llx / %llx\n", recalc_pos, ic->provided_data_sectors);
if (recalc_pos < ic->provided_data_sectors) {
queue_work(ic->recalc_wq, &ic->recalc_work);
} else if (recalc_pos > ic->provided_data_sectors) {
@@ -2928,10 +3069,10 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
switch (type) {
case STATUSTYPE_INFO:
DMEMIT("%llu %llu",
- (unsigned long long)atomic64_read(&ic->number_of_mismatches),
- (unsigned long long)ic->provided_data_sectors);
+ atomic64_read(&ic->number_of_mismatches),
+ ic->provided_data_sectors);
if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))
- DMEMIT(" %llu", (unsigned long long)le64_to_cpu(ic->sb->recalc_sector));
+ DMEMIT(" %llu", le64_to_cpu(ic->sb->recalc_sector));
else
DMEMIT(" -");
break;
@@ -2944,6 +3085,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
arg_count += !!ic->meta_dev;
arg_count += ic->sectors_per_block != 1;
arg_count += !!(ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING));
+ arg_count += ic->discard;
arg_count += ic->mode == 'J';
arg_count += ic->mode == 'J';
arg_count += ic->mode == 'B';
@@ -2952,7 +3094,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
arg_count += !!ic->journal_crypt_alg.alg_string;
arg_count += !!ic->journal_mac_alg.alg_string;
arg_count += (ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0;
- DMEMIT("%s %llu %u %c %u", ic->dev->name, (unsigned long long)ic->start,
+ DMEMIT("%s %llu %u %c %u", ic->dev->name, ic->start,
ic->tag_size, ic->mode, arg_count);
if (ic->meta_dev)
DMEMIT(" meta_device:%s", ic->meta_dev->name);
@@ -2960,6 +3102,8 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
DMEMIT(" block_size:%u", ic->sectors_per_block << SECTOR_SHIFT);
if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))
DMEMIT(" recalculate");
+ if (ic->discard)
+ DMEMIT(" allow_discards");
DMEMIT(" journal_sectors:%u", ic->initial_sectors - SB_SECTORS);
DMEMIT(" interleave_sectors:%u", 1U << ic->sb->log2_interleave_sectors);
DMEMIT(" buffer_sectors:%u", 1U << ic->log2_buffer_sectors);
@@ -2968,7 +3112,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
DMEMIT(" commit_time:%u", ic->autocommit_msec);
}
if (ic->mode == 'B') {
- DMEMIT(" sectors_per_bit:%llu", (unsigned long long)ic->sectors_per_block << ic->log2_blocks_per_bitmap_bit);
+ DMEMIT(" sectors_per_bit:%llu", (sector_t)ic->sectors_per_block << ic->log2_blocks_per_bitmap_bit);
DMEMIT(" bitmap_flush_interval:%u", jiffies_to_msecs(ic->bitmap_flush_interval));
}
if ((ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0)
@@ -3073,6 +3217,24 @@ static int calculate_device_limits(struct dm_integrity_c *ic)
return 0;
}
+static void get_provided_data_sectors(struct dm_integrity_c *ic)
+{
+ if (!ic->meta_dev) {
+ int test_bit;
+ ic->provided_data_sectors = 0;
+ for (test_bit = fls64(ic->meta_device_sectors) - 1; test_bit >= 3; test_bit--) {
+ __u64 prev_data_sectors = ic->provided_data_sectors;
+
+ ic->provided_data_sectors |= (sector_t)1 << test_bit;
+ if (calculate_device_limits(ic))
+ ic->provided_data_sectors = prev_data_sectors;
+ }
+ } else {
+ ic->provided_data_sectors = ic->data_device_sectors;
+ ic->provided_data_sectors &= ~(sector_t)(ic->sectors_per_block - 1);
+ }
+}
+
static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sectors, unsigned interleave_sectors)
{
unsigned journal_sections;
@@ -3100,20 +3262,15 @@ static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sec
ic->sb->log2_interleave_sectors = max((__u8)MIN_LOG2_INTERLEAVE_SECTORS, ic->sb->log2_interleave_sectors);
ic->sb->log2_interleave_sectors = min((__u8)MAX_LOG2_INTERLEAVE_SECTORS, ic->sb->log2_interleave_sectors);
- ic->provided_data_sectors = 0;
- for (test_bit = fls64(ic->meta_device_sectors) - 1; test_bit >= 3; test_bit--) {
- __u64 prev_data_sectors = ic->provided_data_sectors;
-
- ic->provided_data_sectors |= (sector_t)1 << test_bit;
- if (calculate_device_limits(ic))
- ic->provided_data_sectors = prev_data_sectors;
- }
+ get_provided_data_sectors(ic);
if (!ic->provided_data_sectors)
return -EINVAL;
} else {
ic->sb->log2_interleave_sectors = 0;
- ic->provided_data_sectors = ic->data_device_sectors;
- ic->provided_data_sectors &= ~(sector_t)(ic->sectors_per_block - 1);
+
+ get_provided_data_sectors(ic);
+ if (!ic->provided_data_sectors)
+ return -EINVAL;
try_smaller_buffer:
ic->sb->journal_sections = cpu_to_le32(0);
@@ -3733,6 +3890,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
goto bad;
} else if (!strcmp(opt_string, "recalculate")) {
ic->recalculate_flag = true;
+ } else if (!strcmp(opt_string, "allow_discards")) {
+ ic->discard = true;
} else if (!strcmp(opt_string, "fix_padding")) {
ic->fix_padding = true;
} else {
@@ -3791,6 +3950,12 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
goto bad;
}
+ if (ic->discard && !ic->internal_hash) {
+ r = -EINVAL;
+ ti->error = "Discard can be only used with internal hash";
+ goto bad;
+ }
+
ic->autocommit_jiffies = msecs_to_jiffies(sync_msec);
ic->autocommit_msec = sync_msec;
timer_setup(&ic->autocommit_timer, autocommit_fn, 0);
@@ -3920,19 +4085,19 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
goto bad;
}
}
- ic->provided_data_sectors = le64_to_cpu(ic->sb->provided_data_sectors);
- if (ic->provided_data_sectors != le64_to_cpu(ic->sb->provided_data_sectors)) {
- /* test for overflow */
- r = -EINVAL;
- ti->error = "The superblock has 64-bit device size, but the kernel was compiled with 32-bit sectors";
- goto bad;
- }
if (!!(ic->sb->flags & cpu_to_le32(SB_FLAG_HAVE_JOURNAL_MAC)) != !!ic->journal_mac_alg.alg_string) {
r = -EINVAL;
ti->error = "Journal mac mismatch";
goto bad;
}
+ get_provided_data_sectors(ic);
+ if (!ic->provided_data_sectors) {
+ r = -EINVAL;
+ ti->error = "The device is too small";
+ goto bad;
+ }
+
try_smaller_buffer:
r = calculate_device_limits(ic);
if (r) {
@@ -3994,10 +4159,9 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
DEBUG_print(" initial_sectors 0x%x\n", ic->initial_sectors);
DEBUG_print(" metadata_run 0x%x\n", ic->metadata_run);
DEBUG_print(" log2_metadata_run %d\n", ic->log2_metadata_run);
- DEBUG_print(" provided_data_sectors 0x%llx (%llu)\n", (unsigned long long)ic->provided_data_sectors,
- (unsigned long long)ic->provided_data_sectors);
+ DEBUG_print(" provided_data_sectors 0x%llx (%llu)\n", ic->provided_data_sectors, ic->provided_data_sectors);
DEBUG_print(" log2_buffer_sectors %u\n", ic->log2_buffer_sectors);
- DEBUG_print(" bits_in_journal %llu\n", (unsigned long long)bits_in_journal);
+ DEBUG_print(" bits_in_journal %llu\n", bits_in_journal);
if (ic->recalculate_flag && !(ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))) {
ic->sb->flags |= cpu_to_le32(SB_FLAG_RECALCULATING);
@@ -4121,6 +4285,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
ti->num_flush_bios = 1;
ti->flush_supported = true;
+ if (ic->discard)
+ ti->num_discard_bios = 1;
return 0;
@@ -4202,7 +4368,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
static struct target_type integrity_target = {
.name = "integrity",
- .version = {1, 5, 0},
+ .version = {1, 6, 0},
.module = THIS_MODULE,
.features = DM_TARGET_SINGLETON | DM_TARGET_INTEGRITY,
.ctr = dm_integrity_ctr,
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 3ceeb6b404ed..49147e634046 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -551,6 +551,7 @@ void verity_fec_dtr(struct dm_verity *v)
mempool_exit(&f->rs_pool);
mempool_exit(&f->prealloc_pool);
mempool_exit(&f->extra_pool);
+ mempool_exit(&f->output_pool);
kmem_cache_destroy(f->cache);
if (f->data_bufio)
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index a09bdc000e64..114927da9cc9 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -26,6 +26,8 @@
#define AUTOCOMMIT_BLOCKS_SSD 65536
#define AUTOCOMMIT_BLOCKS_PMEM 64
#define AUTOCOMMIT_MSEC 1000
+#define MAX_AGE_DIV 16
+#define MAX_AGE_UNSPECIFIED -1UL
#define BITMAP_GRANULARITY 65536
#if BITMAP_GRANULARITY < PAGE_SIZE
@@ -88,6 +90,7 @@ struct wc_entry {
:47
#endif
;
+ unsigned long age;
#ifdef DM_WRITECACHE_HANDLE_HARDWARE_ERRORS
uint64_t original_sector;
uint64_t seq_count;
@@ -119,6 +122,7 @@ struct dm_writecache {
size_t writeback_size;
size_t freelist_high_watermark;
size_t freelist_low_watermark;
+ unsigned long max_age;
unsigned uncommitted_blocks;
unsigned autocommit_blocks;
@@ -130,6 +134,8 @@ struct dm_writecache {
struct timer_list autocommit_timer;
struct wait_queue_head freelist_wait;
+ struct timer_list max_age_timer;
+
atomic_t bio_in_progress[2];
struct wait_queue_head bio_in_progress_wait[2];
@@ -160,6 +166,7 @@ struct dm_writecache {
bool autocommit_time_set:1;
bool writeback_fua_set:1;
bool flush_on_suspend:1;
+ bool cleaner:1;
unsigned writeback_all;
struct workqueue_struct *writeback_wq;
@@ -502,6 +509,34 @@ static void ssd_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
memset(wc->dirty_bitmap, 0, wc->dirty_bitmap_size);
}
+static void ssd_commit_superblock(struct dm_writecache *wc)
+{
+ int r;
+ struct dm_io_region region;
+ struct dm_io_request req;
+
+ region.bdev = wc->ssd_dev->bdev;
+ region.sector = 0;
+ region.count = PAGE_SIZE;
+
+ if (unlikely(region.sector + region.count > wc->metadata_sectors))
+ region.count = wc->metadata_sectors - region.sector;
+
+ region.sector += wc->start_sector;
+
+ req.bi_op = REQ_OP_WRITE;
+ req.bi_op_flags = REQ_SYNC | REQ_FUA;
+ req.mem.type = DM_IO_VMA;
+ req.mem.ptr.vma = (char *)wc->memory_map;
+ req.client = wc->dm_io;
+ req.notify.fn = NULL;
+ req.notify.context = NULL;
+
+ r = dm_io(&req, 1, ®ion, NULL);
+ if (unlikely(r))
+ writecache_error(wc, r, "error writing superblock");
+}
+
static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
{
if (WC_MODE_PMEM(wc))
@@ -596,6 +631,7 @@ static void writecache_insert_entry(struct dm_writecache *wc, struct wc_entry *i
rb_link_node(&ins->rb_node, parent, node);
rb_insert_color(&ins->rb_node, &wc->tree);
list_add(&ins->lru, &wc->lru);
+ ins->age = jiffies;
}
static void writecache_unlink(struct dm_writecache *wc, struct wc_entry *e)
@@ -631,6 +667,16 @@ static inline void writecache_verify_watermark(struct dm_writecache *wc)
queue_work(wc->writeback_wq, &wc->writeback_work);
}
+static void writecache_max_age_timer(struct timer_list *t)
+{
+ struct dm_writecache *wc = from_timer(wc, t, max_age_timer);
+
+ if (!dm_suspended(wc->ti) && !writecache_has_error(wc)) {
+ queue_work(wc->writeback_wq, &wc->writeback_work);
+ mod_timer(&wc->max_age_timer, jiffies + wc->max_age / MAX_AGE_DIV);
+ }
+}
+
static struct wc_entry *writecache_pop_from_freelist(struct dm_writecache *wc, sector_t expected_sector)
{
struct wc_entry *e;
@@ -741,8 +787,10 @@ static void writecache_flush(struct dm_writecache *wc)
wc->seq_count++;
pmem_assign(sb(wc)->seq_count, cpu_to_le64(wc->seq_count));
- writecache_flush_region(wc, &sb(wc)->seq_count, sizeof sb(wc)->seq_count);
- writecache_commit_flushed(wc, false);
+ if (WC_MODE_PMEM(wc))
+ writecache_commit_flushed(wc, false);
+ else
+ ssd_commit_superblock(wc);
wc->overwrote_committed = false;
@@ -837,6 +885,7 @@ static void writecache_suspend(struct dm_target *ti)
bool flush_on_suspend;
del_timer_sync(&wc->autocommit_timer);
+ del_timer_sync(&wc->max_age_timer);
wc_lock(wc);
writecache_flush(wc);
@@ -876,6 +925,7 @@ static int writecache_alloc_entries(struct dm_writecache *wc)
struct wc_entry *e = &wc->entries[b];
e->index = b;
e->write_in_progress = false;
+ cond_resched();
}
return 0;
@@ -930,6 +980,7 @@ static void writecache_resume(struct dm_target *ti)
e->original_sector = le64_to_cpu(wme.original_sector);
e->seq_count = le64_to_cpu(wme.seq_count);
}
+ cond_resched();
}
#endif
for (b = 0; b < wc->n_blocks; b++) {
@@ -973,6 +1024,9 @@ static void writecache_resume(struct dm_target *ti)
writecache_verify_watermark(wc);
+ if (wc->max_age != MAX_AGE_UNSPECIFIED)
+ mod_timer(&wc->max_age_timer, jiffies + wc->max_age / MAX_AGE_DIV);
+
wc_unlock(wc);
}
@@ -1021,6 +1075,28 @@ static int process_flush_on_suspend_mesg(unsigned argc, char **argv, struct dm_w
return 0;
}
+static void activate_cleaner(struct dm_writecache *wc)
+{
+ wc->flush_on_suspend = true;
+ wc->cleaner = true;
+ wc->freelist_high_watermark = wc->n_blocks;
+ wc->freelist_low_watermark = wc->n_blocks;
+}
+
+static int process_cleaner_mesg(unsigned argc, char **argv, struct dm_writecache *wc)
+{
+ if (argc != 1)
+ return -EINVAL;
+
+ wc_lock(wc);
+ activate_cleaner(wc);
+ if (!dm_suspended(wc->ti))
+ writecache_verify_watermark(wc);
+ wc_unlock(wc);
+
+ return 0;
+}
+
static int writecache_message(struct dm_target *ti, unsigned argc, char **argv,
char *result, unsigned maxlen)
{
@@ -1031,6 +1107,8 @@ static int writecache_message(struct dm_target *ti, unsigned argc, char **argv,
r = process_flush_mesg(argc, argv, wc);
else if (!strcasecmp(argv[0], "flush_on_suspend"))
r = process_flush_on_suspend_mesg(argc, argv, wc);
+ else if (!strcasecmp(argv[0], "cleaner"))
+ r = process_cleaner_mesg(argc, argv, wc);
else
DMERR("unrecognised message received: %s", argv[0]);
@@ -1194,6 +1272,7 @@ static int writecache_map(struct dm_target *ti, struct bio *bio)
}
} else {
do {
+ bool found_entry = false;
if (writecache_has_error(wc))
goto unlock_error;
e = writecache_find_entry(wc, bio->bi_iter.bi_sector, 0);
@@ -1204,9 +1283,25 @@ static int writecache_map(struct dm_target *ti, struct bio *bio)
wc->overwrote_committed = true;
goto bio_copy;
}
+ found_entry = true;
+ } else {
+ if (unlikely(wc->cleaner))
+ goto direct_write;
}
e = writecache_pop_from_freelist(wc, (sector_t)-1);
if (unlikely(!e)) {
+ if (!found_entry) {
+direct_write:
+ e = writecache_find_entry(wc, bio->bi_iter.bi_sector, WFE_RETURN_FOLLOWING);
+ if (e) {
+ sector_t next_boundary = read_original_sector(wc, e) - bio->bi_iter.bi_sector;
+ BUG_ON(!next_boundary);
+ if (next_boundary < bio->bi_iter.bi_size >> SECTOR_SHIFT) {
+ dm_accept_partial_bio(bio, next_boundary);
+ }
+ }
+ goto unlock_remap_origin;
+ }
writecache_wait_on_freelist(wc);
continue;
}
@@ -1619,7 +1714,9 @@ static void writecache_writeback(struct work_struct *work)
wbl.size = 0;
while (!list_empty(&wc->lru) &&
(wc->writeback_all ||
- wc->freelist_size + wc->writeback_size <= wc->freelist_low_watermark)) {
+ wc->freelist_size + wc->writeback_size <= wc->freelist_low_watermark ||
+ (jiffies - container_of(wc->lru.prev, struct wc_entry, lru)->age >=
+ wc->max_age - wc->max_age / MAX_AGE_DIV))) {
n_walked++;
if (unlikely(n_walked > WRITEBACK_LATENCY) &&
@@ -1791,8 +1888,10 @@ static int init_memory(struct dm_writecache *wc)
pmem_assign(sb(wc)->n_blocks, cpu_to_le64(wc->n_blocks));
pmem_assign(sb(wc)->seq_count, cpu_to_le64(0));
- for (b = 0; b < wc->n_blocks; b++)
+ for (b = 0; b < wc->n_blocks; b++) {
write_original_sector_seq_count(wc, &wc->entries[b], -1, -1);
+ cond_resched();
+ }
writecache_flush_all_metadata(wc);
writecache_commit_flushed(wc, false);
@@ -1882,9 +1981,11 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
wc->ti = ti;
mutex_init(&wc->lock);
+ wc->max_age = MAX_AGE_UNSPECIFIED;
writecache_poison_lists(wc);
init_waitqueue_head(&wc->freelist_wait);
timer_setup(&wc->autocommit_timer, writecache_autocommit_timer, 0);
+ timer_setup(&wc->max_age_timer, writecache_max_age_timer, 0);
for (i = 0; i < 2; i++) {
atomic_set(&wc->bio_in_progress[i], 0);
@@ -2058,6 +2159,16 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
goto invalid_optional;
wc->autocommit_jiffies = msecs_to_jiffies(autocommit_msecs);
wc->autocommit_time_set = true;
+ } else if (!strcasecmp(string, "max_age") && opt_params >= 1) {
+ unsigned max_age_msecs;
+ string = dm_shift_arg(&as), opt_params--;
+ if (sscanf(string, "%u%c", &max_age_msecs, &dummy) != 1)
+ goto invalid_optional;
+ if (max_age_msecs > 86400000)
+ goto invalid_optional;
+ wc->max_age = msecs_to_jiffies(max_age_msecs);
+ } else if (!strcasecmp(string, "cleaner")) {
+ wc->cleaner = true;
} else if (!strcasecmp(string, "fua")) {
if (WC_MODE_PMEM(wc)) {
wc->writeback_fua = true;
@@ -2235,6 +2346,9 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
do_div(x, 100);
wc->freelist_low_watermark = x;
+ if (wc->cleaner)
+ activate_cleaner(wc);
+
r = writecache_alloc_entries(wc);
if (r) {
ti->error = "Cannot allocate memory";
@@ -2278,9 +2392,9 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
extra_args = 0;
if (wc->start_sector)
extra_args += 2;
- if (wc->high_wm_percent_set)
+ if (wc->high_wm_percent_set && !wc->cleaner)
extra_args += 2;
- if (wc->low_wm_percent_set)
+ if (wc->low_wm_percent_set && !wc->cleaner)
extra_args += 2;
if (wc->max_writeback_jobs_set)
extra_args += 2;
@@ -2288,19 +2402,21 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
extra_args += 2;
if (wc->autocommit_time_set)
extra_args += 2;
+ if (wc->cleaner)
+ extra_args++;
if (wc->writeback_fua_set)
extra_args++;
DMEMIT("%u", extra_args);
if (wc->start_sector)
DMEMIT(" start_sector %llu", (unsigned long long)wc->start_sector);
- if (wc->high_wm_percent_set) {
+ if (wc->high_wm_percent_set && !wc->cleaner) {
x = (uint64_t)wc->freelist_high_watermark * 100;
x += wc->n_blocks / 2;
do_div(x, (size_t)wc->n_blocks);
DMEMIT(" high_watermark %u", 100 - (unsigned)x);
}
- if (wc->low_wm_percent_set) {
+ if (wc->low_wm_percent_set && !wc->cleaner) {
x = (uint64_t)wc->freelist_low_watermark * 100;
x += wc->n_blocks / 2;
do_div(x, (size_t)wc->n_blocks);
@@ -2312,6 +2428,10 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
DMEMIT(" autocommit_blocks %u", wc->autocommit_blocks);
if (wc->autocommit_time_set)
DMEMIT(" autocommit_time %u", jiffies_to_msecs(wc->autocommit_jiffies));
+ if (wc->max_age != MAX_AGE_UNSPECIFIED)
+ DMEMIT(" max_age %u", jiffies_to_msecs(wc->max_age));
+ if (wc->cleaner)
+ DMEMIT(" cleaner");
if (wc->writeback_fua_set)
DMEMIT(" %sfua", wc->writeback_fua ? "" : "no");
break;
@@ -2320,7 +2440,7 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
static struct target_type writecache_target = {
.name = "writecache",
- .version = {1, 2, 0},
+ .version = {1, 3, 0},
.module = THIS_MODULE,
.ctr = writecache_ctr,
.dtr = writecache_dtr,
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 516c7b671d25..369de15c4e80 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1109,7 +1109,6 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int idx, void *data)
switch (blkz->type) {
case BLK_ZONE_TYPE_CONVENTIONAL:
set_bit(DMZ_RND, &zone->flags);
- zmd->nr_rnd_zones++;
break;
case BLK_ZONE_TYPE_SEQWRITE_REQ:
case BLK_ZONE_TYPE_SEQWRITE_PREF:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index b25465d9e030..90048697b2ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -904,6 +904,7 @@ const struct mlx5_flow_cmds *mlx5_fs_cmd_get_default(enum fs_flow_table_type typ
case FS_FT_SNIFFER_TX:
case FS_FT_NIC_TX:
case FS_FT_RDMA_RX:
+ case FS_FT_RDMA_TX:
return mlx5_fs_cmd_get_fw_cmds();
default:
return mlx5_fs_cmd_get_stub_cmds();
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 62ce2b9417ab..d5defe09339a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -87,6 +87,15 @@
.identified_miss_table_mode), \
FS_CAP(flow_table_properties_nic_transmit.flow_table_modify))
+#define FS_CHAINING_CAPS_RDMA_TX \
+ FS_REQUIRED_CAPS( \
+ FS_CAP(flow_table_properties_nic_transmit_rdma.flow_modify_en), \
+ FS_CAP(flow_table_properties_nic_transmit_rdma.modify_root), \
+ FS_CAP(flow_table_properties_nic_transmit_rdma \
+ .identified_miss_table_mode), \
+ FS_CAP(flow_table_properties_nic_transmit_rdma \
+ .flow_table_modify))
+
#define LEFTOVERS_NUM_LEVELS 1
#define LEFTOVERS_NUM_PRIOS 1
@@ -202,6 +211,18 @@ static struct init_tree_node rdma_rx_root_fs = {
}
};
+static struct init_tree_node rdma_tx_root_fs = {
+ .type = FS_TYPE_NAMESPACE,
+ .ar_size = 1,
+ .children = (struct init_tree_node[]) {
+ ADD_PRIO(0, MLX5_BY_PASS_NUM_PRIOS, 0,
+ FS_CHAINING_CAPS_RDMA_TX,
+ ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF,
+ ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS,
+ BY_PASS_PRIO_NUM_LEVELS))),
+ }
+};
+
enum fs_i_lock_class {
FS_LOCK_GRANDPARENT,
FS_LOCK_PARENT,
@@ -2121,6 +2142,8 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
} else if (type == MLX5_FLOW_NAMESPACE_RDMA_RX_KERNEL) {
root_ns = steering->rdma_rx_root_ns;
prio = RDMA_RX_KERNEL_PRIO;
+ } else if (type == MLX5_FLOW_NAMESPACE_RDMA_TX) {
+ root_ns = steering->rdma_tx_root_ns;
} else { /* Must be NIC RX */
root_ns = steering->root_ns;
prio = type;
@@ -2524,6 +2547,7 @@ void mlx5_cleanup_fs(struct mlx5_core_dev *dev)
cleanup_root_ns(steering->sniffer_rx_root_ns);
cleanup_root_ns(steering->sniffer_tx_root_ns);
cleanup_root_ns(steering->rdma_rx_root_ns);
+ cleanup_root_ns(steering->rdma_tx_root_ns);
cleanup_root_ns(steering->egress_root_ns);
mlx5_cleanup_fc_stats(dev);
kmem_cache_destroy(steering->ftes_cache);
@@ -2580,6 +2604,29 @@ static int init_rdma_rx_root_ns(struct mlx5_flow_steering *steering)
return err;
}
+static int init_rdma_tx_root_ns(struct mlx5_flow_steering *steering)
+{
+ int err;
+
+ steering->rdma_tx_root_ns = create_root_ns(steering, FS_FT_RDMA_TX);
+ if (!steering->rdma_tx_root_ns)
+ return -ENOMEM;
+
+ err = init_root_tree(steering, &rdma_tx_root_fs,
+ &steering->rdma_tx_root_ns->ns.node);
+ if (err)
+ goto out_err;
+
+ set_prio_attrs(steering->rdma_tx_root_ns);
+
+ return 0;
+
+out_err:
+ cleanup_root_ns(steering->rdma_tx_root_ns);
+ steering->rdma_tx_root_ns = NULL;
+ return err;
+}
+
/* FT and tc chains are stored in the same array so we can re-use the
* mlx5_get_fdb_sub_ns() and tc api for FT chains.
* When creating a new ns for each chain store it in the first available slot.
@@ -2890,6 +2937,12 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
goto err;
}
+ if (MLX5_CAP_FLOWTABLE_RDMA_TX(dev, ft_support)) {
+ err = init_rdma_tx_root_ns(steering);
+ if (err)
+ goto err;
+ }
+
if (MLX5_IPSEC_DEV(dev) || MLX5_CAP_FLOWTABLE_NIC_TX(dev, ft_support)) {
err = init_egress_root_ns(steering);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index be5f5e32c1e8..508108c58dae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -86,7 +86,8 @@ enum fs_flow_table_type {
FS_FT_SNIFFER_RX = 0X5,
FS_FT_SNIFFER_TX = 0X6,
FS_FT_RDMA_RX = 0X7,
- FS_FT_MAX_TYPE = FS_FT_RDMA_RX,
+ FS_FT_RDMA_TX = 0X8,
+ FS_FT_MAX_TYPE = FS_FT_RDMA_TX,
};
enum fs_flow_table_op_mod {
@@ -116,6 +117,7 @@ struct mlx5_flow_steering {
struct mlx5_flow_root_namespace *sniffer_tx_root_ns;
struct mlx5_flow_root_namespace *sniffer_rx_root_ns;
struct mlx5_flow_root_namespace *rdma_rx_root_ns;
+ struct mlx5_flow_root_namespace *rdma_tx_root_ns;
struct mlx5_flow_root_namespace *egress_root_ns;
};
@@ -316,7 +318,8 @@ void mlx5_cleanup_fs(struct mlx5_core_dev *dev);
(type == FS_FT_SNIFFER_RX) ? MLX5_CAP_FLOWTABLE_SNIFFER_RX(mdev, cap) : \
(type == FS_FT_SNIFFER_TX) ? MLX5_CAP_FLOWTABLE_SNIFFER_TX(mdev, cap) : \
(type == FS_FT_RDMA_RX) ? MLX5_CAP_FLOWTABLE_RDMA_RX(mdev, cap) : \
- (BUILD_BUG_ON_ZERO(FS_FT_RDMA_RX != FS_FT_MAX_TYPE))\
+ (type == FS_FT_RDMA_TX) ? MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) : \
+ (BUILD_BUG_ON_ZERO(FS_FT_RDMA_TX != FS_FT_MAX_TYPE))\
)
#endif
diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
index a3cdb0036c5d..f3a0f412b43b 100644
--- a/fs/autofs/dev-ioctl.c
+++ b/fs/autofs/dev-ioctl.c
@@ -186,7 +186,7 @@ static int find_autofs_mount(const char *pathname,
struct path path;
int err;
- err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0);
+ err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path);
if (err)
return err;
err = -ENOENT;
@@ -519,8 +519,8 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
if (!fp || param->ioctlfd == -1) {
if (autofs_type_any(type))
- err = kern_path_mountpoint(AT_FDCWD,
- name, &path, LOOKUP_FOLLOW);
+ err = kern_path(name, LOOKUP_FOLLOW | LOOKUP_MOUNTPOINT,
+ &path);
else
err = find_autofs_mount(name, &path,
test_by_type, &type);
diff --git a/fs/exec.c b/fs/exec.c
index 51dd4ea989fc..69877148f299 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1036,16 +1036,26 @@ ssize_t read_code(struct file *file, unsigned long addr, loff_t pos, size_t len)
}
EXPORT_SYMBOL(read_code);
+/*
+ * Maps the mm_struct mm into the current task struct.
+ * On success, this function returns with the mutex
+ * exec_update_mutex locked.
+ */
static int exec_mmap(struct mm_struct *mm)
{
struct task_struct *tsk;
struct mm_struct *old_mm, *active_mm;
+ int ret;
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
exec_mm_release(tsk, old_mm);
+ ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
+ if (ret)
+ return ret;
+
if (old_mm) {
sync_mm_rss(old_mm);
/*
@@ -1057,9 +1067,11 @@ static int exec_mmap(struct mm_struct *mm)
down_read(&old_mm->mmap_sem);
if (unlikely(old_mm->core_state)) {
up_read(&old_mm->mmap_sem);
+ mutex_unlock(&tsk->signal->exec_update_mutex);
return -EINTR;
}
}
+
task_lock(tsk);
active_mm = tsk->active_mm;
membarrier_exec_mmap(mm);
@@ -1215,10 +1227,22 @@ static int de_thread(struct task_struct *tsk)
/* we have changed execution domain */
tsk->exit_signal = SIGCHLD;
-#ifdef CONFIG_POSIX_TIMERS
- exit_itimers(sig);
- flush_itimer_signals();
-#endif
+ BUG_ON(!thread_group_leader(tsk));
+ return 0;
+
+killed:
+ /* protects against exit_notify() and __exit_signal() */
+ read_lock(&tasklist_lock);
+ sig->group_exit_task = NULL;
+ sig->notify_count = 0;
+ read_unlock(&tasklist_lock);
+ return -EAGAIN;
+}
+
+
+static int unshare_sighand(struct task_struct *me)
+{
+ struct sighand_struct *oldsighand = me->sighand;
if (refcount_read(&oldsighand->count) != 1) {
struct sighand_struct *newsighand;
@@ -1236,23 +1260,13 @@ static int de_thread(struct task_struct *tsk)
write_lock_irq(&tasklist_lock);
spin_lock(&oldsighand->siglock);
- rcu_assign_pointer(tsk->sighand, newsighand);
+ rcu_assign_pointer(me->sighand, newsighand);
spin_unlock(&oldsighand->siglock);
write_unlock_irq(&tasklist_lock);
__cleanup_sighand(oldsighand);
}
-
- BUG_ON(!thread_group_leader(tsk));
return 0;
-
-killed:
- /* protects against exit_notify() and __exit_signal() */
- read_lock(&tasklist_lock);
- sig->group_exit_task = NULL;
- sig->notify_count = 0;
- read_unlock(&tasklist_lock);
- return -EAGAIN;
}
char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
@@ -1286,13 +1300,13 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
*/
int flush_old_exec(struct linux_binprm * bprm)
{
+ struct task_struct *me = current;
int retval;
/*
- * Make sure we have a private signal table and that
- * we are unassociated from the previous thread group.
+ * Make this the only thread in the thread group.
*/
- retval = de_thread(current);
+ retval = de_thread(me);
if (retval)
goto out;
@@ -1312,18 +1326,31 @@ int flush_old_exec(struct linux_binprm * bprm)
goto out;
/*
- * After clearing bprm->mm (to mark that current is using the
- * prepared mm now), we have nothing left of the original
+ * After setting bprm->called_exec_mmap (to mark that current is
+ * using the prepared mm now), we have nothing left of the original
* process. If anything from here on returns an error, the check
* in search_binary_handler() will SEGV current.
*/
+ bprm->called_exec_mmap = 1;
bprm->mm = NULL;
+#ifdef CONFIG_POSIX_TIMERS
+ exit_itimers(me->signal);
+ flush_itimer_signals();
+#endif
+
+ /*
+ * Make the signal table private.
+ */
+ retval = unshare_sighand(me);
+ if (retval)
+ goto out;
+
set_fs(USER_DS);
- current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
+ me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
PF_NOFREEZE | PF_NO_SETAFFINITY);
flush_thread();
- current->personality &= ~bprm->per_clear;
+ me->personality &= ~bprm->per_clear;
/*
* We have to apply CLOEXEC before we change whether the process is
@@ -1331,7 +1358,7 @@ int flush_old_exec(struct linux_binprm * bprm)
* trying to access the should-be-closed file descriptors of a process
* undergoing exec(2).
*/
- do_close_on_exec(current->files);
+ do_close_on_exec(me->files);
return 0;
out:
@@ -1412,7 +1439,7 @@ void setup_new_exec(struct linux_binprm * bprm)
/* An exec changes our domain. We are no longer part of the thread
group */
- current->self_exec_id++;
+ WRITE_ONCE(current->self_exec_id, current->self_exec_id + 1);
flush_signal_handlers(current, 0);
}
EXPORT_SYMBOL(setup_new_exec);
@@ -1450,6 +1477,8 @@ static void free_bprm(struct linux_binprm *bprm)
{
free_arg_pages(bprm);
if (bprm->cred) {
+ if (bprm->called_exec_mmap)
+ mutex_unlock(¤t->signal->exec_update_mutex);
mutex_unlock(¤t->signal->cred_guard_mutex);
abort_creds(bprm->cred);
}
@@ -1499,6 +1528,7 @@ void install_exec_creds(struct linux_binprm *bprm)
* credentials; any time after this it may be unlocked.
*/
security_bprm_committed_creds(bprm);
+ mutex_unlock(¤t->signal->exec_update_mutex);
mutex_unlock(¤t->signal->cred_guard_mutex);
}
EXPORT_SYMBOL(install_exec_creds);
@@ -1690,7 +1720,7 @@ int search_binary_handler(struct linux_binprm *bprm)
read_lock(&binfmt_lock);
put_binfmt(fmt);
- if (retval < 0 && !bprm->mm) {
+ if (retval < 0 && bprm->called_exec_mmap) {
/* we got to flush_old_exec() and failed after it */
read_unlock(&binfmt_lock);
force_sigsegv(SIGSEGV);
diff --git a/fs/internal.h b/fs/internal.h
index 2d2f926010c1..af1026ef8373 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -60,7 +60,6 @@ extern int finish_clean_context(struct fs_context *fc);
*/
extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct path *path, struct path *root);
-extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
long do_mknodat(int dfd, const char __user *filename, umode_t mode,
unsigned int dev);
long do_mkdirat(int dfd, const char __user *pathname, umode_t mode);
diff --git a/fs/namei.c b/fs/namei.c
index 6ba60606626b..0b2418664c0d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -516,9 +516,10 @@ struct nameidata {
} *stack, internal[EMBEDDED_LEVELS];
struct filename *name;
struct nameidata *saved;
- struct inode *link_inode;
unsigned root_seq;
int dfd;
+ kuid_t dir_uid;
+ umode_t dir_mode;
} __randomize_layout;
static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
@@ -543,52 +544,34 @@ static void restore_nameidata(void)
kfree(now->stack);
}
-static int __nd_alloc_stack(struct nameidata *nd)
+static bool nd_alloc_stack(struct nameidata *nd)
{
struct saved *p;
- if (nd->flags & LOOKUP_RCU) {
- p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
- GFP_ATOMIC);
- if (unlikely(!p))
- return -ECHILD;
- } else {
- p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
- GFP_KERNEL);
- if (unlikely(!p))
- return -ENOMEM;
- }
+ p= kmalloc_array(MAXSYMLINKS, sizeof(struct saved),
+ nd->flags & LOOKUP_RCU ? GFP_ATOMIC : GFP_KERNEL);
+ if (unlikely(!p))
+ return false;
memcpy(p, nd->internal, sizeof(nd->internal));
nd->stack = p;
- return 0;
+ return true;
}
/**
- * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
- * @path: nameidate to verify
+ * path_connected - Verify that a dentry is below mnt.mnt_root
*
* Rename can sometimes move a file or directory outside of a bind
* mount, path_connected allows those cases to be detected.
*/
-static bool path_connected(const struct path *path)
+static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
{
- struct vfsmount *mnt = path->mnt;
struct super_block *sb = mnt->mnt_sb;
/* Bind mounts and multi-root filesystems can have disconnected paths */
if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
return true;
- return is_subdir(path->dentry, mnt->mnt_root);
-}
-
-static inline int nd_alloc_stack(struct nameidata *nd)
-{
- if (likely(nd->depth != EMBEDDED_LEVELS))
- return 0;
- if (likely(nd->stack != nd->internal))
- return 0;
- return __nd_alloc_stack(nd);
+ return is_subdir(dentry, mnt->mnt_root);
}
static void drop_links(struct nameidata *nd)
@@ -621,10 +604,9 @@ static void terminate_walk(struct nameidata *nd)
}
/* path_put is needed afterwards regardless of success or failure */
-static bool legitimize_path(struct nameidata *nd,
- struct path *path, unsigned seq)
+static bool __legitimize_path(struct path *path, unsigned seq, unsigned mseq)
{
- int res = __legitimize_mnt(path->mnt, nd->m_seq);
+ int res = __legitimize_mnt(path->mnt, mseq);
if (unlikely(res)) {
if (res > 0)
path->mnt = NULL;
@@ -638,6 +620,12 @@ static bool legitimize_path(struct nameidata *nd,
return !read_seqcount_retry(&path->dentry->d_seq, seq);
}
+static inline bool legitimize_path(struct nameidata *nd,
+ struct path *path, unsigned seq)
+{
+ return __legitimize_path(path, nd->m_seq, seq);
+}
+
static bool legitimize_links(struct nameidata *nd)
{
int i;
@@ -952,25 +940,6 @@ static int set_root(struct nameidata *nd)
return 0;
}
-static void path_put_conditional(struct path *path, struct nameidata *nd)
-{
- dput(path->dentry);
- if (path->mnt != nd->path.mnt)
- mntput(path->mnt);
-}
-
-static inline void path_to_nameidata(const struct path *path,
- struct nameidata *nd)
-{
- if (!(nd->flags & LOOKUP_RCU)) {
- dput(nd->path.dentry);
- if (nd->path.mnt != path->mnt)
- mntput(nd->path.mnt);
- }
- nd->path.mnt = path->mnt;
- nd->path.dentry = path->dentry;
-}
-
static int nd_jump_root(struct nameidata *nd)
{
if (unlikely(nd->flags & LOOKUP_BENEATH))
@@ -1063,28 +1032,21 @@ int sysctl_protected_regular __read_mostly;
*
* Returns 0 if following the symlink is allowed, -ve on error.
*/
-static inline int may_follow_link(struct nameidata *nd)
+static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
{
- const struct inode *inode;
- const struct inode *parent;
- kuid_t puid;
-
if (!sysctl_protected_symlinks)
return 0;
/* Allowed if owner and follower match. */
- inode = nd->link_inode;
if (uid_eq(current_cred()->fsuid, inode->i_uid))
return 0;
/* Allowed if parent directory not sticky and world-writable. */
- parent = nd->inode;
- if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
+ if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
return 0;
/* Allowed if parent directory and link owner match. */
- puid = parent->i_uid;
- if (uid_valid(puid) && uid_eq(puid, inode->i_uid))
+ if (uid_valid(nd->dir_uid) && uid_eq(nd->dir_uid, inode->i_uid))
return 0;
if (nd->flags & LOOKUP_RCU)
@@ -1207,63 +1169,6 @@ static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid,
return 0;
}
-static __always_inline
-const char *get_link(struct nameidata *nd)
-{
- struct saved *last = nd->stack + nd->depth - 1;
- struct dentry *dentry = last->link.dentry;
- struct inode *inode = nd->link_inode;
- int error;
- const char *res;
-
- if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
- return ERR_PTR(-ELOOP);
-
- if (!(nd->flags & LOOKUP_RCU)) {
- touch_atime(&last->link);
- cond_resched();
- } else if (atime_needs_update(&last->link, inode)) {
- if (unlikely(unlazy_walk(nd)))
- return ERR_PTR(-ECHILD);
- touch_atime(&last->link);
- }
-
- error = security_inode_follow_link(dentry, inode,
- nd->flags & LOOKUP_RCU);
- if (unlikely(error))
- return ERR_PTR(error);
-
- nd->last_type = LAST_BIND;
- res = READ_ONCE(inode->i_link);
- if (!res) {
- const char * (*get)(struct dentry *, struct inode *,
- struct delayed_call *);
- get = inode->i_op->get_link;
- if (nd->flags & LOOKUP_RCU) {
- res = get(NULL, inode, &last->done);
- if (res == ERR_PTR(-ECHILD)) {
- if (unlikely(unlazy_walk(nd)))
- return ERR_PTR(-ECHILD);
- res = get(dentry, inode, &last->done);
- }
- } else {
- res = get(dentry, inode, &last->done);
- }
- if (IS_ERR_OR_NULL(res))
- return res;
- }
- if (*res == '/') {
- error = nd_jump_root(nd);
- if (unlikely(error))
- return ERR_PTR(error);
- while (unlikely(*++res == '/'))
- ;
- }
- if (!*res)
- res = NULL;
- return res;
-}
-
/*
* follow_up - Find the mountpoint of path's vfsmount
*
@@ -1297,19 +1202,59 @@ int follow_up(struct path *path)
}
EXPORT_SYMBOL(follow_up);
+static bool choose_mountpoint_rcu(struct mount *m, const struct path *root,
+ struct path *path, unsigned *seqp)
+{
+ while (mnt_has_parent(m)) {
+ struct dentry *mountpoint = m->mnt_mountpoint;
+
+ m = m->mnt_parent;
+ if (unlikely(root->dentry == mountpoint &&
+ root->mnt == &m->mnt))
+ break;
+ if (mountpoint != m->mnt.mnt_root) {
+ path->mnt = &m->mnt;
+ path->dentry = mountpoint;
+ *seqp = read_seqcount_begin(&mountpoint->d_seq);
+ return true;
+ }
+ }
+ return false;
+}
+
+static bool choose_mountpoint(struct mount *m, const struct path *root,
+ struct path *path)
+{
+ bool found;
+
+ rcu_read_lock();
+ while (1) {
+ unsigned seq, mseq = read_seqbegin(&mount_lock);
+
+ found = choose_mountpoint_rcu(m, root, path, &seq);
+ if (unlikely(!found)) {
+ if (!read_seqretry(&mount_lock, mseq))
+ break;
+ } else {
+ if (likely(__legitimize_path(path, seq, mseq)))
+ break;
+ rcu_read_unlock();
+ path_put(path);
+ rcu_read_lock();
+ }
+ }
+ rcu_read_unlock();
+ return found;
+}
+
/*
* Perform an automount
* - return -EISDIR to tell follow_managed() to stop and return the path we
* were called with.
*/
-static int follow_automount(struct path *path, struct nameidata *nd,
- bool *need_mntput)
+static int follow_automount(struct path *path, int *count, unsigned lookup_flags)
{
- struct vfsmount *mnt;
- int err;
-
- if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
- return -EREMOTE;
+ struct dentry *dentry = path->dentry;
/* We don't want to mount if someone's just doing a stat -
* unless they're stat'ing a directory and appended a '/' to
@@ -1322,138 +1267,91 @@ static int follow_automount(struct path *path, struct nameidata *nd,
* as being automount points. These will need the attentions
* of the daemon to instantiate them before they can be used.
*/
- if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+ if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
- path->dentry->d_inode)
+ dentry->d_inode)
return -EISDIR;
- nd->total_link_count++;
- if (nd->total_link_count >= 40)
+ if (count && (*count)++ >= MAXSYMLINKS)
return -ELOOP;
- mnt = path->dentry->d_op->d_automount(path);
- if (IS_ERR(mnt)) {
- /*
- * The filesystem is allowed to return -EISDIR here to indicate
- * it doesn't want to automount. For instance, autofs would do
- * this so that its userspace daemon can mount on this dentry.
- *
- * However, we can only permit this if it's a terminal point in
- * the path being looked up; if it wasn't then the remainder of
- * the path is inaccessible and we should say so.
- */
- if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
- return -EREMOTE;
- return PTR_ERR(mnt);
- }
-
- if (!mnt) /* mount collision */
- return 0;
-
- if (!*need_mntput) {
- /* lock_mount() may release path->mnt on error */
- mntget(path->mnt);
- *need_mntput = true;
- }
- err = finish_automount(mnt, path);
-
- switch (err) {
- case -EBUSY:
- /* Someone else made a mount here whilst we were busy */
- return 0;
- case 0:
- path_put(path);
- path->mnt = mnt;
- path->dentry = dget(mnt->mnt_root);
- return 0;
- default:
- return err;
- }
-
+ return finish_automount(dentry->d_op->d_automount(path), path);
}
/*
- * Handle a dentry that is managed in some way.
- * - Flagged for transit management (autofs)
- * - Flagged as mountpoint
- * - Flagged as automount point
- *
- * This may only be called in refwalk mode.
- * On success path->dentry is known positive.
- *
- * Serialization is taken care of in namespace.c
+ * mount traversal - out-of-line part. One note on ->d_flags accesses -
+ * dentries are pinned but not locked here, so negative dentry can go
+ * positive right under us. Use of smp_load_acquire() provides a barrier
+ * sufficient for ->d_inode and ->d_flags consistency.
*/
-static int follow_managed(struct path *path, struct nameidata *nd)
+static int __traverse_mounts(struct path *path, unsigned flags, bool *jumped,
+ int *count, unsigned lookup_flags)
{
- struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
- unsigned flags;
+ struct vfsmount *mnt = path->mnt;
bool need_mntput = false;
int ret = 0;
- /* Given that we're not holding a lock here, we retain the value in a
- * local variable for each dentry as we look at it so that we don't see
- * the components of that value change under us */
- while (flags = smp_load_acquire(&path->dentry->d_flags),
- unlikely(flags & DCACHE_MANAGED_DENTRY)) {
+ while (flags & DCACHE_MANAGED_DENTRY) {
/* Allow the filesystem to manage the transit without i_mutex
* being held. */
if (flags & DCACHE_MANAGE_TRANSIT) {
- BUG_ON(!path->dentry->d_op);
- BUG_ON(!path->dentry->d_op->d_manage);
ret = path->dentry->d_op->d_manage(path, false);
flags = smp_load_acquire(&path->dentry->d_flags);
if (ret < 0)
break;
}
- /* Transit to a mounted filesystem. */
- if (flags & DCACHE_MOUNTED) {
+ if (flags & DCACHE_MOUNTED) { // something's mounted on it..
struct vfsmount *mounted = lookup_mnt(path);
- if (mounted) {
+ if (mounted) { // ... in our namespace
dput(path->dentry);
if (need_mntput)
mntput(path->mnt);
path->mnt = mounted;
path->dentry = dget(mounted->mnt_root);
+ // here we know it's positive
+ flags = path->dentry->d_flags;
need_mntput = true;
continue;
}
-
- /* Something is mounted on this dentry in another
- * namespace and/or whatever was mounted there in this
- * namespace got unmounted before lookup_mnt() could
- * get it */
}
- /* Handle an automount point */
- if (flags & DCACHE_NEED_AUTOMOUNT) {
- ret = follow_automount(path, nd, &need_mntput);
- if (ret < 0)
- break;
- continue;
- }
+ if (!(flags & DCACHE_NEED_AUTOMOUNT))
+ break;
- /* We didn't change the current path point */
- break;
+ // uncovered automount point
+ ret = follow_automount(path, count, lookup_flags);
+ flags = smp_load_acquire(&path->dentry->d_flags);
+ if (ret < 0)
+ break;
}
- if (need_mntput) {
- if (path->mnt == mnt)
- mntput(path->mnt);
- if (unlikely(nd->flags & LOOKUP_NO_XDEV))
- ret = -EXDEV;
- else
- nd->flags |= LOOKUP_JUMPED;
- }
- if (ret == -EISDIR || !ret)
- ret = 1;
- if (ret > 0 && unlikely(d_flags_negative(flags)))
+ if (ret == -EISDIR)
+ ret = 0;
+ // possible if you race with several mount --move
+ if (need_mntput && path->mnt == mnt)
+ mntput(path->mnt);
+ if (!ret && unlikely(d_flags_negative(flags)))
ret = -ENOENT;
- if (unlikely(ret < 0))
- path_put_conditional(path, nd);
+ *jumped = need_mntput;
return ret;
}
+static inline int traverse_mounts(struct path *path, bool *jumped,
+ int *count, unsigned lookup_flags)
+{
+ unsigned flags = smp_load_acquire(&path->dentry->d_flags);
+
+ /* fastpath */
+ if (likely(!(flags & DCACHE_MANAGED_DENTRY))) {
+ *jumped = false;
+ if (unlikely(d_flags_negative(flags)))
+ return -ENOENT;
+ return 0;
+ }
+ return __traverse_mounts(path, flags, jumped, count, lookup_flags);
+}
+
int follow_down_one(struct path *path)
{
struct vfsmount *mounted;
@@ -1470,11 +1368,22 @@ int follow_down_one(struct path *path)
}
EXPORT_SYMBOL(follow_down_one);
-static inline int managed_dentry_rcu(const struct path *path)
+/*
+ * Follow down to the covering mount currently visible to userspace. At each
+ * point, the filesystem owning that dentry may be queried as to whether the
+ * caller is permitted to proceed or not.
+ */
+int follow_down(struct path *path)
{
- return (path->dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
- path->dentry->d_op->d_manage(path, true) : 0;
+ struct vfsmount *mnt = path->mnt;
+ bool jumped;
+ int ret = traverse_mounts(path, &jumped, NULL, 0);
+
+ if (path->mnt != mnt)
+ mntput(mnt);
+ return ret;
}
+EXPORT_SYMBOL(follow_down);
/*
* Try to skip to top of mountpoint pile in rcuwalk mode. Fail if
@@ -1483,204 +1392,88 @@ static inline int managed_dentry_rcu(const struct path *path)
static bool __follow_mount_rcu(struct nameidata *nd, struct path *path,
struct inode **inode, unsigned *seqp)
{
+ struct dentry *dentry = path->dentry;
+ unsigned int flags = dentry->d_flags;
+
+ if (likely(!(flags & DCACHE_MANAGED_DENTRY)))
+ return true;
+
+ if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+ return false;
+
for (;;) {
- struct mount *mounted;
/*
* Don't forget we might have a non-mountpoint managed dentry
* that wants to block transit.
*/
- switch (managed_dentry_rcu(path)) {
- case -ECHILD:
- default:
- return false;
- case -EISDIR:
- return true;
- case 0:
- break;
+ if (unlikely(flags & DCACHE_MANAGE_TRANSIT)) {
+ int res = dentry->d_op->d_manage(path, true);
+ if (res)
+ return res == -EISDIR;
+ flags = dentry->d_flags;
}
- if (!d_mountpoint(path->dentry))
- return !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
-
- mounted = __lookup_mnt(path->mnt, path->dentry);
- if (!mounted)
- break;
- if (unlikely(nd->flags & LOOKUP_NO_XDEV))
- return false;
- path->mnt = &mounted->mnt;
- path->dentry = mounted->mnt.mnt_root;
- nd->flags |= LOOKUP_JUMPED;
- *seqp = read_seqcount_begin(&path->dentry->d_seq);
- /*
- * Update the inode too. We don't need to re-check the
- * dentry sequence number here after this d_inode read,
- * because a mount-point is always pinned.
- */
- *inode = path->dentry->d_inode;
+ if (flags & DCACHE_MOUNTED) {
+ struct mount *mounted = __lookup_mnt(path->mnt, dentry);
+ if (mounted) {
+ path->mnt = &mounted->mnt;
+ dentry = path->dentry = mounted->mnt.mnt_root;
+ nd->flags |= LOOKUP_JUMPED;
+ *seqp = read_seqcount_begin(&dentry->d_seq);
+ *inode = dentry->d_inode;
+ /*
+ * We don't need to re-check ->d_seq after this
+ * ->d_inode read - there will be an RCU delay
+ * between mount hash removal and ->mnt_root
+ * becoming unpinned.
+ */
+ flags = dentry->d_flags;
+ continue;
+ }
+ if (read_seqretry(&mount_lock, nd->m_seq))
+ return false;
+ }
+ return !(flags & DCACHE_NEED_AUTOMOUNT);
}
- return !read_seqretry(&mount_lock, nd->m_seq) &&
- !(path->dentry->d_flags & DCACHE_NEED_AUTOMOUNT);
}
-static int follow_dotdot_rcu(struct nameidata *nd)
+static inline int handle_mounts(struct nameidata *nd, struct dentry *dentry,
+ struct path *path, struct inode **inode,
+ unsigned int *seqp)
{
- struct inode *inode = nd->inode;
-
- while (1) {
- if (path_equal(&nd->path, &nd->root)) {
- if (unlikely(nd->flags & LOOKUP_BENEATH))
- return -ECHILD;
- break;
- }
- if (nd->path.dentry != nd->path.mnt->mnt_root) {
- struct dentry *old = nd->path.dentry;
- struct dentry *parent = old->d_parent;
- unsigned seq;
-
- inode = parent->d_inode;
- seq = read_seqcount_begin(&parent->d_seq);
- if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
- return -ECHILD;
- nd->path.dentry = parent;
- nd->seq = seq;
- if (unlikely(!path_connected(&nd->path)))
- return -ECHILD;
- break;
- } else {
- struct mount *mnt = real_mount(nd->path.mnt);
- struct mount *mparent = mnt->mnt_parent;
- struct dentry *mountpoint = mnt->mnt_mountpoint;
- struct inode *inode2 = mountpoint->d_inode;
- unsigned seq = read_seqcount_begin(&mountpoint->d_seq);
- if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
- return -ECHILD;
- if (&mparent->mnt == nd->path.mnt)
- break;
- if (unlikely(nd->flags & LOOKUP_NO_XDEV))
- return -ECHILD;
- /* we know that mountpoint was pinned */
- nd->path.dentry = mountpoint;
- nd->path.mnt = &mparent->mnt;
- inode = inode2;
- nd->seq = seq;
- }
- }
- while (unlikely(d_mountpoint(nd->path.dentry))) {
- struct mount *mounted;
- mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
- if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
- return -ECHILD;
- if (!mounted)
- break;
- if (unlikely(nd->flags & LOOKUP_NO_XDEV))
- return -ECHILD;
- nd->path.mnt = &mounted->mnt;
- nd->path.dentry = mounted->mnt.mnt_root;
- inode = nd->path.dentry->d_inode;
- nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
- }
- nd->inode = inode;
- return 0;
-}
-
-/*
- * Follow down to the covering mount currently visible to userspace. At each
- * point, the filesystem owning that dentry may be queried as to whether the
- * caller is permitted to proceed or not.
- */
-int follow_down(struct path *path)
-{
- unsigned managed;
+ bool jumped;
int ret;
- while (managed = READ_ONCE(path->dentry->d_flags),
- unlikely(managed & DCACHE_MANAGED_DENTRY)) {
- /* Allow the filesystem to manage the transit without i_mutex
- * being held.
- *
- * We indicate to the filesystem if someone is trying to mount
- * something here. This gives autofs the chance to deny anyone
- * other than its daemon the right to mount on its
- * superstructure.
- *
- * The filesystem may sleep at this point.
- */
- if (managed & DCACHE_MANAGE_TRANSIT) {
- BUG_ON(!path->dentry->d_op);
- BUG_ON(!path->dentry->d_op->d_manage);
- ret = path->dentry->d_op->d_manage(path, false);
- if (ret < 0)
- return ret == -EISDIR ? 0 : ret;
- }
-
- /* Transit to a mounted filesystem. */
- if (managed & DCACHE_MOUNTED) {
- struct vfsmount *mounted = lookup_mnt(path);
- if (!mounted)
- break;
- dput(path->dentry);
- mntput(path->mnt);
- path->mnt = mounted;
- path->dentry = dget(mounted->mnt_root);
- continue;
- }
-
- /* Don't handle automount points here */
- break;
+ path->mnt = nd->path.mnt;
+ path->dentry = dentry;
+ if (nd->flags & LOOKUP_RCU) {
+ unsigned int seq = *seqp;
+ if (unlikely(!*inode))
+ return -ENOENT;
+ if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+ return 0;
+ if (unlazy_child(nd, dentry, seq))
+ return -ECHILD;
+ // *path might've been clobbered by __follow_mount_rcu()
+ path->mnt = nd->path.mnt;
+ path->dentry = dentry;
}
- return 0;
-}
-EXPORT_SYMBOL(follow_down);
-
-/*
- * Skip to top of mountpoint pile in refwalk mode for follow_dotdot()
- */
-static void follow_mount(struct path *path)
-{
- while (d_mountpoint(path->dentry)) {
- struct vfsmount *mounted = lookup_mnt(path);
- if (!mounted)
- break;
- dput(path->dentry);
- mntput(path->mnt);
- path->mnt = mounted;
- path->dentry = dget(mounted->mnt_root);
- }
-}
-
-static int path_parent_directory(struct path *path)
-{
- struct dentry *old = path->dentry;
- /* rare case of legitimate dget_parent()... */
- path->dentry = dget_parent(path->dentry);
- dput(old);
- if (unlikely(!path_connected(path)))
- return -ENOENT;
- return 0;
-}
-
-static int follow_dotdot(struct nameidata *nd)
-{
- while (1) {
- if (path_equal(&nd->path, &nd->root)) {
- if (unlikely(nd->flags & LOOKUP_BENEATH))
- return -EXDEV;
- break;
- }
- if (nd->path.dentry != nd->path.mnt->mnt_root) {
- int ret = path_parent_directory(&nd->path);
- if (ret)
- return ret;
- break;
- }
- if (!follow_up(&nd->path))
- break;
+ ret = traverse_mounts(path, &jumped, &nd->total_link_count, nd->flags);
+ if (jumped) {
if (unlikely(nd->flags & LOOKUP_NO_XDEV))
- return -EXDEV;
+ ret = -EXDEV;
+ else
+ nd->flags |= LOOKUP_JUMPED;
}
- follow_mount(&nd->path);
- nd->inode = nd->path.dentry->d_inode;
- return 0;
+ if (unlikely(ret)) {
+ dput(path->dentry);
+ if (path->mnt != nd->path.mnt)
+ mntput(path->mnt);
+ } else {
+ *inode = d_backing_inode(path->dentry);
+ *seqp = 0; /* out of RCU mode, so the value doesn't matter */
+ }
+ return ret;
}
/*
@@ -1737,14 +1530,12 @@ static struct dentry *__lookup_hash(const struct qstr *name,
return dentry;
}
-static int lookup_fast(struct nameidata *nd,
- struct path *path, struct inode **inode,
- unsigned *seqp)
+static struct dentry *lookup_fast(struct nameidata *nd,
+ struct inode **inode,
+ unsigned *seqp)
{
- struct vfsmount *mnt = nd->path.mnt;
struct dentry *dentry, *parent = nd->path.dentry;
int status = 1;
- int err;
/*
* Rename seqlock is not required here because in the off chance
@@ -1753,12 +1544,11 @@ static int lookup_fast(struct nameidata *nd,
*/
if (nd->flags & LOOKUP_RCU) {
unsigned seq;
- bool negative;
dentry = __d_lookup_rcu(parent, &nd->last, &seq);
if (unlikely(!dentry)) {
if (unlazy_walk(nd))
- return -ECHILD;
- return 0;
+ return ERR_PTR(-ECHILD);
+ return NULL;
}
/*
@@ -1766,9 +1556,8 @@ static int lookup_fast(struct nameidata *nd,
* the dentry name information from lookup.
*/
*inode = d_backing_inode(dentry);
- negative = d_is_negative(dentry);
if (unlikely(read_seqcount_retry(&dentry->d_seq, seq)))
- return -ECHILD;
+ return ERR_PTR(-ECHILD);
/*
* This sequence count validates that the parent had no
@@ -1778,46 +1567,30 @@ static int lookup_fast(struct nameidata *nd,
* enough, we can use __read_seqcount_retry here.
*/
if (unlikely(__read_seqcount_retry(&parent->d_seq, nd->seq)))
- return -ECHILD;
+ return ERR_PTR(-ECHILD);
*seqp = seq;
status = d_revalidate(dentry, nd->flags);
- if (likely(status > 0)) {
- /*
- * Note: do negative dentry check after revalidation in
- * case that drops it.
- */
- if (unlikely(negative))
- return -ENOENT;
- path->mnt = mnt;
- path->dentry = dentry;
- if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
- return 1;
- }
+ if (likely(status > 0))
+ return dentry;
if (unlazy_child(nd, dentry, seq))
- return -ECHILD;
+ return ERR_PTR(-ECHILD);
if (unlikely(status == -ECHILD))
/* we'd been told to redo it in non-rcu mode */
status = d_revalidate(dentry, nd->flags);
} else {
dentry = __d_lookup(parent, &nd->last);
if (unlikely(!dentry))
- return 0;
+ return NULL;
status = d_revalidate(dentry, nd->flags);
}
if (unlikely(status <= 0)) {
if (!status)
d_invalidate(dentry);
dput(dentry);
- return status;
+ return ERR_PTR(status);
}
-
- path->mnt = mnt;
- path->dentry = dentry;
- err = follow_managed(path, nd);
- if (likely(err > 0))
- *inode = d_backing_inode(path->dentry);
- return err;
+ return dentry;
}
/* Fast lookup failed, do it the slow way */
@@ -1882,21 +1655,250 @@ static inline int may_lookup(struct nameidata *nd)
return inode_permission2(nd->path.mnt, nd->inode, MAY_EXEC);
}
-static inline int handle_dots(struct nameidata *nd, int type)
+static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
+{
+ if (unlikely(nd->total_link_count++ >= MAXSYMLINKS))
+ return -ELOOP;
+
+ if (likely(nd->depth != EMBEDDED_LEVELS))
+ return 0;
+ if (likely(nd->stack != nd->internal))
+ return 0;
+ if (likely(nd_alloc_stack(nd)))
+ return 0;
+
+ if (nd->flags & LOOKUP_RCU) {
+ // we need to grab link before we do unlazy. And we can't skip
+ // unlazy even if we fail to grab the link - cleanup needs it
+ bool grabbed_link = legitimize_path(nd, link, seq);
+
+ if (unlazy_walk(nd) != 0 || !grabbed_link)
+ return -ECHILD;
+
+ if (nd_alloc_stack(nd))
+ return 0;
+ }
+ return -ENOMEM;
+}
+
+enum {WALK_TRAILING = 1, WALK_MORE = 2, WALK_NOFOLLOW = 4};
+
+static const char *pick_link(struct nameidata *nd, struct path *link,
+ struct inode *inode, unsigned seq, int flags)
+{
+ struct saved *last;
+ const char *res;
+ int error = reserve_stack(nd, link, seq);
+
+ if (unlikely(error)) {
+ if (!(nd->flags & LOOKUP_RCU))
+ path_put(link);
+ return ERR_PTR(error);
+ }
+ last = nd->stack + nd->depth++;
+ last->link = *link;
+ clear_delayed_call(&last->done);
+ last->seq = seq;
+
+ if (flags & WALK_TRAILING) {
+ error = may_follow_link(nd, inode);
+ if (unlikely(error))
+ return ERR_PTR(error);
+ }
+
+ if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
+ return ERR_PTR(-ELOOP);
+
+ if (!(nd->flags & LOOKUP_RCU)) {
+ touch_atime(&last->link);
+ cond_resched();
+ } else if (atime_needs_update(&last->link, inode)) {
+ if (unlikely(unlazy_walk(nd)))
+ return ERR_PTR(-ECHILD);
+ touch_atime(&last->link);
+ }
+
+ error = security_inode_follow_link(link->dentry, inode,
+ nd->flags & LOOKUP_RCU);
+ if (unlikely(error))
+ return ERR_PTR(error);
+
+ res = READ_ONCE(inode->i_link);
+ if (!res) {
+ const char * (*get)(struct dentry *, struct inode *,
+ struct delayed_call *);
+ get = inode->i_op->get_link;
+ if (nd->flags & LOOKUP_RCU) {
+ res = get(NULL, inode, &last->done);
+ if (res == ERR_PTR(-ECHILD)) {
+ if (unlikely(unlazy_walk(nd)))
+ return ERR_PTR(-ECHILD);
+ res = get(link->dentry, inode, &last->done);
+ }
+ } else {
+ res = get(link->dentry, inode, &last->done);
+ }
+ if (!res)
+ goto all_done;
+ if (IS_ERR(res))
+ return res;
+ }
+ if (*res == '/') {
+ error = nd_jump_root(nd);
+ if (unlikely(error))
+ return ERR_PTR(error);
+ while (unlikely(*++res == '/'))
+ ;
+ }
+ if (*res)
+ return res;
+all_done: // pure jump
+ put_link(nd);
+ return NULL;
+}
+
+/*
+ * Do we need to follow links? We _really_ want to be able
+ * to do this check without having to look at inode->i_op,
+ * so we keep a cache of "no, this doesn't need follow_link"
+ * for the common case.
+ */
+static const char *step_into(struct nameidata *nd, int flags,
+ struct dentry *dentry, struct inode *inode, unsigned seq)
+{
+ struct path path;
+ int err = handle_mounts(nd, dentry, &path, &inode, &seq);
+
+ if (err < 0)
+ return ERR_PTR(err);
+ if (likely(!d_is_symlink(path.dentry)) ||
+ ((flags & WALK_TRAILING) && !(nd->flags & LOOKUP_FOLLOW)) ||
+ (flags & WALK_NOFOLLOW)) {
+ /* not a symlink or should not follow */
+ if (!(nd->flags & LOOKUP_RCU)) {
+ dput(nd->path.dentry);
+ if (nd->path.mnt != path.mnt)
+ mntput(nd->path.mnt);
+ }
+ nd->path = path;
+ nd->inode = inode;
+ nd->seq = seq;
+ return NULL;
+ }
+ if (nd->flags & LOOKUP_RCU) {
+ /* make sure that d_is_symlink above matches inode */
+ if (read_seqcount_retry(&path.dentry->d_seq, seq))
+ return ERR_PTR(-ECHILD);
+ } else {
+ if (path.mnt == nd->path.mnt)
+ mntget(path.mnt);
+ }
+ return pick_link(nd, &path, inode, seq, flags);
+}
+
+static struct dentry *follow_dotdot_rcu(struct nameidata *nd,
+ struct inode **inodep,
+ unsigned *seqp)
+{
+ struct dentry *parent, *old;
+
+ if (path_equal(&nd->path, &nd->root))
+ goto in_root;
+ if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+ struct path path;
+ unsigned seq;
+ if (!choose_mountpoint_rcu(real_mount(nd->path.mnt),
+ &nd->root, &path, &seq))
+ goto in_root;
+ if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+ return ERR_PTR(-ECHILD);
+ nd->path = path;
+ nd->inode = path.dentry->d_inode;
+ nd->seq = seq;
+ if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+ return ERR_PTR(-ECHILD);
+ /* we know that mountpoint was pinned */
+ }
+ old = nd->path.dentry;
+ parent = old->d_parent;
+ *inodep = parent->d_inode;
+ *seqp = read_seqcount_begin(&parent->d_seq);
+ if (unlikely(read_seqcount_retry(&old->d_seq, nd->seq)))
+ return ERR_PTR(-ECHILD);
+ if (unlikely(!path_connected(nd->path.mnt, parent)))
+ return ERR_PTR(-ECHILD);
+ return parent;
+in_root:
+ if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
+ return ERR_PTR(-ECHILD);
+ if (unlikely(nd->flags & LOOKUP_BENEATH))
+ return ERR_PTR(-ECHILD);
+ return NULL;
+}
+
+static struct dentry *follow_dotdot(struct nameidata *nd,
+ struct inode **inodep,
+ unsigned *seqp)
+{
+ struct dentry *parent;
+
+ if (path_equal(&nd->path, &nd->root))
+ goto in_root;
+ if (unlikely(nd->path.dentry == nd->path.mnt->mnt_root)) {
+ struct path path;
+
+ if (!choose_mountpoint(real_mount(nd->path.mnt),
+ &nd->root, &path))
+ goto in_root;
+ path_put(&nd->path);
+ nd->path = path;
+ nd->inode = path.dentry->d_inode;
+ if (unlikely(nd->flags & LOOKUP_NO_XDEV))
+ return ERR_PTR(-EXDEV);
+ }
+ /* rare case of legitimate dget_parent()... */
+ parent = dget_parent(nd->path.dentry);
+ if (unlikely(!path_connected(nd->path.mnt, parent))) {
+ dput(parent);
+ return ERR_PTR(-ENOENT);
+ }
+ *seqp = 0;
+ *inodep = parent->d_inode;
+ return parent;
+
+in_root:
+ if (unlikely(nd->flags & LOOKUP_BENEATH))
+ return ERR_PTR(-EXDEV);
+ dget(nd->path.dentry);
+ return NULL;
+}
+
+static const char *handle_dots(struct nameidata *nd, int type)
{
if (type == LAST_DOTDOT) {
- int error = 0;
+ const char *error = NULL;
+ struct dentry *parent;
+ struct inode *inode;
+ unsigned seq;
if (!nd->root.mnt) {
- error = set_root(nd);
+ error = ERR_PTR(set_root(nd));
if (error)
return error;
}
if (nd->flags & LOOKUP_RCU)
- error = follow_dotdot_rcu(nd);
+ parent = follow_dotdot_rcu(nd, &inode, &seq);
else
- error = follow_dotdot(nd);
- if (error)
+ parent = follow_dotdot(nd, &inode, &seq);
+ if (IS_ERR(parent))
+ return ERR_CAST(parent);
+ if (unlikely(!parent))
+ error = step_into(nd, WALK_NOFOLLOW,
+ nd->path.dentry, nd->inode, nd->seq);
+ else
+ error = step_into(nd, WALK_NOFOLLOW,
+ parent, inode, seq);
+ if (unlikely(error))
return error;
if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
@@ -1908,119 +1910,40 @@ static inline int handle_dots(struct nameidata *nd, int type)
*/
smp_rmb();
if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
- return -EAGAIN;
+ return ERR_PTR(-EAGAIN);
if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
- return -EAGAIN;
+ return ERR_PTR(-EAGAIN);
}
}
- return 0;
+ return NULL;
}
-static int pick_link(struct nameidata *nd, struct path *link,
- struct inode *inode, unsigned seq)
+static const char *walk_component(struct nameidata *nd, int flags)
{
- int error;
- struct saved *last;
- if (unlikely(nd->total_link_count++ >= MAXSYMLINKS)) {
- path_to_nameidata(link, nd);
- return -ELOOP;
- }
- if (!(nd->flags & LOOKUP_RCU)) {
- if (link->mnt == nd->path.mnt)
- mntget(link->mnt);
- }
- error = nd_alloc_stack(nd);
- if (unlikely(error)) {
- if (error == -ECHILD) {
- if (unlikely(!legitimize_path(nd, link, seq))) {
- drop_links(nd);
- nd->depth = 0;
- nd->flags &= ~LOOKUP_RCU;
- nd->path.mnt = NULL;
- nd->path.dentry = NULL;
- rcu_read_unlock();
- } else if (likely(unlazy_walk(nd)) == 0)
- error = nd_alloc_stack(nd);
- }
- if (error) {
- path_put(link);
- return error;
- }
- }
-
- last = nd->stack + nd->depth++;
- last->link = *link;
- clear_delayed_call(&last->done);
- nd->link_inode = inode;
- last->seq = seq;
- return 1;
-}
-
-enum {WALK_FOLLOW = 1, WALK_MORE = 2};
-
-/*
- * Do we need to follow links? We _really_ want to be able
- * to do this check without having to look at inode->i_op,
- * so we keep a cache of "no, this doesn't need follow_link"
- * for the common case.
- */
-static inline int step_into(struct nameidata *nd, struct path *path,
- int flags, struct inode *inode, unsigned seq)
-{
- if (!(flags & WALK_MORE) && nd->depth)
- put_link(nd);
- if (likely(!d_is_symlink(path->dentry)) ||
- !(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
- /* not a symlink or should not follow */
- path_to_nameidata(path, nd);
- nd->inode = inode;
- nd->seq = seq;
- return 0;
- }
- /* make sure that d_is_symlink above matches inode */
- if (nd->flags & LOOKUP_RCU) {
- if (read_seqcount_retry(&path->dentry->d_seq, seq))
- return -ECHILD;
- }
- return pick_link(nd, path, inode, seq);
-}
-
-static int walk_component(struct nameidata *nd, int flags)
-{
- struct path path;
+ struct dentry *dentry;
struct inode *inode;
unsigned seq;
- int err;
/*
* "." and ".." are special - ".." especially so because it has
* to be able to know about the current root directory and
* parent relationships.
*/
if (unlikely(nd->last_type != LAST_NORM)) {
- err = handle_dots(nd, nd->last_type);
if (!(flags & WALK_MORE) && nd->depth)
put_link(nd);
- return err;
+ return handle_dots(nd, nd->last_type);
}
- err = lookup_fast(nd, &path, &inode, &seq);
- if (unlikely(err <= 0)) {
- if (err < 0)
- return err;
- path.dentry = lookup_slow(&nd->last, nd->path.dentry,
- nd->flags);
- if (IS_ERR(path.dentry))
- return PTR_ERR(path.dentry);
-
- path.mnt = nd->path.mnt;
- err = follow_managed(&path, nd);
- if (unlikely(err < 0))
- return err;
-
- seq = 0; /* we are already out of RCU mode */
- inode = d_backing_inode(path.dentry);
+ dentry = lookup_fast(nd, &inode, &seq);
+ if (IS_ERR(dentry))
+ return ERR_CAST(dentry);
+ if (unlikely(!dentry)) {
+ dentry = lookup_slow(&nd->last, nd->path.dentry, nd->flags);
+ if (IS_ERR(dentry))
+ return ERR_CAST(dentry);
}
-
- return step_into(nd, &path, flags, inode, seq);
+ if (!(flags & WALK_MORE) && nd->depth)
+ put_link(nd);
+ return step_into(nd, flags, dentry, inode, seq);
}
/*
@@ -2261,8 +2184,11 @@ static inline u64 hash_name(const void *salt, const char *name)
*/
static int link_path_walk(const char *name, struct nameidata *nd)
{
+ int depth = 0; // depth <= nd->depth
int err;
+ nd->last_type = LAST_ROOT;
+ nd->flags |= LOOKUP_PARENT;
if (IS_ERR(name))
return PTR_ERR(name);
while (*name=='/')
@@ -2272,6 +2198,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
/* At this point we know we have a real path component. */
for(;;) {
+ const char *link;
u64 hash_len;
int type;
@@ -2321,36 +2248,27 @@ static int link_path_walk(const char *name, struct nameidata *nd)
} while (unlikely(*name == '/'));
if (unlikely(!*name)) {
OK:
- /* pathname body, done */
- if (!nd->depth)
- return 0;
- name = nd->stack[nd->depth - 1].name;
- /* trailing symlink, done */
- if (!name)
+ /* pathname or trailing symlink, done */
+ if (!depth) {
+ nd->dir_uid = nd->inode->i_uid;
+ nd->dir_mode = nd->inode->i_mode;
+ nd->flags &= ~LOOKUP_PARENT;
return 0;
+ }
/* last component of nested symlink */
- err = walk_component(nd, WALK_FOLLOW);
+ name = nd->stack[--depth].name;
+ link = walk_component(nd, 0);
} else {
/* not the last component */
- err = walk_component(nd, WALK_FOLLOW | WALK_MORE);
+ link = walk_component(nd, WALK_MORE);
}
- if (err < 0)
- return err;
-
- if (err) {
- const char *s = get_link(nd);
-
- if (IS_ERR(s))
- return PTR_ERR(s);
- err = 0;
- if (unlikely(!s)) {
- /* jumped */
- put_link(nd);
- } else {
- nd->stack[nd->depth - 1].name = name;
- name = s;
- continue;
- }
+ if (unlikely(link)) {
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ /* a symlink to follow */
+ nd->stack[depth++].name = name;
+ name = link;
+ continue;
}
if (unlikely(!d_can_lookup(nd->path.dentry))) {
if (nd->flags & LOOKUP_RCU) {
@@ -2373,8 +2291,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
if (flags & LOOKUP_RCU)
rcu_read_lock();
- nd->last_type = LAST_ROOT; /* if there are only slashes... */
- nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
+ nd->flags = flags | LOOKUP_JUMPED;
nd->depth = 0;
nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount);
@@ -2464,54 +2381,20 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
return s;
}
-static const char *trailing_symlink(struct nameidata *nd)
-{
- const char *s;
- int error = may_follow_link(nd);
- if (unlikely(error))
- return ERR_PTR(error);
- nd->flags |= LOOKUP_PARENT;
- nd->stack[0].name = NULL;
- s = get_link(nd);
- return s ? s : "";
-}
-
-static inline int lookup_last(struct nameidata *nd)
+static inline const char *lookup_last(struct nameidata *nd)
{
if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
- nd->flags &= ~LOOKUP_PARENT;
- return walk_component(nd, 0);
+ return walk_component(nd, WALK_TRAILING);
}
static int handle_lookup_down(struct nameidata *nd)
{
- struct path path = nd->path;
- struct inode *inode = nd->inode;
- unsigned seq = nd->seq;
- int err;
-
- if (nd->flags & LOOKUP_RCU) {
- /*
- * don't bother with unlazy_walk on failure - we are
- * at the very beginning of walk, so we lose nothing
- * if we simply redo everything in non-RCU mode
- */
- if (unlikely(!__follow_mount_rcu(nd, &path, &inode, &seq)))
- return -ECHILD;
- } else {
- dget(path.dentry);
- err = follow_managed(&path, nd);
- if (unlikely(err < 0))
- return err;
- inode = d_backing_inode(path.dentry);
- seq = 0;
- }
- path_to_nameidata(&path, nd);
- nd->inode = inode;
- nd->seq = seq;
- return 0;
+ if (!(nd->flags & LOOKUP_RCU))
+ dget(nd->path.dentry);
+ return PTR_ERR(step_into(nd, WALK_NOFOLLOW,
+ nd->path.dentry, nd->inode, nd->seq));
}
/* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
@@ -2526,16 +2409,19 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
s = ERR_PTR(err);
}
- while (!(err = link_path_walk(s, nd))
- && ((err = lookup_last(nd)) > 0)) {
- s = trailing_symlink(nd);
- }
+ while (!(err = link_path_walk(s, nd)) &&
+ (s = lookup_last(nd)) != NULL)
+ ;
if (!err)
err = complete_walk(nd);
if (!err && nd->flags & LOOKUP_DIRECTORY)
if (!d_can_lookup(nd->path.dentry))
err = -ENOTDIR;
+ if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
+ err = handle_lookup_down(nd);
+ nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
+ }
if (!err) {
*path = nd->path;
nd->path.mnt = NULL;
@@ -2564,7 +2450,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);
if (likely(!retval))
- audit_inode(name, path->dentry, 0);
+ audit_inode(name, path->dentry,
+ flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
restore_nameidata();
putname(name);
return retval;
@@ -2818,24 +2705,23 @@ int path_pts(struct path *path)
/* Find something mounted on "pts" in the same directory as
* the input path.
*/
- struct dentry *child, *parent;
- struct qstr this;
- int ret;
+ struct dentry *parent = dget_parent(path->dentry);
+ struct dentry *child;
+ struct qstr this = QSTR_INIT("pts", 3);
- ret = path_parent_directory(path);
- if (ret)
- return ret;
-
- parent = path->dentry;
- this.name = "pts";
- this.len = 3;
+ if (unlikely(!path_connected(path->mnt, parent))) {
+ dput(parent);
+ return -ENOENT;
+ }
+ dput(path->dentry);
+ path->dentry = parent;
child = d_hash_and_lookup(parent, &this);
if (!child)
return -ENOENT;
path->dentry = child;
dput(parent);
- follow_mount(path);
+ follow_down(path);
return 0;
}
#endif
@@ -2848,88 +2734,6 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
}
EXPORT_SYMBOL(user_path_at_empty);
-/**
- * path_mountpoint - look up a path to be umounted
- * @nd: lookup context
- * @flags: lookup flags
- * @path: pointer to container for result
- *
- * Look up the given name, but don't attempt to revalidate the last component.
- * Returns 0 and "path" will be valid on success; Returns error otherwise.
- */
-static int
-path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
-{
- const char *s = path_init(nd, flags);
- int err;
-
- while (!(err = link_path_walk(s, nd)) &&
- (err = lookup_last(nd)) > 0) {
- s = trailing_symlink(nd);
- }
- if (!err && (nd->flags & LOOKUP_RCU))
- err = unlazy_walk(nd);
- if (!err)
- err = handle_lookup_down(nd);
- if (!err) {
- *path = nd->path;
- nd->path.mnt = NULL;
- nd->path.dentry = NULL;
- }
- terminate_walk(nd);
- return err;
-}
-
-static int
-filename_mountpoint(int dfd, struct filename *name, struct path *path,
- unsigned int flags)
-{
- struct nameidata nd;
- int error;
- if (IS_ERR(name))
- return PTR_ERR(name);
- set_nameidata(&nd, dfd, name);
- error = path_mountpoint(&nd, flags | LOOKUP_RCU, path);
- if (unlikely(error == -ECHILD))
- error = path_mountpoint(&nd, flags, path);
- if (unlikely(error == -ESTALE))
- error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path);
- if (likely(!error))
- audit_inode(name, path->dentry, AUDIT_INODE_NOEVAL);
- restore_nameidata();
- putname(name);
- return error;
-}
-
-/**
- * user_path_mountpoint_at - lookup a path from userland in order to umount it
- * @dfd: directory file descriptor
- * @name: pathname from userland
- * @flags: lookup flags
- * @path: pointer to container to hold result
- *
- * A umount is a special case for path walking. We're not actually interested
- * in the inode in this situation, and ESTALE errors can be a problem. We
- * simply want track down the dentry and vfsmount attached at the mountpoint
- * and avoid revalidating the last component.
- *
- * Returns 0 and populates "path" on success.
- */
-int
-user_path_mountpoint_at(int dfd, const char __user *name, unsigned int flags,
- struct path *path)
-{
- return filename_mountpoint(dfd, getname(name), path, flags);
-}
-
-int
-kern_path_mountpoint(int dfd, const char *name, struct path *path,
- unsigned int flags)
-{
- return filename_mountpoint(dfd, getname_kernel(name), path, flags);
-}
-EXPORT_SYMBOL(kern_path_mountpoint);
-
int __check_sticky(struct inode *dir, struct inode *inode)
{
kuid_t fsuid = current_fsuid();
@@ -3244,18 +3048,14 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
*
* Returns an error code otherwise.
*/
-static int atomic_open(struct nameidata *nd, struct dentry *dentry,
- struct path *path, struct file *file,
- const struct open_flags *op,
- int open_flag, umode_t mode)
+static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
+ struct file *file,
+ int open_flag, umode_t mode)
{
struct dentry *const DENTRY_NOT_SET = (void *) -1UL;
struct inode *dir = nd->path.dentry->d_inode;
int error;
- if (!(~open_flag & (O_EXCL | O_CREAT))) /* both O_EXCL and O_CREAT */
- open_flag &= ~O_TRUNC;
-
if (nd->flags & LOOKUP_DIRECTORY)
open_flag |= O_DIRECTORY;
@@ -3266,19 +3066,10 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
d_lookup_done(dentry);
if (!error) {
if (file->f_mode & FMODE_OPENED) {
- /*
- * We didn't have the inode before the open, so check open
- * permission here.
- */
- int acc_mode = op->acc_mode;
- if (file->f_mode & FMODE_CREATED) {
- WARN_ON(!(open_flag & O_CREAT));
- fsnotify_create(dir, dentry);
- acc_mode = 0;
+ if (unlikely(dentry != file->f_path.dentry)) {
+ dput(dentry);
+ dentry = dget(file->f_path.dentry);
}
- error = may_open(&file->f_path, acc_mode, open_flag);
- if (WARN_ON(error > 0))
- error = -EINVAL;
} else if (WARN_ON(file->f_path.dentry == DENTRY_NOT_SET)) {
error = -EIO;
} else {
@@ -3286,19 +3077,15 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
dput(dentry);
dentry = file->f_path.dentry;
}
- if (file->f_mode & FMODE_CREATED)
- fsnotify_create(dir, dentry);
- if (unlikely(d_is_negative(dentry))) {
+ if (unlikely(d_is_negative(dentry)))
error = -ENOENT;
- } else {
- path->dentry = dentry;
- path->mnt = nd->path.mnt;
- return 0;
- }
}
}
- dput(dentry);
- return error;
+ if (error) {
+ dput(dentry);
+ dentry = ERR_PTR(error);
+ }
+ return dentry;
}
/*
@@ -3316,10 +3103,9 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
*
* An error code is returned on failure.
*/
-static int lookup_open(struct nameidata *nd, struct path *path,
- struct file *file,
- const struct open_flags *op,
- bool got_write)
+static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
+ const struct open_flags *op,
+ bool got_write)
{
struct dentry *dir = nd->path.dentry;
struct inode *dir_inode = dir->d_inode;
@@ -3330,7 +3116,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
if (unlikely(IS_DEADDIR(dir_inode)))
- return -ENOENT;
+ return ERR_PTR(-ENOENT);
file->f_mode &= ~FMODE_CREATED;
dentry = d_lookup(dir, &nd->last);
@@ -3338,7 +3124,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
if (!dentry) {
dentry = d_alloc_parallel(dir, &nd->last, &wq);
if (IS_ERR(dentry))
- return PTR_ERR(dentry);
+ return dentry;
}
if (d_in_lookup(dentry))
break;
@@ -3354,7 +3140,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
}
if (dentry->d_inode) {
/* Cached positive dentry: will open in f_op->open */
- goto out_no_open;
+ return dentry;
}
/*
@@ -3366,41 +3152,27 @@ static int lookup_open(struct nameidata *nd, struct path *path,
* Another problem is returing the "right" error value (e.g. for an
* O_EXCL open we want to return EEXIST not EROFS).
*/
+ if (unlikely(!got_write))
+ open_flag &= ~O_TRUNC;
if (open_flag & O_CREAT) {
+ if (open_flag & O_EXCL)
+ open_flag &= ~O_TRUNC;
if (!IS_POSIXACL(dir->d_inode))
mode &= ~current_umask();
- if (unlikely(!got_write)) {
- create_error = -EROFS;
- open_flag &= ~O_CREAT;
- if (open_flag & (O_EXCL | O_TRUNC))
- goto no_open;
- /* No side effects, safe to clear O_CREAT */
- } else {
+ if (likely(got_write))
create_error = may_o_create(&nd->path, dentry, mode);
- if (create_error) {
- open_flag &= ~O_CREAT;
- if (open_flag & O_EXCL)
- goto no_open;
- }
- }
- } else if ((open_flag & (O_TRUNC|O_WRONLY|O_RDWR)) &&
- unlikely(!got_write)) {
- /*
- * No O_CREATE -> atomicity not a requirement -> fall
- * back to lookup + open
- */
- goto no_open;
+ else
+ create_error = -EROFS;
}
-
+ if (create_error)
+ open_flag &= ~O_CREAT;
if (dir_inode->i_op->atomic_open) {
- error = atomic_open(nd, dentry, path, file, op, open_flag,
- mode);
- if (unlikely(error == -ENOENT) && create_error)
- error = create_error;
- return error;
+ dentry = atomic_open(nd, dentry, file, open_flag, mode);
+ if (unlikely(create_error) && dentry == ERR_PTR(-ENOENT))
+ dentry = ERR_PTR(create_error);
+ return dentry;
}
-no_open:
if (d_in_lookup(dentry)) {
struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
nd->flags);
@@ -3427,78 +3199,60 @@ static int lookup_open(struct nameidata *nd, struct path *path,
open_flag & O_EXCL);
if (error)
goto out_dput;
- fsnotify_create(dir_inode, dentry);
}
if (unlikely(create_error) && !dentry->d_inode) {
error = create_error;
goto out_dput;
}
-out_no_open:
- path->dentry = dentry;
- path->mnt = nd->path.mnt;
- return 0;
+ return dentry;
out_dput:
dput(dentry);
- return error;
+ return ERR_PTR(error);
}
-/*
- * Handle the last step of open()
- */
-static int do_last(struct nameidata *nd,
+static const char *open_last_lookups(struct nameidata *nd,
struct file *file, const struct open_flags *op)
{
struct dentry *dir = nd->path.dentry;
- kuid_t dir_uid = nd->inode->i_uid;
- umode_t dir_mode = nd->inode->i_mode;
int open_flag = op->open_flag;
- bool will_truncate = (open_flag & O_TRUNC) != 0;
bool got_write = false;
- int acc_mode = op->acc_mode;
unsigned seq;
struct inode *inode;
- struct path path;
+ struct dentry *dentry;
+ const char *res;
int error;
- nd->flags &= ~LOOKUP_PARENT;
nd->flags |= op->intent;
if (nd->last_type != LAST_NORM) {
- error = handle_dots(nd, nd->last_type);
- if (unlikely(error))
- return error;
- goto finish_open;
+ if (nd->depth)
+ put_link(nd);
+ return handle_dots(nd, nd->last_type);
}
if (!(open_flag & O_CREAT)) {
if (nd->last.name[nd->last.len])
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
/* we _can_ be in RCU mode here */
- error = lookup_fast(nd, &path, &inode, &seq);
- if (likely(error > 0))
+ dentry = lookup_fast(nd, &inode, &seq);
+ if (IS_ERR(dentry))
+ return ERR_CAST(dentry);
+ if (likely(dentry))
goto finish_lookup;
- if (error < 0)
- return error;
-
- BUG_ON(nd->inode != dir->d_inode);
BUG_ON(nd->flags & LOOKUP_RCU);
} else {
/* create side of things */
- /*
- * This will *only* deal with leaving RCU mode - LOOKUP_JUMPED
- * has been cleared when we got to the last component we are
- * about to look up
- */
- error = complete_walk(nd);
- if (error)
- return error;
-
+ if (nd->flags & LOOKUP_RCU) {
+ error = unlazy_walk(nd);
+ if (unlikely(error))
+ return ERR_PTR(error);
+ }
audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
/* trailing slashes? */
if (unlikely(nd->last.name[nd->last.len]))
- return -EISDIR;
+ return ERR_PTR(-EISDIR);
}
if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
@@ -3515,108 +3269,90 @@ static int do_last(struct nameidata *nd,
inode_lock(dir->d_inode);
else
inode_lock_shared(dir->d_inode);
- error = lookup_open(nd, &path, file, op, got_write);
+ dentry = lookup_open(nd, file, op, got_write);
+ if (!IS_ERR(dentry) && (file->f_mode & FMODE_CREATED))
+ fsnotify_create(dir->d_inode, dentry);
if (open_flag & O_CREAT)
inode_unlock(dir->d_inode);
else
inode_unlock_shared(dir->d_inode);
- if (error)
- goto out;
+ if (got_write)
+ mnt_drop_write(nd->path.mnt);
- if (file->f_mode & FMODE_OPENED) {
- if ((file->f_mode & FMODE_CREATED) ||
- !S_ISREG(file_inode(file)->i_mode))
- will_truncate = false;
+ if (IS_ERR(dentry))
+ return ERR_CAST(dentry);
- audit_inode(nd->name, file->f_path.dentry, 0);
- goto opened;
+ if (file->f_mode & (FMODE_OPENED | FMODE_CREATED)) {
+ dput(nd->path.dentry);
+ nd->path.dentry = dentry;
+ return NULL;
}
+finish_lookup:
+ if (nd->depth)
+ put_link(nd);
+ res = step_into(nd, WALK_TRAILING, dentry, inode, seq);
+ if (unlikely(res))
+ nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
+ return res;
+}
+
+/*
+ * Handle the last step of open()
+ */
+static int do_open(struct nameidata *nd,
+ struct file *file, const struct open_flags *op)
+{
+ int open_flag = op->open_flag;
+ bool do_truncate;
+ int acc_mode;
+ int error;
+
+ if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
+ error = complete_walk(nd);
+ if (error)
+ return error;
+ }
+ if (!(file->f_mode & FMODE_CREATED))
+ audit_inode(nd->name, nd->path.dentry, 0);
+ if (open_flag & O_CREAT) {
+ if ((open_flag & O_EXCL) && !(file->f_mode & FMODE_CREATED))
+ return -EEXIST;
+ if (d_is_dir(nd->path.dentry))
+ return -EISDIR;
+ error = may_create_in_sticky(nd->dir_mode, nd->dir_uid,
+ d_backing_inode(nd->path.dentry));
+ if (unlikely(error))
+ return error;
+ }
+ if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
+ return -ENOTDIR;
+
+ do_truncate = false;
+ acc_mode = op->acc_mode;
if (file->f_mode & FMODE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
- will_truncate = false;
acc_mode = 0;
- path_to_nameidata(&path, nd);
- goto finish_open_created;
- }
-
- /*
- * If atomic_open() acquired write access it is dropped now due to
- * possible mount and symlink following (this might be optimized away if
- * necessary...)
- */
- if (got_write) {
- mnt_drop_write(nd->path.mnt);
- got_write = false;
- }
-
- error = follow_managed(&path, nd);
- if (unlikely(error < 0))
- return error;
-
- /*
- * create/update audit record if it already exists.
- */
- audit_inode(nd->name, path.dentry, 0);
-
- if (unlikely((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))) {
- path_to_nameidata(&path, nd);
- return -EEXIST;
- }
-
- seq = 0; /* out of RCU mode, so the value doesn't matter */
- inode = d_backing_inode(path.dentry);
-finish_lookup:
- error = step_into(nd, &path, 0, inode, seq);
- if (unlikely(error))
- return error;
-finish_open:
- /* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */
- error = complete_walk(nd);
- if (error)
- return error;
- audit_inode(nd->name, nd->path.dentry, 0);
- if (open_flag & O_CREAT) {
- error = -EISDIR;
- if (d_is_dir(nd->path.dentry))
- goto out;
- error = may_create_in_sticky(dir_mode, dir_uid,
- d_backing_inode(nd->path.dentry));
- if (unlikely(error))
- goto out;
- }
- error = -ENOTDIR;
- if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
- goto out;
- if (!d_is_reg(nd->path.dentry))
- will_truncate = false;
-
- if (will_truncate) {
+ } else if (d_is_reg(nd->path.dentry) && open_flag & O_TRUNC) {
error = mnt_want_write(nd->path.mnt);
if (error)
- goto out;
- got_write = true;
+ return error;
+ do_truncate = true;
}
-finish_open_created:
error = may_open(&nd->path, acc_mode, open_flag);
- if (error)
- goto out;
- BUG_ON(file->f_mode & FMODE_OPENED); /* once it's opened, it's opened */
- error = vfs_open(&nd->path, file);
- if (error)
- goto out;
-opened:
- error = ima_file_check(file, op->acc_mode);
- if (!error && will_truncate)
+ if (!error && !(file->f_mode & FMODE_OPENED))
+ error = vfs_open(&nd->path, file);
+ if (!error)
+ error = ima_file_check(file, op->acc_mode);
+ if (!error && do_truncate)
error = handle_truncate(file);
-out:
if (unlikely(error > 0)) {
WARN_ON(1);
error = -EINVAL;
}
- if (got_write)
+ if (do_truncate)
mnt_drop_write(nd->path.mnt);
return error;
}
@@ -3722,10 +3458,10 @@ static struct file *path_openat(struct nameidata *nd,
} else {
const char *s = path_init(nd, flags);
while (!(error = link_path_walk(s, nd)) &&
- (error = do_last(nd, file, op)) > 0) {
- nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
- s = trailing_symlink(nd);
- }
+ (s = open_last_lookups(nd, file, op)) != NULL)
+ ;
+ if (!error)
+ error = do_open(nd, file, op);
terminate_walk(nd);
}
if (likely(!error)) {
diff --git a/fs/namespace.c b/fs/namespace.c
index cabcc0ee4f47..b547a0670e51 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1691,7 +1691,7 @@ int ksys_umount(char __user *name, int flags)
struct path path;
struct mount *mnt;
int retval;
- int lookup_flags = 0;
+ int lookup_flags = LOOKUP_MOUNTPOINT;
if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
return -EINVAL;
@@ -1702,7 +1702,7 @@ int ksys_umount(char __user *name, int flags)
if (!(flags & UMOUNT_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
- retval = user_path_mountpoint_at(AT_FDCWD, name, lookup_flags, &path);
+ retval = user_path_at(AT_FDCWD, name, lookup_flags, &path);
if (retval)
goto out;
mnt = real_mount(path.mnt);
@@ -2727,45 +2727,32 @@ static int do_move_mount_old(struct path *path, const char *old_name)
/*
* add a mount into a namespace's mount tree
*/
-static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
+ struct path *path, int mnt_flags)
{
- struct mountpoint *mp;
- struct mount *parent;
- int err;
+ struct mount *parent = real_mount(path->mnt);
mnt_flags &= ~MNT_INTERNAL_FLAGS;
- mp = lock_mount(path);
- if (IS_ERR(mp))
- return PTR_ERR(mp);
-
- parent = real_mount(path->mnt);
- err = -EINVAL;
if (unlikely(!check_mnt(parent))) {
/* that's acceptable only for automounts done in private ns */
if (!(mnt_flags & MNT_SHRINKABLE))
- goto unlock;
+ return -EINVAL;
/* ... and for those we'd better have mountpoint still alive */
if (!parent->mnt_ns)
- goto unlock;
+ return -EINVAL;
}
/* Refuse the same filesystem on the same mount point */
- err = -EBUSY;
if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
path->mnt->mnt_root == path->dentry)
- goto unlock;
+ return -EBUSY;
- err = -EINVAL;
if (d_is_symlink(newmnt->mnt.mnt_root))
- goto unlock;
+ return -EINVAL;
newmnt->mnt.mnt_flags = mnt_flags;
- err = graft_tree(newmnt, parent, mp);
-
-unlock:
- unlock_mount(mp);
- return err;
+ return graft_tree(newmnt, parent, mp);
}
static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -2778,6 +2765,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
unsigned int mnt_flags)
{
struct vfsmount *mnt;
+ struct mountpoint *mp;
struct super_block *sb = fc->root->d_sb;
int error;
@@ -2798,7 +2786,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
mnt_warn_timestamp_expiry(mountpoint, mnt);
- error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
+ mp = lock_mount(mountpoint);
+ if (IS_ERR(mp)) {
+ mntput(mnt);
+ return PTR_ERR(mp);
+ }
+ error = do_add_mount(real_mount(mnt), mp, mountpoint, mnt_flags);
+ unlock_mount(mp);
if (error < 0)
mntput(mnt);
return error;
@@ -2859,23 +2853,63 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
int finish_automount(struct vfsmount *m, struct path *path)
{
- struct mount *mnt = real_mount(m);
+ struct dentry *dentry = path->dentry;
+ struct mountpoint *mp;
+ struct mount *mnt;
int err;
+
+ if (!m)
+ return 0;
+ if (IS_ERR(m))
+ return PTR_ERR(m);
+
+ mnt = real_mount(m);
/* The new mount record should have at least 2 refs to prevent it being
* expired before we get a chance to add it
*/
BUG_ON(mnt_get_count(mnt) < 2);
if (m->mnt_sb == path->mnt->mnt_sb &&
- m->mnt_root == path->dentry) {
+ m->mnt_root == dentry) {
err = -ELOOP;
- goto fail;
+ goto discard;
}
- err = do_add_mount(mnt, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
- if (!err)
- return 0;
-fail:
+ /*
+ * we don't want to use lock_mount() - in this case finding something
+ * that overmounts our mountpoint to be means "quitely drop what we've
+ * got", not "try to mount it on top".
+ */
+ inode_lock(dentry->d_inode);
+ namespace_lock();
+ if (unlikely(cant_mount(dentry))) {
+ err = -ENOENT;
+ goto discard_locked;
+ }
+ rcu_read_lock();
+ if (unlikely(__lookup_mnt(path->mnt, dentry))) {
+ rcu_read_unlock();
+ err = 0;
+ goto discard_locked;
+ }
+ rcu_read_unlock();
+ mp = get_mountpoint(dentry);
+ if (IS_ERR(mp)) {
+ err = PTR_ERR(mp);
+ goto discard_locked;
+ }
+
+ err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
+ unlock_mount(mp);
+ if (unlikely(err))
+ goto discard;
+ mntput(m);
+ return 0;
+
+discard_locked:
+ namespace_unlock();
+ inode_unlock(dentry->d_inode);
+discard:
/* remove m from any expiration list it may be on */
if (!list_empty(&mnt->mnt_expire)) {
namespace_lock();
diff --git a/fs/open.c b/fs/open.c
index 8488d64267c4..fbe912eb1768 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1058,8 +1058,10 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
if (flags & O_CREAT) {
op->intent |= LOOKUP_CREATE;
- if (flags & O_EXCL)
+ if (flags & O_EXCL) {
op->intent |= LOOKUP_EXCL;
+ flags |= O_NOFOLLOW;
+ }
}
if (flags & O_DIRECTORY)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 832f2d373f50..257924c31074 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -406,11 +406,11 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
static int lock_trace(struct task_struct *task)
{
- int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ int err = mutex_lock_killable(&task->signal->exec_update_mutex);
if (err)
return err;
if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_FSCREDS)) {
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
return -EPERM;
}
return 0;
@@ -418,7 +418,7 @@ static int lock_trace(struct task_struct *task)
static void unlock_trace(struct task_struct *task)
{
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
}
#ifdef CONFIG_STACKTRACE
@@ -1835,11 +1835,25 @@ void task_dump_owner(struct task_struct *task, umode_t mode,
*rgid = gid;
}
+void proc_pid_evict_inode(struct proc_inode *ei)
+{
+ struct pid *pid = ei->pid;
+
+ if (S_ISDIR(ei->vfs_inode.i_mode)) {
+ spin_lock(&pid->wait_pidfd.lock);
+ hlist_del_init_rcu(&ei->sibling_inodes);
+ spin_unlock(&pid->wait_pidfd.lock);
+ }
+
+ put_pid(pid);
+}
+
struct inode *proc_pid_make_inode(struct super_block * sb,
struct task_struct *task, umode_t mode)
{
struct inode * inode;
struct proc_inode *ei;
+ struct pid *pid;
/* We need a new inode */
@@ -1857,10 +1871,18 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
/*
* grab the reference to task.
*/
- ei->pid = get_task_pid(task, PIDTYPE_PID);
- if (!ei->pid)
+ pid = get_task_pid(task, PIDTYPE_PID);
+ if (!pid)
goto out_unlock;
+ /* Let the pid remember us for quick removal */
+ ei->pid = pid;
+ if (S_ISDIR(mode)) {
+ spin_lock(&pid->wait_pidfd.lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->wait_pidfd.lock);
+ }
+
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -2862,7 +2884,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
unsigned long flags;
int result;
- result = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ result = mutex_lock_killable(&task->signal->exec_update_mutex);
if (result)
return result;
@@ -2898,7 +2920,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
result = 0;
out_unlock:
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
return result;
}
@@ -3234,90 +3256,29 @@ static const struct inode_operations proc_tgid_base_inode_operations = {
.permission = proc_pid_permission,
};
-static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
-{
- struct dentry *dentry, *leader, *dir;
- char buf[10 + 1];
- struct qstr name;
-
- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", pid);
- /* no ->d_hash() rejects on procfs */
- dentry = d_hash_and_lookup(mnt->mnt_root, &name);
- if (dentry) {
- d_invalidate(dentry);
- dput(dentry);
- }
-
- if (pid == tgid)
- return;
-
- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", tgid);
- leader = d_hash_and_lookup(mnt->mnt_root, &name);
- if (!leader)
- goto out;
-
- name.name = "task";
- name.len = strlen(name.name);
- dir = d_hash_and_lookup(leader, &name);
- if (!dir)
- goto out_put_leader;
-
- name.name = buf;
- name.len = snprintf(buf, sizeof(buf), "%u", pid);
- dentry = d_hash_and_lookup(dir, &name);
- if (dentry) {
- d_invalidate(dentry);
- dput(dentry);
- }
-
- dput(dir);
-out_put_leader:
- dput(leader);
-out:
- return;
-}
-
/**
- * proc_flush_task - Remove dcache entries for @task from the /proc dcache.
- * @task: task that should be flushed.
+ * proc_flush_pid - Remove dcache entries for @pid from the /proc dcache.
+ * @pid: pid that should be flushed.
*
- * When flushing dentries from proc, one needs to flush them from global
- * proc (proc_mnt) and from all the namespaces' procs this task was seen
- * in. This call is supposed to do all of this job.
- *
- * Looks in the dcache for
- * /proc/@pid
- * /proc/@tgid/task/@pid
- * if either directory is present flushes it and all of it'ts children
- * from the dcache.
+ * This function walks a list of inodes (that belong to any proc
+ * filesystem) that are attached to the pid and flushes them from
+ * the dentry cache.
*
* It is safe and reasonable to cache /proc entries for a task until
* that task exits. After that they just clog up the dcache with
* useless entries, possibly causing useful dcache entries to be
- * flushed instead. This routine is proved to flush those useless
- * dcache entries at process exit time.
+ * flushed instead. This routine is provided to flush those useless
+ * dcache entries when a process is reaped.
*
* NOTE: This routine is just an optimization so it does not guarantee
- * that no dcache entries will exist at process exit time it
- * just makes it very unlikely that any will persist.
+ * that no dcache entries will exist after a process is reaped
+ * it just makes it very unlikely that any will persist.
*/
-void proc_flush_task(struct task_struct *task)
+void proc_flush_pid(struct pid *pid)
{
- int i;
- struct pid *pid, *tgid;
- struct upid *upid;
-
- pid = task_pid(task);
- tgid = task_tgid(task);
-
- for (i = 0; i <= pid->level; i++) {
- upid = &pid->numbers[i];
- proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
- tgid->numbers[i].nr);
- }
+ proc_invalidate_siblings_dcache(&pid->inodes, &pid->wait_pidfd.lock);
+ put_pid(pid);
}
static struct dentry *proc_pid_instantiate(struct dentry * dentry,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 6da18316d209..1e730ea1dcd6 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -33,21 +33,27 @@ static void proc_evict_inode(struct inode *inode)
{
struct proc_dir_entry *de;
struct ctl_table_header *head;
+ struct proc_inode *ei = PROC_I(inode);
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
/* Stop tracking associated processes */
- put_pid(PROC_I(inode)->pid);
+ if (ei->pid) {
+ proc_pid_evict_inode(ei);
+ ei->pid = NULL;
+ }
/* Let go of any associated proc directory entry */
- de = PDE(inode);
- if (de)
+ de = ei->pde;
+ if (de) {
pde_put(de);
+ ei->pde = NULL;
+ }
- head = PROC_I(inode)->sysctl;
+ head = ei->sysctl;
if (head) {
- RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
+ RCU_INIT_POINTER(ei->sysctl, NULL);
proc_sys_evict_inode(inode, head);
}
}
@@ -68,6 +74,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
ei->pde = NULL;
ei->sysctl = NULL;
ei->sysctl_entry = NULL;
+ INIT_HLIST_NODE(&ei->sibling_inodes);
ei->ns_ops = NULL;
return &ei->vfs_inode;
}
@@ -102,6 +109,62 @@ void __init proc_init_kmemcache(void)
BUILD_BUG_ON(sizeof(struct proc_dir_entry) >= SIZEOF_PDE);
}
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock)
+{
+ struct inode *inode;
+ struct proc_inode *ei;
+ struct hlist_node *node;
+ struct super_block *old_sb = NULL;
+
+ rcu_read_lock();
+ for (;;) {
+ struct super_block *sb;
+ node = hlist_first_rcu(inodes);
+ if (!node)
+ break;
+ ei = hlist_entry(node, struct proc_inode, sibling_inodes);
+ spin_lock(lock);
+ hlist_del_init_rcu(&ei->sibling_inodes);
+ spin_unlock(lock);
+
+ inode = &ei->vfs_inode;
+ sb = inode->i_sb;
+ if ((sb != old_sb) && !atomic_inc_not_zero(&sb->s_active))
+ continue;
+ inode = igrab(inode);
+ rcu_read_unlock();
+ if (sb != old_sb) {
+ if (old_sb)
+ deactivate_super(old_sb);
+ old_sb = sb;
+ }
+ if (unlikely(!inode)) {
+ rcu_read_lock();
+ continue;
+ }
+
+ if (S_ISDIR(inode->i_mode)) {
+ struct dentry *dir = d_find_any_alias(inode);
+ if (dir) {
+ d_invalidate(dir);
+ dput(dir);
+ }
+ } else {
+ struct dentry *dentry;
+ while ((dentry = d_find_alias(inode))) {
+ d_invalidate(dentry);
+ dput(dentry);
+ }
+ }
+ iput(inode);
+
+ rcu_read_lock();
+ }
+ rcu_read_unlock();
+ if (old_sb)
+ deactivate_super(old_sb);
+}
+
static int proc_show_options(struct seq_file *seq, struct dentry *root)
{
struct super_block *sb = root->d_sb;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 41587276798e..9e294f0290e5 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -91,7 +91,7 @@ struct proc_inode {
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
- struct hlist_node sysctl_inodes;
+ struct hlist_node sibling_inodes;
const struct proc_ns_operations *ns_ops;
struct inode vfs_inode;
} __randomize_layout;
@@ -158,6 +158,7 @@ extern int proc_pid_statm(struct seq_file *, struct pid_namespace *,
extern const struct dentry_operations pid_dentry_operations;
extern int pid_getattr(const struct path *, struct kstat *, u32, unsigned int);
extern int proc_setattr(struct dentry *, struct iattr *);
+extern void proc_pid_evict_inode(struct proc_inode *);
extern struct inode *proc_pid_make_inode(struct super_block *, struct task_struct *, umode_t);
extern void pid_update_inode(struct task_struct *, struct inode *);
extern int pid_delete_dentry(const struct dentry *);
@@ -210,6 +211,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
extern const struct super_operations proc_sops;
void proc_init_kmemcache(void);
+void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock);
void set_proc_pid_nlink(void);
extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
extern void proc_entry_rundown(struct proc_dir_entry *);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c75bb4632ed1..b6f5d459b087 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -267,42 +267,9 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
}
-static void proc_sys_prune_dcache(struct ctl_table_header *head)
+static void proc_sys_invalidate_dcache(struct ctl_table_header *head)
{
- struct inode *inode;
- struct proc_inode *ei;
- struct hlist_node *node;
- struct super_block *sb;
-
- rcu_read_lock();
- for (;;) {
- node = hlist_first_rcu(&head->inodes);
- if (!node)
- break;
- ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
- spin_lock(&sysctl_lock);
- hlist_del_init_rcu(&ei->sysctl_inodes);
- spin_unlock(&sysctl_lock);
-
- inode = &ei->vfs_inode;
- sb = inode->i_sb;
- if (!atomic_inc_not_zero(&sb->s_active))
- continue;
- inode = igrab(inode);
- rcu_read_unlock();
- if (unlikely(!inode)) {
- deactivate_super(sb);
- rcu_read_lock();
- continue;
- }
-
- d_prune_aliases(inode);
- iput(inode);
- deactivate_super(sb);
-
- rcu_read_lock();
- }
- rcu_read_unlock();
+ proc_invalidate_siblings_dcache(&head->inodes, &sysctl_lock);
}
/* called under sysctl_lock, will reacquire if has to wait */
@@ -324,10 +291,10 @@ static void start_unregistering(struct ctl_table_header *p)
spin_unlock(&sysctl_lock);
}
/*
- * Prune dentries for unregistered sysctls: namespaced sysctls
+ * Invalidate dentries for unregistered sysctls: namespaced sysctls
* can have duplicate names and contaminate dcache very badly.
*/
- proc_sys_prune_dcache(p);
+ proc_sys_invalidate_dcache(p);
/*
* do not remove from the list until nobody holds it; walking the
* list in do_sysctl() relies on that.
@@ -483,7 +450,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
}
ei->sysctl = head;
ei->sysctl_entry = table;
- hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
+ hlist_add_head_rcu(&ei->sibling_inodes, &head->inodes);
head->count++;
spin_unlock(&sysctl_lock);
@@ -514,7 +481,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
+ hlist_del_init_rcu(&PROC_I(inode)->sibling_inodes);
if (!--head->count)
kfree_rcu(head, rcu);
spin_unlock(&sysctl_lock);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 73c9d8b2f0c8..894801320ee0 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -292,39 +292,3 @@ struct proc_dir_entry proc_root = {
.subdir = RB_ROOT,
.name = "/proc",
};
-
-int pid_ns_prepare_proc(struct pid_namespace *ns)
-{
- struct proc_fs_context *ctx;
- struct fs_context *fc;
- struct vfsmount *mnt;
-
- fc = fs_context_for_mount(&proc_fs_type, SB_KERNMOUNT);
- if (IS_ERR(fc))
- return PTR_ERR(fc);
-
- if (fc->user_ns != ns->user_ns) {
- put_user_ns(fc->user_ns);
- fc->user_ns = get_user_ns(ns->user_ns);
- }
-
- ctx = fc->fs_private;
- if (ctx->pid_ns != ns) {
- put_pid_ns(ctx->pid_ns);
- get_pid_ns(ns);
- ctx->pid_ns = ns;
- }
-
- mnt = fc_mount(fc);
- put_fs_context(fc);
- if (IS_ERR(mnt))
- return PTR_ERR(mnt);
-
- ns->proc_mnt = mnt;
- return 0;
-}
-
-void pid_ns_release_proc(struct pid_namespace *ns)
-{
- kern_unmount(ns->proc_mnt);
-}
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 2dfb550c6723..9b0c46a6ca1f 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -81,6 +81,17 @@ struct kunit_resource {
struct kunit;
+/* Size of log associated with test. */
+#define KUNIT_LOG_SIZE 512
+
+/*
+ * TAP specifies subtest stream indentation of 4 spaces, 8 spaces for a
+ * sub-subtest. See the "Subtests" section in
+ * https://node-tap.org/tap-protocol/
+ */
+#define KUNIT_SUBTEST_INDENT " "
+#define KUNIT_SUBSUBTEST_INDENT " "
+
/**
* struct kunit_case - represents an individual test case.
*
@@ -123,8 +134,14 @@ struct kunit_case {
/* private: internal use only. */
bool success;
+ char *log;
};
+static inline char *kunit_status_to_string(bool status)
+{
+ return status ? "ok" : "not ok";
+}
+
/**
* KUNIT_CASE - A helper for creating a &struct kunit_case
*
@@ -157,6 +174,10 @@ struct kunit_suite {
int (*init)(struct kunit *test);
void (*exit)(struct kunit *test);
struct kunit_case *test_cases;
+
+ /* private - internal use only */
+ struct dentry *debugfs;
+ char *log;
};
/**
@@ -175,6 +196,7 @@ struct kunit {
/* private: internal use only. */
const char *name; /* Read only after initialization! */
+ char *log; /* Points at case log after initialization */
struct kunit_try_catch try_catch;
/*
* success starts as true, and may only be set to false during a
@@ -193,10 +215,19 @@ struct kunit {
struct list_head resources; /* Protected by lock. */
};
-void kunit_init_test(struct kunit *test, const char *name);
+void kunit_init_test(struct kunit *test, const char *name, char *log);
int kunit_run_tests(struct kunit_suite *suite);
+size_t kunit_suite_num_test_cases(struct kunit_suite *suite);
+
+unsigned int kunit_test_case_num(struct kunit_suite *suite,
+ struct kunit_case *test_case);
+
+int __kunit_test_suites_init(struct kunit_suite **suites);
+
+void __kunit_test_suites_exit(struct kunit_suite **suites);
+
/**
* kunit_test_suites() - used to register one or more &struct kunit_suite
* with KUnit.
@@ -226,20 +257,22 @@ int kunit_run_tests(struct kunit_suite *suite);
static struct kunit_suite *suites[] = { __VA_ARGS__, NULL}; \
static int kunit_test_suites_init(void) \
{ \
- unsigned int i; \
- for (i = 0; suites[i] != NULL; i++) \
- kunit_run_tests(suites[i]); \
- return 0; \
+ return __kunit_test_suites_init(suites); \
} \
late_initcall(kunit_test_suites_init); \
static void __exit kunit_test_suites_exit(void) \
{ \
- return; \
+ return __kunit_test_suites_exit(suites); \
} \
module_exit(kunit_test_suites_exit)
#define kunit_test_suite(suite) kunit_test_suites(&suite)
+#define kunit_suite_for_each_test_case(suite, test_case) \
+ for (test_case = suite->test_cases; test_case->run_case; test_case++)
+
+bool kunit_suite_has_succeeded(struct kunit_suite *suite);
+
/*
* Like kunit_alloc_resource() below, but returns the struct kunit_resource
* object that contains the allocation. This is mostly for testing purposes.
@@ -356,8 +389,22 @@ static inline void *kunit_kzalloc(struct kunit *test, size_t size, gfp_t gfp)
void kunit_cleanup(struct kunit *test);
-#define kunit_printk(lvl, test, fmt, ...) \
- printk(lvl "\t# %s: " fmt, (test)->name, ##__VA_ARGS__)
+void kunit_log_append(char *log, const char *fmt, ...);
+
+/*
+ * printk and log to per-test or per-suite log buffer. Logging only done
+ * if CONFIG_KUNIT_DEBUGFS is 'y'; if it is 'n', no log is allocated/used.
+ */
+#define kunit_log(lvl, test_or_suite, fmt, ...) \
+ do { \
+ printk(lvl fmt, ##__VA_ARGS__); \
+ kunit_log_append((test_or_suite)->log, fmt "\n", \
+ ##__VA_ARGS__); \
+ } while (0)
+
+#define kunit_printk(lvl, test, fmt, ...) \
+ kunit_log(lvl, test, KUNIT_SUBTEST_INDENT "# %s: " fmt, \
+ (test)->name, ##__VA_ARGS__)
/**
* kunit_info() - Prints an INFO level message associated with @test.
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index b40fc633f3be..a345d9fed3d8 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -44,7 +44,13 @@ struct linux_binprm {
* exec has happened. Used to sanitize execution environment
* and to set AT_SECURE auxv for glibc.
*/
- secureexec:1;
+ secureexec:1,
+ /*
+ * Set by flush_old_exec, when exec_mmap has been called.
+ * This is past the point of no return, when the
+ * exec_update_mutex has been taken.
+ */
+ called_exec_mmap:1;
#ifdef __alpha__
unsigned int taso:1;
#endif
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index ddf9f7144c43..7475051100c7 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -3,58 +3,8 @@
* Copyright 2013 Red Hat Inc.
*
* Authors: Jérôme Glisse
- */
-/*
- * Heterogeneous Memory Management (HMM)
*
- * See Documentation/vm/hmm.rst for reasons and overview of what HMM is and it
- * is for. Here we focus on the HMM API description, with some explanation of
- * the underlying implementation.
- *
- * Short description: HMM provides a set of helpers to share a virtual address
- * space between CPU and a device, so that the device can access any valid
- * address of the process (while still obeying memory protection). HMM also
- * provides helpers to migrate process memory to device memory, and back. Each
- * set of functionality (address space mirroring, and migration to and from
- * device memory) can be used independently of the other.
- *
- *
- * HMM address space mirroring API:
- *
- * Use HMM address space mirroring if you want to mirror a range of the CPU
- * page tables of a process into a device page table. Here, "mirror" means "keep
- * synchronized". Prerequisites: the device must provide the ability to write-
- * protect its page tables (at PAGE_SIZE granularity), and must be able to
- * recover from the resulting potential page faults.
- *
- * HMM guarantees that at any point in time, a given virtual address points to
- * either the same memory in both CPU and device page tables (that is: CPU and
- * device page tables each point to the same pages), or that one page table (CPU
- * or device) points to no entry, while the other still points to the old page
- * for the address. The latter case happens when the CPU page table update
- * happens first, and then the update is mirrored over to the device page table.
- * This does not cause any issue, because the CPU page table cannot start
- * pointing to a new page until the device page table is invalidated.
- *
- * HMM uses mmu_notifiers to monitor the CPU page tables, and forwards any
- * updates to each device driver that has registered a mirror. It also provides
- * some API calls to help with taking a snapshot of the CPU page table, and to
- * synchronize with any updates that might happen concurrently.
- *
- *
- * HMM migration to and from device memory:
- *
- * HMM provides a set of helpers to hotplug device memory as ZONE_DEVICE, with
- * a new MEMORY_DEVICE_PRIVATE type. This provides a struct page for each page
- * of the device memory, and allows the device driver to manage its memory
- * using those struct pages. Having struct pages for device memory makes
- * migration easier. Because that memory is not addressable by the CPU it must
- * never be pinned to the device; in other words, any CPU page fault can always
- * cause the device memory to be migrated (copied/moved) back to regular memory.
- *
- * A new migrate helper (migrate_vma()) has been added (see mm/migrate.c) that
- * allows use of a device DMA engine to perform the copy operation between
- * regular system memory and device memory.
+ * See Documentation/vm/hmm.rst for reasons and overview of what HMM is.
*/
#ifndef LINUX_HMM_H
#define LINUX_HMM_H
@@ -74,7 +24,6 @@
* Flags:
* HMM_PFN_VALID: pfn is valid. It has, at least, read permission.
* HMM_PFN_WRITE: CPU page table has write permission set
- * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
*
* The driver provides a flags array for mapping page protections to device
* PTE bits. If the driver valid bit for an entry is bit 3,
@@ -86,7 +35,6 @@
enum hmm_pfn_flag_e {
HMM_PFN_VALID = 0,
HMM_PFN_WRITE,
- HMM_PFN_DEVICE_PRIVATE,
HMM_PFN_FLAG_MAX
};
@@ -122,9 +70,6 @@ enum hmm_pfn_value_e {
*
* @notifier: a mmu_interval_notifier that includes the start/end
* @notifier_seq: result of mmu_interval_read_begin()
- * @hmm: the core HMM structure this range is active against
- * @vma: the vm area struct for the range
- * @list: all range lock are on a list
* @start: range virtual start address (inclusive)
* @end: range virtual end address (exclusive)
* @pfns: array of pfns (big enough for the range)
@@ -132,8 +77,8 @@ enum hmm_pfn_value_e {
* @values: pfn value for some special case (none, special, error, ...)
* @default_flags: default flags for the range (write, read, ... see hmm doc)
* @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
- * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
- * @valid: pfns array did not change since it has been fill by an HMM function
+ * @pfn_shift: pfn shift value (should be <= PAGE_SHIFT)
+ * @dev_private_owner: owner of device private pages
*/
struct hmm_range {
struct mmu_interval_notifier *notifier;
@@ -146,6 +91,7 @@ struct hmm_range {
uint64_t default_flags;
uint64_t pfn_flags_mask;
uint8_t pfn_shift;
+ void *dev_private_owner;
};
/*
@@ -171,71 +117,10 @@ static inline struct page *hmm_device_entry_to_page(const struct hmm_range *rang
return pfn_to_page(entry >> range->pfn_shift);
}
-/*
- * hmm_device_entry_to_pfn() - return pfn value store in a device entry
- * @range: range use to decode device entry value
- * @entry: device entry to extract pfn from
- * Return: pfn value if device entry is valid, -1UL otherwise
- */
-static inline unsigned long
-hmm_device_entry_to_pfn(const struct hmm_range *range, uint64_t pfn)
-{
- if (pfn == range->values[HMM_PFN_NONE])
- return -1UL;
- if (pfn == range->values[HMM_PFN_ERROR])
- return -1UL;
- if (pfn == range->values[HMM_PFN_SPECIAL])
- return -1UL;
- if (!(pfn & range->flags[HMM_PFN_VALID]))
- return -1UL;
- return (pfn >> range->pfn_shift);
-}
-
-/*
- * hmm_device_entry_from_page() - create a valid device entry for a page
- * @range: range use to encode HMM pfn value
- * @page: page for which to create the device entry
- * Return: valid device entry for the page
- */
-static inline uint64_t hmm_device_entry_from_page(const struct hmm_range *range,
- struct page *page)
-{
- return (page_to_pfn(page) << range->pfn_shift) |
- range->flags[HMM_PFN_VALID];
-}
-
-/*
- * hmm_device_entry_from_pfn() - create a valid device entry value from pfn
- * @range: range use to encode HMM pfn value
- * @pfn: pfn value for which to create the device entry
- * Return: valid device entry for the pfn
- */
-static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
- unsigned long pfn)
-{
- return (pfn << range->pfn_shift) |
- range->flags[HMM_PFN_VALID];
-}
-
-/*
- * Retry fault if non-blocking, drop mmap_sem and return -EAGAIN in that case.
- */
-#define HMM_FAULT_ALLOW_RETRY (1 << 0)
-
-/* Don't fault in missing PTEs, just snapshot the current state. */
-#define HMM_FAULT_SNAPSHOT (1 << 1)
-
-#ifdef CONFIG_HMM_MIRROR
/*
* Please see Documentation/vm/hmm.rst for how to use the range API.
*/
-long hmm_range_fault(struct hmm_range *range, unsigned int flags);
-#else
-static inline long hmm_range_fault(struct hmm_range *range, unsigned int flags)
-{
- return -EOPNOTSUPP;
-}
-#endif
+long hmm_range_fault(struct hmm_range *range);
/*
* HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6fefb09af7c3..60d97e8fd3c0 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -103,6 +103,9 @@ struct dev_pagemap_ops {
* @type: memory type: see MEMORY_* in memory_hotplug.h
* @flags: PGMAP_* flags to specify defailed behavior
* @ops: method table
+ * @owner: an opaque pointer identifying the entity that manages this
+ * instance. Used by various helpers to make sure that no
+ * foreign ZONE_DEVICE memory is accessed.
*/
struct dev_pagemap {
struct vmem_altmap altmap;
@@ -113,6 +116,7 @@ struct dev_pagemap {
enum memory_type type;
unsigned int flags;
const struct dev_pagemap_ops *ops;
+ void *owner;
};
static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 72120061b7d4..3e546cbf03dd 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -196,6 +196,14 @@ struct migrate_vma {
unsigned long npages;
unsigned long start;
unsigned long end;
+
+ /*
+ * Set to the owner value also stored in page->pgmap->owner for
+ * migrating out of device private memory. If set only device
+ * private pages with this owner are migrated. If not set
+ * device private pages are not migrated at all.
+ */
+ void *src_owner;
};
int migrate_vma_setup(struct migrate_vma *args);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 0e62c3db45e5..2b90097a6cf9 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1211,6 +1211,12 @@ enum mlx5_qcam_feature_groups {
#define MLX5_CAP_FLOWTABLE_RDMA_RX_MAX(mdev, cap) \
MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_rdma.cap)
+#define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \
+ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap)
+
+#define MLX5_CAP_FLOWTABLE_RDMA_TX_MAX(mdev, cap) \
+ MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_rdma.cap)
+
#define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \
MLX5_GET(flow_table_eswitch_cap, \
mdev->caps.hca_cur[MLX5_CAP_ESWITCH_FLOW_TABLE], cap)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d143b8bd55c9..6f8f79ef829b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -213,23 +213,6 @@ enum mlx5_port_status {
MLX5_PORT_DOWN = 2,
};
-struct mlx5_bfreg_info {
- u32 *sys_pages;
- int num_low_latency_bfregs;
- unsigned int *count;
-
- /*
- * protect bfreg allocation data structs
- */
- struct mutex lock;
- u32 ver;
- bool lib_uar_4k;
- u32 num_sys_pages;
- u32 num_static_sys_pages;
- u32 total_num_bfregs;
- u32 num_dyn_bfregs;
-};
-
struct mlx5_cmd_first {
__be32 data[4];
};
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index a5cf5c76f348..e2d13e074067 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -77,6 +77,7 @@ enum mlx5_flow_namespace_type {
MLX5_FLOW_NAMESPACE_EGRESS,
MLX5_FLOW_NAMESPACE_RDMA_RX,
MLX5_FLOW_NAMESPACE_RDMA_RX_KERNEL,
+ MLX5_FLOW_NAMESPACE_RDMA_TX,
};
enum {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index cc55cee3b53c..69b27c7dfc3e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -709,7 +709,7 @@ struct mlx5_ifc_flow_table_nic_cap_bits {
struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_nic_transmit;
- u8 reserved_at_a00[0x200];
+ struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_nic_transmit_rdma;
struct mlx5_ifc_flow_table_prop_layout_bits flow_table_properties_nic_transmit_sniffer;
@@ -879,7 +879,11 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
u8 swp_csum[0x1];
u8 swp_lso[0x1];
u8 cqe_checksum_full[0x1];
- u8 reserved_at_24[0x5];
+ u8 tunnel_stateless_geneve_tx[0x1];
+ u8 tunnel_stateless_mpls_over_udp[0x1];
+ u8 tunnel_stateless_mpls_over_gre[0x1];
+ u8 tunnel_stateless_vxlan_gpe[0x1];
+ u8 tunnel_stateless_ipv4_over_vxlan[0x1];
u8 tunnel_stateless_ip_over_ip[0x1];
u8 reserved_at_2a[0x6];
u8 max_vxlan_udp_ports[0x8];
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 843a3d5da4d8..bc73f16b6c02 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -15,7 +15,7 @@ enum { MAX_NESTED_LINKS = 8 };
/*
* Type of the last component on LOOKUP_PARENT
*/
-enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
+enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
/* pathwalk mode */
#define LOOKUP_FOLLOW 0x0001 /* follow links at the end */
@@ -23,6 +23,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_AUTOMOUNT 0x0004 /* force terminal automount */
#define LOOKUP_EMPTY 0x4000 /* accept empty path [user_... only] */
#define LOOKUP_DOWN 0x8000 /* follow mounts in the starting point */
+#define LOOKUP_MOUNTPOINT 0x0080 /* follow mounts in the end */
#define LOOKUP_REVAL 0x0020 /* tell ->d_revalidate() to trust no cache */
#define LOOKUP_RCU 0x0040 /* RCU pathwalk mode; semi-internal */
@@ -66,7 +67,6 @@ extern void done_path_create(struct path *, struct dentry *);
extern struct dentry *kern_path_locked(const char *, struct path *);
extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
const char *, unsigned int, struct path *);
-extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int);
extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 998ae7d24450..01a0d4e28506 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -62,6 +62,7 @@ struct pid
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
+ struct hlist_head inodes;
/* wait queue for pidfd notifications */
wait_queue_head_t wait_pidfd;
struct rcu_head rcu;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 2ed6af88794b..4956e362e55e 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -33,7 +33,6 @@ struct pid_namespace {
unsigned int level;
struct pid_namespace *parent;
#ifdef CONFIG_PROC_FS
- struct vfsmount *proc_mnt;
struct dentry *proc_self;
struct dentry *proc_thread_self;
#endif
@@ -42,7 +41,6 @@ struct pid_namespace {
#endif
struct user_namespace *user_ns;
struct ucounts *ucounts;
- struct work_struct proc_work;
kgid_t pid_gid;
int hide_pid;
int reboot; /* group exit code if this pidns was rebooted */
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 3dfa92633af3..40a7982b7285 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -32,7 +32,7 @@ struct proc_ops {
typedef int (*proc_write_t)(struct file *, char *, size_t);
extern void proc_root_init(void);
-extern void proc_flush_task(struct task_struct *);
+extern void proc_flush_pid(struct pid *);
extern struct proc_dir_entry *proc_symlink(const char *,
struct proc_dir_entry *, const char *);
@@ -105,7 +105,7 @@ static inline void proc_root_init(void)
{
}
-static inline void proc_flush_task(struct task_struct *task)
+static inline void proc_flush_pid(struct pid *pid)
{
}
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index adff08bfecf9..6abe85c34681 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -50,16 +50,11 @@ enum {
#ifdef CONFIG_PROC_FS
-extern int pid_ns_prepare_proc(struct pid_namespace *ns);
-extern void pid_ns_release_proc(struct pid_namespace *ns);
extern int proc_alloc_inum(unsigned int *pino);
extern void proc_free_inum(unsigned int inum);
#else /* CONFIG_PROC_FS */
-static inline int pid_ns_prepare_proc(struct pid_namespace *ns) { return 0; }
-static inline void pid_ns_release_proc(struct pid_namespace *ns) {}
-
static inline int proc_alloc_inum(unsigned int *inum)
{
*inum = 1;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e8058dfb58f2..6fd3d7f49cad 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -949,8 +949,8 @@ struct task_struct {
struct seccomp seccomp;
/* Thread group tracking: */
- u32 parent_exec_id;
- u32 self_exec_id;
+ u64 parent_exec_id;
+ u64 self_exec_id;
/* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */
spinlock_t alloc_lock;
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 88050259c466..a29df79540ce 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -224,7 +224,14 @@ struct signal_struct {
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
- * (notably. ptrace) */
+ * (notably. ptrace)
+ * Deprecated do not use in new code.
+ * Use exec_update_mutex instead.
+ */
+ struct mutex exec_update_mutex; /* Held while task_struct is being
+ * updated during exec, and may have
+ * inconsistent permissions.
+ */
} __randomize_layout;
/*
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index f73e1775ded0..14c893433139 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -32,8 +32,8 @@
* The following internal entries have a special meaning:
*
* 0-62: Sibling entries
- * 256: Zero entry
- * 257: Retry entry
+ * 256: Retry entry
+ * 257: Zero entry
*
* Errors are also represented as internal entries, but use the negative
* space (-4094 to -2). They're never stored in the slots array; only
@@ -1648,6 +1648,7 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
xa_mark_t mark)
{
struct xa_node *node = xas->xa_node;
+ void *entry;
unsigned int offset;
if (unlikely(xas_not_node(node) || node->shift))
@@ -1659,7 +1660,10 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
return NULL;
if (offset == XA_CHUNK_SIZE)
return xas_find_marked(xas, max, mark);
- return xa_entry(xas->xa, node, offset);
+ entry = xa_entry(xas->xa, node, offset);
+ if (!entry)
+ return xas_find_marked(xas, max, mark);
+ return entry;
}
/*
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 870b5e6c06db..e06d13388ae7 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -39,6 +39,7 @@
int rdma_query_gid(struct ib_device *device, u8 port_num, int index,
union ib_gid *gid);
+void *rdma_read_gid_hw_context(const struct ib_gid_attr *attr);
const struct ib_gid_attr *rdma_find_gid(struct ib_device *device,
const union ib_gid *gid,
enum ib_gid_type gid_type,
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 8ec482e391aa..058cfbc2b37f 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -360,7 +360,6 @@ struct ib_cm_req_param {
u32 starting_psn;
const void *private_data;
u8 private_data_len;
- u8 peer_to_peer;
u8 responder_resources;
u8 initiator_depth;
u8 remote_cm_response_timeout;
diff --git a/include/rdma/ib_fmr_pool.h b/include/rdma/ib_fmr_pool.h
index f8982e4e9702..2fd9bfb6d648 100644
--- a/include/rdma/ib_fmr_pool.h
+++ b/include/rdma/ib_fmr_pool.h
@@ -73,7 +73,7 @@ struct ib_pool_fmr {
int remap_count;
u64 io_virtual_address;
int page_list_len;
- u64 page_list[0];
+ u64 page_list[];
};
struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd *pd,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1f779fad3a1e..bbc5cfb57cd2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1876,7 +1876,7 @@ struct ib_flow_eth_filter {
__be16 ether_type;
__be16 vlan_tag;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_eth {
@@ -1890,7 +1890,7 @@ struct ib_flow_ib_filter {
__be16 dlid;
__u8 sl;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_ib {
@@ -1915,7 +1915,7 @@ struct ib_flow_ipv4_filter {
u8 ttl;
u8 flags;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_ipv4 {
@@ -1933,7 +1933,7 @@ struct ib_flow_ipv6_filter {
u8 traffic_class;
u8 hop_limit;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_ipv6 {
@@ -1947,7 +1947,7 @@ struct ib_flow_tcp_udp_filter {
__be16 dst_port;
__be16 src_port;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_tcp_udp {
@@ -1959,7 +1959,7 @@ struct ib_flow_spec_tcp_udp {
struct ib_flow_tunnel_filter {
__be32 tunnel_id;
- u8 real_sz[0];
+ u8 real_sz[];
};
/* ib_flow_spec_tunnel describes the Vxlan tunnel
@@ -1976,7 +1976,7 @@ struct ib_flow_esp_filter {
__be32 spi;
__be32 seq;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_esp {
@@ -1991,7 +1991,7 @@ struct ib_flow_gre_filter {
__be16 protocol;
__be32 key;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_gre {
@@ -2004,7 +2004,7 @@ struct ib_flow_spec_gre {
struct ib_flow_mpls_filter {
__be32 tag;
/* Must be last */
- u8 real_sz[0];
+ u8 real_sz[];
};
struct ib_flow_spec_mpls {
@@ -3627,35 +3627,8 @@ static inline int ib_post_srq_recv(struct ib_srq *srq,
bad_recv_wr ? : &dummy);
}
-/**
- * ib_create_qp_user - Creates a QP associated with the specified protection
- * domain.
- * @pd: The protection domain associated with the QP.
- * @qp_init_attr: A list of initial attributes required to create the
- * QP. If QP creation succeeds, then the attributes are updated to
- * the actual capabilities of the created QP.
- * @udata: Valid user data or NULL for kernel objects
- */
-struct ib_qp *ib_create_qp_user(struct ib_pd *pd,
- struct ib_qp_init_attr *qp_init_attr,
- struct ib_udata *udata);
-
-/**
- * ib_create_qp - Creates a kernel QP associated with the specified protection
- * domain.
- * @pd: The protection domain associated with the QP.
- * @qp_init_attr: A list of initial attributes required to create the
- * QP. If QP creation succeeds, then the attributes are updated to
- * the actual capabilities of the created QP.
- * @udata: Valid user data or NULL for kernel objects
- *
- * NOTE: for user qp use ib_create_qp_user with valid udata!
- */
-static inline struct ib_qp *ib_create_qp(struct ib_pd *pd,
- struct ib_qp_init_attr *qp_init_attr)
-{
- return ib_create_qp_user(pd, qp_init_attr, NULL);
-}
+struct ib_qp *ib_create_qp(struct ib_pd *pd,
+ struct ib_qp_init_attr *qp_init_attr);
/**
* ib_modify_qp_with_udata - Modifies the attributes for the specified QP.
diff --git a/include/rdma/opa_vnic.h b/include/rdma/opa_vnic.h
index 0c07a70bd7f6..e90b149fe92a 100644
--- a/include/rdma/opa_vnic.h
+++ b/include/rdma/opa_vnic.h
@@ -75,7 +75,7 @@
struct opa_vnic_rdma_netdev {
struct rdma_netdev rn; /* keep this first */
/* followed by device private data */
- char *dev_priv[0];
+ char *dev_priv[];
};
static inline void *opa_vnic_priv(const struct net_device *dev)
diff --git a/include/rdma/rdmavt_mr.h b/include/rdma/rdmavt_mr.h
index 72a3856d4057..ce6c888f7fe7 100644
--- a/include/rdma/rdmavt_mr.h
+++ b/include/rdma/rdmavt_mr.h
@@ -85,7 +85,7 @@ struct rvt_mregion {
u8 lkey_published; /* in global table */
struct percpu_ref refcount;
struct completion comp; /* complete when refcount goes to zero */
- struct rvt_segarray *map[0]; /* the segments */
+ struct rvt_segarray *map[]; /* the segments */
};
#define RVT_MAX_LKEY_TABLE_BITS 23
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index 0d5c70e2d8ab..5fc10108703a 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -191,7 +191,7 @@ struct rvt_swqe {
u32 ssn; /* send sequence number */
u32 length; /* total length of data in sg_list */
void *priv; /* driver dependent field */
- struct rvt_sge sg_list[0];
+ struct rvt_sge sg_list[];
};
/**
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 28570ac2b6a0..9f3b1e004046 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -173,7 +173,7 @@ enum uapi_radix_data {
UVERBS_API_OBJ_KEY_BITS = 5,
UVERBS_API_OBJ_KEY_SHIFT =
UVERBS_API_METHOD_KEY_BITS + UVERBS_API_METHOD_KEY_SHIFT,
- UVERBS_API_OBJ_KEY_NUM_CORE = 24,
+ UVERBS_API_OBJ_KEY_NUM_CORE = 20,
UVERBS_API_OBJ_KEY_NUM_DRIVER =
(1 << UVERBS_API_OBJ_KEY_BITS) - UVERBS_API_OBJ_KEY_NUM_CORE,
UVERBS_API_OBJ_KEY_MASK = GENMASK(31, UVERBS_API_OBJ_KEY_SHIFT),
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 624f5b53eb1f..df1cc3641bda 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -49,6 +49,7 @@ enum {
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC = 1 << 7,
MLX5_QP_FLAG_ALLOW_SCATTER_CQE = 1 << 8,
MLX5_QP_FLAG_PACKET_BASED_CREDIT_MODE = 1 << 9,
+ MLX5_QP_FLAG_UAR_PAGE_INDEX = 1 << 10,
};
enum {
@@ -78,6 +79,7 @@ struct mlx5_ib_alloc_ucontext_req {
enum mlx5_lib_caps {
MLX5_LIB_CAP_4K_UAR = (__u64)1 << 0,
+ MLX5_LIB_CAP_DYN_UAR = (__u64)1 << 1,
};
enum mlx5_ib_alloc_uctx_v2_flags {
@@ -266,6 +268,7 @@ struct mlx5_ib_query_device_resp {
enum mlx5_ib_create_cq_flags {
MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD = 1 << 0,
+ MLX5_IB_CREATE_CQ_FLAGS_UAR_PAGE_INDEX = 1 << 1,
};
struct mlx5_ib_create_cq {
@@ -275,6 +278,9 @@ struct mlx5_ib_create_cq {
__u8 cqe_comp_en;
__u8 cqe_comp_res_format;
__u16 flags;
+ __u16 uar_page_index;
+ __u16 reserved0;
+ __u32 reserved1;
};
struct mlx5_ib_create_cq_resp {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index afe7da6f2b8e..24f3388c3182 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -131,6 +131,23 @@ enum mlx5_ib_var_obj_methods {
MLX5_IB_METHOD_VAR_OBJ_DESTROY,
};
+enum mlx5_ib_uar_alloc_attrs {
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_TYPE,
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_OFFSET,
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_MMAP_LENGTH,
+ MLX5_IB_ATTR_UAR_OBJ_ALLOC_PAGE_ID,
+};
+
+enum mlx5_ib_uar_obj_destroy_attrs {
+ MLX5_IB_ATTR_UAR_OBJ_DESTROY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum mlx5_ib_uar_obj_methods {
+ MLX5_IB_METHOD_UAR_OBJ_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
+ MLX5_IB_METHOD_UAR_OBJ_DESTROY,
+};
+
enum mlx5_ib_devx_umem_reg_attrs {
MLX5_IB_ATTR_DEVX_UMEM_REG_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
MLX5_IB_ATTR_DEVX_UMEM_REG_ADDR,
@@ -143,6 +160,22 @@ enum mlx5_ib_devx_umem_dereg_attrs {
MLX5_IB_ATTR_DEVX_UMEM_DEREG_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
};
+enum mlx5_ib_pp_obj_methods {
+ MLX5_IB_METHOD_PP_OBJ_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
+ MLX5_IB_METHOD_PP_OBJ_DESTROY,
+};
+
+enum mlx5_ib_pp_alloc_attrs {
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_CTX,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_FLAGS,
+ MLX5_IB_ATTR_PP_OBJ_ALLOC_INDEX,
+};
+
+enum mlx5_ib_pp_obj_destroy_attrs {
+ MLX5_IB_ATTR_PP_OBJ_DESTROY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
+
enum mlx5_ib_devx_umem_methods {
MLX5_IB_METHOD_DEVX_UMEM_REG = (1U << UVERBS_ID_NS_SHIFT),
MLX5_IB_METHOD_DEVX_UMEM_DEREG,
@@ -173,6 +206,8 @@ enum mlx5_ib_objects {
MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD,
MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
MLX5_IB_OBJECT_VAR,
+ MLX5_IB_OBJECT_PP,
+ MLX5_IB_OBJECT_UAR,
};
enum mlx5_ib_flow_matcher_create_attrs {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
index 88b6ca70c2fe..56b26eaea083 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_verbs.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
@@ -44,6 +44,7 @@ enum mlx5_ib_uapi_flow_table_type {
MLX5_IB_UAPI_FLOW_TABLE_TYPE_NIC_TX = 0x1,
MLX5_IB_UAPI_FLOW_TABLE_TYPE_FDB = 0x2,
MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_RX = 0x3,
+ MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TX = 0x4,
};
enum mlx5_ib_uapi_flow_action_packet_reformat_type {
@@ -73,5 +74,14 @@ struct mlx5_ib_uapi_devx_async_event_hdr {
__u8 out_data[];
};
+enum mlx5_ib_uapi_pp_alloc_flags {
+ MLX5_IB_UAPI_PP_ALLOC_FLAGS_DEDICATED_INDEX = 1 << 0,
+};
+
+enum mlx5_ib_uapi_uar_alloc_type {
+ MLX5_IB_UAPI_UAR_ALLOC_TYPE_BF = 0x0,
+ MLX5_IB_UAPI_UAR_ALLOC_TYPE_NC = 0x1,
+};
+
#endif
diff --git a/init/init_task.c b/init/init_task.c
index cbd40460e903..aaa71366d162 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -27,6 +27,7 @@ static struct signal_struct init_signals = {
.multiprocess = HLIST_HEAD_INIT,
.rlim = INIT_RLIMITS,
.cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
+ .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex),
#ifdef CONFIG_POSIX_TIMERS
.posix_timers = LIST_HEAD_INIT(init_signals.posix_timers),
.cputimer = {
diff --git a/kernel/cred.c b/kernel/cred.c
index 809a985b1793..71a792616917 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -675,8 +675,6 @@ void __init cred_init(void)
* The caller may change these controls afterwards if desired.
*
* Returns the new credentials or NULL if out of memory.
- *
- * Does not take, and does not return holding current->cred_replace_mutex.
*/
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 318435c5bf0b..e1459df73043 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1295,7 +1295,7 @@ static void put_ctx(struct perf_event_context *ctx)
* function.
*
* Lock order:
- * cred_guard_mutex
+ * exec_update_mutex
* task_struct::perf_event_mutex
* perf_event_context::mutex
* perf_event::child_mutex;
@@ -11425,14 +11425,14 @@ SYSCALL_DEFINE5(perf_event_open,
}
if (task) {
- err = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
+ err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
if (err)
goto err_task;
/*
* Reuse ptrace permission checks for now.
*
- * We must hold cred_guard_mutex across this and any potential
+ * We must hold exec_update_mutex across this and any potential
* perf_install_in_context() call for this new event to
* serialize against exec() altering our credentials (and the
* perf_event_exit_task() that could imply).
@@ -11721,7 +11721,7 @@ SYSCALL_DEFINE5(perf_event_open,
mutex_unlock(&ctx->mutex);
if (task) {
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
put_task_struct(task);
}
@@ -11757,7 +11757,7 @@ SYSCALL_DEFINE5(perf_event_open,
free_event(event);
err_cred:
if (task)
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
err_task:
if (task)
put_task_struct(task);
@@ -12062,7 +12062,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
/*
* When a child task exits, feed back event values to parent events.
*
- * Can be called with cred_guard_mutex held when called from
+ * Can be called with exec_update_mutex held when called from
* install_exec_creds().
*/
void perf_event_exit_task(struct task_struct *child)
diff --git a/kernel/exit.c b/kernel/exit.c
index d70d47159640..389a88cb3081 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -182,6 +182,7 @@ void put_task_struct_rcu_user(struct task_struct *task)
void release_task(struct task_struct *p)
{
struct task_struct *leader;
+ struct pid *thread_pid;
int zap_leader;
repeat:
/* don't need to get the RCU readlock here - the process is dead and
@@ -190,11 +191,11 @@ void release_task(struct task_struct *p)
atomic_dec(&__task_cred(p)->user->processes);
rcu_read_unlock();
- proc_flush_task(p);
cgroup_release(p);
write_lock_irq(&tasklist_lock);
ptrace_release_task(p);
+ thread_pid = get_pid(p->thread_pid);
__exit_signal(p);
/*
@@ -217,6 +218,7 @@ void release_task(struct task_struct *p)
}
write_unlock_irq(&tasklist_lock);
+ proc_flush_pid(thread_pid);
release_thread(p);
put_task_struct_rcu_user(p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 41af5efeee5c..8a7873a8c8f0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1235,7 +1235,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
struct mm_struct *mm;
int err;
- err = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ err = mutex_lock_killable(&task->signal->exec_update_mutex);
if (err)
return ERR_PTR(err);
@@ -1245,7 +1245,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
mmput(mm);
mm = ERR_PTR(-EACCES);
}
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
return mm;
}
@@ -1605,6 +1605,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->oom_score_adj_min = current->signal->oom_score_adj_min;
mutex_init(&sig->cred_guard_mutex);
+ mutex_init(&sig->exec_update_mutex);
return 0;
}
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index a0e3d7a0e8b8..b3ff9288c6cc 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -173,8 +173,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
/*
* One should have enough rights to inspect task details.
*/
- ret = kcmp_lock(&task1->signal->cred_guard_mutex,
- &task2->signal->cred_guard_mutex);
+ ret = kcmp_lock(&task1->signal->exec_update_mutex,
+ &task2->signal->exec_update_mutex);
if (ret)
goto err;
if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
@@ -229,8 +229,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
}
err_unlock:
- kcmp_unlock(&task1->signal->cred_guard_mutex,
- &task2->signal->cred_guard_mutex);
+ kcmp_unlock(&task1->signal->exec_update_mutex,
+ &task2->signal->exec_update_mutex);
err:
put_task_struct(task1);
put_task_struct(task2);
diff --git a/kernel/pid.c b/kernel/pid.c
index 647b4bb457b5..bc21c0fb26d8 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -144,9 +144,6 @@ void free_pid(struct pid *pid)
/* Handle a fork failure of the first process */
WARN_ON(ns->child_reaper);
ns->pid_allocated = 0;
- /* fall through */
- case 0:
- schedule_work(&ns->proc_work);
break;
}
@@ -257,17 +254,13 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
*/
retval = -ENOMEM;
- if (unlikely(is_child_reaper(pid))) {
- if (pid_ns_prepare_proc(ns))
- goto out_free;
- }
-
get_pid_ns(ns);
refcount_set(&pid->count, 1);
for (type = 0; type < PIDTYPE_MAX; ++type)
INIT_HLIST_HEAD(&pid->tasks[type]);
init_waitqueue_head(&pid->wait_pidfd);
+ INIT_HLIST_HEAD(&pid->inodes);
upid = pid->numbers + ns->level;
spin_lock_irq(&pidmap_lock);
@@ -594,7 +587,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
struct file *file;
int ret;
- ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ ret = mutex_lock_killable(&task->signal->exec_update_mutex);
if (ret)
return ERR_PTR(ret);
@@ -603,7 +596,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
else
file = ERR_PTR(-EPERM);
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
return file ?: ERR_PTR(-EBADF);
}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index d40017e79ebe..01f8ba32cc0c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -57,12 +57,6 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
return READ_ONCE(*pkc);
}
-static void proc_cleanup_work(struct work_struct *work)
-{
- struct pid_namespace *ns = container_of(work, struct pid_namespace, proc_work);
- pid_ns_release_proc(ns);
-}
-
static struct ucounts *inc_pid_namespaces(struct user_namespace *ns)
{
return inc_ucount(ns, current_euid(), UCOUNT_PID_NAMESPACES);
@@ -114,7 +108,6 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
ns->user_ns = get_user_ns(user_ns);
ns->ucounts = ucounts;
ns->pid_allocated = PIDNS_ADDING;
- INIT_WORK(&ns->proc_work, proc_cleanup_work);
return ns;
@@ -231,20 +224,27 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
} while (rc != -ECHILD);
/*
- * kernel_wait4() above can't reap the EXIT_DEAD children but we do not
- * really care, we could reparent them to the global init. We could
- * exit and reap ->child_reaper even if it is not the last thread in
- * this pid_ns, free_pid(pid_allocated == 0) calls proc_cleanup_work(),
- * pid_ns can not go away until proc_kill_sb() drops the reference.
+ * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
+ * process whose parents processes are outside of the pid
+ * namespace. Such processes are created with setns()+fork().
*
- * But this ns can also have other tasks injected by setns()+fork().
- * Again, ignoring the user visible semantics we do not really need
- * to wait until they are all reaped, but they can be reparented to
- * us and thus we need to ensure that pid->child_reaper stays valid
- * until they all go away. See free_pid()->wake_up_process().
+ * If those EXIT_ZOMBIE processes are not reaped by their
+ * parents before their parents exit, they will be reparented
+ * to pid_ns->child_reaper. Thus pidns->child_reaper needs to
+ * stay valid until they all go away.
*
- * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
- * if reparented.
+ * The code relies on the the pid_ns->child_reaper ignoring
+ * SIGCHILD to cause those EXIT_ZOMBIE processes to be
+ * autoreaped if reparented.
+ *
+ * Semantically it is also desirable to wait for EXIT_ZOMBIE
+ * processes before allowing the child_reaper to be reaped, as
+ * that gives the invariant that when the init process of a
+ * pid namespace is reaped all of the processes in the pid
+ * namespace are gone.
+ *
+ * Once all of the other tasks are gone from the pid_namespace
+ * free_pid() will awaken this task.
*/
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
diff --git a/kernel/signal.c b/kernel/signal.c
index 5b2396350dd1..e58a6c619824 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1931,7 +1931,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
* This is only possible if parent == real_parent.
* Check if it has changed security domain.
*/
- if (tsk->parent_exec_id != tsk->parent->self_exec_id)
+ if (tsk->parent_exec_id != READ_ONCE(tsk->parent->self_exec_id))
sig = SIGCHLD;
}
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 065aa16f448b..95d12e3d6d95 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -14,6 +14,14 @@ menuconfig KUNIT
if KUNIT
+config KUNIT_DEBUGFS
+ bool "KUnit - Enable /sys/kernel/debug/kunit debugfs representation"
+ help
+ Enable debugfs representation for kunit. Currently this consists
+ of /sys/kernel/debug/kunit//results files for each
+ test suite, which allow users to see results of the last test suite
+ run that occurred.
+
config KUNIT_TEST
tristate "KUnit test for KUnit"
help
diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index fab55649b69a..724b94311ca3 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -5,6 +5,10 @@ kunit-objs += test.o \
assert.o \
try-catch.o
+ifeq ($(CONFIG_KUNIT_DEBUGFS),y)
+kunit-objs += debugfs.o
+endif
+
obj-$(CONFIG_KUNIT_TEST) += kunit-test.o
# string-stream-test compiles built-in only.
diff --git a/lib/kunit/assert.c b/lib/kunit/assert.c
index b24bebca052d..33acdaa28a7d 100644
--- a/lib/kunit/assert.c
+++ b/lib/kunit/assert.c
@@ -6,6 +6,7 @@
* Author: Brendan Higgins
*/
#include
+#include
#include "string-stream.h"
@@ -53,12 +54,12 @@ void kunit_unary_assert_format(const struct kunit_assert *assert,
kunit_base_assert_format(assert, stream);
if (unary_assert->expected_true)
string_stream_add(stream,
- "\tExpected %s to be true, but is false\n",
- unary_assert->condition);
+ KUNIT_SUBTEST_INDENT "Expected %s to be true, but is false\n",
+ unary_assert->condition);
else
string_stream_add(stream,
- "\tExpected %s to be false, but is true\n",
- unary_assert->condition);
+ KUNIT_SUBTEST_INDENT "Expected %s to be false, but is true\n",
+ unary_assert->condition);
kunit_assert_print_msg(assert, stream);
}
EXPORT_SYMBOL_GPL(kunit_unary_assert_format);
@@ -72,13 +73,13 @@ void kunit_ptr_not_err_assert_format(const struct kunit_assert *assert,
kunit_base_assert_format(assert, stream);
if (!ptr_assert->value) {
string_stream_add(stream,
- "\tExpected %s is not null, but is\n",
- ptr_assert->text);
+ KUNIT_SUBTEST_INDENT "Expected %s is not null, but is\n",
+ ptr_assert->text);
} else if (IS_ERR(ptr_assert->value)) {
string_stream_add(stream,
- "\tExpected %s is not error, but is: %ld\n",
- ptr_assert->text,
- PTR_ERR(ptr_assert->value));
+ KUNIT_SUBTEST_INDENT "Expected %s is not error, but is: %ld\n",
+ ptr_assert->text,
+ PTR_ERR(ptr_assert->value));
}
kunit_assert_print_msg(assert, stream);
}
@@ -92,16 +93,16 @@ void kunit_binary_assert_format(const struct kunit_assert *assert,
kunit_base_assert_format(assert, stream);
string_stream_add(stream,
- "\tExpected %s %s %s, but\n",
- binary_assert->left_text,
- binary_assert->operation,
- binary_assert->right_text);
- string_stream_add(stream, "\t\t%s == %lld\n",
- binary_assert->left_text,
- binary_assert->left_value);
- string_stream_add(stream, "\t\t%s == %lld",
- binary_assert->right_text,
- binary_assert->right_value);
+ KUNIT_SUBTEST_INDENT "Expected %s %s %s, but\n",
+ binary_assert->left_text,
+ binary_assert->operation,
+ binary_assert->right_text);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %lld\n",
+ binary_assert->left_text,
+ binary_assert->left_value);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %lld",
+ binary_assert->right_text,
+ binary_assert->right_value);
kunit_assert_print_msg(assert, stream);
}
EXPORT_SYMBOL_GPL(kunit_binary_assert_format);
@@ -114,16 +115,16 @@ void kunit_binary_ptr_assert_format(const struct kunit_assert *assert,
kunit_base_assert_format(assert, stream);
string_stream_add(stream,
- "\tExpected %s %s %s, but\n",
- binary_assert->left_text,
- binary_assert->operation,
- binary_assert->right_text);
- string_stream_add(stream, "\t\t%s == %pK\n",
- binary_assert->left_text,
- binary_assert->left_value);
- string_stream_add(stream, "\t\t%s == %pK",
- binary_assert->right_text,
- binary_assert->right_value);
+ KUNIT_SUBTEST_INDENT "Expected %s %s %s, but\n",
+ binary_assert->left_text,
+ binary_assert->operation,
+ binary_assert->right_text);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %px\n",
+ binary_assert->left_text,
+ binary_assert->left_value);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %px",
+ binary_assert->right_text,
+ binary_assert->right_value);
kunit_assert_print_msg(assert, stream);
}
EXPORT_SYMBOL_GPL(kunit_binary_ptr_assert_format);
@@ -136,16 +137,16 @@ void kunit_binary_str_assert_format(const struct kunit_assert *assert,
kunit_base_assert_format(assert, stream);
string_stream_add(stream,
- "\tExpected %s %s %s, but\n",
- binary_assert->left_text,
- binary_assert->operation,
- binary_assert->right_text);
- string_stream_add(stream, "\t\t%s == %s\n",
- binary_assert->left_text,
- binary_assert->left_value);
- string_stream_add(stream, "\t\t%s == %s",
- binary_assert->right_text,
- binary_assert->right_value);
+ KUNIT_SUBTEST_INDENT "Expected %s %s %s, but\n",
+ binary_assert->left_text,
+ binary_assert->operation,
+ binary_assert->right_text);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %s\n",
+ binary_assert->left_text,
+ binary_assert->left_value);
+ string_stream_add(stream, KUNIT_SUBSUBTEST_INDENT "%s == %s",
+ binary_assert->right_text,
+ binary_assert->right_value);
kunit_assert_print_msg(assert, stream);
}
EXPORT_SYMBOL_GPL(kunit_binary_str_assert_format);
diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
new file mode 100644
index 000000000000..9214c493d8b7
--- /dev/null
+++ b/lib/kunit/debugfs.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates.
+ * Author: Alan Maguire
+ */
+
+#include
+#include
+
+#include
+
+#include "string-stream.h"
+
+#define KUNIT_DEBUGFS_ROOT "kunit"
+#define KUNIT_DEBUGFS_RESULTS "results"
+
+/*
+ * Create a debugfs representation of test suites:
+ *
+ * Path Semantics
+ * /sys/kernel/debug/kunit//results Show results of last run for
+ * testsuite
+ *
+ */
+
+static struct dentry *debugfs_rootdir;
+
+void kunit_debugfs_cleanup(void)
+{
+ debugfs_remove_recursive(debugfs_rootdir);
+}
+
+void kunit_debugfs_init(void)
+{
+ if (!debugfs_rootdir)
+ debugfs_rootdir = debugfs_create_dir(KUNIT_DEBUGFS_ROOT, NULL);
+}
+
+static void debugfs_print_result(struct seq_file *seq,
+ struct kunit_suite *suite,
+ struct kunit_case *test_case)
+{
+ if (!test_case || !test_case->log)
+ return;
+
+ seq_printf(seq, "%s", test_case->log);
+}
+
+/*
+ * /sys/kernel/debug/kunit//results shows all results for testsuite.
+ */
+static int debugfs_print_results(struct seq_file *seq, void *v)
+{
+ struct kunit_suite *suite = (struct kunit_suite *)seq->private;
+ bool success = kunit_suite_has_succeeded(suite);
+ struct kunit_case *test_case;
+
+ if (!suite || !suite->log)
+ return 0;
+
+ seq_printf(seq, "%s", suite->log);
+
+ kunit_suite_for_each_test_case(suite, test_case)
+ debugfs_print_result(seq, suite, test_case);
+
+ seq_printf(seq, "%s %d - %s\n",
+ kunit_status_to_string(success), 1, suite->name);
+ return 0;
+}
+
+static int debugfs_release(struct inode *inode, struct file *file)
+{
+ return single_release(inode, file);
+}
+
+static int debugfs_results_open(struct inode *inode, struct file *file)
+{
+ struct kunit_suite *suite;
+
+ suite = (struct kunit_suite *)inode->i_private;
+
+ return single_open(file, debugfs_print_results, suite);
+}
+
+static const struct file_operations debugfs_results_fops = {
+ .open = debugfs_results_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = debugfs_release,
+};
+
+void kunit_debugfs_create_suite(struct kunit_suite *suite)
+{
+ struct kunit_case *test_case;
+
+ /* Allocate logs before creating debugfs representation. */
+ suite->log = kzalloc(KUNIT_LOG_SIZE, GFP_KERNEL);
+ kunit_suite_for_each_test_case(suite, test_case)
+ test_case->log = kzalloc(KUNIT_LOG_SIZE, GFP_KERNEL);
+
+ suite->debugfs = debugfs_create_dir(suite->name, debugfs_rootdir);
+
+ debugfs_create_file(KUNIT_DEBUGFS_RESULTS, S_IFREG | 0444,
+ suite->debugfs,
+ suite, &debugfs_results_fops);
+}
+
+void kunit_debugfs_destroy_suite(struct kunit_suite *suite)
+{
+ struct kunit_case *test_case;
+
+ debugfs_remove_recursive(suite->debugfs);
+ kfree(suite->log);
+ kunit_suite_for_each_test_case(suite, test_case)
+ kfree(test_case->log);
+}
diff --git a/lib/kunit/debugfs.h b/lib/kunit/debugfs.h
new file mode 100644
index 000000000000..dcc7d7556107
--- /dev/null
+++ b/lib/kunit/debugfs.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020, Oracle and/or its affiliates.
+ */
+
+#ifndef _KUNIT_DEBUGFS_H
+#define _KUNIT_DEBUGFS_H
+
+#include
+
+#ifdef CONFIG_KUNIT_DEBUGFS
+
+void kunit_debugfs_create_suite(struct kunit_suite *suite);
+void kunit_debugfs_destroy_suite(struct kunit_suite *suite);
+void kunit_debugfs_init(void);
+void kunit_debugfs_cleanup(void);
+
+#else
+
+static inline void kunit_debugfs_create_suite(struct kunit_suite *suite) { }
+
+static inline void kunit_debugfs_destroy_suite(struct kunit_suite *suite) { }
+
+static inline void kunit_debugfs_init(void) { }
+
+static inline void kunit_debugfs_cleanup(void) { }
+
+#endif /* CONFIG_KUNIT_DEBUGFS */
+
+#endif /* _KUNIT_DEBUGFS_H */
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index ccb8d2e332f7..4f3d36a72f8f 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -134,7 +134,7 @@ static void kunit_resource_test_init_resources(struct kunit *test)
{
struct kunit_test_resource_context *ctx = test->priv;
- kunit_init_test(&ctx->test, "testing_test_init_test");
+ kunit_init_test(&ctx->test, "testing_test_init_test", NULL);
KUNIT_EXPECT_TRUE(test, list_empty(&ctx->test.resources));
}
@@ -301,7 +301,7 @@ static int kunit_resource_test_init(struct kunit *test)
test->priv = ctx;
- kunit_init_test(&ctx->test, "test_test_context");
+ kunit_init_test(&ctx->test, "test_test_context", NULL);
return 0;
}
@@ -329,6 +329,44 @@ static struct kunit_suite kunit_resource_test_suite = {
.exit = kunit_resource_test_exit,
.test_cases = kunit_resource_test_cases,
};
-kunit_test_suites(&kunit_try_catch_test_suite, &kunit_resource_test_suite);
+
+static void kunit_log_test(struct kunit *test);
+
+static struct kunit_case kunit_log_test_cases[] = {
+ KUNIT_CASE(kunit_log_test),
+ {}
+};
+
+static struct kunit_suite kunit_log_test_suite = {
+ .name = "kunit-log-test",
+ .test_cases = kunit_log_test_cases,
+};
+
+static void kunit_log_test(struct kunit *test)
+{
+ struct kunit_suite *suite = &kunit_log_test_suite;
+
+ kunit_log(KERN_INFO, test, "put this in log.");
+ kunit_log(KERN_INFO, test, "this too.");
+ kunit_log(KERN_INFO, suite, "add to suite log.");
+ kunit_log(KERN_INFO, suite, "along with this.");
+
+#ifdef CONFIG_KUNIT_DEBUGFS
+ KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
+ strstr(test->log, "put this in log."));
+ KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
+ strstr(test->log, "this too."));
+ KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
+ strstr(suite->log, "add to suite log."));
+ KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
+ strstr(suite->log, "along with this."));
+#else
+ KUNIT_EXPECT_PTR_EQ(test, test->log, (char *)NULL);
+ KUNIT_EXPECT_PTR_EQ(test, suite->log, (char *)NULL);
+#endif
+}
+
+kunit_test_suites(&kunit_try_catch_test_suite, &kunit_resource_test_suite,
+ &kunit_log_test_suite);
MODULE_LICENSE("GPL v2");
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 9242f932896c..7a6430a7fca0 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -10,6 +10,7 @@
#include
#include
+#include "debugfs.h"
#include "string-stream.h"
#include "try-catch-impl.h"
@@ -28,73 +29,117 @@ static void kunit_print_tap_version(void)
}
}
-static size_t kunit_test_cases_len(struct kunit_case *test_cases)
+/*
+ * Append formatted message to log, size of which is limited to
+ * KUNIT_LOG_SIZE bytes (including null terminating byte).
+ */
+void kunit_log_append(char *log, const char *fmt, ...)
+{
+ char line[KUNIT_LOG_SIZE];
+ va_list args;
+ int len_left;
+
+ if (!log)
+ return;
+
+ len_left = KUNIT_LOG_SIZE - strlen(log) - 1;
+ if (len_left <= 0)
+ return;
+
+ va_start(args, fmt);
+ vsnprintf(line, sizeof(line), fmt, args);
+ va_end(args);
+
+ strncat(log, line, len_left);
+}
+EXPORT_SYMBOL_GPL(kunit_log_append);
+
+size_t kunit_suite_num_test_cases(struct kunit_suite *suite)
{
struct kunit_case *test_case;
size_t len = 0;
- for (test_case = test_cases; test_case->run_case; test_case++)
+ kunit_suite_for_each_test_case(suite, test_case)
len++;
return len;
}
+EXPORT_SYMBOL_GPL(kunit_suite_num_test_cases);
static void kunit_print_subtest_start(struct kunit_suite *suite)
{
kunit_print_tap_version();
- pr_info("\t# Subtest: %s\n", suite->name);
- pr_info("\t1..%zd\n", kunit_test_cases_len(suite->test_cases));
+ kunit_log(KERN_INFO, suite, KUNIT_SUBTEST_INDENT "# Subtest: %s",
+ suite->name);
+ kunit_log(KERN_INFO, suite, KUNIT_SUBTEST_INDENT "1..%zd",
+ kunit_suite_num_test_cases(suite));
}
-static void kunit_print_ok_not_ok(bool should_indent,
+static void kunit_print_ok_not_ok(void *test_or_suite,
+ bool is_test,
bool is_ok,
size_t test_number,
const char *description)
{
- const char *indent, *ok_not_ok;
+ struct kunit_suite *suite = is_test ? NULL : test_or_suite;
+ struct kunit *test = is_test ? test_or_suite : NULL;
- if (should_indent)
- indent = "\t";
+ /*
+ * We do not log the test suite results as doing so would
+ * mean debugfs display would consist of the test suite
+ * description and status prior to individual test results.
+ * Hence directly printk the suite status, and we will
+ * separately seq_printf() the suite status for the debugfs
+ * representation.
+ */
+ if (suite)
+ pr_info("%s %zd - %s",
+ kunit_status_to_string(is_ok),
+ test_number, description);
else
- indent = "";
-
- if (is_ok)
- ok_not_ok = "ok";
- else
- ok_not_ok = "not ok";
-
- pr_info("%s%s %zd - %s\n", indent, ok_not_ok, test_number, description);
+ kunit_log(KERN_INFO, test, KUNIT_SUBTEST_INDENT "%s %zd - %s",
+ kunit_status_to_string(is_ok),
+ test_number, description);
}
-static bool kunit_suite_has_succeeded(struct kunit_suite *suite)
+bool kunit_suite_has_succeeded(struct kunit_suite *suite)
{
const struct kunit_case *test_case;
- for (test_case = suite->test_cases; test_case->run_case; test_case++)
+ kunit_suite_for_each_test_case(suite, test_case) {
if (!test_case->success)
return false;
+ }
return true;
}
+EXPORT_SYMBOL_GPL(kunit_suite_has_succeeded);
static void kunit_print_subtest_end(struct kunit_suite *suite)
{
static size_t kunit_suite_counter = 1;
- kunit_print_ok_not_ok(false,
+ kunit_print_ok_not_ok((void *)suite, false,
kunit_suite_has_succeeded(suite),
kunit_suite_counter++,
suite->name);
}
-static void kunit_print_test_case_ok_not_ok(struct kunit_case *test_case,
- size_t test_number)
+unsigned int kunit_test_case_num(struct kunit_suite *suite,
+ struct kunit_case *test_case)
{
- kunit_print_ok_not_ok(true,
- test_case->success,
- test_number,
- test_case->name);
+ struct kunit_case *tc;
+ unsigned int i = 1;
+
+ kunit_suite_for_each_test_case(suite, tc) {
+ if (tc == test_case)
+ return i;
+ i++;
+ }
+
+ return 0;
}
+EXPORT_SYMBOL_GPL(kunit_test_case_num);
static void kunit_print_string_stream(struct kunit *test,
struct string_stream *stream)
@@ -102,6 +147,9 @@ static void kunit_print_string_stream(struct kunit *test,
struct string_stream_fragment *fragment;
char *buf;
+ if (string_stream_is_empty(stream))
+ return;
+
buf = string_stream_get_string(stream);
if (!buf) {
kunit_err(test,
@@ -175,11 +223,14 @@ void kunit_do_assertion(struct kunit *test,
}
EXPORT_SYMBOL_GPL(kunit_do_assertion);
-void kunit_init_test(struct kunit *test, const char *name)
+void kunit_init_test(struct kunit *test, const char *name, char *log)
{
spin_lock_init(&test->lock);
INIT_LIST_HEAD(&test->resources);
test->name = name;
+ test->log = log;
+ if (test->log)
+ test->log[0] = '\0';
test->success = true;
}
EXPORT_SYMBOL_GPL(kunit_init_test);
@@ -290,7 +341,7 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
struct kunit_try_catch *try_catch;
struct kunit test;
- kunit_init_test(&test, test_case->name);
+ kunit_init_test(&test, test_case->name, test_case->log);
try_catch = &test.try_catch;
kunit_try_catch_init(try_catch,
@@ -303,19 +354,20 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
kunit_try_catch_run(try_catch, &context);
test_case->success = test.success;
+
+ kunit_print_ok_not_ok(&test, true, test_case->success,
+ kunit_test_case_num(suite, test_case),
+ test_case->name);
}
int kunit_run_tests(struct kunit_suite *suite)
{
struct kunit_case *test_case;
- size_t test_case_count = 1;
kunit_print_subtest_start(suite);
- for (test_case = suite->test_cases; test_case->run_case; test_case++) {
+ kunit_suite_for_each_test_case(suite, test_case)
kunit_run_case_catch_errors(suite, test_case);
- kunit_print_test_case_ok_not_ok(test_case, test_case_count++);
- }
kunit_print_subtest_end(suite);
@@ -323,6 +375,37 @@ int kunit_run_tests(struct kunit_suite *suite)
}
EXPORT_SYMBOL_GPL(kunit_run_tests);
+static void kunit_init_suite(struct kunit_suite *suite)
+{
+ kunit_debugfs_create_suite(suite);
+}
+
+int __kunit_test_suites_init(struct kunit_suite **suites)
+{
+ unsigned int i;
+
+ for (i = 0; suites[i] != NULL; i++) {
+ kunit_init_suite(suites[i]);
+ kunit_run_tests(suites[i]);
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__kunit_test_suites_init);
+
+static void kunit_exit_suite(struct kunit_suite *suite)
+{
+ kunit_debugfs_destroy_suite(suite);
+}
+
+void __kunit_test_suites_exit(struct kunit_suite **suites)
+{
+ unsigned int i;
+
+ for (i = 0; suites[i] != NULL; i++)
+ kunit_exit_suite(suites[i]);
+}
+EXPORT_SYMBOL_GPL(__kunit_test_suites_exit);
+
struct kunit_resource *kunit_alloc_and_get_resource(struct kunit *test,
kunit_resource_init_t init,
kunit_resource_free_t free,
@@ -489,12 +572,15 @@ EXPORT_SYMBOL_GPL(kunit_cleanup);
static int __init kunit_init(void)
{
+ kunit_debugfs_init();
+
return 0;
}
late_initcall(kunit_init);
static void __exit kunit_exit(void)
{
+ kunit_debugfs_cleanup();
}
module_exit(kunit_exit);
diff --git a/lib/list-test.c b/lib/list-test.c
index 76babb1df889..ee09505df16f 100644
--- a/lib/list-test.c
+++ b/lib/list-test.c
@@ -659,7 +659,7 @@ static void list_test_list_for_each_prev_safe(struct kunit *test)
static void list_test_list_for_each_entry(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
@@ -680,7 +680,7 @@ static void list_test_list_for_each_entry(struct kunit *test)
static void list_test_list_for_each_entry_reverse(struct kunit *test)
{
struct list_test_struct entries[5], *cur;
- static LIST_HEAD(list);
+ LIST_HEAD(list);
int i = 0;
for (i = 0; i < 5; ++i) {
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index c8fa1d274530..2ee6ae3b0ade 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -55,14 +55,6 @@ struct kmem_cache *radix_tree_node_cachep;
RADIX_TREE_MAP_SHIFT))
#define IDR_PRELOAD_SIZE (IDR_MAX_PATH * 2 - 1)
-/*
- * The IDA is even shorter since it uses a bitmap at the last level.
- */
-#define IDA_INDEX_BITS (8 * sizeof(int) - 1 - ilog2(IDA_BITMAP_BITS))
-#define IDA_MAX_PATH (DIV_ROUND_UP(IDA_INDEX_BITS, \
- RADIX_TREE_MAP_SHIFT))
-#define IDA_PRELOAD_SIZE (IDA_MAX_PATH * 2 - 1)
-
/*
* Per-cpu pool of preloaded nodes
*/
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 55c14e8c8859..d4f97925dbd8 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -12,6 +12,9 @@
static unsigned int tests_run;
static unsigned int tests_passed;
+static const unsigned int order_limit =
+ IS_ENABLED(CONFIG_XARRAY_MULTI) ? BITS_PER_LONG : 1;
+
#ifndef XA_DEBUG
# ifdef __KERNEL__
void xa_dump(const struct xarray *xa) { }
@@ -959,6 +962,20 @@ static noinline void check_multi_find_2(struct xarray *xa)
}
}
+static noinline void check_multi_find_3(struct xarray *xa)
+{
+ unsigned int order;
+
+ for (order = 5; order < order_limit; order++) {
+ unsigned long index = 1UL << (order - 5);
+
+ XA_BUG_ON(xa, !xa_empty(xa));
+ xa_store_order(xa, 0, order - 4, xa_mk_index(0), GFP_KERNEL);
+ XA_BUG_ON(xa, xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT));
+ xa_erase_index(xa, 0);
+ }
+}
+
static noinline void check_find_1(struct xarray *xa)
{
unsigned long i, j, k;
@@ -1081,6 +1098,7 @@ static noinline void check_find(struct xarray *xa)
for (i = 2; i < 10; i++)
check_multi_find_1(xa, i);
check_multi_find_2(xa);
+ check_multi_find_3(xa);
}
/* See find_swap_entry() in mm/shmem.c */
@@ -1138,6 +1156,42 @@ static noinline void check_find_entry(struct xarray *xa)
XA_BUG_ON(xa, !xa_empty(xa));
}
+static noinline void check_pause(struct xarray *xa)
+{
+ XA_STATE(xas, xa, 0);
+ void *entry;
+ unsigned int order;
+ unsigned long index = 1;
+ unsigned int count = 0;
+
+ for (order = 0; order < order_limit; order++) {
+ XA_BUG_ON(xa, xa_store_order(xa, index, order,
+ xa_mk_index(index), GFP_KERNEL));
+ index += 1UL << order;
+ }
+
+ rcu_read_lock();
+ xas_for_each(&xas, entry, ULONG_MAX) {
+ XA_BUG_ON(xa, entry != xa_mk_index(1UL << count));
+ count++;
+ }
+ rcu_read_unlock();
+ XA_BUG_ON(xa, count != order_limit);
+
+ count = 0;
+ xas_set(&xas, 0);
+ rcu_read_lock();
+ xas_for_each(&xas, entry, ULONG_MAX) {
+ XA_BUG_ON(xa, entry != xa_mk_index(1UL << count));
+ count++;
+ xas_pause(&xas);
+ }
+ rcu_read_unlock();
+ XA_BUG_ON(xa, count != order_limit);
+
+ xa_destroy(xa);
+}
+
static noinline void check_move_tiny(struct xarray *xa)
{
XA_STATE(xas, xa, 0);
@@ -1646,6 +1700,7 @@ static int xarray_checks(void)
check_xa_alloc();
check_find(&array);
check_find_entry(&array);
+ check_pause(&array);
check_account(&array);
check_destroy(&array);
check_move(&array);
diff --git a/lib/xarray.c b/lib/xarray.c
index 1d9fab7db8da..e9e641d3c0c3 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -970,7 +970,7 @@ void xas_pause(struct xa_state *xas)
xas->xa_node = XAS_RESTART;
if (node) {
- unsigned int offset = xas->xa_offset;
+ unsigned long offset = xas->xa_offset;
while (++offset < XA_CHUNK_SIZE) {
if (!xa_is_sibling(xa_entry(xas->xa, node, offset)))
break;
@@ -1208,6 +1208,8 @@ void *xas_find_marked(struct xa_state *xas, unsigned long max, xa_mark_t mark)
}
entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
+ if (!entry && !(xa_track_free(xas->xa) && mark == XA_FREE_MARK))
+ continue;
if (!xa_is_node(entry))
return entry;
xas->xa_node = xa_to_node(entry);
@@ -1836,10 +1838,11 @@ static bool xas_sibling(struct xa_state *xas)
struct xa_node *node = xas->xa_node;
unsigned long mask;
- if (!node)
+ if (!IS_ENABLED(CONFIG_XARRAY_MULTI) || !node)
return false;
mask = (XA_CHUNK_SIZE << node->shift) - 1;
- return (xas->xa_index & mask) > (xas->xa_offset << node->shift);
+ return (xas->xa_index & mask) >
+ ((unsigned long)xas->xa_offset << node->shift);
}
/**
diff --git a/mm/hmm.c b/mm/hmm.c
index 72e5a6d9a417..280585833adf 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -28,41 +28,25 @@
struct hmm_vma_walk {
struct hmm_range *range;
- struct dev_pagemap *pgmap;
unsigned long last;
- unsigned int flags;
};
-static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
- bool write_fault, uint64_t *pfn)
+enum {
+ HMM_NEED_FAULT = 1 << 0,
+ HMM_NEED_WRITE_FAULT = 1 << 1,
+ HMM_NEED_ALL_BITS = HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT,
+};
+
+/*
+ * hmm_device_entry_from_pfn() - create a valid device entry value from pfn
+ * @range: range use to encode HMM pfn value
+ * @pfn: pfn value for which to create the device entry
+ * Return: valid device entry for the pfn
+ */
+static uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
+ unsigned long pfn)
{
- unsigned int flags = FAULT_FLAG_REMOTE;
- struct hmm_vma_walk *hmm_vma_walk = walk->private;
- struct hmm_range *range = hmm_vma_walk->range;
- struct vm_area_struct *vma = walk->vma;
- vm_fault_t ret;
-
- if (!vma)
- goto err;
-
- if (hmm_vma_walk->flags & HMM_FAULT_ALLOW_RETRY)
- flags |= FAULT_FLAG_ALLOW_RETRY;
- if (write_fault)
- flags |= FAULT_FLAG_WRITE;
-
- ret = handle_mm_fault(vma, addr, flags);
- if (ret & VM_FAULT_RETRY) {
- /* Note, handle_mm_fault did up_read(&mm->mmap_sem)) */
- return -EAGAIN;
- }
- if (ret & VM_FAULT_ERROR)
- goto err;
-
- return -EBUSY;
-
-err:
- *pfn = range->values[HMM_PFN_ERROR];
- return -EFAULT;
+ return (pfn << range->pfn_shift) | range->flags[HMM_PFN_VALID];
}
static int hmm_pfns_fill(unsigned long addr, unsigned long end,
@@ -79,56 +63,43 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
}
/*
- * hmm_vma_walk_hole_() - handle a range lacking valid pmd or pte(s)
+ * hmm_vma_fault() - fault in a range lacking valid pmd or pte(s)
* @addr: range virtual start address (inclusive)
* @end: range virtual end address (exclusive)
- * @fault: should we fault or not ?
- * @write_fault: write fault ?
+ * @required_fault: HMM_NEED_* flags
* @walk: mm_walk structure
- * Return: 0 on success, -EBUSY after page fault, or page fault error
+ * Return: -EBUSY after page fault, or page fault error
*
* This function will be called whenever pmd_none() or pte_none() returns true,
* or whenever there is no page directory covering the virtual address range.
*/
-static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
- bool fault, bool write_fault,
- struct mm_walk *walk)
+static int hmm_vma_fault(unsigned long addr, unsigned long end,
+ unsigned int required_fault, struct mm_walk *walk)
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
- struct hmm_range *range = hmm_vma_walk->range;
- uint64_t *pfns = range->pfns;
- unsigned long i;
+ struct vm_area_struct *vma = walk->vma;
+ unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ WARN_ON_ONCE(!required_fault);
hmm_vma_walk->last = addr;
- i = (addr - range->start) >> PAGE_SHIFT;
- if (write_fault && walk->vma && !(walk->vma->vm_flags & VM_WRITE))
- return -EPERM;
-
- for (; addr < end; addr += PAGE_SIZE, i++) {
- pfns[i] = range->values[HMM_PFN_NONE];
- if (fault || write_fault) {
- int ret;
-
- ret = hmm_vma_do_fault(walk, addr, write_fault,
- &pfns[i]);
- if (ret != -EBUSY)
- return ret;
- }
+ if (required_fault & HMM_NEED_WRITE_FAULT) {
+ if (!(vma->vm_flags & VM_WRITE))
+ return -EPERM;
+ fault_flags |= FAULT_FLAG_WRITE;
}
- return (fault || write_fault) ? -EBUSY : 0;
+ for (; addr < end; addr += PAGE_SIZE)
+ if (handle_mm_fault(vma, addr, fault_flags) & VM_FAULT_ERROR)
+ return -EFAULT;
+ return -EBUSY;
}
-static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
- uint64_t pfns, uint64_t cpu_flags,
- bool *fault, bool *write_fault)
+static unsigned int hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
+ uint64_t pfns, uint64_t cpu_flags)
{
struct hmm_range *range = hmm_vma_walk->range;
- if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT)
- return;
-
/*
* So we not only consider the individual per page request we also
* consider the default flags requested for the range. The API can
@@ -143,46 +114,44 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
/* We aren't ask to do anything ... */
if (!(pfns & range->flags[HMM_PFN_VALID]))
- return;
- /* If this is device memory then only fault if explicitly requested */
- if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
- /* Do we fault on device memory ? */
- if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
- *write_fault = pfns & range->flags[HMM_PFN_WRITE];
- *fault = true;
- }
- return;
- }
+ return 0;
- /* If CPU page table is not valid then we need to fault */
- *fault = !(cpu_flags & range->flags[HMM_PFN_VALID]);
/* Need to write fault ? */
if ((pfns & range->flags[HMM_PFN_WRITE]) &&
- !(cpu_flags & range->flags[HMM_PFN_WRITE])) {
- *write_fault = true;
- *fault = true;
- }
+ !(cpu_flags & range->flags[HMM_PFN_WRITE]))
+ return HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT;
+
+ /* If CPU page table is not valid then we need to fault */
+ if (!(cpu_flags & range->flags[HMM_PFN_VALID]))
+ return HMM_NEED_FAULT;
+ return 0;
}
-static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
- const uint64_t *pfns, unsigned long npages,
- uint64_t cpu_flags, bool *fault,
- bool *write_fault)
+static unsigned int
+hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
+ const uint64_t *pfns, unsigned long npages,
+ uint64_t cpu_flags)
{
+ struct hmm_range *range = hmm_vma_walk->range;
+ unsigned int required_fault = 0;
unsigned long i;
- if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT) {
- *fault = *write_fault = false;
- return;
- }
+ /*
+ * If the default flags do not request to fault pages, and the mask does
+ * not allow for individual pages to be faulted, then
+ * hmm_pte_need_fault() will always return 0.
+ */
+ if (!((range->default_flags | range->pfn_flags_mask) &
+ range->flags[HMM_PFN_VALID]))
+ return 0;
- *fault = *write_fault = false;
for (i = 0; i < npages; ++i) {
- hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags,
- fault, write_fault);
- if ((*write_fault))
- return;
+ required_fault |=
+ hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags);
+ if (required_fault == HMM_NEED_ALL_BITS)
+ return required_fault;
}
+ return required_fault;
}
static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
@@ -190,16 +159,23 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
- bool fault, write_fault;
+ unsigned int required_fault;
unsigned long i, npages;
uint64_t *pfns;
i = (addr - range->start) >> PAGE_SHIFT;
npages = (end - addr) >> PAGE_SHIFT;
pfns = &range->pfns[i];
- hmm_range_need_fault(hmm_vma_walk, pfns, npages,
- 0, &fault, &write_fault);
- return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
+ required_fault = hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0);
+ if (!walk->vma) {
+ if (required_fault)
+ return -EFAULT;
+ return hmm_pfns_fill(addr, end, range, HMM_PFN_ERROR);
+ }
+ if (required_fault)
+ return hmm_vma_fault(addr, end, required_fault, walk);
+ hmm_vma_walk->last = addr;
+ return hmm_pfns_fill(addr, end, range, HMM_PFN_NONE);
}
static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
@@ -218,31 +194,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
unsigned long pfn, npages, i;
- bool fault, write_fault;
+ unsigned int required_fault;
uint64_t cpu_flags;
npages = (end - addr) >> PAGE_SHIFT;
cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);
- hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
- &fault, &write_fault);
-
- if (pmd_protnone(pmd) || fault || write_fault)
- return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
+ required_fault =
+ hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags);
+ if (required_fault)
+ return hmm_vma_fault(addr, end, required_fault, walk);
pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
- if (pmd_devmap(pmd)) {
- hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
- hmm_vma_walk->pgmap);
- if (unlikely(!hmm_vma_walk->pgmap))
- return -EBUSY;
- }
+ for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
- }
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
hmm_vma_walk->last = end;
return 0;
}
@@ -252,6 +216,14 @@ int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
unsigned long end, uint64_t *pfns, pmd_t pmd);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static inline bool hmm_is_device_private_entry(struct hmm_range *range,
+ swp_entry_t entry)
+{
+ return is_device_private_entry(entry) &&
+ device_private_entry_to_page(entry)->pgmap->owner ==
+ range->dev_private_owner;
+}
+
static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
{
if (pte_none(pte) || !pte_present(pte) || pte_protnone(pte))
@@ -267,102 +239,81 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
- bool fault, write_fault;
+ unsigned int required_fault;
uint64_t cpu_flags;
pte_t pte = *ptep;
uint64_t orig_pfn = *pfn;
- *pfn = range->values[HMM_PFN_NONE];
- fault = write_fault = false;
-
if (pte_none(pte)) {
- hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0,
- &fault, &write_fault);
- if (fault || write_fault)
+ required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0);
+ if (required_fault)
goto fault;
+ *pfn = range->values[HMM_PFN_NONE];
return 0;
}
if (!pte_present(pte)) {
swp_entry_t entry = pte_to_swp_entry(pte);
- if (!non_swap_entry(entry)) {
- cpu_flags = pte_to_hmm_pfn_flags(range, pte);
- hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
- &fault, &write_fault);
- if (fault || write_fault)
- goto fault;
+ /*
+ * Never fault in device private pages pages, but just report
+ * the PFN even if not present.
+ */
+ if (hmm_is_device_private_entry(range, entry)) {
+ *pfn = hmm_device_entry_from_pfn(range,
+ device_private_entry_to_pfn(entry));
+ *pfn |= range->flags[HMM_PFN_VALID];
+ if (is_write_device_private_entry(entry))
+ *pfn |= range->flags[HMM_PFN_WRITE];
return 0;
}
- /*
- * This is a special swap entry, ignore migration, use
- * device and report anything else as error.
- */
- if (is_device_private_entry(entry)) {
- cpu_flags = range->flags[HMM_PFN_VALID] |
- range->flags[HMM_PFN_DEVICE_PRIVATE];
- cpu_flags |= is_write_device_private_entry(entry) ?
- range->flags[HMM_PFN_WRITE] : 0;
- hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
- &fault, &write_fault);
- if (fault || write_fault)
- goto fault;
- *pfn = hmm_device_entry_from_pfn(range,
- swp_offset(entry));
- *pfn |= cpu_flags;
+ required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0);
+ if (!required_fault) {
+ *pfn = range->values[HMM_PFN_NONE];
return 0;
}
+ if (!non_swap_entry(entry))
+ goto fault;
+
if (is_migration_entry(entry)) {
- if (fault || write_fault) {
- pte_unmap(ptep);
- hmm_vma_walk->last = addr;
- migration_entry_wait(walk->mm, pmdp, addr);
- return -EBUSY;
- }
- return 0;
+ pte_unmap(ptep);
+ hmm_vma_walk->last = addr;
+ migration_entry_wait(walk->mm, pmdp, addr);
+ return -EBUSY;
}
/* Report error for everything else */
- *pfn = range->values[HMM_PFN_ERROR];
+ pte_unmap(ptep);
return -EFAULT;
- } else {
- cpu_flags = pte_to_hmm_pfn_flags(range, pte);
- hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
- &fault, &write_fault);
}
- if (fault || write_fault)
+ cpu_flags = pte_to_hmm_pfn_flags(range, pte);
+ required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags);
+ if (required_fault)
goto fault;
- if (pte_devmap(pte)) {
- hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
- hmm_vma_walk->pgmap);
- if (unlikely(!hmm_vma_walk->pgmap))
- return -EBUSY;
- } else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
- if (!is_zero_pfn(pte_pfn(pte))) {
- *pfn = range->values[HMM_PFN_SPECIAL];
+ /*
+ * Since each architecture defines a struct page for the zero page, just
+ * fall through and treat it like a normal page.
+ */
+ if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+ if (hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0)) {
+ pte_unmap(ptep);
return -EFAULT;
}
- /*
- * Since each architecture defines a struct page for the zero
- * page, just fall through and treat it like a normal page.
- */
+ *pfn = range->values[HMM_PFN_SPECIAL];
+ return 0;
}
*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
return 0;
fault:
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
pte_unmap(ptep);
/* Fault any virtual address we were asked to fault */
- return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
+ return hmm_vma_fault(addr, end, required_fault, walk);
}
static int hmm_vma_walk_pmd(pmd_t *pmdp,
@@ -372,8 +323,9 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
- uint64_t *pfns = range->pfns;
- unsigned long addr = start, i;
+ uint64_t *pfns = &range->pfns[(start - range->start) >> PAGE_SHIFT];
+ unsigned long npages = (end - start) >> PAGE_SHIFT;
+ unsigned long addr = start;
pte_t *ptep;
pmd_t pmd;
@@ -383,24 +335,19 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return hmm_vma_walk_hole(start, end, -1, walk);
if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
- bool fault, write_fault;
- unsigned long npages;
- uint64_t *pfns;
-
- i = (addr - range->start) >> PAGE_SHIFT;
- npages = (end - addr) >> PAGE_SHIFT;
- pfns = &range->pfns[i];
-
- hmm_range_need_fault(hmm_vma_walk, pfns, npages,
- 0, &fault, &write_fault);
- if (fault || write_fault) {
+ if (hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0)) {
hmm_vma_walk->last = addr;
pmd_migration_entry_wait(walk->mm, pmdp);
return -EBUSY;
}
- return 0;
- } else if (!pmd_present(pmd))
+ return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
+ }
+
+ if (!pmd_present(pmd)) {
+ if (hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0))
+ return -EFAULT;
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+ }
if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
/*
@@ -417,8 +364,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
goto again;
- i = (addr - range->start) >> PAGE_SHIFT;
- return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
+ return hmm_vma_handle_pmd(walk, addr, end, pfns, pmd);
}
/*
@@ -427,31 +373,23 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
* entry pointing to pte directory or it is a bad pmd that will not
* recover.
*/
- if (pmd_bad(pmd))
+ if (pmd_bad(pmd)) {
+ if (hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0))
+ return -EFAULT;
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+ }
ptep = pte_offset_map(pmdp, addr);
- i = (addr - range->start) >> PAGE_SHIFT;
- for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
+ for (; addr < end; addr += PAGE_SIZE, ptep++, pfns++) {
int r;
- r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
+ r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, pfns);
if (r) {
- /* hmm_vma_handle_pte() did unmap pte directory */
+ /* hmm_vma_handle_pte() did pte_unmap() */
hmm_vma_walk->last = addr;
return r;
}
}
- if (hmm_vma_walk->pgmap) {
- /*
- * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
- * so that we can leverage get_dev_pagemap() optimization which
- * will not re-take a reference on a pgmap if we already have
- * one.
- */
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
pte_unmap(ptep - 1);
hmm_vma_walk->last = addr;
@@ -487,18 +425,18 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
pud = READ_ONCE(*pudp);
if (pud_none(pud)) {
- ret = hmm_vma_walk_hole(start, end, -1, walk);
- goto out_unlock;
+ spin_unlock(ptl);
+ return hmm_vma_walk_hole(start, end, -1, walk);
}
if (pud_huge(pud) && pud_devmap(pud)) {
unsigned long i, npages, pfn;
+ unsigned int required_fault;
uint64_t *pfns, cpu_flags;
- bool fault, write_fault;
if (!pud_present(pud)) {
- ret = hmm_vma_walk_hole(start, end, -1, walk);
- goto out_unlock;
+ spin_unlock(ptl);
+ return hmm_vma_walk_hole(start, end, -1, walk);
}
i = (addr - range->start) >> PAGE_SHIFT;
@@ -506,29 +444,17 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
pfns = &range->pfns[i];
cpu_flags = pud_to_hmm_pfn_flags(range, pud);
- hmm_range_need_fault(hmm_vma_walk, pfns, npages,
- cpu_flags, &fault, &write_fault);
- if (fault || write_fault) {
- ret = hmm_vma_walk_hole_(addr, end, fault,
- write_fault, walk);
- goto out_unlock;
+ required_fault = hmm_range_need_fault(hmm_vma_walk, pfns,
+ npages, cpu_flags);
+ if (required_fault) {
+ spin_unlock(ptl);
+ return hmm_vma_fault(addr, end, required_fault, walk);
}
pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- for (i = 0; i < npages; ++i, ++pfn) {
- hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
- hmm_vma_walk->pgmap);
- if (unlikely(!hmm_vma_walk->pgmap)) {
- ret = -EBUSY;
- goto out_unlock;
- }
+ for (i = 0; i < npages; ++i, ++pfn)
pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
cpu_flags;
- }
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
hmm_vma_walk->last = end;
goto out_unlock;
}
@@ -554,24 +480,20 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
struct hmm_range *range = hmm_vma_walk->range;
struct vm_area_struct *vma = walk->vma;
uint64_t orig_pfn, cpu_flags;
- bool fault, write_fault;
+ unsigned int required_fault;
spinlock_t *ptl;
pte_t entry;
- int ret = 0;
ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
entry = huge_ptep_get(pte);
i = (start - range->start) >> PAGE_SHIFT;
orig_pfn = range->pfns[i];
- range->pfns[i] = range->values[HMM_PFN_NONE];
cpu_flags = pte_to_hmm_pfn_flags(range, entry);
- fault = write_fault = false;
- hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
- &fault, &write_fault);
- if (fault || write_fault) {
- ret = -ENOENT;
- goto unlock;
+ required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags);
+ if (required_fault) {
+ spin_unlock(ptl);
+ return hmm_vma_fault(addr, end, required_fault, walk);
}
pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
@@ -579,14 +501,8 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
cpu_flags;
hmm_vma_walk->last = end;
-
-unlock:
spin_unlock(ptl);
-
- if (ret == -ENOENT)
- return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
-
- return ret;
+ return 0;
}
#else
#define hmm_vma_walk_hugetlb_entry NULL
@@ -599,40 +515,32 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
struct hmm_range *range = hmm_vma_walk->range;
struct vm_area_struct *vma = walk->vma;
+ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) &&
+ vma->vm_flags & VM_READ)
+ return 0;
+
/*
- * Skip vma ranges that don't have struct page backing them or
- * map I/O devices directly.
+ * vma ranges that don't have struct page backing them or map I/O
+ * devices directly cannot be handled by hmm_range_fault().
+ *
+ * If the vma does not allow read access, then assume that it does not
+ * allow write access either. HMM does not support architectures that
+ * allow write without read.
+ *
+ * If a fault is requested for an unsupported range then it is a hard
+ * failure.
*/
- if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
+ if (hmm_range_need_fault(hmm_vma_walk,
+ range->pfns +
+ ((start - range->start) >> PAGE_SHIFT),
+ (end - start) >> PAGE_SHIFT, 0))
return -EFAULT;
- /*
- * If the vma does not allow read access, then assume that it does not
- * allow write access either. HMM does not support architectures
- * that allow write without read.
- */
- if (!(vma->vm_flags & VM_READ)) {
- bool fault, write_fault;
+ hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+ hmm_vma_walk->last = end;
- /*
- * Check to see if a fault is requested for any page in the
- * range.
- */
- hmm_range_need_fault(hmm_vma_walk, range->pfns +
- ((start - range->start) >> PAGE_SHIFT),
- (end - start) >> PAGE_SHIFT,
- 0, &fault, &write_fault);
- if (fault || write_fault)
- return -EFAULT;
-
- hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
- hmm_vma_walk->last = end;
-
- /* Skip this vma and continue processing the next vma. */
- return 1;
- }
-
- return 0;
+ /* Skip this vma and continue processing the next vma. */
+ return 1;
}
static const struct mm_walk_ops hmm_walk_ops = {
@@ -645,8 +553,7 @@ static const struct mm_walk_ops hmm_walk_ops = {
/**
* hmm_range_fault - try to fault some address in a virtual address range
- * @range: range being faulted
- * @flags: HMM_FAULT_* flags
+ * @range: argument structure
*
* Return: the number of valid pages in range->pfns[] (from range start
* address), which may be zero. On error one of the following status codes
@@ -657,26 +564,19 @@ static const struct mm_walk_ops hmm_walk_ops = {
* -ENOMEM: Out of memory.
* -EPERM: Invalid permission (e.g., asking for write and range is read
* only).
- * -EAGAIN: A page fault needs to be retried and mmap_sem was dropped.
* -EBUSY: The range has been invalidated and the caller needs to wait for
* the invalidation to finish.
- * -EFAULT: Invalid (i.e., either no valid vma or it is illegal to access
- * that range) number of valid pages in range->pfns[] (from
- * range start address).
+ * -EFAULT: A page was requested to be valid and could not be made valid
+ * ie it has no backing VMA or it is illegal to access
*
- * This is similar to a regular CPU page fault except that it will not trigger
- * any memory migration if the memory being faulted is not accessible by CPUs
- * and caller does not ask for migration.
- *
- * On error, for one virtual address in the range, the function will mark the
- * corresponding HMM pfn entry with an error flag.
+ * This is similar to get_user_pages(), except that it can read the page tables
+ * without mutating them (ie causing faults).
*/
-long hmm_range_fault(struct hmm_range *range, unsigned int flags)
+long hmm_range_fault(struct hmm_range *range)
{
struct hmm_vma_walk hmm_vma_walk = {
.range = range,
.last = range->start,
- .flags = flags,
};
struct mm_struct *mm = range->notifier->mm;
int ret;
diff --git a/mm/memremap.c b/mm/memremap.c
index 09b5b7adc773..9b2c97ceb775 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -181,6 +181,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
WARN(1, "Missing migrate_to_ram method\n");
return ERR_PTR(-EINVAL);
}
+ if (!pgmap->owner) {
+ WARN(1, "Missing owner\n");
+ return ERR_PTR(-EINVAL);
+ }
break;
case MEMORY_DEVICE_FS_DAX:
if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
diff --git a/mm/migrate.c b/mm/migrate.c
index b1092876e537..7605d2c23433 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2241,7 +2241,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
arch_enter_lazy_mmu_mode();
for (; addr < end; addr += PAGE_SIZE, ptep++) {
- unsigned long mpfn, pfn;
+ unsigned long mpfn = 0, pfn;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -2255,8 +2255,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}
if (!pte_present(pte)) {
- mpfn = 0;
-
/*
* Only care about unaddressable device page special
* page table entry. Other special swap entries are not
@@ -2267,11 +2265,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
goto next;
page = device_private_entry_to_page(entry);
+ if (page->pgmap->owner != migrate->src_owner)
+ goto next;
+
mpfn = migrate_pfn(page_to_pfn(page)) |
MIGRATE_PFN_MIGRATE;
if (is_write_device_private_entry(entry))
mpfn |= MIGRATE_PFN_WRITE;
} else {
+ if (migrate->src_owner)
+ goto next;
pfn = pte_pfn(pte);
if (is_zero_pfn(pfn)) {
mpfn = MIGRATE_PFN_MIGRATE;
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index de41e830cdac..74e957e302fe 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -206,7 +206,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
if (!mm || IS_ERR(mm)) {
rc = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
/*
- * Explicitly map EACCES to EPERM as EPERM is a more a
+ * Explicitly map EACCES to EPERM as EPERM is a more
* appropriate error code for process_vw_readv/writev
*/
if (rc == -EACCES)
diff --git a/tools/testing/kunit/.gitattributes b/tools/testing/kunit/.gitattributes
new file mode 100644
index 000000000000..5b7da1fc3b8f
--- /dev/null
+++ b/tools/testing/kunit/.gitattributes
@@ -0,0 +1 @@
+test_data/* binary
diff --git a/tools/testing/kunit/configs/broken_on_uml.config b/tools/testing/kunit/configs/broken_on_uml.config
new file mode 100644
index 000000000000..239b9f03da2c
--- /dev/null
+++ b/tools/testing/kunit/configs/broken_on_uml.config
@@ -0,0 +1,41 @@
+# These are currently broken on UML and prevent allyesconfig from building
+# CONFIG_STATIC_LINK is not set
+# CONFIG_UML_NET_VECTOR is not set
+# CONFIG_UML_NET_VDE is not set
+# CONFIG_UML_NET_PCAP is not set
+# CONFIG_NET_PTP_CLASSIFY is not set
+# CONFIG_IP_VS is not set
+# CONFIG_BRIDGE_EBT_BROUTE is not set
+# CONFIG_BRIDGE_EBT_T_FILTER is not set
+# CONFIG_BRIDGE_EBT_T_NAT is not set
+# CONFIG_MTD_NAND_CADENCE is not set
+# CONFIG_MTD_NAND_NANDSIM is not set
+# CONFIG_BLK_DEV_NULL_BLK is not set
+# CONFIG_BLK_DEV_RAM is not set
+# CONFIG_SCSI_DEBUG is not set
+# CONFIG_NET_VENDOR_XILINX is not set
+# CONFIG_NULL_TTY is not set
+# CONFIG_PTP_1588_CLOCK is not set
+# CONFIG_PINCTRL_EQUILIBRIUM is not set
+# CONFIG_DMABUF_SELFTESTS is not set
+# CONFIG_COMEDI is not set
+# CONFIG_XIL_AXIS_FIFO is not set
+# CONFIG_EXFAT_FS is not set
+# CONFIG_STM_DUMMY is not set
+# CONFIG_FSI_MASTER_ASPEED is not set
+# CONFIG_JFS_FS is not set
+# CONFIG_UBIFS_FS is not set
+# CONFIG_CRAMFS is not set
+# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
+# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
+# CONFIG_KCOV is not set
+# CONFIG_LKDTM is not set
+# CONFIG_REED_SOLOMON_TEST is not set
+# CONFIG_TEST_RHASHTABLE is not set
+# CONFIG_TEST_MEMINIT is not set
+# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
+# CONFIG_DEBUG_INFO_BTF is not set
+# CONFIG_PTP_1588_CLOCK_INES is not set
+# CONFIG_QCOM_CPR is not set
+# CONFIG_RESET_BRCMSTB_RESCAL is not set
+# CONFIG_RESET_INTEL_GW is not set
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 180ad1e1b04f..7dca74774dd2 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -22,7 +22,9 @@ import kunit_parser
KunitResult = namedtuple('KunitResult', ['status','result'])
-KunitRequest = namedtuple('KunitRequest', ['raw_output','timeout', 'jobs', 'build_dir', 'defconfig'])
+KunitRequest = namedtuple('KunitRequest', ['raw_output','timeout', 'jobs',
+ 'build_dir', 'defconfig',
+ 'alltests', 'make_options'])
KernelDirectoryPath = sys.argv[0].split('tools/testing/kunit/')[0]
@@ -47,7 +49,7 @@ def get_kernel_root_path():
def run_tests(linux: kunit_kernel.LinuxSourceTree,
request: KunitRequest) -> KunitResult:
config_start = time.time()
- success = linux.build_reconfig(request.build_dir)
+ success = linux.build_reconfig(request.build_dir, request.make_options)
config_end = time.time()
if not success:
return KunitResult(KunitStatus.CONFIG_FAILURE, 'could not configure kernel')
@@ -55,24 +57,24 @@ def run_tests(linux: kunit_kernel.LinuxSourceTree,
kunit_parser.print_with_timestamp('Building KUnit Kernel ...')
build_start = time.time()
- success = linux.build_um_kernel(request.jobs, request.build_dir)
+ success = linux.build_um_kernel(request.alltests,
+ request.jobs,
+ request.build_dir,
+ request.make_options)
build_end = time.time()
if not success:
return KunitResult(KunitStatus.BUILD_FAILURE, 'could not build kernel')
kunit_parser.print_with_timestamp('Starting KUnit Kernel ...')
test_start = time.time()
-
- test_result = kunit_parser.TestResult(kunit_parser.TestStatus.SUCCESS,
- [],
- 'Tests not Parsed.')
+ kunit_output = linux.run_kernel(
+ timeout=None if request.alltests else request.timeout,
+ build_dir=request.build_dir)
if request.raw_output:
- kunit_parser.raw_output(
- linux.run_kernel(timeout=request.timeout,
- build_dir=request.build_dir))
+ raw_output = kunit_parser.raw_output(kunit_output)
+ isolated = list(kunit_parser.isolate_kunit_output(raw_output))
+ test_result = kunit_parser.parse_test_result(isolated)
else:
- kunit_output = linux.run_kernel(timeout=request.timeout,
- build_dir=request.build_dir)
test_result = kunit_parser.parse_run_tests(kunit_output)
test_end = time.time()
@@ -120,6 +122,14 @@ def main(argv, linux=None):
help='Uses a default .kunitconfig.',
action='store_true')
+ run_parser.add_argument('--alltests',
+ help='Run all KUnit tests through allyesconfig',
+ action='store_true')
+
+ run_parser.add_argument('--make_options',
+ help='X=Y make option, can be repeated.',
+ action='append')
+
cli_args = parser.parse_args(argv)
if cli_args.subcommand == 'run':
@@ -143,7 +153,9 @@ def main(argv, linux=None):
cli_args.timeout,
cli_args.jobs,
cli_args.build_dir,
- cli_args.defconfig)
+ cli_args.defconfig,
+ cli_args.alltests,
+ cli_args.make_options)
result = run_tests(linux, request)
if result.status != KunitStatus.SUCCESS:
sys.exit(1)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index ebf3942b23f5..e75063d603b5 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -9,16 +9,18 @@
import collections
import re
-CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
-CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'
-
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['raw_entry'])
+CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
+CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+)$'
+KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
def __str__(self) -> str:
- return self.raw_entry
+ if self.value == 'n':
+ return r'# CONFIG_%s is not set' % (self.name)
+ else:
+ return r'CONFIG_%s=%s' % (self.name, self.value)
class KconfigParseError(Exception):
@@ -38,7 +40,17 @@ class Kconfig(object):
self._entries.append(entry)
def is_subset_of(self, other: 'Kconfig') -> bool:
- return self.entries().issubset(other.entries())
+ for a in self.entries():
+ found = False
+ for b in other.entries():
+ if a.name != b.name:
+ continue
+ if a.value != b.value:
+ return False
+ found = True
+ if a.value != 'n' and found == False:
+ return False
+ return True
def write_to_file(self, path: str) -> None:
with open(path, 'w') as f:
@@ -54,9 +66,20 @@ class Kconfig(object):
line = line.strip()
if not line:
continue
- elif config_matcher.match(line) or is_not_set_matcher.match(line):
- self._entries.append(KconfigEntry(line))
- elif line[0] == '#':
+
+ match = config_matcher.match(line)
+ if match:
+ entry = KconfigEntry(match.group(1), match.group(2))
+ self.add_entry(entry)
+ continue
+
+ empty_match = is_not_set_matcher.match(line)
+ if empty_match:
+ entry = KconfigEntry(empty_match.group(1), 'n')
+ self.add_entry(entry)
+ continue
+
+ if line[0] == '#':
continue
else:
raise KconfigParseError('Failed to parse: ' + line)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index d99ae75ef72f..63dbda2d029f 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -10,11 +10,16 @@
import logging
import subprocess
import os
+import signal
+
+from contextlib import ExitStack
import kunit_config
+import kunit_parser
KCONFIG_PATH = '.config'
kunitconfig_path = '.kunitconfig'
+BROKEN_ALLCONFIG_PATH = 'tools/testing/kunit/configs/broken_on_uml.config'
class ConfigError(Exception):
"""Represents an error trying to configure the Linux kernel."""
@@ -35,19 +40,40 @@ class LinuxSourceTreeOperations(object):
except subprocess.CalledProcessError as e:
raise ConfigError(e.output)
- def make_olddefconfig(self, build_dir):
+ def make_olddefconfig(self, build_dir, make_options):
command = ['make', 'ARCH=um', 'olddefconfig']
+ if make_options:
+ command.extend(make_options)
if build_dir:
command += ['O=' + build_dir]
try:
- subprocess.check_output(command)
+ subprocess.check_output(command, stderr=subprocess.PIPE)
except OSError as e:
raise ConfigError('Could not call make command: ' + e)
except subprocess.CalledProcessError as e:
raise ConfigError(e.output)
- def make(self, jobs, build_dir):
+ def make_allyesconfig(self):
+ kunit_parser.print_with_timestamp(
+ 'Enabling all CONFIGs for UML...')
+ process = subprocess.Popen(
+ ['make', 'ARCH=um', 'allyesconfig'],
+ stdout=subprocess.DEVNULL,
+ stderr=subprocess.STDOUT)
+ process.wait()
+ kunit_parser.print_with_timestamp(
+ 'Disabling broken configs to run KUnit tests...')
+ with ExitStack() as es:
+ config = open(KCONFIG_PATH, 'a')
+ disable = open(BROKEN_ALLCONFIG_PATH, 'r').read()
+ config.write(disable)
+ kunit_parser.print_with_timestamp(
+ 'Starting Kernel with all configs takes a few minutes...')
+
+ def make(self, jobs, build_dir, make_options):
command = ['make', 'ARCH=um', '--jobs=' + str(jobs)]
+ if make_options:
+ command.extend(make_options)
if build_dir:
command += ['O=' + build_dir]
try:
@@ -57,18 +83,16 @@ class LinuxSourceTreeOperations(object):
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
- def linux_bin(self, params, timeout, build_dir):
+ def linux_bin(self, params, timeout, build_dir, outfile):
"""Runs the Linux UML binary. Must be named 'linux'."""
linux_bin = './linux'
if build_dir:
linux_bin = os.path.join(build_dir, 'linux')
- process = subprocess.Popen(
- [linux_bin] + params,
- stdin=subprocess.PIPE,
- stdout=subprocess.PIPE,
- stderr=subprocess.PIPE)
- process.wait(timeout=timeout)
- return process
+ with open(outfile, 'w') as output:
+ process = subprocess.Popen([linux_bin] + params,
+ stdout=output,
+ stderr=subprocess.STDOUT)
+ process.wait(timeout)
def get_kconfig_path(build_dir):
@@ -84,6 +108,7 @@ class LinuxSourceTree(object):
self._kconfig = kunit_config.Kconfig()
self._kconfig.read_from_file(kunitconfig_path)
self._ops = LinuxSourceTreeOperations()
+ signal.signal(signal.SIGINT, self.signal_handler)
def clean(self):
try:
@@ -107,19 +132,19 @@ class LinuxSourceTree(object):
return False
return True
- def build_config(self, build_dir):
+ def build_config(self, build_dir, make_options):
kconfig_path = get_kconfig_path(build_dir)
if build_dir and not os.path.exists(build_dir):
os.mkdir(build_dir)
self._kconfig.write_to_file(kconfig_path)
try:
- self._ops.make_olddefconfig(build_dir)
+ self._ops.make_olddefconfig(build_dir, make_options)
except ConfigError as e:
logging.error(e)
return False
return self.validate_config(build_dir)
- def build_reconfig(self, build_dir):
+ def build_reconfig(self, build_dir, make_options):
"""Creates a new .config if it is not a subset of the .kunitconfig."""
kconfig_path = get_kconfig_path(build_dir)
if os.path.exists(kconfig_path):
@@ -128,26 +153,33 @@ class LinuxSourceTree(object):
if not self._kconfig.is_subset_of(existing_kconfig):
print('Regenerating .config ...')
os.remove(kconfig_path)
- return self.build_config(build_dir)
+ return self.build_config(build_dir, make_options)
else:
return True
else:
print('Generating .config ...')
- return self.build_config(build_dir)
+ return self.build_config(build_dir, make_options)
- def build_um_kernel(self, jobs, build_dir):
+ def build_um_kernel(self, alltests, jobs, build_dir, make_options):
+ if alltests:
+ self._ops.make_allyesconfig()
try:
- self._ops.make_olddefconfig(build_dir)
- self._ops.make(jobs, build_dir)
+ self._ops.make_olddefconfig(build_dir, make_options)
+ self._ops.make(jobs, build_dir, make_options)
except (ConfigError, BuildError) as e:
logging.error(e)
return False
return self.validate_config(build_dir)
- def run_kernel(self, args=[], timeout=None, build_dir=''):
- args.extend(['mem=256M'])
- process = self._ops.linux_bin(args, timeout, build_dir)
- with open(os.path.join(build_dir, 'test.log'), 'w') as f:
- for line in process.stdout:
- f.write(line.rstrip().decode('ascii') + '\n')
- yield line.rstrip().decode('ascii')
+ def run_kernel(self, args=[], build_dir='', timeout=None):
+ args.extend(['mem=1G'])
+ outfile = 'test.log'
+ self._ops.linux_bin(args, timeout, build_dir, outfile)
+ subprocess.call(['stty', 'sane'])
+ with open(outfile, 'r') as file:
+ for line in file:
+ yield line
+
+ def signal_handler(self, sig, frame):
+ logging.error('Build interruption occurred. Cleaning console.')
+ subprocess.call(['stty', 'sane'])
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 4ffbae0f6732..64aac9dcd431 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -46,23 +46,26 @@ class TestStatus(Enum):
TEST_CRASHED = auto()
NO_TESTS = auto()
-kunit_start_re = re.compile(r'^TAP version [0-9]+$')
-kunit_end_re = re.compile('List of all partitions:')
+kunit_start_re = re.compile(r'TAP version [0-9]+$')
+kunit_end_re = re.compile('(List of all partitions:|'
+ 'Kernel panic - not syncing: VFS:|reboot: System halted)')
def isolate_kunit_output(kernel_output):
started = False
for line in kernel_output:
- if kunit_start_re.match(line):
+ if kunit_start_re.search(line):
+ prefix_len = len(line.split('TAP version')[0])
started = True
- yield line
- elif kunit_end_re.match(line):
+ yield line[prefix_len:] if prefix_len > 0 else line
+ elif kunit_end_re.search(line):
break
elif started:
- yield line
+ yield line[prefix_len:] if prefix_len > 0 else line
def raw_output(kernel_output):
for line in kernel_output:
print(line)
+ yield line
DIVIDER = '=' * 60
@@ -91,7 +94,7 @@ def print_log(log):
for m in log:
print_with_timestamp(m)
-TAP_ENTRIES = re.compile(r'^(TAP|\t?ok|\t?not ok|\t?[0-9]+\.\.[0-9]+|\t?#).*$')
+TAP_ENTRIES = re.compile(r'^(TAP|[\s]*ok|[\s]*not ok|[\s]*[0-9]+\.\.[0-9]+|[\s]*#).*$')
def consume_non_diagnositic(lines: List[str]) -> None:
while lines and not TAP_ENTRIES.match(lines[0]):
@@ -104,22 +107,20 @@ def save_non_diagnositic(lines: List[str], test_case: TestCase) -> None:
OkNotOkResult = namedtuple('OkNotOkResult', ['is_ok','description', 'text'])
-OK_NOT_OK_SUBTEST = re.compile(r'^\t(ok|not ok) [0-9]+ - (.*)$')
+OK_NOT_OK_SUBTEST = re.compile(r'^[\s]+(ok|not ok) [0-9]+ - (.*)$')
OK_NOT_OK_MODULE = re.compile(r'^(ok|not ok) [0-9]+ - (.*)$')
-def parse_ok_not_ok_test_case(lines: List[str],
- test_case: TestCase,
- expecting_test_case: bool) -> bool:
+def parse_ok_not_ok_test_case(lines: List[str], test_case: TestCase) -> bool:
save_non_diagnositic(lines, test_case)
if not lines:
- if expecting_test_case:
- test_case.status = TestStatus.TEST_CRASHED
- return True
- else:
- return False
+ test_case.status = TestStatus.TEST_CRASHED
+ return True
line = lines[0]
match = OK_NOT_OK_SUBTEST.match(line)
+ while not match and lines:
+ line = lines.pop(0)
+ match = OK_NOT_OK_SUBTEST.match(line)
if match:
test_case.log.append(lines.pop(0))
test_case.name = match.group(2)
@@ -133,7 +134,7 @@ def parse_ok_not_ok_test_case(lines: List[str],
else:
return False
-SUBTEST_DIAGNOSTIC = re.compile(r'^\t# .*?: (.*)$')
+SUBTEST_DIAGNOSTIC = re.compile(r'^[\s]+# .*?: (.*)$')
DIAGNOSTIC_CRASH_MESSAGE = 'kunit test case crashed!'
def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
@@ -150,17 +151,17 @@ def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
else:
return False
-def parse_test_case(lines: List[str], expecting_test_case: bool) -> TestCase:
+def parse_test_case(lines: List[str]) -> TestCase:
test_case = TestCase()
save_non_diagnositic(lines, test_case)
while parse_diagnostic(lines, test_case):
pass
- if parse_ok_not_ok_test_case(lines, test_case, expecting_test_case):
+ if parse_ok_not_ok_test_case(lines, test_case):
return test_case
else:
return None
-SUBTEST_HEADER = re.compile(r'^\t# Subtest: (.*)$')
+SUBTEST_HEADER = re.compile(r'^[\s]+# Subtest: (.*)$')
def parse_subtest_header(lines: List[str]) -> str:
consume_non_diagnositic(lines)
@@ -173,7 +174,7 @@ def parse_subtest_header(lines: List[str]) -> str:
else:
return None
-SUBTEST_PLAN = re.compile(r'\t[0-9]+\.\.([0-9]+)')
+SUBTEST_PLAN = re.compile(r'[\s]+[0-9]+\.\.([0-9]+)')
def parse_subtest_plan(lines: List[str]) -> int:
consume_non_diagnositic(lines)
@@ -234,11 +235,11 @@ def parse_test_suite(lines: List[str]) -> TestSuite:
expected_test_case_num = parse_subtest_plan(lines)
if not expected_test_case_num:
return None
- test_case = parse_test_case(lines, expected_test_case_num > 0)
- expected_test_case_num -= 1
- while test_case:
+ while expected_test_case_num > 0:
+ test_case = parse_test_case(lines)
+ if not test_case:
+ break
test_suite.cases.append(test_case)
- test_case = parse_test_case(lines, expected_test_case_num > 0)
expected_test_case_num -= 1
if parse_ok_not_ok_test_suite(lines, test_suite):
test_suite.status = bubble_up_test_case_errors(test_suite)
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index cba97756ac4a..984588d6ba95 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -37,7 +37,7 @@ class KconfigTest(unittest.TestCase):
self.assertTrue(kconfig0.is_subset_of(kconfig0))
kconfig1 = kunit_config.Kconfig()
- kconfig1.add_entry(kunit_config.KconfigEntry('CONFIG_TEST=y'))
+ kconfig1.add_entry(kunit_config.KconfigEntry('TEST', 'y'))
self.assertTrue(kconfig1.is_subset_of(kconfig1))
self.assertTrue(kconfig0.is_subset_of(kconfig1))
self.assertFalse(kconfig1.is_subset_of(kconfig0))
@@ -51,15 +51,15 @@ class KconfigTest(unittest.TestCase):
expected_kconfig = kunit_config.Kconfig()
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_UML=y'))
+ kunit_config.KconfigEntry('UML', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_MMU=y'))
+ kunit_config.KconfigEntry('MMU', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_TEST=y'))
+ kunit_config.KconfigEntry('TEST', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_EXAMPLE_TEST=y'))
+ kunit_config.KconfigEntry('EXAMPLE_TEST', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('# CONFIG_MK8 is not set'))
+ kunit_config.KconfigEntry('MK8', 'n'))
self.assertEqual(kconfig.entries(), expected_kconfig.entries())
@@ -68,15 +68,15 @@ class KconfigTest(unittest.TestCase):
expected_kconfig = kunit_config.Kconfig()
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_UML=y'))
+ kunit_config.KconfigEntry('UML', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_MMU=y'))
+ kunit_config.KconfigEntry('MMU', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_TEST=y'))
+ kunit_config.KconfigEntry('TEST', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('CONFIG_EXAMPLE_TEST=y'))
+ kunit_config.KconfigEntry('EXAMPLE_TEST', 'y'))
expected_kconfig.add_entry(
- kunit_config.KconfigEntry('# CONFIG_MK8 is not set'))
+ kunit_config.KconfigEntry('MK8', 'n'))
expected_kconfig.write_to_file(kconfig_path)
@@ -108,6 +108,36 @@ class KUnitParserTest(unittest.TestCase):
self.assertContains('ok 1 - example', result)
file.close()
+ def test_output_with_prefix_isolated_correctly(self):
+ log_path = get_absolute_path(
+ 'test_data/test_pound_sign.log')
+ with open(log_path) as file:
+ result = kunit_parser.isolate_kunit_output(file.readlines())
+ self.assertContains('TAP version 14\n', result)
+ self.assertContains(' # Subtest: kunit-resource-test', result)
+ self.assertContains(' 1..5', result)
+ self.assertContains(' ok 1 - kunit_resource_test_init_resources', result)
+ self.assertContains(' ok 2 - kunit_resource_test_alloc_resource', result)
+ self.assertContains(' ok 3 - kunit_resource_test_destroy_resource', result)
+ self.assertContains(' foo bar #', result)
+ self.assertContains(' ok 4 - kunit_resource_test_cleanup_resources', result)
+ self.assertContains(' ok 5 - kunit_resource_test_proper_free_ordering', result)
+ self.assertContains('ok 1 - kunit-resource-test', result)
+ self.assertContains(' foo bar # non-kunit output', result)
+ self.assertContains(' # Subtest: kunit-try-catch-test', result)
+ self.assertContains(' 1..2', result)
+ self.assertContains(' ok 1 - kunit_test_try_catch_successful_try_no_catch',
+ result)
+ self.assertContains(' ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch',
+ result)
+ self.assertContains('ok 2 - kunit-try-catch-test', result)
+ self.assertContains(' # Subtest: string-stream-test', result)
+ self.assertContains(' 1..3', result)
+ self.assertContains(' ok 1 - string_stream_test_empty_on_creation', result)
+ self.assertContains(' ok 2 - string_stream_test_not_empty_after_add', result)
+ self.assertContains(' ok 3 - string_stream_test_get_string', result)
+ self.assertContains('ok 3 - string-stream-test', result)
+
def test_parse_successful_test_log(self):
all_passed_log = get_absolute_path(
'test_data/test_is_test_passed-all_passed.log')
@@ -150,6 +180,45 @@ class KUnitParserTest(unittest.TestCase):
result.status)
file.close()
+ def test_ignores_prefix_printk_time(self):
+ prefix_log = get_absolute_path(
+ 'test_data/test_config_printk_time.log')
+ with open(prefix_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
+ def test_ignores_multiple_prefixes(self):
+ prefix_log = get_absolute_path(
+ 'test_data/test_multiple_prefixes.log')
+ with open(prefix_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
+ def test_prefix_mixed_kernel_output(self):
+ mixed_prefix_log = get_absolute_path(
+ 'test_data/test_interrupted_tap_output.log')
+ with open(mixed_prefix_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
+ def test_prefix_poundsign(self):
+ pound_log = get_absolute_path('test_data/test_pound_sign.log')
+ with open(pound_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
+ def test_kernel_panic_end(self):
+ panic_log = get_absolute_path('test_data/test_kernel_panic_interrupt.log')
+ with open(panic_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
+ def test_pound_no_prefix(self):
+ pound_log = get_absolute_path('test_data/test_pound_no_prefix.log')
+ with open(pound_log) as file:
+ result = kunit_parser.parse_run_tests(file.readlines())
+ self.assertEqual('kunit-resource-test', result.suites[0].name)
+
class StrContains(str):
def __eq__(self, other):
return self in other
@@ -174,7 +243,8 @@ class KUnitMainTest(unittest.TestCase):
kunit.main(['run'], self.linux_source_mock)
assert self.linux_source_mock.build_reconfig.call_count == 1
assert self.linux_source_mock.run_kernel.call_count == 1
- self.linux_source_mock.run_kernel.assert_called_once_with(build_dir='', timeout=300)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ build_dir='', timeout=300)
self.print_mock.assert_any_call(StrContains('Testing complete.'))
def test_run_passes_args_fail(self):
@@ -189,25 +259,27 @@ class KUnitMainTest(unittest.TestCase):
def test_run_raw_output(self):
self.linux_source_mock.run_kernel = mock.Mock(return_value=[])
- kunit.main(['run', '--raw_output'], self.linux_source_mock)
+ with self.assertRaises(SystemExit) as e:
+ kunit.main(['run', '--raw_output'], self.linux_source_mock)
+ assert type(e.exception) == SystemExit
+ assert e.exception.code == 1
assert self.linux_source_mock.build_reconfig.call_count == 1
assert self.linux_source_mock.run_kernel.call_count == 1
- for kall in self.print_mock.call_args_list:
- assert kall != mock.call(StrContains('Testing complete.'))
- assert kall != mock.call(StrContains(' 0 tests run'))
def test_run_timeout(self):
timeout = 3453
kunit.main(['run', '--timeout', str(timeout)], self.linux_source_mock)
assert self.linux_source_mock.build_reconfig.call_count == 1
- self.linux_source_mock.run_kernel.assert_called_once_with(build_dir='', timeout=timeout)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ build_dir='', timeout=timeout)
self.print_mock.assert_any_call(StrContains('Testing complete.'))
def test_run_builddir(self):
build_dir = '.kunit'
kunit.main(['run', '--build_dir', build_dir], self.linux_source_mock)
assert self.linux_source_mock.build_reconfig.call_count == 1
- self.linux_source_mock.run_kernel.assert_called_once_with(build_dir=build_dir, timeout=300)
+ self.linux_source_mock.run_kernel.assert_called_once_with(
+ build_dir=build_dir, timeout=300)
self.print_mock.assert_any_call(StrContains('Testing complete.'))
if __name__ == '__main__':
diff --git a/tools/testing/kunit/test_data/test_config_printk_time.log b/tools/testing/kunit/test_data/test_config_printk_time.log
new file mode 100644
index 000000000000..c02ca773946d
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_config_printk_time.log
@@ -0,0 +1,31 @@
+[ 0.060000] printk: console [mc-1] enabled
+[ 0.060000] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000] TAP version 14
+[ 0.060000] # Subtest: kunit-resource-test
+[ 0.060000] 1..5
+[ 0.060000] ok 1 - kunit_resource_test_init_resources
+[ 0.060000] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000] ok 1 - kunit-resource-test
+[ 0.060000] # Subtest: kunit-try-catch-test
+[ 0.060000] 1..2
+[ 0.060000] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000] ok 2 - kunit-try-catch-test
+[ 0.060000] # Subtest: string-stream-test
+[ 0.060000] 1..3
+[ 0.060000] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000] ok 2 - string_stream_test_not_empty_after_add
+[ 0.060000] ok 3 - string_stream_test_get_string
+[ 0.060000] ok 3 - string-stream-test
+[ 0.060000] List of all partitions:
+[ 0.060000] No filesystem could mount root, tried:
+[ 0.060000]
+[ 0.060000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000] Stack:
+[ 0.060000] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000] 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_interrupted_tap_output.log b/tools/testing/kunit/test_data/test_interrupted_tap_output.log
new file mode 100644
index 000000000000..5c73fb3a1c6f
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_interrupted_tap_output.log
@@ -0,0 +1,37 @@
+[ 0.060000] printk: console [mc-1] enabled
+[ 0.060000] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000] TAP version 14
+[ 0.060000] # Subtest: kunit-resource-test
+[ 0.060000] 1..5
+[ 0.060000] ok 1 - kunit_resource_test_init_resources
+[ 0.060000] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000] kAFS: Red Hat AFS client v0.1 registering.
+[ 0.060000] FS-Cache: Netfs 'afs' registered for caching
+[ 0.060000] *** VALIDATE kAFS ***
+[ 0.060000] Btrfs loaded, crc32c=crc32c-generic, debug=on, assert=on, integrity-checker=on, ref-verify=on
+[ 0.060000] BTRFS: selftest: sectorsize: 4096 nodesize: 4096
+[ 0.060000] BTRFS: selftest: running btrfs free space cache tests
+[ 0.060000] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000] ok 1 - kunit-resource-test
+[ 0.060000] # Subtest: kunit-try-catch-test
+[ 0.060000] 1..2
+[ 0.060000] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000] ok 2 - kunit-try-catch-test
+[ 0.060000] # Subtest: string-stream-test
+[ 0.060000] 1..3
+[ 0.060000] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000] ok 2 - string_stream_test_not_empty_after_add
+[ 0.060000] ok 3 - string_stream_test_get_string
+[ 0.060000] ok 3 - string-stream-test
+[ 0.060000] List of all partitions:
+[ 0.060000] No filesystem could mount root, tried:
+[ 0.060000]
+[ 0.060000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000] Stack:
+[ 0.060000] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000] 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_kernel_panic_interrupt.log b/tools/testing/kunit/test_data/test_kernel_panic_interrupt.log
new file mode 100644
index 000000000000..c045eee75f27
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_kernel_panic_interrupt.log
@@ -0,0 +1,25 @@
+[ 0.060000] printk: console [mc-1] enabled
+[ 0.060000] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000] TAP version 14
+[ 0.060000] # Subtest: kunit-resource-test
+[ 0.060000] 1..5
+[ 0.060000] ok 1 - kunit_resource_test_init_resources
+[ 0.060000] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000] ok 1 - kunit-resource-test
+[ 0.060000] # Subtest: kunit-try-catch-test
+[ 0.060000] 1..2
+[ 0.060000] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000] ok 2 - kunit-try-catch-test
+[ 0.060000] # Subtest: string-stream-test
+[ 0.060000] 1..3
+[ 0.060000] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000] Stack:
+[ 0.060000] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000] 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_multiple_prefixes.log b/tools/testing/kunit/test_data/test_multiple_prefixes.log
new file mode 100644
index 000000000000..bc48407dcc36
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_multiple_prefixes.log
@@ -0,0 +1,31 @@
+[ 0.060000][ T1] printk: console [mc-1] enabled
+[ 0.060000][ T1] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000][ T1] TAP version 14
+[ 0.060000][ T1] # Subtest: kunit-resource-test
+[ 0.060000][ T1] 1..5
+[ 0.060000][ T1] ok 1 - kunit_resource_test_init_resources
+[ 0.060000][ T1] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000][ T1] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000][ T1] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000][ T1] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000][ T1] ok 1 - kunit-resource-test
+[ 0.060000][ T1] # Subtest: kunit-try-catch-test
+[ 0.060000][ T1] 1..2
+[ 0.060000][ T1] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000][ T1] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000][ T1] ok 2 - kunit-try-catch-test
+[ 0.060000][ T1] # Subtest: string-stream-test
+[ 0.060000][ T1] 1..3
+[ 0.060000][ T1] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000][ T1] ok 2 - string_stream_test_not_empty_after_add
+[ 0.060000][ T1] ok 3 - string_stream_test_get_string
+[ 0.060000][ T1] ok 3 - string-stream-test
+[ 0.060000][ T1] List of all partitions:
+[ 0.060000][ T1] No filesystem could mount root, tried:
+[ 0.060000][ T1]
+[ 0.060000][ T1] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000][ T1] Stack:
+[ 0.060000][ T1] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000][ T1] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000][ T1] 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_output_with_prefix_isolated_correctly.log b/tools/testing/kunit/test_data/test_output_with_prefix_isolated_correctly.log
new file mode 100644
index 000000000000..0f87cdabebb0
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_output_with_prefix_isolated_correctly.log
@@ -0,0 +1,33 @@
+[ 0.060000] printk: console [mc-1] enabled
+[ 0.060000] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000] TAP version 14
+[ 0.060000] # Subtest: kunit-resource-test
+[ 0.060000] 1..5
+[ 0.060000] ok 1 - kunit_resource_test_init_resources
+[ 0.060000] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000] foo bar #
+[ 0.060000] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000] ok 1 - kunit-resource-test
+[ 0.060000] foo bar # non-kunit output
+[ 0.060000] # Subtest: kunit-try-catch-test
+[ 0.060000] 1..2
+[ 0.060000] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000] ok 2 - kunit-try-catch-test
+[ 0.060000] # Subtest: string-stream-test
+[ 0.060000] 1..3
+[ 0.060000] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000] ok 2 - string_stream_test_not_empty_after_add
+[ 0.060000] ok 3 - string_stream_test_get_string
+[ 0.060000] ok 3 - string-stream-test
+[ 0.060000] List of all partitions:
+[ 0.060000] No filesystem could mount root, tried:
+[ 0.060000]
+[ 0.060000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000] Stack:
+[ 0.060000] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000] 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_pound_no_prefix.log b/tools/testing/kunit/test_data/test_pound_no_prefix.log
new file mode 100644
index 000000000000..2ceb360be7d5
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_pound_no_prefix.log
@@ -0,0 +1,33 @@
+ printk: console [mc-1] enabled
+ random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+ TAP version 14
+ # Subtest: kunit-resource-test
+ 1..5
+ ok 1 - kunit_resource_test_init_resources
+ ok 2 - kunit_resource_test_alloc_resource
+ ok 3 - kunit_resource_test_destroy_resource
+ foo bar #
+ ok 4 - kunit_resource_test_cleanup_resources
+ ok 5 - kunit_resource_test_proper_free_ordering
+ ok 1 - kunit-resource-test
+ foo bar # non-kunit output
+ # Subtest: kunit-try-catch-test
+ 1..2
+ ok 1 - kunit_test_try_catch_successful_try_no_catch
+ ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+ ok 2 - kunit-try-catch-test
+ # Subtest: string-stream-test
+ 1..3
+ ok 1 - string_stream_test_empty_on_creation
+ ok 2 - string_stream_test_not_empty_after_add
+ ok 3 - string_stream_test_get_string
+ ok 3 - string-stream-test
+ List of all partitions:
+ No filesystem could mount root, tried:
+
+ Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+ CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+ Stack:
+ 602086f8 601bc260 705c0000 705c0000
+ 602086f8 6005fcec 705c0000 6002c6ab
+ 6005fcec 601bc260 705c0000 3000000010
\ No newline at end of file
diff --git a/tools/testing/kunit/test_data/test_pound_sign.log b/tools/testing/kunit/test_data/test_pound_sign.log
new file mode 100644
index 000000000000..28ffa5ba03bf
--- /dev/null
+++ b/tools/testing/kunit/test_data/test_pound_sign.log
@@ -0,0 +1,33 @@
+[ 0.060000] printk: console [mc-1] enabled
+[ 0.060000] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
+[ 0.060000] TAP version 14
+[ 0.060000] # Subtest: kunit-resource-test
+[ 0.060000] 1..5
+[ 0.060000] ok 1 - kunit_resource_test_init_resources
+[ 0.060000] ok 2 - kunit_resource_test_alloc_resource
+[ 0.060000] ok 3 - kunit_resource_test_destroy_resource
+[ 0.060000] foo bar #
+[ 0.060000] ok 4 - kunit_resource_test_cleanup_resources
+[ 0.060000] ok 5 - kunit_resource_test_proper_free_ordering
+[ 0.060000] ok 1 - kunit-resource-test
+[ 0.060000] foo bar # non-kunit output
+[ 0.060000] # Subtest: kunit-try-catch-test
+[ 0.060000] 1..2
+[ 0.060000] ok 1 - kunit_test_try_catch_successful_try_no_catch
+[ 0.060000] ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
+[ 0.060000] ok 2 - kunit-try-catch-test
+[ 0.060000] # Subtest: string-stream-test
+[ 0.060000] 1..3
+[ 0.060000] ok 1 - string_stream_test_empty_on_creation
+[ 0.060000] ok 2 - string_stream_test_not_empty_after_add
+[ 0.060000] ok 3 - string_stream_test_get_string
+[ 0.060000] ok 3 - string-stream-test
+[ 0.060000] List of all partitions:
+[ 0.060000] No filesystem could mount root, tried:
+[ 0.060000]
+[ 0.060000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
+[ 0.060000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.0-rc1-gea2dd7c0875e-dirty #2
+[ 0.060000] Stack:
+[ 0.060000] 602086f8 601bc260 705c0000 705c0000
+[ 0.060000] 602086f8 6005fcec 705c0000 6002c6ab
+[ 0.060000] 6005fcec 601bc260 705c0000 3000000010
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 397d6b612502..aa6abfe0749c 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -7,8 +7,8 @@ LDLIBS+= -lpthread -lurcu
TARGETS = main idr-test multiorder xarray
CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o
OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
- regression4.o \
- tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o
+ regression4.o tag_check.o multiorder.o idr-test.o iteration_check.o \
+ iteration_check_2.o benchmark.o
ifndef SHIFT
SHIFT=3
diff --git a/tools/testing/radix-tree/iteration_check_2.c b/tools/testing/radix-tree/iteration_check_2.c
new file mode 100644
index 000000000000..aac5c50a3674
--- /dev/null
+++ b/tools/testing/radix-tree/iteration_check_2.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iteration_check_2.c: Check that deleting a tagged entry doesn't cause
+ * an RCU walker to finish early.
+ * Copyright (c) 2020 Oracle
+ * Author: Matthew Wilcox
+ */
+#include
+#include "test.h"
+
+static volatile bool test_complete;
+
+static void *iterator(void *arg)
+{
+ XA_STATE(xas, arg, 0);
+ void *entry;
+
+ rcu_register_thread();
+
+ while (!test_complete) {
+ xas_set(&xas, 0);
+ rcu_read_lock();
+ xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0)
+ ;
+ rcu_read_unlock();
+ assert(xas.xa_index >= 100);
+ }
+
+ rcu_unregister_thread();
+ return NULL;
+}
+
+static void *throbber(void *arg)
+{
+ struct xarray *xa = arg;
+
+ rcu_register_thread();
+
+ while (!test_complete) {
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ xa_store(xa, i, xa_mk_value(i), GFP_KERNEL);
+ xa_set_mark(xa, i, XA_MARK_0);
+ }
+ for (i = 0; i < 100; i++)
+ xa_erase(xa, i);
+ }
+
+ rcu_unregister_thread();
+ return NULL;
+}
+
+void iteration_test2(unsigned test_duration)
+{
+ pthread_t threads[2];
+ DEFINE_XARRAY(array);
+ int i;
+
+ printv(1, "Running iteration test 2 for %d seconds\n", test_duration);
+
+ test_complete = false;
+
+ xa_store(&array, 100, xa_mk_value(100), GFP_KERNEL);
+ xa_set_mark(&array, 100, XA_MARK_0);
+
+ if (pthread_create(&threads[0], NULL, iterator, &array)) {
+ perror("create iterator thread");
+ exit(1);
+ }
+ if (pthread_create(&threads[1], NULL, throbber, &array)) {
+ perror("create throbber thread");
+ exit(1);
+ }
+
+ sleep(test_duration);
+ test_complete = true;
+
+ for (i = 0; i < 2; i++) {
+ if (pthread_join(threads[i], NULL)) {
+ perror("pthread_join");
+ exit(1);
+ }
+ }
+
+ xa_destroy(&array);
+}
diff --git a/tools/testing/radix-tree/linux.c b/tools/testing/radix-tree/linux.c
index 44a0d1ad4408..2d9c59df60de 100644
--- a/tools/testing/radix-tree/linux.c
+++ b/tools/testing/radix-tree/linux.c
@@ -19,37 +19,44 @@ int test_verbose;
struct kmem_cache {
pthread_mutex_t lock;
- int size;
+ unsigned int size;
+ unsigned int align;
int nr_objs;
void *objs;
void (*ctor)(void *);
};
-void *kmem_cache_alloc(struct kmem_cache *cachep, int flags)
+void *kmem_cache_alloc(struct kmem_cache *cachep, int gfp)
{
- struct radix_tree_node *node;
+ void *p;
- if (!(flags & __GFP_DIRECT_RECLAIM))
+ if (!(gfp & __GFP_DIRECT_RECLAIM))
return NULL;
pthread_mutex_lock(&cachep->lock);
if (cachep->nr_objs) {
+ struct radix_tree_node *node = cachep->objs;
cachep->nr_objs--;
- node = cachep->objs;
cachep->objs = node->parent;
pthread_mutex_unlock(&cachep->lock);
node->parent = NULL;
+ p = node;
} else {
pthread_mutex_unlock(&cachep->lock);
- node = malloc(cachep->size);
+ if (cachep->align)
+ posix_memalign(&p, cachep->align, cachep->size);
+ else
+ p = malloc(cachep->size);
if (cachep->ctor)
- cachep->ctor(node);
+ cachep->ctor(p);
+ else if (gfp & __GFP_ZERO)
+ memset(p, 0, cachep->size);
}
uatomic_inc(&nr_allocated);
if (kmalloc_verbose)
- printf("Allocating %p from slab\n", node);
- return node;
+ printf("Allocating %p from slab\n", p);
+ return p;
}
void kmem_cache_free(struct kmem_cache *cachep, void *objp)
@@ -59,7 +66,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
if (kmalloc_verbose)
printf("Freeing %p to slab\n", objp);
pthread_mutex_lock(&cachep->lock);
- if (cachep->nr_objs > 10) {
+ if (cachep->nr_objs > 10 || cachep->align) {
memset(objp, POISON_FREE, cachep->size);
free(objp);
} else {
@@ -98,13 +105,14 @@ void kfree(void *p)
}
struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t offset,
- unsigned long flags, void (*ctor)(void *))
+kmem_cache_create(const char *name, unsigned int size, unsigned int align,
+ unsigned int flags, void (*ctor)(void *))
{
struct kmem_cache *ret = malloc(sizeof(*ret));
pthread_mutex_init(&ret->lock, NULL);
ret->size = size;
+ ret->align = align;
ret->nr_objs = 0;
ret->objs = NULL;
ret->ctor = ctor;
diff --git a/tools/testing/radix-tree/linux/slab.h b/tools/testing/radix-tree/linux/slab.h
index a037def0dec6..2958830ce4d7 100644
--- a/tools/testing/radix-tree/linux/slab.h
+++ b/tools/testing/radix-tree/linux/slab.h
@@ -20,8 +20,8 @@ static inline void *kzalloc(size_t size, gfp_t gfp)
void *kmem_cache_alloc(struct kmem_cache *cachep, int flags);
void kmem_cache_free(struct kmem_cache *cachep, void *objp);
-struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t offset,
- unsigned long flags, void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
+ unsigned int align, unsigned int flags,
+ void (*ctor)(void *));
#endif /* SLAB_H */
diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
index 7a22d6e3732e..f2cbc8e5b97c 100644
--- a/tools/testing/radix-tree/main.c
+++ b/tools/testing/radix-tree/main.c
@@ -311,6 +311,7 @@ int main(int argc, char **argv)
regression4_test();
iteration_test(0, 10 + 90 * long_run);
iteration_test(7, 10 + 90 * long_run);
+ iteration_test2(10 + 90 * long_run);
single_thread_tests(long_run);
/* Free any remaining preallocated nodes */
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index 1ee4b2c0ad10..34dab4d18744 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -34,6 +34,7 @@ void xarray_tests(void);
void tag_check(void);
void multiorder_checks(void);
void iteration_test(unsigned order, unsigned duration);
+void iteration_test2(unsigned duration);
void benchmark(void);
void idr_checks(void);
void ida_tests(void);
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 077818d0197f..0635684cbf07 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -91,7 +91,7 @@ override LDFLAGS =
override MAKEFLAGS =
endif
-# Append kselftest to KBUILD_OUTPUT to avoid cluttering
+# Append kselftest to KBUILD_OUTPUT and O to avoid cluttering
# KBUILD_OUTPUT with selftest objects and headers installed
# by selftests Makefile or lib.mk.
ifdef building_out_of_srctree
@@ -99,7 +99,7 @@ override LDFLAGS =
endif
ifneq ($(O),)
- BUILD := $(O)
+ BUILD := $(O)/kselftest
else
ifneq ($(KBUILD_OUTPUT),)
BUILD := $(KBUILD_OUTPUT)/kselftest
diff --git a/tools/testing/selftests/android/Makefile b/tools/testing/selftests/android/Makefile
index 7c462714b418..9258306cafe9 100644
--- a/tools/testing/selftests/android/Makefile
+++ b/tools/testing/selftests/android/Makefile
@@ -21,7 +21,7 @@ all:
override define INSTALL_RULE
mkdir -p $(INSTALL_PATH)
- install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES)
+install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
@for SUBDIR in $(SUBDIRS); do \
BUILD_TARGET=$(OUTPUT)/$$SUBDIR; \
diff --git a/tools/testing/selftests/android/ion/Makefile b/tools/testing/selftests/android/ion/Makefile
index 0eb7ab626e1c..42b71f005332 100644
--- a/tools/testing/selftests/android/ion/Makefile
+++ b/tools/testing/selftests/android/ion/Makefile
@@ -17,4 +17,4 @@ include ../../lib.mk
$(OUTPUT)/ionapp_export: ionapp_export.c ipcsocket.c ionutils.c
$(OUTPUT)/ionapp_import: ionapp_import.c ipcsocket.c ionutils.c
-$(OUTPUT)/ionmap_test: ionmap_test.c ionutils.c
+$(OUTPUT)/ionmap_test: ionmap_test.c ionutils.c ipcsocket.c
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
index 18fdaab9f570..68ff3f45c720 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
@@ -23,7 +23,7 @@ if [ ! -f events/sched/sched_process_fork/hist ]; then
exit_unsupported
fi
-echo "Test histogram multiple tiggers"
+echo "Test histogram multiple triggers"
echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger
echo 'hist:keys=parent_comm:vals=child_pid' >> events/sched/sched_process_fork/trigger
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 5336b26506ab..2902f6a78f8a 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -635,10 +635,12 @@
struct __test_metadata {
const char *name;
void (*fn)(struct __test_metadata *);
+ pid_t pid; /* pid of test when being run */
int termsig;
int passed;
int trigger; /* extra handler after the evaluation */
- int timeout;
+ int timeout; /* seconds to wait for test timeout */
+ bool timed_out; /* did this test timeout instead of exiting? */
__u8 step;
bool no_print; /* manual trigger when TH_LOG_STREAM is not available */
struct __test_metadata *prev, *next;
@@ -695,64 +697,116 @@ static inline int __bail(int for_realz, bool no_print, __u8 step)
return 0;
}
-void __run_test(struct __test_metadata *t)
+struct __test_metadata *__active_test;
+static void __timeout_handler(int sig, siginfo_t *info, void *ucontext)
{
- pid_t child_pid;
+ struct __test_metadata *t = __active_test;
+
+ /* Sanity check handler execution environment. */
+ if (!t) {
+ fprintf(TH_LOG_STREAM,
+ "no active test in SIGARLM handler!?\n");
+ abort();
+ }
+ if (sig != SIGALRM || sig != info->si_signo) {
+ fprintf(TH_LOG_STREAM,
+ "%s: SIGALRM handler caught signal %d!?\n",
+ t->name, sig != SIGALRM ? sig : info->si_signo);
+ abort();
+ }
+
+ t->timed_out = true;
+ kill(t->pid, SIGKILL);
+}
+
+void __wait_for_test(struct __test_metadata *t)
+{
+ struct sigaction action = {
+ .sa_sigaction = __timeout_handler,
+ .sa_flags = SA_SIGINFO,
+ };
+ struct sigaction saved_action;
int status;
+ if (sigaction(SIGALRM, &action, &saved_action)) {
+ t->passed = 0;
+ fprintf(TH_LOG_STREAM,
+ "%s: unable to install SIGARLM handler\n",
+ t->name);
+ return;
+ }
+ __active_test = t;
+ t->timed_out = false;
+ alarm(t->timeout);
+ waitpid(t->pid, &status, 0);
+ alarm(0);
+ if (sigaction(SIGALRM, &saved_action, NULL)) {
+ t->passed = 0;
+ fprintf(TH_LOG_STREAM,
+ "%s: unable to uninstall SIGARLM handler\n",
+ t->name);
+ return;
+ }
+ __active_test = NULL;
+
+ if (t->timed_out) {
+ t->passed = 0;
+ fprintf(TH_LOG_STREAM,
+ "%s: Test terminated by timeout\n", t->name);
+ } else if (WIFEXITED(status)) {
+ t->passed = t->termsig == -1 ? !WEXITSTATUS(status) : 0;
+ if (t->termsig != -1) {
+ fprintf(TH_LOG_STREAM,
+ "%s: Test exited normally "
+ "instead of by signal (code: %d)\n",
+ t->name,
+ WEXITSTATUS(status));
+ } else if (!t->passed) {
+ fprintf(TH_LOG_STREAM,
+ "%s: Test failed at step #%d\n",
+ t->name,
+ WEXITSTATUS(status));
+ }
+ } else if (WIFSIGNALED(status)) {
+ t->passed = 0;
+ if (WTERMSIG(status) == SIGABRT) {
+ fprintf(TH_LOG_STREAM,
+ "%s: Test terminated by assertion\n",
+ t->name);
+ } else if (WTERMSIG(status) == t->termsig) {
+ t->passed = 1;
+ } else {
+ fprintf(TH_LOG_STREAM,
+ "%s: Test terminated unexpectedly "
+ "by signal %d\n",
+ t->name,
+ WTERMSIG(status));
+ }
+ } else {
+ fprintf(TH_LOG_STREAM,
+ "%s: Test ended in some other way [%u]\n",
+ t->name,
+ status);
+ }
+}
+
+void __run_test(struct __test_metadata *t)
+{
t->passed = 1;
t->trigger = 0;
printf("[ RUN ] %s\n", t->name);
- alarm(t->timeout);
- child_pid = fork();
- if (child_pid < 0) {
+ t->pid = fork();
+ if (t->pid < 0) {
printf("ERROR SPAWNING TEST CHILD\n");
t->passed = 0;
- } else if (child_pid == 0) {
+ } else if (t->pid == 0) {
t->fn(t);
/* return the step that failed or 0 */
_exit(t->passed ? 0 : t->step);
} else {
- /* TODO(wad) add timeout support. */
- waitpid(child_pid, &status, 0);
- if (WIFEXITED(status)) {
- t->passed = t->termsig == -1 ? !WEXITSTATUS(status) : 0;
- if (t->termsig != -1) {
- fprintf(TH_LOG_STREAM,
- "%s: Test exited normally "
- "instead of by signal (code: %d)\n",
- t->name,
- WEXITSTATUS(status));
- } else if (!t->passed) {
- fprintf(TH_LOG_STREAM,
- "%s: Test failed at step #%d\n",
- t->name,
- WEXITSTATUS(status));
- }
- } else if (WIFSIGNALED(status)) {
- t->passed = 0;
- if (WTERMSIG(status) == SIGABRT) {
- fprintf(TH_LOG_STREAM,
- "%s: Test terminated by assertion\n",
- t->name);
- } else if (WTERMSIG(status) == t->termsig) {
- t->passed = 1;
- } else {
- fprintf(TH_LOG_STREAM,
- "%s: Test terminated unexpectedly "
- "by signal %d\n",
- t->name,
- WTERMSIG(status));
- }
- } else {
- fprintf(TH_LOG_STREAM,
- "%s: Test ended in some other way [%u]\n",
- t->name,
- status);
- }
+ __wait_for_test(t);
}
printf("[ %4s ] %s\n", (t->passed ? "OK" : "FAIL"), t->name);
- alarm(0);
}
static int test_harness_run(int __attribute__((unused)) argc,
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 3ed0134a764d..b0556c752443 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -137,7 +137,8 @@ endif
# Selftest makefiles can override those targets by setting
# OVERRIDE_TARGETS = 1.
ifeq ($(OVERRIDE_TARGETS),)
-$(OUTPUT)/%:%.c
+LOCAL_HDRS := $(selfdir)/kselftest_harness.h $(selfdir)/kselftest.h
+$(OUTPUT)/%:%.c $(LOCAL_HDRS)
$(LINK.c) $^ $(LDLIBS) -o $@
$(OUTPUT)/%.o:%.S
diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile
index 53a848109f7b..0a15f9e23431 100644
--- a/tools/testing/selftests/memfd/Makefile
+++ b/tools/testing/selftests/memfd/Makefile
@@ -4,9 +4,8 @@ CFLAGS += -I../../../../include/uapi/
CFLAGS += -I../../../../include/
CFLAGS += -I../../../../usr/include/
-TEST_GEN_PROGS := memfd_test
+TEST_GEN_PROGS := memfd_test fuse_test fuse_mnt
TEST_PROGS := run_fuse_test.sh run_hugetlbfs_test.sh
-TEST_GEN_FILES := fuse_mnt fuse_test
fuse_mnt.o: CFLAGS += $(shell pkg-config fuse --cflags)
@@ -14,7 +13,7 @@ include ../lib.mk
$(OUTPUT)/fuse_mnt: LDLIBS += $(shell pkg-config fuse --libs)
-$(OUTPUT)/memfd_test: memfd_test.c common.o
-$(OUTPUT)/fuse_test: fuse_test.c common.o
+$(OUTPUT)/memfd_test: memfd_test.c common.c
+$(OUTPUT)/fuse_test: fuse_test.c common.c
-EXTRA_CLEAN = common.o
+EXTRA_CLEAN = $(OUTPUT)/common.o
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index c0b7f89f0930..2f1f532c39db 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
-CFLAGS += -iquote../../../../include/uapi -Wall
+CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
-TEST_GEN_PROGS := get_syscall_info peeksiginfo
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
new file mode 100644
index 000000000000..4db327b44586
--- /dev/null
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2020 Bernd Edlinger
+ * All rights reserved.
+ *
+ * Check whether /proc/$pid/mem can be accessed without causing deadlocks
+ * when de_thread is blocked with ->cred_guard_mutex held.
+ */
+
+#include "../kselftest_harness.h"
+#include
+#include
+#include
+#include
+#include
+#include
+
+static void *thread(void *arg)
+{
+ ptrace(PTRACE_TRACEME, 0, 0L, 0L);
+ return NULL;
+}
+
+TEST(vmaccess)
+{
+ int f, pid = fork();
+ char mm[64];
+
+ if (!pid) {
+ pthread_t pt;
+
+ pthread_create(&pt, NULL, thread, NULL);
+ pthread_join(pt, NULL);
+ execlp("true", "true", NULL);
+ }
+
+ sleep(1);
+ sprintf(mm, "/proc/%d/mem", pid);
+ f = open(mm, O_RDONLY);
+ ASSERT_GE(f, 0);
+ close(f);
+ f = kill(pid, SIGCONT);
+ ASSERT_EQ(f, 0);
+}
+
+TEST(attach)
+{
+ int s, k, pid = fork();
+
+ if (!pid) {
+ pthread_t pt;
+
+ pthread_create(&pt, NULL, thread, NULL);
+ pthread_join(pt, NULL);
+ execlp("sleep", "sleep", "2", NULL);
+ }
+
+ sleep(1);
+ k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+ ASSERT_EQ(errno, EAGAIN);
+ ASSERT_EQ(k, -1);
+ k = waitpid(-1, &s, WNOHANG);
+ ASSERT_NE(k, -1);
+ ASSERT_NE(k, 0);
+ ASSERT_NE(k, pid);
+ ASSERT_EQ(WIFEXITED(s), 1);
+ ASSERT_EQ(WEXITSTATUS(s), 0);
+ sleep(1);
+ k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+ ASSERT_EQ(k, 0);
+ k = waitpid(-1, &s, 0);
+ ASSERT_EQ(k, pid);
+ ASSERT_EQ(WIFSTOPPED(s), 1);
+ ASSERT_EQ(WSTOPSIG(s), SIGSTOP);
+ k = ptrace(PTRACE_DETACH, pid, 0L, 0L);
+ ASSERT_EQ(k, 0);
+ k = waitpid(-1, &s, 0);
+ ASSERT_EQ(k, pid);
+ ASSERT_EQ(WIFEXITED(s), 1);
+ ASSERT_EQ(WEXITSTATUS(s), 0);
+ k = waitpid(-1, NULL, 0);
+ ASSERT_EQ(k, -1);
+ ASSERT_EQ(errno, ECHILD);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/resctrl/Makefile b/tools/testing/selftests/resctrl/Makefile
new file mode 100644
index 000000000000..d585cc1948cc
--- /dev/null
+++ b/tools/testing/selftests/resctrl/Makefile
@@ -0,0 +1,17 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -g -Wall
+SRCS=$(wildcard *.c)
+OBJS=$(SRCS:.c=.o)
+
+all: resctrl_tests
+
+$(OBJS): $(SRCS)
+ $(CC) $(CFLAGS) -c $(SRCS)
+
+resctrl_tests: $(OBJS)
+ $(CC) $(CFLAGS) -o $@ $^
+
+.PHONY: clean
+
+clean:
+ $(RM) $(OBJS) resctrl_tests
diff --git a/tools/testing/selftests/resctrl/README b/tools/testing/selftests/resctrl/README
new file mode 100644
index 000000000000..6e5a0ffa18e8
--- /dev/null
+++ b/tools/testing/selftests/resctrl/README
@@ -0,0 +1,53 @@
+resctrl_tests - resctrl file system test suit
+
+Authors:
+ Fenghua Yu
+ Sai Praneeth Prakhya ,
+
+resctrl_tests tests various resctrl functionalities and interfaces including
+both software and hardware.
+
+Currently it supports Memory Bandwidth Monitoring test and Memory Bandwidth
+Allocation test on Intel RDT hardware. More tests will be added in the future.
+And the test suit can be extended to cover AMD QoS and ARM MPAM hardware
+as well.
+
+BUILD
+-----
+
+Run "make" to build executable file "resctrl_tests".
+
+RUN
+---
+
+To use resctrl_tests, root or sudoer privileges are required. This is because
+the test needs to mount resctrl file system and change contents in the file
+system.
+
+Executing the test without any parameter will run all supported tests:
+
+ sudo ./resctrl_tests
+
+OVERVIEW OF EXECUTION
+---------------------
+
+A test case has four stages:
+
+ - setup: mount resctrl file system, create group, setup schemata, move test
+ process pids to tasks, start benchmark.
+ - execute: let benchmark run
+ - verify: get resctrl data and verify the data with another source, e.g.
+ perf event.
+ - teardown: umount resctrl and clear temporary files.
+
+ARGUMENTS
+---------
+
+Parameter '-h' shows usage information.
+
+usage: resctrl_tests [-h] [-b "benchmark_cmd [options]"] [-t test list] [-n no_of_bits]
+ -b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CQM default benchmark is builtin fill_buf
+ -t test list: run tests specified in the test list, e.g. -t mbm, mba, cqm, cat
+ -n no_of_bits: run cache tests using specified no of bits in cache bit mask
+ -p cpu_no: specify CPU number to run the test. 1 is default
+ -h: help
diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
new file mode 100644
index 000000000000..38dbf4962e33
--- /dev/null
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include "resctrl.h"
+
+struct read_format {
+ __u64 nr; /* The number of events */
+ struct {
+ __u64 value; /* The value of the event */
+ } values[2];
+};
+
+static struct perf_event_attr pea_llc_miss;
+static struct read_format rf_cqm;
+static int fd_lm;
+char llc_occup_path[1024];
+
+static void initialize_perf_event_attr(void)
+{
+ pea_llc_miss.type = PERF_TYPE_HARDWARE;
+ pea_llc_miss.size = sizeof(struct perf_event_attr);
+ pea_llc_miss.read_format = PERF_FORMAT_GROUP;
+ pea_llc_miss.exclude_kernel = 1;
+ pea_llc_miss.exclude_hv = 1;
+ pea_llc_miss.exclude_idle = 1;
+ pea_llc_miss.exclude_callchain_kernel = 1;
+ pea_llc_miss.inherit = 1;
+ pea_llc_miss.exclude_guest = 1;
+ pea_llc_miss.disabled = 1;
+}
+
+static void ioctl_perf_event_ioc_reset_enable(void)
+{
+ ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0);
+ ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);
+}
+
+static int perf_event_open_llc_miss(pid_t pid, int cpu_no)
+{
+ fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1,
+ PERF_FLAG_FD_CLOEXEC);
+ if (fd_lm == -1) {
+ perror("Error opening leader");
+ ctrlc_handler(0, NULL, NULL);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int initialize_llc_perf(void)
+{
+ memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));
+ memset(&rf_cqm, 0, sizeof(struct read_format));
+
+ /* Initialize perf_event_attr structures for HW_CACHE_MISSES */
+ initialize_perf_event_attr();
+
+ pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;
+
+ rf_cqm.nr = 1;
+
+ return 0;
+}
+
+static int reset_enable_llc_perf(pid_t pid, int cpu_no)
+{
+ int ret = 0;
+
+ ret = perf_event_open_llc_miss(pid, cpu_no);
+ if (ret < 0)
+ return ret;
+
+ /* Start counters to log values */
+ ioctl_perf_event_ioc_reset_enable();
+
+ return 0;
+}
+
+/*
+ * get_llc_perf: llc cache miss through perf events
+ * @cpu_no: CPU number that the benchmark PID is binded to
+ *
+ * Perf events like HW_CACHE_MISSES could be used to validate number of
+ * cache lines allocated.
+ *
+ * Return: =0 on success. <0 on failure.
+ */
+static int get_llc_perf(unsigned long *llc_perf_miss)
+{
+ __u64 total_misses;
+
+ /* Stop counters after one span to get miss rate */
+
+ ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
+
+ if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
+ perror("Could not get llc misses through perf");
+
+ return -1;
+ }
+
+ total_misses = rf_cqm.values[0].value;
+
+ close(fd_lm);
+
+ *llc_perf_miss = total_misses;
+
+ return 0;
+}
+
+/*
+ * Get LLC Occupancy as reported by RESCTRL FS
+ * For CQM,
+ * 1. If con_mon grp and mon grp given, then read from mon grp in
+ * con_mon grp
+ * 2. If only con_mon grp given, then read from con_mon grp
+ * 3. If both not given, then read from root con_mon grp
+ * For CAT,
+ * 1. If con_mon grp given, then read from it
+ * 2. If con_mon grp not given, then read from root con_mon grp
+ *
+ * Return: =0 on success. <0 on failure.
+ */
+static int get_llc_occu_resctrl(unsigned long *llc_occupancy)
+{
+ FILE *fp;
+
+ fp = fopen(llc_occup_path, "r");
+ if (!fp) {
+ perror("Failed to open results file");
+
+ return errno;
+ }
+ if (fscanf(fp, "%lu", llc_occupancy) <= 0) {
+ perror("Could not get llc occupancy");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ return 0;
+}
+
+/*
+ * print_results_cache: the cache results are stored in a file
+ * @filename: file that stores the results
+ * @bm_pid: child pid that runs benchmark
+ * @llc_value: perf miss value /
+ * llc occupancy value reported by resctrl FS
+ *
+ * Return: 0 on success. non-zero on failure.
+ */
+static int print_results_cache(char *filename, int bm_pid,
+ unsigned long llc_value)
+{
+ FILE *fp;
+
+ if (strcmp(filename, "stdio") == 0 || strcmp(filename, "stderr") == 0) {
+ printf("Pid: %d \t LLC_value: %lu\n", bm_pid,
+ llc_value);
+ } else {
+ fp = fopen(filename, "a");
+ if (!fp) {
+ perror("Cannot open results file");
+
+ return errno;
+ }
+ fprintf(fp, "Pid: %d \t llc_value: %lu\n", bm_pid, llc_value);
+ fclose(fp);
+ }
+
+ return 0;
+}
+
+int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
+{
+ unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;
+ int ret;
+
+ /*
+ * Measure cache miss from perf.
+ */
+ if (!strcmp(param->resctrl_val, "cat")) {
+ ret = get_llc_perf(&llc_perf_miss);
+ if (ret < 0)
+ return ret;
+ llc_value = llc_perf_miss;
+ }
+
+ /*
+ * Measure llc occupancy from resctrl.
+ */
+ if (!strcmp(param->resctrl_val, "cqm")) {
+ ret = get_llc_occu_resctrl(&llc_occu_resc);
+ if (ret < 0)
+ return ret;
+ llc_value = llc_occu_resc;
+ }
+ ret = print_results_cache(param->filename, bm_pid, llc_value);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/*
+ * cache_val: execute benchmark and measure LLC occupancy resctrl
+ * and perf cache miss for the benchmark
+ * @param: parameters passed to cache_val()
+ *
+ * Return: 0 on success. non-zero on failure.
+ */
+int cat_val(struct resctrl_val_param *param)
+{
+ int malloc_and_init_memory = 1, memflush = 1, operation = 0, ret = 0;
+ char *resctrl_val = param->resctrl_val;
+ pid_t bm_pid;
+
+ if (strcmp(param->filename, "") == 0)
+ sprintf(param->filename, "stdio");
+
+ bm_pid = getpid();
+
+ /* Taskset benchmark to specified cpu */
+ ret = taskset_benchmark(bm_pid, param->cpu_no);
+ if (ret)
+ return ret;
+
+ /* Write benchmark to specified con_mon grp, mon_grp in resctrl FS*/
+ ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
+ resctrl_val);
+ if (ret)
+ return ret;
+
+ if ((strcmp(resctrl_val, "cat") == 0)) {
+ ret = initialize_llc_perf();
+ if (ret)
+ return ret;
+ }
+
+ /* Test runs until the callback setup() tells the test to stop. */
+ while (1) {
+ if (strcmp(resctrl_val, "cat") == 0) {
+ ret = param->setup(1, param);
+ if (ret) {
+ ret = 0;
+ break;
+ }
+ ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
+ if (ret)
+ break;
+
+ if (run_fill_buf(param->span, malloc_and_init_memory,
+ memflush, operation, resctrl_val)) {
+ fprintf(stderr, "Error-running fill buffer\n");
+ ret = -1;
+ break;
+ }
+
+ sleep(1);
+ ret = measure_cache_vals(param, bm_pid);
+ if (ret)
+ break;
+ } else {
+ break;
+ }
+ }
+
+ return ret;
+}
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
new file mode 100644
index 000000000000..5da43767b973
--- /dev/null
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -0,0 +1,250 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache Allocation Technology (CAT) test
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+#include
+
+#define RESULT_FILE_NAME1 "result_cat1"
+#define RESULT_FILE_NAME2 "result_cat2"
+#define NUM_OF_RUNS 5
+#define MAX_DIFF_PERCENT 4
+#define MAX_DIFF 1000000
+
+int count_of_bits;
+char cbm_mask[256];
+unsigned long long_mask;
+unsigned long cache_size;
+
+/*
+ * Change schemata. Write schemata to specified
+ * con_mon grp, mon_grp in resctrl FS.
+ * Run 5 times in order to get average values.
+ */
+static int cat_setup(int num, ...)
+{
+ struct resctrl_val_param *p;
+ char schemata[64];
+ va_list param;
+ int ret = 0;
+
+ va_start(param, num);
+ p = va_arg(param, struct resctrl_val_param *);
+ va_end(param);
+
+ /* Run NUM_OF_RUNS times */
+ if (p->num_of_runs >= NUM_OF_RUNS)
+ return -1;
+
+ if (p->num_of_runs == 0) {
+ sprintf(schemata, "%lx", p->mask);
+ ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
+ p->resctrl_val);
+ }
+ p->num_of_runs++;
+
+ return ret;
+}
+
+static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
+ unsigned long span)
+{
+ unsigned long allocated_cache_lines = span / 64;
+ unsigned long avg_llc_perf_miss = 0;
+ float diff_percent;
+
+ avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);
+ diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
+ allocated_cache_lines * 100;
+
+ printf("%sok CAT: cache miss rate within %d%%\n",
+ !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
+ "not " : "", MAX_DIFF_PERCENT);
+ tests_run++;
+ printf("# Percent diff=%d\n", abs((int)diff_percent));
+ printf("# Number of bits: %d\n", no_of_bits);
+ printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
+ printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
+}
+
+static int check_results(struct resctrl_val_param *param)
+{
+ char *token_array[8], temp[512];
+ unsigned long sum_llc_perf_miss = 0;
+ int runs = 0, no_of_bits = 0;
+ FILE *fp;
+
+ printf("# Checking for pass/fail\n");
+ fp = fopen(param->filename, "r");
+ if (!fp) {
+ perror("# Cannot open file");
+
+ return errno;
+ }
+
+ while (fgets(temp, sizeof(temp), fp)) {
+ char *token = strtok(temp, ":\t");
+ int fields = 0;
+
+ while (token) {
+ token_array[fields++] = token;
+ token = strtok(NULL, ":\t");
+ }
+ /*
+ * Discard the first value which is inaccurate due to monitoring
+ * setup transition phase.
+ */
+ if (runs > 0)
+ sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
+ runs++;
+ }
+
+ fclose(fp);
+ no_of_bits = count_bits(param->mask);
+
+ show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
+
+ return 0;
+}
+
+void cat_test_cleanup(void)
+{
+ remove(RESULT_FILE_NAME1);
+ remove(RESULT_FILE_NAME2);
+}
+
+int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
+{
+ unsigned long l_mask, l_mask_1;
+ int ret, pipefd[2], sibling_cpu_no;
+ char pipe_message;
+ pid_t bm_pid;
+
+ cache_size = 0;
+
+ ret = remount_resctrlfs(true);
+ if (ret)
+ return ret;
+
+ if (!validate_resctrl_feature_request("cat"))
+ return -1;
+
+ /* Get default cbm mask for L3/L2 cache */
+ ret = get_cbm_mask(cache_type);
+ if (ret)
+ return ret;
+
+ long_mask = strtoul(cbm_mask, NULL, 16);
+
+ /* Get L3/L2 cache size */
+ ret = get_cache_size(cpu_no, cache_type, &cache_size);
+ if (ret)
+ return ret;
+ printf("cache size :%lu\n", cache_size);
+
+ /* Get max number of bits from default-cabm mask */
+ count_of_bits = count_bits(long_mask);
+
+ if (n < 1 || n > count_of_bits - 1) {
+ printf("Invalid input value for no_of_bits n!\n");
+ printf("Please Enter value in range 1 to %d\n",
+ count_of_bits - 1);
+ return -1;
+ }
+
+ /* Get core id from same socket for running another thread */
+ sibling_cpu_no = get_core_sibling(cpu_no);
+ if (sibling_cpu_no < 0)
+ return -1;
+
+ struct resctrl_val_param param = {
+ .resctrl_val = "cat",
+ .cpu_no = cpu_no,
+ .mum_resctrlfs = 0,
+ .setup = cat_setup,
+ };
+
+ l_mask = long_mask >> n;
+ l_mask_1 = ~l_mask & long_mask;
+
+ /* Set param values for parent thread which will be allocated bitmask
+ * with (max_bits - n) bits
+ */
+ param.span = cache_size * (count_of_bits - n) / count_of_bits;
+ strcpy(param.ctrlgrp, "c2");
+ strcpy(param.mongrp, "m2");
+ strcpy(param.filename, RESULT_FILE_NAME2);
+ param.mask = l_mask;
+ param.num_of_runs = 0;
+
+ if (pipe(pipefd)) {
+ perror("# Unable to create pipe");
+ return errno;
+ }
+
+ bm_pid = fork();
+
+ /* Set param values for child thread which will be allocated bitmask
+ * with n bits
+ */
+ if (bm_pid == 0) {
+ param.mask = l_mask_1;
+ strcpy(param.ctrlgrp, "c1");
+ strcpy(param.mongrp, "m1");
+ param.span = cache_size * n / count_of_bits;
+ strcpy(param.filename, RESULT_FILE_NAME1);
+ param.num_of_runs = 0;
+ param.cpu_no = sibling_cpu_no;
+ }
+
+ remove(param.filename);
+
+ ret = cat_val(¶m);
+ if (ret)
+ return ret;
+
+ ret = check_results(¶m);
+ if (ret)
+ return ret;
+
+ if (bm_pid == 0) {
+ /* Tell parent that child is ready */
+ close(pipefd[0]);
+ pipe_message = 1;
+ if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
+ sizeof(pipe_message)) {
+ close(pipefd[1]);
+ perror("# failed signaling parent process");
+ return errno;
+ }
+
+ close(pipefd[1]);
+ while (1)
+ ;
+ } else {
+ /* Parent waits for child to be ready. */
+ close(pipefd[1]);
+ pipe_message = 0;
+ while (pipe_message != 1) {
+ if (read(pipefd[0], &pipe_message,
+ sizeof(pipe_message)) < sizeof(pipe_message)) {
+ perror("# failed reading from child process");
+ break;
+ }
+ }
+ close(pipefd[0]);
+ kill(bm_pid, SIGKILL);
+ }
+
+ cat_test_cleanup();
+ if (bm_pid)
+ umount_resctrlfs();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c
new file mode 100644
index 000000000000..c8756152bd61
--- /dev/null
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Cache Monitoring Technology (CQM) test
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+#include
+
+#define RESULT_FILE_NAME "result_cqm"
+#define NUM_OF_RUNS 5
+#define MAX_DIFF 2000000
+#define MAX_DIFF_PERCENT 15
+
+int count_of_bits;
+char cbm_mask[256];
+unsigned long long_mask;
+unsigned long cache_size;
+
+static int cqm_setup(int num, ...)
+{
+ struct resctrl_val_param *p;
+ va_list param;
+
+ va_start(param, num);
+ p = va_arg(param, struct resctrl_val_param *);
+ va_end(param);
+
+ /* Run NUM_OF_RUNS times */
+ if (p->num_of_runs >= NUM_OF_RUNS)
+ return -1;
+
+ p->num_of_runs++;
+
+ return 0;
+}
+
+static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
+ unsigned long span)
+{
+ unsigned long avg_llc_occu_resc = 0;
+ float diff_percent;
+ long avg_diff = 0;
+ bool res;
+
+ avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);
+ avg_diff = (long)abs(span - avg_llc_occu_resc);
+
+ diff_percent = (((float)span - avg_llc_occu_resc) / span) * 100;
+
+ if ((abs((int)diff_percent) <= MAX_DIFF_PERCENT) ||
+ (abs(avg_diff) <= MAX_DIFF))
+ res = true;
+ else
+ res = false;
+
+ printf("%sok CQM: diff within %d, %d\%%\n", res ? "" : "not",
+ MAX_DIFF, (int)MAX_DIFF_PERCENT);
+
+ printf("# diff: %ld\n", avg_diff);
+ printf("# percent diff=%d\n", abs((int)diff_percent));
+ printf("# Results are displayed in (Bytes)\n");
+ printf("# Number of bits: %d\n", no_of_bits);
+ printf("# Avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);
+ printf("# llc_occu_exp (span): %lu\n", span);
+
+ tests_run++;
+}
+
+static int check_results(struct resctrl_val_param *param, int no_of_bits)
+{
+ char *token_array[8], temp[512];
+ unsigned long sum_llc_occu_resc = 0;
+ int runs = 0;
+ FILE *fp;
+
+ printf("# checking for pass/fail\n");
+ fp = fopen(param->filename, "r");
+ if (!fp) {
+ perror("# Error in opening file\n");
+
+ return errno;
+ }
+
+ while (fgets(temp, 1024, fp)) {
+ char *token = strtok(temp, ":\t");
+ int fields = 0;
+
+ while (token) {
+ token_array[fields++] = token;
+ token = strtok(NULL, ":\t");
+ }
+
+ /* Field 3 is llc occ resc value */
+ if (runs > 0)
+ sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
+ runs++;
+ }
+ fclose(fp);
+ show_cache_info(sum_llc_occu_resc, no_of_bits, param->span);
+
+ return 0;
+}
+
+void cqm_test_cleanup(void)
+{
+ remove(RESULT_FILE_NAME);
+}
+
+int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd)
+{
+ int ret, mum_resctrlfs;
+
+ cache_size = 0;
+ mum_resctrlfs = 1;
+
+ ret = remount_resctrlfs(mum_resctrlfs);
+ if (ret)
+ return ret;
+
+ if (!validate_resctrl_feature_request("cqm"))
+ return -1;
+
+ ret = get_cbm_mask("L3");
+ if (ret)
+ return ret;
+
+ long_mask = strtoul(cbm_mask, NULL, 16);
+
+ ret = get_cache_size(cpu_no, "L3", &cache_size);
+ if (ret)
+ return ret;
+ printf("cache size :%lu\n", cache_size);
+
+ count_of_bits = count_bits(long_mask);
+
+ if (n < 1 || n > count_of_bits) {
+ printf("Invalid input value for numbr_of_bits n!\n");
+ printf("Please Enter value in range 1 to %d\n", count_of_bits);
+ return -1;
+ }
+
+ struct resctrl_val_param param = {
+ .resctrl_val = "cqm",
+ .ctrlgrp = "c1",
+ .mongrp = "m1",
+ .cpu_no = cpu_no,
+ .mum_resctrlfs = 0,
+ .filename = RESULT_FILE_NAME,
+ .mask = ~(long_mask << n) & long_mask,
+ .span = cache_size * n / count_of_bits,
+ .num_of_runs = 0,
+ .setup = cqm_setup,
+ };
+
+ if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
+ sprintf(benchmark_cmd[1], "%lu", param.span);
+
+ remove(RESULT_FILE_NAME);
+
+ ret = resctrl_val(benchmark_cmd, ¶m);
+ if (ret)
+ return ret;
+
+ ret = check_results(¶m, n);
+ if (ret)
+ return ret;
+
+ cqm_test_cleanup();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
new file mode 100644
index 000000000000..79c611c99a3d
--- /dev/null
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fill_buf benchmark
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "resctrl.h"
+
+#define CL_SIZE (64)
+#define PAGE_SIZE (4 * 1024)
+#define MB (1024 * 1024)
+
+static unsigned char *startptr;
+
+static void sb(void)
+{
+#if defined(__i386) || defined(__x86_64)
+ asm volatile("sfence\n\t"
+ : : : "memory");
+#endif
+}
+
+static void ctrl_handler(int signo)
+{
+ free(startptr);
+ printf("\nEnding\n");
+ sb();
+ exit(EXIT_SUCCESS);
+}
+
+static void cl_flush(void *p)
+{
+#if defined(__i386) || defined(__x86_64)
+ asm volatile("clflush (%0)\n\t"
+ : : "r"(p) : "memory");
+#endif
+}
+
+static void mem_flush(void *p, size_t s)
+{
+ char *cp = (char *)p;
+ size_t i = 0;
+
+ s = s / CL_SIZE; /* mem size in cache llines */
+
+ for (i = 0; i < s; i++)
+ cl_flush(&cp[i * CL_SIZE]);
+
+ sb();
+}
+
+static void *malloc_and_init_memory(size_t s)
+{
+ uint64_t *p64;
+ size_t s64;
+
+ void *p = memalign(PAGE_SIZE, s);
+
+ p64 = (uint64_t *)p;
+ s64 = s / sizeof(uint64_t);
+
+ while (s64 > 0) {
+ *p64 = (uint64_t)rand();
+ p64 += (CL_SIZE / sizeof(uint64_t));
+ s64 -= (CL_SIZE / sizeof(uint64_t));
+ }
+
+ return p;
+}
+
+static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
+{
+ unsigned char sum, *p;
+
+ sum = 0;
+ p = start_ptr;
+ while (p < end_ptr) {
+ sum += *p;
+ p += (CL_SIZE / 2);
+ }
+
+ return sum;
+}
+
+static
+void fill_one_span_write(unsigned char *start_ptr, unsigned char *end_ptr)
+{
+ unsigned char *p;
+
+ p = start_ptr;
+ while (p < end_ptr) {
+ *p = '1';
+ p += (CL_SIZE / 2);
+ }
+}
+
+static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
+ char *resctrl_val)
+{
+ int ret = 0;
+ FILE *fp;
+
+ while (1) {
+ ret = fill_one_span_read(start_ptr, end_ptr);
+ if (!strcmp(resctrl_val, "cat"))
+ break;
+ }
+
+ /* Consume read result so that reading memory is not optimized out. */
+ fp = fopen("/dev/null", "w");
+ if (!fp)
+ perror("Unable to write to /dev/null");
+ fprintf(fp, "Sum: %d ", ret);
+ fclose(fp);
+
+ return 0;
+}
+
+static int fill_cache_write(unsigned char *start_ptr, unsigned char *end_ptr,
+ char *resctrl_val)
+{
+ while (1) {
+ fill_one_span_write(start_ptr, end_ptr);
+ if (!strcmp(resctrl_val, "cat"))
+ break;
+ }
+
+ return 0;
+}
+
+static int
+fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
+ int op, char *resctrl_val)
+{
+ unsigned char *start_ptr, *end_ptr;
+ unsigned long long i;
+ int ret;
+
+ if (malloc_and_init)
+ start_ptr = malloc_and_init_memory(buf_size);
+ else
+ start_ptr = malloc(buf_size);
+
+ if (!start_ptr)
+ return -1;
+
+ startptr = start_ptr;
+ end_ptr = start_ptr + buf_size;
+
+ /*
+ * It's better to touch the memory once to avoid any compiler
+ * optimizations
+ */
+ if (!malloc_and_init) {
+ for (i = 0; i < buf_size; i++)
+ *start_ptr++ = (unsigned char)rand();
+ }
+
+ start_ptr = startptr;
+
+ /* Flush the memory before using to avoid "cache hot pages" effect */
+ if (memflush)
+ mem_flush(start_ptr, buf_size);
+
+ if (op == 0)
+ ret = fill_cache_read(start_ptr, end_ptr, resctrl_val);
+ else
+ ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
+
+ if (ret) {
+ printf("\n Error in fill cache read/write...\n");
+ return -1;
+ }
+
+ free(startptr);
+
+ return 0;
+}
+
+int run_fill_buf(unsigned long span, int malloc_and_init_memory,
+ int memflush, int op, char *resctrl_val)
+{
+ unsigned long long cache_size = span;
+ int ret;
+
+ /* set up ctrl-c handler */
+ if (signal(SIGINT, ctrl_handler) == SIG_ERR)
+ printf("Failed to catch SIGINT!\n");
+ if (signal(SIGHUP, ctrl_handler) == SIG_ERR)
+ printf("Failed to catch SIGHUP!\n");
+
+ ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op,
+ resctrl_val);
+ if (ret) {
+ printf("\n Error in fill cache\n");
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
new file mode 100644
index 000000000000..7bf8eaa6204b
--- /dev/null
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory Bandwidth Allocation (MBA) test
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+
+#define RESULT_FILE_NAME "result_mba"
+#define NUM_OF_RUNS 5
+#define MAX_DIFF 300
+#define ALLOCATION_MAX 100
+#define ALLOCATION_MIN 10
+#define ALLOCATION_STEP 10
+
+/*
+ * Change schemata percentage from 100 to 10%. Write schemata to specified
+ * con_mon grp, mon_grp in resctrl FS.
+ * For each allocation, run 5 times in order to get average values.
+ */
+static int mba_setup(int num, ...)
+{
+ static int runs_per_allocation, allocation = 100;
+ struct resctrl_val_param *p;
+ char allocation_str[64];
+ va_list param;
+
+ va_start(param, num);
+ p = va_arg(param, struct resctrl_val_param *);
+ va_end(param);
+
+ if (runs_per_allocation >= NUM_OF_RUNS)
+ runs_per_allocation = 0;
+
+ /* Only set up schemata once every NUM_OF_RUNS of allocations */
+ if (runs_per_allocation++ != 0)
+ return 0;
+
+ if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+ return -1;
+
+ sprintf(allocation_str, "%d", allocation);
+
+ write_schemata(p->ctrlgrp, allocation_str, p->cpu_no, p->resctrl_val);
+ allocation -= ALLOCATION_STEP;
+
+ return 0;
+}
+
+static void show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
+{
+ int allocation, runs;
+ bool failed = false;
+
+ printf("# Results are displayed in (MB)\n");
+ /* Memory bandwidth from 100% down to 10% */
+ for (allocation = 0; allocation < ALLOCATION_MAX / ALLOCATION_STEP;
+ allocation++) {
+ unsigned long avg_bw_imc, avg_bw_resc;
+ unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
+ unsigned long avg_diff;
+
+ /*
+ * The first run is discarded due to inaccurate value from
+ * phase transition.
+ */
+ for (runs = NUM_OF_RUNS * allocation + 1;
+ runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS ; runs++) {
+ sum_bw_imc += bw_imc[runs];
+ sum_bw_resc += bw_resc[runs];
+ }
+
+ avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
+ avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+ avg_diff = labs((long)(avg_bw_resc - avg_bw_imc));
+
+ printf("%sok MBA schemata percentage %u smaller than %d %%\n",
+ avg_diff > MAX_DIFF ? "not " : "",
+ ALLOCATION_MAX - ALLOCATION_STEP * allocation,
+ MAX_DIFF);
+ tests_run++;
+ printf("# avg_diff: %lu\n", avg_diff);
+ printf("# avg_bw_imc: %lu\n", avg_bw_imc);
+ printf("# avg_bw_resc: %lu\n", avg_bw_resc);
+ if (avg_diff > MAX_DIFF)
+ failed = true;
+ }
+
+ printf("%sok schemata change using MBA%s\n", failed ? "not " : "",
+ failed ? " # at least one test failed" : "");
+ tests_run++;
+}
+
+static int check_results(void)
+{
+ char *token_array[8], output[] = RESULT_FILE_NAME, temp[512];
+ unsigned long bw_imc[1024], bw_resc[1024];
+ int runs;
+ FILE *fp;
+
+ fp = fopen(output, "r");
+ if (!fp) {
+ perror(output);
+
+ return errno;
+ }
+
+ runs = 0;
+ while (fgets(temp, sizeof(temp), fp)) {
+ char *token = strtok(temp, ":\t");
+ int fields = 0;
+
+ while (token) {
+ token_array[fields++] = token;
+ token = strtok(NULL, ":\t");
+ }
+
+ /* Field 3 is perf imc value */
+ bw_imc[runs] = strtoul(token_array[3], NULL, 0);
+ /* Field 5 is resctrl value */
+ bw_resc[runs] = strtoul(token_array[5], NULL, 0);
+ runs++;
+ }
+
+ fclose(fp);
+
+ show_mba_info(bw_imc, bw_resc);
+
+ return 0;
+}
+
+void mba_test_cleanup(void)
+{
+ remove(RESULT_FILE_NAME);
+}
+
+int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd)
+{
+ struct resctrl_val_param param = {
+ .resctrl_val = "mba",
+ .ctrlgrp = "c1",
+ .mongrp = "m1",
+ .cpu_no = cpu_no,
+ .mum_resctrlfs = 1,
+ .filename = RESULT_FILE_NAME,
+ .bw_report = bw_report,
+ .setup = mba_setup
+ };
+ int ret;
+
+ remove(RESULT_FILE_NAME);
+
+ if (!validate_resctrl_feature_request("mba"))
+ return -1;
+
+ ret = resctrl_val(benchmark_cmd, ¶m);
+ if (ret)
+ return ret;
+
+ ret = check_results();
+ if (ret)
+ return ret;
+
+ mba_test_cleanup();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
new file mode 100644
index 000000000000..4700f7453f81
--- /dev/null
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory Bandwidth Monitoring (MBM) test
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+
+#define RESULT_FILE_NAME "result_mbm"
+#define MAX_DIFF 300
+#define NUM_OF_RUNS 5
+
+static void
+show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, int span)
+{
+ unsigned long avg_bw_imc = 0, avg_bw_resc = 0;
+ unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
+ long avg_diff = 0;
+ int runs;
+
+ /*
+ * Discard the first value which is inaccurate due to monitoring setup
+ * transition phase.
+ */
+ for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
+ sum_bw_imc += bw_imc[runs];
+ sum_bw_resc += bw_resc[runs];
+ }
+
+ avg_bw_imc = sum_bw_imc / 4;
+ avg_bw_resc = sum_bw_resc / 4;
+ avg_diff = avg_bw_resc - avg_bw_imc;
+
+ printf("%sok MBM: diff within %d%%\n",
+ labs(avg_diff) > MAX_DIFF ? "not " : "", MAX_DIFF);
+ tests_run++;
+ printf("# avg_diff: %lu\n", labs(avg_diff));
+ printf("# Span (MB): %d\n", span);
+ printf("# avg_bw_imc: %lu\n", avg_bw_imc);
+ printf("# avg_bw_resc: %lu\n", avg_bw_resc);
+}
+
+static int check_results(int span)
+{
+ unsigned long bw_imc[NUM_OF_RUNS], bw_resc[NUM_OF_RUNS];
+ char temp[1024], *token_array[8];
+ char output[] = RESULT_FILE_NAME;
+ int runs;
+ FILE *fp;
+
+ printf("# Checking for pass/fail\n");
+
+ fp = fopen(output, "r");
+ if (!fp) {
+ perror(output);
+
+ return errno;
+ }
+
+ runs = 0;
+ while (fgets(temp, sizeof(temp), fp)) {
+ char *token = strtok(temp, ":\t");
+ int i = 0;
+
+ while (token) {
+ token_array[i++] = token;
+ token = strtok(NULL, ":\t");
+ }
+
+ bw_resc[runs] = strtoul(token_array[5], NULL, 0);
+ bw_imc[runs] = strtoul(token_array[3], NULL, 0);
+ runs++;
+ }
+
+ show_bw_info(bw_imc, bw_resc, span);
+
+ fclose(fp);
+
+ return 0;
+}
+
+static int mbm_setup(int num, ...)
+{
+ struct resctrl_val_param *p;
+ static int num_of_runs;
+ va_list param;
+ int ret = 0;
+
+ /* Run NUM_OF_RUNS times */
+ if (num_of_runs++ >= NUM_OF_RUNS)
+ return -1;
+
+ va_start(param, num);
+ p = va_arg(param, struct resctrl_val_param *);
+ va_end(param);
+
+ /* Set up shemata with 100% allocation on the first run. */
+ if (num_of_runs == 0)
+ ret = write_schemata(p->ctrlgrp, "100", p->cpu_no,
+ p->resctrl_val);
+
+ return ret;
+}
+
+void mbm_test_cleanup(void)
+{
+ remove(RESULT_FILE_NAME);
+}
+
+int mbm_bw_change(int span, int cpu_no, char *bw_report, char **benchmark_cmd)
+{
+ struct resctrl_val_param param = {
+ .resctrl_val = "mbm",
+ .ctrlgrp = "c1",
+ .mongrp = "m1",
+ .span = span,
+ .cpu_no = cpu_no,
+ .mum_resctrlfs = 1,
+ .filename = RESULT_FILE_NAME,
+ .bw_report = bw_report,
+ .setup = mbm_setup
+ };
+ int ret;
+
+ remove(RESULT_FILE_NAME);
+
+ if (!validate_resctrl_feature_request("mbm"))
+ return -1;
+
+ ret = resctrl_val(benchmark_cmd, ¶m);
+ if (ret)
+ return ret;
+
+ ret = check_results(span);
+ if (ret)
+ return ret;
+
+ mbm_test_cleanup();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
new file mode 100644
index 000000000000..39bf59c6b9c5
--- /dev/null
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#define _GNU_SOURCE
+#ifndef RESCTRL_H
+#define RESCTRL_H
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MB (1024 * 1024)
+#define RESCTRL_PATH "/sys/fs/resctrl"
+#define PHYS_ID_PATH "/sys/devices/system/cpu/cpu"
+#define CBM_MASK_PATH "/sys/fs/resctrl/info"
+
+#define PARENT_EXIT(err_msg) \
+ do { \
+ perror(err_msg); \
+ kill(ppid, SIGKILL); \
+ exit(EXIT_FAILURE); \
+ } while (0)
+
+/*
+ * resctrl_val_param: resctrl test parameters
+ * @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
+ * @ctrlgrp: Name of the control monitor group (con_mon grp)
+ * @mongrp: Name of the monitor group (mon grp)
+ * @cpu_no: CPU number to which the benchmark would be binded
+ * @span: Memory bytes accessed in each benchmark iteration
+ * @mum_resctrlfs: Should the resctrl FS be remounted?
+ * @filename: Name of file to which the o/p should be written
+ * @bw_report: Bandwidth report type (reads vs writes)
+ * @setup: Call back function to setup test environment
+ */
+struct resctrl_val_param {
+ char *resctrl_val;
+ char ctrlgrp[64];
+ char mongrp[64];
+ int cpu_no;
+ unsigned long span;
+ int mum_resctrlfs;
+ char filename[64];
+ char *bw_report;
+ unsigned long mask;
+ int num_of_runs;
+ int (*setup)(int num, ...);
+};
+
+pid_t bm_pid, ppid;
+int tests_run;
+
+char llc_occup_path[1024];
+bool is_amd;
+
+bool check_resctrlfs_support(void);
+int filter_dmesg(void);
+int remount_resctrlfs(bool mum_resctrlfs);
+int get_resource_id(int cpu_no, int *resource_id);
+int umount_resctrlfs(void);
+int validate_bw_report_request(char *bw_report);
+bool validate_resctrl_feature_request(char *resctrl_val);
+char *fgrep(FILE *inf, const char *str);
+int taskset_benchmark(pid_t bm_pid, int cpu_no);
+void run_benchmark(int signum, siginfo_t *info, void *ucontext);
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no,
+ char *resctrl_val);
+int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
+ char *resctrl_val);
+int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
+ int group_fd, unsigned long flags);
+int run_fill_buf(unsigned long span, int malloc_and_init_memory, int memflush,
+ int op, char *resctrl_va);
+int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param);
+int mbm_bw_change(int span, int cpu_no, char *bw_report, char **benchmark_cmd);
+void tests_cleanup(void);
+void mbm_test_cleanup(void);
+int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd);
+void mba_test_cleanup(void);
+int get_cbm_mask(char *cache_type);
+int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
+void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
+int cat_val(struct resctrl_val_param *param);
+void cat_test_cleanup(void);
+int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
+int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd);
+unsigned int count_bits(unsigned long n);
+void cqm_test_cleanup(void);
+int get_core_sibling(int cpu_no);
+int measure_cache_vals(struct resctrl_val_param *param, int bm_pid);
+
+#endif /* RESCTRL_H */
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
new file mode 100644
index 000000000000..425cc85ac883
--- /dev/null
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -0,0 +1,202 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Resctrl tests
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+
+#define BENCHMARK_ARGS 64
+#define BENCHMARK_ARG_SIZE 64
+
+bool is_amd;
+
+void detect_amd(void)
+{
+ FILE *inf = fopen("/proc/cpuinfo", "r");
+ char *res;
+
+ if (!inf)
+ return;
+
+ res = fgrep(inf, "vendor_id");
+
+ if (res) {
+ char *s = strchr(res, ':');
+
+ is_amd = s && !strcmp(s, ": AuthenticAMD\n");
+ free(res);
+ }
+ fclose(inf);
+}
+
+static void cmd_help(void)
+{
+ printf("usage: resctrl_tests [-h] [-b \"benchmark_cmd [options]\"] [-t test list] [-n no_of_bits]\n");
+ printf("\t-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CQM");
+ printf("\t default benchmark is builtin fill_buf\n");
+ printf("\t-t test list: run tests specified in the test list, ");
+ printf("e.g. -t mbm, mba, cqm, cat\n");
+ printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");
+ printf("\t-p cpu_no: specify CPU number to run the test. 1 is default\n");
+ printf("\t-h: help\n");
+}
+
+void tests_cleanup(void)
+{
+ mbm_test_cleanup();
+ mba_test_cleanup();
+ cqm_test_cleanup();
+ cat_test_cleanup();
+}
+
+int main(int argc, char **argv)
+{
+ bool has_ben = false, mbm_test = true, mba_test = true, cqm_test = true;
+ int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 5;
+ char *benchmark_cmd[BENCHMARK_ARGS], bw_report[64], bm_type[64];
+ char benchmark_cmd_area[BENCHMARK_ARGS][BENCHMARK_ARG_SIZE];
+ int ben_ind, ben_count;
+ bool cat_test = true;
+
+ for (i = 0; i < argc; i++) {
+ if (strcmp(argv[i], "-b") == 0) {
+ ben_ind = i + 1;
+ ben_count = argc - ben_ind;
+ argc_new = ben_ind - 1;
+ has_ben = true;
+ break;
+ }
+ }
+
+ while ((c = getopt(argc_new, argv, "ht:b:")) != -1) {
+ char *token;
+
+ switch (c) {
+ case 't':
+ token = strtok(optarg, ",");
+
+ mbm_test = false;
+ mba_test = false;
+ cqm_test = false;
+ cat_test = false;
+ while (token) {
+ if (!strcmp(token, "mbm")) {
+ mbm_test = true;
+ } else if (!strcmp(token, "mba")) {
+ mba_test = true;
+ } else if (!strcmp(token, "cqm")) {
+ cqm_test = true;
+ } else if (!strcmp(token, "cat")) {
+ cat_test = true;
+ } else {
+ printf("invalid argument\n");
+
+ return -1;
+ }
+ token = strtok(NULL, ":\t");
+ }
+ break;
+ case 'p':
+ cpu_no = atoi(optarg);
+ break;
+ case 'n':
+ no_of_bits = atoi(optarg);
+ break;
+ case 'h':
+ cmd_help();
+
+ return 0;
+ default:
+ printf("invalid argument\n");
+
+ return -1;
+ }
+ }
+
+ printf("TAP version 13\n");
+
+ /*
+ * Typically we need root privileges, because:
+ * 1. We write to resctrl FS
+ * 2. We execute perf commands
+ */
+ if (geteuid() != 0)
+ printf("# WARNING: not running as root, tests may fail.\n");
+
+ /* Detect AMD vendor */
+ detect_amd();
+
+ if (has_ben) {
+ /* Extract benchmark command from command line. */
+ for (i = ben_ind; i < argc; i++) {
+ benchmark_cmd[i - ben_ind] = benchmark_cmd_area[i];
+ sprintf(benchmark_cmd[i - ben_ind], "%s", argv[i]);
+ }
+ benchmark_cmd[ben_count] = NULL;
+ } else {
+ /* If no benchmark is given by "-b" argument, use fill_buf. */
+ for (i = 0; i < 6; i++)
+ benchmark_cmd[i] = benchmark_cmd_area[i];
+
+ strcpy(benchmark_cmd[0], "fill_buf");
+ sprintf(benchmark_cmd[1], "%d", span);
+ strcpy(benchmark_cmd[2], "1");
+ strcpy(benchmark_cmd[3], "1");
+ strcpy(benchmark_cmd[4], "0");
+ strcpy(benchmark_cmd[5], "");
+ benchmark_cmd[6] = NULL;
+ }
+
+ sprintf(bw_report, "reads");
+ sprintf(bm_type, "fill_buf");
+
+ check_resctrlfs_support();
+ filter_dmesg();
+
+ if (!is_amd && mbm_test) {
+ printf("# Starting MBM BW change ...\n");
+ if (!has_ben)
+ sprintf(benchmark_cmd[5], "%s", "mba");
+ res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
+ printf("%sok MBM: bw change\n", res ? "not " : "");
+ mbm_test_cleanup();
+ tests_run++;
+ }
+
+ if (!is_amd && mba_test) {
+ printf("# Starting MBA Schemata change ...\n");
+ if (!has_ben)
+ sprintf(benchmark_cmd[1], "%d", span);
+ res = mba_schemata_change(cpu_no, bw_report, benchmark_cmd);
+ printf("%sok MBA: schemata change\n", res ? "not " : "");
+ mba_test_cleanup();
+ tests_run++;
+ }
+
+ if (cqm_test) {
+ printf("# Starting CQM test ...\n");
+ if (!has_ben)
+ sprintf(benchmark_cmd[5], "%s", "cqm");
+ res = cqm_resctrl_val(cpu_no, no_of_bits, benchmark_cmd);
+ printf("%sok CQM: test\n", res ? "not " : "");
+ cqm_test_cleanup();
+ tests_run++;
+ }
+
+ if (cat_test) {
+ printf("# Starting CAT test ...\n");
+ res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
+ printf("%sok CAT: test\n", res ? "not " : "");
+ tests_run++;
+ cat_test_cleanup();
+ }
+
+ printf("1..%d\n", tests_run);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
new file mode 100644
index 000000000000..520fea3606d1
--- /dev/null
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -0,0 +1,744 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory bandwidth monitoring and allocation library
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+
+#define UNCORE_IMC "uncore_imc"
+#define READ_FILE_NAME "events/cas_count_read"
+#define WRITE_FILE_NAME "events/cas_count_write"
+#define DYN_PMU_PATH "/sys/bus/event_source/devices"
+#define SCALE 0.00006103515625
+#define MAX_IMCS 20
+#define MAX_TOKENS 5
+#define READ 0
+#define WRITE 1
+#define CON_MON_MBM_LOCAL_BYTES_PATH \
+ "%s/%s/mon_groups/%s/mon_data/mon_L3_%02d/mbm_local_bytes"
+
+#define CON_MBM_LOCAL_BYTES_PATH \
+ "%s/%s/mon_data/mon_L3_%02d/mbm_local_bytes"
+
+#define MON_MBM_LOCAL_BYTES_PATH \
+ "%s/mon_groups/%s/mon_data/mon_L3_%02d/mbm_local_bytes"
+
+#define MBM_LOCAL_BYTES_PATH \
+ "%s/mon_data/mon_L3_%02d/mbm_local_bytes"
+
+#define CON_MON_LCC_OCCUP_PATH \
+ "%s/%s/mon_groups/%s/mon_data/mon_L3_%02d/llc_occupancy"
+
+#define CON_LCC_OCCUP_PATH \
+ "%s/%s/mon_data/mon_L3_%02d/llc_occupancy"
+
+#define MON_LCC_OCCUP_PATH \
+ "%s/mon_groups/%s/mon_data/mon_L3_%02d/llc_occupancy"
+
+#define LCC_OCCUP_PATH \
+ "%s/mon_data/mon_L3_%02d/llc_occupancy"
+
+struct membw_read_format {
+ __u64 value; /* The value of the event */
+ __u64 time_enabled; /* if PERF_FORMAT_TOTAL_TIME_ENABLED */
+ __u64 time_running; /* if PERF_FORMAT_TOTAL_TIME_RUNNING */
+ __u64 id; /* if PERF_FORMAT_ID */
+};
+
+struct imc_counter_config {
+ __u32 type;
+ __u64 event;
+ __u64 umask;
+ struct perf_event_attr pe;
+ struct membw_read_format return_value;
+ int fd;
+};
+
+static char mbm_total_path[1024];
+static int imcs;
+static struct imc_counter_config imc_counters_config[MAX_IMCS][2];
+
+void membw_initialize_perf_event_attr(int i, int j)
+{
+ memset(&imc_counters_config[i][j].pe, 0,
+ sizeof(struct perf_event_attr));
+ imc_counters_config[i][j].pe.type = imc_counters_config[i][j].type;
+ imc_counters_config[i][j].pe.size = sizeof(struct perf_event_attr);
+ imc_counters_config[i][j].pe.disabled = 1;
+ imc_counters_config[i][j].pe.inherit = 1;
+ imc_counters_config[i][j].pe.exclude_guest = 0;
+ imc_counters_config[i][j].pe.config =
+ imc_counters_config[i][j].umask << 8 |
+ imc_counters_config[i][j].event;
+ imc_counters_config[i][j].pe.sample_type = PERF_SAMPLE_IDENTIFIER;
+ imc_counters_config[i][j].pe.read_format =
+ PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
+}
+
+void membw_ioctl_perf_event_ioc_reset_enable(int i, int j)
+{
+ ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_RESET, 0);
+ ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_ENABLE, 0);
+}
+
+void membw_ioctl_perf_event_ioc_disable(int i, int j)
+{
+ ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_DISABLE, 0);
+}
+
+/*
+ * get_event_and_umask: Parse config into event and umask
+ * @cas_count_cfg: Config
+ * @count: iMC number
+ * @op: Operation (read/write)
+ */
+void get_event_and_umask(char *cas_count_cfg, int count, bool op)
+{
+ char *token[MAX_TOKENS];
+ int i = 0;
+
+ strcat(cas_count_cfg, ",");
+ token[0] = strtok(cas_count_cfg, "=,");
+
+ for (i = 1; i < MAX_TOKENS; i++)
+ token[i] = strtok(NULL, "=,");
+
+ for (i = 0; i < MAX_TOKENS; i++) {
+ if (!token[i])
+ break;
+ if (strcmp(token[i], "event") == 0) {
+ if (op == READ)
+ imc_counters_config[count][READ].event =
+ strtol(token[i + 1], NULL, 16);
+ else
+ imc_counters_config[count][WRITE].event =
+ strtol(token[i + 1], NULL, 16);
+ }
+ if (strcmp(token[i], "umask") == 0) {
+ if (op == READ)
+ imc_counters_config[count][READ].umask =
+ strtol(token[i + 1], NULL, 16);
+ else
+ imc_counters_config[count][WRITE].umask =
+ strtol(token[i + 1], NULL, 16);
+ }
+ }
+}
+
+static int open_perf_event(int i, int cpu_no, int j)
+{
+ imc_counters_config[i][j].fd =
+ perf_event_open(&imc_counters_config[i][j].pe, -1, cpu_no, -1,
+ PERF_FLAG_FD_CLOEXEC);
+
+ if (imc_counters_config[i][j].fd == -1) {
+ fprintf(stderr, "Error opening leader %llx\n",
+ imc_counters_config[i][j].pe.config);
+
+ return -1;
+ }
+
+ return 0;
+}
+
+/* Get type and config (read and write) of an iMC counter */
+static int read_from_imc_dir(char *imc_dir, int count)
+{
+ char cas_count_cfg[1024], imc_counter_cfg[1024], imc_counter_type[1024];
+ FILE *fp;
+
+ /* Get type of iMC counter */
+ sprintf(imc_counter_type, "%s%s", imc_dir, "type");
+ fp = fopen(imc_counter_type, "r");
+ if (!fp) {
+ perror("Failed to open imc counter type file");
+
+ return -1;
+ }
+ if (fscanf(fp, "%u", &imc_counters_config[count][READ].type) <= 0) {
+ perror("Could not get imc type");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ imc_counters_config[count][WRITE].type =
+ imc_counters_config[count][READ].type;
+
+ /* Get read config */
+ sprintf(imc_counter_cfg, "%s%s", imc_dir, READ_FILE_NAME);
+ fp = fopen(imc_counter_cfg, "r");
+ if (!fp) {
+ perror("Failed to open imc config file");
+
+ return -1;
+ }
+ if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
+ perror("Could not get imc cas count read");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ get_event_and_umask(cas_count_cfg, count, READ);
+
+ /* Get write config */
+ sprintf(imc_counter_cfg, "%s%s", imc_dir, WRITE_FILE_NAME);
+ fp = fopen(imc_counter_cfg, "r");
+ if (!fp) {
+ perror("Failed to open imc config file");
+
+ return -1;
+ }
+ if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
+ perror("Could not get imc cas count write");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ get_event_and_umask(cas_count_cfg, count, WRITE);
+
+ return 0;
+}
+
+/*
+ * A system can have 'n' number of iMC (Integrated Memory Controller)
+ * counters, get that 'n'. For each iMC counter get it's type and config.
+ * Also, each counter has two configs, one for read and the other for write.
+ * A config again has two parts, event and umask.
+ * Enumerate all these details into an array of structures.
+ *
+ * Return: >= 0 on success. < 0 on failure.
+ */
+static int num_of_imcs(void)
+{
+ unsigned int count = 0;
+ char imc_dir[512];
+ struct dirent *ep;
+ int ret;
+ DIR *dp;
+
+ dp = opendir(DYN_PMU_PATH);
+ if (dp) {
+ while ((ep = readdir(dp))) {
+ if (strstr(ep->d_name, UNCORE_IMC)) {
+ sprintf(imc_dir, "%s/%s/", DYN_PMU_PATH,
+ ep->d_name);
+ ret = read_from_imc_dir(imc_dir, count);
+ if (ret) {
+ closedir(dp);
+
+ return ret;
+ }
+ count++;
+ }
+ }
+ closedir(dp);
+ if (count == 0) {
+ perror("Unable find iMC counters!\n");
+
+ return -1;
+ }
+ } else {
+ perror("Unable to open PMU directory!\n");
+
+ return -1;
+ }
+
+ return count;
+}
+
+static int initialize_mem_bw_imc(void)
+{
+ int imc, j;
+
+ imcs = num_of_imcs();
+ if (imcs <= 0)
+ return imcs;
+
+ /* Initialize perf_event_attr structures for all iMC's */
+ for (imc = 0; imc < imcs; imc++) {
+ for (j = 0; j < 2; j++)
+ membw_initialize_perf_event_attr(imc, j);
+ }
+
+ return 0;
+}
+
+/*
+ * get_mem_bw_imc: Memory band width as reported by iMC counters
+ * @cpu_no: CPU number that the benchmark PID is binded to
+ * @bw_report: Bandwidth report type (reads, writes)
+ *
+ * Memory B/W utilized by a process on a socket can be calculated using
+ * iMC counters. Perf events are used to read these counters.
+ *
+ * Return: >= 0 on success. < 0 on failure.
+ */
+static float get_mem_bw_imc(int cpu_no, char *bw_report)
+{
+ float reads, writes, of_mul_read, of_mul_write;
+ int imc, j, ret;
+
+ /* Start all iMC counters to log values (both read and write) */
+ reads = 0, writes = 0, of_mul_read = 1, of_mul_write = 1;
+ for (imc = 0; imc < imcs; imc++) {
+ for (j = 0; j < 2; j++) {
+ ret = open_perf_event(imc, cpu_no, j);
+ if (ret)
+ return -1;
+ }
+ for (j = 0; j < 2; j++)
+ membw_ioctl_perf_event_ioc_reset_enable(imc, j);
+ }
+
+ sleep(1);
+
+ /* Stop counters after a second to get results (both read and write) */
+ for (imc = 0; imc < imcs; imc++) {
+ for (j = 0; j < 2; j++)
+ membw_ioctl_perf_event_ioc_disable(imc, j);
+ }
+
+ /*
+ * Get results which are stored in struct type imc_counter_config
+ * Take over flow into consideration before calculating total b/w
+ */
+ for (imc = 0; imc < imcs; imc++) {
+ struct imc_counter_config *r =
+ &imc_counters_config[imc][READ];
+ struct imc_counter_config *w =
+ &imc_counters_config[imc][WRITE];
+
+ if (read(r->fd, &r->return_value,
+ sizeof(struct membw_read_format)) == -1) {
+ perror("Couldn't get read b/w through iMC");
+
+ return -1;
+ }
+
+ if (read(w->fd, &w->return_value,
+ sizeof(struct membw_read_format)) == -1) {
+ perror("Couldn't get write bw through iMC");
+
+ return -1;
+ }
+
+ __u64 r_time_enabled = r->return_value.time_enabled;
+ __u64 r_time_running = r->return_value.time_running;
+
+ if (r_time_enabled != r_time_running)
+ of_mul_read = (float)r_time_enabled /
+ (float)r_time_running;
+
+ __u64 w_time_enabled = w->return_value.time_enabled;
+ __u64 w_time_running = w->return_value.time_running;
+
+ if (w_time_enabled != w_time_running)
+ of_mul_write = (float)w_time_enabled /
+ (float)w_time_running;
+ reads += r->return_value.value * of_mul_read * SCALE;
+ writes += w->return_value.value * of_mul_write * SCALE;
+ }
+
+ for (imc = 0; imc < imcs; imc++) {
+ close(imc_counters_config[imc][READ].fd);
+ close(imc_counters_config[imc][WRITE].fd);
+ }
+
+ if (strcmp(bw_report, "reads") == 0)
+ return reads;
+
+ if (strcmp(bw_report, "writes") == 0)
+ return writes;
+
+ return (reads + writes);
+}
+
+void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
+{
+ if (ctrlgrp && mongrp)
+ sprintf(mbm_total_path, CON_MON_MBM_LOCAL_BYTES_PATH,
+ RESCTRL_PATH, ctrlgrp, mongrp, resource_id);
+ else if (!ctrlgrp && mongrp)
+ sprintf(mbm_total_path, MON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
+ mongrp, resource_id);
+ else if (ctrlgrp && !mongrp)
+ sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
+ ctrlgrp, resource_id);
+ else if (!ctrlgrp && !mongrp)
+ sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
+ resource_id);
+}
+
+/*
+ * initialize_mem_bw_resctrl: Appropriately populate "mbm_total_path"
+ * @ctrlgrp: Name of the control monitor group (con_mon grp)
+ * @mongrp: Name of the monitor group (mon grp)
+ * @cpu_no: CPU number that the benchmark PID is binded to
+ * @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
+ */
+static void initialize_mem_bw_resctrl(const char *ctrlgrp, const char *mongrp,
+ int cpu_no, char *resctrl_val)
+{
+ int resource_id;
+
+ if (get_resource_id(cpu_no, &resource_id) < 0) {
+ perror("Could not get resource_id");
+ return;
+ }
+
+ if (strcmp(resctrl_val, "mbm") == 0)
+ set_mbm_path(ctrlgrp, mongrp, resource_id);
+
+ if ((strcmp(resctrl_val, "mba") == 0)) {
+ if (ctrlgrp)
+ sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH,
+ RESCTRL_PATH, ctrlgrp, resource_id);
+ else
+ sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH,
+ RESCTRL_PATH, resource_id);
+ }
+}
+
+/*
+ * Get MBM Local bytes as reported by resctrl FS
+ * For MBM,
+ * 1. If con_mon grp and mon grp are given, then read from con_mon grp's mon grp
+ * 2. If only con_mon grp is given, then read from con_mon grp
+ * 3. If both are not given, then read from root con_mon grp
+ * For MBA,
+ * 1. If con_mon grp is given, then read from it
+ * 2. If con_mon grp is not given, then read from root con_mon grp
+ */
+static unsigned long get_mem_bw_resctrl(void)
+{
+ unsigned long mbm_total = 0;
+ FILE *fp;
+
+ fp = fopen(mbm_total_path, "r");
+ if (!fp) {
+ perror("Failed to open total bw file");
+
+ return -1;
+ }
+ if (fscanf(fp, "%lu", &mbm_total) <= 0) {
+ perror("Could not get mbm local bytes");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ return mbm_total;
+}
+
+pid_t bm_pid, ppid;
+
+void ctrlc_handler(int signum, siginfo_t *info, void *ptr)
+{
+ kill(bm_pid, SIGKILL);
+ umount_resctrlfs();
+ tests_cleanup();
+ printf("Ending\n\n");
+
+ exit(EXIT_SUCCESS);
+}
+
+/*
+ * print_results_bw: the memory bandwidth results are stored in a file
+ * @filename: file that stores the results
+ * @bm_pid: child pid that runs benchmark
+ * @bw_imc: perf imc counter value
+ * @bw_resc: memory bandwidth value
+ *
+ * Return: 0 on success. non-zero on failure.
+ */
+static int print_results_bw(char *filename, int bm_pid, float bw_imc,
+ unsigned long bw_resc)
+{
+ unsigned long diff = fabs(bw_imc - bw_resc);
+ FILE *fp;
+
+ if (strcmp(filename, "stdio") == 0 || strcmp(filename, "stderr") == 0) {
+ printf("Pid: %d \t Mem_BW_iMC: %f \t ", bm_pid, bw_imc);
+ printf("Mem_BW_resc: %lu \t Difference: %lu\n", bw_resc, diff);
+ } else {
+ fp = fopen(filename, "a");
+ if (!fp) {
+ perror("Cannot open results file");
+
+ return errno;
+ }
+ if (fprintf(fp, "Pid: %d \t Mem_BW_iMC: %f \t Mem_BW_resc: %lu \t Difference: %lu\n",
+ bm_pid, bw_imc, bw_resc, diff) <= 0) {
+ fclose(fp);
+ perror("Could not log results.");
+
+ return errno;
+ }
+ fclose(fp);
+ }
+
+ return 0;
+}
+
+static void set_cqm_path(const char *ctrlgrp, const char *mongrp, char sock_num)
+{
+ if (strlen(ctrlgrp) && strlen(mongrp))
+ sprintf(llc_occup_path, CON_MON_LCC_OCCUP_PATH, RESCTRL_PATH,
+ ctrlgrp, mongrp, sock_num);
+ else if (!strlen(ctrlgrp) && strlen(mongrp))
+ sprintf(llc_occup_path, MON_LCC_OCCUP_PATH, RESCTRL_PATH,
+ mongrp, sock_num);
+ else if (strlen(ctrlgrp) && !strlen(mongrp))
+ sprintf(llc_occup_path, CON_LCC_OCCUP_PATH, RESCTRL_PATH,
+ ctrlgrp, sock_num);
+ else if (!strlen(ctrlgrp) && !strlen(mongrp))
+ sprintf(llc_occup_path, LCC_OCCUP_PATH, RESCTRL_PATH, sock_num);
+}
+
+/*
+ * initialize_llc_occu_resctrl: Appropriately populate "llc_occup_path"
+ * @ctrlgrp: Name of the control monitor group (con_mon grp)
+ * @mongrp: Name of the monitor group (mon grp)
+ * @cpu_no: CPU number that the benchmark PID is binded to
+ * @resctrl_val: Resctrl feature (Eg: cat, cqm.. etc)
+ */
+static void initialize_llc_occu_resctrl(const char *ctrlgrp, const char *mongrp,
+ int cpu_no, char *resctrl_val)
+{
+ int resource_id;
+
+ if (get_resource_id(cpu_no, &resource_id) < 0) {
+ perror("# Unable to resource_id");
+ return;
+ }
+
+ if (strcmp(resctrl_val, "cqm") == 0)
+ set_cqm_path(ctrlgrp, mongrp, resource_id);
+}
+
+static int
+measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
+{
+ unsigned long bw_imc, bw_resc, bw_resc_end;
+ int ret;
+
+ /*
+ * Measure memory bandwidth from resctrl and from
+ * another source which is perf imc value or could
+ * be something else if perf imc event is not available.
+ * Compare the two values to validate resctrl value.
+ * It takes 1sec to measure the data.
+ */
+ bw_imc = get_mem_bw_imc(param->cpu_no, param->bw_report);
+ if (bw_imc <= 0)
+ return bw_imc;
+
+ bw_resc_end = get_mem_bw_resctrl();
+ if (bw_resc_end <= 0)
+ return bw_resc_end;
+
+ bw_resc = (bw_resc_end - *bw_resc_start) / MB;
+ ret = print_results_bw(param->filename, bm_pid, bw_imc, bw_resc);
+ if (ret)
+ return ret;
+
+ *bw_resc_start = bw_resc_end;
+
+ return 0;
+}
+
+/*
+ * resctrl_val: execute benchmark and measure memory bandwidth on
+ * the benchmark
+ * @benchmark_cmd: benchmark command and its arguments
+ * @param: parameters passed to resctrl_val()
+ *
+ * Return: 0 on success. non-zero on failure.
+ */
+int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param)
+{
+ char *resctrl_val = param->resctrl_val;
+ unsigned long bw_resc_start = 0;
+ struct sigaction sigact;
+ int ret = 0, pipefd[2];
+ char pipe_message = 0;
+ union sigval value;
+
+ if (strcmp(param->filename, "") == 0)
+ sprintf(param->filename, "stdio");
+
+ if ((strcmp(resctrl_val, "mba")) == 0 ||
+ (strcmp(resctrl_val, "mbm")) == 0) {
+ ret = validate_bw_report_request(param->bw_report);
+ if (ret)
+ return ret;
+ }
+
+ ret = remount_resctrlfs(param->mum_resctrlfs);
+ if (ret)
+ return ret;
+
+ /*
+ * If benchmark wasn't successfully started by child, then child should
+ * kill parent, so save parent's pid
+ */
+ ppid = getpid();
+
+ if (pipe(pipefd)) {
+ perror("# Unable to create pipe");
+
+ return -1;
+ }
+
+ /*
+ * Fork to start benchmark, save child's pid so that it can be killed
+ * when needed
+ */
+ bm_pid = fork();
+ if (bm_pid == -1) {
+ perror("# Unable to fork");
+
+ return -1;
+ }
+
+ if (bm_pid == 0) {
+ /*
+ * Mask all signals except SIGUSR1, parent uses SIGUSR1 to
+ * start benchmark
+ */
+ sigfillset(&sigact.sa_mask);
+ sigdelset(&sigact.sa_mask, SIGUSR1);
+
+ sigact.sa_sigaction = run_benchmark;
+ sigact.sa_flags = SA_SIGINFO;
+
+ /* Register for "SIGUSR1" signal from parent */
+ if (sigaction(SIGUSR1, &sigact, NULL))
+ PARENT_EXIT("Can't register child for signal");
+
+ /* Tell parent that child is ready */
+ close(pipefd[0]);
+ pipe_message = 1;
+ if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
+ sizeof(pipe_message)) {
+ perror("# failed signaling parent process");
+ close(pipefd[1]);
+ return -1;
+ }
+ close(pipefd[1]);
+
+ /* Suspend child until delivery of "SIGUSR1" from parent */
+ sigsuspend(&sigact.sa_mask);
+
+ PARENT_EXIT("Child is done");
+ }
+
+ printf("# benchmark PID: %d\n", bm_pid);
+
+ /*
+ * Register CTRL-C handler for parent, as it has to kill benchmark
+ * before exiting
+ */
+ sigact.sa_sigaction = ctrlc_handler;
+ sigemptyset(&sigact.sa_mask);
+ sigact.sa_flags = SA_SIGINFO;
+ if (sigaction(SIGINT, &sigact, NULL) ||
+ sigaction(SIGHUP, &sigact, NULL)) {
+ perror("# sigaction");
+ ret = errno;
+ goto out;
+ }
+
+ value.sival_ptr = benchmark_cmd;
+
+ /* Taskset benchmark to specified cpu */
+ ret = taskset_benchmark(bm_pid, param->cpu_no);
+ if (ret)
+ goto out;
+
+ /* Write benchmark to specified control&monitoring grp in resctrl FS */
+ ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
+ resctrl_val);
+ if (ret)
+ goto out;
+
+ if ((strcmp(resctrl_val, "mbm") == 0) ||
+ (strcmp(resctrl_val, "mba") == 0)) {
+ ret = initialize_mem_bw_imc();
+ if (ret)
+ goto out;
+
+ initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp,
+ param->cpu_no, resctrl_val);
+ } else if (strcmp(resctrl_val, "cqm") == 0)
+ initialize_llc_occu_resctrl(param->ctrlgrp, param->mongrp,
+ param->cpu_no, resctrl_val);
+
+ /* Parent waits for child to be ready. */
+ close(pipefd[1]);
+ while (pipe_message != 1) {
+ if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) <
+ sizeof(pipe_message)) {
+ perror("# failed reading message from child process");
+ close(pipefd[0]);
+ goto out;
+ }
+ }
+ close(pipefd[0]);
+
+ /* Signal child to start benchmark */
+ if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
+ perror("# sigqueue SIGUSR1 to child");
+ ret = errno;
+ goto out;
+ }
+
+ /* Give benchmark enough time to fully run */
+ sleep(1);
+
+ /* Test runs until the callback setup() tells the test to stop. */
+ while (1) {
+ if ((strcmp(resctrl_val, "mbm") == 0) ||
+ (strcmp(resctrl_val, "mba") == 0)) {
+ ret = param->setup(1, param);
+ if (ret) {
+ ret = 0;
+ break;
+ }
+
+ ret = measure_vals(param, &bw_resc_start);
+ if (ret)
+ break;
+ } else if (strcmp(resctrl_val, "cqm") == 0) {
+ ret = param->setup(1, param);
+ if (ret) {
+ ret = 0;
+ break;
+ }
+ sleep(1);
+ ret = measure_cache_vals(param, bm_pid);
+ if (ret)
+ break;
+ } else {
+ break;
+ }
+ }
+
+out:
+ kill(bm_pid, SIGKILL);
+ umount_resctrlfs();
+
+ return ret;
+}
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
new file mode 100644
index 000000000000..19c0ec4045a4
--- /dev/null
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -0,0 +1,722 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Basic resctrl file system operations
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Authors:
+ * Sai Praneeth Prakhya ,
+ * Fenghua Yu
+ */
+#include "resctrl.h"
+
+int tests_run;
+
+static int find_resctrl_mount(char *buffer)
+{
+ FILE *mounts;
+ char line[256], *fs, *mntpoint;
+
+ mounts = fopen("/proc/mounts", "r");
+ if (!mounts) {
+ perror("/proc/mounts");
+ return -ENXIO;
+ }
+ while (!feof(mounts)) {
+ if (!fgets(line, 256, mounts))
+ break;
+ fs = strtok(line, " \t");
+ if (!fs)
+ continue;
+ mntpoint = strtok(NULL, " \t");
+ if (!mntpoint)
+ continue;
+ fs = strtok(NULL, " \t");
+ if (!fs)
+ continue;
+ if (strcmp(fs, "resctrl"))
+ continue;
+
+ fclose(mounts);
+ if (buffer)
+ strncpy(buffer, mntpoint, 256);
+
+ return 0;
+ }
+
+ fclose(mounts);
+
+ return -ENOENT;
+}
+
+char cbm_mask[256];
+
+/*
+ * remount_resctrlfs - Remount resctrl FS at /sys/fs/resctrl
+ * @mum_resctrlfs: Should the resctrl FS be remounted?
+ *
+ * If not mounted, mount it.
+ * If mounted and mum_resctrlfs then remount resctrl FS.
+ * If mounted and !mum_resctrlfs then noop
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+int remount_resctrlfs(bool mum_resctrlfs)
+{
+ char mountpoint[256];
+ int ret;
+
+ ret = find_resctrl_mount(mountpoint);
+ if (ret)
+ strcpy(mountpoint, RESCTRL_PATH);
+
+ if (!ret && mum_resctrlfs && umount(mountpoint)) {
+ printf("not ok unmounting \"%s\"\n", mountpoint);
+ perror("# umount");
+ tests_run++;
+ }
+
+ if (!ret && !mum_resctrlfs)
+ return 0;
+
+ ret = mount("resctrl", RESCTRL_PATH, "resctrl", 0, NULL);
+ printf("%sok mounting resctrl to \"%s\"\n", ret ? "not " : "",
+ RESCTRL_PATH);
+ if (ret)
+ perror("# mount");
+
+ tests_run++;
+
+ return ret;
+}
+
+int umount_resctrlfs(void)
+{
+ if (umount(RESCTRL_PATH)) {
+ perror("# Unable to umount resctrl");
+
+ return errno;
+ }
+
+ return 0;
+}
+
+/*
+ * get_resource_id - Get socket number/l3 id for a specified CPU
+ * @cpu_no: CPU number
+ * @resource_id: Socket number or l3_id
+ *
+ * Return: >= 0 on success, < 0 on failure.
+ */
+int get_resource_id(int cpu_no, int *resource_id)
+{
+ char phys_pkg_path[1024];
+ FILE *fp;
+
+ if (is_amd)
+ sprintf(phys_pkg_path, "%s%d/cache/index3/id",
+ PHYS_ID_PATH, cpu_no);
+ else
+ sprintf(phys_pkg_path, "%s%d/topology/physical_package_id",
+ PHYS_ID_PATH, cpu_no);
+
+ fp = fopen(phys_pkg_path, "r");
+ if (!fp) {
+ perror("Failed to open physical_package_id");
+
+ return -1;
+ }
+ if (fscanf(fp, "%d", resource_id) <= 0) {
+ perror("Could not get socket number or l3 id");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ return 0;
+}
+
+/*
+ * get_cache_size - Get cache size for a specified CPU
+ * @cpu_no: CPU number
+ * @cache_type: Cache level L2/L3
+ * @cache_size: pointer to cache_size
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
+{
+ char cache_path[1024], cache_str[64];
+ int length, i, cache_num;
+ FILE *fp;
+
+ if (!strcmp(cache_type, "L3")) {
+ cache_num = 3;
+ } else if (!strcmp(cache_type, "L2")) {
+ cache_num = 2;
+ } else {
+ perror("Invalid cache level");
+ return -1;
+ }
+
+ sprintf(cache_path, "/sys/bus/cpu/devices/cpu%d/cache/index%d/size",
+ cpu_no, cache_num);
+ fp = fopen(cache_path, "r");
+ if (!fp) {
+ perror("Failed to open cache size");
+
+ return -1;
+ }
+ if (fscanf(fp, "%s", cache_str) <= 0) {
+ perror("Could not get cache_size");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ length = (int)strlen(cache_str);
+
+ *cache_size = 0;
+
+ for (i = 0; i < length; i++) {
+ if ((cache_str[i] >= '0') && (cache_str[i] <= '9'))
+
+ *cache_size = *cache_size * 10 + (cache_str[i] - '0');
+
+ else if (cache_str[i] == 'K')
+
+ *cache_size = *cache_size * 1024;
+
+ else if (cache_str[i] == 'M')
+
+ *cache_size = *cache_size * 1024 * 1024;
+
+ else
+ break;
+ }
+
+ return 0;
+}
+
+#define CORE_SIBLINGS_PATH "/sys/bus/cpu/devices/cpu"
+
+/*
+ * get_cbm_mask - Get cbm mask for given cache
+ * @cache_type: Cache level L2/L3
+ *
+ * Mask is stored in cbm_mask which is global variable.
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_cbm_mask(char *cache_type)
+{
+ char cbm_mask_path[1024];
+ FILE *fp;
+
+ sprintf(cbm_mask_path, "%s/%s/cbm_mask", CBM_MASK_PATH, cache_type);
+
+ fp = fopen(cbm_mask_path, "r");
+ if (!fp) {
+ perror("Failed to open cache level");
+
+ return -1;
+ }
+ if (fscanf(fp, "%s", cbm_mask) <= 0) {
+ perror("Could not get max cbm_mask");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ return 0;
+}
+
+/*
+ * get_core_sibling - Get sibling core id from the same socket for given CPU
+ * @cpu_no: CPU number
+ *
+ * Return: > 0 on success, < 0 on failure.
+ */
+int get_core_sibling(int cpu_no)
+{
+ char core_siblings_path[1024], cpu_list_str[64];
+ int sibling_cpu_no = -1;
+ FILE *fp;
+
+ sprintf(core_siblings_path, "%s%d/topology/core_siblings_list",
+ CORE_SIBLINGS_PATH, cpu_no);
+
+ fp = fopen(core_siblings_path, "r");
+ if (!fp) {
+ perror("Failed to open core siblings path");
+
+ return -1;
+ }
+ if (fscanf(fp, "%s", cpu_list_str) <= 0) {
+ perror("Could not get core_siblings list");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ char *token = strtok(cpu_list_str, "-,");
+
+ while (token) {
+ sibling_cpu_no = atoi(token);
+ /* Skipping core 0 as we don't want to run test on core 0 */
+ if (sibling_cpu_no != 0)
+ break;
+ token = strtok(NULL, "-,");
+ }
+
+ return sibling_cpu_no;
+}
+
+/*
+ * taskset_benchmark - Taskset PID (i.e. benchmark) to a specified cpu
+ * @bm_pid: PID that should be binded
+ * @cpu_no: CPU number at which the PID would be binded
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+int taskset_benchmark(pid_t bm_pid, int cpu_no)
+{
+ cpu_set_t my_set;
+
+ CPU_ZERO(&my_set);
+ CPU_SET(cpu_no, &my_set);
+
+ if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
+ perror("Unable to taskset benchmark");
+
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * run_benchmark - Run a specified benchmark or fill_buf (default benchmark)
+ * in specified signal. Direct benchmark stdio to /dev/null.
+ * @signum: signal number
+ * @info: signal info
+ * @ucontext: user context in signal handling
+ *
+ * Return: void
+ */
+void run_benchmark(int signum, siginfo_t *info, void *ucontext)
+{
+ int operation, ret, malloc_and_init_memory, memflush;
+ unsigned long span, buffer_span;
+ char **benchmark_cmd;
+ char resctrl_val[64];
+ FILE *fp;
+
+ benchmark_cmd = info->si_ptr;
+
+ /*
+ * Direct stdio of child to /dev/null, so that only parent writes to
+ * stdio (console)
+ */
+ fp = freopen("/dev/null", "w", stdout);
+ if (!fp)
+ PARENT_EXIT("Unable to direct benchmark status to /dev/null");
+
+ if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
+ /* Execute default fill_buf benchmark */
+ span = strtoul(benchmark_cmd[1], NULL, 10);
+ malloc_and_init_memory = atoi(benchmark_cmd[2]);
+ memflush = atoi(benchmark_cmd[3]);
+ operation = atoi(benchmark_cmd[4]);
+ sprintf(resctrl_val, "%s", benchmark_cmd[5]);
+
+ if (strcmp(resctrl_val, "cqm") != 0)
+ buffer_span = span * MB;
+ else
+ buffer_span = span;
+
+ if (run_fill_buf(buffer_span, malloc_and_init_memory, memflush,
+ operation, resctrl_val))
+ fprintf(stderr, "Error in running fill buffer\n");
+ } else {
+ /* Execute specified benchmark */
+ ret = execvp(benchmark_cmd[0], benchmark_cmd);
+ if (ret)
+ perror("wrong\n");
+ }
+
+ fclose(stdout);
+ PARENT_EXIT("Unable to run specified benchmark");
+}
+
+/*
+ * create_grp - Create a group only if one doesn't exist
+ * @grp_name: Name of the group
+ * @grp: Full path and name of the group
+ * @parent_grp: Full path and name of the parent group
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
+{
+ int found_grp = 0;
+ struct dirent *ep;
+ DIR *dp;
+
+ /*
+ * At this point, we are guaranteed to have resctrl FS mounted and if
+ * length of grp_name == 0, it means, user wants to use root con_mon
+ * grp, so do nothing
+ */
+ if (strlen(grp_name) == 0)
+ return 0;
+
+ /* Check if requested grp exists or not */
+ dp = opendir(parent_grp);
+ if (dp) {
+ while ((ep = readdir(dp)) != NULL) {
+ if (strcmp(ep->d_name, grp_name) == 0)
+ found_grp = 1;
+ }
+ closedir(dp);
+ } else {
+ perror("Unable to open resctrl for group");
+
+ return -1;
+ }
+
+ /* Requested grp doesn't exist, hence create it */
+ if (found_grp == 0) {
+ if (mkdir(grp, 0) == -1) {
+ perror("Unable to create group");
+
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int write_pid_to_tasks(char *tasks, pid_t pid)
+{
+ FILE *fp;
+
+ fp = fopen(tasks, "w");
+ if (!fp) {
+ perror("Failed to open tasks file");
+
+ return -1;
+ }
+ if (fprintf(fp, "%d\n", pid) < 0) {
+ perror("Failed to wr pid to tasks file");
+ fclose(fp);
+
+ return -1;
+ }
+ fclose(fp);
+
+ return 0;
+}
+
+/*
+ * write_bm_pid_to_resctrl - Write a PID (i.e. benchmark) to resctrl FS
+ * @bm_pid: PID that should be written
+ * @ctrlgrp: Name of the control monitor group (con_mon grp)
+ * @mongrp: Name of the monitor group (mon grp)
+ * @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
+ *
+ * If a con_mon grp is requested, create it and write pid to it, otherwise
+ * write pid to root con_mon grp.
+ * If a mon grp is requested, create it and write pid to it, otherwise
+ * pid is not written, this means that pid is in con_mon grp and hence
+ * should consult con_mon grp's mon_data directory for results.
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
+ char *resctrl_val)
+{
+ char controlgroup[128], monitorgroup[512], monitorgroup_p[256];
+ char tasks[1024];
+ int ret = 0;
+
+ if (strlen(ctrlgrp))
+ sprintf(controlgroup, "%s/%s", RESCTRL_PATH, ctrlgrp);
+ else
+ sprintf(controlgroup, "%s", RESCTRL_PATH);
+
+ /* Create control and monitoring group and write pid into it */
+ ret = create_grp(ctrlgrp, controlgroup, RESCTRL_PATH);
+ if (ret)
+ goto out;
+ sprintf(tasks, "%s/tasks", controlgroup);
+ ret = write_pid_to_tasks(tasks, bm_pid);
+ if (ret)
+ goto out;
+
+ /* Create mon grp and write pid into it for "mbm" and "cqm" test */
+ if ((strcmp(resctrl_val, "cqm") == 0) ||
+ (strcmp(resctrl_val, "mbm") == 0)) {
+ if (strlen(mongrp)) {
+ sprintf(monitorgroup_p, "%s/mon_groups", controlgroup);
+ sprintf(monitorgroup, "%s/%s", monitorgroup_p, mongrp);
+ ret = create_grp(mongrp, monitorgroup, monitorgroup_p);
+ if (ret)
+ goto out;
+
+ sprintf(tasks, "%s/mon_groups/%s/tasks",
+ controlgroup, mongrp);
+ ret = write_pid_to_tasks(tasks, bm_pid);
+ if (ret)
+ goto out;
+ }
+ }
+
+out:
+ printf("%sok writing benchmark parameters to resctrl FS\n",
+ ret ? "not " : "");
+ if (ret)
+ perror("# writing to resctrlfs");
+
+ tests_run++;
+
+ return ret;
+}
+
+/*
+ * write_schemata - Update schemata of a con_mon grp
+ * @ctrlgrp: Name of the con_mon grp
+ * @schemata: Schemata that should be updated to
+ * @cpu_no: CPU number that the benchmark PID is binded to
+ * @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
+ *
+ * Update schemata of a con_mon grp *only* if requested resctrl feature is
+ * allocation type
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
+{
+ char controlgroup[1024], schema[1024], reason[64];
+ int resource_id, ret = 0;
+ FILE *fp;
+
+ if ((strcmp(resctrl_val, "mba") != 0) &&
+ (strcmp(resctrl_val, "cat") != 0) &&
+ (strcmp(resctrl_val, "cqm") != 0))
+ return -ENOENT;
+
+ if (!schemata) {
+ printf("# Skipping empty schemata update\n");
+
+ return -1;
+ }
+
+ if (get_resource_id(cpu_no, &resource_id) < 0) {
+ sprintf(reason, "Failed to get resource id");
+ ret = -1;
+
+ goto out;
+ }
+
+ if (strlen(ctrlgrp) != 0)
+ sprintf(controlgroup, "%s/%s/schemata", RESCTRL_PATH, ctrlgrp);
+ else
+ sprintf(controlgroup, "%s/schemata", RESCTRL_PATH);
+
+ if (!strcmp(resctrl_val, "cat") || !strcmp(resctrl_val, "cqm"))
+ sprintf(schema, "%s%d%c%s", "L3:", resource_id, '=', schemata);
+ if (strcmp(resctrl_val, "mba") == 0)
+ sprintf(schema, "%s%d%c%s", "MB:", resource_id, '=', schemata);
+
+ fp = fopen(controlgroup, "w");
+ if (!fp) {
+ sprintf(reason, "Failed to open control group");
+ ret = -1;
+
+ goto out;
+ }
+
+ if (fprintf(fp, "%s\n", schema) < 0) {
+ sprintf(reason, "Failed to write schemata in control group");
+ fclose(fp);
+ ret = -1;
+
+ goto out;
+ }
+ fclose(fp);
+
+out:
+ printf("%sok Write schema \"%s\" to resctrl FS%s%s\n",
+ ret ? "not " : "", schema, ret ? " # " : "",
+ ret ? reason : "");
+ tests_run++;
+
+ return ret;
+}
+
+bool check_resctrlfs_support(void)
+{
+ FILE *inf = fopen("/proc/filesystems", "r");
+ DIR *dp;
+ char *res;
+ bool ret = false;
+
+ if (!inf)
+ return false;
+
+ res = fgrep(inf, "nodev\tresctrl\n");
+
+ if (res) {
+ ret = true;
+ free(res);
+ }
+
+ fclose(inf);
+
+ printf("%sok kernel supports resctrl filesystem\n", ret ? "" : "not ");
+ tests_run++;
+
+ dp = opendir(RESCTRL_PATH);
+ printf("%sok resctrl mountpoint \"%s\" exists\n",
+ dp ? "" : "not ", RESCTRL_PATH);
+ if (dp)
+ closedir(dp);
+ tests_run++;
+
+ printf("# resctrl filesystem %s mounted\n",
+ find_resctrl_mount(NULL) ? "not" : "is");
+
+ return ret;
+}
+
+char *fgrep(FILE *inf, const char *str)
+{
+ char line[256];
+ int slen = strlen(str);
+
+ while (!feof(inf)) {
+ if (!fgets(line, 256, inf))
+ break;
+ if (strncmp(line, str, slen))
+ continue;
+
+ return strdup(line);
+ }
+
+ return NULL;
+}
+
+/*
+ * validate_resctrl_feature_request - Check if requested feature is valid.
+ * @resctrl_val: Requested feature
+ *
+ * Return: 0 on success, non-zero on failure
+ */
+bool validate_resctrl_feature_request(char *resctrl_val)
+{
+ FILE *inf = fopen("/proc/cpuinfo", "r");
+ bool found = false;
+ char *res;
+
+ if (!inf)
+ return false;
+
+ res = fgrep(inf, "flags");
+
+ if (res) {
+ char *s = strchr(res, ':');
+
+ found = s && !strstr(s, resctrl_val);
+ free(res);
+ }
+ fclose(inf);
+
+ return found;
+}
+
+int filter_dmesg(void)
+{
+ char line[1024];
+ FILE *fp;
+ int pipefds[2];
+ pid_t pid;
+ int ret;
+
+ ret = pipe(pipefds);
+ if (ret) {
+ perror("pipe");
+ return ret;
+ }
+ pid = fork();
+ if (pid == 0) {
+ close(pipefds[0]);
+ dup2(pipefds[1], STDOUT_FILENO);
+ execlp("dmesg", "dmesg", NULL);
+ perror("executing dmesg");
+ exit(1);
+ }
+ close(pipefds[1]);
+ fp = fdopen(pipefds[0], "r");
+ if (!fp) {
+ perror("fdopen(pipe)");
+ kill(pid, SIGTERM);
+
+ return -1;
+ }
+
+ while (fgets(line, 1024, fp)) {
+ if (strstr(line, "intel_rdt:"))
+ printf("# dmesg: %s", line);
+ if (strstr(line, "resctrl:"))
+ printf("# dmesg: %s", line);
+ }
+ fclose(fp);
+ waitpid(pid, NULL, 0);
+
+ return 0;
+}
+
+int validate_bw_report_request(char *bw_report)
+{
+ if (strcmp(bw_report, "reads") == 0)
+ return 0;
+ if (strcmp(bw_report, "writes") == 0)
+ return 0;
+ if (strcmp(bw_report, "nt-writes") == 0) {
+ strcpy(bw_report, "writes");
+ return 0;
+ }
+ if (strcmp(bw_report, "total") == 0)
+ return 0;
+
+ fprintf(stderr, "Requested iMC B/W report type unavailable\n");
+
+ return -1;
+}
+
+int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
+ int group_fd, unsigned long flags)
+{
+ int ret;
+
+ ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
+ group_fd, flags);
+ return ret;
+}
+
+unsigned int count_bits(unsigned long n)
+{
+ unsigned int count = 0;
+
+ while (n) {
+ count += n & 1;
+ n >>= 1;
+ }
+
+ return count;
+}
diff --git a/tools/testing/selftests/seccomp/Makefile b/tools/testing/selftests/seccomp/Makefile
index 1760b3e39730..0ebfe8b0e147 100644
--- a/tools/testing/selftests/seccomp/Makefile
+++ b/tools/testing/selftests/seccomp/Makefile
@@ -1,17 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-all:
-
-include ../lib.mk
-
-.PHONY: all clean
-
-BINARIES := seccomp_bpf seccomp_benchmark
CFLAGS += -Wl,-no-as-needed -Wall
+LDFLAGS += -lpthread
-seccomp_bpf: seccomp_bpf.c ../kselftest_harness.h
- $(CC) $(CFLAGS) $(LDFLAGS) $< -lpthread -o $@
-
-TEST_PROGS += $(BINARIES)
-EXTRA_CLEAN := $(BINARIES)
-
-all: $(BINARIES)
+TEST_GEN_PROGS := seccomp_bpf seccomp_benchmark
+include ../lib.mk
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index a9ad3bd8b2ad..89fb3e0b552e 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -913,7 +913,7 @@ TEST(ERRNO_order)
EXPECT_EQ(12, errno);
}
-FIXTURE_DATA(TRAP) {
+FIXTURE(TRAP) {
struct sock_fprog prog;
};
@@ -1024,7 +1024,7 @@ TEST_F(TRAP, handler)
EXPECT_NE(0, (unsigned long)sigsys->_call_addr);
}
-FIXTURE_DATA(precedence) {
+FIXTURE(precedence) {
struct sock_fprog allow;
struct sock_fprog log;
struct sock_fprog trace;
@@ -1513,7 +1513,7 @@ void tracer_poke(struct __test_metadata *_metadata, pid_t tracee, int status,
EXPECT_EQ(0, ret);
}
-FIXTURE_DATA(TRACE_poke) {
+FIXTURE(TRACE_poke) {
struct sock_fprog prog;
pid_t tracer;
long poked;
@@ -1821,7 +1821,7 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
change_syscall(_metadata, tracee, -1, -ESRCH);
}
-FIXTURE_DATA(TRACE_syscall) {
+FIXTURE(TRACE_syscall) {
struct sock_fprog prog;
pid_t tracer, mytid, mypid, parent;
};
@@ -2326,7 +2326,7 @@ struct tsync_sibling {
} \
} while (0)
-FIXTURE_DATA(TSYNC) {
+FIXTURE(TSYNC) {
struct sock_fprog root_prog, apply_prog;
struct tsync_sibling sibling[TSYNC_SIBLINGS];
sem_t started;
diff --git a/tools/testing/selftests/timens/exec.c b/tools/testing/selftests/timens/exec.c
index 87b47b557a7a..e40dc5be2f66 100644
--- a/tools/testing/selftests/timens/exec.c
+++ b/tools/testing/selftests/timens/exec.c
@@ -11,7 +11,6 @@
#include
#include
#include
-#include
#include
#include "log.h"
diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c
index 43d93f4006b9..7f14f0fdac84 100644
--- a/tools/testing/selftests/timens/procfs.c
+++ b/tools/testing/selftests/timens/procfs.c
@@ -12,7 +12,6 @@
#include
#include
#include
-#include
#include "log.h"
#include "timens.h"
diff --git a/tools/testing/selftests/timens/timens.c b/tools/testing/selftests/timens/timens.c
index 559d26e21ba0..098be7c83be3 100644
--- a/tools/testing/selftests/timens/timens.c
+++ b/tools/testing/selftests/timens/timens.c
@@ -10,7 +10,6 @@
#include
#include
#include
-#include
#include
#include "log.h"
diff --git a/tools/testing/selftests/timens/timer.c b/tools/testing/selftests/timens/timer.c
index 0cca7aafc4bd..96dba11ebe44 100644
--- a/tools/testing/selftests/timens/timer.c
+++ b/tools/testing/selftests/timens/timer.c
@@ -11,7 +11,6 @@
#include
#include
#include
-#include
#include "log.h"
#include "timens.h"