From b608dd670bb69c89d5074685899d4c1089f57a16 Mon Sep 17 00:00:00 2001 From: Rahul Rameshbabu Date: Mon, 27 Feb 2023 13:57:00 -0800 Subject: [PATCH 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead, only use the generic devlink documentation directory to document mlx5 devlink parameters. Avoid providing general devlink tool usage information in mlx5-specific documentation. Signed-off-by: Rahul Rameshbabu Reviewed-by: Jiri Pirko Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/devlink.rst | 313 ------------------ .../ethernet/mellanox/mlx5/index.rst | 1 - Documentation/networking/devlink/mlx5.rst | 179 ++++++++++ 3 files changed, 179 insertions(+), 314 deletions(-) delete mode 100644 Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst deleted file mode 100644 index a4edf908b707..000000000000 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst +++ /dev/null @@ -1,313 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB -.. include:: - -======= -Devlink -======= - -:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - -Contents -======== - -- `Info`_ -- `Parameters`_ -- `Health reporters`_ - -Info -==== - -The devlink info reports the running and stored firmware versions on device. -It also prints the device PSID which represents the HCA board type ID. - -User command example:: - - $ devlink dev info pci/0000:00:06.0 - pci/0000:00:06.0: - driver mlx5_core - versions: - fixed: - fw.psid MT_0000000009 - running: - fw.version 16.26.0100 - stored: - fw.version 16.26.0100 - -Parameters -========== - -flow_steering_mode: Device flow steering mode ---------------------------------------------- -The flow steering mode parameter controls the flow steering mode of the driver. -Two modes are supported: - -1. 'dmfs' - Device managed flow steering. -2. 'smfs' - Software/Driver managed flow steering. - -In DMFS mode, the HW steering entities are created and managed through the -Firmware. -In SMFS mode, the HW steering entities are created and managed though by -the driver directly into hardware without firmware intervention. - -SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode. - -User command examples: - -- Set SMFS flow steering mode:: - - $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime - -- Read device flow steering mode:: - - $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode - pci/0000:06:00.0: - name flow_steering_mode type driver-specific - values: - cmode runtime value smfs - -enable_roce: RoCE enablement state ----------------------------------- -If the device supports RoCE disablement, RoCE enablement state controls device -support for RoCE capability. Otherwise, the control occurs in the driver stack. -When RoCE is disabled at the driver level, only raw ethernet QPs are supported. - -To change RoCE enablement state, a user must change the driverinit cmode value -and run devlink reload. - -User command examples: - -- Disable RoCE:: - - $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit - $ devlink dev reload pci/0000:06:00.0 - -- Read RoCE enablement state:: - - $ devlink dev param show pci/0000:06:00.0 name enable_roce - pci/0000:06:00.0: - name enable_roce type generic - values: - cmode driverinit value true - -esw_port_metadata: Eswitch port metadata state ----------------------------------------------- -When applicable, disabling eswitch metadata can increase packet rate -up to 20% depending on the use case and packet sizes. - -Eswitch port metadata state controls whether to internally tag packets with -metadata. Metadata tagging must be enabled for multi-port RoCE, failover -between representors and stacked devices. -By default metadata is enabled on the supported devices in E-switch. -Metadata is applicable only for E-switch in switchdev mode and -users may disable it when NONE of the below use cases will be in use: - -1. HCA is in Dual/multi-port RoCE mode. -2. VF/SF representor bonding (Usually used for Live migration) -3. Stacked devices - -When metadata is disabled, the above use cases will fail to initialize if -users try to enable them. - -- Show eswitch port metadata:: - - $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata - pci/0000:06:00.0: - name esw_port_metadata type driver-specific - values: - cmode runtime value true - -- Disable eswitch port metadata:: - - $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime - -- Change eswitch mode to switchdev mode where after choosing the metadata value:: - - $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev - -hairpin_num_queues: Number of hairpin queues --------------------------------------------- -We refer to a TC NIC rule that involves forwarding as "hairpin". - -Hairpin queues are mlx5 hardware specific implementation for hardware -forwarding of such packets. - -- Show the number of hairpin queues:: - - $ devlink dev param show pci/0000:06:00.0 name hairpin_num_queues - pci/0000:06:00.0: - name hairpin_num_queues type driver-specific - values: - cmode driverinit value 2 - -- Change the number of hairpin queues:: - - $ devlink dev param set pci/0000:06:00.0 name hairpin_num_queues value 4 cmode driverinit - -hairpin_queue_size: Size of the hairpin queues ----------------------------------------------- -Control the size of the hairpin queues. - -- Show the size of the hairpin queues:: - - $ devlink dev param show pci/0000:06:00.0 name hairpin_queue_size - pci/0000:06:00.0: - name hairpin_queue_size type driver-specific - values: - cmode driverinit value 1024 - -- Change the size (in packets) of the hairpin queues:: - - $ devlink dev param set pci/0000:06:00.0 name hairpin_queue_size value 512 cmode driverinit - -Health reporters -================ - -tx reporter ------------ -The tx reporter is responsible for reporting and recovering of the following two error scenarios: - -- tx timeout - Report on kernel tx timeout detection. - Recover by searching lost interrupts. -- tx error completion - Report on error tx completion. - Recover by flushing the tx queue and reset it. - -tx reporter also support on demand diagnose callback, on which it provides -real time information of its send queues status. - -User commands examples: - -- Diagnose send queues status:: - - $ devlink health diagnose pci/0000:82:00.0 reporter tx - -.. note:: - This command has valid output only when interface is up, otherwise the command has empty output. - -- Show number of tx errors indicated, number of recover flows ended successfully, - is autorecover enabled and graceful period from last recover:: - - $ devlink health show pci/0000:82:00.0 reporter tx - -rx reporter ------------ -The rx reporter is responsible for reporting and recovering of the following two error scenarios: - -- rx queues' initialization (population) timeout - Population of rx queues' descriptors on ring initialization is done - in napi context via triggering an irq. In case of a failure to get - the minimum amount of descriptors, a timeout would occur, and - descriptors could be recovered by polling the EQ (Event Queue). -- rx completions with errors (reported by HW on interrupt context) - Report on rx completion error. - Recover (if needed) by flushing the related queue and reset it. - -rx reporter also supports on demand diagnose callback, on which it -provides real time information of its receive queues' status. - -- Diagnose rx queues' status and corresponding completion queue:: - - $ devlink health diagnose pci/0000:82:00.0 reporter rx - -NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output. - -- Show number of rx errors indicated, number of recover flows ended successfully, - is autorecover enabled, and graceful period from last recover:: - - $ devlink health show pci/0000:82:00.0 reporter rx - -fw reporter ------------ -The fw reporter implements `diagnose` and `dump` callbacks. -It follows symptoms of fw error such as fw syndrome by triggering -fw core dump and storing it into the dump buffer. -The fw reporter diagnose command can be triggered any time by the user to check -current fw status. - -User commands examples: - -- Check fw heath status:: - - $ devlink health diagnose pci/0000:82:00.0 reporter fw - -- Read FW core dump if already stored or trigger new one:: - - $ devlink health dump show pci/0000:82:00.0 reporter fw - -.. note:: - This command can run only on the PF which has fw tracer ownership, - running it on other PF or any VF will return "Operation not permitted". - -fw fatal reporter ------------------ -The fw fatal reporter implements `dump` and `recover` callbacks. -It follows fatal errors indications by CR-space dump and recover flow. -The CR-space dump uses vsc interface which is valid even if the FW command -interface is not functional, which is the case in most FW fatal errors. -The recover function runs recover flow which reloads the driver and triggers fw -reset if needed. -On firmware error, the health buffer is dumped into the dmesg. The log -level is derived from the error's severity (given in health buffer). - -User commands examples: - -- Run fw recover flow manually:: - - $ devlink health recover pci/0000:82:00.0 reporter fw_fatal - -- Read FW CR-space dump if already stored or trigger new one:: - - $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal - -.. note:: - This command can run only on PF. - -vnic reporter -------------- -The vnic reporter implements only the `diagnose` callback. -It is responsible for querying the vnic diagnostic counters from fw and displaying -them in realtime. - -Description of the vnic counters: - -- total_q_under_processor_handle - number of queues in an error state due to - an async error or errored command. -- send_queue_priority_update_flow - number of QP/SQ priority/SL update events. -- cq_overrun - number of times CQ entered an error state due to an overflow. -- async_eq_overrun - number of times an EQ mapped to async events was overrun. - comp_eq_overrun number of times an EQ mapped to completion events was - overrun. -- quota_exceeded_command - number of commands issued and failed due to quota exceeded. -- invalid_command - number of commands issued and failed dues to any reason other than quota - exceeded. -- nic_receive_steering_discard - number of packets that completed RX flow - steering but were discarded due to a mismatch in flow table. -- generated_pkt_steering_fail - number of packets generated by the VNIC experiencing unexpected steering - failure (at any point in steering flow). -- handled_pkt_steering_fail - number of packets handled by the VNIC experiencing unexpected steering - failure (at any point in steering flow owned by the VNIC, including the FDB - for the eswitch owner). - -User commands examples: - -- Diagnose PF/VF vnic counters:: - - $ devlink health diagnose pci/0000:82:00.1 reporter vnic - -- Diagnose representor vnic counters (performed by supplying devlink port of the - representor, which can be obtained via devlink port command):: - - $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic - -.. note:: - This command can run over all interfaces such as PF/VF and representor ports. diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst index 3fdcd6b61ccf..581a91caa579 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst @@ -13,7 +13,6 @@ Contents: :maxdepth: 2 kconfig - devlink switchdev tracepoints counters diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 202798d6501e..196a4bb28df1 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -18,6 +18,11 @@ Parameters * - ``enable_roce`` - driverinit - Type: Boolean + + If the device supports RoCE disablement, RoCE enablement state controls + device support for RoCE capability. Otherwise, the control occurs in the + driver stack. When RoCE is disabled at the driver level, only raw + ethernet QPs are supported. * - ``io_eq_size`` - driverinit - The range is between 64 and 4096. @@ -48,6 +53,9 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + + SMFS mode is faster and provides better rule insertion rate compared to + default DMFS mode. * - ``fdb_large_groups`` - u32 - driverinit @@ -71,7 +79,24 @@ parameters. deprecated. Default: disabled + * - ``esw_port_metadata`` + - Boolean + - runtime + - When applicable, disabling eswitch metadata can increase packet rate up + to 20% depending on the use case and packet sizes. + Eswitch port metadata state controls whether to internally tag packets + with metadata. Metadata tagging must be enabled for multi-port RoCE, + failover between representors and stacked devices. By default metadata is + enabled on the supported devices in E-switch. Metadata is applicable only + for E-switch in switchdev mode and users may disable it when NONE of the + below use cases will be in use: + 1. HCA is in Dual/multi-port RoCE mode. + 2. VF/SF representor bonding (Usually used for Live migration) + 3. Stacked devices + + When metadata is disabled, the above use cases will fail to initialize if + users try to enable them. * - ``hairpin_num_queues`` - u32 - driverinit @@ -104,3 +129,157 @@ The ``mlx5`` driver reports the following versions * - ``fw.version`` - stored, running - Three digit major.minor.subminor firmware version number. + +Health reporters +================ + +tx reporter +----------- +The tx reporter is responsible for reporting and recovering of the following two error scenarios: + +- tx timeout + Report on kernel tx timeout detection. + Recover by searching lost interrupts. +- tx error completion + Report on error tx completion. + Recover by flushing the tx queue and reset it. + +tx reporter also support on demand diagnose callback, on which it provides +real time information of its send queues status. + +User commands examples: + +- Diagnose send queues status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter tx + +.. note:: + This command has valid output only when interface is up, otherwise the command has empty output. + +- Show number of tx errors indicated, number of recover flows ended successfully, + is autorecover enabled and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter tx + +rx reporter +----------- +The rx reporter is responsible for reporting and recovering of the following two error scenarios: + +- rx queues' initialization (population) timeout + Population of rx queues' descriptors on ring initialization is done + in napi context via triggering an irq. In case of a failure to get + the minimum amount of descriptors, a timeout would occur, and + descriptors could be recovered by polling the EQ (Event Queue). +- rx completions with errors (reported by HW on interrupt context) + Report on rx completion error. + Recover (if needed) by flushing the related queue and reset it. + +rx reporter also supports on demand diagnose callback, on which it +provides real time information of its receive queues' status. + +- Diagnose rx queues' status and corresponding completion queue:: + + $ devlink health diagnose pci/0000:82:00.0 reporter rx + +.. note:: + This command has valid output only when interface is up. Otherwise, the command has empty output. + +- Show number of rx errors indicated, number of recover flows ended successfully, + is autorecover enabled, and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter rx + +fw reporter +----------- +The fw reporter implements `diagnose` and `dump` callbacks. +It follows symptoms of fw error such as fw syndrome by triggering +fw core dump and storing it into the dump buffer. +The fw reporter diagnose command can be triggered any time by the user to check +current fw status. + +User commands examples: + +- Check fw heath status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter fw + +- Read FW core dump if already stored or trigger new one:: + + $ devlink health dump show pci/0000:82:00.0 reporter fw + +.. note:: + This command can run only on the PF which has fw tracer ownership, + running it on other PF or any VF will return "Operation not permitted". + +fw fatal reporter +----------------- +The fw fatal reporter implements `dump` and `recover` callbacks. +It follows fatal errors indications by CR-space dump and recover flow. +The CR-space dump uses vsc interface which is valid even if the FW command +interface is not functional, which is the case in most FW fatal errors. +The recover function runs recover flow which reloads the driver and triggers fw +reset if needed. +On firmware error, the health buffer is dumped into the dmesg. The log +level is derived from the error's severity (given in health buffer). + +User commands examples: + +- Run fw recover flow manually:: + + $ devlink health recover pci/0000:82:00.0 reporter fw_fatal + +- Read FW CR-space dump if already stored or trigger new one:: + + $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal + +.. note:: + This command can run only on PF. + +vnic reporter +------------- +The vnic reporter implements only the `diagnose` callback. +It is responsible for querying the vnic diagnostic counters from fw and displaying +them in realtime. + +Description of the vnic counters: + +- total_q_under_processor_handle + number of queues in an error state due to + an async error or errored command. +- send_queue_priority_update_flow + number of QP/SQ priority/SL update events. +- cq_overrun + number of times CQ entered an error state due to an overflow. +- async_eq_overrun + number of times an EQ mapped to async events was overrun. + comp_eq_overrun number of times an EQ mapped to completion events was + overrun. +- quota_exceeded_command + number of commands issued and failed due to quota exceeded. +- invalid_command + number of commands issued and failed dues to any reason other than quota + exceeded. +- nic_receive_steering_discard + number of packets that completed RX flow + steering but were discarded due to a mismatch in flow table. +- generated_pkt_steering_fail + number of packets generated by the VNIC experiencing unexpected steering + failure (at any point in steering flow). +- handled_pkt_steering_fail + number of packets handled by the VNIC experiencing unexpected steering + failure (at any point in steering flow owned by the VNIC, including the FDB + for the eswitch owner). + +User commands examples: + +- Diagnose PF/VF vnic counters:: + + $ devlink health diagnose pci/0000:82:00.1 reporter vnic + +- Diagnose representor vnic counters (performed by supplying devlink port of the + representor, which can be obtained via devlink port command):: + + $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic + +.. note:: + This command can run over all interfaces such as PF/VF and representor ports. From 3178308ad4ca38955cad684d235153d4939f1fcd Mon Sep 17 00:00:00 2001 From: Rahul Rameshbabu Date: Tue, 2 May 2023 16:31:40 -0700 Subject: [PATCH 02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs Use a map structure for associating CQEs containing port timestamping information with the appropriate skb. Track order of WQEs submitted using a FIFO. Check if the corresponding port timestamping CQEs from the lookup values in the FIFO are considered dropped due to time elapsed. Return the lookup value to a freelist after consuming the skb. Reuse the freed lookup in future WQE submission iterations. The map structure uses an integer identifier for the key and returns an skb corresponding to that identifier. Embed the integer identifier in the WQE submitted to the WQ for the transmit path when the SQ is a PTP (port timestamping) SQ. The embedded identifier can then be queried using a field in the CQE of the corresponding port timestamping CQ. In the port timestamping napi_poll context, the identifier is queried from the CQE polled from CQ and used to lookup the corresponding skb from the WQE submit path. The skb reference is removed from map and then embedded with the port HW timestamp information from the CQE and eventually consumed. The metadata freelist FIFO is an array containing integer identifiers that can be pushed and popped in the FIFO. The purpose of this structure is bookkeeping what identifier values can safely be used in a subsequent WQE submission and should not contain identifiers that have still not been reaped by processing a corresponding CQE completion on the port timestamping CQ. The ts_cqe_pending_list structure is a combination of an array and linked list. The array is pre-populated with the nodes that will be added and removed from the head of the linked list. Each node contains the unique identifier value associated with the values submitted in the WQEs and retrieved in the port timestamping CQEs. When a WQE is submitted, the node in the array corresponding to the identifier popped from the metadata freelist is added to the end of the CQE pending list and is marked as "in-use". The node is removed from the linked list under two conditions. The first condition is that the corresponding port timestamping CQE is polled in the PTP napi_poll context. The second condition is that more than a second has elapsed since the DMA timestamp value corresponding to the WQE submission. When the first condition occurs, the "in-use" bit in the linked list node is cleared, and the resources corresponding to the WQE submission are then released. The second condition, however, indicates that the port timestamping CQE will likely never be delivered. It's not impossible for the device to post a CQE after an infinite amount of time though highly improbable. In order to be resilient to this improbable case, resources related to the corresponding WQE submission are still kept, the identifier value is not returned to the freelist, and the "in-use" bit is cleared on the node to indicate that it's no longer part of the linked list of "likely to be delivered" port timestamping CQE identifiers. A count for the number of port timestamping CQEs considered highly likely to never be delivered by the device is maintained. This count gets decremented in the unlikely event a port timestamping CQE considered unlikely to ever be delivered is polled in the PTP napi_poll context. Signed-off-by: Rahul Rameshbabu Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- .../ethernet/mellanox/mlx5/counters.rst | 6 + .../net/ethernet/mellanox/mlx5/core/en/ptp.c | 225 +++++++++++++----- .../net/ethernet/mellanox/mlx5/core/en/ptp.h | 57 ++++- .../ethernet/mellanox/mlx5/core/en_ethtool.c | 3 +- .../ethernet/mellanox/mlx5/core/en_stats.c | 4 +- .../ethernet/mellanox/mlx5/core/en_stats.h | 4 +- .../net/ethernet/mellanox/mlx5/core/en_tx.c | 28 ++- 7 files changed, 241 insertions(+), 86 deletions(-) diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst index a395df9c2751..008e560e12b5 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst @@ -683,6 +683,12 @@ the software port. time protocol. - Error + * - `ptp_cq[i]_late_cqe` + - Number of times a CQE has been delivered on the PTP timestamping CQ when + the CQE was not expected since a certain amount of time had elapsed where + the device typically ensures not posting the CQE. + - Error + .. [#ring_global] The corresponding ring and global counters do not share the same name (i.e. do not follow the common naming scheme). diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c index b0b429a0321e..8680d21f3e7b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c @@ -5,6 +5,8 @@ #include "en/txrx.h" #include "en/params.h" #include "en/fs_tt_redirect.h" +#include +#include struct mlx5e_ptp_fs { struct mlx5_flow_handle *l2_rule; @@ -19,6 +21,48 @@ struct mlx5e_ptp_params { struct mlx5e_rq_param rq_param; }; +struct mlx5e_ptp_port_ts_cqe_tracker { + u8 metadata_id; + bool inuse : 1; + struct list_head entry; +}; + +struct mlx5e_ptp_port_ts_cqe_list { + struct mlx5e_ptp_port_ts_cqe_tracker *nodes; + struct list_head tracker_list_head; + /* Sync list operations in xmit and napi_poll contexts */ + spinlock_t tracker_list_lock; +}; + +static inline void +mlx5e_ptp_port_ts_cqe_list_add(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metadata) +{ + struct mlx5e_ptp_port_ts_cqe_tracker *tracker = &list->nodes[metadata]; + + WARN_ON_ONCE(tracker->inuse); + tracker->inuse = true; + spin_lock(&list->tracker_list_lock); + list_add_tail(&tracker->entry, &list->tracker_list_head); + spin_unlock(&list->tracker_list_lock); +} + +static void +mlx5e_ptp_port_ts_cqe_list_remove(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metadata) +{ + struct mlx5e_ptp_port_ts_cqe_tracker *tracker = &list->nodes[metadata]; + + WARN_ON_ONCE(!tracker->inuse); + tracker->inuse = false; + spin_lock(&list->tracker_list_lock); + list_del(&tracker->entry); + spin_unlock(&list->tracker_list_lock); +} + +void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata) +{ + mlx5e_ptp_port_ts_cqe_list_add(ptpsq->ts_cqe_pending_list, metadata); +} + struct mlx5e_skb_cb_hwtstamp { ktime_t cqe_hwtstamp; ktime_t port_hwtstamp; @@ -79,75 +123,88 @@ void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type, memset(skb->cb, 0, sizeof(struct mlx5e_skb_cb_hwtstamp)); } -#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask) - -static bool mlx5e_ptp_ts_cqe_drop(struct mlx5e_ptpsq *ptpsq, u16 skb_ci, u16 skb_id) +static struct sk_buff * +mlx5e_ptp_metadata_map_lookup(struct mlx5e_ptp_metadata_map *map, u16 metadata) { - return (ptpsq->ts_cqe_ctr_mask && (skb_ci != skb_id)); + return map->data[metadata]; } -static bool mlx5e_ptp_ts_cqe_ooo(struct mlx5e_ptpsq *ptpsq, u16 skb_id) +static struct sk_buff * +mlx5e_ptp_metadata_map_remove(struct mlx5e_ptp_metadata_map *map, u16 metadata) { - u16 skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc); - u16 skb_pi = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_pc); - - if (PTP_WQE_CTR2IDX(skb_id - skb_ci) >= PTP_WQE_CTR2IDX(skb_pi - skb_ci)) - return true; - - return false; -} - -static void mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq *ptpsq, u16 skb_ci, - u16 skb_id, int budget) -{ - struct skb_shared_hwtstamps hwts = {}; struct sk_buff *skb; - ptpsq->cq_stats->resync_event++; + skb = map->data[metadata]; + map->data[metadata] = NULL; - while (skb_ci != skb_id) { - skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo); - hwts.hwtstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp; - skb_tstamp_tx(skb, &hwts); - ptpsq->cq_stats->resync_cqe++; - napi_consume_skb(skb, budget); - skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc); - } + return skb; } +static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq, + ktime_t port_tstamp) +{ + struct mlx5e_ptp_port_ts_cqe_list *cqe_list = ptpsq->ts_cqe_pending_list; + ktime_t timeout = ns_to_ktime(MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT); + struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map; + struct mlx5e_ptp_port_ts_cqe_tracker *pos, *n; + + spin_lock(&cqe_list->tracker_list_lock); + list_for_each_entry_safe(pos, n, &cqe_list->tracker_list_head, entry) { + struct sk_buff *skb = + mlx5e_ptp_metadata_map_lookup(metadata_map, pos->metadata_id); + ktime_t dma_tstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp; + + if (!dma_tstamp || + ktime_after(ktime_add(dma_tstamp, timeout), port_tstamp)) + break; + + metadata_map->undelivered_counter++; + WARN_ON_ONCE(!pos->inuse); + pos->inuse = false; + list_del(&pos->entry); + } + spin_unlock(&cqe_list->tracker_list_lock); +} + +#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask) + static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq, struct mlx5_cqe64 *cqe, int budget) { - u16 skb_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter)); - u16 skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc); + struct mlx5e_ptp_port_ts_cqe_list *pending_cqe_list = ptpsq->ts_cqe_pending_list; + u8 metadata_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter)); + bool is_err_cqe = !!MLX5E_RX_ERR_CQE(cqe); struct mlx5e_txqsq *sq = &ptpsq->txqsq; struct sk_buff *skb; ktime_t hwtstamp; - if (unlikely(MLX5E_RX_ERR_CQE(cqe))) { - skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo); + if (likely(pending_cqe_list->nodes[metadata_id].inuse)) { + mlx5e_ptp_port_ts_cqe_list_remove(pending_cqe_list, metadata_id); + } else { + /* Reclaim space in the unlikely event CQE was delivered after + * marking it late. + */ + ptpsq->metadata_map.undelivered_counter--; + ptpsq->cq_stats->late_cqe++; + } + + skb = mlx5e_ptp_metadata_map_remove(&ptpsq->metadata_map, metadata_id); + + if (unlikely(is_err_cqe)) { ptpsq->cq_stats->err_cqe++; goto out; } - if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_ci, skb_id)) { - if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) { - /* already handled by a previous resync */ - ptpsq->cq_stats->ooo_cqe_drop++; - return; - } - mlx5e_ptp_skb_fifo_ts_cqe_resync(ptpsq, skb_ci, skb_id, budget); - } - - skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo); hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, get_cqe_ts(cqe)); mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_PORT_HWTSTAMP, hwtstamp, ptpsq->cq_stats); ptpsq->cq_stats->cqe++; + mlx5e_ptpsq_mark_ts_cqes_undelivered(ptpsq, hwtstamp); out: napi_consume_skb(skb, budget); + mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist, metadata_id); } static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget) @@ -291,36 +348,78 @@ static void mlx5e_ptp_destroy_sq(struct mlx5_core_dev *mdev, u32 sqn) static int mlx5e_ptp_alloc_traffic_db(struct mlx5e_ptpsq *ptpsq, int numa) { - int wq_sz = mlx5_wq_cyc_get_size(&ptpsq->txqsq.wq); - struct mlx5_core_dev *mdev = ptpsq->txqsq.mdev; + struct mlx5e_ptp_metadata_fifo *metadata_freelist = &ptpsq->metadata_freelist; + struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map; + struct mlx5e_ptp_port_ts_cqe_list *cqe_list; + int db_sz; + int md; - ptpsq->skb_fifo.fifo = kvzalloc_node(array_size(wq_sz, sizeof(*ptpsq->skb_fifo.fifo)), - GFP_KERNEL, numa); - if (!ptpsq->skb_fifo.fifo) + cqe_list = kvzalloc_node(sizeof(*ptpsq->ts_cqe_pending_list), GFP_KERNEL, numa); + if (!cqe_list) return -ENOMEM; + ptpsq->ts_cqe_pending_list = cqe_list; + + db_sz = min_t(u32, mlx5_wq_cyc_get_size(&ptpsq->txqsq.wq), + 1 << MLX5_CAP_GEN_2(ptpsq->txqsq.mdev, + ts_cqe_metadata_size2wqe_counter)); + ptpsq->ts_cqe_ctr_mask = db_sz - 1; + + cqe_list->nodes = kvzalloc_node(array_size(db_sz, sizeof(*cqe_list->nodes)), + GFP_KERNEL, numa); + if (!cqe_list->nodes) + goto free_cqe_list; + INIT_LIST_HEAD(&cqe_list->tracker_list_head); + spin_lock_init(&cqe_list->tracker_list_lock); + + metadata_freelist->data = + kvzalloc_node(array_size(db_sz, sizeof(*metadata_freelist->data)), + GFP_KERNEL, numa); + if (!metadata_freelist->data) + goto free_cqe_list_nodes; + metadata_freelist->mask = ptpsq->ts_cqe_ctr_mask; + + for (md = 0; md < db_sz; ++md) { + cqe_list->nodes[md].metadata_id = md; + metadata_freelist->data[md] = md; + } + metadata_freelist->pc = db_sz; + + metadata_map->data = + kvzalloc_node(array_size(db_sz, sizeof(*metadata_map->data)), + GFP_KERNEL, numa); + if (!metadata_map->data) + goto free_metadata_freelist; + metadata_map->capacity = db_sz; - ptpsq->skb_fifo.pc = &ptpsq->skb_fifo_pc; - ptpsq->skb_fifo.cc = &ptpsq->skb_fifo_cc; - ptpsq->skb_fifo.mask = wq_sz - 1; - if (MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter)) - ptpsq->ts_cqe_ctr_mask = - (1 << MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter)) - 1; return 0; + +free_metadata_freelist: + kvfree(metadata_freelist->data); +free_cqe_list_nodes: + kvfree(cqe_list->nodes); +free_cqe_list: + kvfree(cqe_list); + return -ENOMEM; } -static void mlx5e_ptp_drain_skb_fifo(struct mlx5e_skb_fifo *skb_fifo) +static void mlx5e_ptp_drain_metadata_map(struct mlx5e_ptp_metadata_map *map) { - while (*skb_fifo->pc != *skb_fifo->cc) { - struct sk_buff *skb = mlx5e_skb_fifo_pop(skb_fifo); + int idx; + + for (idx = 0; idx < map->capacity; ++idx) { + struct sk_buff *skb = map->data[idx]; dev_kfree_skb_any(skb); } } -static void mlx5e_ptp_free_traffic_db(struct mlx5e_skb_fifo *skb_fifo) +static void mlx5e_ptp_free_traffic_db(struct mlx5e_ptpsq *ptpsq) { - mlx5e_ptp_drain_skb_fifo(skb_fifo); - kvfree(skb_fifo->fifo); + mlx5e_ptp_drain_metadata_map(&ptpsq->metadata_map); + kvfree(ptpsq->metadata_map.data); + kvfree(ptpsq->metadata_freelist.data); + kvfree(ptpsq->ts_cqe_pending_list->nodes); + kvfree(ptpsq->ts_cqe_pending_list); } static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn, @@ -348,8 +447,7 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn, if (err) goto err_free_txqsq; - err = mlx5e_ptp_alloc_traffic_db(ptpsq, - dev_to_node(mlx5_core_dma_dev(c->mdev))); + err = mlx5e_ptp_alloc_traffic_db(ptpsq, dev_to_node(mlx5_core_dma_dev(c->mdev))); if (err) goto err_free_txqsq; @@ -366,7 +464,7 @@ static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq) struct mlx5e_txqsq *sq = &ptpsq->txqsq; struct mlx5_core_dev *mdev = sq->mdev; - mlx5e_ptp_free_traffic_db(&ptpsq->skb_fifo); + mlx5e_ptp_free_traffic_db(ptpsq); cancel_work_sync(&sq->recover_work); mlx5e_ptp_destroy_sq(mdev, sq->sqn); mlx5e_free_txqsq_descs(sq); @@ -534,7 +632,10 @@ static void mlx5e_ptp_build_params(struct mlx5e_ptp *c, /* SQ */ if (test_bit(MLX5E_PTP_STATE_TX, c->state)) { - params->log_sq_size = orig->log_sq_size; + params->log_sq_size = + min(MLX5_CAP_GEN_2(c->mdev, ts_cqe_metadata_size2wqe_counter), + MLX5E_PTP_MAX_LOG_SQ_SIZE); + params->log_sq_size = min(params->log_sq_size, orig->log_sq_size); mlx5e_ptp_build_sq_param(c->mdev, params, &cparams->txq_sq_param); } /* RQ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h index cc7efde88ac3..7c5597d4589d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h @@ -7,18 +7,36 @@ #include "en.h" #include "en_stats.h" #include "en/txrx.h" +#include #include +#include #define MLX5E_PTP_CHANNEL_IX 0 +#define MLX5E_PTP_MAX_LOG_SQ_SIZE (8U) +#define MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT (1 * NSEC_PER_SEC) + +struct mlx5e_ptp_metadata_fifo { + u8 cc; + u8 pc; + u8 mask; + u8 *data; +}; + +struct mlx5e_ptp_metadata_map { + u16 undelivered_counter; + u16 capacity; + struct sk_buff **data; +}; struct mlx5e_ptpsq { struct mlx5e_txqsq txqsq; struct mlx5e_cq ts_cq; - u16 skb_fifo_cc; - u16 skb_fifo_pc; - struct mlx5e_skb_fifo skb_fifo; struct mlx5e_ptp_cq_stats *cq_stats; u16 ts_cqe_ctr_mask; + + struct mlx5e_ptp_port_ts_cqe_list *ts_cqe_pending_list; + struct mlx5e_ptp_metadata_fifo metadata_freelist; + struct mlx5e_ptp_metadata_map metadata_map; }; enum { @@ -69,12 +87,35 @@ static inline bool mlx5e_use_ptpsq(struct sk_buff *skb) fk.ports.dst == htons(PTP_EV_PORT)); } -static inline bool mlx5e_ptpsq_fifo_has_room(struct mlx5e_txqsq *sq) +static inline void mlx5e_ptp_metadata_fifo_push(struct mlx5e_ptp_metadata_fifo *fifo, u8 metadata) { - if (!sq->ptpsq) - return true; + fifo->data[fifo->mask & fifo->pc++] = metadata; +} - return mlx5e_skb_fifo_has_room(&sq->ptpsq->skb_fifo); +static inline u8 +mlx5e_ptp_metadata_fifo_pop(struct mlx5e_ptp_metadata_fifo *fifo) +{ + return fifo->data[fifo->mask & fifo->cc++]; +} + +static inline void +mlx5e_ptp_metadata_map_put(struct mlx5e_ptp_metadata_map *map, + struct sk_buff *skb, u8 metadata) +{ + WARN_ON_ONCE(map->data[metadata]); + map->data[metadata] = skb; +} + +static inline bool mlx5e_ptpsq_metadata_freelist_empty(struct mlx5e_ptpsq *ptpsq) +{ + struct mlx5e_ptp_metadata_fifo *freelist; + + if (likely(!ptpsq)) + return false; + + freelist = &ptpsq->metadata_freelist; + + return freelist->pc == freelist->cc; } int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params, @@ -89,6 +130,8 @@ void mlx5e_ptp_free_rx_fs(struct mlx5e_flow_steering *fs, const struct mlx5e_profile *profile); int mlx5e_ptp_rx_manage_fs(struct mlx5e_priv *priv, bool set); +void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata); + enum { MLX5E_SKB_CB_CQE_HWTSTAMP = BIT(0), MLX5E_SKB_CB_PORT_HWTSTAMP = BIT(1), diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 04195a673a6b..dff02434ff45 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -2061,7 +2061,8 @@ static int set_pflag_tx_port_ts(struct net_device *netdev, bool enable) struct mlx5e_params new_params; int err; - if (!MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn)) + if (!MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn) || + !MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter)) return -EOPNOTSUPP; /* Don't allow changing the PTP state if HTB offload is active, because diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c index 07b84d668fcc..52141729444e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c @@ -2142,9 +2142,7 @@ static const struct counter_desc ptp_cq_stats_desc[] = { { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, err_cqe) }, { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) }, { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) }, - { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, resync_cqe) }, - { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, resync_event) }, - { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, ooo_cqe_drop) }, + { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, late_cqe) }, }; static const struct counter_desc ptp_rq_stats_desc[] = { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h index 1ff8a06027dc..409e9a47e433 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h @@ -449,9 +449,7 @@ struct mlx5e_ptp_cq_stats { u64 err_cqe; u64 abort; u64 abort_abs_diff_ns; - u64 resync_cqe; - u64 resync_event; - u64 ooo_cqe_drop; + u64 late_cqe; }; struct mlx5e_rep_stats { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index c7eb6b238c2b..d41435c22ce5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -372,7 +372,7 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb, const struct mlx5e_tx_attr *attr, const struct mlx5e_tx_wqe_attr *wqe_attr, u8 num_dma, struct mlx5e_tx_wqe_info *wi, struct mlx5_wqe_ctrl_seg *cseg, - bool xmit_more) + struct mlx5_wqe_eth_seg *eseg, bool xmit_more) { struct mlx5_wq_cyc *wq = &sq->wq; bool send_doorbell; @@ -394,11 +394,16 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb, mlx5e_tx_check_stop(sq); - if (unlikely(sq->ptpsq)) { + if (unlikely(sq->ptpsq && + (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) { + u8 metadata_index = be32_to_cpu(eseg->flow_table_metadata); + mlx5e_skb_cb_hwtstamp_init(skb); - mlx5e_skb_fifo_push(&sq->ptpsq->skb_fifo, skb); + mlx5e_ptpsq_track_metadata(sq->ptpsq, metadata_index); + mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb, + metadata_index); if (!netif_tx_queue_stopped(sq->txq) && - !mlx5e_skb_fifo_has_room(&sq->ptpsq->skb_fifo)) { + mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq)) { netif_tx_stop_queue(sq->txq); sq->stats->stopped++; } @@ -483,13 +488,16 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, if (unlikely(num_dma < 0)) goto err_drop; - mlx5e_txwqe_complete(sq, skb, attr, wqe_attr, num_dma, wi, cseg, xmit_more); + mlx5e_txwqe_complete(sq, skb, attr, wqe_attr, num_dma, wi, cseg, eseg, xmit_more); return; err_drop: stats->dropped++; dev_kfree_skb_any(skb); + if (unlikely(sq->ptpsq && (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) + mlx5e_ptp_metadata_fifo_push(&sq->ptpsq->metadata_freelist, + be32_to_cpu(eseg->flow_table_metadata)); mlx5e_tx_flush(sq); } @@ -645,9 +653,9 @@ void mlx5e_tx_mpwqe_ensure_complete(struct mlx5e_txqsq *sq) static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb, struct mlx5_wqe_eth_seg *eseg) { - if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) - eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc & - ptpsq->ts_cqe_ctr_mask); + if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) + eseg->flow_table_metadata = + cpu_to_be32(mlx5e_ptp_metadata_fifo_pop(&ptpsq->metadata_freelist)); } static void mlx5e_txwqe_build_eseg(struct mlx5e_priv *priv, struct mlx5e_txqsq *sq, @@ -766,7 +774,7 @@ void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq) { if (netif_tx_queue_stopped(sq->txq) && mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) && - mlx5e_ptpsq_fifo_has_room(sq) && + !mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq) && !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) { netif_tx_wake_queue(sq->txq); sq->stats->wake++; @@ -1031,7 +1039,7 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb, if (unlikely(num_dma < 0)) goto err_drop; - mlx5e_txwqe_complete(sq, skb, &attr, &wqe_attr, num_dma, wi, cseg, xmit_more); + mlx5e_txwqe_complete(sq, skb, &attr, &wqe_attr, num_dma, wi, cseg, eseg, xmit_more); return; From 53b836a44db4259b94ffcfff321fb3d63f976b76 Mon Sep 17 00:00:00 2001 From: Rahul Rameshbabu Date: Tue, 8 Aug 2023 21:10:21 -0700 Subject: [PATCH 03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ A new check for the tx devlink health reporter is introduced for determining when the PTP port timestamping SQ is considered unhealthy. If there are enough CQEs considered never to be delivered, the space that can be utilized on the SQ decreases significantly, impacting performance and usability of the SQ. The health reporter is triggered when the number of likely never delivered port timestamping CQEs that utilize the space of the PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink health reporter recover method is also provided for this specific TX error context that restarts the PTP SQ. Signed-off-by: Rahul Rameshbabu Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- Documentation/networking/devlink/mlx5.rst | 5 +- .../ethernet/mellanox/mlx5/core/en/health.h | 1 + .../net/ethernet/mellanox/mlx5/core/en/ptp.c | 22 +++++++ .../net/ethernet/mellanox/mlx5/core/en/ptp.h | 2 + .../mellanox/mlx5/core/en/reporter_tx.c | 65 +++++++++++++++++++ 5 files changed, 94 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 196a4bb28df1..702f204a3dbd 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -135,7 +135,7 @@ Health reporters tx reporter ----------- -The tx reporter is responsible for reporting and recovering of the following two error scenarios: +The tx reporter is responsible for reporting and recovering of the following three error scenarios: - tx timeout Report on kernel tx timeout detection. @@ -143,6 +143,9 @@ The tx reporter is responsible for reporting and recovering of the following two - tx error completion Report on error tx completion. Recover by flushing the tx queue and reset it. +- tx PTP port timestamping CQ unhealthy + Report too many CQEs never delivered on port ts CQ. + Recover by flushing and re-creating all PTP channels. tx reporter also support on demand diagnose callback, on which it provides real time information of its send queues status. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h index 0107e4e73bb0..415840c3ef84 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h @@ -18,6 +18,7 @@ void mlx5e_reporter_tx_create(struct mlx5e_priv *priv); void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv); void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq); int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq); +void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq); int mlx5e_health_cq_diag_fmsg(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg); int mlx5e_health_cq_common_diag_fmsg(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c index 8680d21f3e7b..bb11e644d24f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c @@ -2,6 +2,7 @@ // Copyright (c) 2020 Mellanox Technologies #include "en/ptp.h" +#include "en/health.h" #include "en/txrx.h" #include "en/params.h" #include "en/fs_tt_redirect.h" @@ -140,6 +141,12 @@ mlx5e_ptp_metadata_map_remove(struct mlx5e_ptp_metadata_map *map, u16 metadata) return skb; } +static bool mlx5e_ptp_metadata_map_unhealthy(struct mlx5e_ptp_metadata_map *map) +{ + /* Considered beginning unhealthy state if size * 15 / 2^4 cannot be reclaimed. */ + return map->undelivered_counter > (map->capacity >> 4) * 15; +} + static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq, ktime_t port_tstamp) { @@ -205,6 +212,9 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq, out: napi_consume_skb(skb, budget); mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist, metadata_id); + if (unlikely(mlx5e_ptp_metadata_map_unhealthy(&ptpsq->metadata_map)) && + !test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) + queue_work(ptpsq->txqsq.priv->wq, &ptpsq->report_unhealthy_work); } static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget) @@ -422,6 +432,14 @@ static void mlx5e_ptp_free_traffic_db(struct mlx5e_ptpsq *ptpsq) kvfree(ptpsq->ts_cqe_pending_list); } +static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work) +{ + struct mlx5e_ptpsq *ptpsq = + container_of(work, struct mlx5e_ptpsq, report_unhealthy_work); + + mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq); +} + static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn, int txq_ix, struct mlx5e_ptp_params *cparams, int tc, struct mlx5e_ptpsq *ptpsq) @@ -451,6 +469,8 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn, if (err) goto err_free_txqsq; + INIT_WORK(&ptpsq->report_unhealthy_work, mlx5e_ptpsq_unhealthy_work); + return 0; err_free_txqsq: @@ -464,6 +484,8 @@ static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq) struct mlx5e_txqsq *sq = &ptpsq->txqsq; struct mlx5_core_dev *mdev = sq->mdev; + if (current_work() != &ptpsq->report_unhealthy_work) + cancel_work_sync(&ptpsq->report_unhealthy_work); mlx5e_ptp_free_traffic_db(ptpsq); cancel_work_sync(&sq->recover_work); mlx5e_ptp_destroy_sq(mdev, sq->sqn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h index 7c5597d4589d..7b700d0f956a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h @@ -10,6 +10,7 @@ #include #include #include +#include #define MLX5E_PTP_CHANNEL_IX 0 #define MLX5E_PTP_MAX_LOG_SQ_SIZE (8U) @@ -34,6 +35,7 @@ struct mlx5e_ptpsq { struct mlx5e_ptp_cq_stats *cq_stats; u16 ts_cqe_ctr_mask; + struct work_struct report_unhealthy_work; struct mlx5e_ptp_port_ts_cqe_list *ts_cqe_pending_list; struct mlx5e_ptp_metadata_fifo metadata_freelist; struct mlx5e_ptp_metadata_map metadata_map; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c index b35ff289af49..ff8242f67c54 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c @@ -164,6 +164,43 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx) return err; } +static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx) +{ + struct mlx5e_ptpsq *ptpsq = ctx; + struct mlx5e_channels *chs; + struct net_device *netdev; + struct mlx5e_priv *priv; + int carrier_ok; + int err; + + if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &ptpsq->txqsq.state)) + return 0; + + priv = ptpsq->txqsq.priv; + + mutex_lock(&priv->state_lock); + chs = &priv->channels; + netdev = priv->netdev; + + carrier_ok = netif_carrier_ok(netdev); + netif_carrier_off(netdev); + + mlx5e_deactivate_priv_channels(priv); + + mlx5e_ptp_close(chs->ptp); + err = mlx5e_ptp_open(priv, &chs->params, chs->c[0]->lag_port, &chs->ptp); + + mlx5e_activate_priv_channels(priv); + + /* return carrier back if needed */ + if (carrier_ok) + netif_carrier_on(netdev); + + mutex_unlock(&priv->state_lock); + + return err; +} + /* state lock cannot be grabbed within this function. * It can cause a dead lock or a read-after-free. */ @@ -516,6 +553,15 @@ static int mlx5e_tx_reporter_timeout_dump(struct mlx5e_priv *priv, struct devlin return mlx5e_tx_reporter_dump_sq(priv, fmsg, to_ctx->sq); } +static int mlx5e_tx_reporter_ptpsq_unhealthy_dump(struct mlx5e_priv *priv, + struct devlink_fmsg *fmsg, + void *ctx) +{ + struct mlx5e_ptpsq *ptpsq = ctx; + + return mlx5e_tx_reporter_dump_sq(priv, fmsg, &ptpsq->txqsq); +} + static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv, struct devlink_fmsg *fmsg) { @@ -621,6 +667,25 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq) return to_ctx.status; } +void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq) +{ + struct mlx5e_ptp_metadata_map *map = &ptpsq->metadata_map; + char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN]; + struct mlx5e_txqsq *txqsq = &ptpsq->txqsq; + struct mlx5e_cq *ts_cq = &ptpsq->ts_cq; + struct mlx5e_priv *priv = txqsq->priv; + struct mlx5e_err_ctx err_ctx = {}; + + err_ctx.ctx = ptpsq; + err_ctx.recover = mlx5e_tx_reporter_ptpsq_unhealthy_recover; + err_ctx.dump = mlx5e_tx_reporter_ptpsq_unhealthy_dump; + snprintf(err_str, sizeof(err_str), + "Unhealthy TX port TS queue: %d, SQ: 0x%x, CQ: 0x%x, Undelivered CQEs: %u Map Capacity: %u", + txqsq->ch_ix, txqsq->sqn, ts_cq->mcq.cqn, map->undelivered_counter, map->capacity); + + mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx); +} + static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = { .name = "tx", .recover = mlx5e_tx_reporter_recover, From 6486c0f44ed8e91073c1b08e83075e3832618ae5 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Thu, 13 Jul 2023 14:54:57 +0300 Subject: [PATCH 04/14] net/mlx5: Expose max possible SFs via devlink resource Introduce devlink resource for exposing max possible SFs on mlx5 devices. For example: $ devlink resource show pci/0000:00:0b.0 pci/0000:00:0b.0: name max_local_SFs size 5 unit entry dpipe_tables none name max_external_SFs size 0 unit entry dpipe_tables none Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.h | 8 ++++ .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 44 +++++++++++++++++-- 2 files changed, 48 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h index defba5bd91d9..961f75da6227 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h @@ -6,6 +6,14 @@ #include +enum mlx5_devlink_resource_id { + MLX5_DL_RES_MAX_LOCAL_SFS = 1, + MLX5_DL_RES_MAX_EXTERNAL_SFS, + + __MLX5_ID_RES_MAX, + MLX5_ID_RES_MAX = __MLX5_ID_RES_MAX - 1, +}; + enum mlx5_devlink_param_id { MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c index 17aa348989cb..c4daeaaafead 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c @@ -9,6 +9,7 @@ #include "mlx5_core.h" #include "eswitch.h" #include "diag/sf_tracepoint.h" +#include "devlink.h" struct mlx5_sf_hw { u32 usr_sfnum; @@ -243,6 +244,32 @@ static void mlx5_sf_hw_table_hwc_cleanup(struct mlx5_sf_hwc_table *hwc) kfree(hwc->sfs); } +static void mlx5_sf_hw_table_res_unregister(struct mlx5_core_dev *dev) +{ + devl_resources_unregister(priv_to_devlink(dev)); +} + +static int mlx5_sf_hw_table_res_register(struct mlx5_core_dev *dev, u16 max_fn, + u16 max_ext_fn) +{ + struct devlink_resource_size_params size_params; + struct devlink *devlink = priv_to_devlink(dev); + int err; + + devlink_resource_size_params_init(&size_params, max_fn, max_fn, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + err = devl_resource_register(devlink, "max_local_SFs", max_fn, MLX5_DL_RES_MAX_LOCAL_SFS, + DEVLINK_RESOURCE_ID_PARENT_TOP, &size_params); + if (err) + return err; + + devlink_resource_size_params_init(&size_params, max_ext_fn, max_ext_fn, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + return devl_resource_register(devlink, "max_external_SFs", max_ext_fn, + MLX5_DL_RES_MAX_EXTERNAL_SFS, DEVLINK_RESOURCE_ID_PARENT_TOP, + &size_params); +} + int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) { struct mlx5_sf_hw_table *table; @@ -262,12 +289,17 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) if (err) return err; + if (mlx5_sf_hw_table_res_register(dev, max_fn, max_ext_fn)) + mlx5_core_dbg(dev, "failed to register max SFs resources"); + if (!max_fn && !max_ext_fn) return 0; table = kzalloc(sizeof(*table), GFP_KERNEL); - if (!table) - return -ENOMEM; + if (!table) { + err = -ENOMEM; + goto alloc_err; + } mutex_init(&table->table_lock); table->dev = dev; @@ -291,6 +323,8 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) table_err: mutex_destroy(&table->table_lock); kfree(table); +alloc_err: + mlx5_sf_hw_table_res_unregister(dev); return err; } @@ -299,12 +333,14 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev) struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; if (!table) - return; + goto res_unregister; - mutex_destroy(&table->table_lock); mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]); mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]); + mutex_destroy(&table->table_lock); kfree(table); +res_unregister: + mlx5_sf_hw_table_res_unregister(dev); } static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data) From a9f168e4c6e1f623dcf2640b9d76a4f61b9731e5 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 31 May 2023 13:50:21 +0300 Subject: [PATCH 05/14] net/mlx5: Check with FW that sync reset completed successfully Even if the PF driver had no error on his part of the sync reset flow, the firmware can see wider picture as it syncs all the PFs in the flow. So add at end of sync reset flow check with firmware by reading MFRL register and initialization segment that the flow had no issue from firmware point of view too. Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 3 ++ .../ethernet/mellanox/mlx5/core/fw_reset.c | 39 ++++++++++++++++--- .../ethernet/mellanox/mlx5/core/fw_reset.h | 2 + include/linux/mlx5/mlx5_ifc.h | 3 +- 4 files changed, 40 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 3d82ec890666..af8460bb257b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -212,6 +212,9 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a /* On fw_activate action, also driver is reloaded and reinit performed */ *actions_performed |= BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT); ret = mlx5_load_one_devl_locked(dev, true); + if (ret) + return ret; + ret = mlx5_fw_reset_verify_fw_complete(dev, extack); break; default: /* Unsupported action should not get to this function */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index 4804990b7f22..e87766f91150 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -127,17 +127,23 @@ static int mlx5_fw_reset_get_reset_state_err(struct mlx5_core_dev *dev, if (mlx5_reg_mfrl_query(dev, NULL, NULL, &reset_state)) goto out; + if (!reset_state) + return 0; + switch (reset_state) { case MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION: case MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS: - NL_SET_ERR_MSG_MOD(extack, "Sync reset was already triggered"); + NL_SET_ERR_MSG_MOD(extack, "Sync reset still in progress"); return -EBUSY; - case MLX5_MFRL_REG_RESET_STATE_TIMEOUT: - NL_SET_ERR_MSG_MOD(extack, "Sync reset got timeout"); + case MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT: + NL_SET_ERR_MSG_MOD(extack, "Sync reset negotiation timeout"); return -ETIMEDOUT; case MLX5_MFRL_REG_RESET_STATE_NACK: NL_SET_ERR_MSG_MOD(extack, "One of the hosts disabled reset"); return -EPERM; + case MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT: + NL_SET_ERR_MSG_MOD(extack, "Sync reset unload timeout"); + return -ETIMEDOUT; } out: @@ -151,7 +157,7 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel, struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {}; u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {}; - int err; + int err, rst_res; set_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags); @@ -164,13 +170,34 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel, return 0; clear_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags); - if (err == -EREMOTEIO && MLX5_CAP_MCAM_FEATURE(dev, reset_state)) - return mlx5_fw_reset_get_reset_state_err(dev, extack); + if (err == -EREMOTEIO && MLX5_CAP_MCAM_FEATURE(dev, reset_state)) { + rst_res = mlx5_fw_reset_get_reset_state_err(dev, extack); + return rst_res ? rst_res : err; + } NL_SET_ERR_MSG_MOD(extack, "Sync reset command failed"); return mlx5_cmd_check(dev, err, in, out); } +int mlx5_fw_reset_verify_fw_complete(struct mlx5_core_dev *dev, + struct netlink_ext_ack *extack) +{ + u8 rst_state; + int err; + + err = mlx5_fw_reset_get_reset_state_err(dev, extack); + if (err) + return err; + + rst_state = mlx5_get_fw_rst_state(dev); + if (!rst_state) + return 0; + + mlx5_core_err(dev, "Sync reset did not complete, state=%d\n", rst_state); + NL_SET_ERR_MSG_MOD(extack, "Sync reset did not complete successfully"); + return rst_state; +} + int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev) { return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h index c57465595f7c..ea527d06a85f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h @@ -12,6 +12,8 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel, int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev); int mlx5_fw_reset_wait_reset_done(struct mlx5_core_dev *dev); +int mlx5_fw_reset_verify_fw_complete(struct mlx5_core_dev *dev, + struct netlink_ext_ack *extack); void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev); void mlx5_fw_reset_events_stop(struct mlx5_core_dev *dev); void mlx5_drain_fw_reset(struct mlx5_core_dev *dev); diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 87fd6f9ed82c..9aed7e9b9f29 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -10858,8 +10858,9 @@ enum { MLX5_MFRL_REG_RESET_STATE_IDLE = 0, MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION = 1, MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS = 2, - MLX5_MFRL_REG_RESET_STATE_TIMEOUT = 3, + MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT = 3, MLX5_MFRL_REG_RESET_STATE_NACK = 4, + MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT = 5, }; enum { From e0e22d59b47a3306d448284a499169a361555099 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Wed, 19 Apr 2023 03:17:57 +0000 Subject: [PATCH 06/14] net/mlx5: E-switch, Add checking for flow rule destinations Firmware doesn't allow flow rules in FDB to do header rewrite and send packets to both internal and uplink vports. The following syndrome will be generated when trying to offload such kind of rules: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 23569): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8c8f08), err(-22) To avoid this syndrome, add a checking before creating FTE. If a rule with header rewrite action forwards packets to both VF and PF, an error is returned directly. Signed-off-by: Jianbo Liu Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/eswitch_offloads.c | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index e391535e1ab1..46b8c60ac39a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -535,6 +535,28 @@ esw_src_port_rewrite_supported(struct mlx5_eswitch *esw) MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level); } +static bool +esw_dests_to_vf_pf_vports(struct mlx5_flow_destination *dests, int max_dest) +{ + bool vf_dest = false, pf_dest = false; + int i; + + for (i = 0; i < max_dest; i++) { + if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT) + continue; + + if (dests[i].vport.num == MLX5_VPORT_UPLINK) + pf_dest = true; + else + vf_dest = true; + + if (vf_dest && pf_dest) + return true; + } + + return false; +} + static int esw_setup_dests(struct mlx5_flow_destination *dest, struct mlx5_flow_act *flow_act, @@ -671,6 +693,15 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw, rule = ERR_PTR(err); goto err_create_goto_table; } + + /* Header rewrite with combined wire+loopback in FDB is not allowed */ + if ((flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) && + esw_dests_to_vf_pf_vports(dest, i)) { + esw_warn(esw->dev, + "FDB: Header rewrite with forwarding to both PF and VF is not allowed\n"); + rule = ERR_PTR(-EINVAL); + goto err_esw_get; + } } if (esw_attr->decap_pkt_reformat) From 2ad0160c02be2a56bd80aaeb5c40221250814c25 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 28 Jun 2023 16:19:52 +0200 Subject: [PATCH 07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put() Instead of using device_put(), use auxiliary_device_uninit() for auxiliary device uninit which internally just calls device_put(). Signed-off-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c index 8e2abbab05f0..b2c849b8c0c9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c @@ -129,7 +129,7 @@ static void mlx5_sf_dev_add(struct mlx5_core_dev *dev, u16 sf_index, u16 fn_id, err = auxiliary_device_add(&sf_dev->adev); if (err) { - put_device(&sf_dev->adev.dev); + auxiliary_device_uninit(&sf_dev->adev); goto add_err; } From ae80d7a06fdb0b34a0f2842e1aac3da3bde93ccb Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 30 Jun 2023 09:32:14 +0200 Subject: [PATCH 08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() Since mlx5_sf_supported() check is done as a first thing in mlx5_sf_max_functions(), remove the redundant check. Signed-off-by: Jiri Pirko Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c index c4daeaaafead..1f613320fe07 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c @@ -275,15 +275,14 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) struct mlx5_sf_hw_table *table; u16 max_ext_fn = 0; u16 ext_base_id = 0; - u16 max_fn = 0; u16 base_id; + u16 max_fn; int err; if (!mlx5_vhca_event_supported(dev)) return 0; - if (mlx5_sf_supported(dev)) - max_fn = mlx5_sf_max_functions(dev); + max_fn = mlx5_sf_max_functions(dev); err = mlx5_esw_sf_max_hpf_functions(dev, &max_ext_fn, &ext_base_id); if (err) From 88074d81e5fe8b76010ea3437330d5ab55535537 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 30 Jun 2023 09:37:04 +0200 Subject: [PATCH 09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() There is a helper called mlx5_sf_start_function_id() that wraps up a query to get base SF function id. Use that instead of calling MLX5_CAP_GEN() directly. Signed-off-by: Jiri Pirko Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c index b2c849b8c0c9..39132a6cc68b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c @@ -167,7 +167,7 @@ mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_ if (!max_functions) return 0; - base_id = MLX5_CAP_GEN(table->dev, sf_base_id); + base_id = mlx5_sf_start_function_id(table->dev); if (event->function_id < base_id || event->function_id >= (base_id + max_functions)) return 0; @@ -209,7 +209,7 @@ static int mlx5_sf_dev_vhca_arm_all(struct mlx5_sf_dev_table *table) int i; max_functions = mlx5_sf_max_functions(dev); - function_id = MLX5_CAP_GEN(dev, sf_base_id); + function_id = mlx5_sf_start_function_id(dev); /* Arm the vhca context as the vhca event notifier */ for (i = 0; i < max_functions; i++) { err = mlx5_vhca_event_arm(dev, function_id); @@ -234,7 +234,7 @@ static void mlx5_sf_dev_add_active_work(struct work_struct *work) int i; max_functions = mlx5_sf_max_functions(dev); - function_id = MLX5_CAP_GEN(dev, sf_base_id); + function_id = mlx5_sf_start_function_id(dev); for (i = 0; i < max_functions; i++, function_id++) { if (table->stop_active_wq) return; From b63f8bde2fbadc8eca96b42dd281bf99bc853167 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 30 Jun 2023 09:41:14 +0200 Subject: [PATCH 10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported() Since mlx5_vhca_event_supported() is called in mlx5_sf_dev_supported(), remove the redundant call. Signed-off-by: Jiri Pirko Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c index 39132a6cc68b..e617a270d74a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c @@ -299,7 +299,7 @@ void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev) unsigned int max_sfs; int err; - if (!mlx5_sf_dev_supported(dev) || !mlx5_vhca_event_supported(dev)) + if (!mlx5_sf_dev_supported(dev)) return; table = kzalloc(sizeof(*table), GFP_KERNEL); From 36e5a0efc81010dab29b17c66c58417a15851836 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 30 Jun 2023 12:45:39 +0200 Subject: [PATCH 11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() sw_function_id contains sfnum, so fix the error message to name the value properly. Signed-off-by: Jiri Pirko Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c index e617a270d74a..05e148db9889 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c @@ -185,7 +185,7 @@ mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_ mlx5_sf_dev_del(table->dev, sf_dev, sf_index); else mlx5_core_err(table->dev, - "SF DEV: teardown state for invalid dev index=%d fn_id=0x%x\n", + "SF DEV: teardown state for invalid dev index=%d sfnum=0x%x\n", sf_index, event->sw_function_id); break; case MLX5_VHCA_STATE_ACTIVE: From 0b4eb603d635ca47c1c372f69b4b96672e4c2c05 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Sun, 2 Jan 2022 14:57:39 +0200 Subject: [PATCH 12/14] net/mlx5: Remove unused CAPs mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but there isn't any user who requires them. As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the mentioned caps above. Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 13 ------ .../net/ethernet/mellanox/mlx5/core/main.c | 2 - include/linux/mlx5/device.h | 17 +------- include/linux/mlx5/mlx5_ifc.h | 43 ------------------- 4 files changed, 1 insertion(+), 74 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index fb2035a5ec99..c982ce06d7a3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -206,12 +206,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) return err; } - if (MLX5_CAP_GEN(dev, vector_calc)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_VECTOR_CALC); - if (err) - return err; - } - if (MLX5_CAP_GEN(dev, qos)) { err = mlx5_core_get_caps(dev, MLX5_CAP_QOS); if (err) @@ -226,7 +220,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) if (MLX5_CAP_GEN(dev, mcam_reg)) { mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_FIRST_128); - mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9080_0x90FF); mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9100_0x917F); } @@ -270,12 +263,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) return err; } - if (MLX5_CAP_GEN(dev, shampo)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_SHAMPO); - if (err) - return err; - } - if (MLX5_CAP_GEN_64(dev, general_obj_types) & MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD) { err = mlx5_core_get_caps(dev, MLX5_CAP_MACSEC); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index f4fe06a5042e..cb77b5774aad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1714,7 +1714,6 @@ static const int types[] = { MLX5_CAP_FLOW_TABLE, MLX5_CAP_ESWITCH_FLOW_TABLE, MLX5_CAP_ESWITCH, - MLX5_CAP_VECTOR_CALC, MLX5_CAP_QOS, MLX5_CAP_DEBUG, MLX5_CAP_DEV_MEM, @@ -1723,7 +1722,6 @@ static const int types[] = { MLX5_CAP_VDPA_EMULATION, MLX5_CAP_IPSEC, MLX5_CAP_PORT_SELECTION, - MLX5_CAP_DEV_SHAMPO, MLX5_CAP_MACSEC, MLX5_CAP_ADV_VIRTUALIZATION, MLX5_CAP_CRYPTO, diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 80cc12a9a531..5041965250f6 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1208,9 +1208,7 @@ enum mlx5_cap_type { MLX5_CAP_FLOW_TABLE, MLX5_CAP_ESWITCH_FLOW_TABLE, MLX5_CAP_ESWITCH, - MLX5_CAP_RESERVED, - MLX5_CAP_VECTOR_CALC, - MLX5_CAP_QOS, + MLX5_CAP_QOS = 0xc, MLX5_CAP_DEBUG, MLX5_CAP_RESERVED_14, MLX5_CAP_DEV_MEM, @@ -1220,7 +1218,6 @@ enum mlx5_cap_type { MLX5_CAP_DEV_EVENT = 0x14, MLX5_CAP_IPSEC, MLX5_CAP_CRYPTO = 0x1a, - MLX5_CAP_DEV_SHAMPO = 0x1d, MLX5_CAP_MACSEC = 0x1f, MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, @@ -1239,7 +1236,6 @@ enum mlx5_pcam_feature_groups { enum mlx5_mcam_reg_groups { MLX5_MCAM_REGS_FIRST_128 = 0x0, - MLX5_MCAM_REGS_0x9080_0x90FF = 0x1, MLX5_MCAM_REGS_0x9100_0x917F = 0x2, MLX5_MCAM_REGS_NUM = 0x3, }; @@ -1416,10 +1412,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_ODP_MAX(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->max, cap) -#define MLX5_CAP_VECTOR_CALC(mdev, cap) \ - MLX5_GET(vector_calc_cap, \ - mdev->caps.hca[MLX5_CAP_VECTOR_CALC]->cur, cap) - #define MLX5_CAP_QOS(mdev, cap)\ MLX5_GET(qos_cap, mdev->caps.hca[MLX5_CAP_QOS]->cur, cap) @@ -1436,10 +1428,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \ mng_access_reg_cap_mask.access_regs.reg) -#define MLX5_CAP_MCAM_REG1(mdev, reg) \ - MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \ - mng_access_reg_cap_mask.access_regs1.reg) - #define MLX5_CAP_MCAM_REG2(mdev, reg) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \ mng_access_reg_cap_mask.access_regs2.reg) @@ -1485,9 +1473,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_CRYPTO(mdev, cap)\ MLX5_GET(crypto_cap, (mdev)->caps.hca[MLX5_CAP_CRYPTO]->cur, cap) -#define MLX5_CAP_DEV_SHAMPO(mdev, cap)\ - MLX5_GET(shampo_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_SHAMPO], cap) - #define MLX5_CAP_MACSEC(mdev, cap)\ MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 9aed7e9b9f29..08dcb1f43be7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1314,33 +1314,6 @@ struct mlx5_ifc_odp_cap_bits { u8 reserved_at_120[0x6E0]; }; -struct mlx5_ifc_calc_op { - u8 reserved_at_0[0x10]; - u8 reserved_at_10[0x9]; - u8 op_swap_endianness[0x1]; - u8 op_min[0x1]; - u8 op_xor[0x1]; - u8 op_or[0x1]; - u8 op_and[0x1]; - u8 op_max[0x1]; - u8 op_add[0x1]; -}; - -struct mlx5_ifc_vector_calc_cap_bits { - u8 calc_matrix[0x1]; - u8 reserved_at_1[0x1f]; - u8 reserved_at_20[0x8]; - u8 max_vec_count[0x8]; - u8 reserved_at_30[0xd]; - u8 max_chunk_size[0x3]; - struct mlx5_ifc_calc_op calc0; - struct mlx5_ifc_calc_op calc1; - struct mlx5_ifc_calc_op calc2; - struct mlx5_ifc_calc_op calc3; - - u8 reserved_at_c0[0x720]; -}; - struct mlx5_ifc_tls_cap_bits { u8 tls_1_2_aes_gcm_128[0x1]; u8 tls_1_3_aes_gcm_128[0x1]; @@ -3435,20 +3408,6 @@ struct mlx5_ifc_roce_addr_layout_bits { u8 reserved_at_e0[0x20]; }; -struct mlx5_ifc_shampo_cap_bits { - u8 reserved_at_0[0x3]; - u8 shampo_log_max_reservation_size[0x5]; - u8 reserved_at_8[0x3]; - u8 shampo_log_min_reservation_size[0x5]; - u8 shampo_min_mss_size[0x10]; - - u8 reserved_at_20[0x3]; - u8 shampo_max_log_headers_entry_size[0x5]; - u8 reserved_at_28[0x18]; - - u8 reserved_at_40[0x7c0]; -}; - struct mlx5_ifc_crypto_cap_bits { u8 reserved_at_0[0x3]; u8 synchronize_dek[0x1]; @@ -3484,14 +3443,12 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap; struct mlx5_ifc_e_switch_cap_bits e_switch_cap; struct mlx5_ifc_port_selection_cap_bits port_selection_cap; - struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap; struct mlx5_ifc_qos_cap_bits qos_cap; struct mlx5_ifc_debug_cap_bits debug_cap; struct mlx5_ifc_fpga_cap_bits fpga_cap; struct mlx5_ifc_tls_cap_bits tls_cap; struct mlx5_ifc_device_mem_cap_bits device_mem_cap; struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; - struct mlx5_ifc_shampo_cap_bits shampo_cap; struct mlx5_ifc_macsec_cap_bits macsec_cap; struct mlx5_ifc_crypto_cap_bits crypto_cap; u8 reserved_at_0[0x8000]; From a41cb59117fa12ee17cda5e5c781eecfcb15dc0f Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 11 Jul 2023 15:56:08 +0300 Subject: [PATCH 13/14] net/mlx5: Remove unused MAX HCA capabilities Each device cap has two modes: MAX and CUR. The driver maintains a cache of both modes of the capabilities. For most device caps, the MAX cap mode is never used. Hence, remove all driver queries of the MAX mode of the said caps as well as their helper MACROs. Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 34 ++++++------ .../net/ethernet/mellanox/mlx5/core/main.c | 14 ++--- .../ethernet/mellanox/mlx5/core/mlx5_core.h | 3 ++ include/linux/mlx5/device.h | 52 ------------------- include/linux/mlx5/driver.h | 1 - 5 files changed, 30 insertions(+), 74 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index c982ce06d7a3..0e394408af75 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -160,13 +160,15 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) } if (MLX5_CAP_GEN(dev, eth_net_offloads)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ETHERNET_OFFLOADS, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_IPOIB_ENHANCED_OFFLOADS); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_IPOIB_ENHANCED_OFFLOADS, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } @@ -191,29 +193,30 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) if (MLX5_CAP_GEN(dev, nic_flow_table) || MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_FLOW_TABLE, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_ESWITCH_MANAGER(dev)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ESWITCH_FLOW_TABLE, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; - err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ESWITCH, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, qos)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_QOS); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_QOS, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, debug)) - mlx5_core_get_caps(dev, MLX5_CAP_DEBUG); + mlx5_core_get_caps_mode(dev, MLX5_CAP_DEBUG, HCA_CAP_OPMOD_GET_CUR); if (MLX5_CAP_GEN(dev, pcam_reg)) mlx5_get_pcam_reg(dev); @@ -227,51 +230,52 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) mlx5_get_qcam_reg(dev); if (MLX5_CAP_GEN(dev, device_memory)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_MEM); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_DEV_MEM, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, event_cap)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_EVENT); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_DEV_EVENT, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, tls_tx) || MLX5_CAP_GEN(dev, tls_rx)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_TLS); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_TLS, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN_64(dev, general_obj_types) & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) { - err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_VDPA_EMULATION, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, ipsec_offload)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_IPSEC); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_IPSEC, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, crypto)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_CRYPTO); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_CRYPTO, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN_64(dev, general_obj_types) & MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD) { - err = mlx5_core_get_caps(dev, MLX5_CAP_MACSEC); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_MACSEC, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, adv_virtualization)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ADV_VIRTUALIZATION); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ADV_VIRTUALIZATION, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index cb77b5774aad..15561965d2af 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -361,9 +361,8 @@ void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *dev) } EXPORT_SYMBOL(mlx5_core_uplink_netdev_event_replay); -static int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, - enum mlx5_cap_type cap_type, - enum mlx5_cap_mode cap_mode) +int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type, + enum mlx5_cap_mode cap_mode) { u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)]; int out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); @@ -1620,21 +1619,24 @@ static int mlx5_query_hca_caps_light(struct mlx5_core_dev *dev) return err; if (MLX5_CAP_GEN(dev, eth_net_offloads)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ETHERNET_OFFLOADS, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, nic_flow_table) || MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_FLOW_TABLE, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN_64(dev, general_obj_types) & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) { - err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_VDPA_EMULATION, + HCA_CAP_OPMOD_GET_CUR); if (err) return err; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index 8e2028d20a9e..124352459c23 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -174,6 +174,9 @@ static inline int mlx5_flexible_inlen(struct mlx5_core_dev *dev, size_t fixed, #define MLX5_FLEXIBLE_INLEN(dev, fixed, item_size, num_items) \ mlx5_flexible_inlen(dev, fixed, item_size, num_items, __func__, __LINE__) +int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); +int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type, + enum mlx5_cap_mode cap_mode); int mlx5_query_hca_caps(struct mlx5_core_dev *dev); int mlx5_query_board_id(struct mlx5_core_dev *dev); int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num); diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 5041965250f6..93399802ba77 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1275,10 +1275,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->cur, cap) -#define MLX5_CAP_ETH_MAX(mdev, cap) \ - MLX5_GET(per_protocol_networking_offload_caps,\ - mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->max, cap) - #define MLX5_CAP_IPOIB_ENHANCED(mdev, cap) \ MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_IPOIB_ENHANCED_OFFLOADS]->cur, cap) @@ -1301,77 +1297,40 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP64_FLOWTABLE(mdev, cap) \ MLX5_GET64(flow_table_nic_cap, (mdev)->caps.hca[MLX5_CAP_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_nic_cap, mdev->caps.hca[MLX5_CAP_FLOW_TABLE]->max, cap) - #define MLX5_CAP_FLOWTABLE_NIC_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive.cap) -#define MLX5_CAP_FLOWTABLE_NIC_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive.cap) - #define MLX5_CAP_FLOWTABLE_NIC_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit.cap) -#define MLX5_CAP_FLOWTABLE_NIC_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_rdma.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_rdma.cap) - #define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \ MLX5_GET(flow_table_eswitch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_eswitch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->max, cap) - #define MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_nic_esw_fdb.cap) -#define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap) - #define MLX5_CAP_ESW_EGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_egress.cap) -#define MLX5_CAP_ESW_EGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_egress.cap) - #define MLX5_CAP_ESW_INGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_ingress.cap) -#define MLX5_CAP_ESW_INGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_ingress.cap) - #define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, ft_field_support_2_esw_fdb.cap) -#define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, ft_field_support_2_esw_fdb.cap) - #define MLX5_CAP_ESW(mdev, cap) \ MLX5_GET(e_switch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH]->cur, cap) @@ -1380,10 +1339,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET64(flow_table_eswitch_cap, \ (mdev)->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_MAX(mdev, cap) \ - MLX5_GET(e_switch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH]->max, cap) - #define MLX5_CAP_PORT_SELECTION(mdev, cap) \ MLX5_GET(port_selection_cap, \ mdev->caps.hca[MLX5_CAP_PORT_SELECTION]->cur, cap) @@ -1396,16 +1351,9 @@ enum mlx5_qcam_feature_groups { MLX5_GET(adv_virtualization_cap, \ mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap) -#define MLX5_CAP_ADV_VIRTUALIZATION_MAX(mdev, cap) \ - MLX5_GET(adv_virtualization_cap, \ - mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->max, cap) - #define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \ MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap) -#define MLX5_CAP_FLOWTABLE_PORT_SELECTION_MAX(mdev, cap) \ - MLX5_CAP_PORT_SELECTION_MAX(mdev, flow_table_properties_port_selection.cap) - #define MLX5_CAP_ODP(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->cur, cap) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e1c7e502a4fc..c9d82e74daaa 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1022,7 +1022,6 @@ bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev); -int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); From bd3a2f77809b84df5dc15edd72cf9955568e698e Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 11 Jul 2023 16:32:05 +0300 Subject: [PATCH 14/14] net/mlx5: Don't query MAX caps twice Whenever mlx5 driver is probed or reloaded, it queries some capabilities in MAX mode via set_hca_cap() API. Afterwards, the driver queries all capabilities in MAX mode via mlx5_query_hca_caps() API. Since MAX caps are read only caps, querying them twice is redundant. Hence, delete the second query. Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index 0e394408af75..58f4c0d0fafa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -143,18 +143,18 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) { int err; - err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_GENERAL, HCA_CAP_OPMOD_GET_CUR); if (err) return err; if (MLX5_CAP_GEN(dev, port_selection_cap)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_PORT_SELECTION); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_PORT_SELECTION, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, hca_cap_2)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL_2); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_GENERAL_2, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } @@ -174,19 +174,19 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) } if (MLX5_CAP_GEN(dev, pg)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ODP); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ODP, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, atomic)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ATOMIC); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ATOMIC, HCA_CAP_OPMOD_GET_CUR); if (err) return err; } if (MLX5_CAP_GEN(dev, roce)) { - err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE); + err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ROCE, HCA_CAP_OPMOD_GET_CUR); if (err) return err; }