This patch performs various minor DQO TX datapath refactors in
preparation for adding XDP_TX and XDP_REDIRECT support. The following
refactors are performed:
1) gve_tx_fill_pkt_desc_dqo() relies on a SKB pointer to
get whether checksum offloading should be enabled. This won't work
for the XDP case, which does not have a SKB. This patch updates the
method to use a boolean representing whether checksum offloading
should be enabled directly.
2) gve_maybe_stop_dqo() contains some synchronization between the true
TX head and the cached value, a synchronization which is common for
XDP queues and normal netdev queues. However, that method is reserved
for netdev TX queues. To avoid duplicate code, this logic is factored
out into a new method, gve_has_tx_slots_available().
3) gve_tx_update_tail() is added to update the TX tail, a functionality
that will be common between normal TX and XDP TX codepaths.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for XDP DQ support, the gve_xdp_xmit callback needs to
be generalized for all queue formats. This patch renames the GQ-specific
function to gve_xdp_xmit_gqi, and introduces a new gve_xdp_xmit callback
which branches on queue format.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In gve_adminq_issue_cmd(), return -EINVAL instead of 0 when an unknown
admin queue command opcode is encountered.
This prevents the function from silently succeeding on invalid input
and prevents undefined behavior by ensuring the function fails gracefully
when an unrecognized opcode is provided.
These changes improve error handling.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250616054504.1644770-2-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Correct spelling and improves the clarity of comments
"confiugration" -> "configuration"
"spilt" -> "split"
"It if is 0" -> "If it is 0"
"DQ" -> "DQO" (correct abbreviation)
- Clarify BIT(0) flag usage in gve_get_priv_flags()
- Replaced hardcoded array size with GVE_NUM_PTYPES
for clarity and maintainability.
These changes are purely cosmetic and do not affect functionality.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250616054504.1644770-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expand the get_ts_info ethtool handler with the new gve_get_ts_info
which advertises support for rx hardware timestamping.
With this patch, the driver now fully supports rx hardware timestamping.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-9-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement ndo_hwtstamp_get/set to enable hardware RX timestamping,
providing support for SIOC[SG]HWTSTAMP IOCTLs. Included with this support
is the small change necessary to read the rx timestamp out of the rx
descriptor, now that timestamps start being enabled. The gve clock is
only used for hardware timestamps, so started when timestamps are
requested and stopped when not needed.
This version only supports RX hardware timestamping with the rx filter
HWTSTAMP_FILTER_ALL. If the user attempts to configure a more
restrictive filter, the filter will be set to HWTSTAMP_FILTER_ALL in the
returned structure.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-8-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow the rx path to recover the high 32 bits of the full 64 bit rx
timestamp.
Use the low 32 bits of the last synced nic time and the 32 bits of the
timestamp provided in the rx descriptor to generate a difference, which
is then applied to the last synced nic time to reconstruct the complete
64-bit timestamp.
This scheme remains accurate as long as no more than ~2 seconds have
passed between the last read of the nic clock and the timestamping
application of the received packet.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-7-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query the nic clock and store the results. The timestamp delivered
in descriptors has a wraparound time of ~4 seconds so 250ms is chosen
as the sync cadence to provide a balance between performance, and
drift potential when we do start associating host time and nic time.
Leverage PTP's aux_work to query the nic clock periodically.
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adminq commands for queues creation and destruction were not
consistently protected by the driver's adminq_lock. This was previously
benign as these operations were always initiated from contexts holding
kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided
serialization.
Upcoming PTP aux_work will issue adminq commands directly from the
driver to read the NIC clock, without such kernel lock protection.
To prevent race conditions with this new PTP work, this patch ensures
the adminq_lock is held during queues creation and destruction.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the device supports reading of the nic clock, add support
to initialize and register the PTP clock.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-4-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an adminq command to read NIC's hardware clock. The driver
allocates dma memory and passes that dma memory address to the device.
The device then writes the clock to the given address.
Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the device option and negotiation with the device for clock
synchronization with the nic. This option is necessary before the driver
will advertise support for hardware timestamping or other related
features.
Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-2-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We're migrating RXFH config to new callbacks.
Remove RXFH handling from drivers where it does nothing.
Reviewed-by: Ziwei Xiao <ziweixiao@google.com>
Link: https://patch.msgid.link/20250611145949.2674086-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
gve_alloc_pending_packet() can return NULL, but gve_tx_add_skb_dqo()
did not check for this case before dereferencing the returned pointer.
Add a missing NULL check to prevent a potential NULL pointer
dereference when allocation fails.
This improves robustness in low-memory scenarios.
Fixes: a57e5de476 ("gve: DQO: Add TX path")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, the RX_BUFFERS_POSTED stat incorrectly reported the
fill_cnt from RX queue 0 for all queues, resulting in inaccurate
per-queue statistics.
Fix this by correctly indexing priv->rx[idx].fill_cnt for each RX queue.
Fixes: 24aeb56f2d ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250527130830.1812903-1-alok.a.tiwari@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use netmem_dma_*() helpers in gve_tx_dqo.c DQO-RDA paths to
enable netmem TX support in that mode.
Declare support for netmem TX in GVE DQO-RDA mode.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-8-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 03df156dd3 ("xdp: double protect netdev->xdp_flags with
netdev->lock") introduces the netdev lock to xdp_set_features_flag().
The change includes a _locked version of the method, as it is possible
for a driver to have already acquired the netdev lock before calling
this helper. However, the same applies to
xdp_features_(set|clear)_redirect_flags(), which ends up calling the
unlocked version of xdp_set_features_flags() leading to deadlocks in
GVE, which grabs the netdev lock as part of its suspend, reset, and
shutdown processes:
[ 833.265543] WARNING: possible recursive locking detected
[ 833.270949] 6.15.0-rc1 #6 Tainted: G E
[ 833.276271] --------------------------------------------
[ 833.281681] systemd-shutdow/1 is trying to acquire lock:
[ 833.287090] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: xdp_set_features_flag+0x29/0x90
[ 833.295470]
[ 833.295470] but task is already holding lock:
[ 833.301400] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve]
[ 833.309508]
[ 833.309508] other info that might help us debug this:
[ 833.316130] Possible unsafe locking scenario:
[ 833.316130]
[ 833.322142] CPU0
[ 833.324681] ----
[ 833.327220] lock(&dev->lock);
[ 833.330455] lock(&dev->lock);
[ 833.333689]
[ 833.333689] *** DEADLOCK ***
[ 833.333689]
[ 833.339701] May be due to missing lock nesting notation
[ 833.339701]
[ 833.346582] 5 locks held by systemd-shutdow/1:
[ 833.351205] #0: ffffffffa9c89130 (system_transition_mutex){+.+.}-{4:4}, at: __se_sys_reboot+0xe6/0x210
[ 833.360695] #1: ffff93b399e5c1b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xb4/0x1f0
[ 833.369144] #2: ffff949d19a471b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xc2/0x1f0
[ 833.377603] #3: ffffffffa9eca050 (rtnl_mutex){+.+.}-{4:4}, at: gve_shutdown+0x33/0x90 [gve]
[ 833.386138] #4: ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve]
Introduce xdp_features_(set|clear)_redirect_target_locked() versions
which assume that the netdev lock has already been acquired before
setting the XDP feature flag and update GVE to use the locked version.
Fixes: 03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250422011643.3509287-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many drivers populate the stats buffer using C-String based APIs (e.g.
ethtool_sprintf() and ethtool_puts()), usually when building up the
list of stats individually (i.e. with a for() loop). This, however,
requires that the source strings be populated in such a way as to have
a terminating NUL byte in the source.
Other drivers populate the stats buffer directly using one big memcpy()
of an entire array of strings. No NUL termination is needed here, as the
bytes are being directly passed through. Yet others will build up the
stats buffer individually, but also use memcpy(). This, too, does not
need NUL termination of the source strings.
However, there are cases where the strings that populate the
source stats strings are exactly ETH_GSTRING_LEN long, and GCC
15's -Wunterminated-string-initialization option complains that the
trailing NUL byte has been truncated. This situation is fine only if the
driver is using the memcpy() approach. If the C-String APIs are used,
the destination string name will have its final byte truncated by the
required trailing NUL byte applied by the C-string API.
For drivers that are already using memcpy() but have initializers that
truncate the NUL terminator, mark their source strings as __nonstring to
silence the GCC warnings.
For drivers that have initializers that truncate the NUL terminator and
are using the C-String APIs, switch to memcpy() to avoid destination
string truncation and mark their source strings as __nonstring to silence
the GCC warnings. (Also introduce ethtool_cpy() as a helper to make this
an easy replacement).
Specifically the following warnings were investigated and addressed:
../drivers/net/ethernet/chelsio/cxgb/cxgb2.c:364:9: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
364 | "TxFramesAbortedDueToXSCollisions",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/freescale/enetc/enetc_ethtool.c:165:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
165 | { ENETC_PM_R1523X(0), "MAC rx 1523 to max-octet packets" },
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/freescale/enetc/enetc_ethtool.c:190:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
190 | { ENETC_PM_T1523X(0), "MAC tx 1523 to max-octet packets" },
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/google/gve/gve_ethtool.c:76:9: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
76 | "adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:117:53: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
117 | STMMAC_STAT(ptp_rx_msg_type_pdelay_follow_up),
| ^
../drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:46:12: note: in definition of macro 'STMMAC_STAT'
46 | { #m, sizeof_field(struct stmmac_extra_stats, m), \
| ^
../drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:328:24: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
328 | .str = "a_mac_control_frames_transmitted",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c:340:24: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization]
340 | .str = "a_pause_mac_ctrl_frames_received",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250416010210.work.904-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.
This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.
In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the tx tail is less than the head (in cases of wraparound), the TX
consumed descriptor statistic in DQ will be reported as
UINT32_MAX - head + tail, which is incorrect. Mask the difference of
head and tail according to the ring size when reporting the statistic.
Cc: stable@vger.kernel.org
Fixes: 2c9198356d ("gve: Add consumed counts to ethtool stats")
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402001037.2717315-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All the misc entry points end up calling into either gve_open()
or gve_close(), they take rtnl_lock today but since the recent
instance locking changes should also take the instance lock.
Found by code inspection and untested.
Fixes: cae03e5bdd ("net: hold netdev instance lock during queue operations")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250328164742.1268069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge in late fixes to prepare for the 6.15 net-next PR.
No conflicts, adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
919f9f497d ("eth: bnxt: fix out-of-range access of vnic_info array")
fe96d717d3 ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support for running XDP programs on DQ, along with
rudimentary processing for XDP_DROP and XDP_PASS. These actions require
very limited driver functionality when it comes to processing an XDP
buffer, so currently if the XDP action is not XDP_PASS, the packet is
dropped and stats are updated.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaliginedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy<hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-7-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to support installing an XDP program on DQ, RX buffers need to
be reposted using 4K buffers, which is larger than the default packet
buffer size of 2K. This is needed to accommodate the extra head and tail
that accompanies the data portion of an XDP buffer. Continuing to use 2K
buffers would mean that the packet buffer size for the NIC would have to
be restricted to 2048 - 320 - 256 = 1472B. However, this is problematic
for two reasons: first, 1472 is not a packet buffer size accepted by
GVE; second, at least 1474B of buffer space is needed to accommodate an
MTU of 1460, which is the default on GCP. As such, we allocate 4K
buffers, and post a 2K section of those 4K buffers (offset relative to
the XDP headroom) to the NIC for DMA to avoid a potential extra copy.
Because the GQ-QPL datapath requires copies regardless, this change was
not needed to support XDP in that case.
To capture this subtlety, a new field, packet_buffer_truesize, has been
added to the rx ring struct to represent size of the allocated buffer,
while packet_buffer_size has been left to represent the portion of the
buffer posted to the NIC.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-6-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The data_buffer_size_dqo field in gve_priv and the packet_buffer_size
field in gve_rx_ring theoretically have the same meaning, but they are
defined in two different places and used in two separate contexts. There
is no good reason for this, so this change merges those fields into the
packet_buffer_size field in the RX ring.
This change also introduces a packet_buffer_size field to struct
gve_rx_queue_config to account for cases where queues are not allocated,
such as when the interface is down.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-5-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit ebdfae0d37 ("gve: adopt page pool for DQ RDA mode") introduced
a buf_size field to the gve_rx_slot_page_info struct, which can be used
in the datapath to take the place of the packet_buffer_size field, as it
will already be hot in the cache due to its extensive use. Using the
buf_size field in the datapath frees up the packet_buffer_size field in
the GQ-specific RX cacheline to be generalized for GQ and DQ (in the
next patch), as there is currently no common packet buffer size field
between the two queue formats.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-4-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An earlier patch series[1] introduced RX/TX ring allocation configuration
structs which contained metadata used to allocate and configure new RX
and TX rings. This led to a much cleaner and safer allocation pattern
wherein queue resources were not deallocated until new queue resources
were successfully allocated.
Migrate the XDP allocation path to use the same pattern to allow for the
existence of a single allocation path instead of relying on XDP-specific
allocation methods. These extra allocation methods result in the
duplication of many existing behaviors while being prone to error when
configuration changes unrelated to XDP occur.
Link: https://lore.kernel.org/netdev/20240122182632.1102721-1-shailend@google.com/ [1]
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-3-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These statistics pollute the hotpath and do not have any real-world use
or meaning.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250321002910.1343422-2-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit de70981f29 ("gve: unlink old napi when stopping a queue using
queue API") unlinks the old napi when stopping a queue. But this breaks
QPL mode of the driver which does not use page pool. Fix this by checking
that there's a page pool associated with the ring.
Cc: stable@vger.kernel.org
Fixes: de70981f29 ("gve: unlink old napi when stopping a queue using queue API")
Reviewed-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317214141.286854-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To add netmem support to the gve driver, add a union
to the struct gve_rx_slot_page_info. netmem_ref is used for
DQO queue format's raw DMA addressing(RDA) mode. The struct
page is retained for other usecases.
Then, switch to using relevant netmem helper functions for
page pool and skb frag management.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250307003905.601175-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the drivers that use queue management API, switch to the mode where
core stack holds the netdev instance lock. This affects the following
drivers:
- bnxt
- gve
- netdevsim
Originally I locked only start/stop, but switched to holding the
lock over all iterations to make them look atomic to the device
(feels like it should be easier to reason about).
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a queue is stopped using the ndo queue API, before
destroying its page pool, the associated NAPI instance
needs to be unlinked to avoid warnings.
Handle this by calling page_pool_disable_direct_recycling()
when stopping a queue.
Cc: stable@vger.kernel.org
Fixes: ebdfae0d37 ("gve: adopt page pool for DQ RDA mode")
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250226003526.1546854-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not all the devices have the capability for the driver to query for the
registered RSS configuration. The driver can discover this by checking
the relevant device option during setup. If it cannot, the driver needs
to store the RSS config cache and directly return such cache when
queried by the ethtool. RSS config is inited when driver probes. Also the
default RSS config will be adjusted when there is RX queue count change.
At this point, only keys of GVE_RSS_KEY_SIZE and indirection tables of
GVE_RSS_INDIR_SIZE are supported.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250219200451.3348166-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.
Fixes: 39a7f4aa3e ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable@vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit f85949f982 ("xdp: add xdp_set_features_flag utility routine")
added routines to inform the core about XDP flag changes.
GVE support was added around the same time and missed using them.
GVE only changes the flags on error recover or resume.
Presumably the flags may change during resume if VM migrated.
User would not get the notification and upper devices would
not get a chance to recalculate their flags.
Fixes: 75eaae158b ("gve: Add XDP DROP and TX support for GQI-QPL format")
Reviewed-By: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250106180210.1861784-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit ba0925c34e ("gve: process XSK TX descriptors as part of RX NAPI")
moved XSK TX processing to be part of the RX NAPI. However, that commit
did not include triggering the RX NAPI in gve_xsk_wakeup. This is
necessary because the TX NAPI only processes TX completions, meaning
that a TX wakeup would not actually trigger XSK descriptor processing.
Also, the branch on XDP_WAKEUP_TX was supposed to have been removed, as
the NAPI should be scheduled whether the wakeup is for RX or TX.
Fixes: ba0925c34e ("gve: process XSK TX descriptors as part of RX NAPI")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://patch.msgid.link/20241221032807.302244-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch fixes a number of consistency issues in the queue allocation
path related to XDP.
As it stands, the number of allocated XDP queues changes in three
different scenarios.
1) Adding an XDP program while the interface is up via
gve_add_xdp_queues
2) Removing an XDP program while the interface is up via
gve_remove_xdp_queues
3) After queues have been allocated and the old queue memory has been
removed in gve_queues_start.
However, the requirement for the interface to be up for
gve_(add|remove)_xdp_queues to be called, in conjunction with the fact
that the number of queues stored in priv isn't updated until _after_ XDP
queues have been allocated in the normal queue allocation path means
that if an XDP program is added while the interface is down, XDP queues
won't be added until the _second_ if_up, not the first.
Given the expectation that the number of XDP queues is equal to the
number of RX queues, scenario (3) has another problematic implication.
When changing the number of queues while an XDP program is loaded, the
number of XDP queues must be updated as well, as there is logic in the
driver (gve_xdp_tx_queue_id()) which relies on every RX queue having a
corresponding XDP TX queue. However, the number of XDP queues stored in
priv would not be updated until _after_ a close/open leading to a
mismatch in the number of XDP queues reported vs the number of XDP
queues which actually exist after the queue count update completes.
This patch remedies these issues by doing the following:
1) The allocation config getter function is set up to retrieve the
_expected_ number of XDP queues to allocate instead of relying
on the value stored in `priv` which is only updated once the queues
have been allocated.
2) When adjusting queues, XDP queues are adjusted to match the number of
RX queues when XDP is enabled. This only works in the case when
queues are live, so part (1) of the fix must still be available in
the case that queues are adjusted when there is an XDP program and
the interface is down.
Fixes: 5f08cd3d64 ("gve: Alloc before freeing when adjusting queues")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When busy polling is enabled, xsk_sendmsg for AF_XDP zero copy marks
the NAPI ID corresponding to the memory pool allocated for the socket.
In GVE, this NAPI ID will never correspond to a NAPI ID of one of the
dedicated XDP TX queues registered with the umem because XDP TX is not
set up to share a NAPI with a corresponding RX queue.
This patch moves XSK TX descriptor processing from the TX NAPI to the RX
NAPI, and the gve_xsk_wakeup callback is updated to use the RX NAPI
instead of the TX NAPI, accordingly. The branch on if the wakeup is for
TX is removed, as the NAPI poll should be invoked whether the wakeup is
for TX or for RX.
Fixes: fd8e40321a ("gve: Add AF_XDP zero-copy support for GQI-QPL format")
Cc: stable@vger.kernel.org
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch predicates the enabling and disabling of XSK pools on the
existence of queues. As it stands, if the interface is down, disabling
or enabling XSK pools would result in a crash, as the RX queue pointer
would be NULL. XSK pool registration will occur as part of the next
interface up.
Similarly, xsk_wakeup needs be guarded against queues disappearing
while the function is executing, so a check against the
GVE_PRIV_FLAGS_NAPI_ENABLED flag is added to synchronize with the
disabling of the bit and the synchronize_net() in gve_turndown.
Fixes: fd8e40321a ("gve: Add AF_XDP zero-copy support for GQI-QPL format")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In GVE, dedicated XDP queues only exist when an XDP program is installed
and the interface is up. As such, the NDO XDP XMIT callback should
return early if either of these conditions are false.
In the case of no loaded XDP program, priv->num_xdp_queues=0 which can
cause a divide-by-zero error, and in the case of interface down,
num_xdp_queues remains untouched to persist XDP queue count for the next
interface up, but the TX pointer itself would be NULL.
The XDP xmit callback also needs to synchronize with a device
transitioning from open to close. This synchronization will happen via
the GVE_PRIV_FLAGS_NAPI_ENABLED bit along with a synchronize_net() call,
which waits for any RCU critical sections at call-time to complete.
Fixes: 39a7f4aa3e ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When stopping XDP TX rings, the XDP clean function needs to be called to
clean out the entire queue, similar to what happens in the normal TX
queue case. Otherwise, the FIFO won't be cleared correctly, and
xsk_tx_completed won't be reported.
Fixes: 75eaae158b ("gve: Add XDP DROP and TX support for GQI-QPL format")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b35108a51c ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
@@ constant C; @@
- msecs_to_jiffies(C * 1000)
+ secs_to_jiffies(C)
@@ constant C; @@
- msecs_to_jiffies(C * MSEC_PER_SEC)
+ secs_to_jiffies(C)
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://patch.msgid.link/20241212-netdev-converge-secs-to-jiffies-v4-1-6dac97a6d6ab@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The last use of gve_adminq_set_mtu() was removed by
commit 37149e9374 ("gve: Implement packet continuation for RX.")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://patch.msgid.link/20241211001927.253161-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When configuring flow steering rules, the driver is currently going
through a reset for all errors from the device. Instead, the driver
should only reset when there's a timeout error from the device.
Fixes: 57718b60df ("gve: Add flow steering adminq commands")
Cc: stable@vger.kernel.org
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241113175930.2585680-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size which right now is set to the
size of fragment.
But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().
Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/
Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Fixes: ebdfae0d37 ("gve: adopt page pool for DQ RDA mode")
Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement netdev_stats_ops to export basic per-queue stats.
With page pool support for DQO added in the previous patches,
rx-alloc-fail captures failures in page pool allocations as
well since the rx_buf_alloc_fail stat tracked in the driver
is incremented when gve_alloc_buffer returns error.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-4-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For DQ queue format in raw DMA addressing(RDA) mode,
implement page pool recycling of buffers by leveraging
a few helper functions.
DQ QPL mode will continue to use the exisiting recycling
logic. This is because in QPL mode, the pages come from a
constant set of pages that the driver pre-allocates and
registers with the device.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-3-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for the upcoming page pool adoption for DQO
raw addressing mode, move RX buffer management code to a new
file. In the follow on patches, page pool code will be added
to this file.
No functional change, just movement of code.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241014202108.1051963-2-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit f13697cc7a ("gve: Switch to config-aware queue allocation")
convert this function to gve_rx_alloc_rings_gqi().
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816101906.882743-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a device option to inform the driver about the hash key size and
hash table size used by the device. This information will be stored and
made available for RSS ethtool operations.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240812222013.1503584-2-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GVE driver wrongly relies on netif_carrier_ok() to check the
interface administrative state when resources are being
allocated/deallocated for queue(s). netif_carrier_ok() needs
to be replaced with netif_running() for all such cases.
Administrative state is the result of "ip link set dev <dev>
up/down". It reflects whether the administrator wants to use
the device for traffic and the corresponding resources have
been allocated.
Fixes: 5f08cd3d64 ("gve: Alloc before freeing when adjusting queues")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240801205619.987396-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The NIC requires each TSO segment to not span more than 10
descriptors. NIC further requires each descriptor to not exceed
16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).
The descriptors for an skb are generated by
gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
generates a descriptor for the entire frag if the frag size is
not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).
gve_can_send_tso() checks if the descriptors thus generated for an
skb would meet the requirement that each TSO-segment not span more
than 10 descriptors. However, the current code misses an edge case
when a TSO segment spans multiple descriptors within a large frag.
This change fixes the edge case.
gve_can_send_tso() relies on the assumption that max gso size (9728)
is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
fragment a TSO segment can never span more than 2 descriptors.
Fixes: a57e5de476 ("gve: DQO: Add TX path")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In gve_clean_xdp_done, the driver processes the TX completions based on
a 32-bit NIC counter and a 32-bit completion counter stored in the tx
queue.
Fix the for loop so that the counter wraparound is handled correctly.
Fixes: 75eaae158b ("gve: Add XDP DROP and TX support for GQI-QPL format")
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240716171041.1561142-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement the ethtool commands that can be used to configure and query
flow-steering rules.
A large part of this change consists of translating the ethtool
representation of 'ntuples' to our internal gve_flow_rule and vice-versa
in the new created gve_flow_rule.c
Considering the possible large amount of flow rules, the driver doesn't
store all the rules locally. When the user runs 'ethtool -n <nic>' to
check the registered rules, the driver will send adminq command to
query a limited amount of rules/rule ids(that filled in a 4096 bytes dma
memory) at a time as a cache for the ethtool queries. The adminq query
commands will be repeated for several times until the ethtool has
queried all the needed rules.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new adminq commands for the driver to configure and query flow rules
that are stored in the device. Flow steering rules are assigned with a
location that determines the relative order of the rules.
Flow rules can run up to an order of millions. In such cases, storing
a full copy of the rules in the driver to prepare for the ethtool query
is infeasible while querying them from the device is better. That needs
to be optimized too so that we don't send a lot of adminq commands. The
solution here is to store a limited number of rules/rule ids in the
driver in a cache. Use dma_pool to allocate 4k bytes which lets device
write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time.
For configuring flow rules, there are 3 sub-commands:
- ADD which adds a rule at the location supplied
- DEL which deletes the rule at the location supplied
- RESET which clears all currently active rules in the device
For querying flow rules, there are also 3 sub-commands:
- QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
the allocated cache after querying the device
- QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
rule_ids in the allocated cache after querying the device
- QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
device's current flow rule number and the supported max flow rule
limit
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new device option to signal to the driver that the device supports
flow steering. This device option also carries the maximum number of
flow steering rules that the device can store.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The adminq command is limited to 64 bytes per entry and it's 56 bytes
for the command itself at maximum. To support larger commands, we need
to dma_alloc a separate memory to put the command in that memory and
send the dma memory address instead of the actual command.
Introduce an extended adminq command to wrap the real command with the
inner opcode and the allocated dma memory address specified. Once the
device receives it, it can get the real command from the given dma
memory address. As designed with the device, all the extended commands
will use inner opcode larger than 0xFF.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-3-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We were depending on the rtnl_lock to make sure there is only one adminq
command running at a time. But some commands may take too long to hold
the rtnl_lock, such as the upcoming flow steering operations. For such
situations, it can temporarily drop the rtnl_lock, and replace it for
these operations with a new adminq lock, which can ensure the adminq
command execution to be thread-safe.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it
is freed with dev_kfree_skb_any(). This can result in a subsequent call
to napi_get_frags returning a dangling pointer.
Fix this by clearing napi->skb before the skb is freed.
Fixes: 9b8dd5e5ea ("gve: DQO: Add RX path")
Cc: stable@vger.kernel.org
Reported-by: Shailend Chand <shailend@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TSO currently fails when the skb's gso_type field has more than one bit
set.
TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a
few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes
virtualization, such as QEMU, a real use-case.
The gso_type and gso_size fields as passed from userspace in
virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type
|= SKB_GSO_DODGY to force the packet to enter the software GSO stack
for verification.
This issue might similarly come up when the CWR bit is set in the TCP
header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit
to be set.
Fixes: a57e5de476 ("gve: DQO: Add TX path")
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
v2 - Remove unnecessary comments, remove line break between fixes tag
and signoffs.
v3 - Add back unrelated empty line removal.
Link: https://lore.kernel.org/r/20240610225729.2985343-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of standard helpers to simplify filling in stats strings.
The first two ethtool_puts() changes address the following fortification
warnings flagged by W=1 builds with clang-18. (The last ethtool_puts
change does not because the warning relates to writing beyond the first
element of an array, and gve_gstrings_priv_flags only has one element.)
.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
562 | __read_overflow2_field(q_size_field, size);
| ^
.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
Likewise, the same changes resolve the same problems flagged by Smatch.
.../gve_ethtool.c:100 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_main_stats' too small (32 vs 576)
.../gve_ethtool.c:120 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_adminq_stats' too small (32 vs 512)
Compile tested only.
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-2-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Although it does not seem to have any untoward side-effects,
the use of ';' to separate to assignments seems more appropriate than ','.
Flagged by clang-18 -Wcomma
No functional change intended.
Compile tested only.
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-1-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new netdev queue api is implemented for gve.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/all/20240501232549.1327174-11-shailend@google.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Every tx and rx ring has its own queue-page-list (QPL) that serves as
the bounce buffer. Previously we were allocating QPLs for all queues
before the queues themselves were allocated and later associating a QPL
with a queue. This is avoidable complexity: it is much more natural for
each queue to allocate and free its own QPL.
Moreover, the advent of new queue-manipulating ndo hooks make it hard to
keep things as is: we would need to transfer a QPL from an old queue to
a new queue, and that is unpleasant.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now account for the fact that the NIC might send us stats for a
subset of queues. Without this change, gve_get_ethtool_stats might make
an invalid access on the priv->stats_report->stats array.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This does not fix any existing bug. In anticipation of the ndo queue api
hooks that alloc/free/start/stop a single Rx queue, the already existing
per-queue stop functions are being made more robust. Specifically for
this use case: rx_queue_n.stop() + rx_queue_n.start()
Note that this is not the use case being used in devmem tcp (the first
place these new ndo hooks would be used). There the usecase is:
new_queue.alloc() + old_queue.stop() + new_queue.start() + old_queue.free()
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make possible the implementation of per-queue ndo hooks,
gve_turnup was changed in a previous patch to account for queues already
having some unprocessed descriptors: it does a one-off napi_schdule to
handle them. If conditions of consistent high traffic persist in the
immediate aftermath of this, the poll routine for a queue can be "stuck"
on the cpu on which the ndo hooks ran, instead of the cpu its irq has
affinity with.
This situation is exacerbated by the fact that the ndo hooks for all the
queues are invoked on the same cpu, potentially causing all the napi
poll routines to be residing on the same cpu.
A self correcting mechanism in the poll method itself solves this
problem.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gVNIC has a requirement that all queues have to be quiesced before any
queue is operated on (created or destroyed). To enable the
implementation of future ndo hooks that work on a single queue, we need
to evolve gve_turnup to account for queues already having some
unprocessed descriptors in the ring.
Say rxq 4 is being stopped and started via the queue api. Due to gve's
requirement of quiescence, queues 0 through 3 are not processing their
rings while queue 4 is being toggled. Once they are made live, these
queues need to be poked to cause them to check their rings for
descriptors that were written during their brief period of quiescence.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the queues are either all live or all dead, toggling from one
state to the other via the ndo open and stop hooks. The future addition
of single-queue ndo hooks changes this, and thus gve_turnup and
gve_turndown should evolve to account for a state where some queues are
live and some aren't.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows for implementing future ndo hooks that act on a single
queue.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although this is not fixing any existing double free bug, making these
functions idempotent allows for a simpler implementation of future ndo
hooks that act on a single queue.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qpl_cfg struct was used to make sure that no two different queues
are using QPL with the same qpl_id. We can remove that qpl_cfg struct
since now the qpl_ids map with the queues respectively as follows:
For tx queues: qpl_id = tx_qid
For rx queues: qpl_id = max_tx_queues + rx_qid
And when XDP is used, it will need the user to reduce the tx queues to
be at most half of the max_tx_queues. Then it will use the same number
of tx queues starting from the end of existing tx queues for XDP. So the
XDP queues will not exceed the max_tx_queues range and will not overlap
with the rx queues, where the qpl_ids will not have overlapping too.
Considering of that, we remove the qpl_cfg struct to get the qpl_id
directly based on the queue id. Unless we are erroneously allocating a
rx/tx queue that has already been allocated, we would never allocate
the qpl with the same qpl_id twice. In that case, it should fail much
earlier than the QPL assignment.
Suggested-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20240417205757.778551-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gve has supported software timestamp generation since its inception,
but has not advertised that support via ethtool. This patch correctly
advertises that support.
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the user to change ring size via ethtool if
supported by the device. The driver relies on the
ring size ranges queried from device to validate
ring sizes requested by the user.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to read ring size change capability and the
min and max descriptor counts from the device and store it
in the driver. Also accommodate a special case where the
device does not provide minimum ring size depending on the
version of the device. In that case, rely on default values
for the minimums.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fulfill the requirement that for GQI, the number of pages per
RX QPL is equal to the ring size. Set this value to be equal to
ring size. Because of this change, the rx_data_slot_cnt and
rx_pages_per_qpl fields stored in the priv structure are not
needed, so remove their usage. And for DQO, the number of pages
per RX QPL is more than ring size to account for out-of-order
completions. So set it to two times of rx ring size.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the DQO queue format, the gve driver stores two ring sizes
for both TX and RX - one for completion queue ring and one for
data buffer ring. This is supposed to enable asymmetric sizes
for these two rings but that is not supported. Make both fields
reference the same single variable.
This change renders reading supported TX completion ring size
and RX buffer ring size for DQO from the device useless, so change
those fields to reserved and remove related code.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Combine the gve_set_desc_cnt and gve_set_desc_cnt_dqo into
one function which sets the counts after checking the queue
format. Both the functions in the previous code and the new
combined function never return an error so make the new
function void and remove the goto on error.
Also rename the new function to gve_set_default_desc_cnt to
be clearer about its intention.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This counter counts the number of times get_ptype_map is executed on the
admin queue, and was previously missing from the stats report.
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240325223308.618671-1-jfraker@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When draining a page_frag_cache, most user are doing
the similar steps, so introduce an API to avoid code
duplication.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To record the stats of header split packets, three stats are added in
the driver's ethtool stats.
- rx_hsplit_pkt is the split packets count with header split
- rx_hsplit_bytes is the received header bytes count with header split
- rx_hsplit_unsplit_pkt is the unsplit packet count due to header buffer
overflow or zero header length when header split is enabled
Currently, it's entering the stats_update critical section more than
once per packet. We have plans to avoid that in the future change to let
all the stats_update happen in one place at the end of
`gve_rx_poll_dqo`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add header buffers and ethtool support to enable header split via the
tcp-data-split flag in ethtool's ringparam config. A coherent dma memory
is allocated for the header buffers. There is one header buffer per ring
entry by calculating the offset to the header-buffers starting address.
The header buffer is always copied directly into the skb and payload is
always added as frags. When there is a header buffer overflow or the
header length is 0, the driver places the whole unsplit packet in frags.
When toggling header split, the driver will call gve_adjust_config to
set its queues appropriately. If header split is enabled by the user and
the max packet buffer size is no less than 4KB, driver will set the
packet buffer size as 4KB to support TCP_ZEROCOPY_RECEIVE. Otherwise the
driver will use the default 2KB as the packet buffer size.
`ethtool -G <dev> tcp-data-split on/off` is the command to toggle header
split.
`ethtool -g <dev>` will show the status of header split with the field
of `tcp-data-split`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To enable header split via ethtool, we first need to query the device to
get the max rx buffer size and header buffer size. Add a device option
to get these values and store them in the driver. If the header buffer
size received from the device is non-zero, it means header split is
supported in the device.
Currently the max rx buffer size will only be used when header split is
enabled which will set the data_buffer_size_dqo to be the max rx buffer
size. Also change the data_buffer_size_dqo from int to u16 since we are
modifying it and making it to be consistent with max_rx_buffer_size.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, each caller of gve_rx_alloc_buffer had to increase counter
and as a result one caller was not tracking those failure. Increasing
counters at a common location now so callers don't have to duplicate
code or miss counter management.
Signed-off-by: Ankit Garg <nktgrg@google.com>
Link: https://lore.kernel.org/r/20240124205435.1021490-1-nktgrg@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For a skb frag with a newly allocated copy page, the true size is
incorrectly set to packet buffer size. It should be set to PAGE_SIZE
instead.
Fixes: 82fd151d38 ("gve: Reduce alloc and copy costs in the GQ rx path")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240124161025.1819836-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, existing queues were being freed before the resources for
the new queues were being allocated. This would take down the interface
if someone were to attempt to change feature flags under a resource
crunch.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20240122182632.1102721-7-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, existing queues were being freed before the resources for
the new queues were being allocated. This would take down the interface
if someone were to attempt to change queue counts under a resource
crunch.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20240122182632.1102721-6-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gve_open is rewritten to be composed of two funcs: gve_queues_mem_alloc
and gve_queues_start. The former only allocates queue resources without
doing anything to install the queues, which is taken up by the latter.
Similarly gve_close is split into gve_queues_stop and
gve_queues_mem_free.
Separating the acts of queue resource allocation and making the queue
become live help with subsequent changes that aim to not take down the
datapath when applying new configurations.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20240122182632.1102721-5-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new config-aware functions will help achieve the goal of being able
to allocate resources for new queues while there already are active
queues serving traffic.
These new functions work off of arbitrary queue allocation configs
rather than just the currently active config in priv, and they return
the newly allocated resources instead of writing them into priv.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>