Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
libeth: add libeth_xdp helper lib

Alexander Lobakin says:

Time to add XDP helpers infra to libeth to greatly simplify adding
XDP to idpf and iavf, as well as improve and extend XDP in ice and
i40e. Any vendor is free to reuse the helpers. If this happens, I'm fine
with moving the folder out of intel/.

The helpers greatly simplify building xdp_buff, running a prog,
handling the verdict, and implementing XDP_TX, .ndo_xdp_xmit, and XDP
buffer completion. The same applies to XSk (with XSk xmit instead of
.ndo_xdp_xmit, plus stuff like XSk wakeup).
They are entirely generic with no HW definitions or assumptions.
HW-specific stuff like parsing Rx desc / filling Tx desc is passed
from the driver as inline callbacks.

For now, the key assumptions that optimize performance / avoid code
bloat, but might not fit every driver in drivers/net/:
 * netmems holding the buffers are always order-0;
 * the driver has separate XDP Tx queues and doesn't use stack queues
   for that. For best efficiency, you may want to have nr_cpu_ids XDP
   queues, but fewer (queue sharing) is also supported;
 * XDP Tx queues are interrupt-less and use "lazy" cleaning only
   when fewer than 1/4 of the queue's Tx descriptors are free;
 * main target platforms are 64-bit; 32-bit is also fully supported,
   but the code might not be as optimized for it.
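
The "lazy" cleaning assumption above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct xdpsq_sketch`, `xdpsq_clean()` and `xdpsq_prep()` are hypothetical names, not libeth symbols, and the clean-up step simply pretends that the HW has completed every pending descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified XDP Tx queue state (not a libeth structure). */
struct xdpsq_sketch {
	uint32_t count;		/* descriptors in the ring */
	uint32_t pending;	/* sent, but not completed yet */
	uint32_t thresh;	/* clean-up threshold, about count / 4 */
};

static uint32_t xdpsq_free(const struct xdpsq_sketch *sq)
{
	assert(sq->pending <= sq->count);

	return sq->count - sq->pending;
}

/* Stand-in for polling completions: pretend the HW finished everything. */
static void xdpsq_clean(struct xdpsq_sketch *sq)
{
	sq->pending = 0;
}

/* Reserve up to @n descriptors; clean only when running low on free ones. */
static uint32_t xdpsq_prep(struct xdpsq_sketch *sq, uint32_t n)
{
	uint32_t avail;

	/* No interrupt drives the clean-up: it happens lazily, right here */
	if (xdpsq_free(sq) < sq->thresh)
		xdpsq_clean(sq);

	avail = xdpsq_free(sq);
	if (n > avail)
		n = avail;

	sq->pending += n;

	return n;
}
```

With a 64-entry queue and a threshold of 16, nothing is cleaned while at least 16 descriptors are free; the first request that finds fewer free descriptors triggers the clean-up before reserving.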

Library code already supports multi-buffer for all kinds of Tx and
both header split and no split for Rx and Tx. Frags can come from
devmem/io_uring etc.; a direct `struct page *` is used only for header
buffers, which are always backed by real pages.
Drivers are free to pass their own Rx hints and XSK xmit hints ops.

XDP_TX and ndo_xdp_xmit use an on-stack bulk for the frames to be sent
and send them in batches of 16 buffers. This eats ~280 bytes on the
stack, but gives good boosts and allows greatly optimizing the main
sending function, leaving it without any error/exception paths.
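
A minimal userspace sketch of the bulking idea, assuming a hypothetical `struct tx_bulk_sketch` that lives on the caller's stack. The names, the frame layout and the `flushed` counter are made up for the illustration; the real library fills Tx descriptors on flush and keeps error handling out of line.

```c
#include <assert.h>
#include <stdint.h>

#define BULK_MAX	16U	/* batch size named in the text */

/* Hypothetical frame record, only what the sketch needs. */
struct frame_sketch {
	const void *data;
	uint32_t len;
};

/* Hypothetical bulk, meant to be declared on the stack. */
struct tx_bulk_sketch {
	uint32_t count;		/* frames currently queued */
	uint32_t flushed;	/* frames handed over so far (demo only) */
	struct frame_sketch bulk[BULK_MAX];
};

/* Stand-in for the sending function: consume the whole batch at once. */
static void bulk_flush(struct tx_bulk_sketch *bq)
{
	bq->flushed += bq->count;
	bq->count = 0;
}

/* Queue one frame, flushing first if the batch is already full. */
static void bulk_enqueue(struct tx_bulk_sketch *bq, const void *data,
			 uint32_t len)
{
	if (bq->count == BULK_MAX)
		bulk_flush(bq);

	bq->bulk[bq->count++] = (struct frame_sketch){
		.data = data,
		.len = len,
	};
}
```

The caller flushes once more at the end of the poll loop so that a partially filled batch is not left behind.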

XSk xmit fills Tx descriptors in a loop unrolled by 8. This was
proven to improve perf on ice and i40e. XDP_TX and ndo_xdp_xmit
don't use unrolling as I wasn't able to get any improvements in
those scenarios from it, while +1 Kb for their sending functions
for nothing doesn't sound reasonable.
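
For readers unfamiliar with the technique, here is a generic sketch of a descriptor-filling loop unrolled by 8 with a scalar tail. `struct tx_desc_sketch`, `fill_one()` and `fill_descs()` are hypothetical; the actual libeth code may rely on compiler unroll pragmas or a different descriptor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Tx descriptor layout, for illustration only. */
struct tx_desc_sketch {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

static inline void fill_one(struct tx_desc_sketch *desc, uint64_t addr,
			    uint32_t len)
{
	desc->addr = addr;
	desc->len = len;
	desc->flags = 0;
}

/* Fill @n descriptors: 8 per iteration, then the remainder one by one. */
static void fill_descs(struct tx_desc_sketch *ring, const uint64_t *addrs,
		       const uint32_t *lens, size_t n)
{
	size_t i = 0;

	for (; i + 8 <= n; i += 8) {
		fill_one(&ring[i + 0], addrs[i + 0], lens[i + 0]);
		fill_one(&ring[i + 1], addrs[i + 1], lens[i + 1]);
		fill_one(&ring[i + 2], addrs[i + 2], lens[i + 2]);
		fill_one(&ring[i + 3], addrs[i + 3], lens[i + 3]);
		fill_one(&ring[i + 4], addrs[i + 4], lens[i + 4]);
		fill_one(&ring[i + 5], addrs[i + 5], lens[i + 5]);
		fill_one(&ring[i + 6], addrs[i + 6], lens[i + 6]);
		fill_one(&ring[i + 7], addrs[i + 7], lens[i + 7]);
	}

	/* Tail: fewer than 8 descriptors left */
	for (; i < n; i++)
		fill_one(&ring[i], addrs[i], lens[i]);
}
```

The unrolled body trades code size for fewer loop-control branches, which matches the trade-off described above: worth it where it measurably helps, not elsewhere.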

XSk wakeup, instead of traditionally used "SW interrupts" provided
by NICs, uses IPI to schedule NAPI on the CPU corresponding to the
given queue pair. It gives better control over CPU distribution and
in general performs way better than "SW interrupts", plus allows us
to not pass any HW-specific callbacks there.
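
The CPU selection behind the IPI-based wakeup can be modelled in userspace as follows. This sketch mirrors only the decision visible in libeth_xsk_wakeup() in this series (wrap the queue index to a valid CPU, then use an IPI only if that CPU is online and is not the current one); the NAPI state checks are omitted, and `pick_wakeup()` with its parameters is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wake_sketch {
	WAKE_LOCAL,	/* schedule NAPI on the current CPU */
	WAKE_IPI,	/* send an IPI to the CPU serving the queue */
};

/*
 * Decide how to wake the queue pair @qid up when running on @this_cpu.
 * @online is an array of @nr_cpus flags; the chosen CPU lands in @target.
 */
static enum wake_sketch pick_wakeup(uint32_t qid, uint32_t this_cpu,
				    const bool *online, uint32_t nr_cpus,
				    uint32_t *target)
{
	assert(nr_cpus > 0);

	/* More queues than CPUs: wrap around */
	if (qid >= nr_cpus)
		qid %= nr_cpus;

	if (qid != this_cpu && online[qid]) {
		*target = qid;
		return WAKE_IPI;
	}

	*target = this_cpu;

	return WAKE_LOCAL;
}
```

No HW-specific callback is needed for this decision, which is the point made above.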

The code is built in such a way that all callbacks passed from drivers
get inlined; in general, most of the hotpath gets inlined. Everything
slow/exception lands in .c files in the libeth folder, doesn't
create copies in the drivers themselves and doesn't bloat the
hotpath.
Sure, inlining means that the hotpath will be compiled into every driver
that uses the lib, but the core code is written in one place, so no
copying of bugs happens. Fixed once -- works everywhere.
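
The "inline callbacks" pattern can be illustrated with a small sketch. Everything here is hypothetical (`generic_xmit()`, `drv_fill_desc()`, the descriptor layout), and `__attribute__((always_inline))` is a GCC/Clang extension assumed for the example. The generic loop is written once; each driver wrapper passes a compile-time-constant callback, so an optimizing compiler can fold the callback body into the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct desc_sketch {
	uint64_t addr;
	uint32_t len;
	uint32_t cmd;
};

#define sketch_always_inline static inline __attribute__((always_inline))

/* "Library" side: generic loop; the HW-specific part is a callback. */
sketch_always_inline uint32_t
generic_xmit(struct desc_sketch *ring, const uint64_t *addrs, uint32_t len,
	     uint32_t n,
	     void (*fill)(struct desc_sketch *desc, uint64_t addr,
			  uint32_t len))
{
	for (uint32_t i = 0; i < n; i++)
		fill(&ring[i], addrs[i], len);

	return n;
}

/* "Driver" side: knows how to write one descriptor for its HW. */
static inline void drv_fill_desc(struct desc_sketch *desc, uint64_t addr,
				 uint32_t len)
{
	desc->addr = addr;
	desc->len = len;
	desc->cmd = 0x1;	/* made-up "end of packet" bit */
}

/* The only out-of-line function: generic code plus callback, fused. */
uint32_t drv_xmit(struct desc_sketch *ring, const uint64_t *addrs,
		  uint32_t len, uint32_t n)
{
	return generic_xmit(ring, addrs, len, n, drv_fill_desc);
}
```

Each driver gets its own compiled copy of the loop, but the source of the loop exists in one place only.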

The last commit might look like sort of a hack, but it gives really good
boosts and decreases object code size, plus there are checks that
all those wider accesses are fully safe, so I don't feel anything
bad about it.
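
As a rough illustration of the idea (treating two adjacent u32 fields as a single u64), consider the sketch below. The struct and helper are hypothetical and the actual libeth code may be written differently; the static assertions stand in for the kind of layout checks mentioned above. The copy goes through memcpy(), which compilers typically lower to one 64-bit load and one 64-bit store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame record with two adjacent 32-bit fields. */
struct frame_sketch {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/* The wide access is only valid if the fields are adjacent and aligned */
static_assert(offsetof(struct frame_sketch, flags) ==
	      offsetof(struct frame_sketch, len) + sizeof(uint32_t),
	      "len and flags must be adjacent");
static_assert(offsetof(struct frame_sketch, len) % sizeof(uint64_t) == 0,
	      "len must be 8-byte aligned");

/* Copy len and flags from @src to @dst with one 64-bit wide access. */
static inline void copy_len_flags(struct frame_sketch *dst,
				  const struct frame_sketch *src)
{
	const size_t off = offsetof(struct frame_sketch, len);
	uint64_t both;

	memcpy(&both, (const unsigned char *)src + off, sizeof(both));
	memcpy((unsigned char *)dst + off, &both, sizeof(both));
}
```

Because the value is copied as an opaque 64-bit quantity, the result does not depend on the endianness of the machine.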

An example of using libeth_xdp can be found either on my GitHub or
on the mailing lists here ("XDP for idpf"). Thanks to the macros for
building driver XDP functions, some implementations (XDP_TX,
ndo_xdp_xmit etc.) consist of only a few lines.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  libeth: xdp, xsk: access adjacent u32s as u64 where applicable
  libeth: xsk: add XSkFQ refill and XSk wakeup helpers
  libeth: xsk: add XSk Rx processing support
  libeth: xsk: add XSk xmit functions
  libeth: xsk: add XSk XDP_TX sending helpers
  libeth: xdp: add RSS hash hint and XDP features setup helpers
  libeth: xdp: add templates for building driver-side callbacks
  libeth: xdp: add XDP prog run and verdict result handling
  libeth: xdp: add helpers for preparing/processing &libeth_xdp_buff
  libeth: xdp: add XDPSQ cleanup timers
  libeth: xdp: add XDPSQ locking helpers
  libeth: xdp: add XDPSQE completion helpers
  libeth: xdp: add .ndo_xdp_xmit() helpers
  libeth: xdp: add XDP_TX buffers sending
  libeth: support native XDP and register memory model
  libeth: convert to netmem
  libeth, libie: clean symbol exports up a little
====================

Link: https://patch.msgid.link/20250616201639.710420-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski 2025-06-17 18:50:57 -07:00
commit 189bd9c873
16 changed files with 3596 additions and 57 deletions

@@ -723,7 +723,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
 for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
 const struct libeth_fqe *rx_fqes = &rx_ring->rx_fqes[i];
-page_pool_put_full_page(rx_ring->pp, rx_fqes->page, false);
+libeth_rx_recycle_slow(rx_fqes->netmem);
 if (unlikely(++i == rx_ring->count))
 i = 0;
@@ -1197,10 +1197,11 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 const struct libeth_fqe *rx_buffer,
 unsigned int size)
 {
-u32 hr = rx_buffer->page->pp->p.offset;
+u32 hr = netmem_get_pp(rx_buffer->netmem)->p.offset;
-skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
-rx_buffer->offset + hr, size, rx_buffer->truesize);
+skb_add_rx_frag_netmem(skb, skb_shinfo(skb)->nr_frags,
+rx_buffer->netmem, rx_buffer->offset + hr,
+size, rx_buffer->truesize);
 }
 /**
@@ -1214,12 +1215,13 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
 unsigned int size)
 {
-u32 hr = rx_buffer->page->pp->p.offset;
+struct page *buf_page = __netmem_to_page(rx_buffer->netmem);
+u32 hr = buf_page->pp->p.offset;
 struct sk_buff *skb;
 void *va;
 /* prefetch first cache line of first page */
-va = page_address(rx_buffer->page) + rx_buffer->offset;
+va = page_address(buf_page) + rx_buffer->offset;
 net_prefetch(va + hr);
 /* build an skb around the page buffer */

@@ -1006,7 +1006,7 @@ static int idpf_rx_singleq_clean(struct idpf_rx_queue *rx_q, int budget)
 break;
 skip_data:
-rx_buf->page = NULL;
+rx_buf->netmem = 0;
 IDPF_SINGLEQ_BUMP_RING_IDX(rx_q, ntc);
 cleaned_count++;

@@ -383,12 +383,12 @@ static int idpf_tx_desc_alloc_all(struct idpf_vport *vport)
 */
 static void idpf_rx_page_rel(struct libeth_fqe *rx_buf)
 {
-if (unlikely(!rx_buf->page))
+if (unlikely(!rx_buf->netmem))
 return;
-page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
+libeth_rx_recycle_slow(rx_buf->netmem);
-rx_buf->page = NULL;
+rx_buf->netmem = 0;
 rx_buf->offset = 0;
 }
@@ -3240,10 +3240,10 @@ idpf_rx_process_skb_fields(struct idpf_rx_queue *rxq, struct sk_buff *skb,
 void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
 unsigned int size)
 {
-u32 hr = rx_buf->page->pp->p.offset;
+u32 hr = netmem_get_pp(rx_buf->netmem)->p.offset;
-skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
-rx_buf->offset + hr, size, rx_buf->truesize);
+skb_add_rx_frag_netmem(skb, skb_shinfo(skb)->nr_frags, rx_buf->netmem,
+rx_buf->offset + hr, size, rx_buf->truesize);
 }
 /**
@@ -3266,16 +3266,20 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
 struct libeth_fqe *buf, u32 data_len)
 {
 u32 copy = data_len <= L1_CACHE_BYTES ? data_len : ETH_HLEN;
+struct page *hdr_page, *buf_page;
 const void *src;
 void *dst;
-if (!libeth_rx_sync_for_cpu(buf, copy))
+if (unlikely(netmem_is_net_iov(buf->netmem)) ||
+!libeth_rx_sync_for_cpu(buf, copy))
 return 0;
-dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
-src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
-memcpy(dst, src, LARGEST_ALIGN(copy));
+hdr_page = __netmem_to_page(hdr->netmem);
+buf_page = __netmem_to_page(buf->netmem);
+dst = page_address(hdr_page) + hdr->offset + hdr_page->pp->p.offset;
+src = page_address(buf_page) + buf->offset + buf_page->pp->p.offset;
+memcpy(dst, src, LARGEST_ALIGN(copy));
 buf->offset += copy;
 return copy;
@@ -3291,11 +3295,12 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
 */
 struct sk_buff *idpf_rx_build_skb(const struct libeth_fqe *buf, u32 size)
 {
-u32 hr = buf->page->pp->p.offset;
+struct page *buf_page = __netmem_to_page(buf->netmem);
+u32 hr = buf_page->pp->p.offset;
 struct sk_buff *skb;
 void *va;
-va = page_address(buf->page) + buf->offset;
+va = page_address(buf_page) + buf->offset;
 prefetch(va + hr);
 skb = napi_build_skb(va, buf->truesize);
@@ -3429,7 +3434,8 @@ static int idpf_rx_splitq_clean(struct idpf_rx_queue *rxq, int budget)
 if (unlikely(!hdr_len && !skb)) {
 hdr_len = idpf_rx_hsplit_wa(hdr, rx_buf, pkt_len);
-pkt_len -= hdr_len;
+/* If failed, drop both buffers by setting len to 0 */
+pkt_len -= hdr_len ? : pkt_len;
 u64_stats_update_begin(&rxq->stats_sync);
 u64_stats_inc(&rxq->q_stats.hsplit_buf_ovf);
@@ -3446,7 +3452,7 @@ static int idpf_rx_splitq_clean(struct idpf_rx_queue *rxq, int budget)
 u64_stats_update_end(&rxq->stats_sync);
 }
-hdr->page = NULL;
+hdr->netmem = 0;
 payload:
 if (!libeth_rx_sync_for_cpu(rx_buf, pkt_len))
@@ -3462,7 +3468,7 @@ static int idpf_rx_splitq_clean(struct idpf_rx_queue *rxq, int budget)
 break;
 skip_data:
-rx_buf->page = NULL;
+rx_buf->netmem = 0;
 idpf_rx_post_buf_refill(refillq, buf_id);
 IDPF_RX_BUMP_NTC(rxq, ntc);

@@ -1,9 +1,15 @@
 # SPDX-License-Identifier: GPL-2.0-only
-# Copyright (C) 2024 Intel Corporation
+# Copyright (C) 2024-2025 Intel Corporation
 config LIBETH
-tristate
+tristate "Common Ethernet library (libeth)" if COMPILE_TEST
 select PAGE_POOL
 help
 libeth is a common library containing routines shared between several
 drivers, but not yet promoted to the generic kernel API.
+config LIBETH_XDP
+tristate "Common XDP library (libeth_xdp)" if COMPILE_TEST
+select LIBETH
+help
+XDP and XSk helpers based on libeth hotpath management.

@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
-# Copyright (C) 2024 Intel Corporation
+# Copyright (C) 2024-2025 Intel Corporation
 obj-$(CONFIG_LIBETH) += libeth.o
 libeth-y := rx.o
+libeth-y += tx.o
+obj-$(CONFIG_LIBETH_XDP) += libeth_xdp.o
+libeth_xdp-y += xdp.o
+libeth_xdp-y += xsk.o

@@ -0,0 +1,37 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2025 Intel Corporation */
#ifndef __LIBETH_PRIV_H
#define __LIBETH_PRIV_H
#include <linux/types.h>
/* XDP */
enum xdp_action;
struct libeth_xdp_buff;
struct libeth_xdp_tx_frame;
struct skb_shared_info;
struct xdp_frame_bulk;
extern const struct xsk_tx_metadata_ops libeth_xsktmo_slow;
void libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq,
u32 count);
u32 libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp, enum xdp_action act,
int ret);
struct libeth_xdp_ops {
void (*bulk)(const struct skb_shared_info *sinfo,
struct xdp_frame_bulk *bq, bool frags);
void (*xsk)(struct libeth_xdp_buff *xdp);
};
void libeth_attach_xdp(const struct libeth_xdp_ops *ops);
static inline void libeth_detach_xdp(void)
{
libeth_attach_xdp(NULL);
}
#endif /* __LIBETH_PRIV_H */

@@ -1,5 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/* Copyright (C) 2024 Intel Corporation */
+/* Copyright (C) 2024-2025 Intel Corporation */
+#define DEFAULT_SYMBOL_NAMESPACE "LIBETH"
+#include <linux/export.h>
 #include <net/libeth/rx.h>
@@ -68,7 +72,7 @@ static u32 libeth_rx_hw_len_truesize(const struct page_pool_params *pp,
 static bool libeth_rx_page_pool_params(struct libeth_fq *fq,
 struct page_pool_params *pp)
 {
-pp->offset = LIBETH_SKB_HEADROOM;
+pp->offset = fq->xdp ? LIBETH_XDP_HEADROOM : LIBETH_SKB_HEADROOM;
 /* HW-writeable / syncable length per one page */
 pp->max_len = LIBETH_RX_PAGE_LEN(pp->offset);
@@ -155,11 +159,12 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi)
 .dev = napi->dev->dev.parent,
 .netdev = napi->dev,
 .napi = napi,
-.dma_dir = DMA_FROM_DEVICE,
 };
 struct libeth_fqe *fqes;
 struct page_pool *pool;
-bool ret;
+int ret;
+pp.dma_dir = fq->xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
 if (!fq->hsplit)
 ret = libeth_rx_page_pool_params(fq, &pp);
@@ -173,20 +178,28 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi)
 return PTR_ERR(pool);
 fqes = kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid);
-if (!fqes)
+if (!fqes) {
+ret = -ENOMEM;
 goto err_buf;
+}
+ret = xdp_reg_page_pool(pool);
+if (ret)
+goto err_mem;
 fq->fqes = fqes;
 fq->pp = pool;
 return 0;
+err_mem:
+kvfree(fqes);
 err_buf:
 page_pool_destroy(pool);
-return -ENOMEM;
+return ret;
 }
-EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, "LIBETH");
+EXPORT_SYMBOL_GPL(libeth_rx_fq_create);
 /**
 * libeth_rx_fq_destroy - destroy a &page_pool created by libeth
@@ -194,22 +207,23 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, "LIBETH");
 */
 void libeth_rx_fq_destroy(struct libeth_fq *fq)
 {
+xdp_unreg_page_pool(fq->pp);
 kvfree(fq->fqes);
 page_pool_destroy(fq->pp);
 }
-EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, "LIBETH");
+EXPORT_SYMBOL_GPL(libeth_rx_fq_destroy);
 /**
-* libeth_rx_recycle_slow - recycle a libeth page from the NAPI context
-* @page: page to recycle
+* libeth_rx_recycle_slow - recycle libeth netmem
+* @netmem: network memory to recycle
 *
 * To be used on exceptions or rare cases not requiring fast inline recycling.
 */
-void libeth_rx_recycle_slow(struct page *page)
+void __cold libeth_rx_recycle_slow(netmem_ref netmem)
 {
-page_pool_recycle_direct(page->pp, page);
+page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
 }
-EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, "LIBETH");
+EXPORT_SYMBOL_GPL(libeth_rx_recycle_slow);
 /* Converting abstract packet type numbers into a software structure with
 * the packet parameters to do O(1) lookup on Rx.
@@ -251,7 +265,7 @@ void libeth_rx_pt_gen_hash_type(struct libeth_rx_pt *pt)
 pt->hash_type |= libeth_rx_pt_xdp_iprot[pt->inner_prot];
 pt->hash_type |= libeth_rx_pt_xdp_pl[pt->payload_layer];
 }
-EXPORT_SYMBOL_NS_GPL(libeth_rx_pt_gen_hash_type, "LIBETH");
+EXPORT_SYMBOL_GPL(libeth_rx_pt_gen_hash_type);
 /* Module */

@@ -0,0 +1,41 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2025 Intel Corporation */
#define DEFAULT_SYMBOL_NAMESPACE "LIBETH"
#include <net/libeth/xdp.h>
#include "priv.h"
/* Tx buffer completion */
DEFINE_STATIC_CALL_NULL(bulk, libeth_xdp_return_buff_bulk);
DEFINE_STATIC_CALL_NULL(xsk, libeth_xsk_buff_free_slow);
/**
* libeth_tx_complete_any - perform Tx completion for one SQE of any type
* @sqe: Tx buffer to complete
* @cp: polling params
*
* Can be used to complete both regular and XDP SQEs, for example when
* destroying queues.
* When libeth_xdp is not loaded, XDPSQEs won't be handled.
*/
void libeth_tx_complete_any(struct libeth_sqe *sqe, struct libeth_cq_pp *cp)
{
if (sqe->type >= __LIBETH_SQE_XDP_START)
__libeth_xdp_complete_tx(sqe, cp, static_call(bulk),
static_call(xsk));
else
libeth_tx_complete(sqe, cp);
}
EXPORT_SYMBOL_GPL(libeth_tx_complete_any);
/* Module */
void libeth_attach_xdp(const struct libeth_xdp_ops *ops)
{
static_call_update(bulk, ops ? ops->bulk : NULL);
static_call_update(xsk, ops ? ops->xsk : NULL);
}
EXPORT_SYMBOL_GPL(libeth_attach_xdp);

@@ -0,0 +1,451 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2025 Intel Corporation */
#define DEFAULT_SYMBOL_NAMESPACE "LIBETH_XDP"
#include <linux/export.h>
#include <net/libeth/xdp.h>
#include "priv.h"
/* XDPSQ sharing */
DEFINE_STATIC_KEY_FALSE(libeth_xdpsq_share);
EXPORT_SYMBOL_GPL(libeth_xdpsq_share);
void __libeth_xdpsq_get(struct libeth_xdpsq_lock *lock,
const struct net_device *dev)
{
bool warn;
spin_lock_init(&lock->lock);
lock->share = true;
warn = !static_key_enabled(&libeth_xdpsq_share);
static_branch_inc(&libeth_xdpsq_share);
if (warn && net_ratelimit())
netdev_warn(dev, "XDPSQ sharing enabled, possible XDP Tx slowdown\n");
}
EXPORT_SYMBOL_GPL(__libeth_xdpsq_get);
void __libeth_xdpsq_put(struct libeth_xdpsq_lock *lock,
const struct net_device *dev)
{
static_branch_dec(&libeth_xdpsq_share);
if (!static_key_enabled(&libeth_xdpsq_share) && net_ratelimit())
netdev_notice(dev, "XDPSQ sharing disabled\n");
lock->share = false;
}
EXPORT_SYMBOL_GPL(__libeth_xdpsq_put);
void __acquires(&lock->lock)
__libeth_xdpsq_lock(struct libeth_xdpsq_lock *lock)
{
spin_lock(&lock->lock);
}
EXPORT_SYMBOL_GPL(__libeth_xdpsq_lock);
void __releases(&lock->lock)
__libeth_xdpsq_unlock(struct libeth_xdpsq_lock *lock)
{
spin_unlock(&lock->lock);
}
EXPORT_SYMBOL_GPL(__libeth_xdpsq_unlock);
/* XDPSQ clean-up timers */
/**
* libeth_xdpsq_init_timer - initialize an XDPSQ clean-up timer
* @timer: timer to initialize
* @xdpsq: queue this timer belongs to
* @lock: corresponding XDPSQ lock
* @poll: queue polling/completion function
*
* XDPSQ clean-up timers must be set up before using at the queue configuration
* time. Set the required pointers and the cleaning callback.
*/
void libeth_xdpsq_init_timer(struct libeth_xdpsq_timer *timer, void *xdpsq,
struct libeth_xdpsq_lock *lock,
void (*poll)(struct work_struct *work))
{
timer->xdpsq = xdpsq;
timer->lock = lock;
INIT_DELAYED_WORK(&timer->dwork, poll);
}
EXPORT_SYMBOL_GPL(libeth_xdpsq_init_timer);
/* ``XDP_TX`` bulking */
static void __cold
libeth_xdp_tx_return_one(const struct libeth_xdp_tx_frame *frm)
{
if (frm->len_fl & LIBETH_XDP_TX_MULTI)
libeth_xdp_return_frags(frm->data + frm->soff, true);
libeth_xdp_return_va(frm->data, true);
}
static void __cold
libeth_xdp_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, u32 count)
{
for (u32 i = 0; i < count; i++) {
const struct libeth_xdp_tx_frame *frm = &bq[i];
if (!(frm->len_fl & LIBETH_XDP_TX_FIRST))
continue;
libeth_xdp_tx_return_one(frm);
}
}
static void __cold libeth_trace_xdp_exception(const struct net_device *dev,
const struct bpf_prog *prog,
u32 act)
{
trace_xdp_exception(dev, prog, act);
}
/**
* libeth_xdp_tx_exception - handle Tx exceptions of XDP frames
* @bq: XDP Tx frame bulk
* @sent: number of frames sent successfully (from this bulk)
* @flags: internal libeth_xdp flags (XSk, .ndo_xdp_xmit etc.)
*
* Cold helper used by __libeth_xdp_tx_flush_bulk(), do not call directly.
* Reports XDP Tx exceptions, frees the frames that won't be sent or adjusts
* the Tx bulk to try again later.
*/
void __cold libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sent,
u32 flags)
{
const struct libeth_xdp_tx_frame *pos = &bq->bulk[sent];
u32 left = bq->count - sent;
if (!(flags & LIBETH_XDP_TX_NDO))
libeth_trace_xdp_exception(bq->dev, bq->prog, XDP_TX);
if (!(flags & LIBETH_XDP_TX_DROP)) {
memmove(bq->bulk, pos, left * sizeof(*bq->bulk));
bq->count = left;
return;
}
if (flags & LIBETH_XDP_TX_XSK)
libeth_xsk_tx_return_bulk(pos, left);
else if (!(flags & LIBETH_XDP_TX_NDO))
libeth_xdp_tx_return_bulk(pos, left);
else
libeth_xdp_xmit_return_bulk(pos, left, bq->dev);
bq->count = 0;
}
EXPORT_SYMBOL_GPL(libeth_xdp_tx_exception);
/* .ndo_xdp_xmit() implementation */
u32 __cold libeth_xdp_xmit_return_bulk(const struct libeth_xdp_tx_frame *bq,
u32 count, const struct net_device *dev)
{
u32 n = 0;
for (u32 i = 0; i < count; i++) {
const struct libeth_xdp_tx_frame *frm = &bq[i];
dma_addr_t dma;
if (frm->flags & LIBETH_XDP_TX_FIRST)
dma = *libeth_xdp_xmit_frame_dma(frm->xdpf);
else
dma = dma_unmap_addr(frm, dma);
dma_unmap_page(dev->dev.parent, dma, dma_unmap_len(frm, len),
DMA_TO_DEVICE);
/* Actual xdp_frames are freed by the core */
n += !!(frm->flags & LIBETH_XDP_TX_FIRST);
}
return n;
}
EXPORT_SYMBOL_GPL(libeth_xdp_xmit_return_bulk);
/* Rx polling path */
/**
* libeth_xdp_load_stash - recreate an &xdp_buff from libeth_xdp buffer stash
* @dst: target &libeth_xdp_buff to initialize
* @src: source stash
*
* External helper used by libeth_xdp_init_buff(), do not call directly.
* Recreate an onstack &libeth_xdp_buff using the stash saved earlier.
* The only field untouched (rxq) is initialized later in the
* abovementioned function.
*/
void libeth_xdp_load_stash(struct libeth_xdp_buff *dst,
const struct libeth_xdp_buff_stash *src)
{
dst->data = src->data;
dst->base.data_end = src->data + src->len;
dst->base.data_meta = src->data;
dst->base.data_hard_start = src->data - src->headroom;
dst->base.frame_sz = src->frame_sz;
dst->base.flags = src->flags;
}
EXPORT_SYMBOL_GPL(libeth_xdp_load_stash);
/**
* libeth_xdp_save_stash - convert &xdp_buff to a libeth_xdp buffer stash
* @dst: target &libeth_xdp_buff_stash to initialize
* @src: source XDP buffer
*
* External helper used by libeth_xdp_save_buff(), do not call directly.
* Use the fields from the passed XDP buffer to initialize the stash on the
* queue, so that a partially received frame can be finished later during
* the next NAPI poll.
*/
void libeth_xdp_save_stash(struct libeth_xdp_buff_stash *dst,
const struct libeth_xdp_buff *src)
{
dst->data = src->data;
dst->headroom = src->data - src->base.data_hard_start;
dst->len = src->base.data_end - src->data;
dst->frame_sz = src->base.frame_sz;
dst->flags = src->base.flags;
WARN_ON_ONCE(dst->flags != src->base.flags);
}
EXPORT_SYMBOL_GPL(libeth_xdp_save_stash);
void __libeth_xdp_return_stash(struct libeth_xdp_buff_stash *stash)
{
LIBETH_XDP_ONSTACK_BUFF(xdp);
libeth_xdp_load_stash(xdp, stash);
libeth_xdp_return_buff_slow(xdp);
stash->data = NULL;
}
EXPORT_SYMBOL_GPL(__libeth_xdp_return_stash);
/**
* libeth_xdp_return_buff_slow - free &libeth_xdp_buff
* @xdp: buffer to free/return
*
* Slowpath version of libeth_xdp_return_buff() to be called on exceptions,
* queue clean-ups etc., without unwanted inlining.
*/
void __cold libeth_xdp_return_buff_slow(struct libeth_xdp_buff *xdp)
{
__libeth_xdp_return_buff(xdp, false);
}
EXPORT_SYMBOL_GPL(libeth_xdp_return_buff_slow);
/**
* libeth_xdp_buff_add_frag - add frag to XDP buffer
* @xdp: head XDP buffer
* @fqe: Rx buffer containing the frag
* @len: frag length reported by HW
*
* External helper used by libeth_xdp_process_buff(), do not call directly.
* Frees both head and frag buffers on error.
*
* Return: true on success, false on error (no space for a new frag).
*/
bool libeth_xdp_buff_add_frag(struct libeth_xdp_buff *xdp,
const struct libeth_fqe *fqe,
u32 len)
{
netmem_ref netmem = fqe->netmem;
if (!xdp_buff_add_frag(&xdp->base, netmem,
fqe->offset + netmem_get_pp(netmem)->p.offset,
len, fqe->truesize))
goto recycle;
return true;
recycle:
libeth_rx_recycle_slow(netmem);
libeth_xdp_return_buff_slow(xdp);
return false;
}
EXPORT_SYMBOL_GPL(libeth_xdp_buff_add_frag);
/**
* libeth_xdp_prog_exception - handle XDP prog exceptions
* @bq: XDP Tx bulk
* @xdp: buffer to process
* @act: original XDP prog verdict
* @ret: error code if redirect failed
*
* External helper used by __libeth_xdp_run_prog() and
* __libeth_xsk_run_prog_slow(), do not call directly.
* Reports invalid @act, XDP exception trace event and frees the buffer.
*
* Return: libeth_xdp XDP prog verdict.
*/
u32 __cold libeth_xdp_prog_exception(const struct libeth_xdp_tx_bulk *bq,
struct libeth_xdp_buff *xdp,
enum xdp_action act, int ret)
{
if (act > XDP_REDIRECT)
bpf_warn_invalid_xdp_action(bq->dev, bq->prog, act);
libeth_trace_xdp_exception(bq->dev, bq->prog, act);
if (xdp->base.rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
return libeth_xsk_prog_exception(xdp, act, ret);
libeth_xdp_return_buff_slow(xdp);
return LIBETH_XDP_DROP;
}
EXPORT_SYMBOL_GPL(libeth_xdp_prog_exception);
/* Tx buffer completion */
static void libeth_xdp_put_netmem_bulk(netmem_ref netmem,
struct xdp_frame_bulk *bq)
{
if (unlikely(bq->count == XDP_BULK_QUEUE_SIZE))
xdp_flush_frame_bulk(bq);
bq->q[bq->count++] = netmem;
}
/**
* libeth_xdp_return_buff_bulk - free &xdp_buff as part of a bulk
* @sinfo: shared info corresponding to the buffer
* @bq: XDP frame bulk to store the buffer
* @frags: whether the buffer has frags
*
* Same as xdp_return_frame_bulk(), but for &libeth_xdp_buff, speeds up Tx
* completion of ``XDP_TX`` buffers and allows to free them in same bulks
* with &xdp_frame buffers.
*/
void libeth_xdp_return_buff_bulk(const struct skb_shared_info *sinfo,
struct xdp_frame_bulk *bq, bool frags)
{
if (!frags)
goto head;
for (u32 i = 0; i < sinfo->nr_frags; i++)
libeth_xdp_put_netmem_bulk(skb_frag_netmem(&sinfo->frags[i]),
bq);
head:
libeth_xdp_put_netmem_bulk(virt_to_netmem(sinfo), bq);
}
EXPORT_SYMBOL_GPL(libeth_xdp_return_buff_bulk);
/* Misc */
/**
* libeth_xdp_queue_threshold - calculate XDP queue clean/refill threshold
* @count: number of descriptors in the queue
*
* The threshold is the limit at which RQs start to refill (when the number of
* empty buffers exceeds it) and SQs get cleaned up (when the number of free
* descriptors goes below it). To speed up hotpath processing, threshold is
* always pow-2, closest to 1/4 of the queue length.
* Don't call it on hotpath, calculate and cache the threshold during the
* queue initialization.
*
* Return: the calculated threshold.
*/
u32 libeth_xdp_queue_threshold(u32 count)
{
u32 quarter, low, high;
if (likely(is_power_of_2(count)))
return count >> 2;
quarter = DIV_ROUND_CLOSEST(count, 4);
low = rounddown_pow_of_two(quarter);
high = roundup_pow_of_two(quarter);
return high - quarter <= quarter - low ? high : low;
}
EXPORT_SYMBOL_GPL(libeth_xdp_queue_threshold);
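
The threshold rule documented above (power of two closest to a quarter of the queue length) can be tried out in userspace with the sketch below. It follows the logic of libeth_xdp_queue_threshold() as shown in this diff, but the helpers are simple stand-ins for the kernel's is_power_of_2(), rounddown_pow_of_two(), roundup_pow_of_two() and DIV_ROUND_CLOSEST(), and the `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2_sketch(uint32_t n)
{
	return n && !(n & (n - 1));
}

/* Largest power of two <= @n; @n must be non-zero. */
static uint32_t rounddown_pow2_sketch(uint32_t n)
{
	uint32_t p = 1;

	while (p <= n / 2)
		p <<= 1;

	return p;
}

/* Smallest power of two >= @n; @n must be non-zero. */
static uint32_t roundup_pow2_sketch(uint32_t n)
{
	uint32_t p = rounddown_pow2_sketch(n);

	return p == n ? p : p << 1;
}

/* Power of two closest to @count / 4; ties go to the higher value. */
static uint32_t queue_threshold_sketch(uint32_t count)
{
	uint32_t quarter, low, high;

	assert(count != 0);

	if (is_pow2_sketch(count))
		return count >> 2;

	/* Unsigned equivalent of DIV_ROUND_CLOSEST(count, 4) */
	quarter = (count + 2) / 4;
	low = rounddown_pow2_sketch(quarter);
	high = roundup_pow2_sketch(quarter);

	return high - quarter <= quarter - low ? high : low;
}
```

For example, a 1000-descriptor queue has a rounded quarter of 250, which is closer to 256 than to 128, so the threshold is 256; an 80-descriptor queue has a quarter of 20 and gets 16.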
/**
* __libeth_xdp_set_features - set XDP features for netdev
* @dev: &net_device to configure
* @xmo: XDP metadata ops (Rx hints)
* @zc_segs: maximum number of S/G frags the HW can transmit
* @tmo: XSk Tx metadata ops (Tx hints)
*
* Set all the features libeth_xdp supports. Only the first argument is
* necessary; without the third one (zero), XSk support won't be advertised.
* Use the non-underscored versions in drivers instead.
*/
void __libeth_xdp_set_features(struct net_device *dev,
const struct xdp_metadata_ops *xmo,
u32 zc_segs,
const struct xsk_tx_metadata_ops *tmo)
{
xdp_set_features_flag(dev,
NETDEV_XDP_ACT_BASIC |
NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_NDO_XMIT |
(zc_segs ? NETDEV_XDP_ACT_XSK_ZEROCOPY : 0) |
NETDEV_XDP_ACT_RX_SG |
NETDEV_XDP_ACT_NDO_XMIT_SG);
dev->xdp_metadata_ops = xmo;
tmo = tmo == libeth_xsktmo ? &libeth_xsktmo_slow : tmo;
dev->xdp_zc_max_segs = zc_segs ? : 1;
dev->xsk_tx_metadata_ops = zc_segs ? tmo : NULL;
}
EXPORT_SYMBOL_GPL(__libeth_xdp_set_features);
/**
* libeth_xdp_set_redirect - toggle the XDP redirect feature
* @dev: &net_device to configure
* @enable: whether XDP is enabled
*
* Use this when XDPSQs are not always available to dynamically enable
* and disable redirect feature.
*/
void libeth_xdp_set_redirect(struct net_device *dev, bool enable)
{
if (enable)
xdp_features_set_redirect_target(dev, true);
else
xdp_features_clear_redirect_target(dev);
}
EXPORT_SYMBOL_GPL(libeth_xdp_set_redirect);
/* Module */
static const struct libeth_xdp_ops xdp_ops __initconst = {
.bulk = libeth_xdp_return_buff_bulk,
.xsk = libeth_xsk_buff_free_slow,
};
static int __init libeth_xdp_module_init(void)
{
libeth_attach_xdp(&xdp_ops);
return 0;
}
module_init(libeth_xdp_module_init);
static void __exit libeth_xdp_module_exit(void)
{
libeth_detach_xdp();
}
module_exit(libeth_xdp_module_exit);
MODULE_DESCRIPTION("Common Ethernet library - XDP infra");
MODULE_IMPORT_NS("LIBETH");
MODULE_LICENSE("GPL");

@@ -0,0 +1,271 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2025 Intel Corporation */
#define DEFAULT_SYMBOL_NAMESPACE "LIBETH_XDP"
#include <linux/export.h>
#include <net/libeth/xsk.h>
#include "priv.h"
/* ``XDP_TX`` bulking */
void __cold libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq,
u32 count)
{
for (u32 i = 0; i < count; i++)
libeth_xsk_buff_free_slow(bq[i].xsk);
}
/* XSk TMO */
const struct xsk_tx_metadata_ops libeth_xsktmo_slow = {
.tmo_request_checksum = libeth_xsktmo_req_csum,
};
/* Rx polling path */
/**
* libeth_xsk_buff_free_slow - free an XSk Rx buffer
* @xdp: buffer to free
*
* Slowpath version of xsk_buff_free() to be used on exceptions, cleanups etc.
* to avoid unwanted inlining.
*/
void libeth_xsk_buff_free_slow(struct libeth_xdp_buff *xdp)
{
xsk_buff_free(&xdp->base);
}
EXPORT_SYMBOL_GPL(libeth_xsk_buff_free_slow);
/**
* libeth_xsk_buff_add_frag - add frag to XSk Rx buffer
* @head: head buffer
* @xdp: frag buffer
*
* External helper used by libeth_xsk_process_buff(), do not call directly.
* Frees both main and frag buffers on error.
*
* Return: main buffer with attached frag on success, %NULL on error (no space
* for a new frag).
*/
struct libeth_xdp_buff *libeth_xsk_buff_add_frag(struct libeth_xdp_buff *head,
struct libeth_xdp_buff *xdp)
{
if (!xsk_buff_add_frag(&head->base, &xdp->base))
goto free;
return head;
free:
libeth_xsk_buff_free_slow(xdp);
libeth_xsk_buff_free_slow(head);
return NULL;
}
EXPORT_SYMBOL_GPL(libeth_xsk_buff_add_frag);
/**
* libeth_xsk_buff_stats_frags - update onstack RQ stats with XSk frags info
* @rs: onstack stats to update
* @xdp: buffer to account
*
* External helper used by __libeth_xsk_run_pass(), do not call directly.
* Adds buffer's frags count and total len to the onstack stats.
*/
void libeth_xsk_buff_stats_frags(struct libeth_rq_napi_stats *rs,
const struct libeth_xdp_buff *xdp)
{
libeth_xdp_buff_stats_frags(rs, xdp);
}
EXPORT_SYMBOL_GPL(libeth_xsk_buff_stats_frags);
/**
* __libeth_xsk_run_prog_slow - process the non-``XDP_REDIRECT`` verdicts
* @xdp: buffer to process
* @bq: Tx bulk for queueing on ``XDP_TX``
* @act: verdict to process
* @ret: error code if ``XDP_REDIRECT`` failed
*
* External helper used by __libeth_xsk_run_prog(), do not call directly.
* ``XDP_REDIRECT`` is the most common and hottest verdict on XSk, thus
* it is processed inline. The rest goes here for out-of-line processing,
* together with redirect errors.
*
* Return: libeth_xdp XDP prog verdict.
*/
u32 __libeth_xsk_run_prog_slow(struct libeth_xdp_buff *xdp,
const struct libeth_xdp_tx_bulk *bq,
enum xdp_action act, int ret)
{
switch (act) {
case XDP_DROP:
xsk_buff_free(&xdp->base);
return LIBETH_XDP_DROP;
case XDP_TX:
return LIBETH_XDP_TX;
case XDP_PASS:
return LIBETH_XDP_PASS;
default:
break;
}
return libeth_xdp_prog_exception(bq, xdp, act, ret);
}
EXPORT_SYMBOL_GPL(__libeth_xsk_run_prog_slow);
/**
* libeth_xsk_prog_exception - handle XDP prog exceptions on XSk
* @xdp: buffer to process
* @act: verdict returned by the prog
* @ret: error code if ``XDP_REDIRECT`` failed
*
* Internal. Frees the buffer and, if the queue uses XSk wakeups, stops the
* current NAPI poll when there are no free buffers left.
*
* Return: libeth_xdp's XDP prog verdict.
*/
u32 __cold libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp,
enum xdp_action act, int ret)
{
const struct xdp_buff_xsk *xsk;
u32 __ret = LIBETH_XDP_DROP;
if (act != XDP_REDIRECT)
goto drop;
xsk = container_of(&xdp->base, typeof(*xsk), xdp);
if (xsk_uses_need_wakeup(xsk->pool) && ret == -ENOBUFS)
__ret = LIBETH_XDP_ABORTED;
drop:
libeth_xsk_buff_free_slow(xdp);
return __ret;
}
/* Refill */
/**
* libeth_xskfq_create - create an XSkFQ
* @fq: fill queue to initialize
*
* Allocates the FQEs and initializes the fields used by libeth_xdp: number
* of buffers to refill, refill threshold and buffer len.
*
* Return: %0 on success, -errno otherwise.
*/
int libeth_xskfq_create(struct libeth_xskfq *fq)
{
fq->fqes = kvcalloc_node(fq->count, sizeof(*fq->fqes), GFP_KERNEL,
fq->nid);
if (!fq->fqes)
return -ENOMEM;
fq->pending = fq->count;
fq->thresh = libeth_xdp_queue_threshold(fq->count);
fq->buf_len = xsk_pool_get_rx_frame_size(fq->pool);
return 0;
}
EXPORT_SYMBOL_GPL(libeth_xskfq_create);
/**
* libeth_xskfq_destroy - destroy an XSkFQ
* @fq: fill queue to destroy
*
* Zeroes the used fields and frees the FQEs array.
*/
void libeth_xskfq_destroy(struct libeth_xskfq *fq)
{
fq->buf_len = 0;
fq->thresh = 0;
fq->pending = 0;
kvfree(fq->fqes);
}
EXPORT_SYMBOL_GPL(libeth_xskfq_destroy);
/* .ndo_xsk_wakeup */
static void libeth_xsk_napi_sched(void *info)
{
__napi_schedule_irqoff(info);
}
/**
* libeth_xsk_init_wakeup - initialize libeth XSk wakeup structure
* @csd: struct to initialize
* @napi: NAPI corresponding to this queue
*
* libeth_xdp uses inter-processor interrupts to perform XSk wakeups. In order
* to do that, the corresponding CSDs must be initialized when creating the
* queues.
*/
void libeth_xsk_init_wakeup(call_single_data_t *csd, struct napi_struct *napi)
{
INIT_CSD(csd, libeth_xsk_napi_sched, napi);
}
EXPORT_SYMBOL_GPL(libeth_xsk_init_wakeup);
/**
* libeth_xsk_wakeup - perform an XSk wakeup
* @csd: CSD corresponding to the queue
* @qid: the stack queue index
*
* Try to mark the NAPI as missed first, so that it gets rescheduled once the
* current poll finishes. If it is not scheduled at the moment, schedule it on
* the corresponding CPU using an IPI (or directly if already running on that
* CPU or if it is offline).
*/
void libeth_xsk_wakeup(call_single_data_t *csd, u32 qid)
{
struct napi_struct *napi = csd->info;
if (napi_if_scheduled_mark_missed(napi) ||
unlikely(!napi_schedule_prep(napi)))
return;
if (unlikely(qid >= nr_cpu_ids))
qid %= nr_cpu_ids;
if (qid != raw_smp_processor_id() && cpu_online(qid))
smp_call_function_single_async(qid, csd);
else
__napi_schedule(napi);
}
EXPORT_SYMBOL_GPL(libeth_xsk_wakeup);
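The branch order of libeth_xsk_wakeup() can be checked in isolation with a small userspace model. It is a sketch under assumptions: the `wake_*` names are hypothetical, the two booleans stand in for the results of napi_if_scheduled_mark_missed() and napi_schedule_prep(), and the `online` array replaces cpu_online().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome labels, not part of libeth */
enum wake_action {
	WAKE_NONE,	/* NAPI was marked missed or could not be prepared */
	WAKE_LOCAL,	/* schedule directly on the current CPU */
	WAKE_IPI,	/* kick the target CPU with an async IPI */
};

struct wake_decision {
	enum wake_action action;
	uint32_t cpu;
};

/*
 * Bail out early when nothing has to be scheduled, wrap @qid into the
 * CPU range, then pick between an IPI and a local schedule.
 */
static struct wake_decision
wake_decide(bool marked_missed, bool prep_ok, uint32_t qid,
	    uint32_t this_cpu, uint32_t nr_cpus, const bool *online)
{
	struct wake_decision d = { .action = WAKE_NONE, .cpu = this_cpu };

	assert(nr_cpus && this_cpu < nr_cpus);

	if (marked_missed || !prep_ok)
		return d;

	if (qid >= nr_cpus)
		qid %= nr_cpus;

	if (qid != this_cpu && online[qid]) {
		d.action = WAKE_IPI;
		d.cpu = qid;
	} else {
		d.action = WAKE_LOCAL;
	}

	return d;
}
```

Note that an offline target CPU falls back to a local schedule rather than dropping the wakeup.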
/* Pool setup */
#define LIBETH_XSK_DMA_ATTR \
(DMA_ATTR_WEAK_ORDERING | DMA_ATTR_SKIP_CPU_SYNC)
/**
* libeth_xsk_setup_pool - setup or destroy an XSk pool for a queue
* @dev: target &net_device
* @qid: stack queue index to configure
* @enable: whether to enable or disable the pool
*
* Check that @qid is valid and then map or unmap the pool.
*
* Return: %0 on success, -errno otherwise.
*/
int libeth_xsk_setup_pool(struct net_device *dev, u32 qid, bool enable)
{
struct xsk_buff_pool *pool;
pool = xsk_get_pool_from_qid(dev, qid);
if (!pool)
return -EINVAL;
if (enable)
return xsk_pool_dma_map(pool, dev->dev.parent,
LIBETH_XSK_DMA_ATTR);
else
xsk_pool_dma_unmap(pool, LIBETH_XSK_DMA_ATTR);
return 0;
}
EXPORT_SYMBOL_GPL(libeth_xsk_setup_pool);


@ -1,6 +1,9 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2024 Intel Corporation */
/* Copyright (C) 2024-2025 Intel Corporation */
#define DEFAULT_SYMBOL_NAMESPACE "LIBIE"
#include <linux/export.h>
#include <linux/net/intel/libie/rx.h>
/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
@ -116,7 +119,7 @@ const struct libeth_rx_pt libie_rx_pt_lut[LIBIE_RX_PT_NUM] = {
LIBIE_RX_PT_IP(4),
LIBIE_RX_PT_IP(6),
};
EXPORT_SYMBOL_NS_GPL(libie_rx_pt_lut, "LIBIE");
EXPORT_SYMBOL_GPL(libie_rx_pt_lut);
MODULE_DESCRIPTION("Intel(R) Ethernet common library");
MODULE_IMPORT_NS("LIBETH");


@ -1,5 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2024 Intel Corporation */
/* Copyright (C) 2024-2025 Intel Corporation */
#ifndef __LIBETH_RX_H
#define __LIBETH_RX_H
@ -13,8 +13,10 @@
/* Space reserved in front of each frame */
#define LIBETH_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN)
#define LIBETH_XDP_HEADROOM (ALIGN(XDP_PACKET_HEADROOM, NET_SKB_PAD) + \
NET_IP_ALIGN)
/* Maximum headroom for worst-case calculations */
#define LIBETH_MAX_HEADROOM LIBETH_SKB_HEADROOM
#define LIBETH_MAX_HEADROOM LIBETH_XDP_HEADROOM
/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
#define LIBETH_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
/* Maximum supported L2-L4 header length */
@ -31,7 +33,7 @@
/**
* struct libeth_fqe - structure representing an Rx buffer (fill queue element)
* @page: page holding the buffer
* @netmem: network memory reference holding the buffer
* @offset: offset from the page start (to the headroom)
* @truesize: total space occupied by the buffer (w/ headroom and tailroom)
*
@ -40,7 +42,7 @@
* former, @offset is always 0 and @truesize is always ``PAGE_SIZE``.
*/
struct libeth_fqe {
struct page *page;
netmem_ref netmem;
u32 offset;
u32 truesize;
} __aligned_largest;
@ -66,6 +68,7 @@ enum libeth_fqe_type {
* @count: number of descriptors/buffers the queue has
* @type: type of the buffers this queue has
* @hsplit: flag whether header split is enabled
* @xdp: flag indicating whether XDP is enabled
* @buf_len: HW-writeable length per each buffer
* @nid: ID of the closest NUMA node with memory
*/
@ -81,6 +84,7 @@ struct libeth_fq {
/* Cold fields */
enum libeth_fqe_type type:2;
bool hsplit:1;
bool xdp:1;
u32 buf_len;
int nid;
@ -102,15 +106,16 @@ static inline dma_addr_t libeth_rx_alloc(const struct libeth_fq_fp *fq, u32 i)
struct libeth_fqe *buf = &fq->fqes[i];
buf->truesize = fq->truesize;
buf->page = page_pool_dev_alloc(fq->pp, &buf->offset, &buf->truesize);
if (unlikely(!buf->page))
buf->netmem = page_pool_dev_alloc_netmem(fq->pp, &buf->offset,
&buf->truesize);
if (unlikely(!buf->netmem))
return DMA_MAPPING_ERROR;
return page_pool_get_dma_addr(buf->page) + buf->offset +
return page_pool_get_dma_addr_netmem(buf->netmem) + buf->offset +
fq->pp->p.offset;
}
void libeth_rx_recycle_slow(struct page *page);
void libeth_rx_recycle_slow(netmem_ref netmem);
/**
* libeth_rx_sync_for_cpu - synchronize or recycle buffer post DMA
@ -126,18 +131,19 @@ void libeth_rx_recycle_slow(struct page *page);
static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
u32 len)
{
struct page *page = fqe->page;
netmem_ref netmem = fqe->netmem;
/* Very rare, but possible case. The most common reason:
* the last fragment contained FCS only, which was then
* stripped by the HW.
*/
if (unlikely(!len)) {
libeth_rx_recycle_slow(page);
libeth_rx_recycle_slow(netmem);
return false;
}
page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
page_pool_dma_sync_netmem_for_cpu(netmem_get_pp(netmem), netmem,
fqe->offset, len);
return true;
}


@ -1,5 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2024 Intel Corporation */
/* Copyright (C) 2024-2025 Intel Corporation */
#ifndef __LIBETH_TX_H
#define __LIBETH_TX_H
@ -12,11 +12,17 @@
/**
* enum libeth_sqe_type - type of &libeth_sqe to act on Tx completion
* @LIBETH_SQE_EMPTY: unused/empty, no action required
* @LIBETH_SQE_EMPTY: unused/empty OR XDP_TX/XSk frame, no action required
* @LIBETH_SQE_CTX: context descriptor with empty SQE, no action required
* @LIBETH_SQE_SLAB: kmalloc-allocated buffer, unmap and kfree()
* @LIBETH_SQE_FRAG: mapped skb frag, only unmap DMA
* @LIBETH_SQE_SKB: &sk_buff, unmap and napi_consume_skb(), update stats
* @__LIBETH_SQE_XDP_START: separator between skb and XDP types
* @LIBETH_SQE_XDP_TX: &skb_shared_info, libeth_xdp_return_buff_bulk(), stats
* @LIBETH_SQE_XDP_XMIT: &xdp_frame, unmap and xdp_return_frame_bulk(), stats
* @LIBETH_SQE_XDP_XMIT_FRAG: &xdp_frame frag, only unmap DMA
* @LIBETH_SQE_XSK_TX: &libeth_xdp_buff on XSk queue, xsk_buff_free(), stats
* @LIBETH_SQE_XSK_TX_FRAG: &libeth_xdp_buff frag on XSk queue, xsk_buff_free()
*/
enum libeth_sqe_type {
LIBETH_SQE_EMPTY = 0U,
@ -24,6 +30,13 @@ enum libeth_sqe_type {
LIBETH_SQE_SLAB,
LIBETH_SQE_FRAG,
LIBETH_SQE_SKB,
__LIBETH_SQE_XDP_START,
LIBETH_SQE_XDP_TX = __LIBETH_SQE_XDP_START,
LIBETH_SQE_XDP_XMIT,
LIBETH_SQE_XDP_XMIT_FRAG,
LIBETH_SQE_XSK_TX,
LIBETH_SQE_XSK_TX_FRAG,
};
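One plausible use of the `__LIBETH_SQE_XDP_START` separator is a single range check that tells skb-path entries from XDP/XSk ones. The sketch below mirrors the enum layout under renamed, hypothetical `SQE_*` identifiers (the position of the CTX entry is taken from the kernel-doc order above); the "updates stats" helper simply encodes which types the kernel-doc tags with "stats".

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the enum layout, renamed to stress it is only a model */
enum sqe_type {
	SQE_EMPTY = 0U,
	SQE_CTX,
	SQE_SLAB,
	SQE_FRAG,
	SQE_SKB,

	__SQE_XDP_START,
	SQE_XDP_TX = __SQE_XDP_START,
	SQE_XDP_XMIT,
	SQE_XDP_XMIT_FRAG,
	SQE_XSK_TX,
	SQE_XSK_TX_FRAG,
};

static_assert(SQE_XDP_TX == SQE_SKB + 1,
	      "the separator must not consume a value of its own");

/* Everything at or past the separator belongs to XDP or XSk */
static bool sqe_is_xdp(enum sqe_type type)
{
	return type >= __SQE_XDP_START;
}

/* Per the kernel-doc, only head entries account stats on completion */
static bool sqe_updates_stats(enum sqe_type type)
{
	switch (type) {
	case SQE_SKB:
	case SQE_XDP_TX:
	case SQE_XDP_XMIT:
	case SQE_XSK_TX:
		return true;
	default:
		return false;
	}
}
```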
/**
@ -32,6 +45,9 @@ enum libeth_sqe_type {
* @rs_idx: index of the last buffer from the batch this one was sent in
* @raw: slab buffer to free via kfree()
* @skb: &sk_buff to consume
* @sinfo: skb shared info of an XDP_TX frame
* @xdpf: XDP frame from ::ndo_xdp_xmit()
* @xsk: XSk Rx frame from XDP_TX action
* @dma: DMA address to unmap
* @len: length of the mapped region to unmap
* @nr_frags: number of frags in the frame this buffer belongs to
@ -46,6 +62,9 @@ struct libeth_sqe {
union {
void *raw;
struct sk_buff *skb;
struct skb_shared_info *sinfo;
struct xdp_frame *xdpf;
struct libeth_xdp_buff *xsk;
};
DEFINE_DMA_UNMAP_ADDR(dma);
@ -71,7 +90,10 @@ struct libeth_sqe {
/**
* struct libeth_cq_pp - completion queue poll params
* @dev: &device to perform DMA unmapping
* @bq: XDP frame bulk to combine return operations
* @ss: onstack NAPI stats to fill
* @xss: onstack XDPSQ NAPI stats to fill
* @xdp_tx: number of XDP-not-XSk frames processed
* @napi: whether it's called from the NAPI context
*
* libeth uses this structure to access objects needed for performing full
@ -80,7 +102,13 @@ struct libeth_sqe {
*/
struct libeth_cq_pp {
struct device *dev;
struct libeth_sq_napi_stats *ss;
struct xdp_frame_bulk *bq;
union {
struct libeth_sq_napi_stats *ss;
struct libeth_xdpsq_napi_stats *xss;
};
u32 xdp_tx;
bool napi;
};
@ -126,4 +154,6 @@ static inline void libeth_tx_complete(struct libeth_sqe *sqe,
sqe->type = LIBETH_SQE_EMPTY;
}
void libeth_tx_complete_any(struct libeth_sqe *sqe, struct libeth_cq_pp *cp);
#endif /* __LIBETH_TX_H */


@ -1,10 +1,32 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2024 Intel Corporation */
/* Copyright (C) 2024-2025 Intel Corporation */
#ifndef __LIBETH_TYPES_H
#define __LIBETH_TYPES_H
#include <linux/types.h>
#include <linux/workqueue.h>
/* Stats */
/**
* struct libeth_rq_napi_stats - "hot" counters to update in Rx polling loop
* @packets: received frames counter
* @bytes: sum of bytes of received frames above
* @fragments: sum of fragments of received S/G frames
* @hsplit: number of frames the device performed the header split for
* @raw: alias to access all the fields as an array
*/
struct libeth_rq_napi_stats {
union {
struct {
u32 packets;
u32 bytes;
u32 fragments;
u32 hsplit;
};
DECLARE_FLEX_ARRAY(u32, raw);
};
};
/**
* struct libeth_sq_napi_stats - "hot" counters to update in Tx completion loop
@ -22,4 +44,84 @@ struct libeth_sq_napi_stats {
};
};
/**
* struct libeth_xdpsq_napi_stats - "hot" counters to update in XDP Tx
* completion loop
* @packets: completed frames counter
* @bytes: sum of bytes of completed frames above
* @fragments: sum of fragments of completed S/G frames
* @raw: alias to access all the fields as an array
*/
struct libeth_xdpsq_napi_stats {
union {
struct {
u32 packets;
u32 bytes;
u32 fragments;
};
DECLARE_FLEX_ARRAY(u32, raw);
};
};
/* XDP */
/*
* The following structures should be embedded into driver's queue structure
* and passed to the libeth_xdp helpers, never used directly.
*/
/* XDPSQ sharing */
/**
* struct libeth_xdpsq_lock - locking primitive for sharing XDPSQs
* @lock: spinlock for locking the queue
* @share: whether this particular queue is shared
*/
struct libeth_xdpsq_lock {
spinlock_t lock;
bool share;
};
/* XDPSQ clean-up timers */
/**
* struct libeth_xdpsq_timer - timer for cleaning up XDPSQs w/o interrupts
* @xdpsq: queue this timer belongs to
* @lock: lock for the queue
* @dwork: work performing cleanups
*
* XDPSQs not using interrupts but lazy cleaning, i.e. only when there's no
* space for sending the current queued frame/bulk, must fire up timers to
* make sure there are no stale buffers to free.
*/
struct libeth_xdpsq_timer {
void *xdpsq;
struct libeth_xdpsq_lock *lock;
struct delayed_work dwork;
};
/* Rx polling path */
/**
* struct libeth_xdp_buff_stash - struct for stashing &xdp_buff onto a queue
* @data: pointer to the start of the frame, xdp_buff.data
* @headroom: frame headroom, xdp_buff.data - xdp_buff.data_hard_start
* @len: frame linear space length, xdp_buff.data_end - xdp_buff.data
* @frame_sz: truesize occupied by the frame, xdp_buff.frame_sz
* @flags: xdp_buff.flags
*
* &xdp_buff is 56 bytes long on x64, &libeth_xdp_buff is 64 bytes. This
* structure carries only necessary fields to save/restore a partially built
* frame on the queue structure to finish it during the next NAPI poll.
*/
struct libeth_xdp_buff_stash {
void *data;
u16 headroom;
u16 len;
u32 frame_sz:24;
u32 flags:8;
} __aligned_largest;
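How the stash fields could round-trip a partially built frame can be sketched in userspace C. The save/restore helpers themselves live in the suppressed xdp.h diff, so the two functions below are a guess derived only from the kernel-doc field descriptions above; `model_xdp_buff` is a simplified, hypothetical stand-in for &xdp_buff.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for &xdp_buff: only the fields the stash covers */
struct model_xdp_buff {
	void *data;
	void *data_end;
	void *data_hard_start;
	uint32_t frame_sz;
	uint32_t flags;
};

/* Same field widths as struct libeth_xdp_buff_stash */
struct model_stash {
	void *data;
	uint16_t headroom;
	uint16_t len;
	unsigned int frame_sz:24;
	unsigned int flags:8;
};

/* Store the start pointer plus two 16-bit distances instead of pointers */
static void model_stash_save(struct model_stash *dst,
			     const struct model_xdp_buff *src)
{
	dst->data = src->data;
	dst->headroom = (uint16_t)((char *)src->data -
				   (char *)src->data_hard_start);
	dst->len = (uint16_t)((char *)src->data_end - (char *)src->data);
	dst->frame_sz = src->frame_sz;
	dst->flags = src->flags;
}

/* Rebuild the three pointers from the start pointer and the distances */
static void model_stash_restore(struct model_xdp_buff *dst,
				const struct model_stash *src)
{
	dst->data = src->data;
	dst->data_hard_start = (char *)src->data - src->headroom;
	dst->data_end = (char *)src->data + src->len;
	dst->frame_sz = src->frame_sz;
	dst->flags = src->flags;
}
```

Keeping distances rather than pointers is what lets the stash stay much smaller than the buffer structure it describes.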
#endif /* __LIBETH_TYPES_H */

1879
include/net/libeth/xdp.h Normal file

File diff suppressed because it is too large

685
include/net/libeth/xsk.h Normal file

@ -0,0 +1,685 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2025 Intel Corporation */
#ifndef __LIBETH_XSK_H
#define __LIBETH_XSK_H
#include <net/libeth/xdp.h>
#include <net/xdp_sock_drv.h>
/* ``XDP_TXMD_FLAGS_VALID`` is defined only under ``CONFIG_XDP_SOCKETS`` */
#ifdef XDP_TXMD_FLAGS_VALID
static_assert(XDP_TXMD_FLAGS_VALID <= LIBETH_XDP_TX_XSKMD);
#endif
/* ``XDP_TX`` bulking */
/**
* libeth_xsk_tx_queue_head - internal helper for queueing XSk ``XDP_TX`` head
* @bq: XDP Tx bulk to queue the head frag to
* @xdp: XSk buffer with the head to queue
*
* Return: false if it's the only frag of the frame, true if it's an S/G frame.
*/
static inline bool libeth_xsk_tx_queue_head(struct libeth_xdp_tx_bulk *bq,
struct libeth_xdp_buff *xdp)
{
bq->bulk[bq->count++] = (typeof(*bq->bulk)){
.xsk = xdp,
__libeth_xdp_tx_len(xdp->base.data_end - xdp->data,
LIBETH_XDP_TX_FIRST),
};
if (likely(!xdp_buff_has_frags(&xdp->base)))
return false;
bq->bulk[bq->count - 1].flags |= LIBETH_XDP_TX_MULTI;
return true;
}
/**
* libeth_xsk_tx_queue_frag - internal helper for queueing XSk ``XDP_TX`` frag
* @bq: XDP Tx bulk to queue the frag to
* @frag: XSk frag to queue
*/
static inline void libeth_xsk_tx_queue_frag(struct libeth_xdp_tx_bulk *bq,
struct libeth_xdp_buff *frag)
{
bq->bulk[bq->count++] = (typeof(*bq->bulk)){
.xsk = frag,
__libeth_xdp_tx_len(frag->base.data_end - frag->data),
};
}
/**
* libeth_xsk_tx_queue_bulk - internal helper for queueing XSk ``XDP_TX`` frame
* @bq: XDP Tx bulk to queue the frame to
* @xdp: XSk buffer to queue
* @flush_bulk: driver callback to flush the bulk to the HW queue
*
* Return: true on success, false on flush error.
*/
static __always_inline bool
libeth_xsk_tx_queue_bulk(struct libeth_xdp_tx_bulk *bq,
struct libeth_xdp_buff *xdp,
bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq,
u32 flags))
{
bool ret = true;
if (unlikely(bq->count == LIBETH_XDP_TX_BULK) &&
unlikely(!flush_bulk(bq, LIBETH_XDP_TX_XSK))) {
libeth_xsk_buff_free_slow(xdp);
return false;
}
if (!libeth_xsk_tx_queue_head(bq, xdp))
goto out;
for (const struct libeth_xdp_buff *head = xdp; ; ) {
xdp = container_of(xsk_buff_get_frag(&head->base),
typeof(*xdp), base);
if (!xdp)
break;
if (unlikely(bq->count == LIBETH_XDP_TX_BULK) &&
unlikely(!flush_bulk(bq, LIBETH_XDP_TX_XSK))) {
ret = false;
break;
}
libeth_xsk_tx_queue_frag(bq, xdp);
}
out:
bq->bulk[bq->count - 1].flags |= LIBETH_XDP_TX_LAST;
return ret;
}
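The flag handling of libeth_xsk_tx_queue_bulk() (head marked FIRST, S/G heads additionally MULTI, the final entry marked LAST, flush whenever the bulk is full) can be exercised with a userspace model. This is a sketch with assumptions: the bulk size of 16 comes from the cover letter, the flag bit values and `model_*` names are hypothetical, and the flush here always succeeds, so the error path is not covered.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TX_BULK	16U	/* assumed bulk size */

/* Hypothetical bit values, only the roles match the libeth flags */
enum {
	MODEL_TX_FIRST	= 1U << 0,
	MODEL_TX_MULTI	= 1U << 1,
	MODEL_TX_LAST	= 1U << 2,
};

struct model_bulk {
	uint32_t count;		/* entries currently queued */
	uint32_t flushed;	/* entries handed to the "HW" so far */
	uint32_t flags[MODEL_TX_BULK];
};

/* Always-successful flush: account the entries and empty the bulk */
static void model_flush(struct model_bulk *bq)
{
	bq->flushed += bq->count;
	bq->count = 0;
}

/* Queue one frame consisting of a head and @nr_frags extra fragments */
static void model_queue_frame(struct model_bulk *bq, uint32_t nr_frags)
{
	if (bq->count == MODEL_TX_BULK)
		model_flush(bq);

	bq->flags[bq->count++] = MODEL_TX_FIRST |
				 (nr_frags ? MODEL_TX_MULTI : 0);

	for (uint32_t i = 0; i < nr_frags; i++) {
		if (bq->count == MODEL_TX_BULK)
			model_flush(bq);

		bq->flags[bq->count++] = 0;
	}

	assert(bq->count);
	bq->flags[bq->count - 1] |= MODEL_TX_LAST;
}
```

A single-buffer frame ends up with both FIRST and LAST on the same entry, which matches the `out:` label being reached right after the head is queued.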
/**
* libeth_xsk_tx_fill_buf - internal helper to fill XSk ``XDP_TX`` &libeth_sqe
* @frm: XDP Tx frame from the bulk
* @i: index on the HW queue
* @sq: XDPSQ abstraction for the queue
* @priv: private data
*
* Return: XDP Tx descriptor with the synced DMA and other info to pass to
* the driver callback.
*/
static inline struct libeth_xdp_tx_desc
libeth_xsk_tx_fill_buf(struct libeth_xdp_tx_frame frm, u32 i,
const struct libeth_xdpsq *sq, u64 priv)
{
struct libeth_xdp_buff *xdp = frm.xsk;
struct libeth_xdp_tx_desc desc = {
.addr = xsk_buff_xdp_get_dma(&xdp->base),
.opts = frm.opts,
};
struct libeth_sqe *sqe;
xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len);
sqe = &sq->sqes[i];
sqe->xsk = xdp;
if (!(desc.flags & LIBETH_XDP_TX_FIRST)) {
sqe->type = LIBETH_SQE_XSK_TX_FRAG;
return desc;
}
sqe->type = LIBETH_SQE_XSK_TX;
libeth_xdp_tx_fill_stats(sqe, &desc,
xdp_get_shared_info_from_buff(&xdp->base));
return desc;
}
/**
* libeth_xsk_tx_flush_bulk - wrapper to define flush of XSk ``XDP_TX`` bulk
* @bq: bulk to flush
* @flags: Tx flags, see __libeth_xdp_tx_flush_bulk()
* @prep: driver callback to prepare the queue
* @xmit: driver callback to fill a HW descriptor
*
* Use via LIBETH_XSK_DEFINE_FLUSH_TX() to define an XSk ``XDP_TX`` driver
* callback.
*/
#define libeth_xsk_tx_flush_bulk(bq, flags, prep, xmit) \
__libeth_xdp_tx_flush_bulk(bq, (flags) | LIBETH_XDP_TX_XSK, prep, \
libeth_xsk_tx_fill_buf, xmit)
/* XSk TMO */
/**
* libeth_xsktmo_req_csum - XSk Tx metadata op to request checksum offload
* @csum_start: unused
* @csum_offset: unused
* @priv: &libeth_xdp_tx_desc from the filling helper
*
* Generic implementation of ::tmo_request_checksum. Works only when HW doesn't
* require filling checksum offsets and other parameters besides the checksum
* request bit.
* Consider using within @libeth_xsktmo unless the driver requires HW-specific
* callbacks.
*/
static inline void libeth_xsktmo_req_csum(u16 csum_start, u16 csum_offset,
void *priv)
{
((struct libeth_xdp_tx_desc *)priv)->flags |= LIBETH_XDP_TX_CSUM;
}
/* Only to inline the callbacks below, use @libeth_xsktmo in drivers instead */
static const struct xsk_tx_metadata_ops __libeth_xsktmo = {
.tmo_request_checksum = libeth_xsktmo_req_csum,
};
/**
* __libeth_xsk_xmit_fill_buf_md - internal helper to prepare XSk xmit w/meta
* @xdesc: &xdp_desc from the XSk buffer pool
* @sq: XDPSQ abstraction for the queue
* @priv: XSk Tx metadata ops
*
* Same as __libeth_xsk_xmit_fill_buf(), but requests metadata pointer and
* fills additional fields in &libeth_xdp_tx_desc to ask for metadata offload.
*
* Return: XDP Tx descriptor with the DMA, metadata request bits, and other
* info to pass to the driver callback.
*/
static __always_inline struct libeth_xdp_tx_desc
__libeth_xsk_xmit_fill_buf_md(const struct xdp_desc *xdesc,
const struct libeth_xdpsq *sq,
u64 priv)
{
const struct xsk_tx_metadata_ops *tmo = libeth_xdp_priv_to_ptr(priv);
struct libeth_xdp_tx_desc desc;
struct xdp_desc_ctx ctx;
ctx = xsk_buff_raw_get_ctx(sq->pool, xdesc->addr);
desc = (typeof(desc)){
.addr = ctx.dma,
__libeth_xdp_tx_len(xdesc->len),
};
BUILD_BUG_ON(!__builtin_constant_p(tmo == libeth_xsktmo));
tmo = tmo == libeth_xsktmo ? &__libeth_xsktmo : tmo;
xsk_tx_metadata_request(ctx.meta, tmo, &desc);
return desc;
}
/* XSk xmit implementation */
/**
* __libeth_xsk_xmit_fill_buf - internal helper to prepare XSk xmit w/o meta
* @xdesc: &xdp_desc from the XSk buffer pool
* @sq: XDPSQ abstraction for the queue
*
* Return: XDP Tx descriptor with the DMA and other info to pass to
* the driver callback.
*/
static inline struct libeth_xdp_tx_desc
__libeth_xsk_xmit_fill_buf(const struct xdp_desc *xdesc,
const struct libeth_xdpsq *sq)
{
return (struct libeth_xdp_tx_desc){
.addr = xsk_buff_raw_get_dma(sq->pool, xdesc->addr),
__libeth_xdp_tx_len(xdesc->len),
};
}
/**
* libeth_xsk_xmit_fill_buf - internal helper to prepare an XSk xmit
* @frm: &xdp_desc from the XSk buffer pool
* @i: index on the HW queue
* @sq: XDPSQ abstraction for the queue
* @priv: XSk Tx metadata ops
*
* Depending on the metadata ops presence (determined at compile time), calls
* the quickest helper to build a libeth XDP Tx descriptor.
*
* Return: XDP Tx descriptor with the synced DMA, metadata request bits,
* and other info to pass to the driver callback.
*/
static __always_inline struct libeth_xdp_tx_desc
libeth_xsk_xmit_fill_buf(struct libeth_xdp_tx_frame frm, u32 i,
const struct libeth_xdpsq *sq, u64 priv)
{
struct libeth_xdp_tx_desc desc;
if (priv)
desc = __libeth_xsk_xmit_fill_buf_md(&frm.desc, sq, priv);
else
desc = __libeth_xsk_xmit_fill_buf(&frm.desc, sq);
desc.flags |= xsk_is_eop_desc(&frm.desc) ? LIBETH_XDP_TX_LAST : 0;
xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len);
return desc;
}
/**
* libeth_xsk_xmit_do_bulk - send XSk xmit frames
* @pool: XSk buffer pool containing the frames to send
* @xdpsq: opaque pointer to driver's XDPSQ struct
* @budget: maximum number of frames that can be sent
* @tmo: optional XSk Tx metadata ops
* @prep: driver callback to build a &libeth_xdpsq
* @xmit: driver callback to put frames to a HW queue
* @finalize: driver callback to start a transmission
*
* Implements generic XSk xmit. Always turns on XSk Tx wakeup as it's assumed
* lazy cleaning is used and interrupts are disabled for the queue.
* HW descriptor filling is unrolled by ``LIBETH_XDP_TX_BATCH`` to optimize
* writes.
* Note that unlike other XDP Tx ops, the queue must be locked and cleaned
* prior to calling this function to already know available @budget.
* @prep must only build a &libeth_xdpsq and return ``U32_MAX``.
*
* Return: false if @budget was exhausted, true otherwise.
*/
static __always_inline bool
libeth_xsk_xmit_do_bulk(struct xsk_buff_pool *pool, void *xdpsq, u32 budget,
const struct xsk_tx_metadata_ops *tmo,
u32 (*prep)(void *xdpsq, struct libeth_xdpsq *sq),
void (*xmit)(struct libeth_xdp_tx_desc desc, u32 i,
const struct libeth_xdpsq *sq, u64 priv),
void (*finalize)(void *xdpsq, bool sent, bool flush))
{
const struct libeth_xdp_tx_frame *bulk;
bool wake;
u32 n;
wake = xsk_uses_need_wakeup(pool);
if (wake)
xsk_clear_tx_need_wakeup(pool);
n = xsk_tx_peek_release_desc_batch(pool, budget);
bulk = container_of(&pool->tx_descs[0], typeof(*bulk), desc);
libeth_xdp_tx_xmit_bulk(bulk, xdpsq, n, true,
libeth_xdp_ptr_to_priv(tmo), prep,
libeth_xsk_xmit_fill_buf, xmit);
finalize(xdpsq, n, true);
if (wake)
xsk_set_tx_need_wakeup(pool);
return n < budget;
}
/* Rx polling path */
/**
* libeth_xsk_tx_init_bulk - initialize XDP Tx bulk for an XSk Rx NAPI poll
* @bq: bulk to initialize
* @prog: RCU pointer to the XDP program (never %NULL)
* @dev: target &net_device
* @xdpsqs: array of driver XDPSQ structs
* @num: number of active XDPSQs, the above array length
*
* Should be called on an onstack XDP Tx bulk before the XSk NAPI polling loop.
* Initializes all the needed fields to run libeth_xdp functions.
* Never checks if @prog is %NULL or @num == 0 as XDP must always be enabled
* when hitting this path.
*/
#define libeth_xsk_tx_init_bulk(bq, prog, dev, xdpsqs, num) \
__libeth_xdp_tx_init_bulk(bq, prog, dev, xdpsqs, num, true, \
__UNIQUE_ID(bq_), __UNIQUE_ID(nqs_))
struct libeth_xdp_buff *libeth_xsk_buff_add_frag(struct libeth_xdp_buff *head,
struct libeth_xdp_buff *xdp);
/**
* libeth_xsk_process_buff - attach XSk Rx buffer to &libeth_xdp_buff
* @head: head XSk buffer to attach the XSk buffer to (or %NULL)
* @xdp: XSk buffer to process
* @len: received data length from the descriptor
*
* If @head == %NULL, treats the XSk buffer as head and initializes
* the required fields. Otherwise, attaches the buffer as a frag.
* Already performs DMA sync-for-CPU and frame start prefetch
* (for head buffers only).
*
* Return: head XSk buffer on success or if the descriptor must be skipped
* (empty), %NULL if there is no space for a new frag.
*/
static inline struct libeth_xdp_buff *
libeth_xsk_process_buff(struct libeth_xdp_buff *head,
struct libeth_xdp_buff *xdp, u32 len)
{
if (unlikely(!len)) {
libeth_xsk_buff_free_slow(xdp);
return head;
}
xsk_buff_set_size(&xdp->base, len);
xsk_buff_dma_sync_for_cpu(&xdp->base);
if (head)
return libeth_xsk_buff_add_frag(head, xdp);
prefetch(xdp->data);
return xdp;
}
void libeth_xsk_buff_stats_frags(struct libeth_rq_napi_stats *rs,
const struct libeth_xdp_buff *xdp);
u32 __libeth_xsk_run_prog_slow(struct libeth_xdp_buff *xdp,
const struct libeth_xdp_tx_bulk *bq,
enum xdp_action act, int ret);
/**
* __libeth_xsk_run_prog - run XDP program on XSk buffer
* @xdp: XSk buffer to run the prog on
* @bq: buffer bulk for ``XDP_TX`` queueing
*
* Internal inline abstraction to run XDP program on XSk Rx path. Handles
* only the most common ``XDP_REDIRECT`` inline, the rest is processed
* externally.
* Reports an XDP prog exception on errors.
*
* Return: libeth_xdp prog verdict depending on the prog's verdict.
*/
static __always_inline u32
__libeth_xsk_run_prog(struct libeth_xdp_buff *xdp,
const struct libeth_xdp_tx_bulk *bq)
{
enum xdp_action act;
int ret = 0;
act = bpf_prog_run_xdp(bq->prog, &xdp->base);
if (unlikely(act != XDP_REDIRECT))
rest:
return __libeth_xsk_run_prog_slow(xdp, bq, act, ret);
ret = xdp_do_redirect(bq->dev, &xdp->base, bq->prog);
if (unlikely(ret))
goto rest;
return LIBETH_XDP_REDIRECT;
}
/**
* libeth_xsk_run_prog - run XDP program on XSk path and handle all verdicts
* @xdp: XSk buffer to process
* @bq: XDP Tx bulk to queue ``XDP_TX`` buffers
* @fl: driver ``XDP_TX`` bulk flush callback
*
* Run the attached XDP program and handle all possible verdicts.
* Prefer using it via LIBETH_XSK_DEFINE_RUN{,_PASS,_PROG}().
*
* Return: libeth_xdp prog verdict depending on the prog's verdict.
*/
#define libeth_xsk_run_prog(xdp, bq, fl) \
__libeth_xdp_run_flush(xdp, bq, __libeth_xsk_run_prog, \
libeth_xsk_tx_queue_bulk, fl)
/**
* __libeth_xsk_run_pass - helper to run XDP program and handle the result
* @xdp: XSk buffer to process
* @bq: XDP Tx bulk to queue ``XDP_TX`` frames
* @napi: NAPI to build an skb and pass it up the stack
* @rs: onstack libeth RQ stats
* @md: metadata that should be filled to the XSk buffer
* @prep: callback for filling the metadata
* @run: driver wrapper to run XDP program
* @populate: driver callback to populate an skb with the HW descriptor data
*
* Inline abstraction, XSk's counterpart of __libeth_xdp_run_pass(), see its
* doc for details.
*
* Return: false if the polling loop must be exited due to lack of free
* buffers, true otherwise.
*/
static __always_inline bool
__libeth_xsk_run_pass(struct libeth_xdp_buff *xdp,
struct libeth_xdp_tx_bulk *bq, struct napi_struct *napi,
struct libeth_rq_napi_stats *rs, const void *md,
void (*prep)(struct libeth_xdp_buff *xdp,
const void *md),
u32 (*run)(struct libeth_xdp_buff *xdp,
struct libeth_xdp_tx_bulk *bq),
bool (*populate)(struct sk_buff *skb,
const struct libeth_xdp_buff *xdp,
struct libeth_rq_napi_stats *rs))
{
struct sk_buff *skb;
u32 act;
rs->bytes += xdp->base.data_end - xdp->data;
rs->packets++;
if (unlikely(xdp_buff_has_frags(&xdp->base)))
libeth_xsk_buff_stats_frags(rs, xdp);
if (prep && (!__builtin_constant_p(!!md) || md))
prep(xdp, md);
act = run(xdp, bq);
if (likely(act == LIBETH_XDP_REDIRECT))
return true;
if (act != LIBETH_XDP_PASS)
return act != LIBETH_XDP_ABORTED;
skb = xdp_build_skb_from_zc(&xdp->base);
if (unlikely(!skb)) {
libeth_xsk_buff_free_slow(xdp);
return true;
}
if (unlikely(!populate(skb, xdp, rs))) {
napi_consume_skb(skb, true);
return true;
}
napi_gro_receive(napi, skb);
return true;
}
/**
* libeth_xsk_run_pass - helper to run XDP program and handle the result
* @xdp: XSk buffer to process
* @bq: XDP Tx bulk to queue ``XDP_TX`` frames
* @napi: NAPI to build an skb and pass it up the stack
* @rs: onstack libeth RQ stats
* @desc: pointer to the HW descriptor for that frame
* @run: driver wrapper to run XDP program
* @populate: driver callback to populate an skb with the HW descriptor data
*
* Wrapper around the underscored version when "fill the descriptor metadata"
* means just writing the pointer to the HW descriptor as @xdp->desc.
*/
#define libeth_xsk_run_pass(xdp, bq, napi, rs, desc, run, populate) \
__libeth_xsk_run_pass(xdp, bq, napi, rs, desc, libeth_xdp_prep_desc, \
run, populate)
/**
* libeth_xsk_finalize_rx - finalize XDPSQ after an XSk NAPI polling loop
* @bq: ``XDP_TX`` frame bulk
* @flush: driver callback to flush the bulk
* @finalize: driver callback to start sending the frames and run the timer
*
* Flush the bulk if there are frames left to send, kick the queue and flush
* the XDP maps.
*/
#define libeth_xsk_finalize_rx(bq, flush, finalize) \
__libeth_xdp_finalize_rx(bq, LIBETH_XDP_TX_XSK, flush, finalize)
/*
* Helpers to reduce boilerplate code in drivers.
*
* Typical driver XSk Rx flow would be (excl. bulk and buff init, frag attach):
*
* LIBETH_XDP_DEFINE_START();
* LIBETH_XSK_DEFINE_FLUSH_TX(static driver_xsk_flush_tx, driver_xsk_tx_prep,
* driver_xdp_xmit);
* LIBETH_XSK_DEFINE_RUN(static driver_xsk_run, driver_xsk_run_prog,
* driver_xsk_flush_tx, driver_populate_skb);
* LIBETH_XSK_DEFINE_FINALIZE(static driver_xsk_finalize_rx,
* driver_xsk_flush_tx, driver_xdp_finalize_sq);
* LIBETH_XDP_DEFINE_END();
*
* This will build a set of 4 static functions. The compiler is free to decide
* whether to inline them.
* Then, in the NAPI polling function:
*
* while (packets < budget) {
* // ...
* if (!driver_xsk_run(xdp, &bq, napi, &rs, desc))
* break;
* }
* driver_xsk_finalize_rx(&bq);
*/
/**
* LIBETH_XSK_DEFINE_FLUSH_TX - define a driver XSk ``XDP_TX`` flush function
* @name: name of the function to define
* @prep: driver callback to clean an XDPSQ
* @xmit: driver callback to write a HW Tx descriptor
*/
#define LIBETH_XSK_DEFINE_FLUSH_TX(name, prep, xmit) \
__LIBETH_XDP_DEFINE_FLUSH_TX(name, prep, xmit, xsk)
/**
* LIBETH_XSK_DEFINE_RUN_PROG - define a driver XDP program run function
* @name: name of the function to define
* @flush: driver callback to flush an XSk ``XDP_TX`` bulk
*/
#define LIBETH_XSK_DEFINE_RUN_PROG(name, flush) \
u32 __LIBETH_XDP_DEFINE_RUN_PROG(name, flush, xsk)
/**
* LIBETH_XSK_DEFINE_RUN_PASS - define a driver buffer process + pass function
* @name: name of the function to define
* @run: driver callback to run XDP program (above)
* @populate: driver callback to fill an skb with HW descriptor info
*/
#define LIBETH_XSK_DEFINE_RUN_PASS(name, run, populate) \
bool __LIBETH_XDP_DEFINE_RUN_PASS(name, run, populate, xsk)
/**
* LIBETH_XSK_DEFINE_RUN - define a driver buffer process, run + pass function
* @name: name of the function to define
* @run: name of the XDP prog run function to define
* @flush: driver callback to flush an XSk ``XDP_TX`` bulk
* @populate: driver callback to fill an skb with HW descriptor info
*/
#define LIBETH_XSK_DEFINE_RUN(name, run, flush, populate) \
__LIBETH_XDP_DEFINE_RUN(name, run, flush, populate, XSK)
/**
* LIBETH_XSK_DEFINE_FINALIZE - define a driver XSk NAPI poll finalize function
* @name: name of the function to define
* @flush: driver callback to flush an XSk ``XDP_TX`` bulk
* @finalize: driver callback to finalize an XDPSQ and run the timer
*/
#define LIBETH_XSK_DEFINE_FINALIZE(name, flush, finalize) \
__LIBETH_XDP_DEFINE_FINALIZE(name, flush, finalize, xsk)
/* Refilling */
/**
* struct libeth_xskfq - structure representing an XSk buffer (fill) queue
* @fp: hotpath part of the structure
* @pool: &xsk_buff_pool for buffer management
* @fqes: array of XSk buffer pointers
* @descs: opaque pointer to the HW descriptor array
* @ntu: index of the next buffer to poll
* @count: number of descriptors/buffers the queue has
* @pending: current number of XSkFQEs to refill
* @thresh: threshold below which the queue is refilled
* @buf_len: HW-writeable length per each buffer
* @nid: ID of the closest NUMA node with memory
*/
struct libeth_xskfq {
struct_group_tagged(libeth_xskfq_fp, fp,
struct xsk_buff_pool *pool;
struct libeth_xdp_buff **fqes;
void *descs;
u32 ntu;
u32 count;
);
/* Cold fields */
u32 pending;
u32 thresh;
u32 buf_len;
int nid;
};
int libeth_xskfq_create(struct libeth_xskfq *fq);
void libeth_xskfq_destroy(struct libeth_xskfq *fq);
/**
* libeth_xsk_buff_xdp_get_dma - get DMA address of XSk &libeth_xdp_buff
* @xdp: buffer to get the DMA addr for
*/
#define libeth_xsk_buff_xdp_get_dma(xdp) \
xsk_buff_xdp_get_dma(&(xdp)->base)
/**
* libeth_xskfqe_alloc - allocate @n XSk Rx buffers
* @fq: hotpath part of the XSkFQ, usually onstack
* @n: number of buffers to allocate
* @fill: driver callback to write DMA addresses to HW descriptors
*
* Note that @fq->ntu gets updated, but ::pending must be recalculated
* by the caller.
*
* Return: number of buffers refilled.
*/
static __always_inline u32
libeth_xskfqe_alloc(struct libeth_xskfq_fp *fq, u32 n,
void (*fill)(const struct libeth_xskfq_fp *fq, u32 i))
{
u32 this, ret, done = 0;
struct xdp_buff **xskb;
this = fq->count - fq->ntu;
if (likely(this > n))
this = n;
again:
xskb = (typeof(xskb))&fq->fqes[fq->ntu];
ret = xsk_buff_alloc_batch(fq->pool, xskb, this);
for (u32 i = 0, ntu = fq->ntu; likely(i < ret); i++)
fill(fq, ntu + i);
done += ret;
fq->ntu += ret;
if (likely(fq->ntu < fq->count) || unlikely(ret < this))
goto out;
fq->ntu = 0;
if (this < n) {
this = n - this;
goto again;
}
out:
return done;
}
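The wrap-around handling in libeth_xskfqe_alloc() (fill up to the end of the ring, reset the index, then fill the remainder from slot 0, stopping early if the pool runs dry) can be reproduced with a userspace model. It is a sketch: `model_fq`, the fake allocator and its `avail` counter are hypothetical, and the driver `fill` callback is reduced to marking the slot.

```c
#include <assert.h>
#include <stdint.h>

struct model_fq {
	uint32_t *slots;	/* stands in for the FQE array */
	uint32_t ntu;		/* next slot to use */
	uint32_t count;		/* ring size */
	uint32_t avail;		/* buffers the fake pool can still hand out */
};

/* Fake batch allocator: hands out at most @fq->avail buffers */
static uint32_t model_alloc_batch(struct model_fq *fq, uint32_t *dst,
				  uint32_t n)
{
	uint32_t got = n < fq->avail ? n : fq->avail;

	for (uint32_t i = 0; i < got; i++)
		dst[i] = 1;

	fq->avail -= got;

	return got;
}

/* Refill up to @n slots starting at @fq->ntu, wrapping at most once */
static uint32_t model_refill(struct model_fq *fq, uint32_t n)
{
	uint32_t this, ret, done = 0;

	assert(fq->ntu < fq->count && n <= fq->count);

	/* First pass is limited by the distance to the end of the ring */
	this = fq->count - fq->ntu;
	if (this > n)
		this = n;

again:
	ret = model_alloc_batch(fq, &fq->slots[fq->ntu], this);
	done += ret;
	fq->ntu += ret;

	/* Stop if the ring end was not reached or the pool ran short */
	if (fq->ntu < fq->count || ret < this)
		return done;

	fq->ntu = 0;

	/* Second pass covers whatever did not fit before the wrap */
	if (this < n) {
		this = n - this;
		goto again;
	}

	return done;
}
```

As in the original, the caller still has to recompute its own pending counter from the returned value.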
/* .ndo_xsk_wakeup */
void libeth_xsk_init_wakeup(call_single_data_t *csd, struct napi_struct *napi);
void libeth_xsk_wakeup(call_single_data_t *csd, u32 qid);
/* Pool setup */
int libeth_xsk_setup_pool(struct net_device *dev, u32 qid, bool enable);
#endif /* __LIBETH_XSK_H */