Convert fbnic from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's __fbnic_set_rx_mode() now takes explicit uc/mc list
parameters and uses __hw_addr_sync_dev() on the snapshots instead
of __dev_uc_sync/__dev_mc_sync on the netdev directly.
Update callers in fbnic_up, fbnic_fw_config_after_crash,
fbnic_bmc_rpc_check and fbnic_set_mac to pass the real address
lists calling __fbnic_set_rx_mode outside the async work path.
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: kernel-team@meta.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-6-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Increasing the MTU beyond the HDS threshold causes the hardware to
fragment packets across multiple buffers. If a single-buffer XDP program
is attached, the driver will drop all multi-frag frames. While we can't
prevent a remote sender from sending non-TCP packets larger than the MTU,
this will prevent users from inadvertently breaking new TCP streams.
Traditionally, drivers supported XDP with MTU less than 4Kb
(packet per page). Fbnic currently prevents attaching XDP when MTU is too high.
But it does not prevent increasing MTU after XDP is attached.
Fixes: 1b0a3950db ("eth: fbnic: Add XDP pass, drop, abort support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we have exposed the PCS registers via the SWMII we can now start looking
at connecting the XPCS driver to those registers and let it mange the PCS
instead of us doing it directly from the fbnic driver.
For now this just gets us the ability to detect link. The hope is in the
future to add some of the vendor specific registers to begin enabling XPCS
configuration of the interface.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/176374325295.959489.14521115864034905277.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
One complication with the design of our part is that the PMD doesn't
provide a direct signal to the host. Instead we have visibility to signals
that the PCS provides to the MAC that allow us to check the link state
through that.
We will need to account for several things in the PMD and firmware when
managing the link. Specifically when the link first starts to come up the
PMD will cause the link to flap. This is due to the firmware starting a
training cycle when the link is first detected. This will cause link
flapping if we were to immediately report link up when the PCS first
detects it.
To address that we are adding a pmd_state variable that is meant to be a
countdown of sorts indicating the state of the PMD. If the link is down or
has been reconfigured the PMD will start out in the initialize state. By
default the link is assumed to be in the SEND_DATA state if it is available
on initial link inspection. If link is detected while in the initialize
state the PMD state will switch to training, and if after 4 seconds the
link is still stable we will transition to link_ready, and finally the
send_data state. With this we can avoid link flapping when a cable is
first connected to the NIC.
One side effect of this is that we need to pull the link state away from
the PCS. For now we use a union of the PCS link state register value and
the pmd_state. The plan is to add a PMD register to report the pmd_state
to the phylink interface. With that we can then look at switching over to
the use of the XPCS driver for fbnic instead of having an internal one.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/176374323107.959489.14951134213387615059.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Throughout several spots in the code I had called out the IRQ as being
related to the PCS. However the actual IRQ is a part of the MAC and it is
just exposing PCS data. To more accurately reflect the owner of the calls
this change makes it so that we rename the functions and values that are
taking in the interrupt value and processing it to reflect that it is a MAC
call and not a PCS one.
This change is mostly motivated by the fact that we will be moving the
handling of this interrupt from being PCS focused to being more PMA/PMD
focused as this will drive the phydev driver that I am adding instead of
driving the PCS directly.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/176374322373.959489.12018231545479053860.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Rx processing under normal circumstances has 3 rings - 2 buffer
rings (heads, payloads) and a completion ring. All the rings
have a struct fbnic_ring. Make sure we expose alloc_failed
counter from the buffer rings, previously only the alloc_failed
from the completion ring was reported, even tho all ring types
may increment this counter (buffer rings in __fbnic_fill_bdq()).
This makes the pp_alloc_fail.py test pass, it expects the qstat
to be incrementing as page pool injections happen.
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 67dc4eb5fc ("eth: fbnic: report software Rx queue stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-7-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Support devmem Tx. We already use skb_frag_dma_map(), we just need
to make sure we don't try to unmap the frags. Check if frag is
unreadable and mark the ring entry.
# ./tools/testing/selftests/drivers/net/hw/devmem.py
TAP version 13
1..3
ok 1 devmem.check_rx
ok 2 devmem.check_tx
ok 3 devmem.check_tx_chunks
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Acked-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250916145401.1464550-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Support queue ops. fbnic doesn't shut down the entire device
just to restart a single queue.
./tools/testing/selftests/drivers/net/hw/iou-zcrx.py
TAP version 13
1..3
ok 1 iou-zcrx.test_zcrx
ok 2 iou-zcrx.test_zcrx_oneshot
ok 3 iou-zcrx.test_zcrx_rss
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Acked-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-15-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We'll add queue ops soon so. queue ops will opt the driver into
extra locking. Request this locking explicitly already to make
future patches smaller and easier to review.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250901211214.1027927-6-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cross-merge networking fixes after downstream PR (net-6.17-rc4).
No conflicts.
Adjacent changes:
drivers/net/ethernet/intel/idpf/idpf_txrx.c
02614eee26 ("idpf: do not linearize big TSO packets")
6c4e684802 ("idpf: remove obsolete stashing code")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To make the __fbnic_set_rx_mode and __fbnic_clear_rx_mode calls usable by
more points in the code we can make to that they expect a fbnic_dev pointer
instead of a netdev pointer.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623749436.2246365.6068665520216196789.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In order for us to support the BMC possibly connecting, disconnecting, and
then reconnecting we need to be able to support entities outside of just
the NIC setting up promiscuous mode as the BMC can use a multicast
promiscuous setup.
To support that we should move the promisc_sync code out of the netdev and
into the RPC section of the driver so that it is reachable from more paths.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/175623748769.2246365.2130394904175851458.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The fbnic driver was presenting with the following locking assert coming
out of a PM resume:
[ 42.208116][ T164] RTNL: assertion failed at drivers/net/phy/phylink.c (2611)
[ 42.208492][ T164] WARNING: CPU: 1 PID: 164 at drivers/net/phy/phylink.c:2611 phylink_resume+0x190/0x1e0
[ 42.208872][ T164] Modules linked in:
[ 42.209140][ T164] CPU: 1 UID: 0 PID: 164 Comm: bash Not tainted 6.17.0-rc2-virtme #134 PREEMPT(full)
[ 42.209496][ T164] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
[ 42.209861][ T164] RIP: 0010:phylink_resume+0x190/0x1e0
[ 42.210057][ T164] Code: 83 e5 01 0f 85 b0 fe ff ff c6 05 1c cd 3e 02 01 90 ba 33 0a 00 00 48 c7 c6 20 3a 1d a5 48 c7 c7 e0 3e 1d a5 e8 21 b8 90 fe 90 <0f> 0b 90 90 e9 86 fe ff ff e8 42 ea 1f ff e9 e2 fe ff ff 48 89 ef
[ 42.210708][ T164] RSP: 0018:ffffc90000affbd8 EFLAGS: 00010296
[ 42.210983][ T164] RAX: 0000000000000000 RBX: ffff8880078d8400 RCX: 0000000000000000
[ 42.211235][ T164] RDX: 0000000000000000 RSI: 1ffffffff4f10938 RDI: 0000000000000001
[ 42.211466][ T164] RBP: 0000000000000000 R08: ffffffffa2ae79ea R09: fffffbfff4b3eb84
[ 42.211707][ T164] R10: 0000000000000003 R11: 0000000000000000 R12: ffff888007ad8000
[ 42.211997][ T164] R13: 0000000000000002 R14: ffff888006a18800 R15: ffffffffa34c59e0
[ 42.212234][ T164] FS: 00007f0dc8e39740(0000) GS:ffff88808f51f000(0000) knlGS:0000000000000000
[ 42.212505][ T164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.212704][ T164] CR2: 00007f0dc8e9fe10 CR3: 000000000b56d003 CR4: 0000000000772ef0
[ 42.213227][ T164] PKRU: 55555554
[ 42.213366][ T164] Call Trace:
[ 42.213483][ T164] <TASK>
[ 42.213565][ T164] __fbnic_pm_attach.isra.0+0x8e/0xa0
[ 42.213725][ T164] pci_reset_function+0x116/0x1d0
[ 42.213895][ T164] reset_store+0xa0/0x100
[ 42.214025][ T164] ? pci_dev_reset_attr_is_visible+0x50/0x50
[ 42.214221][ T164] ? sysfs_file_kobj+0xc1/0x1e0
[ 42.214374][ T164] ? sysfs_kf_write+0x65/0x160
[ 42.214526][ T164] kernfs_fop_write_iter+0x2f8/0x4c0
[ 42.214677][ T164] ? kernfs_vma_page_mkwrite+0x1f0/0x1f0
[ 42.214836][ T164] new_sync_write+0x308/0x6f0
[ 42.214987][ T164] ? __lock_acquire+0x34c/0x740
[ 42.215135][ T164] ? new_sync_read+0x6f0/0x6f0
[ 42.215288][ T164] ? lock_acquire.part.0+0xbc/0x260
[ 42.215440][ T164] ? ksys_write+0xff/0x200
[ 42.215590][ T164] ? perf_trace_sched_switch+0x6d0/0x6d0
[ 42.215742][ T164] vfs_write+0x65e/0xbb0
[ 42.215876][ T164] ksys_write+0xff/0x200
[ 42.215994][ T164] ? __ia32_sys_read+0xc0/0xc0
[ 42.216141][ T164] ? do_user_addr_fault+0x269/0x9f0
[ 42.216292][ T164] ? rcu_is_watching+0x15/0xd0
[ 42.216442][ T164] do_syscall_64+0xbb/0x360
[ 42.216591][ T164] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 42.216784][ T164] RIP: 0033:0x7f0dc8ea9986
A bit of digging showed that we were invoking the phylink_resume as a part
of the fbnic_up path when we were enabling the service task while not
holding the RTNL lock. We should be enabling this sooner as a part of the
ndo_open path and then just letting the service task come online later.
This will help to enforce the correct locking and brings the phylink
interface online at the same time as the network interface, instead of at a
later time.
I tested this on QEMU to verify this was working by putting the system to
sleep using "echo mem > /sys/power/state" to put the system to sleep in the
guest and then using the command "system_wakeup" in the QEMU monitor.
Fixes: 69684376ee ("eth: fbnic: Add link detection")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/175616257316.1963577.12238158800417771119.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move hw_stats_lock out of fbnic_dev to a more appropriate struct
fbnic_hw_stats since the only use of this lock is to protect access to
the hardware stats. While at it, enclose the lock and stats
initialization in a single init call.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for XDP statistics collection and reporting via rtnl_link
and netdev_queue API.
For XDP programs without frags support, fbnic requires MTU to be less
than the HDS threshold. If an over-sized frame is received, the frame
is dropped and recorded as rx_length_errors reported via ip stats to
highlight that this is an error.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-9-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add basic support for attaching an XDP program to the device and support
for PASS/DROP/ABORT actions. In fbnic, buffers are always mapped as
DMA_BIDIRECTIONAL.
The BPF program pointer can be read either on a per-packet basis or on a
per-NAPI poll basis. Both approaches are functionally equivalent, in the
current code. Stick to per-packet as it limits number of arguments we need
to pass around.
On the XDP hot path, check that packets with fragments are only allowed
when multi-buffer support is enabled for the XDP program. Ideally, this
check should not be necessary because ndo_bpf verifies that for XDP
programs without multi-buff support, MTU is less than the hds_thresh.
However, the MTU currently does not enforce the receive size which would
require cleaning up the data path and bouncing the link. For practical
reasons, prioritize the ability to enter and exit BPF mode with different
MTU sizes without requiring a full reconfig.
Testing:
Hook a simple XDP program that passes all the packets destined for a
specific port
iperf3 -c 192.168.1.10 -P 5 -p 12345
Connecting to host 192.168.1.10, port 12345
[ 5] local 192.168.1.9 port 46702 connected to 192.168.1.10 port 12345
[ ID] Interval Transfer Bitrate Retr Cwnd
- - - - - - - - - - - - - - - - - - - - - - - - -
[SUM] 1.00-2.00 sec 3.86 GBytes 33.2 Gbits/sec 0
XDP_DROP:
Hook an XDP program that drops packets destined for a specific port
iperf3 -c 192.168.1.10 -P 5 -p 12345
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[SUM] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec 0 sender
[SUM] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: interrupt - the client has terminated
XDP with HDS:
- Validate XDP attachment failure when HDS is low
~] ethtool -G eth0 hds-thresh 512
~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp
~] Error: fbnic: MTU too high, or HDS threshold is too low for single
buffer XDP.
- Validate successful XDP attachment when HDS threshold is appropriate
~] ethtool -G eth0 hds-thresh 1536
~] sudo ip link set eth0 xdpdrv obj xdp_pass_12345.o sec xdp
- Validate when the XDP program is attached, changing HDS thresh to a
lower value fails
~] ethtool -G eth0 hds-thresh 512
~] netlink error: fbnic: Use higher HDS threshold or multi-buf capable
program
- Validate HDS thresh does not matter when xdp frags support is
available
~] ethtool -G eth0 hds-thresh 512
~] sudo ip link set eth0 xdpdrv obj xdp_pass_mb_12345.o sec xdp.frags
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add support for configuring the header data split threshold.
For fbnic, the tcp data split support is enabled all the time.
Fbnic supports a maximum buffer size of 4KB. However, the reservation
for the headroom, tailroom, and padding reduce the max header size
accordingly.
ethtool_hds -g eth0
Ring parameters for eth0:
Pre-set maximums:
...
HDS thresh: 3584
Current hardware settings:
...
HDS thresh: 1536
Verify hds tests in ksft-net-drv are passing
ksft-net-drv]# ./drivers/net/hds.py
TAP version 13
1..13
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by ...
...
...
ok 12 hds.ioctl_set_xdp
ok 13 hds.ioctl_enabled_set_xdp
\# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:1 error:0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-2-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wrap copying of drop stats on TX path from fbd->hw_stats by the
hw_stats_lock. Currently, it is being performed outside the lock and
another thread accessing fbd->hw_stats can lead to inconsistencies.
Fixes: 5f8bd2ce82 ("eth: fbnic: add support for TMI stats")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250802024636.679317-3-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correctly copy the tx_dropped stats from the fbd->hw_stats to the
rtnl_link_stats64 struct.
Fixes: 5f8bd2ce82 ("eth: fbnic: add support for TMI stats")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250802024636.679317-2-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CI hit a UaF in fbnic in the AF_XDP portion of the queues.py test.
The UaF is in the __sk_mark_napi_id_once() call in xsk_bind(),
NAPI has been freed. Looks like the device failed to open earlier,
and we lack clearing the NAPI pointer from the queue.
Fixes: 557d02238e ("eth: fbnic: centralize the queue count and NAPI<>queue setting")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250728163129.117360-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There were several issues in the way we were handling the link info coming
from firmware.
First is the fact that we were carrying around "AUTO" flags to indicate
that we needed to populate the values. We can just drop this and assume
that we will always be populating the settings from firmware. With this we
can also clean up the masking as the "AUTO" flags were just there to be
stripped anyway.
Second since we are getting rid of the "AUTO" setting we still need a way
to report that the link is not configured. We convert the link_mode "AUTO"
to "UNKNOWN" to do this. With this we can avoid reporting link up in the
phylink_pcs_get_state call as we will just set link to 0 and return without
updating the link speed. This is preferred versus the driver just forcing
50G which makes it harder to recover when the FW does start providing valid
settings.
With this the plan is to eventually replace the link_mode we use with the
interface_mode from phylink for all intents and purposes and have the two
be interchangeable. At that point we can convert the FW provided settings
over to something closer to link partner settings and give phylink greater
control of the interface allowing for user override of the settings and an
asynchronous setup of the link versus having to pull early settings from
firmware.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/175028445548.625704.1367708155813490215.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We have two issues that need to be addressed in our IRQ handling.
One is the fact that we can end up double-freeing IRQs in the event of an
exception handling error such as a PCIe reset/recovery that fails. To
prevent that from becoming an issue we can use the msix_vector values to
indicate that we have successfully requested/freed the IRQ by only setting
or clearing them when we have completed the given action.
The other issue is that we have several potential races in our IRQ path due
to us manipulating the mask before the vector has been truly disabled. In
order to handle that in the case of the FW mailbox we need to not
auto-enable the IRQ and instead will be enabling/disabling it separately.
In the case of the PCS vector we can mitigate this by unmapping it and
synchronizing the IRQ before we clear the mask.
The general order of operations after this change is now to request the
interrupt, poll the FW mailbox to ready, and then enable the interrupt. For
the shutdown we do the reverse where we disable the interrupt, flush any
pending Tx, and then free the IRQ. I am renaming the enable/disable to
request/free to be equivilent with the IRQ calls being used. We may see
additions in the future to enable/disable the IRQs versus request/free them
for certain use cases.
Fixes: da3cde0820 ("eth: fbnic: Add FW communication mechanism")
Fixes: 69684376ee ("eth: fbnic: Add link detection")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654719271.499179.3634535105127848325.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix the tracking of rtnl_link_stats.tx_dropped. The counter
`tmi.drop.frames` is being double counted whereas, the counter
`tti.cm_drop.frames` is being skipped.
Fixes: f2957147ae ("eth: fbnic: add support for TTI HW stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250503020145.1868252-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are
tracking packets and control message drops because of credit exhaustion
on the TX interface.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch add coverage for TMI stats including PTP stats and drop
stats.
PTP stats include illegal requests, bad timestamp and good timestamps.
The bad timestamp and illegal request counters are reported under as
`error` via `ethtool -T` Both these counters are individually being
reported via `ethtool -S`
The good timestamp stats are being reported as `pkts` via `ethtool -T`
ethtool -S eth0 | grep "ptp"
ptp_illegal_req: 0
ptp_good_ts: 0
ptp_bad_ts: 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch provides coverage to the RXB (RX Buffer) stats. RXB stats
are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue
stats.
The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the
input/output counters whereas, the RXB fifo stats are indexed from 0-7.
The RXB also supports pause frame stats counters which we are leaving
for a later patch.
ethtool -S eth0 | grep rxb
rxb_integrity_err0: 0
rxb_mac_err0: 0
rxb_parser_err0: 0
rxb_frm_err0: 0
rxb_drbo0_frames: 1433543
rxb_drbo0_bytes: 775949081
---
---
rxb_intf3_frames: 1195711
rxb_intf3_bytes: 739650210
rxb_pbuf3_frames: 1195711
rxb_pbuf3_bytes: 765948092
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch provides support for hardware queue stats and covers
packet errors for RX-DMA engine, RCQ drops and BDQ drops.
The packet errors are also aggregated with the `rx_errors` stats in the
`rtnl_link_stats` as well as with the `hw_drops` in the queue API.
The RCQ and BDQ drops are aggregated with `rx_over_errors` in the
`rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API.
ethtool -S eth0 | grep -E 'rde'
rde_0_pkt_err: 0
rde_0_pkt_cq_drop: 0
rde_0_pkt_bdq_drop: 0
---
---
rde_127_pkt_err: 0
rde_127_pkt_cq_drop: 0
rde_127_pkt_bdq_drop: 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add ethtool support to configure the IRQ coalescing behavior. Support
separate timers for Rx and Tx for time based coalescing. For frame based
configuration, currently we only support the Rx side.
The hardware allows configuration of descriptor count instead of frame
count requiring conversion between the two. We assume 2 descriptors
per frame, one for the metadata and one for the data segment.
When rx-frames are not configured, we set the RX descriptor count to
half the ring size as a fail safe.
Default configuration:
ethtool -c eth0 | grep -E "rx-usecs:|tx-usecs:|rx-frames:"
rx-usecs: 30
rx-frames: 0
tx-usecs: 35
IRQ rate test:
With single iperf flow we monitor IRQ rate while changing the tx-usesc and
rx-usecs to high and low values.
ethtool -C eth0 rx-frames 8192 rx-usecs 150 tx-usecs 150
irq/sec 13k
irq/sec 14k
irq/sec 14k
ethtool -C eth0 rx-frames 8192 rx-usecs 10 tx-usecs 10
irq/sec 27k
irq/sec 28k
irq/sec 28k
Validating the use of extack:
ethtool -C eth0 rx-frames 16384
netlink error: fbnic: rx_frames is above device max
netlink error: Invalid argument
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250218023520.2038010-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add TSO support to the driver. Device can handle unencapsulated or
IPv6-in-IPv6 packets. Any other tunnel stacks are handled with
GSO partial.
Validate that the packet can be offloaded in ndo_features_check.
Main thing we need to check for is that the header geometry can
be expressed in the decriptor fields (offsets aren't too large).
Report number of TSO super-packets via the qstat API.
Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ethtool -n / -N support. Support only "un-ordered" rule sets
(RX_CLS_LOC_ANY), just for simplicity of the code. It's unclear
anyone actually cares about the rule ordering.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20250206235334.1425329-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I realized when we were adding unicast addresses we were enabling
promiscuous mode. I did a bit of digging and realized we had overlooked
setting the driver private flag to indicate we supported unicast filtering.
Example below shows the table with 00deadbeef01 as the main NIC address,
and 5 additional addresses in the 00deadbeefX0 format.
# cat $dbgfs/mac_addr
Idx S TCAM Bitmap Addr/Mask
----------------------------------
00 0 00000000,00000000 000000000000
000000000000
01 0 00000000,00000000 000000000000
000000000000
02 0 00000000,00000000 000000000000
000000000000
...
24 0 00000000,00000000 000000000000
000000000000
25 1 00100000,00000000 00deadbeef50
000000000000
26 1 00100000,00000000 00deadbeef40
000000000000
27 1 00100000,00000000 00deadbeef30
000000000000
28 1 00100000,00000000 00deadbeef20
000000000000
29 1 00100000,00000000 00deadbeef10
000000000000
30 1 00100000,00000000 00deadbeef01
000000000000
31 0 00000000,00000000 000000000000
000000000000
Before rule 31 would be active. With this change it correctly sticks
to just the unicast filters.
Signed-off-by: Alexander Duyck <alexanderduyck@meta.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204010038.1404268-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To simplify dealing with RTNL_ASSERT() requirements further
down the line, move setting queue count and NAPI<>queue
association to their own helpers.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change our method of swapping NAPIs without disturbing existing config.
This is primarily needed for "live reconfiguration" such as changing
the channel count when interface is already up.
Previously we were planning to use a trick of using shared interrupts.
We would install a second IRQ handler for the new NAPI, and make it
return IRQ_NONE until we were ready for it to take over. This works fine
functionally but breaks IRQ naming. The IRQ subsystem uses the IRQ name
to create the procfs entry, since both handlers used the same name
the second handler wouldn't get a proc directory registered.
When first one gets removed on success full ring count change
it would remove its directory and we would be left with none.
New approach uses a double pointer to the NAPI. The IRQ handler needs
to know how to locate the NAPI to schedule. We register a single IRQ handler
and give it a pointer to a pointer. We can then change what it points to
without re-registering. This may have a tiny perf impact, but really
really negligible.
Link: https://patch.msgid.link/20241220025241.1522781-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We will need an array for storing NAPIs in the upcoming IRQ handler
reuse rework. Replace the current list we have, so that we are able
to reuse it later.
In a few places replace i as the iterator with t when we iterate
over triads, this seems slightly less confusing than having
i, j, k variables.
Link: https://patch.msgid.link/20241220025241.1522781-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to redirect host-to-BMC traffic by writing MACDA entries
from the RPC (RX Parser and Classifier) to TCE-TCAM. The TCE TCAM is a
small L2 destination TCAM which is placed at the end of the TX path (TCE).
Unlike other NICs, where BMC diversion is typically handled by firmware,
for fbnic, firmware does not touch anything related to the host; hence,
the host uses TCE TCAM to divert BMC traffic.
Currently, we lack metadata to track where addresses have been written
in the TCAM, except for the last entry written. To address this issue,
we start at the opposite end of the table in each pass, so that adding
or deleting entries does not affect the availability of all entries,
assuming there is no significant reordering of entries.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20241104031300.1330657-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add callbacks to support timestamping configuration via ethtool.
Add processing of RX timestamps.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Create PHC device and provide callbacks needed for ptp_clock device.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add ethtool ops support and enable 'get_drvinfo' for fbnic. The driver
provides firmware version information while the driver name and bus
information is provided by ethtool_get_drvinfo().
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement netdev_stat_ops and export the basic per-queue stats.
This interface expect users to set the values that are used
either to zero or to some other preserved value (they are 0xff by
default). So here we export bytes/packets/drops from tx and rx_stats
plus set some of the values that are exposed by queue stats
to zero.
$ cd tools/testing/selftests/drivers/net && ./stats.py
[...]
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20240810054322.2766421-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Count packets, bytes and drop on the datapath, and report
to the user. Since queues are completely freed when the
device is down - accumulate the stats in the main netdev struct.
This means that per-queue stats will only report values since
last reset (per qstat recommendation).
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RSS is controlled by the Rx filter tables. Program rules matching
on appropriate traffic types and set hashing fields using actions.
We need a separate set of rules for broadcast and multicast
because the action there needs to include forwarding to BMC.
This patch only initializes the default settings, the control
of the configuration using ethtool will come soon.
With this the necessary rules are put in place to enable Rx of packets by
the host.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079943591.1778861.17778587068185893750.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Program the Rx TCAM to control L2 forwarding. Since we are in full
control of the NIC we need to make sure we include BMC forwarding
in the rules. When host is not present BMC will program the TCAM
to get onto the network but once we take ownership it's up to
Linux driver to make sure BMC L2 addresses are handled correctly.
Co-developed-by: Sanman Pradhan <sanmanpradhan@meta.com>
Signed-off-by: Sanman Pradhan <sanmanpradhan@meta.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079943202.1778861.4410412697614789017.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Handle Rx packets with basic csum and Rx hash offloads.
NIC writes back to the completion ring a head buffer descriptor
(data buffer allocated from header pages), variable number of payload
descriptors (data buffers in payload pages), an optional metadata
descriptor (type 2) and finally the primary metadata descriptor
(type 3).
This format makes scatter support fairly easy - start gathering
the pages when we see head page, gather until we see the primary
metadata descriptor, do the processing. Use XDP infra to collect
the packet fragments as we traverse the descriptors. XDP itself
is not supported yet, but it will be soon.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942839.1778861.10509071985738726125.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Handle Tx of simple packets. Support checksum offload and gather.
Use .ndo_features_check to make sure packet geometry will be
supported by the HW, i.e. we can fit the header lengths into
the descriptor fields.
The device writes to the completion rings the position of the tail
(consumer) pointer. Read all those writebacks, obviously the last
one will be the most recent, complete skbs up to that point.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942464.1778861.17919428039428796180.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>