Commit Graph

1266966 Commits

Author SHA1 Message Date
Shannon Nelson
56e41ee12d ionic: better dma-map error handling
Fix up a couple of small dma_addr handling issues
  - don't double-count dma-map-err stat in ionic_tx_map_skb()
    or ionic_xdp_post_frame()
  - return 0 on error from both ionic_tx_map_single() and
    ionic_tx_map_frag() and check for !dma_addr in ionic_tx_map_skb()
    and ionic_xdp_post_frame()
  - be sure to unmap buf_info[0] in ionic_tx_map_skb() error path
  - don't assign rx buf->dma_addr until error checked in ionic_rx_page_alloc()
  - remove unnecessary dma_addr_t casts
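
The zero-on-error convention from the list above can be sketched in user-space C. This is a minimal model and not the driver's code: `fake_dma_map()`, `struct tx_stats`, `tx_map_single_model()` and `tx_map_buf()` are hypothetical stand-ins. Only the mapping helper bumps the error counter, so the caller cannot double-count it, and the destination is assigned only after the error check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_model_t;   /* hypothetical stand-in for dma_addr_t */

struct tx_stats {
	unsigned int dma_map_err;    /* counted once, inside the map helper */
};

/* Hypothetical mapping primitive: pretend that NULL buffers fail to map. */
static int fake_dma_map(const void *buf, dma_addr_model_t *out)
{
	if (!buf)
		return -1;
	*out = (dma_addr_model_t)(uintptr_t)buf;
	return 0;
}

/* Returns 0 on error, so callers only need to test for !dma_addr. */
static dma_addr_model_t tx_map_single_model(struct tx_stats *stats,
					    const void *buf)
{
	dma_addr_model_t addr;

	if (fake_dma_map(buf, &addr)) {
		stats->dma_map_err++;
		return 0;
	}
	return addr;
}

/* Caller: check for 0 and leave the error counter alone. */
static int tx_map_buf(struct tx_stats *stats, const void *buf,
		      dma_addr_model_t *dst)
{
	dma_addr_model_t addr = tx_map_single_model(stats, buf);

	if (!addr)
		return -1;
	*dst = addr;   /* assign only after the error check */
	return 0;
}
```

With this shape a failed mapping leaves `*dst` untouched and increments `dma_map_err` exactly once.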

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
a12c1e7a64 ionic: remove unnecessary NULL test
We call ionic_rx_page_alloc() only on existing buf_info structs from
ionic_rx_fill().  There's no need for the additional NULL test.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
4554341dd0 ionic: rearrange ionic_queue for better layout
A simple change to the struct ionic_queue layout removes some
unnecessary padding and saves us a cacheline in the struct
ionic_qcq layout.

    struct ionic_queue {
	Before: /* size: 256, cachelines: 4, members: 29 */
	After:  /* size: 192, cachelines: 3, members: 29 */

    struct ionic_qcq {
	Before: /* size: 2112, cachelines: 33, members: 23 */
	After:  /* size: 2048, cachelines: 32, members: 23 */
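
How reordering members removes padding can be shown with a small user-space sketch. The two structs below are hypothetical and unrelated to the real struct ionic_queue members; they only illustrate that the same fields can occupy fewer bytes when grouped by alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members interleaved between large ones. */
struct queue_padded {
	uint8_t  type;       /* followed by alignment padding */
	uint64_t base_pa;
	uint16_t head_idx;   /* followed by alignment padding */
	uint64_t features;
	uint16_t tail_idx;   /* followed by tail padding */
};

/* Same members, largest alignment first, small members grouped together. */
struct queue_packed {
	uint64_t base_pa;
	uint64_t features;
	uint16_t head_idx;
	uint16_t tail_idx;
	uint8_t  type;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t bytes_saved(void)
{
	return sizeof(struct queue_padded) - sizeof(struct queue_packed);
}
```

On a typical LP64 ABI the padded variant takes 40 bytes and the reordered one 24; a tool such as pahole reports the same kind of before/after figures quoted above.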

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
453538c52f ionic: rearrange ionic_qcq
Rearrange a few fields for better cache use and to put the
flags field up into the first cacheline rather than the last.

    struct ionic_qcq
	Before: /* size: 2176, cachelines: 34, members: 23 */
	After:  /* size: 2112, cachelines: 33, members: 23 */

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
0165892477 ionic: carry idev in ionic_cq struct
Remove the idev field from ionic_queue, which saves us a
bit of space, and add it into ionic_cq where there's room
within some cacheline padding.  Use this pointer rather
than doing a multi level reference from lif->ionic.

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
36a47c906b ionic: refactor skb building
The existing ionic_rx_frags() code is a bit of a mess and can
be cleaned up by unrolling the first frag/header setup from
the loop, then reworking the do-while-loop into a for-loop.  We
rename the function to a more descriptive ionic_rx_build_skb().
We also change a couple of related variable names for readability.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
8599bd4cf3 ionic: fold adminq clean into service routine
Since the AdminQ clean is a simple action called from only
one place, fold it back into the service routine.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
4dcd4575bf ionic: use specialized desc info structs
Make desc_info structure specific to the queue type, which
allows us to cut down the Rx and AdminQ descriptor sizes by
not including all the fields needed for the Tx descriptors.

Before:
    struct ionic_desc_info {
	/* size: 464, cachelines: 8, members: 6 */

After:
    struct ionic_tx_desc_info {
	/* size: 464, cachelines: 8, members: 6 */
    struct ionic_rx_desc_info {
	/* size: 224, cachelines: 4, members: 2 */
    struct ionic_admin_desc_info {
	/* size: 8, cachelines: 1, members: 1 */

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
65e548f6b0 ionic: remove the cq_info to save more memory
With a little simple math we don't need another struct array to
find the completion structs, so we can remove the ionic_cq_info
altogether.  This doesn't really save anything in the ionic_cq
since it gets padded out to the cacheline, but it does remove
the parallel array allocation of 8 * num_descriptors, or about
8 Kbytes per queue in a default configuration.
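
The "simple math" can be sketched as plain pointer arithmetic on the completion ring. The names below (`struct cq_model`, `cq_entry()`, `info_array_bytes()`) are hypothetical, and the 1024-descriptor ring used in the note after the code is only the count implied by the 8 Kbytes figure, not a value taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_INFO_BYTES 8   /* per-entry cost quoted in the commit message */

/* Hypothetical completion queue: one flat buffer of fixed-size entries. */
struct cq_model {
	void        *base;        /* start of the completion ring */
	size_t       desc_size;   /* bytes per completion entry */
	unsigned int num_descs;   /* ring length, assumed to be a power of two */
};

/* Locate completion 'index' by arithmetic instead of a per-entry pointer. */
static void *cq_entry(const struct cq_model *cq, unsigned int index)
{
	index &= cq->num_descs - 1;   /* wrap around the ring */
	return (uint8_t *)cq->base + (size_t)index * cq->desc_size;
}

/* Size of the parallel info array that is no longer allocated. */
static size_t info_array_bytes(unsigned int num_descs)
{
	return (size_t)num_descs * CQ_INFO_BYTES;
}
```

For a ring of 1024 entries, `info_array_bytes(1024)` is 8 * 1024 = 8192 bytes, which matches the "about 8 Kbytes per queue" estimate.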

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
ae24a8f88b ionic: remove callback pointer from desc_info
By reworking the queue service routines to have their own
servicing loops we can remove the cb pointer from desc_info
to save another 8 bytes per descriptor.

This simplifies some of the queue handling indirection and makes
the code a little easier to follow, and keeps service code in
one place rather than jumping between code files.

   struct ionic_desc_info
	Before:  /* size: 472, cachelines: 8, members: 7 */
	After:   /* size: 464, cachelines: 8, members: 6 */

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
05c9447395 ionic: move adminq-notifyq handling to main file
Move the AdminQ and NotifyQ queue handling to ionic_main.c with
the rest of the adminq code.

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:34 +00:00
Shannon Nelson
90c01ede6d ionic: drop q mapping
Now that we're not using desc_info pointers mapped in every q
we can simplify and drop the unnecessary utility functions.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:33 +00:00
Shannon Nelson
d60984d39f ionic: remove desc, sg_desc and cmb_desc from desc_info
Remove the struct pointers from desc_info to use less space.
Instead of pointers in every desc_info to its descriptor,
we can use the queue descriptor index to find the individual
desc, desc_info, and sgl structs in their parallel arrays.

   struct ionic_desc_info
	Before:  /* size: 496, cachelines: 8, members: 10 */
	After:   /* size: 472, cachelines: 8, members: 7 */
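
The index-based lookup can be sketched as follows. All type and function names here are hypothetical models rather than the driver's definitions; the point is only that one index selects the matching element in each parallel array, so desc_info no longer needs to store three pointers (which would account for the 24 bytes between 496 and 472 if each pointer is 8 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element types; the real driver structs are much larger. */
struct desc_model      { uint64_t addr; uint16_t len; };
struct sg_desc_model   { uint64_t addr[2]; };
struct desc_info_model { unsigned int nbufs; };   /* no desc/sg pointers */

/* One queue owns three arrays that share the same index space. */
struct queue_model {
	struct desc_model      *descs;
	struct sg_desc_model   *sgl;
	struct desc_info_model *info;
	unsigned int            num_descs;
};

/* Each accessor derives the element from the queue and the index alone. */
static struct desc_model *q_desc(const struct queue_model *q, unsigned int idx)
{
	return &q->descs[idx];
}

static struct sg_desc_model *q_sgl(const struct queue_model *q,
				   unsigned int idx)
{
	return &q->sgl[idx];
}

static struct desc_info_model *q_info(const struct queue_model *q,
				      unsigned int idx)
{
	return &q->info[idx];
}
```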

Suggested-by: Neel Patel <npatel2@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:54:33 +00:00
David S. Miller
e3eec34977 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-03-06 (iavf, i40e, ixgbe)

This series contains updates to iavf, i40e, and ixgbe drivers.

Alexey Kodanev removes duplicate calls related to cloud filters on iavf
and unnecessary null checks on i40e.

Maciej adds helper functions for common code relating to updating
statistics for ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:43:21 +00:00
Jakub Kicinski
7221fbe84f Add Jeff Kirsher to .get_maintainer.ignore
Jeff was retired as the Intel driver maintainer in
commit 6667df916f ("MAINTAINERS: Update MAINTAINERS for
Intel ethernet drivers"), and his address bounces.
But he has signed-off a lot of patches over the years
so get_maintainer insists on CCing him.

We haven't heard from him since he left Intel, so remapping
the address via mailmap is also pointless. Add to ignored
addresses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:36:54 +00:00
Barry Song
77292bb8ca crypto: scomp - remove memcpy if sg_nents is 1 and pages are lowmem
When sg_nents is 1, which is always true in the current kernel because
the only user, zswap, works this way, we have a chance to remove the
memcpy and thus improve performance.
Even when sg_nents is 1, the buffer might cross two pages. If those
pages are highmem, we have no cheap way to map them to a contiguous
virtual address, because kmap doesn't support more than one page
(kmapping even a single highmem page can still be expensive for the
TLB) and vmap is expensive.
So we also test and ensure the page is not highmem in order to safely
use page_to_virt before removing the memcpy. The good news is
that in the vast majority of cases we are lowmem, and we are
always lowmem on modern and popular hardware.

Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:25 +08:00
Vladis Dronov
43a7885ec0 crypto: tcrypt - add ffdhe2048(dh) test
Commit 7dce598197 ("crypto: dh - implement ffdheXYZ(dh) templates")
implemented the said templates. Add ffdhe2048(dh) test as it is the
fastest one. This is a requirement for the FIPS certification.

Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:25 +08:00
Barry Song
30dd94dba3 crypto: iaa - fix the missing CRYPTO_ALG_ASYNC in cra_flags
Add the missing CRYPTO_ALG_ASYNC flag since intel iaa driver
works asynchronously.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:25 +08:00
Barry Song
db8ac88385 crypto: hisilicon/zip - fix the missing CRYPTO_ALG_ASYNC in cra_flags
Add the missing CRYPTO_ALG_ASYNC flag since hisilicon zip driver
works asynchronously.

Cc: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:25 +08:00
Martin Kaiser
12e37aef7b hwrng: hisi - use dev_err_probe
Replace dev_err + return with dev_err_probe.

Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:24 +08:00
Tudor Ambarus
bc9ce934c4 MAINTAINERS: Remove T Ambarus from few mchp entries
I have been no longer at Microchip for more than a year and I'm no
longer interested in maintaining these drivers. Let other mchp people
step up, thus remove myself. Thanks for the nice collaboration everyone!

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-03-08 19:23:24 +08:00
Jiri Pirko
b446631f35 dpll: fix dpll_xa_ref_*_del() for multiple registrations
Currently, if there are multiple registrations of the same pin on the
same dpll device, the following warnings are observed:
WARNING: CPU: 5 PID: 2212 at drivers/dpll/dpll_core.c:143 dpll_xa_ref_pin_del.isra.0+0x21e/0x230
WARNING: CPU: 5 PID: 2212 at drivers/dpll/dpll_core.c:223 __dpll_pin_unregister+0x2b3/0x2c0

The problem is that in both dpll_xa_ref_dpll_del() and
dpll_xa_ref_pin_del() the registration is only removed from the list when
the reference count drops to zero. That is wrong: the registration must
always be removed.

To fix this, remove the registration from the list and free
it unconditionally, instead of doing it only when the ref reference
counter reaches zero.
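
The corrected ordering can be modelled in user space. This is a simplified sketch with hypothetical names (`struct ref_model`, `ref_add()`, `ref_del()`, `ref_reg_count()`) and a plain singly linked list in place of the kernel's list and xarray; it only shows that the matching registration is unlinked and freed on every delete, while the reference count decides separately whether anything else is torn down.

```c
#include <assert.h>
#include <stdlib.h>

struct reg_model {
	struct reg_model *next;
	const void       *ops;
	void             *priv;
};

struct ref_model {
	unsigned int      refcount;
	struct reg_model *regs;    /* one entry per registration */
};

/* Record one more registration of the same pin/device pair. */
static int ref_add(struct ref_model *ref, const void *ops, void *priv)
{
	struct reg_model *reg = malloc(sizeof(*reg));

	if (!reg)
		return -1;
	reg->ops = ops;
	reg->priv = priv;
	reg->next = ref->regs;
	ref->regs = reg;
	ref->refcount++;
	return 0;
}

/* Unlink and free the matching registration unconditionally. */
static int ref_del(struct ref_model *ref, const void *ops, void *priv)
{
	struct reg_model **pp;

	for (pp = &ref->regs; *pp; pp = &(*pp)->next) {
		struct reg_model *reg = *pp;

		if (reg->ops == ops && reg->priv == priv) {
			*pp = reg->next;
			free(reg);
			ref->refcount--;   /* the ref itself goes away only at 0 */
			return 0;
		}
	}
	return -1;   /* no such registration */
}

/* Number of registrations still on the list. */
static unsigned int ref_reg_count(const struct ref_model *ref)
{
	const struct reg_model *reg;
	unsigned int n = 0;

	for (reg = ref->regs; reg; reg = reg->next)
		n++;
	return n;
}
```

After two registrations and one delete, the list holds exactly one entry and the count is 1, so the list is guaranteed to be empty by the time the count reaches zero.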

Fixes: 9431063ad3 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:17:30 +00:00
David S. Miller
570c86ed60 Merge branch 'ipv6-lockless-dump-addrs'
Eric Dumazet says:

====================
ipv6: lockless inet6_dump_addr()

This series removes RTNL locking to dump ipv6 addresses.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
155549a668 ipv6: remove RTNL protection from inet6_dump_addr()
We can now remove RTNL acquisition while running
inet6_dump_addr(), inet6_dump_ifmcaddr()
and inet6_dump_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
9cc4cc329d ipv6: use xa_array iterator to implement inet6_dump_addr()
inet6_dump_addr() can use the new xa_array iterator
for better scalability.

Make it ready for RCU-only protection.
RTNL use is removed in the following patch.

Also properly return 0 at the end of a dump to avoid
an extra recvmsg() to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
46f5182dd7 ipv6: make in6_dump_addrs() lockless
in6_dump_addrs() is called with RCU protection.

There is no need to hold idev->lock to iterate through unicast addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
Eric Dumazet
f0a7da7020 ipv6: make inet6_fill_ifaddr() lockless
Make inet6_fill_ifaddr() lockless, and add appropriate annotations
on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto
and ifa->rt_priority.

Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr()
and inet6_fill_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
Michal Simek
f010990046 dt-bindings: rtc: zynqmp: Add support for Versal/Versal NET SoCs
Add support for Versal and Versal NET SoCs. Both should use the same IP
core, but the integration may differ, which is why separate compatible
strings are created.

Also describe the optional power-domains property. It is optional because
the power domain doesn't need to be owned by non-secure firmware, in which
case there is no access to control it via any driver.

Signed-off-by: Michal Simek <michal.simek@amd.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/5ecd775e6083f86aa744c4e9dfb7f6a13082c78a.1709804617.git.michal.simek@amd.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08 12:06:04 +01:00
Ricardo B. Marliere
6b6ca09611 rtc: class: make rtc_class constant
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the rtc_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Link: https://lore.kernel.org/r/20240305-class_cleanup-abelloni-v1-1-944c026137c8@marliere.net
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08 12:05:10 +01:00
Laurent Pinchart
32a6be0858 dt-bindings: rtc: abx80x: Improve checks on trickle charger constraints
The abracon,tc-diode and abracon,tc-resistor DT properties are only
valid for the ABx0804 and ABx0805. Furthermore, they must both be
present, or neither of them must be specified. Add rules to check this.

The generic abracon,abx08x compatible string doesn't indicate which chip
variant is used, but performs auto-detection at runtime. It must thus
also allow the two above properties.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240305080944.17991-1-laurent.pinchart@ideasonboard.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08 12:03:18 +01:00
Lukas Bulwahn
1e60ac6b8b MAINTAINERS: adjust file entry in ARM/Mediatek RTC DRIVER
Commit e8c0498505 ("dt-bindings: rtc: convert MT2717 RTC to the
json-schema") and commit aef3952ec1 ("dt-bindings: rtc: convert MT7622
RTC to the json-schema") convert rtc-mt{2712,7622}.txt to
mediatek,mt{2712,7622}-rtc.yaml, but fail to adjust the file entries in
MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.

Repair these file entries in ARM/Mediatek RTC DRIVER.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20240301145907.32732-1-lukas.bulwahn@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08 12:02:49 +01:00
Alexandre Belloni
babfeb9cbe rtc: nct3018y: fix possible NULL dereference
alarm_enable and alarm_flag are allowed to be NULL but will be dereferenced
later by the dev_dbg call.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202305180042.DEzW1pSd-lkp@intel.com/
Link: https://lore.kernel.org/r/20240229222127.1878176-1-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08 12:01:21 +01:00
David S. Miller
3dbf6d67f2 ipsec-next-2024-03-06
Merge tag 'ipsec-next-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Introduce forwarding of ICMP Error messages. That is specified
   in RFC 4301 but was never implemented. From Antony Antony.

2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init()
   and xfrm_policy_init(). From Kunwu Chan.

3) Do not allocate stats in the xfrm interface driver, this can be done
   on net core now. From Breno Leitao.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:56:05 +00:00
David S. Miller
7cf497e5a1 Merge branch 'nexthop-group-stats'
Petr Machata says:

====================
Support for nexthop group statistics

ECMP is a fundamental component in L3 designs. However, it's fragile. Many
factors influence whether an ECMP group will operate as intended: hash
policy (i.e. the set of fields that contribute to ECMP hash calculation),
neighbor validity, hash seed (which might lead to polarization) or the type
of ECMP group used (hash-threshold or resilient).

At the same time, collection of statistics that would help an operator
determine that the group performs as desired, is difficult.

A solution that we present in this patchset is to add counters to next hop
group entries. For SW-datapath deployments, this will on its own allow
collection and evaluation of relevant statistics. For HW-datapath
deployments, we further add a way to request that HW counters be installed
for a given group, in-kernel interfaces to collect the HW statistics, and
netlink interfaces to query them.

For example:

    # ip nexthop replace id 4000 group 4001/4002 hw_stats on

    # ip -s -d nexthop show id 4000
    id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
      stats:
        id 4001 packets 5002 packets_hw 5000
        id 4002 packets 4999 packets_hw 4999

The point of the patchset is visibility of ECMP balance, and that is
influenced by packet headers, not their payload. Correspondingly, we only
include packet counters in the statistics, not byte counters.

We also decided to model HW statistics as a nexthop group attribute, not an
arbitrary nexthop one. The latter would count any traffic going through a
given nexthop, regardless of which ECMP group it is in, or any at all. The
reason is again that the point of the patchset is ECMP balance visibility,
not arbitrary inspection of how busy a particular nexthop is.
Implementation of individual-nexthop statistics is certainly possible, and
could well follow the general approach we are taking in this patchset.
For resilient groups, per-bucket statistics could be done in a similar
manner as well.

This patchset contains the core code. mlxsw support will be sent in a
follow-up patch set.

This patchset progresses as follows:

- Patches #1 and #2 add support for a new next-hop object attribute,
  NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in
  particular whether SW- and HW-collected nexthop stats should be part of
  the get or dump response. The idea is to avoid wasting message space, and
  time for collection of HW statistics, when the values are not needed.

- Patches #3 and #4 add SW-datapath stats and corresponding UAPI.

- Patches #5, #6 and #7 add support for HW-datapath stats and UAPI.
  Individual drivers still need to contribute the appropriate HW-specific
  support code.

v4:
- Patch #2:
    - s/nla_get_bitfield32/nla_get_u32/ in __nh_valid_dump_req().

v3:
- Patch #3:
    - Convert to u64_stats_t
- Patch #4:
    - Give a symbolic name to the set of all valid dump flags
      for the NHA_OP_FLAGS attribute.
    - Convert to u64_stats_t
- Patch #6:
    - Use a named constant for the NHA_HW_STATS_ENABLE policy.

v2:
- Patch #2:
    - Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
- Patch #3:
    - Set err on nexthop_create_group() error path
- Patch #4:
    - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
    - Rename jump target in nla_put_nh_group_stats() to avoid
      having to rename further in the patchset.
- Patch #7:
    - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
    - Do not cancel outside of nesting in nla_put_nh_group_stats()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:48 +00:00
Ido Schimmel
5072ae00ae net: nexthop: Expose nexthop group HW stats to user space
Add netlink support for reading NH group hardware stats.

Stats collection is done through a new notifier,
NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for
a given NH group are thereby asked to collect the stats and report back to
core by calling nh_grp_hw_stats_report_delta(). This is similar to what
netdevice L3 stats do.

Besides exposing number of packets that passed in the HW datapath, also
include information on whether any driver actually realizes the counters.
The core can tell based on whether it got any _report_delta() reports from
the drivers. This allows enabling the statistics at the group at any time,
with drivers opting into supporting them. This is also in line with what
netdevice L3 stats are doing.

So as not to waste time and space, tie the collection and reporting of HW
stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS.

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
746c19a52e net: nexthop: Add ability to enable / disable hardware statistics
Add netlink support for enabling collection of HW statistics on nexthop
groups.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
5877786fcf net: nexthop: Add hardware statistics notifications
Add hw_stats field to several notifier structures to communicate to the
drivers that HW statistics should be configured for nexthops within a given
group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
95fedd7685 net: nexthop: Expose nexthop group stats to user space
Add netlink support for reading NH group stats.

This data is only for statistics of the traffic in the SW datapath. HW
nexthop group statistics will be added in the following patches.

Emission of the stats is keyed to a new op_stats flag to avoid cluttering
the netlink message with stats if the user doesn't need them:
NHA_OP_FLAG_DUMP_STATS.

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
f4676ea74b net: nexthop: Add nexthop group entry stats
Add nexthop group entry stats to count the number of packets forwarded
via each nexthop in the group. The stats will be exposed to user space
for better data path observability in the next patch.

The per-CPU stats pointer is placed at the beginning of 'struct
nh_grp_entry', so that all the fields accessed for the data path reside
on the same cache line:

struct nh_grp_entry {
        struct nexthop *           nh;                   /*     0     8 */
        struct nh_grp_entry_stats * stats;               /*     8     8 */
        u8                         weight;               /*    16     1 */

        /* XXX 7 bytes hole, try to pack */

        union {
                struct {
                        atomic_t   upper_bound;          /*    24     4 */
                } hthr;                                  /*    24     4 */
                struct {
                        struct list_head uw_nh_entry;    /*    24    16 */
                        u16        count_buckets;        /*    40     2 */
                        u16        wants_buckets;        /*    42     2 */
                } res;                                   /*    24    24 */
        };                                               /*    24    24 */
        struct list_head           nh_list;              /*    48    16 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct nexthop *           nh_parent;            /*    64     8 */

        /* size: 72, cachelines: 2, members: 6 */
        /* sum members: 65, holes: 1, sum holes: 7 */
        /* last cacheline: 8 bytes */
};
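
The same layout property can be checked at build or run time with offsetof(). The struct below is a user-space model with hypothetical stand-in types (`list_head_model`, an `int` in place of atomic_t), not the kernel definition, and the 64-byte cache line is assumed to match the boundary shown in the pahole output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64   /* assumed cache line size */

/* User-space stand-ins for kernel types, used for layout only. */
struct list_head_model { struct list_head_model *next, *prev; };
struct nexthop_model;
struct nh_grp_entry_stats_model;

struct nh_grp_entry_model {
	struct nexthop_model            *nh;
	struct nh_grp_entry_stats_model *stats;
	uint8_t                          weight;
	union {
		struct {
			int upper_bound;          /* models atomic_t */
		} hthr;
		struct {
			struct list_head_model uw_nh_entry;
			uint16_t count_buckets;
			uint16_t wants_buckets;
		} res;
	};
	struct list_head_model           nh_list;
	struct nexthop_model            *nh_parent;
};

/* True when a member at 'offset' of 'size' bytes lies in the first line. */
static int in_first_cacheline(size_t offset, size_t size)
{
	return offset + size <= CACHELINE_BYTES;
}
```

Checking the leading members (nh, stats, weight) with `in_first_cacheline(offsetof(...), sizeof(...))` confirms that placing the stats pointer second keeps it next to nh on the same line.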

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Petr Machata
a207eab103 net: nexthop: Add NHA_OP_FLAGS
In order to add per-nexthop statistics, but still not increase netlink
message size for consumers that do not care about them, there needs to be a
toggle through which the user indicates their desire to get the statistics.
To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to
use the attribute for carrying of arbitrary operation-specific flags, i.e.
not make it specific for get / dump.

Add the new attribute to get and dump policies, but do not actually allow
any flags yet -- those will come later as the flags themselves are defined.
Add the necessary parsing code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Petr Machata
2118f9390d net: nexthop: Adjust netlink policy parsing for a new attribute
A following patch will introduce a new attribute, op-specific flags to
adjust the behavior of an operation. Different operations will recognize
different flags.

- To make the differentiation possible, stop sharing the policies for get
  and del operations.

- To allow querying for presence of the attribute, have all the attribute
  arrays sized to NHA_MAX, regardless of what is permitted by policy, and
  pass the corresponding value to nlmsg_parse() as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Sai Krishna
3b43f19d06 octeontx2-pf: Add TC flower offload support for TCP flags
This patch adds TC offload support for matching TCP flags
from TCP header.

Example usage:
tc qdisc add dev eth0 ingress

TC rule to drop the TCP SYN packets:
tc filter add dev eth0 ingress protocol ip flower ip_proto tcp tcp_flags
0x02/0x3f skip_sw action drop
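
The value/mask form `0x02/0x3f` can be read as a generic masked comparison. The helper below is a sketch of that semantic in plain C, with a hypothetical function name; it does not describe how the hardware or the driver evaluates the rule.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_ACK 0x10
#define TCP_FLAG_ECE 0x40

/* Value/mask match: only the bits set in 'mask' are compared. */
static int tcp_flags_match(uint16_t pkt_flags, uint16_t value, uint16_t mask)
{
	return (pkt_flags & mask) == (value & mask);
}
```

With value 0x02 and mask 0x3f, a pure SYN matches, a SYN+ACK (0x12) does not, and bits outside the mask such as ECE (0x40) are ignored.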

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:26:52 +00:00
fuyuanli
caabd859c4 tcp: Add skb addr and sock addr to arguments of tracepoint tcp_probe.
It is useful to expose skb addr and sock addr to user in tracepoint
tcp_probe, so that we can get more information while monitoring
receiving of tcp data, by ebpf or other ways.

For example, we need to identify a packet by seq and end_seq when
calculating transmit latency between layer 2 and layer 4 with eBPF, but
they are not available in tcp_probe, so we can only use a kprobe hooking
tcp_rcv_established to get them. But we can use tcp_probe directly if skb
addr and sock addr are available, which is more efficient.

Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:25:47 +00:00
Jakub Kicinski
6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in a given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise, every time NAPI has not polled for at least
stall_thrs, we check whether any Tx packets were queued
between the last NAPI run and now - stall_thrs/2.
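
The check described above can be modelled in userspace roughly as follows.
This is a minimal sketch, not the BQL code: the struct, its field names and
the two helpers are hypothetical, time is an abstract counter, and the stall
length is assumed to be the time since the previous poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; names are illustrative, not BQL's. */
struct stall_detector {
	uint64_t stall_thrs;   /* stall threshold, in arbitrary time units */
	uint64_t last_poll;    /* time of the previous completion (NAPI) run */
	uint64_t first_queued; /* first Tx enqueue since last_poll */
	bool pending;          /* true if anything was queued since last_poll */
	uint64_t stall_cnt;    /* number of detected stalls */
	uint64_t stall_max;    /* longest detected stall */
};

/* Called when a packet is queued for Tx. */
void sd_queued(struct stall_detector *sd, uint64_t now)
{
	if (!sd->pending) {
		sd->first_queued = now;
		sd->pending = true;
	}
}

/* Called from the completion path, i.e. when NAPI finally polls. */
void sd_completed(struct stall_detector *sd, uint64_t now)
{
	uint64_t idle;

	assert(now >= sd->last_poll); /* time must not go backwards */
	idle = now - sd->last_poll;

	/*
	 * A stall is counted only if NAPI was idle for at least stall_thrs
	 * and some packet was queued no later than now - stall_thrs/2.
	 */
	if (idle >= sd->stall_thrs && sd->pending &&
	    sd->first_queued <= now - sd->stall_thrs / 2) {
		sd->stall_cnt++;
		if (idle > sd->stall_max)
			sd->stall_max = idle;
	}

	sd->last_poll = now;
	sd->pending = false;
}
```

Tracking only the first enqueue since the last poll is enough for this model:
if any packet was queued inside the window, the earliest one was too.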

Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted); those who like
their stall metrics to be 0 may prefer a higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
Colin Ian King
9b78bbef51 net: chelsio: remove unused function calc_tx_descs
The inlined helper function calc_tx_descs is not used and is redundant.
Remove it.

Cleans up clang scan build warning:
drivers/net/ethernet/chelsio/cxgb4/sge.c:814:28: warning: unused
function 'calc_tx_descs' [-Wunused-function]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:19:35 +00:00
Kévin L'hôpital
4469c0c5b1 net: phy: fix phy_get_internal_delay accessing an empty array
The phy_get_internal_delay function could try to access an empty
array when the driver calls phy_get_internal_delay without defining
delay_values and rx-internal-delay-ps or tx-internal-delay-ps is set
to 0 in the device tree.
This leads to "unable to handle kernel NULL pointer dereference at
virtual address 0". To avoid this kernel oops, the test should be delay
>= 0. As there is already a delay < 0 test just before, the test can
simply be size == 0.
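
A simplified userspace sketch of the guard order described above. The
function name pick_delay_index and the closest-match lookup are assumptions
made for illustration; this is not the kernel helper itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the lookup: returns a negative errno, the raw
 * delay when there is no table, or the index of the closest table entry.
 * delay_values is assumed to be sorted in ascending order.
 */
int pick_delay_index(const int *delay_values, size_t size, int delay)
{
	size_t i, best = 0;

	/* Errors are passed through unchanged. */
	if (delay < 0)
		return delay;

	/* No table at all: return the raw value, including 0, untouched. */
	if (size == 0)
		return delay;

	/* From here on the table is known to be non-empty. */
	assert(delay_values != NULL);

	if (delay < delay_values[0] || delay > delay_values[size - 1])
		return -EINVAL;

	for (i = 1; i < size; i++)
		if (abs(delay - delay_values[i]) < abs(delay - delay_values[best]))
			best = i;

	return (int)best;
}
```

With the size == 0 test alone, a delay of 0 and no table returns early
instead of reaching delay_values[0].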

Fixes: 92252eec91 ("net: phy: Add a helper to return the index for of the internal delay")
Co-developed-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
Signed-off-by: Kévin L'hôpital <kevin.lhopital@savoirfairelinux.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:18:33 +00:00
Sunil Goutham
fc1b2901e0 octeontx2-af: Fix devlink params
The devlink param for adjusting the NPC MCAM high zone
area is in the wrong param list and is not getting
activated on CN10KA silicon.
This patch fixes the issue.

Fixes: dd78428786 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:17:07 +00:00
Eric Dumazet
b0ec2abf98 net: ip_tunnel: make sure to pull inner header in ip_tunnel_rcv()
Apply the same fix as the ones found in:

8d975c15c0 ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
1ca1ba465e ("geneve: make sure to pull inner header in geneve_rx()")

We have to save skb->network_header in a temporary variable
in order to be able to recompute the network_header pointer
after a pskb_inet_may_pull() call.

pskb_inet_may_pull() makes sure the needed headers are in skb->head.
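
The reason for saving an offset rather than a pointer can be shown with a
small userspace model. struct pkt, pkt_may_pull() and
outer_hdr_after_pull() are hypothetical stand-ins, not skb APIs; the model
always moves the buffer to mimic the worst case of a reallocating pull.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; head may move when more data is pulled. */
struct pkt {
	unsigned char *head;
	size_t len;
};

/* Stand-in for a pull that may reallocate head. Returns 1 on success. */
static int pkt_may_pull(struct pkt *p, size_t need)
{
	unsigned char *nhead;

	if (need <= p->len)
		return 1;

	nhead = calloc(1, need);
	if (!nhead)
		return 0;

	memcpy(nhead, p->head, p->len);
	free(p->head);          /* any old pointer into head is now stale */
	p->head = nhead;
	p->len = need;
	return 1;
}

/*
 * Keep the outer header as an offset from head across the pull, then
 * rebuild the pointer from the (possibly new) head afterwards.
 */
const unsigned char *outer_hdr_after_pull(struct pkt *p, size_t outer_off,
					  size_t need)
{
	assert(outer_off < p->len);

	if (!pkt_may_pull(p, need))
		return NULL;

	return p->head + outer_off;
}
```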

syzbot reported:
BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
 BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
 BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
 BUG: KMSAN: uninit-value in ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
  __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
  INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
  IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
  ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
  __ipgre_rcv+0x9bc/0xbc0 net/ipv4/ip_gre.c:389
  ipgre_rcv net/ipv4/ip_gre.c:411 [inline]
  gre_rcv+0x423/0x19f0 net/ipv4/ip_gre.c:447
  gre_rcv+0x2a4/0x390 net/ipv4/gre_demux.c:163
  ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
  ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
  dst_input include/net/dst.h:461 [inline]
  ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
  __netif_receive_skb_one_core net/core/dev.c:5534 [inline]
  __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
  netif_receive_skb_internal net/core/dev.c:5734 [inline]
  netif_receive_skb+0x58/0x660 net/core/dev.c:5793
  tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1556
  tun_get_user+0x53b9/0x66e0 drivers/net/tun.c:2009
  tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
  call_write_iter include/linux/fs.h:2087 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xb6b/0x1520 fs/read_write.c:590
  ksys_write+0x20f/0x4c0 fs/read_write.c:643
  __do_sys_write fs/read_write.c:655 [inline]
  __se_sys_write fs/read_write.c:652 [inline]
  __x64_sys_write+0x93/0xd0 fs/read_write.c:652
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
  __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590
  alloc_pages_mpol+0x62b/0x9d0 mm/mempolicy.c:2133
  alloc_pages+0x1be/0x1e0 mm/mempolicy.c:2204
  skb_page_frag_refill+0x2bf/0x7c0 net/core/sock.c:2909
  tun_build_skb drivers/net/tun.c:1686 [inline]
  tun_get_user+0xe0a/0x66e0 drivers/net/tun.c:1826
  tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
  call_write_iter include/linux/fs.h:2087 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xb6b/0x1520 fs/read_write.c:590
  ksys_write+0x20f/0x4c0 fs/read_write.c:643
  __do_sys_write fs/read_write.c:655 [inline]
  __se_sys_write fs/read_write.c:652 [inline]
  __x64_sys_write+0x93/0xd0 fs/read_write.c:652
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:14:15 +00:00
Shiming Cheng
c4386ab4f6 ipv6: fib6_rules: flush route cache when rule is changed
When a rule policy is changed, the ipv6 socket cache is not refreshed.
The sock's skb still uses an outdated route cache and is sent to
the wrong interface.

To avoid this error we should update the fib node's version when
a rule is changed. The skb's route is then rechecked, because the
route cache version no longer matches the fib node version, and
the route cache is refreshed to match the latest rule.
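
The version check can be pictured with the following userspace sketch. The
types and helpers (fib_node, route_cache, cache_store, cache_valid,
rule_changed) are hypothetical and only model the idea of comparing a
cached cookie against the node's current version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fib node carrying a version that rule changes bump. */
struct fib_node {
	uint32_t version;
};

/* Hypothetical per-socket route cache. */
struct route_cache {
	const struct fib_node *node;
	uint32_t cookie;     /* node version seen when the route was cached */
	int out_ifindex;     /* cached output interface */
};

/* Cache a lookup result together with the current node version. */
void cache_store(struct route_cache *rc, const struct fib_node *node,
		 int ifindex)
{
	assert(node != NULL);
	rc->node = node;
	rc->cookie = node->version;
	rc->out_ifindex = ifindex;
}

/* The cached route may be used only while the versions still match. */
bool cache_valid(const struct route_cache *rc)
{
	return rc->node != NULL && rc->cookie == rc->node->version;
}

/* A rule change bumps the version, invalidating every cached cookie. */
void rule_changed(struct fib_node *node)
{
	node->version++;
}
```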

Fixes: 101367c2f8 ("[IPV6]: Policy Routing Rules")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:10:35 +00:00
Alexander Sverdlin
8636f19c2d gpio: sysfs: repair export returning -EPERM on 1st attempt
It would make sense to return -EPERM if the bit was already set (already
used), not if it was cleared. Before this fix pins can only be exported on
the 2nd attempt:

$ echo 522 > /sys/class/gpio/export
sh: write error: Operation not permitted
$ echo 522 > /sys/class/gpio/export
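
The inverted test can be reproduced with a small userspace model using C11
atomics. EXPORT_BIT and the three helpers are hypothetical names chosen for
this sketch; they are not the gpiolib identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define EXPORT_BIT 0x1u
static_assert((EXPORT_BIT & (EXPORT_BIT - 1)) == 0,
	      "EXPORT_BIT must be a single bit");

/* Atomically set the bit and report whether it was already set. */
int test_and_set_export(atomic_uint *flags)
{
	return (atomic_fetch_or(flags, EXPORT_BIT) & EXPORT_BIT) != 0;
}

/* Inverted logic: fails when the bit was clear, i.e. on the first try. */
int export_buggy(atomic_uint *flags)
{
	if (!test_and_set_export(flags))
		return -EPERM;
	return 0;
}

/* Intended logic: fails only when the bit was already set. */
int export_fixed(atomic_uint *flags)
{
	if (test_and_set_export(flags))
		return -EPERM;
	return 0;
}
```

In this model export_buggy() fails on the first call and succeeds on the
second, matching the transcript above, while export_fixed() succeeds first
and rejects the duplicate.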

Fixes: 35b545332b ("gpio: remove gpio_lock")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-03-08 10:32:00 +01:00