Commit Graph

22457 Commits

Author SHA1 Message Date
Ming Lei
caf84294ff selftests: ublk: fix user_data truncation for tgt_data >= 256
The build_user_data() function packs multiple fields into a __u64
value using bit shifts. Without explicit __u64 casts before shifting,
the shift operations are performed on 32-bit unsigned integers before
being promoted to 64-bit, causing data loss.

Specifically, when tgt_data >= 256, the expression (tgt_data << 24)
shifts on a 32-bit value, truncating the upper 8 bits before promotion
to __u64. Since tgt_data can be up to 16 bits (assertion allows up to
65535), values >= 256 would have their high byte lost.

Add explicit __u64 casts to both op and tgt_data before shifting to
ensure the shift operations happen in 64-bit space, preserving all
bits of the input values.

user_data_to_tgt_data() is only used by stripe.c, in which the max
supported member disks are 4, so won't trigger this issue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 20:05:41 -07:00
Danielle Ratson
d07d7c3dd9 selftests: net: Add kernel selftest for RFC 4884
RFC 4884 extended certain ICMP messages with a length attribute that
encodes the length of the "original datagram" field. This is needed so
that new information could be appended to these messages without
applications thinking that it is part of the "original datagram" field.

In version 5.9, the kernel was extended with two new socket options
(SOL_IP/IP_RECVERR_4884 and SOL_IPV6/IPV6_RECVERR_RFC4884) that allow
user space to retrieve this length which is basically the offset to the
ICMP Extension Structure at the end of the ICMP message. This is
required by user space applications that need to parse the information
contained in the ICMP Extension Structure. For example, the RFC 5837
extension for tracepath.

Add a selftest that verifies correct handling of the RFC 4884 length
field for both IPv4 and IPv6, with and without extension structures,
and validates that malformed extensions are correctly reported as invalid.

For each address family, the test creates:
  - a raw socket used to send locally crafted ICMP error packets to the
    loopback address, and
  - a datagram socket used to receive the encapsulated original datagram
    and associated error metadata from the kernel error queue.

ICMP packets are constructed entirely in user space rather than relying
on kernel-generated errors. This allows the test to exercise invalid
scenarios (such as corrupted checksums and incorrect length fields) and
verify that the SO_EE_RFC4884_FLAG_INVALID flag is set as expected.

Output Example:

$ ./icmp_rfc4884
Starting 18 tests from 18 test cases.
  RUN           rfc4884.ipv4_ext_small_payload.rfc4884 ...
            OK  rfc4884.ipv4_ext_small_payload.rfc4884
ok 1 rfc4884.ipv4_ext_small_payload.rfc4884
  RUN           rfc4884.ipv4_ext.rfc4884 ...
            OK  rfc4884.ipv4_ext.rfc4884
ok 2 rfc4884.ipv4_ext.rfc4884
  RUN           rfc4884.ipv4_ext_large_payload.rfc4884 ...
            OK  rfc4884.ipv4_ext_large_payload.rfc4884
ok 3 rfc4884.ipv4_ext_large_payload.rfc4884
  RUN           rfc4884.ipv4_no_ext_small_payload.rfc4884 ...
            OK  rfc4884.ipv4_no_ext_small_payload.rfc4884
ok 4 rfc4884.ipv4_no_ext_small_payload.rfc4884
  RUN           rfc4884.ipv4_no_ext_min_payload.rfc4884 ...
            OK  rfc4884.ipv4_no_ext_min_payload.rfc4884
ok 5 rfc4884.ipv4_no_ext_min_payload.rfc4884
  RUN           rfc4884.ipv4_no_ext_large_payload.rfc4884 ...
            OK  rfc4884.ipv4_no_ext_large_payload.rfc4884
ok 6 rfc4884.ipv4_no_ext_large_payload.rfc4884
  RUN           rfc4884.ipv4_invalid_ext_checksum.rfc4884 ...
            OK  rfc4884.ipv4_invalid_ext_checksum.rfc4884
ok 7 rfc4884.ipv4_invalid_ext_checksum.rfc4884
  RUN           rfc4884.ipv4_invalid_ext_length_small.rfc4884 ...
            OK  rfc4884.ipv4_invalid_ext_length_small.rfc4884
ok 8 rfc4884.ipv4_invalid_ext_length_small.rfc4884
  RUN           rfc4884.ipv4_invalid_ext_length_large.rfc4884 ...
            OK  rfc4884.ipv4_invalid_ext_length_large.rfc4884
ok 9 rfc4884.ipv4_invalid_ext_length_large.rfc4884
  RUN           rfc4884.ipv6_ext_small_payload.rfc4884 ...
            OK  rfc4884.ipv6_ext_small_payload.rfc4884
ok 10 rfc4884.ipv6_ext_small_payload.rfc4884
  RUN           rfc4884.ipv6_ext.rfc4884 ...
            OK  rfc4884.ipv6_ext.rfc4884
ok 11 rfc4884.ipv6_ext.rfc4884
  RUN           rfc4884.ipv6_ext_large_payload.rfc4884 ...
            OK  rfc4884.ipv6_ext_large_payload.rfc4884
ok 12 rfc4884.ipv6_ext_large_payload.rfc4884
  RUN           rfc4884.ipv6_no_ext_small_payload.rfc4884 ...
            OK  rfc4884.ipv6_no_ext_small_payload.rfc4884
ok 13 rfc4884.ipv6_no_ext_small_payload.rfc4884
  RUN           rfc4884.ipv6_no_ext_min_payload.rfc4884 ...
            OK  rfc4884.ipv6_no_ext_min_payload.rfc4884
ok 14 rfc4884.ipv6_no_ext_min_payload.rfc4884
  RUN           rfc4884.ipv6_no_ext_large_payload.rfc4884 ...
            OK  rfc4884.ipv6_no_ext_large_payload.rfc4884
ok 15 rfc4884.ipv6_no_ext_large_payload.rfc4884
  RUN           rfc4884.ipv6_invalid_ext_checksum.rfc4884 ...
            OK  rfc4884.ipv6_invalid_ext_checksum.rfc4884
ok 16 rfc4884.ipv6_invalid_ext_checksum.rfc4884
  RUN           rfc4884.ipv6_invalid_ext_length_small.rfc4884 ...
            OK  rfc4884.ipv6_invalid_ext_length_small.rfc4884
ok 17 rfc4884.ipv6_invalid_ext_length_small.rfc4884
  RUN           rfc4884.ipv6_invalid_ext_length_large.rfc4884 ...
            OK  rfc4884.ipv6_invalid_ext_length_large.rfc4884
ok 18 rfc4884.ipv6_invalid_ext_length_large.rfc4884
 PASSED: 18 / 18 tests passed.
 Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260121114644.2863640-1-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-22 18:29:49 -08:00
Terry Bowman
0ff60f2ec3 cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
from the CXL Virtual Hierarchy (VH) handling. This is because of the
differences in the RCH and VH topologies. Improve the maintainability and
add ability to enable/disable RCH handling.

Move and combine the RCH handling code into a single block conditionally
compiled with the CONFIG_CXL_RCH_RAS kernel config.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260114182055.46029-9-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-22 15:07:04 -07:00
Dave Jiang
7ff8b1d608 cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c
Create new config CONFIG_CXL_RAS and put all CXL RAS items behind the
config. The config will depend on CPER and PCIE AER to build. Move the
related VH RAS code from core/pci.c to core/ras.c.

Restricted CXL host (RCH) RAS functions will be moved in a future patch.

Cc: Robert Richter <rrichter@amd.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260114182055.46029-8-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-22 14:58:03 -07:00
Linus Torvalds
0a80e38d0f Including fixes from CAN and wireless.
Pretty big PR, but hard to make up any cohesive story that would
 explain it, a random collection of fixes. The two reverts of bad
 patches from this release here feel like stuff that'd normally
 show up by rc5 or rc6. Perhaps obvious thing to say, given the MW
 timing.
 
 That said, no active investigations / regressions. Let's see what
 the next week brings.
 
 Current release - fix to a fix:
 
  - can: alloc_candev_mqs(): add missing default CAN capabilities
 
 Current release - regressions:
 
  - usbnet: fix crash due to missing BQL accounting after resume
 
  - Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...
 
 Previous releases - regressions:
 
  - Revert "nfc/nci: Add the inconsistency check between the input ...
 
 Previous releases - always broken:
 
  - number of driver fixes for incorrect use of seqlocks on stats
 
  - rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
    when MSG_PEEK was set
 
  - ipvlan: make the addrs_lock be per port avoid races in the port
    hash table
 
  - sched: enforce that teql can only be used as root qdisc
 
  - virtio: coalesce only linear skb
 
  - wifi: ath12k: fix dead lock while flushing management frames
 
  - eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmlyVNQACgkQMUZtbf5S
 IrvoMQ/+N+KikpNLtHFpkqKAsTnNC6gR9yG+NYvUck1Oo+CQQpc/wJ78pS2cyPpM
 siMa7lLzRuCKPYni3RVpG9nqG/jv9HKvFXpgpSb71Evs6syO85kI8Dy4z8G4mClk
 A4QFH6/8V4uZpH/ewy9DTIjxdWPEJA6CuXXnmolTEBhrNkgskv0Sz3jFVTxDjFzN
 xTMk6H2PhgANUsYFree9PQ+gR+QAHKHPvjobWCsepYRHlT4YVi8PrmqoYBvycLwa
 Q//xy6PUOSXZM4sa6YSUMZk2Nou/84BvqouQ05ljaLSZDl4Ecfxl5MDkh/kzZHjR
 wPbMhiZUrJngwwyXQWcSnir1M17IPKjC/FAEDFUFVWDT/wYYHODt9npCRpdglsa5
 SGoy0yDgUN5Lq9Q5dPL0PMLHVLkuDdlzmyva4uso/dQYovXgTi/5Kx2bPc6LLqMJ
 CP5QxjN5OsOHKnGdI/UO28OU5Yn2KRCmJiW9nW3XdxR316lkmo6BWHAxWG9XhbJE
 aCBXu3EoqNSDVuWaZFVQ8g2uTN3uOhnuJqBllhahtENbdqjsat0cxcjYTuUDvuMm
 B1W0My5qq7O6TRjlFl6JfC3k2gqbZ+HlLg3LfmC74bmddOL52up+9HKyeRoWiIaA
 9BQ3DKpjD3VFEbsbBVPfoZ4Ch4jrB+G+ck8f7g/l9BCNM+Ji1Ds=
 =Ehzr
 -----END PGP SIGNATURE-----

Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN and wireless.

  Pretty big, but hard to make up any cohesive story that would explain
  it, a random collection of fixes. The two reverts of bad patches from
  this release here feel like stuff that'd normally show up by rc5 or
  rc6. Perhaps obvious thing to say, given the holiday timing.

  That said, no active investigations / regressions. Let's see what the
  next week brings.

  Current release - fix to a fix:

   - can: alloc_candev_mqs(): add missing default CAN capabilities

  Current release - regressions:

   - usbnet: fix crash due to missing BQL accounting after resume

   - Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...

  Previous releases - regressions:

   - Revert "nfc/nci: Add the inconsistency check between the input ...

  Previous releases - always broken:

   - number of driver fixes for incorrect use of seqlocks on stats

   - rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
     when MSG_PEEK was set

   - ipvlan: make the addrs_lock be per port avoid races in the port
     hash table

   - sched: enforce that teql can only be used as root qdisc

   - virtio: coalesce only linear skb

   - wifi: ath12k: fix dead lock while flushing management frames

   - eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"

* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
  Octeontx2-af: Add proper checks for fwdata
  dpll: Prevent duplicate registrations
  net/sched: act_ife: avoid possible NULL deref
  hinic3: Fix netif_queue_set_napi queue_index input parameter error
  vsock/test: add stream TX credit bounds test
  vsock/virtio: cap TX credit to local buffer size
  vsock/test: fix seqpacket message bounds test
  vsock/virtio: fix potential underflow in virtio_transport_get_credit()
  net: fec: account for VLAN header in frame length calculations
  net: openvswitch: fix data race in ovs_vport_get_upcall_stats
  octeontx2-af: Fix error handling
  net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
  rxrpc: Fix data-race warning and potential load/store tearing
  net: dsa: fix off-by-one in maximum bridge ID determination
  net: bcmasp: Fix network filter wake for asp-3.0
  bonding: provide a net pointer to __skb_flow_dissect()
  selftests: net: amt: wait longer for connection before sending packets
  be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
  Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
  netrom: fix double-free in nr_route_frame()
  ...
2026-01-22 09:32:11 -08:00
Melbin K Mathew
2a689f76ed vsock/test: add stream TX credit bounds test
Add a regression test for the TX credit bounds fix. The test verifies
that a sender with a small local buffer size cannot queue excessive
data even when the peer advertises a large receive buffer.

The client:
  - Sets a small buffer size (64 KiB)
  - Connects to server (which advertises 2 MiB buffer)
  - Sends in non-blocking mode until EAGAIN
  - Verifies total queued data is bounded

This guards against the original vulnerability where a remote peer
could cause unbounded kernel memory allocation by advertising a large
buffer and reading slowly.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use sock_buf_size to check the bytes sent + small fixes]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-5-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-22 15:41:33 +01:00
Stefano Garzarella
0a98de8013 vsock/test: fix seqpacket message bounds test
The test requires the sender (client) to send all messages before waking
up the receiver (server).
Since virtio-vsock had a bug and did not respect the size of the TX
buffer, this test worked, but now that we are going to fix the bug, the
test hangs because the sender would fill the TX buffer before waking up
the receiver.

Set the buffer size in the sender (client) as well, as we already do for
the receiver (server).

Fixes: 5c338112e4 ("test/vsock: rework message bounds test")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-3-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-22 15:41:33 +01:00
Yicong Yang
57a96356bb kselftest/arm64: Add HWCAP test for FEAT_LS64
Add tests for FEAT_LS64. Issue related instructions if feature
presents, no SIGILL should be received. When such instructions
operate on Device memory or non-cacheable memory, we may received
a SIGBUS during the test (w/o FEAT_LS64WB). Just ignore it since
we only tested whether the instruction itself can be issued as
expected on platforms declaring the support of such features.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 13:25:33 +00:00
Thomas Weißschuh
2a369c4942 kselftest/arm64: Use syscall() macro over nolibc my_syscall()
The my_syscall*() macros are internal implementation details of nolibc.
Nolibc also provides the regular syscall(2), which is also a macro
and directly expands to the correct my_syscall().

Use syscall() instead.

As a side-effect this fixes some return value checks, as my_syscall()
returns the raw value as set by the kernel and does not set errno.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 11:21:09 +00:00
Peter Zijlstra
bb332a9e5a selftests/rseq: Add rseq slice histogram script
A script that processes trace-cmd data and generates a histogram of
rseq slice_ext durations for the recorded workload.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260121143208.340549136@infradead.org
2026-01-22 11:11:20 +01:00
Thomas Gleixner
830969e782 selftests/rseq: Implement time slice extension test
Provide an initial test case to evaluate the functionality. This needs to be
extended to cover the ABI violations and expose the race condition between
observing granted and arriving in rseq_slice_yield().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155709.320325431@linutronix.de
2026-01-22 11:11:19 +01:00
Taehee Yoo
04708606fd selftests: net: amt: wait longer for connection before sending packets
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.

However, tests sometimes fail because packets are sent before the
connection is fully established.

Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.

Fixes: c08e8baea7 ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20260120133930.863845-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-21 19:20:30 -08:00
Andre Carvalho
6ecc08329b selftests: netconsole: validate target resume
Introduce a new netconsole selftest to validate that netconsole is able
to resume a deactivated target when the low level interface comes back.

The test setups the network using netdevsim, creates a netconsole target
and then remove/add netdevsim in order to bring the same interfaces
back. Afterwards, the test validates that the target works as expected.

Targets are created via cmdline parameters to the module to ensure that
we are able to resume targets that were bound by mac and interface name.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260118-netcons-retrigger-v11-7-4de36aebcf48@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-21 19:09:12 -08:00
Kery Qi
a32ae26584 selftests/bpf: Fix resource leak in serial_test_wq on attach failure
When wq__attach() fails, serial_test_wq() returns early without calling
wq__destroy(), leaking the skeleton resources allocated by
wq__open_and_load(). This causes ASAN leak reports in selftests runs.

Fix this by jumping to a common clean_up label that calls wq__destroy()
on all exit paths after successful open_and_load.

Note that the early return after wq__open_and_load() failure is correct
and doesn't need fixing, since that function returns NULL on failure
(after internally cleaning up any partial allocations).

Fixes: 8290dba519 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
2026-01-21 16:12:10 -08:00
Ziyu Chen
6d6ad32e22 selftests/pidfd: fix typo in comment
Fix the typo "untill" → "until" in a comment in pidfd_info_test.c.

This typo is already listed in scripts/spelling.txt by commit
66b47b4a9d ("checkpatch: look for common misspellings").

Link: https://lore.kernel.org/r/20260121094147.4187337-1-chenziyu@uniontech.com
Suggested-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Ziyu Chen <chenziyu@uniontech.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-21 10:56:25 -07:00
Yuzuki Ishiyama
f4924ad0b1 selftests/bpf: Test kfunc bpf_strncasecmp
Add testsuites for kfunc bpf_strncasecmp.

Signed-off-by: Yuzuki Ishiyama <ishiyama@hpc.is.uec.ac.jp>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20260121033328.1850010-3-ishiyama@hpc.is.uec.ac.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21 09:42:53 -08:00
Menglong Dong
1ed7977643 selftests/bpf: test bpf_get_func_arg() for tp_btf
Test bpf_get_func_arg() and bpf_get_func_arg_cnt() for tp_btf. The code
is most copied from test1 and test2.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260121044348.113201-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21 09:31:35 -08:00
Ming Lei
e7e1cc18f1 selftests/ublk: fix garbage output in foreground mode
Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
when running kublk in foreground mode. Without this, _evtfd is
zero-initialized to 0 (stdin), and ublk_send_dev_event() writes
binary data to stdin which appears as garbage on the terminal.

Also fix debug message format string.

Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-21 07:54:55 -07:00
Ming Lei
23e62cf755 selftests/ublk: fix error handling for starting device
Fix error handling in ublk_start_daemon() when start_dev fails:

1. Call ublk_ctrl_stop_dev() to cancel inflight uring_cmd before
   cleanup. Without this, the device deletion may hang waiting for
   I/O completion that will never happen.

2. Add fail_start label so that pthread_join() is called on the
   error path. This ensures proper thread cleanup when startup fails.

Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-21 07:54:55 -07:00
Ming Lei
75aad5ffe0 selftests/ublk: fix IO thread idle check
Include cmd_inflight in ublk_thread_is_done() check. Without this,
the thread may exit before all FETCH commands are completed, which
may cause device deletion to hang.

Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-21 07:54:55 -07:00
Menglong Dong
4fca95095c selftests/bpf: test the jited inline of bpf_get_current_task
Add the testcase for the jited inline of bpf_get_current_task().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260120070555.233486-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 20:39:01 -08:00
Li RongQing
e700f5d156 watchdog: softlockup: panic when lockup duration exceeds N thresholds
The softlockup_panic sysctl is currently a binary option: panic
immediately or never panic on soft lockups.

Panicking on any soft lockup, regardless of duration, can be overly
aggressive for brief stalls that may be caused by legitimate operations. 
Conversely, never panicking may allow severe system hangs to persist
undetected.

Extend softlockup_panic to accept an integer threshold, allowing the
kernel to panic only when the normalized lockup duration exceeds N
watchdog threshold periods.  This provides finer-grained control to
distinguish between transient delays and persistent system failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic when duration >= 1 * threshold (20s default, original behavior)
- N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility while allowing systems to tolerate brief lockups
while still catching severe, persistent hangs.

[lirongqing@baidu.com: v2]
  Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com
Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:20 -08:00
Chunyu Hu
6319c4f442 selftests/mm: fix comment for check_test_requirements
The test supports arm64 as well so the comment is incorrect.  And there's
a check for arm64 in va_high_addr_switch.c.

Link: https://lkml.kernel.org/r/20251221040025.3159990-5-chuhu@redhat.com
Fixes: 983e760bcd ("selftest/mm: va_high_addr_switch: add ppc64 support check")
Fixes: f556acc2fa ("selftests/mm: skip test for non-LPA2 and non-LVA systems")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:52 -08:00
Chunyu Hu
dd0202a0bd selftests/mm: va_high_addr_switch return fail when either test failed
When the first test failed, and the hugetlb test passed, the result would
be pass, but we expect a fail.  Fix this issue by returning fail if either
is not KSFT_PASS.

Link: https://lkml.kernel.org/r/20251221040025.3159990-4-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:52 -08:00
Chunyu Hu
7544d7969d selftests/mm: remove arm64 nr_hugepages setup for va_high_addr_switch test
arm64 and x86_64 has the same nr_hugepages requriement for running the
va_high_addr_switch test.  Since commit d9d957bd7b ("selftests/mm: alloc
hugepages in va_high_addr_switch test"), the setup can be done in
va_high_addr_switch.sh.  So remove the duplicated setup.

Link: https://lkml.kernel.org/r/20251221040025.3159990-3-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:52 -08:00
Chunyu Hu
b1f031e33c selftests/mm: allocate 6 hugepages in va_high_addr_switch.sh
The va_high_addr_switch test requires 6 hugepages, not 5. If running the
test directly by: ./va_high_addr_switch.sh, the test will hit a mmap 'FAIL'
caused by not enough hugepages:

  mmap(addr_switch_hint - hugepagesize, 2*hugepagesize, MAP_HUGETLB): 0x7f330f800000 - OK
  mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED

The failure can't be hit if run the tests by running 'run_vmtests.sh -t
hugevm' because the nr_hugepages is set to 128 at the beginning of
run_vmtests.sh and va_high_addr_switch.sh skip the setup of nr_hugepages
because already enough.

Link: https://lkml.kernel.org/r/20251221040025.3159990-2-chuhu@redhat.com
Fixes: d9d957bd7b ("selftests/mm: alloc hugepages in va_high_addr_switch test")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:51 -08:00
Chunyu Hu
b47beff129 selftests/mm: fix va_high_addr_switch.sh return value
Patch series "Fix va_high_addr_switch.sh test failure - again", v2.

The series address several issues exist for the va_high_addr_switch test:
1) the test return value is ignored in va_high_addr_switch.sh.
2) the va_high_addr_switch test requires 6 hugepages not 5.
3) the reurn value of the first test in va_high_addr_switch.c can be
   overridden by the second test.
4) the nr_hugepages setup in run_vmtests.sh for arm64 can be done in
   va_high_addr_switch.sh too.
5) update a comment for check_test_requirements.


This patch: (of 5)

The return value should be return value of va_high_addr_switch, otherwise
a test failure would be silently ignored.

Link: https://lkml.kernel.org/r/20251221040025.3159990-1-chuhu@redhat.com
Fixes: d9d957bd7b ("selftests/mm: alloc hugepages in va_high_addr_switch test")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:51 -08:00
Li Wang
b618876f2e selftests/mm/charge_reserved_hugetlb.sh: add waits with timeout helper
The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were
unbounded and could hang forever if the expected cgroup file value never
appears (e.g.  due to write_to_hugetlbfs in Error mapping).

=== Error log ===
  # uname -r
  6.12.0-xxx.el10.aarch64+64k

  # ls /sys/kernel/mm/hugepages/hugepages-*
  hugepages-16777216kB/  hugepages-2048kB/  hugepages-524288kB/

  #./charge_reserved_hugetlb.sh -cgroup-v2
  # -----------------------------------------
  ...
  # nr hugepages = 10
  # writing cgroup limit: 5368709120
  # writing reseravation limit: 5368709120
  ...
  # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  ...

Introduce a small helper, wait_for_file_value(), and use it for:
  - waiting for reservation usage to drop to 0,
  - waiting for reservation usage to reach a given size,
  - waiting for fault usage to reach a given size.

This makes the waits consistent and adds a hard timeout (60 tries with 1s
sleep) so the test fails instead of stalling indefinitely.

Link: https://lkml.kernel.org/r/20251221122639.3168038-4-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:51 -08:00
Li Wang
1aa1dd9cc5 selftests/mm/charge_reserved_hugetlb: drop mount size for hugetlbfs
charge_reserved_hugetlb.sh mounts a hugetlbfs instance at /mnt/huge with a
fixed size of 256M.  On systems with large base hugepages (e.g.  512MB),
this is smaller than a single hugepage, so the hugetlbfs mount ends up
with zero capacity (often visible as size=0 in mount output).

As a result, write_to_hugetlbfs fails with ENOMEM and the test can hang
waiting for progress.

=== Error log ===
  # uname -r
  6.12.0-xxx.el10.aarch64+64k

  #./charge_reserved_hugetlb.sh -cgroup-v2
  # -----------------------------------------
  ...
  # nr hugepages = 10
  # writing cgroup limit: 5368709120
  # writing reseravation limit: 5368709120
  ...
  # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  # Waiting for hugetlb memory reservation to reach size 2684354560.
  # 0
  ...

  # mount |grep /mnt/huge
  none on /mnt/huge type hugetlbfs (rw,relatime,seclabel,pagesize=512M,size=0)

  # grep -i huge /proc/meminfo
  ...
  HugePages_Total:      10
  HugePages_Free:       10
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:     524288 kB
  Hugetlb:         5242880 kB

Drop the mount args with 'size=256M', so the filesystem capacity is sufficient
regardless of HugeTLB page size.

Link: https://lkml.kernel.org/r/20251221122639.3168038-3-liwang@redhat.com
Fixes: 29750f71a9 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:51 -08:00
Li Wang
8e46adb62f selftests/mm/write_to_hugetlbfs: parse -s as size_t
Patch series "selftests/mm: hugetlb cgroup charging: robustness fixes", v3.

This series fixes a few issues in the hugetlb cgroup charging selftests
(write_to_hugetlbfs.c + charge_reserved_hugetlb.sh) that show up on
systems with large hugepages (e.g.  512MB) and when failures cause the
test to wait indefinitely.

On an aarch64 64k page kernel with 512MB hugepages, the test consistently
fails in write_to_hugetlbfs with ENOMEM and then hangs waiting for the
expected usage values.  The root cause is that charge_reserved_hugetlb.sh
mounts hugetlbfs with a fixed size=256M, which is smaller than a single
hugepage, resulting in a mount with size=0 capacity.

In addition, write_to_hugetlbfs previously parsed -s via atoi() into an
int, which can overflow and print negative sizes.

Reproducer / environment:
  - Kernel: 6.12.0-xxx.el10.aarch64+64k
  - Hugepagesize: 524288 kB (512MB)
  - ./charge_reserved_hugetlb.sh -cgroup-v2
  - Observed mount: pagesize=512M,size=0 before this series

After applying the series, the test completes successfully on the above
setup.


This patch (of 3):

write_to_hugetlbfs currently parses the -s size argument with atoi() into
an int.  This silently accepts malformed input, cannot report overflow,
and can truncate large sizes.

=== Error log ===
 # uname -r
 6.12.0-xxx.el10.aarch64+64k

 # ls /sys/kernel/mm/hugepages/hugepages-*
 hugepages-16777216kB/  hugepages-2048kB/  hugepages-524288kB/

 #./charge_reserved_hugetlb.sh -cgroup-v2
 # -----------------------------------------
 ...
 # nr hugepages = 10
 # writing cgroup limit: 5368709120
 # writing reseravation limit: 5368709120
 ...
 # Writing to this path: /mnt/huge/test
 # Writing this size: -1610612736        <--------

Switch the size variable to size_t and parse -s with sscanf("%zu", ...). 
Also print the size using %zu.

This avoids incorrect behavior with large -s values and makes the utility
more robust.

Link: https://lkml.kernel.org/r/20251221122639.3168038-1-liwang@redhat.com
Link: https://lkml.kernel.org/r/20251221122639.3168038-2-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:50 -08:00
Audra Mitchell
a98ec863fd lib/test_vmalloc.c: minor fixes to test_vmalloc.c
If PAGE_SIZE is larger than 4k and if you have a system with a large
number of CPUs, this test can require a very large amount of memory
leading to oom-killer firing.  Given the type of allocation, the kernel
won't have anything to kill, causing the system to stall.

Add a parameter to the test_vmalloc driver to represent the number of
times a percpu object will be allocated.  Calculate this in
test_vmalloc.sh to be 90% of available memory or the current default of
35000, whichever is smaller.

Link: https://lkml.kernel.org/r/20251201181848.1216197-1-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:47 -08:00
Jakub Kicinski
1802e9079f selftests: drv-net: fix missing include in ncdevmem
Commit ca9d74eb5f ("uapi: add INT_MAX and INT_MIN constants")
recently removed some includes of limits.h in uAPI headers.
ncdevmem.c was depending on them:

  ncdevmem.c: In function ‘ethtool_add_flow’:
  ncdevmem.c:369:60: error: ‘INT_MAX’ undeclared (first use in this function)
  369 |         if (endptr == id_start || flow_id < 0 || flow_id > INT_MAX)
      |                                                            ^~~~~~~
  ncdevmem.c:77:1: note: ‘INT_MAX’ is defined in header ‘<limits.h>’; did you forget to ‘#include <limits.h>’?

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20260120180319.1673271-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:26:26 -08:00
Vadim Fedorenko
49743f2726 selftests: drv-net: extend HW timestamp test with ioctl
Extend HW timestamp tests to check that ioctl interface is not broken
and configuration setups and requests are equal to netlink interface.
Some linter warnings are disabled because of ctypes classes.

Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260116062121.1230184-2-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:21:27 -08:00
Jakub Kicinski
677a51790b Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux
Pavel Begunkov says:

====================
Add support for providers with large rx buffer

Many modern NICs support configurable receive buffer lengths, and zcrx and
memory providers can use buffers larger than 4K to improve performance.
When paired with hw-gro larger rx buffer sizes can drastically reduce
the number of buffers traversing the stack and save a lot of processing
time. It also allows to give to users larger contiguous chunks of data.

Single stream benchmarks showed up to ~30% CPU util improvement.
E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC:

packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57

This series adds net infrastructure for memory providers configuring
the size and implements it for bnxt. It's an opt-in feature for drivers,
they should advertise support for the parameter in the qops and must check
if the hardware supports the given size. It's limited to memory providers
as it drastically simplifies implementation. It doesn't affect the fast
path zcrx uAPI, and the user exposed parameter is defined in zcrx terms,
which allows it to be flexible and adjusted in the future.

A liburing example can be found at [2]

full branch:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v8
Liburing example:
[2] https://github.com/isilence/liburing.git zcrx/rx-buf-len

* tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux:
  io_uring/zcrx: document area chunking parameter
  selftests: iou-zcrx: test large chunk sizes
  eth: bnxt: support qcfg provided rx page size
  eth: bnxt: adjust the fill level of agg queues with larger buffers
  eth: bnxt: store rx buffer size per queue
  net: pass queue rx page size from memory provider
  net: add bare bone queue configs
  net: reduce indent of struct netdev_queue_mgmt_ops members
  net: memzero mp params when closing a queue
====================

Link: https://patch.msgid.link/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:10:04 -08:00
Jakub Kicinski
8766d61a1d Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"
This reverts commit 77b9c4a438, reversing
changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497:

 931420a2fc ("selftests/net: Add netkit container tests")
 ab771c938d ("selftests/net: Make NetDrvContEnv support queue leasing")
 6be87fbb27 ("selftests/net: Add env for container based tests")
 61d99ce3df ("selftests/net: Add bpf skb forwarding program")
 920da36341 ("netkit: Add xsk support for af_xdp applications")
 eef51113f8 ("netkit: Add netkit notifier to check for unregistering devices")
 b5ef109d22 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
 b5c3fa4a0b ("netkit: Add single device mode for netkit")
 0073d2fd67 ("xsk: Proxy pool management for leased queues")
 1ecea95dd3 ("xsk: Extend xsk_rcv_check validation")
 804bf334d0 ("net: Proxy netdev_queue_get_dma_dev for leased queues")
 0caa9a8dde ("net: Proxy net_mp_{open,close}_rxq for leased queues")
 ff8889ff91 ("net, ethtool: Disallow leased real rxqs to be resized")
 9e2103f361 ("net: Add lease info to queue-get response")
 31127dedde ("net: Implement netdev_nl_queue_create_doit")
 a5546e18f7 ("net: Add queue-create operation")

The series will conflict with io_uring work, and the code needs more
polish.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 18:06:01 -08:00
Matt Bobrowski
dd341eacdb selftests/bpf: update verifier test for default trusted pointer semantics
Replace the verifier test for default trusted pointer semantics, which
previously relied on BPF kfunc bpf_get_root_mem_cgroup(), with a new
test utilizing dedicated BPF kfuncs defined within the bpf_testmod.

bpf_get_root_mem_cgroup() was modified such that it again relies on
KF_ACQUIRE semantics, therefore no longer making it a suitable
candidate to test BPF verifier default trusted pointer semantics
against.

Link: https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20260120091630.3420452-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 17:11:24 -08:00
Yazhou Tang
c9e440bf25 selftests/bpf: Add tests for BPF_DIV and BPF_MOD range tracking
Now BPF_DIV has range tracking support via interval analysis. This patch
adds selftests to cover various cases of BPF_DIV and BPF_MOD operations
when the divisor is a constant, also covering both signed and unsigned variants.

This patch includes several types of tests in 32-bit and 64-bit variants:

1. For UDIV
   - positive divisor
   - zero divisor

2. For SDIV
   - positive divisor, positive dividend
   - positive divisor, negative dividend
   - positive divisor, mixed sign dividend
   - negative divisor, positive dividend
   - negative divisor, negative dividend
   - negative divisor, mixed sign dividend
   - zero divisor
   - overflow (SIGNED_MIN/-1), normal dividend
   - overflow (SIGNED_MIN/-1), constant dividend

3. For UMOD
   - positive divisor
   - positive divisor, small dividend
   - zero divisor

4. For SMOD
   - positive divisor, positive dividend
   - positive divisor, negative dividend
   - positive divisor, mixed sign dividend
   - positive divisor, mixed sign dividend, small dividend
   - negative divisor, positive dividend
   - negative divisor, negative dividend
   - negative divisor, mixed sign dividend
   - negative divisor, mixed sign dividend, small dividend
   - zero divisor
   - overflow (SIGNED_MIN/-1), normal dividend
   - overflow (SIGNED_MIN/-1), constant dividend

Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely analyze the result of BPF_DIV/BPF_MOD
instruction, it can prune the path that leads to an error (here we use
invalid memory access as the error case), allowing the program to pass
verification.

Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Link: https://lore.kernel.org/r/20260119085458.182221-3-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:41:54 -08:00
Yazhou Tang
44fdd581d2 bpf: Add range tracking for BPF_DIV and BPF_MOD
This patch implements range tracking (interval analysis) for BPF_DIV and
BPF_MOD operations when the divisor is a constant, covering both signed
and unsigned variants.

While LLVM typically optimizes integer division and modulo by constants
into multiplication and shift sequences, this optimization is less
effective for the BPF target when dealing with 64-bit arithmetic.

Currently, the verifier does not track bounds for scalar division or
modulo, treating the result as "unbounded". This leads to false positive
rejections for safe code patterns.

For example, the following code (compiled with -O2):

```c
int test(struct pt_regs *ctx) {
    char buffer[6] = {1};
    __u64 x = bpf_ktime_get_ns();
    __u64 res = x % sizeof(buffer);
    char value = buffer[res];
    bpf_printk("res = %llu, val = %d", res, value);
    return 0;
}
```

Generates a raw `BPF_MOD64` instruction:

```asm
;     __u64 res = x % sizeof(buffer);
       1:	97 00 00 00 06 00 00 00	r0 %= 0x6
;     char value = buffer[res];
       2:	18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00	r1 = 0x0 ll
       4:	0f 01 00 00 00 00 00 00	r1 += r0
       5:	91 14 00 00 00 00 00 00	r4 = *(s8 *)(r1 + 0x0)
```

Without this patch, the verifier fails with "math between map_value
pointer and register with unbounded min value is not allowed" because
it cannot deduce that `r0` is within [0, 5].

According to the BPF instruction set[1], the instruction's offset field
(`insn->off`) is used to distinguish between signed (`off == 1`) and
unsigned division (`off == 0`). Moreover, we also follow the BPF division
and modulo runtime behavior (semantics) to handle special cases, such as
division by zero and signed division overflow.

- UDIV: dst = (src != 0) ? (dst / src) : 0
- SDIV: dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst / src))
- UMOD: dst = (src != 0) ? (dst % src) : dst
- SMOD: dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0: (dst s% src))

Here is the overview of the changes made in this patch (See the code comments
for more details and examples):

1. For BPF_DIV: Firstly check whether the divisor is zero. If so, set the
   destination register to zero (matching runtime behavior).

   For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)div` functions.
   - General cases: compute the new range by dividing max_dividend and
     min_dividend by the constant divisor.
   - Overflow case (SIGNED_MIN / -1) in signed division: mark the result
     as unbounded if the dividend is not a single number.

2. For BPF_MOD: Firstly check whether the divisor is zero. If so, leave the
   destination register unchanged (matching runtime behavior).

   For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)mod` functions.
   - General case: For signed modulo, the result's sign matches the
     dividend's sign. And the result's absolute value is strictly bounded
     by `min(abs(dividend), abs(divisor) - 1)`.
     - Special care is taken when the divisor is SIGNED_MIN. By casting
       to unsigned before negation and subtracting 1, we avoid signed
       overflow and correctly calculate the maximum possible magnitude
       (`res_max_abs` in the code).
   - "Small dividend" case: If the dividend is already within the possible
     result range (e.g., [-2, 5] % 10), the operation is an identity
     function, and the destination register remains unchanged.

3. In `scalar(32)?_min_max_(u|s)(div|mod)` functions: After updating current
   range, reset other ranges and tnum to unbounded/unknown.

   e.g., in `scalar_min_max_sdiv`, signed 64-bit range is updated. Then reset
   unsigned 64-bit range and 32-bit range to unbounded, and tnum to unknown.

   Exception: in BPF_MOD's "small dividend" case, since the result remains
   unchanged, we do not reset other ranges/tnum.

4. Also updated existing selftests based on the expected BPF_DIV and
   BPF_MOD behavior.

[1] https://www.kernel.org/doc/Documentation/bpf/standardization/instruction-set.rst

Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20260119085458.182221-2-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:41:53 -08:00
Ihor Solodrai
bd06b977e0 selftests/bpf: Migrate struct_ops_assoc test to KF_IMPLICIT_ARGS
A test kfunc named bpf_kfunc_multi_st_ops_test_1_impl() is a user of
__prog suffix. Subsequent patch removes __prog support in favor of
KF_IMPLICIT_ARGS, so migrate this kfunc to use implicit argument.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:38 -08:00
Ihor Solodrai
d806f31012 bpf: Migrate bpf_stream_vprintk() to KF_IMPLICIT_ARGS
Implement bpf_stream_vprintk with an implicit bpf_prog_aux argument,
and remote bpf_stream_vprintk_impl from the kernel.

Update the selftests to use the new API with implicit argument.

bpf_stream_vprintk macro is changed to use the new bpf_stream_vprintk
kfunc, and the extern definition of bpf_stream_vprintk_impl is
replaced accordingly.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-11-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:38 -08:00
Ihor Solodrai
6e663ffdf7 bpf: Migrate bpf_task_work_schedule_* kfuncs to KF_IMPLICIT_ARGS
Implement bpf_task_work_schedule_* with an implicit bpf_prog_aux
argument, and remove corresponding _impl funcs from the kernel.

Update special kfunc checks in the verifier accordingly.

Update the selftests to use the new API with implicit argument.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:20 -08:00
Ihor Solodrai
8157cc739a HID: Use bpf_wq_set_callback kernel function
Remove extern declaration of bpf_wq_set_callback_impl() from
hid_bpf_helpers.h and replace bpf_wq_set_callback macro with a
corresponding new declaration.

Tested with:
  # append tools/testing/selftests/hid/config and build the kernel
  $ make -C tools/testing/selftests/hid
  # in built kernel
  $ ./tools/testing/selftests/hid/hid_bpf -t test_multiply_events_wq

  TAP version 13
  1..1
  # Starting 1 tests from 1 test cases.
  #  RUN           hid_bpf.test_multiply_events_wq ...
  [    2.575520] hid-generic 0003:0001:0A36.0001: hidraw0: USB HID v0.00 Device [test-uhid-device-138] on 138
  #            OK  hid_bpf.test_multiply_events_wq
  ok 1 hid_bpf.test_multiply_events_wq
  # PASSED: 1 / 1 tests passed.
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
  PASS

Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:15:57 -08:00
Ihor Solodrai
b97931a25a bpf: Migrate bpf_wq_set_callback_impl() to KF_IMPLICIT_ARGS
Implement bpf_wq_set_callback() with an implicit bpf_prog_aux
argument, and remove bpf_wq_set_callback_impl().

Update special kfunc checks in the verifier accordingly.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-8-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:15:57 -08:00
Ihor Solodrai
e939f3d16d selftests/bpf: Add tests for KF_IMPLICIT_ARGS
Add trivial end-to-end tests to validate that KF_IMPLICIT_ARGS flag is
properly handled by both resolve_btfids and the verifier.

Declare kfuncs in bpf_testmod. Check that bpf_prog_aux pointer is set
in the kfunc implementation. Verify that calls with implicit args and
a legacy case all work.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:15:57 -08:00
Gyutae Bae
2e6690d4f7 selftests/bpf: Add perfbuf multi-producer benchmark
Add a multi-producer benchmark for perfbuf to complement the existing
ringbuf multi-producer test. Unlike ringbuf which uses a shared buffer
and experiences contention, perfbuf uses per-CPU buffers so the test
measures scaling behavior rather than contention.

This allows developers to compare perfbuf vs ringbuf performance under
multi-producer workloads when choosing between the two for their systems.

Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260120090716.82927-1-gyutae.opensource@navercorp.com
2026-01-20 11:37:25 -08:00
Ryota Sakamoto
db0c35ca36 kunit: add bash completion
Currently, kunit.py has many subcommands and options, making it difficult
to remember them without checking the help message.

Add --list-cmds and --list-opts to kunit.py to get available commands and
options, use those outputs in kunit-completion.sh to show completion.

This implementation is similar to perf and tools/perf/perf-completion.sh.

Example output:
  $ source tools/testing/kunit/kunit-completion.sh
  $ ./tools/testing/kunit/kunit.py [TAB][TAB]
  build   config  exec    parse   run
  $ ./tools/testing/kunit/kunit.py run --k[TAB][TAB]
  --kconfig_add  --kernel_args  --kunitconfig

Link: https://lore.kernel.org/r/20260117-kunit-completion-v2-1-cabd127d0801@gmail.com
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-20 11:17:52 -07:00
Michal Luczaj
5d54aa40c7 vsock/test: Do not filter kallsyms by symbol type
Blamed commit implemented logic to discover available vsock transports by
grepping /proc/kallsyms for known symbols. It incorrectly filtered entries
by type 'd'.

For some kernel configs having

    CONFIG_VIRTIO_VSOCKETS=m
    CONFIG_VSOCKETS_LOOPBACK=y

kallsyms reports

    0000000000000000 d virtio_transport	[vmw_vsock_virtio_transport]
    0000000000000000 t loopback_transport

Overzealous filtering might have affected vsock test suit, resulting in
insufficient/misleading testing.

Do not filter symbols by type. It never helped much.

Fixes: 3070c05b7a ("vsock/test: Introduce get_transports()")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260116-vsock_test-kallsyms-grep-v1-1-3320bc3346f2@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 16:45:39 +01:00
David Wei
931420a2fc selftests/net: Add netkit container tests
Add two tests using NetDrvContEnv. One basic test that sets up a netkit
pair, with one end in a netns. Use LOCAL_PREFIX_V6 and nk_forward BPF
program to ping from a remote host to the netkit in netns.

Second is a selftest for netkit queue leasing, using io_uring zero copy
test binary inside of a netns with netkit. This checks that memory
providers can be bound against virtual queues in a netkit within a
netns that are leasing from a physical netdev in the default netns.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-17-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 11:58:50 +01:00
David Wei
ab771c938d selftests/net: Make NetDrvContEnv support queue leasing
Add a new parameter `lease` to NetDrvContEnv that sets up queue leasing
in the env.

The NETIF also has some ethtool parameters changed to support memory
provider tests. This is needed in NetDrvContEnv rather than individual
test cases since the cleanup to restore NETIF can't be done, until the
netns in the env is gone.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-16-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 11:58:50 +01:00
David Wei
6be87fbb27 selftests/net: Add env for container based tests
Add an env NetDrvContEnv for container based selftests. This automates
the setup of a netns, netkit pair with one inside the netns, and a BPF
program that forwards skbs from the NETIF host inside the container.

Currently only netkit is used, but other virtual netdevs e.g. veth can
be used too.

Expect netkit container datapath selftests to have a publicly routable
IP prefix to assign to netkit in a container, such that packets will
land on eth0. The BPF skb forward program will then forward such packets
from the host netns to the container netns.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-15-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 11:58:50 +01:00
David Wei
61d99ce3df selftests/net: Add bpf skb forwarding program
Add nk_forward.bpf.c, a BPF program that forwards skbs matching some IPv6
prefix received on eth0 ifindex to a specified netkit ifindex. This will
be needed by netkit container tests.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-14-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20 11:58:50 +01:00
Victor Nogueira
2460f31e6e selftests/tc-testing: Try to add teql as a child qdisc
Add a selftest that attempts to add a teql qdisc as a qfq child.
Since teql _must_ be added as a root qdisc, the kernel should reject
this.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260114160243.913069-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19 12:06:42 -08:00
Christophe Leroy (CS GROUP)
d321d505ed selftests: net: csum: Fix printk format in recv_get_packet_csum_status()
Following warning is encountered when building selftests on powerpc/32.

  CC       csum
csum.c: In function 'recv_get_packet_csum_status':
csum.c:710:50: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
  710 |                         error(1, 0, "cmsg: len=%lu expected=%lu",
      |                                                ~~^
      |                                                  |
      |                                                  long unsigned int
      |                                                %u
  711 |                               cm->cmsg_len, CMSG_LEN(sizeof(struct tpacket_auxdata)));
      |                               ~~~~~~~~~~~~
      |                                 |
      |                                 size_t {aka unsigned int}
csum.c:710:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
  710 |                         error(1, 0, "cmsg: len=%lu expected=%lu",
      |                                                             ~~^
      |                                                               |
      |                                                               long unsigned int
      |                                                             %u

cm->cmsg_len has type __kernel_size_t and CMSG() macro has the type
returned by sizeof() which is size_t.

size_t is 'unsigned int' on some platforms and 'unsigned long' on
other ones so use %zu instead of %lu.

The code in question was introduced by
commit 91a7de8560 ("selftests/net: add csum offload test").

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8b69b40826553c1dd500d9d25e45883744f3f348.1768556791.git.chleroy@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19 10:09:47 -08:00
Dmitry Skorodumov
8becfe16e4 selftests: net: simple selftest for ipvtap
This is a simple ipvtap test to test handling
IP-address add/remove on ipvlan interface.

It creates a veth-interface and then creates several
network-namespace with ipvlan0 interface in it linked to veth.

Then it starts to add/remove addresses on ipvlan0 interfaces
in several threads.

At finish, it checks that there is no duplicated addresses.

Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@huawei.com>
Link: https://patch.msgid.link/20260112142417.4039566-3-skorodumov.dmitry@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19 10:03:31 -08:00
David Matlack
1c588bca3b vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
Drop the assertions about IOMMU mappings sizes for VFIO_TYPE1_IOMMU
modes (both the VFIO mode and the iommufd compatibility mode). These
assertions fail when CONFIG_IOMMUFD_VFIO_CONTAINER is enabled, since
iommufd compatibility mode provides different huge page behavior than
VFIO for VFIO_TYPE1_IOMMU. VFIO_TYPE1_IOMMU is an old enough interface
that it's not worth changing the behavior of VFIO and iommufd to match
nor care about the IOMMU mapping sizes.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lore.kernel.org/kvm/20260109143830.176dc279@shazbot.org/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114211252.2581145-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:06:30 -07:00
Alex Mastro
080723f4d4 vfio: selftests: Add vfio_dma_mapping_mmio_test
Test IOMMU mapping the BAR mmaps created during vfio_pci_device_setup().

All IOMMU modes are tested: vfio_type1 variants are expected to succeed,
while non-type1 modes are expected to fail. iommufd compat mode can be
updated to expect success once kernel support lands. Native iommufd will
not support mapping vaddrs backed by MMIO (it will support dma-buf based
MMIO mapping instead).

Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-3-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:06:30 -07:00
Alex Mastro
557dbdf6c4 vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
Update vfio_pci_bar_map() to align BAR mmaps for efficient huge page
mappings. The manual mmap alignment can be removed once mmap(!MAP_FIXED)
on vfio device fds improves to automatically return well-aligned
addresses.

Also add MADV_HUGEPAGE, which encourages the kernel to use huge pages
(e.g. when /sys/kernel/mm/transparent_hugepage/enabled is set to "madvise").

Drop MAP_FILE from mmap(). It is an ignored compatibility flag.

Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-2-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:06:30 -07:00
Alex Mastro
03b7c2d763 vfio: selftests: Centralize IOMMU mode name definitions
Replace scattered string literals with MODE_* macros in iommu.h. This
provides a single source of truth for IOMMU mode name strings.

Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-1-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:06:29 -07:00
Christian Brauner
f93bc86982 selftests: add dm-verity keyring selftests
Add selftests that verify the keyring behaves correctly.
For simplicity this works with dm-verity as a module.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-19 15:21:26 +01:00
UYeol Jo
59cac9d52b selftests/x86: Clean up sysret_rip coding style
Tidy up sysret_rip style (cast spacing, main(void), const placement).
No functional change intended.

Signed-off-by: UYeol Jo <jouyeol8739@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260111210126.74752-1-jouyeol8739@gmail.com
2026-01-19 12:06:11 +01:00
LeeYongjun
1deecf7805 selftests: ALSA: Remove unused variable in utimer-test
The variable 'i' in wrong_timers_test() is declared but never used.
This was detected by Cppcheck static analysis.

tools/testing/selftests/alsa/utimer-test.c:144:9: style: Unused variable: i [unusedVariable]

Remove it to clean up the code and silence the warning.

Signed-off-by: LeeYongjun <jun85566@gmail.com>
Link: https://patch.msgid.link/20260118065510.29644-1-jun85566@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-01-19 09:12:59 +01:00
Linus Torvalds
90a855e75a Landlock fix for v6.19-rc6
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCaWledRAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSBowBAID4pRQB8EKRNaoqadUWnoSE/wl929Y6KY7i
 FBf8aODOAP9lOtU/CjL5jHKjt00zKW5gX3o0LIuFzSeuzCFTVx2MBA==
 =Aef0
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
 "This fixes TCP handling, tests, documentation, non-audit elided code,
  and minor cosmetic changes"

* tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Clarify documentation for the IOCTL access right
  selftests/landlock: Properly close a file descriptor
  landlock: Improve the comment for domain_is_scoped
  selftests/landlock: Use scoped_base_variants.h for ptrace_test
  selftests/landlock: Fix missing semicolon
  selftests/landlock: Fix typo in fs_test
  landlock: Optimize stack usage when !CONFIG_AUDIT
  landlock: Fix spelling
  landlock: Clean up hook_ptrace_access_check()
  landlock: Improve erratum documentation
  landlock: Remove useless include
  landlock: Fix wrong type usage
  selftests/landlock: NULL-terminate unix pathname addresses
  selftests/landlock: Remove invalid unix socket bind()
  selftests/landlock: Add missing connect(minimal AF_UNSPEC) test
  selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
  landlock: Fix TCP handling of short AF_UNSPEC addresses
  landlock: Fix formatting
2026-01-18 15:15:47 -08:00
Linus Torvalds
e503f539dc Misc x86 fixes:
- Fix resctrl initialization on Hygon CPUs
 
  - Fix resctrl memory bandwidth counters
    on Hygon CPUs
 
  - Fix x86 self-tests build bug
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmlssTsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gpPBAAlLehAo2UyjalUQ03wakg0vt1hVQ71MG8
 W4xze/rk3vK8xxGwdaWbLnxZxsN7diY6mZt+3JZPfifMQh4UDhHoPSf9seanW2FU
 PndGS6w3IDDiUIAz0JBGdJraDdZyHLLfg1G7+Yp8+avHmKOBIf4NjULYdj+Og/WP
 205kXXhjZ6PaKTzzwkrzm5efZM23bDkx5jGVLxAs9plv7g9C+qRlSkbhL8tUsRSK
 twAx2L25/Sa0y1q9dHDGVoHEWDJNkCw+so36wn6BdYIovXhc9sYpLowF+HqMCpWB
 nsf6D87H7mrtJoSwF34rc3GqN2VkKUQqGM7J8kp2Yi+pB+DImaCVwjG6FNJJIDrt
 CweXSemVhegbfwLJEEflUTaVA/nOsCP/k+sYP8pvOE9TsVuT0QlUj1Ao7vbKEbZu
 TG0Ujzwgm1Y6j7vTxiSmHdQm6ff2xO+1acOt1f3wB7esA+Kpa2i5App1FU/PVFc6
 vO4qWOaYVg5EzUZWPWnpDWRX0nvfjklyx0SzBhhwKOnRjXupsYmEWwnHqpLyBbGf
 7+MfAANqtAzK2b71lZXCZ2yDfmR0jiFLOKhO8pmJueP2kuaxqe8zqisICuLsrV11
 93xw1V/YMdeMAbcQvo+Mkwe5aCfgcM9EOAJcXZHC+nVDXHCC5CiLAgQWpkPXlJGf
 mCXp1h5IEBU=
 =wTUl
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix resctrl initialization on Hygon CPUs

 - Fix resctrl memory bandwidth counters on Hygon CPUs

 - Fix x86 self-tests build bug

* tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86: Add selftests include path for kselftest.h after centralization
  x86/resctrl: Fix memory bandwidth counter width for Hygon
  x86/resctrl: Add missing resctrl initialization for Hygon
2026-01-18 11:34:11 -08:00
Yohei Kojima
342e31254f selftests: net: improve error handling in passive TFO test
Improve the error handling in passive TFO test to check the return value
from sendto(), and to fail if read() or fprintf() failed.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/24707c8133f7095c0e5a94afa69e75c3a80bf6e7.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17 18:01:26 -08:00
Yohei Kojima
52b4859730 selftests: net: fix passive TFO test to fail if child processes failed
Improve the passive TFO test to report failure if the server or the
client timed out or exited with non-zero status.

Before this commit, TFO test didn't fail even if exit(EXIT_FAILURE) is
added to the first line of the run_server() and run_client() functions.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/214d399caec2e5de7738ced5736829915d507e4e.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17 18:01:26 -08:00
Kees Cook
a120a832e3 lkdtm/bugs: Add __counted_by_ptr() test PTR_BOUNDS
Provide run-time validation of the __counted_by_ptr() annotation via
newly added PTR_BOUNDS LKDTM test.

Link: https://patch.msgid.link/20251020220118.1226740-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-17 11:00:37 -08:00
Thomas Weißschuh
d045e166d3 selftests: vDSO: getrandom: Fix path to s390 chacha implementation
The s390 vDSO source directory was recently moved,
but this reference was not updated.

Fixes: c0087d807a ("s390/vdso: Rename vdso64 to vdso")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-01-17 15:51:28 +01:00
Bala-Vignesh-Reddy
d9b40d7262 selftests/x86: Add selftests include path for kselftest.h after centralization
The previous change centralizing kselftest.h include path in lib.mk caused x86
selftests to fail, as x86 Makefile overwrites CFLAGS using ":=", dropping the
include path added in lib.mk. Therefore, helpers.h could not find kselftest.h
during compilation.

Fix this by adding the tools/testing/sefltest to CFLAGS in x86 Makefile.

  [ bp: Correct commit ID in Fixes: ]

Fixes: e6fbd1759c ("selftests: complete kselftest include centralization")
Closes: https://lore.kernel.org/lkml/CA+G9fYvKjQcCBMfXA-z2YuL2L+3Qd-pJjEUDX8PDdz2-EEQd=Q@mail.gmail.com/T/#m83fd330231287fc9d6c921155bee16c591db7360
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251022062948.162852-1-reddybalavignesh9979@gmail.com
2026-01-17 12:22:27 +01:00
Yonghong Song
efad162f5a selftests/bpf: Fix map_kptr test failure
On my arm64 machine, I get the following failure:
  ...
  tester_init:PASS:tester_log_buf 0 nsec
  process_subtest:PASS:obj_open_mem 0 nsec
  process_subtest:PASS:specs_alloc 0 nsec
  serial_test_map_kptr:PASS:rcu_tasks_trace_gp__open_and_load 0 nsec
  ...
  test_map_kptr_success:PASS:map_kptr__open_and_load 0 nsec
  test_map_kptr_success:PASS:test_map_kptr_ref1 refcount 0 nsec
  test_map_kptr_success:FAIL:test_map_kptr_ref1 retval unexpected error: 2 (errno 2)
  test_map_kptr_success:PASS:test_map_kptr_ref2 refcount 0 nsec
  test_map_kptr_success:FAIL:test_map_kptr_ref2 retval unexpected error: 1 (errno 2)
  ...
  #201/21  map_kptr/success-map:FAIL

In serial_test_map_kptr(), before test_map_kptr_success(), one
kern_sync_rcu() is used to have some delay for freeing the map.
But in my environment, one kern_sync_rcu() seems not enough and
caused the test failure.

In bpf_map_free_in_work() in syscall.c, the queue time for
  queue_work(system_dfl_wq, &map->work)
may be longer than expected. This may cause the test failure
since test_map_kptr_success() expects all previous maps having been freed.

Since it is not clear how long queue_work() time takes, a bpf prog
is added to count the reference after bpf_kfunc_call_test_acquire().
If the number of references is 2 (for initial ref and the one just
acquired), all previous maps should have been released. This will
resolve the above 'retval unexpected error' issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20260116052245.3692405-1-yonghong.song@linux.dev
2026-01-16 14:59:57 -08:00
Alan Maguire
47d440d0a5 selftests/bpf: Support when CONFIG_VXLAN=m
If CONFIG_VXLAN is 'm', struct vxlanhdr will not be in vmlinux.h.
Add a ___local variant to support cases where vxlan is a module.

Fixes: 8517b1abe5 ("selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260115163457.146267-1-alan.maguire@oracle.com
2026-01-16 14:54:10 -08:00
H. Peter Anvin
db7855c96d x86/entry/vdso/selftest: Update location of vgetrandom-chacha.S
As part of the vdso build restructuring, vgetrandom-chacha.S moved
into the vdso/vdso64 subdirectory. Update the selftest #include to
match.

Closes: https://lore.kernel.org/oe-lkp/202601161608.5cd5af9a-lkp@intel.com
Fixes: 693c819fed ("x86/entry/vdso: Refactor the vdso build")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260116204057.386268-4-hpa@zytor.com
2026-01-16 13:25:44 -08:00
Linus Torvalds
6782a30d20 cxl fixes for v6.19-rc6
- Recognize all ZONE_DEVICE users as physaddr consumers
 - Fix format string for extended_linear_cache_size_show()
 - Fix target list setup for multiple decoders sharing the same
   downstream port
 - Restore HBIW check before derefernce platform data
 - Fix potential infinite loop in __cxl_dpa_reserve()
 - Check for invalid addresses returned from translation functions on
   error.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmlqdOUACgkQYGjFFmlT
 OEp6bA/8C5uSMf4BMN+gJMerX1wTcVfmDy3nYyk80ZM4Gw2QqxrhtRzG3bfeYaXm
 IioxYVMWscqasN16xuX7uje2sSDvoPTZYJ2tfnzKfAbyReknlolXvjWJtgoSuVKj
 IGZDGLjuW0CcTpFW8NgNGiiSExnv6Iap7CnzRTXj72ZQkGcb8Goh4JL/CmAWrF8m
 HQDbFZWQHI60rOLgxTXgWzlNKej4/tZfCTRKV1KIhqRv+MCurzNPb62H+giCwGDh
 M+apYESCW3xCk2B7RbGmc6U+rcSmCNc3O31JL3Yt9TKPp4V5N0kaH8dd1196GnQX
 HOJFlDjV5J/HTJjk4vondyAcIcUaSw3T7IJLtBRbXOBrs1zQ76adV96UNSwSEQC4
 KaRjRf0uCsdDvwBw44VbxnWJiH6Y6jrMhKGPyCrXxFikX2EIR5I1l7+ytKDTNC/+
 wdAf3+ch3qc6PfTBtooAWzji9e8JFfWpby87UHkWSz0eeKedJQnMbKtbloc6tHLc
 fO/jiUbhay5QGfnvdPwL31zTGqvZB9eGUR2oBSasRJ7taWvURN2ibAzhAnTY8UTN
 EQrTZ7isXPfxHwEkaGtpFNWts0iX8kJYzl/wjZjWtukltT4NmMy7NfMlkecXWQ5N
 iueUMr6tHKYeDpmmFbdC+Y83LFSWv3i9TiINF1vMWuRj8MryRVU=
 =pIv9
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) fixes from Dave Jiang:

 - Recognize all ZONE_DEVICE users as physaddr consumers

 - Fix format string for extended_linear_cache_size_show()

 - Fix target list setup for multiple decoders sharing the same
   downstream port

 - Restore HBIW check before derefernce platform data

 - Fix potential infinite loop in __cxl_dpa_reserve()

 - Check for invalid addresses returned from translation functions on
   error

* tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Check for invalid addresses returned from translation functions on errors
  cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()
  cxl/acpi: Restore HBIW check before dereferencing platform_data
  cxl/port: Fix target list setup for multiple decoders sharing the same dport
  cxl/region: fix format string for resource_size_t
  x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
2026-01-16 13:09:28 -08:00
Ricardo B. Marlière
3ec6cefc39 selftests/run_kselftest.sh: Add --skip argument option
Currently the only way of excluding certain tests from a collection is by
passing all the other tests explicitly via `--test`. Therefore, if the user
wants to skip a single test the resulting command line might be too big,
depending on the collection. Add an option `--skip` that takes care of
that.

Link: https://lore.kernel.org/r/20260116-selftests-add_skip_opt-v1-1-ab54afaae81b@suse.com
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-16 13:46:40 -07:00
Christian Brauner
b8f7622aa6
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
Add tests for OPEN_TREE_NAMESPACE.

Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-2-bfb24c7b061f@kernel.org
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:21:40 +01:00
Puranjay Mohan
086c99fbe4 selftests: bpf: Add test for multiple syncs from linked register
Before the last commit, sync_linked_regs() corrupted the register whose
bounds are being updated by copying known_reg's id to it. The ids are
the same in value but known_reg has the BPF_ADD_CONST flag which is
wrongly copied to reg.

This later causes issues when creating new links to this reg.
assign_scalar_id_before_mov() sees this BPF_ADD_CONST and gives a new id
to this register and breaks the old links. This is exposed by the added
selftest.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260115151143.1344724-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-16 10:08:59 -08:00
Sean Christopherson
a91cc48246 KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
Update the nested dirty log test to validate KVM's handling of READ faults
when dirty logging is enabled.  Specifically, set the Dirty bit in the
guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
when handling L2 read faults.  When handling read faults in the shadow MMU,
KVM opportunistically creates a writable SPTE if the mapping can be
writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
if KVM doesn't need to intercept writes in order to emulate Dirty-bit
updates.

To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
sequences to separate L1 pages, and differentiate between "marked dirty
due to a WRITE access/fault" and "marked dirty due to creating a writable
SPTE for a READ access/fault".  The updated sequence exposes the bug fixed
by KVM commit 1f4e5fc83a ("KVM: x86: fix nested guest live migration
with PML") when the guest performs a READ=>WRITE sequence with dirty guest
PTEs.

Opportunistically tweak and rename the address macros, and add comments,
to make it more obvious what the test is doing.  E.g. NESTED_TEST_MEM1
vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
creating aliases in both the L2 GPA and GVA address spaces, but only when
L1 is using TDP to run L2.

Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260115172154.709024-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-16 07:48:54 -08:00
Ricardo B. Marlière
4f5f148dd7 selftests: net: fib-onlink-tests: Convert to use namespaces by default
Currently, the test breaks if the SUT already has a default route
configured for IPv6. Fix by avoiding the use of the default namespace.

Fixes: 4ed591c8ab ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260113-selftests-net-fib-onlink-v2-1-89de2b931389@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15 19:58:21 -08:00
Michal Luczaj
a63e5fe095 vsock/test: Add test for a linear and non-linear skb getting coalesced
Loopback transport can mangle data in rx queue when a linear skb is
followed by a small MSG_ZEROCOPY packet.

To exercise the logic, send out two packets: a weirdly sized one (to ensure
some spare tail room in the skb) and a zerocopy one that's small enough to
fit in the spare room of its predecessor. Then, wait for both to land in
the rx queue, and check the data received. Faulty packets merger manifests
itself by corrupting payload of the later packet.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260113-vsock-recv-coalescence-v2-2-552b17837cf4@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15 19:44:44 -08:00
Jakub Kicinski
c27022497d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.19-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15 18:02:48 -08:00
Jiri Olsa
934d9746ed selftests/bpf: Add test for bpf_override_return helper
We do not actually test the bpf_override_return helper functionality
itself at the moment, only the bpf program being able to attach it.

Adding test that override prctl syscall return value on top of
kprobe and kprobe.multi.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20260112121157.854473-2-jolsa@kernel.org
2026-01-15 16:15:26 -08:00
Linus Torvalds
13b2d15d99 32 hotfixes. 16 are cc:stable, 24 are for MM.
- four kerneldoc fixes from Bagas Sanjaya
 
 - four DAMON fixes from SeongJae
 
 - four mremap VMA-related fixes from Lorenzo
 
 - various singletons - please see the changelogs for details
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaWkP3gAKCRDdBJ7gKXxA
 jnmTAQCBiCyw7Pvvak5OM8Of0qQ8m/jQmNMBPRrd5M3tAAEOlwD9F7E6RjcmyItU
 k+sboDgNKTvgdRHTmRzj1t96aXJrSQQ=
 =OxTh
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:

 - kerneldoc fixes from Bagas Sanjaya

 - DAMON fixes from SeongJae

 - mremap VMA-related fixes from Lorenzo

 - various singletons - please see the changelogs for details

* tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (30 commits)
  drivers/dax: add some missing kerneldoc comment fields for struct dev_dax
  mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
  mailmap: add entry for Daniel Thompson
  tools/testing/selftests: fix gup_longterm for unknown fs
  mm/page_alloc: prevent pcp corruption with SMP=n
  iommu/sva: include mmu_notifier.h header
  mm: kmsan: fix poisoning of high-order non-compound pages
  tools/testing/selftests: add forked (un)/faulted VMA merge tests
  mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
  tools/testing/selftests: add tests for !tgt, src mremap() merges
  mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
  mm/zswap: fix error pointer free in zswap_cpu_comp_prepare()
  mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
  mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
  mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
  mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
  mm/damon/core: remove call_control in inactive contexts
  powerpc/watchdog: add support for hardlockup_sys_info sysctl
  mips: fix HIGHMEM initialization
  mm/hugetlb: ignore hugepage kernel args if hugepages are unsupported
  ...
2026-01-15 10:47:14 -08:00
Linus Torvalds
9e995c573b Including fixes from bluetooth, can and IPsec.
Current release - regressions:
 
   - net: add net.core.qdisc_max_burst
 
   - can: propagate CAN device capabilities via ml_priv
 
 Previous releases - regressions:
 
   - dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
 
   - ipv6: fix use-after-free in inet6_addr_del().
 
   - xfrm: fix inner mode lookup in tunnel mode GSO segmentation
 
   - ip_tunnel: spread netdev_lockdep_set_classes()
 
   - ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()
 
   - bluetooth: hci_sync: enable PA sync lost event
 
   - eth: virtio-net:
     - fix the deadlock when disabling rx NAPI
     - fix misalignment bug in struct virtnet_info
 
 Previous releases - always broken:
 
   - ipv4: ip_gre: make ipgre_header() robust
 
   - can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.
 
   - eth: mlx5e: profile change fix
 
   - eth: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
 
   - eth: macvlan: fix possible UAF in macvlan_forward_source()
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlo9uMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOk4iAP/i3I0frORQbcM/NI8sqjQsc7w84SbBB9
 K3FyfA+pfmS+sa5eZBan/eEe3tSQM5xn6XBflZXVxTC2QBxfiH3OVPWGconA7WRk
 KIizSl340vtMhBMlUlXyox8BG7nn+pxjI3BqB5aTHWvEolS9eUgxtJJEDC1MnRzw
 ZkarPXmNTLK6pDWAZ/CYjjBVnaUOt1V2ceRFfRU1SK/ROG7ajpfVLrSXZ4gICc/8
 +br2g4p8hZx6sDYTY4lV/fMMzbbu3n8rRSoqgy8UDAfDJgglifvm4Ocu+EW5mLLG
 IlNobJghtCUcbQN4X8GPg+6cW1yA94qeyHZSHD0Zg3lN7Hfa92WnhVwoqqjCgETC
 ppXnj0zRZN81TeIQx9UIJLXxBZOt+3SSpvdaYdruKg2Sx0rsKraWLjXlResiehzw
 DFZb893SFkJgPEVCcndzGi2I5/MCX3skNQq8+Od73Xcjbl35qiXafGa856R2Z5Ne
 nSEE6BdnYvQAuCwCJ01nFJLQw3ot1H1TO2hF9QEDQ7K42zp2Yj5EKVcYQLAzlAAk
 rzb3D9VIpEpSaXTIpJwa+fZUnpe649J+Rgg71fdj4FJUoNqYErvHQ6QzblmhdThr
 efdpHg1PN3EvpVao27eSiDLl7DGOL8kDmXDpvIhFctKMVhw8mFhdFDmWenGTRQaT
 u2hT+oC19rEG
 =zUPN
 -----END PGP SIGNATURE-----

Merge tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth, can and IPsec.

  Current release - regressions:

   - net: add net.core.qdisc_max_burst

   - can: propagate CAN device capabilities via ml_priv

  Previous releases - regressions:

   - dst: fix races in rt6_uncached_list_del() and
     rt_del_uncached_list()

   - ipv6: fix use-after-free in inet6_addr_del().

   - xfrm: fix inner mode lookup in tunnel mode GSO segmentation

   - ip_tunnel: spread netdev_lockdep_set_classes()

   - ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()

   - bluetooth: hci_sync: enable PA sync lost event

   - eth: virtio-net:
      - fix the deadlock when disabling rx NAPI
      - fix misalignment bug in struct virtnet_info

  Previous releases - always broken:

   - ipv4: ip_gre: make ipgre_header() robust

   - can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.

   - eth:
      - mlx5e: profile change fix
      - octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
      - macvlan: fix possible UAF in macvlan_forward_source()"

* tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  virtio_net: Fix misalignment bug in struct virtnet_info
  net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
  can: raw: instantly reject disabled CAN frames
  can: propagate CAN device capabilities via ml_priv
  Revert "can: raw: instantly reject unsupported CAN frames"
  net/sched: sch_qfq: do not free existing class in qfq_change_class()
  selftests: drv-net: fix RPS mask handling for high CPU numbers
  selftests: drv-net: fix RPS mask handling in toeplitz test
  ipv6: Fix use-after-free in inet6_addr_del().
  dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
  net: hv_netvsc: reject RSS hash key programming without RX indirection table
  tools: ynl: render event op docs correctly
  net: add net.core.qdisc_max_burst
  net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
  net: phy: motorcomm: fix duplex setting error for phy leds
  net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
  net/mlx5e: Restore destroying state bit after profile cleanup
  net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
  net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
  net/mlx5e: Fix crash on profile change rollback failure
  ...
2026-01-15 10:11:11 -08:00
Fuad Tabba
e0a99a2b72 KVM: selftests: Fix typos and stale comments in kvm_util
Fix minor documentation errors in `kvm_util.h` and `kvm_util.c`.

- Correct the argument description for `vcpu_args_set` in `kvm_util.h`,
  which incorrectly listed `vm` instead of `vcpu`.
- Fix a typo in the comment for `kvm_selftest_arch_init` ("exeucting" ->
  "executing").
- Correct the return value description for `vm_vaddr_unused_gap` in
  `kvm_util.c` to match the implementation, which returns an address "at
  or above" `vaddr_min`, not "at or below".

No functional change intended.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 13:39:53 +00:00
Fuad Tabba
de00d07321 KVM: selftests: Move page_align() to shared header
To avoid code duplication, move page_align() to the shared `kvm_util.h`
header file. Rename it to vm_page_align(), to make it clear that the
alignment is done with respect to the guest's base page size.

No functional change intended.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 13:39:53 +00:00
Fuad Tabba
582b39463f KVM: riscv: selftests: Fix incorrect rounding in page_align()
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.

Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.

Fixes: 3e06cdf105 ("KVM: selftests: Add initial support for RISC-V 64-bit")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 13:39:53 +00:00
Fuad Tabba
dd0c5d04d1 KVM: arm64: selftests: Fix incorrect rounding in page_align()
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.

Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.

Fixes: 7a6629ef74 ("kvm: selftests: add virt mem support for aarch64")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 13:39:53 +00:00
Fuad Tabba
7e03d07d03 KVM: arm64: selftests: Disable unused TTBR1_EL1 translations
KVM selftests map all guest code and data into the lower virtual address
range (0x0000...) managed by TTBR0_EL1. The upper range (0xFFFF...)
managed by TTBR1_EL1 is unused and uninitialized.

If a guest accesses the upper range, the MMU attempts a translation
table walk using uninitialized registers, leading to unpredictable
behavior.

Set `TCR_EL1.EPD1` to disable translation table walks for TTBR1_EL1,
ensuring that any access to the upper range generates an immediate
Translation Fault. Additionally, set `TCR_EL1.TBI1` (Top Byte Ignore) to
ensure that tagged pointers in the upper range also deterministically
trigger a Translation Fault via EPD1.

Define `TCR_EPD1_MASK`, `TCR_EPD1_SHIFT`, and `TCR_TBI1` in
`processor.h` to support this configuration. These are based on their
definitions in `arch/arm64/include/asm/pgtable-hwdef.h`.

Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 13:39:53 +00:00
Marc Zyngier
b638a9d0f8 KVM: arm64: selftests: Add a test for FEAT_IDST
Add a very basic test checking that FEAT_IDST actually works for
the {GMID,SMIDR,CSSIDR2}_EL1 registers.

Link: https://patch.msgid.link/20260108173233.2911955-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 11:58:57 +00:00
Lorenzo Stoakes
21c68ad1d9 tools/testing/selftests: fix gup_longterm for unknown fs
Commit 66bce7afba ("selftests/mm: fix test result reporting in
gup_longterm") introduced a small bug causing unknown filesystems to
always result in a test failure.

This is because do_test() was updated to use a common reporting path, but
this case appears to have been missed.

This is problematic for e.g.  virtme-ng which uses an overlayfs file
system, causing gup_longterm to appear to fail each time due to a test
count mismatch:

	# Planned tests != run tests (50 != 46)
	# Totals: pass:24 fail:0 xfail:0 xpass:0 skip:22 error:0

The fix is to simply change the return into a break.

Link: https://lkml.kernel.org/r/20260106154547.214907-1-lorenzo.stoakes@oracle.com
Fixes: 66bce7afba ("selftests/mm: fix test result reporting in gup_longterm")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:26 -08:00
Lorenzo Stoakes
fb39444732 tools/testing/selftests: add forked (un)/faulted VMA merge tests
Now we correctly handle forked faulted/unfaulted merge on mremap(),
exhaustively assert that we handle this correctly.

Do this in the less duplicative way by adding a new merge_with_fork
fixture and forked/unforked variants, and abstract the forking logic as
necessary to avoid code duplication with this also.

Link: https://lkml.kernel.org/r/1daf76d89fdb9d96f38a6a0152d8f3c2e9e30ac7.1767638272.git.lorenzo.stoakes@oracle.com
Fixes: 879bca0a2c ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:25 -08:00
Lorenzo Stoakes
0ace8f2db6 tools/testing/selftests: add tests for !tgt, src mremap() merges
Test that mremap()'ing a VMA into a position such that the target VMA on
merge is unfaulted and the source faulted is correctly performed.

We cover 4 cases:

    1. Previous VMA unfaulted:

                  copied -----|
                              v
            |-----------|.............|
            | unfaulted |(faulted VMA)|
            |-----------|.............|
                 prev

    target = prev, expand prev to cover.

    2. Next VMA unfaulted:

                  copied -----|
                              v
                        |.............|-----------|
                        |(faulted VMA)| unfaulted |
                        |.............|-----------|
                                          next

    target = next, expand next to cover.

    3. Both adjacent VMAs unfaulted:

                  copied -----|
                              v
            |-----------|.............|-----------|
            | unfaulted |(faulted VMA)| unfaulted |
            |-----------|.............|-----------|
                 prev                      next

    target = prev, expand prev to cover.

    4. prev unfaulted, next faulted:

                  copied -----|
                              v
            |-----------|.............|-----------|
            | unfaulted |(faulted VMA)|  faulted  |
            |-----------|.............|-----------|
                 prev                      next

    target = prev, expand prev to cover. Essentially equivalent to 3, but
    with additional requirement that next's anon_vma is the same as the
    copied VMA's.

Each of these are performed with MREMAP_DONTUNMAP set, which will cause a
KASAN assert for UAF or an assert on zero refcount anon_vma if a bug
exists with correctly propagating anon_vma state in each scenario.

Link: https://lkml.kernel.org/r/f903af2930c7c2c6e0948c886b58d0f42d8e8ba3.1767638272.git.lorenzo.stoakes@oracle.com
Fixes: 879bca0a2c ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:24 -08:00
Anton Protopopov
7c8e817e44 selftests/bpf: Extend live regs tests with a test for gotox
Add a test which checks that the destination register of a gotox
instruction is marked as used and that the union of jump targets
is considered as live.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260114162544.83253-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-14 19:08:09 -08:00
Alexei Starovoitov
e3d0dbb3b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Adjacent:
Auto-merging MAINTAINERS
Auto-merging Makefile
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/sched/ext.c
Auto-merging mm/memcontrol.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-14 15:22:01 -08:00
Yosry Ahmed
55058e3215 KVM: selftests: Add a selftests for nested VMLOAD/VMSAVE
Add a test for VMLOAD/VMSAVE in an L2 guest. The test verifies that L1
intercepts for VMSAVE/VMLOAD always work regardless of
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK.

Then, more interestingly, it makes sure that when L1 does not intercept
VMLOAD/VMSAVE, they work as intended in L2. When
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is enabled by L1, VMSAVE/VMLOAD from
L2 should interpret the GPA as an L2 GPA and translate it through the
NPT. When VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is disabled by L1,
VMSAVE/VMLOAD from L2 should interpret the GPA as an L1 GPA.

To test this, put two VMCBs (0 and 1) in L1's physical address space,
and have a single L2 GPA where:
- L2 VMCB GPA == L1 VMCB(0) GPA
- L2 VMCB GPA maps to L1 VMCB(1) via the NPT in L1.

This setup allows detecting how the GPA is interpreted based on which L1
VMCB is actually accessed.

In both cases, L2 sets KERNEL_GS_BASE (one of the fields handled by
VMSAVE/VMLOAD), and executes VMSAVE to write its value to the VMCB. The
test userspace code then checks that the write was made to the correct
VMCB (based on whether VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is set by L1),
and writes a new value to that VMCB. L2 then executes VMLOAD to load the
new value and makes sure it's reflected correctly in KERNERL_GS_BASE.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260110004821.3411245-4-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14 14:09:10 -08:00
Yosry Ahmed
f756ed82c6 KVM: selftests: Slightly simplify memstress_setup_nested()
Instead of calling memstress_setup_ept_mappings() only in the first
iteration in the loop, move it before the loop.

The call needed to happen within the loop before commit e40e72fec0
("KVM: selftests: Stop passing VMX metadata to TDP mapping functions"),
as memstress_setup_ept_mappings() used to take in a pointer to vmx_pages
and pass it into tdp_identity_map_1g() (to get the EPT root GPA). This
is no longer the case, as tdp_identity_map_1g() gets the EPT root
through stage2 MMU.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260113171456.2097312-1-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14 09:06:45 -08:00
Thomas Weißschuh
7158fc54b2 vdso: Remove struct getcpu_cache
The cache parameter of getcpu() is useless nowadays for various reasons.

  * It is never passed by userspace for either the vDSO or syscalls.
  * It is never used by the kernel.
  * It could not be made to work on the current vDSO architecture.
  * The structure definition is not part of the UAPI headers.
  * vdso_getcpu() is superseded by restartable sequences in any case.

Remove the struct and its header.

As a side-effect this gets rid of an unwanted inclusion of the linux/
header namespace from vDSO code.

[ tglx: Adapt to s390 upstream changes */

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Link: https://patch.msgid.link/20251230-getcpu_cache-v3-1-fb9c5f880ebe@linutronix.de
2026-01-14 08:56:40 +01:00
Linus Torvalds
c537e12dae bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmlm+x8ACgkQ6rmadz2v
 bTrC5Q/+NSCjBpHAinVL/09pahh0YJKczxWnpHG0rs6hSSZ88o+1uY+z3PQ3lRX9
 cHSwjIicRvwpX+DosGcXESqspbvo5MLx5DcB0WBSCnLodJ5MDap0iWltBCrGzqme
 mpeuZiQMSpsAGQLjICiywxcCndoaKEMSRZFmjlzNVJPbYkbxbZCQvwUQqg+Z4rQX
 t5zBJjb0X3Oaz3TcSiiW+yulZPoejFyF4P+4eTmFy+mAqq5D7CpMXrrjVTf78mSf
 IqMPbY/PssTExzna2d/ZKtxmKQ+HF6lN7+JaTuqbQsqQtxwh4eXsswezY/UKX7j+
 FdAuwz6W7ybO8UC3i4fvA1jFR60Hi5bBBoSXu+0hb/UGQQYXc+gRtB8DxqeUEgeb
 oVByGwrBWLRSLuBbQVPl3PRTNgwpBn41DkyIpXS6DK6thzdAqZ8zM15vFTYbWanF
 W/skdubkgIdhREunXMpLfZD5h0b/pXIYxzA59x//hZ+W8qvTuaabICja18x3SnqY
 hytoIQMo5d1UytyngfcHvXe79DCi5xFAqqVj0AqYR+CDdt5n9vrXgavX76hrYeHe
 /yYUHZ0h5A564zMXITMmUWDWvgmvTs1Q2hiV+rmtWPJ1bpOpPFxUosIWEBl4edA3
 vKlGoh/CtU9NJp/E/7nYdUV1vXlLJaareGQ/UXpFBl4NYsnNe4w=
 =Vt4X
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
   Dong)

 - Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)

 - Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)

 - Check that BPF insn array is not allowed as a map for const strings
   (Deepanshu Kartikey)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix reference count leak in bpf_prog_test_run_xdp()
  bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
  selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
  bpf, test_run: Subtract size of xdp_frame from allowed metadata size
  riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
2026-01-13 21:21:13 -08:00
Anton Protopopov
c656807675 selftests/bpf: Add tests for loading insn array values with offsets
The ldimm64 instruction for map value supports an offset.
For insn array maps it wasn't tested before, as normally
such instructions aren't generated. However, this is still
possible to pass such instructions, so add a few tests to
check that correct offsets work properly and incorrect
offsets are rejected.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260111153047.8388-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:36:47 -08:00
Matt Bobrowski
bbdbed193b selftests/bpf: assert BPF kfunc default trusted pointer semantics
The BPF verifier was recently updated to treat pointers to struct types
returned from BPF kfuncs as implicitly trusted by default. Add a new
test case to exercise this new implicit trust semantic.

The KF_ACQUIRE flag was dropped from the bpf_get_root_mem_cgroup()
kfunc because it returns a global pointer to root_mem_cgroup without
performing any explicit reference counting. This makes it an ideal
candidate to verify the new implicit trusted pointer semantics.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-3-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:19:13 -08:00
Matt Bobrowski
f8ade2342e bpf: return PTR_TO_BTF_ID | PTR_TRUSTED from BPF kfuncs by default
Teach the BPF verifier to treat pointers to struct types returned from
BPF kfuncs as implicitly trusted (PTR_TO_BTF_ID | PTR_TRUSTED) by
default. Returning untrusted pointers to struct types from BPF kfuncs
should be considered an exception only, and certainly not the norm.

Update existing selftests to reflect the change in register type
printing (e.g. `ptr_` becoming `trusted_ptr_` in verifier error
messages).

Link: https://lore.kernel.org/bpf/aV4nbCaMfIoM0awM@google.com/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:19:13 -08:00
Gal Pressman
cf055f8c00 selftests: drv-net: fix RPS mask handling for high CPU numbers
The RPS bitmask bounds check uses ~(RPS_MAX_CPUS - 1) which equals ~15 =
0xfff0, only allowing CPUs 0-3.

Change the mask to ~((1UL << RPS_MAX_CPUS) - 1) = ~0xffff to allow CPUs
0-15.

Fixes: 5ebfb4cc30 ("selftests/net: toeplitz test")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112173715.384843-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 19:13:08 -08:00
Gal Pressman
9d48c62f6b selftests: drv-net: fix RPS mask handling in toeplitz test
The toeplitz.py test passed the hex mask without "0x" prefix (e.g.,
"300" for CPUs 8,9). The toeplitz.c strtoul() call wrongly parsed this
as decimal 300 (0x12c) instead of hex 0x300.

Pass the prefixed mask to toeplitz.c, and the unprefixed one to sysfs.

Fixes: 9cf9aa77a1 ("selftests: drv-net: hw: convert the Toeplitz test to Python")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112173715.384843-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 19:13:08 -08:00
Ido Schimmel
f8f9ee9d8b selftests: fib-onlink: Add test cases for nexthop device mismatch
Add test cases that verify that when the "onlink" keyword is specified,
both address families (with and without VRF) accept routes with a
gateway address that is reachable via a different interface than the one
specified.

Output without "ipv6: Allow for nexthop device mismatch with "onlink"":

 # ./fib-onlink-tests.sh | grep mismatch
 TEST: nexthop device mismatch                             [ OK ]
 TEST: nexthop device mismatch                             [ OK ]
 TEST: nexthop device mismatch                             [FAIL]
 TEST: nexthop device mismatch                             [FAIL]

Output with "ipv6: Allow for nexthop device mismatch with "onlink"":

 # ./fib-onlink-tests.sh | grep mismatch
 TEST: nexthop device mismatch                             [ OK ]
 TEST: nexthop device mismatch                             [ OK ]
 TEST: nexthop device mismatch                             [ OK ]
 TEST: nexthop device mismatch                             [ OK ]

That is, the IPv4 tests were always passing, but the IPv6 ones only pass
after the specified patch.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 18:57:35 -08:00
Ido Schimmel
9bf8345fb3 selftests: fib-onlink: Add a test case for IPv4 multicast gateway
A multicast gateway address should be rejected when "onlink" is
specified, but it is only tested as part of the IPv6 tests. Add an
equivalent IPv4 test.

 # ./fib-onlink-tests.sh -v
 [...]
 COMMAND: ip ro add table 254 169.254.101.12/32 via 233.252.0.1 dev veth1 onlink
 Error: Nexthop has invalid gateway.

 TEST: Invalid gw - multicast address                      [ OK ]
 [...]
 COMMAND: ip ro add table 1101 169.254.102.12/32 via 233.252.0.1 dev veth5 onlink
 Error: Nexthop has invalid gateway.

 TEST: Invalid gw - multicast address, VRF                 [ OK ]
 [...]
 Tests passed:  37
 Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 18:57:34 -08:00
Ido Schimmel
0a3419f4ba selftests: fib-onlink: Remove "wrong nexthop device" IPv6 tests
The command in the test fails as expected because IPv6 forbids a nexthop
device mismatch:

 # ./fib-onlink-tests.sh -v
 [...]
 COMMAND: ip -6 ro add table 1101 2001:db8:102::103/128 via 2001:db8:701::64 dev veth5 onlink
 Error: Nexthop has invalid gateway or device mismatch.

 TEST: Gateway resolves to wrong nexthop device - VRF      [ OK ]
 [...]

Where:

 # ip route get 2001:db8:701::64 vrf lisa
 2001:db8:701::64 dev veth7 table 1101 proto kernel src 2001:db8:701::1 metric 256 pref medium

This is in contrast to IPv4 where a nexthop device mismatch is allowed
when "onlink" is specified:

 # ip route get 169.254.7.2 vrf lisa
 169.254.7.2 dev veth7 table 1101 src 169.254.7.1 uid 0
 # ip ro add table 1101 169.254.102.103/32 via 169.254.7.2 dev veth5 onlink
 # echo $?
 0

Remove these tests in preparation for aligning IPv6 with IPv4 and
allowing nexthop device mismatch when "onlink" is specified.

A subsequent patch will add tests that verify that both address families
allow a nexthop device mismatch with "onlink".

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 18:57:34 -08:00
Ido Schimmel
e5566f6b1d selftests: fib-onlink: Remove "wrong nexthop device" IPv4 tests
According to the test description, these tests fail because of a wrong
nexthop device:

 # ./fib-onlink-tests.sh -v
 [...]
 COMMAND: ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth1 onlink
 Error: Nexthop has invalid gateway.

 TEST: Gateway resolves to wrong nexthop device            [ OK ]
 COMMAND: ip ro add table 1101 169.254.102.103/32 via 169.254.7.1 dev veth5 onlink
 Error: Nexthop has invalid gateway.

 TEST: Gateway resolves to wrong nexthop device - VRF      [ OK ]
 [...]

But this is incorrect. They fail because the gateway addresses are local
addresses:

 # ip -4 address show
 [...]
 28: veth3@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.3.1/24 scope global veth3
 [...]
 32: veth7@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lisa state UP group default qlen 1000 link-netns peer_ns-Urqh3o
     inet 169.254.7.1/24 scope global veth7

Therefore, using a local address that matches the nexthop device fails
as well:

 # ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth3 onlink
 Error: Nexthop has invalid gateway.

Using a gateway address with a "wrong" nexthop device is actually valid
and allowed:

 # ip route get 169.254.1.2
 169.254.1.2 dev veth1 src 169.254.1.1 uid 0
 # ip ro add table 254 169.254.101.102/32 via 169.254.1.2 dev veth3 onlink
 # echo $?
 0

Remove these tests given that their output is confusing and that the
scenario that they are testing is already covered by other tests.

A subsequent patch will add tests for the nexthop device mismatch
scenario.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 18:57:34 -08:00
Pavel Begunkov
a32bb32d01 selftests: iou-zcrx: test large chunk sizes
Add a test using large chunks for zcrx memory area.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-14 02:13:37 +00:00
Jakub Kicinski
fe074aaa53 selftests: drv-net: gro: break out all individual test cases
GRO test groups the cases into categories, e.g. "tcp" case
checks coalescing in presence of:
 - packets with bad csum,
 - sequence number mismatch,
 - timestamp option value mismatch,
 - different TCP options.

Since we now have TAP support grouping the cases like that
lowers our reporting granularity. This matters even more for
NICs performing HW GRO and LRO since it appears that most
implementation have _some_ bugs. Flagging the whole group
of tests as failed prevents us from catching regressions
in the things that work today.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:01 -08:00
Jakub Kicinski
d3b35898de selftests: drv-net: gro: run the test against HW GRO and LRO
Run the test against HW GRO and LRO. NICs I have pass the base cases.
Interestingly all are happy to build GROs larger than 64k.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:01 -08:00
Jakub Kicinski
8171f6a76b selftests: drv-net: gro: improve feature config
We'll need to do a lot more feature handling to test HW-GRO and LRO.
Clean up the feature handling for SW GRO a bit to let the next commit
focus on the new test cases, only.

Make sure HW GRO-like features are not enabled for the SW tests.
Be more careful about changing features as "nothing changed"
situations may result in non-zero error code from ethtool.

Don't disable TSO on the local interface (receiver) when running over
netdevsim, we just want GSO to break up the segments on the sender.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:01 -08:00
Jakub Kicinski
d131da6d72 selftests: drv-net: gro: use cmd print
Now that cmd() can be printed directly remove the old formatting.

Before:

  # fragmented ip6 doesn't coalesce:
  # Expected {200 100 100 }, Total 3 packets
  # Received {200 100 }, Total 2 packets.
  # /root/ksft-net-drv/drivers/net/gro: incorrect number of packets

Now:

  # CMD: drivers/net/gro --ipv6 --dmac 9e:[...]
  #   EXIT: 1
  #   STDOUT: fragmented ip6 doesn't coalesce:
  #   STDERR: Expected {200 100 100 }, Total 3 packets
  #           Received {200 100 }, Total 2 packets.
  #           /root/ksft-net-drv/drivers/net/gro: incorrect number of packets

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:00 -08:00
Jakub Kicinski
ce0f92dc73 selftests: net: py: teach cmd() how to print itself
Teach cmd() how to print itself, to make debug prints easier.
Example output (leading # due to ksft_pr()):

  # CMD: /root/ksft-net-drv/drivers/net/gro
  #   EXIT: 1
  #   STDOUT: ipv6 with ext header does coalesce:
  #   STDERR: Expected {200 }, Total 1 packets
  #           Received {100 [!=200]100 [!=0]}, Total 2 packets.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:00 -08:00
Jakub Kicinski
b324192e36 selftests: net: py: teach ksft_pr() multi-line safety
Make printing multi-line logs easier by automatically prefixing
each line in ksft_pr(). Make use of this when formatting exceptions.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 17:54:00 -08:00
Sean Christopherson
d7507a94a0 KVM: SVM: Treat exit_code as an unsigned 64-bit value through all of KVM
Fix KVM's long-standing buggy handling of SVM's exit_code as a 32-bit
value.  Per the APM and Xen commit d1bd157fbc ("Big merge the HVM
full-virtualisation abstractions.") (which is arguably more trustworthy
than KVM), offset 0x70 is a single 64-bit value:

  070h 63:0 EXITCODE

Track exit_code as a single u64 to prevent reintroducing bugs where KVM
neglects to correctly set bits 63:32.

Fixes: 6aa8b732ca ("[PATCH] kvm: userspace interface")
Cc: Jim Mattson <jmattson@google.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230211347.4099600-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13 17:37:03 -08:00
Sean Christopherson
c3a9a27c79 KVM: selftests: Add a test to verify APICv updates (while L2 is active)
Add a test to verify KVM correctly handles a variety of edge cases related
to APICv updates, and in particular updates that are triggered while L2 is
actively running.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20260109034532.1012993-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13 17:35:31 -08:00
Donglin Peng
a3acd7d434 selftests/bpf: Add test cases for btf__permute functionality
This patch introduces test cases for the btf__permute function to ensure
it works correctly with both base BTF and split BTF scenarios.

The test suite includes:
- test_permute_base: Validates permutation on base BTF
- test_permute_split: Tests permutation on split BTF

Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20260109130003.3313716-3-dolinux.peng@gmail.com
2026-01-13 16:10:39 -08:00
Willem de Bruijn
c65182ef9d selftests: net: reduce txtimestamp deschedule flakes
This test occasionally fails due to exceeding timing bounds, as
run in continuous testing on netdev.bots:

  https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh

A common pattern is a single elevated delay between USR and SND.

    # 8.36 [+0.00] test SND
    # 8.36 [+0.00]     USR: 1767864384 s 240994 us (seq=0, len=0)
    # 8.44 [+0.08] ERROR: 18461 us expected between 10000 and 18000
    # 8.44 [+0.00]     SND: 1767864384 s 259455 us (seq=42, len=10)  (USR +18460 us)
    # 8.52 [+0.07]     SND: 1767864384 s 339523 us (seq=42, len=10)  (USR +10005 us)
    # 8.52 [+0.00]     USR: 1767864384 s 409580 us (seq=0, len=0)
    # 8.60 [+0.08]     SND: 1767864384 s 419586 us (seq=42, len=10)  (USR +10005 us)
    # 8.60 [+0.00]     USR: 1767864384 s 489645 us (seq=0, len=0)
    # 8.68 [+0.08]     SND: 1767864384 s 499651 us (seq=42, len=10)  (USR +10005 us)
    # 8.68 [+0.00]     USR-SND: count=4, avg=12119 us, min=10005 us, max=18460 us

(Note that other delays are nowhere near the large 8ms tolerance.)

One hypothesis is that the task is descheduled between taking the USR
timestamp and sending the packet. Possibly in printing.

Delay taking the timestamp closer to sendmsg, and delay printing until
after sendmsg.

With this change, failure rate is significantly lower in current runs.

Link: https://lore.kernel.org/netdev/20260107110521.1aab55e9@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112163355.3510150-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 11:46:39 -08:00
Linus Torvalds
0bb933a9fc x86 fixes:
- Avoid freeing stack-allocated node in kvm_async_pf_queue_task
 
 - Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmlh7zYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM6bQgArr0BI1b34xyPmsGk3J0+mUBSeYwp
 0aVkegq4BLTDNDY7/yz7jXf9wL0x8uLps/+PJjR3QKpjAAMCwwWi5iQrwoVLsOjO
 N9S15Np4Be1fKvpHoB+8z7/uGFhHr9ltpO5wsDOdR2/lpEnZlTZZtD2Wl093DsKV
 xoCbddcfgRmx78NV15GnXSTFeTPodHSLQL+96oVN4XZRZVLefYsISnZa66VA5B/2
 pl9SndD/zMMo8j6YrrImtuQL6Km+DlqNqNddAXgvQ6JIJ3rO/WgLyxJPaV8trC4Q
 n6x8hQcFCRqM+Oi9G0+b4LgfQUPg/0hxPSIZi8yeZAZpJhMb7iEetLXc6A==
 =Kew3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm fixes from Paolo Bonzini:

 - Avoid freeing stack-allocated node in kvm_async_pf_queue_task

 - Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1
  selftests: kvm: try getting XFD and XSAVE state out of sync
  selftests: kvm: replace numbered sync points with actions
  x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
  x86/kvm: Avoid freeing stack-allocated node in kvm_async_pf_queue_task
2026-01-13 09:50:07 -08:00
Alexei Starovoitov
9160335317 selftests/bpf: Add tests for s>>=31 and s>>=63
Add tests for special arithmetic shift right.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260112201424.816836-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:33:38 -08:00
Yonghong Song
951d79017e selftests/bpf: Fix verifier_arena_globals1 failure with 64K page
With 64K page on arm64, verifier_arena_globals1 failed like below:
  ...
  libbpf: map 'arena': failed to create: -E2BIG
  ...
  #509/1   verifier_arena_globals1/check_reserve1:FAIL
  ...

For 64K page, if the number of arena pages is (1UL << 20), the total
memory will exceed 4G and this will cause map creation failure.
Adjusting ARENA_PAGES based on the actual page size fixed the problem.

Cc: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260113061033.3798549-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Yonghong Song
d2f7cd20a7 selftests/bpf: Fix sk_bypass_prot_mem failure with 64K page
The current selftest sk_bypass_prot_mem only supports 4K page.
When running with 64K page on arm64, the following failure happens:
  ...
  check_bypass:FAIL:no bypass unexpected no bypass: actual 3 <= expected 32
  ...
  #385/1   sk_bypass_prot_mem/TCP  :FAIL
  ...
  check_bypass:FAIL:no bypass unexpected no bypass: actual 4 <= expected 32
  ...
  #385/2   sk_bypass_prot_mem/UDP  :FAIL
  ...

Adding support to 64K page as well fixed the failure.

Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061028.3798326-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Yonghong Song
2465a08d43 selftests/bpf: Fix dmabuf_iter/lots_of_buffers failure with 64K page
On arm64 with 64K page , I observed the following test failure:
  ...
  subtest_dmabuf_iter_check_lots_of_buffers:FAIL:total_bytes_read unexpected total_bytes_read:
      actual 4696 <= expected 65536
  #97/3    dmabuf_iter/lots_of_buffers:FAIL

With 4K page on x86, the total_bytes_read is 4593.
With 64K page on arm64, the total_byte_read is 4696.

In progs/dmabuf_iter.c, for each iteration, the output is
  BPF_SEQ_PRINTF(seq, "%lu\n%llu\n%s\n%s\n", inode, size, name, exporter);

The only difference between 4K and 64K page is 'size' in
the above BPF_SEQ_PRINTF. The 4K page will output '4096' and
the 64K page will output '65536'. So the total_bytes_read with 64K page
is slighter greater than 4K page.

Adjusting the total_bytes_read from 65536 to 4096 fixed the issue.

Cc: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061023.3798085-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Robert Richter
8441c7d3bd cxl: Check for invalid addresses returned from translation functions on errors
Translation functions may return an invalid address in case of errors.
If the address is not checked the further use of the invalid value
will cause an address corruption.

Consistently check for a valid address returned by translation
functions. Use RESOURCE_SIZE_MAX to indicate an invalid address for
type resource_size_t. Depending on the type either RESOURCE_SIZE_MAX
or ULLONG_MAX is used to indicate an address error.

Propagating an invalid address from a failed translation may cause
userspace to think it has received a valid SPA, when in fact it is
wrong. The CXL userspace API, using trace events, expects ULLONG_MAX
to indicate a translation failure. If ULLONG_MAX is not returned
immediately, subsequent calculations can transform that bad address
into a different value (!ULLONG_MAX), and an invalid SPA may be
returned to userspace. This can lead to incorrect diagnostics and
erroneous corrective actions.

[ dj: Added user impact statement from Alison. ]
[ dj: Fixed checkpatch tab alignment issue. ]

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Fixes: c3dd67681c ("cxl/region: Add inject and clear poison by region offset")
Fixes: b78b9e7b79 ("cxl/region: Refactor address translation funcs for testing")
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260107120544.410993-1-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-13 08:30:40 -07:00
Thomas Weißschuh
4e6a231298 selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64()
Some architectures will start to implement this function.
Make sure it works correctly.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20251223-vdso-compat-time32-v1-4-97ea7a06a543@linutronix.de
2026-01-13 14:42:23 +01:00
Thomas Weißschuh
1dcd1273ad selftests: vDSO: vdso_test_abi: Use UAPI system call numbers
SYS_clock_getres might have been redirected by libc to some other system
call than the actual clock_getres. For testing it is required to use
exactly this system call.

Use the system call number exported by the UAPI headers which is always
correct.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20251223-vdso-compat-time32-v1-3-97ea7a06a543@linutronix.de
2026-01-13 14:42:23 +01:00
Thomas Weißschuh
609e359ab9 selftests: vDSO: vdso_config: Add configurations for clock_getres_time64()
Some architectures will start to implement this function.
Make sure that tests can be written for it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20251223-vdso-compat-time32-v1-2-97ea7a06a543@linutronix.de
2026-01-13 14:42:23 +01:00
Jonas Köppeler
8d61f1a9f2 selftests/tc-testing: add selftests for cake_mq qdisc
Test 684b: Create CAKE_MQ with default setting (4 queues)
Test 7ee8: Create CAKE_MQ with bandwidth limit (4 queues)
Test 1f87: Create CAKE_MQ with rtt time (4 queues)
Test e9cf: Create CAKE_MQ with besteffort flag (4 queues)
Test 7c05: Create CAKE_MQ with diffserv8 flag (4 queues)
Test 5a77: Create CAKE_MQ with diffserv4 flag (4 queues)
Test 8f7a: Create CAKE_MQ with flowblind flag (4 queues)
Test 7ef7: Create CAKE_MQ with dsthost and nat flag (4 queues)
Test 2e4d: Create CAKE_MQ with wash flag (4 queues)
Test b3e6: Create CAKE_MQ with flowblind and no-split-gso flag (4 queues)
Test 62cd: Create CAKE_MQ with dual-srchost and ack-filter flag (4 queues)
Test 0df3: Create CAKE_MQ with dual-dsthost and ack-filter-aggressive flag (4 queues)
Test 9a75: Create CAKE_MQ with memlimit and ptm flag (4 queues)
Test cdef: Create CAKE_MQ with fwmark and atm flag (4 queues)
Test 93dd: Create CAKE_MQ with overhead 0 and mpu (4 queues)
Test 1475: Create CAKE_MQ with conservative and ingress flag (4 queues)
Test 7bf1: Delete CAKE_MQ with conservative and ingress flag (4 queues)
Test ee55: Replace CAKE_MQ with mpu (4 queues)
Test 6df9: Change CAKE_MQ with mpu (4 queues)
Test 67e2: Show CAKE_MQ class (4 queues)
Test 2de4: Change bandwidth of CAKE_MQ (4 queues)
Test 5f62: Fail to create CAKE_MQ with autorate-ingress flag (4 queues)
Test 038e: Fail to change setting of sub-qdisc under CAKE_MQ
Test 7bdc: Fail to replace sub-qdisc under CAKE_MQ
Test 18e0: Fail to install CAKE_MQ on single queue device

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-6-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13 11:54:30 +01:00
Ankit Khushwaha
088f35ab9f selftests/net/ipsec: Fix variable size type not at the end of struct
The "struct alg" object contains a union of 3 xfrm structures:

	union {
		struct xfrm_algo;
		struct xfrm_algo_aead;
		struct xfrm_algo_auth;
	}

All of them end with a flexible array member used to store key material,
but the flexible array appears at *different offsets* in each struct.
bcz of this, union itself is of variable-sized & Placing it above
char buf[...] triggers:

ipsec.c:835:5: warning: field 'u' with variable sized type 'union
(unnamed union at ipsec.c:831:3)' not at the end of a struct or class
is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
  835 |                 } u;
      |                   ^

one fix is to use "TRAILING_OVERLAP()" which works with one flexible
array member only.

But In "struct alg" flexible array member exists in all union members,
but not at the same offset, so TRAILING_OVERLAP cannot be applied.

so the fix is to explicitly overlay the key buffer at the correct offset
for the largest union member (xfrm_algo_auth). This ensures that the
flexible-array region and the fixed buffer line up.

No functional change.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20260109152201.15668-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12 19:15:04 -08:00
Sami Tolvanen
ba7f1024a1 selftests/bpf: Use the correct destructor kfunc type
With CONFIG_CFI enabled, the kernel strictly enforces that indirect
function calls use a function pointer type that matches the target
function. As bpf_testmod_ctx_release() signature differs from the
btf_dtor_kfunc_t pointer type used for the destructor calls in
bpf_obj_free_fields(), add a stub function with the correct type to
fix the type mismatch.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260110082548.113748-9-samitolvanen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-12 18:53:57 -08:00
Ming Lei
65955a0993 selftests: ublk: add stop command with --safe option
Add 'stop' subcommand to kublk utility that uses the new
UBLK_CMD_TRY_STOP_DEV command when --safe option is specified.
This allows stopping a device only if it has no active openers,
returning -EBUSY otherwise.

Also add test_generic_16.sh to test the new functionality.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 15:07:31 -07:00
Waiman Long
272bd81833 cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()
As stated in commit 1c09b195d3 ("cpuset: fix a regression in validating
config change"), it is not allowed to clear masks of a cpuset if
there're tasks in it. This is specific to v1 since empty "cpuset.cpus"
or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs
or memory nodes from its parent. So it is OK to have empty cpus or mems
even if there are tasks in the cpuset.

Move this empty cpus/mems check in validate_change() to
cpuset1_validate_change() to allow more flexibility in setting
cpus or mems in v2. cpuset_is_populated() needs to be moved into
cpuset-internal.h as it is needed by the empty cpus/mems checking code.

Also add a test case to test_cpuset_prs.sh to verify that.

Reported-by: Chen Ridong <chenridong@huaweicloud.com>
Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@huaweicloud.com/
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-12 09:03:22 -10:00
Waiman Long
2a3602030d cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
the sibling's partition state becomes invalid. This is overly harsh and
is probably not necessary.

The cpuset.cpus.exclusive control file, if set, will override the
cpuset.cpus of the same cpuset when creating a cpuset partition.
So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
a partition.  However, it cannot override a conflicting cpuset.cpus file
in a sibling cpuset and the partition creation process will fail. This
is inconsistent.  That will also make using cpuset.cpus.exclusive less
valuable as a tool to set up cpuset partitions as the users have to
check if such a cpuset.cpus conflict exists or not.

Fix these problems by making sure that once a cpuset.cpus.exclusive
is set without failure, it will always be allowed to form a valid
partition as long as at least one CPU can be granted from its parent
irrespective of the state of the siblings' cpuset.cpus values. Of
course, setting cpuset.cpus.exclusive will fail if it conflicts with
the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value
of a sibling.

Partition can still be created by setting only cpuset.cpus without
setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
be removed from its cpuset.cpus.exclusive.effective as long as there
is still one or more CPUs left and can be granted from its parent. This
CPU stripping is currently done in rm_siblings_excl_cpus().

The new code will now try its best to enable the creation of new
partitions with only cpuset.cpus set without invalidating existing ones.
However it is not guaranteed that all the CPUs requested in cpuset.cpus
will be used in the new partition even when all these CPUs can be
granted from the parent.

This is similar to the fact that cpuset.cpus.effective may not be
able to include all the CPUs requested in cpuset.cpus. In this case,
the parent may not able to grant all the exclusive CPUs requested in
cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
already been granted to other partitions earlier.

With the creation of multiple sibling partitions by setting
only cpuset.cpus, this does have the side effect that their exact
cpuset.cpus.exclusive.effective settings will depend on the order of
partition creation if there are conflicts. Due to the exclusive nature
of the CPUs in a partition, it is not easy to make it fair other than
the old behavior of invalidating all the conflicting partitions.

For example,
  # echo "0-2" > A1/cpuset.cpus
  # echo "root" > A1/cpuset.cpus.partition
  # cat A1/cpuset.cpus.partition
  root
  # cat A1/cpuset.cpus.exclusive.effective
  0-2
  # echo "2-4" > B1/cpuset.cpus
  # echo "root" > B1/cpuset.cpus.partition
  # cat B1/cpuset.cpus.partition
  root
  # cat B1/cpuset.cpus.exclusive.effective
  3-4
  # cat B1/cpuset.cpus.effective
  3-4

For users who want to be sure that they can get most of the CPUs they
want, cpuset.cpus.exclusive should be used instead if they can set
it successfully without failure. Setting cpuset.cpus.exclusive will
guarantee that sibling conflicts from then onward is no longer possible.

To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.

Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated the clarify what exclusive CPUs will be used when a partition
is created.

Reported-by: Sun Shaojie <sunshaojie@kylinos.cn>
Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-12 09:02:46 -10:00
Caleb Sander Mateos
78796b6bae selftests: ublk: add end-to-end integrity test
Add test case loop_08 to verify the ublk integrity data flow. It uses
the kublk loop target to create a ublk device with integrity on top of
backing data and integrity files. It then writes to the whole device
with fio configured to generate integrity data. Then it reads back the
whole device with fio configured to verify the integrity data.
It also verifies that injected guard, reftag, and apptag corruptions are
correctly detected.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
9e9f635525 selftests: ublk: add integrity params test
Add test case null_04 to exercise all the different integrity params. It
creates 4 different ublk devices with different combinations of
integrity arguments and verifies their integrity limits via sysfs and
the metadata_size utility.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
f48250dc5b selftests: ublk: add integrity data support to loop target
To perform and end-to-end test of integrity information through a ublk
device, we need to actually store it somewhere and retrieve it. Add this
support to kublk's loop target. It uses a second backing file for the
integrity data corresponding to the data stored in the first file.
The integrity file is initialized with byte 0xFF, which ensures the app
and reference tags are set to the "escape" pattern to disable the
bio-integrity-auto guard and reftag checks until the blocks are written.
The integrity file is opened without O_DIRECT since it will be accessed
at sub-block granularity. Each incoming read/write results in a pair of
reads/writes, one to the data file, and one to the integrity file. If
either backing I/O fails, the error is propagated to the ublk request.
If both backing I/Os read/write some bytes, the ublk request is
completed with the smaller of the number of blocks accessed by each I/O.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
a180544267 selftests: ublk: support non-O_DIRECT backing files
A subsequent commit will add support for using a backing file to store
integrity data. Since integrity data is accessed in intervals of
metadata_size, which may be much smaller than a logical block on the
backing device, direct I/O cannot be used. Add an argument to
backing_file_tgt_init() to specify the number of files to open for
direct I/O. The remaining files will use buffered I/O. For now, continue
to request direct I/O for all the files.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
24f8a44b79 selftests: ublk: implement integrity user copy in kublk
If integrity data is enabled for kublk, allocate an integrity buffer for
each I/O. Extend ublk_user_copy() to copy the integrity data between the
ublk request and the integrity buffer if the ublksrv_io_desc indicates
that the request has integrity data.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
6ed6476c4a selftests: ublk: add kublk support for integrity params
Add integrity param command line arguments to kublk. Plumb these to
struct ublk_params for the null and fault_inject targets, as they don't
need to actually read or write the integrity data. Forbid the integrity
params for loop or stripe until the integrity data copy is implemented.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
261b67f4e3 selftests: ublk: add utility to get block device metadata size
Some block device integrity parameters are available in sysfs, but
others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a
metadata_size utility program to print out the logical block metadata
size, PI offset, and PI size within the metadata. Example output:
$ metadata_size /dev/ublkb0
metadata_size: 64
pi_offset: 56
pi_tuple_size: 8

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Caleb Sander Mateos
c1d7c0f9cd selftests: ublk: display UBLK_F_INTEGRITY support
Add support for printing the UBLK_F_INTEGRITY feature flag in the
human-readable kublk features output.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12 09:16:38 -07:00
Jens Axboe
5df832ba5f Merge branch 'block-6.19' into for-7.0/block
Merge in fixes that went to 6.19 after for-7.0/block was branched.
Pending ublk changes depend on particularly the async scan work.

* block-6.19:
  block: zero non-PI portion of auto integrity buffer
  ublk: fix use-after-free in ublk_partition_scan_work
  blk-mq: avoid stall during boot due to synchronize_rcu_expedited
  loop: add missing bd_abort_claiming in loop_set_status
  block: don't merge bios with different app_tags
  blk-rq-qos: Remove unlikely() hints from QoS checks
  loop: don't change loop device under exclusive opener in loop_set_status
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-11 13:16:36 -07:00
Daniel Palmer
a5f00be9b3 tools/nolibc: Add a simple test for writing to a FILE and reading it back
Add a test that exercises create->write->seek->read to check that using the
stream functions (fwrite() etc) is not totally broken.

The only edge cases this is testing for are:
- Reading the file after writing but without rewinding reads nothing.
- Trying to read more items than the file contains returns the count of
  fully read items.

Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260105023629.1502801-4-daniel@thingy.jp
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2026-01-11 12:47:47 +01:00
Thomas Weißschuh
6fe8360b16 selftests/nolibc: also test libc-test through regular selftest framework
Hook up libc-test to the regular selftest build to make sure
nolibc-test.c stays compatible with a normal libc.

As the pattern rule from lib.mk does not handle compiling a target from
a differently named source file, add an explicit rule definition.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-3-f82101c2c505@weissschuh.net
2026-01-11 11:17:15 +01:00
Thomas Weißschuh
20c72de1f8 selftests/nolibc: scope custom flags to the nolibc-test target
A new target for 'libc-test' is going to be added which should not be
affected by these options.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-2-f82101c2c505@weissschuh.net
2026-01-11 11:17:12 +01:00
Thomas Weißschuh
4203c6fb5e selftests/nolibc: try to read from stdin in readv_zero test
When stdout is redirected to a file this test fails.
This happens when running through the kselftest runner since
commit d9e6269e33 ("selftests/run_kselftest.sh: exit with
error if tests fail").

For consistency with other tests that read from a file descriptor,
switch to stdin over stdout. The tests are still brittle against
a redirected stdin, but at least they are now consistently so.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-1-f82101c2c505@weissschuh.net
2026-01-11 11:17:07 +01:00
Linus Torvalds
b061fcffe3 linux_kselftest-fixes-6.19-rc5
Fixes tracing test_multiple_writes stalls when buffer_size_kb is less
 than 12KB.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmlime8ACgkQCwJExA0N
 Qxyung/+KwLzsf+etHDBNFkt8GwMH9+RQcLkR3wZK2X7x074jXPQ1dfd/zHj7O1S
 R42VtDAc5QM6LhiyBEteYdYtGeIkbFo9jVmvjctQzYz7yAOShImeEd2oa0Jnt44J
 c3Gqo5clmGbM85C0tUlgKGtc43GZfdq6YQKr7/sYqNOkVpijNhjby7XF6ewXKjFD
 rem9QglvQoHrG0tTdzOkICIoOPHHCoP8YkQe8qmB0c1Ec9YgYZizIm5ME0oSaxpR
 4c6zMAqMOKv/UepYe9J5MgIO2+xveyDKY1yN9hzGyJNZkIwansjCU9XKndnbsPn4
 QX38EDEHrry5hZ2Lb/mmBxCHy7PpsH/Wx1LnAU3vP7gflxg/+X35zKhb+kyH5tv8
 2v35NRONKhfofSXps4QiQozS5+tUpCXg4k+Dd2NPJ7DHtxJtYB4kksQgbRzChIKX
 iKgKuEPM1Fka/vZtbzJDrxcCwqaneunMpsB+WIQR+HDaHiCjTf3pSY7aZzXelT+G
 zCcFWLGdOBLOFUIoQuPCa+ourmVEjuYm6OSP5oLF9ObdiumBBW9MdHwj8wfqPsXw
 tZf4JdwOCmk0geN8FADIPuJAf+V43cfPFFvcGLiDvRks5+mx+jSRCeG/yynHN2N4
 SACsSPhCaJvdNw/tTbvmTlwT0EPDvo3bZI0XvNCfFMGYT4O592I=
 =zuty
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "Fix tracing test_multiple_writes stalls when buffer_size_kb is less
  than 12KB"

* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Fix test_multiple_writes stall
2026-01-10 14:57:55 -10:00
Jakub Kicinski
7a1ff3545a selftests: net: py: ensure defer() is only used within a test case
I wasted a couple of hours recently after accidentally adding
a defer() from within a function which itself was called as
part of defer(). This leads to an infinite loop of defer().
Make sure this cannot happen and raise a helpful exception.

I understand that the pair of _ksft_defer_arm() calls may
not be the most Pythonic way to implement this, but it's
easy enough to understand.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260108225257.2684238-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:11:59 -08:00
Jakub Kicinski
799a4912ee selftests: net: py: capitalize defer queue and improve import
Import utils and refer to the global defer queue that way instead
of importing the queue. This will make it possible to assign value
to the global variable. While at it capitalize the name, to comply
with the Python coding style.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260108225257.2684238-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:11:59 -08:00
David Wei
de7c600e2d selftests/net: parametrise iou-zcrx.py with ksft_variants
Use ksft_variants to parametrise tests in iou-zcrx.py to either use
single queues or RSS contexts, reducing duplication.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20260108234521.3619621-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:08:03 -08:00
Cosmin Ratiu
9086984ff5 selftests: drv-net: psp: Better control the used PSP dev
The PSP responder fails when zero or multiple PSP devices are detected.
There's an option to select the device id to use (-d) but it's
currently not used from the PSP self test. It's also hard to use because
the PSP test doesn't dump the PSP devices so can't choose one.
When zero devices are detected, psp_responder fails which will cause the
parent test to fail as well instead of skipping PSP tests.

Fix both of these problems. Change psp_responder to:
- not fail when no PSP devs are detected.
- get an optional -i ifindex argument instead of -d.
- select the correct PSP dev from the dump corresponding to ifindex or
- select the first PSP dev when -i is not given.
- fail when multiple devs are found and -i is not given.
- warn and continue when the requested ifindex is not found.

Also plumb the ifindex from the Python test.

With these, when there are no PSP devs found or the wrong one is chosen,
psp_responder opens the server socket, listens for control connections
normally, and leaves the skipping of the various test cases which
require a PSP device (~most, but not all of them) to the parent test.
This results in output like:

ok 1 psp.test_case # SKIP No PSP devices found
[...]
ok 12 psp.dev_get_device # SKIP No PSP devices found
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate # SKIP No PSP devices found
[...]

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Link: https://patch.msgid.link/20260109110851.2952906-2-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 14:44:05 -08:00
Stefano Garzarella
c39a6a277e vsock/test: add a final full barrier after run all tests
If the last test fails, the other side still completes correctly,
which could lead to false positives.

Let's add a final barrier that ensures that the last test has finished
correctly on both sides, but also that the two sides agree on the
number of tests to be performed.

Fixes: 2f65b44e19 ("VSOCK: add full barrier between test cases")
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260108114419.52747-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 12:07:25 -08:00
Sean Christopherson
3611ca7c12 selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1
Rework the AMX test's #NM handling to use kvm_asm_safe() to verify an #NM
actually occurs.  As is, a completely missing #NM could go unnoticed.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10 07:17:30 +01:00
Paolo Bonzini
0383a8edef selftests: kvm: try getting XFD and XSAVE state out of sync
The host is allowed to set FPU state that includes a disabled
xstate component.  Check that this does not cause bad effects.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10 07:17:30 +01:00
Paolo Bonzini
a1025dcd37 selftests: kvm: replace numbered sync points with actions
Rework the guest=>host syncs in the AMX test to use named actions instead
of arbitrary, incrementing numbers.  The "stage" of the test has no real
meaning, what matters is what action the test wants the host to perform.
The incrementing numbers are somewhat helpful for triaging failures, but
fully debugging failures almost always requires a much deeper dive into
the test (and KVM).

Using named actions not only makes it easier to extend the test without
having to shift all sync point numbers, it makes the code easier to read.

[Commit message by Sean Christopherson]

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-01-10 07:17:29 +01:00
Jakub Kicinski
68ec2b9fc5 selftests: forwarding: update PTP tcpdump patterns
Recent version of tcpdump (tcpdump-4.99.6-1.fc43.x86_64) seems to have
removed the spurious space after msg type in PTP info, e.g.:

 before:  PTPv2, majorSdoId: 0x0, msg type : sync msg, length: 44
 after:   PTPv2, majorSdoId: 0x0, msg type: sync msg, length: 44

Update our patterns to match both.

Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://patch.msgid.link/20260107145320.1837464-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09 17:35:23 -08:00
Jakub Kicinski
a0ac0ff382 selftests: drv-net: gro: increase the rcvbuf size
The gro.py test (testing software GRO) is slightly flaky when
running against fbnic. We see one flake per roughly 20 runs in NIPA,
mostly in ipip.large, and always including some EAGAIN:

  # Shouldn't coalesce if exceed IP max pkt size: Test succeeded
  # Expected {65475 899 }, Total 2 packets
  # Received {65475 899 }, Total 2 packets.
  # Expected {64576 900 900 }, Total 3 packets
  # Received {64576 /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable

The test sends 2 large frames (64k + change). Looks like the default
packet socket rcvbuf (~200kB) may not be large enough to hold them.
Bump the rcvbuf to 1MB.

Add a debug print showing socket statistics to make debugging this
issue easier in the future. Without the rcvbuf increase we see:

  # Shouldn't coalesce if exceed IP max pkt size: Test succeeded
  # Expected {65475 899 }, Total 2 packets
  # Received {65475 899 }, Total 2 packets.
  # Expected {64576 900 900 }, Total 3 packets
  # Received {64576 Socket stats: packets=7, drops=3
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260107232557.2147760-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09 17:34:30 -08:00
Jakub Kicinski
96ea4fa60c selftests: tls: avoid flakiness in data_steal
We see the following failure a few times a week:

  #  RUN           global.data_steal ...
  # tls.c:3280:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (10000) == -1 (-1)
  # data_steal: Test failed
  #          FAIL  global.data_steal
  not ok 8 global.data_steal

The 10000 bytes read suggests that the child process did a recv()
of half of the data using the TLS ULP and we're now getting the
remaining half. The intent of the test is to get the child to
enter _TCP_ recvmsg handler, so it needs to enter the syscall before
parent installed the TLS recvmsg with setsockopt(SOL_TLS).

Instead of the 10msec sleep send 1 byte of data and wait for the
child to consume it.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260106200205.1593915-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-09 17:33:28 -08:00
Xiaochen Shen
86063a2568 selftests/resctrl: Fix non-contiguous CBM check for Hygon
The resctrl selftest currently fails on Hygon CPUs that always supports
non-contiguous CBM, printing the error:

  "# Hardware and kernel differ on non-contiguous CBM support!"

This occurs because the arch_supports_noncont_cat() function lacks
vendor detection for Hygon CPUs, preventing proper identification of
their non-contiguous CBM capability.

Fix this by adding Hygon vendor ID detection to
arch_supports_noncont_cat().

Link: https://lore.kernel.org/r/20251217030456.3834956-5-shenxiaochen@open-hieco.net
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09 16:49:01 -07:00
Xiaochen Shen
367f931e64 selftests/resctrl: Add CPU vendor detection for Hygon
The resctrl selftest currently fails on Hygon CPUs that support Platform
QoS features, printing the error:

  "# Can not get vendor info..."

This occurs because vendor detection is missing for Hygon CPUs.

Fix this by extending the CPU vendor detection logic to include
Hygon's vendor ID.

Link: https://lore.kernel.org/r/20251217030456.3834956-4-shenxiaochen@open-hieco.net
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09 16:49:01 -07:00
Xiaochen Shen
4f4f01cc33 selftests/resctrl: Define CPU vendor IDs as bits to match usage
The CPU vendor IDs are required to be unique bits because they're used
for vendor_specific bitmask in the struct resctrl_test.
Consider for example their usage in test_vendor_specific_check():
	return get_vendor() & test->vendor_specific

However, the definitions of CPU vendor IDs in file resctrl.h is quite
subtle as a bitmask value:
  #define ARCH_INTEL     1
  #define ARCH_AMD       2

A clearer and more maintainable approach is to define these CPU vendor
IDs using BIT(). This ensures each vendor corresponds to a distinct bit
and makes it obvious when adding new vendor IDs.

Accordingly, update the return types of detect_vendor() and get_vendor()
from 'int' to 'unsigned int' to align with their usage as bitmask values
and to prevent potentially risky type conversions.

Furthermore, introduce a bool flag 'initialized' to simplify the
get_vendor() -> detect_vendor() logic. This ensures the vendor ID is
detected only once and resolves the ambiguity of using the same variable
'vendor' both as a value and as a state.

Link: https://lore.kernel.org/r/20251217030456.3834956-3-shenxiaochen@open-hieco.net
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Suggested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09 16:49:01 -07:00
Xiaochen Shen
671ef08d94 selftests/resctrl: Fix a division by zero error on Hygon
Change to adjust effective L3 cache size with SNC enabled change
introduced the snc_nodes_per_l3_cache() function to detect the Intel
Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs
sharing LLC with CPU0. The function was designed to return:
  (1) >1: SNC mode is enabled.
  (2)  1: SNC mode is not enabled or not supported.

However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually
less than #CPUs in node0. This results in snc_nodes_per_l3_cache()
returning 0 (calculated as cache_cpus / node_cpus).

This leads to a division by zero error in get_cache_size():
  *cache_size /= snc_nodes_per_l3_cache();

Causing the resctrl selftest to fail with:
  "Floating point exception (core dumped)"

Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC
mode is not supported on the platform.

Updated commit log to fix commit has issues:
Shuah Khan <skhan@linuxfoundation.org>

Link: https://lore.kernel.org/r/20251217030456.3834956-2-shenxiaochen@open-hieco.net
Fixes: a1cd99e700 ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09 16:48:11 -07:00
Fushuai Wang
6e39903c73 selftests/tracing: Fix test_multiple_writes stall
When /sys/kernel/tracing/buffer_size_kb is less than 12KB,
the test_multiple_writes test will stall and wait for more
input due to insufficient buffer space.

Check current buffer_size_kb value before the test. If it is
less than 12KB, it temporarily increase the buffer to 12KB,
and restore the original value after the tests are completed.

Link: https://lore.kernel.org/r/20260109033620.25727-1-fushuai.wang@linux.dev
Fixes: 37f4660138 ("selftests/tracing: Add basic test for trace_marker_raw file")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-09 15:49:32 -07:00
Günther Noack
15e8d739fd
selftests/landlock: Properly close a file descriptor
Add a missing close(srv_fd) call, and use EXPECT_EQ() to check the
result.

Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Fixes: f83d51a5bd ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Link: https://lore.kernel.org/r/20260101134102.25938-2-gnoack3000@gmail.com
[mic: Use EXPECT_EQ() and update commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-09 11:52:47 +01:00
Shengming Hu
58e3e52654 memblock: drop redundant 'struct page *' argument from memblock_free_pages()
memblock_free_pages() currently takes both a struct page * and the
corresponding PFN. The page pointer is always derived from the PFN at
call sites (pfn_to_page(pfn)), making the parameter redundant and also
allowing accidental mismatches between the two arguments.

Simplify the interface by removing the struct page * argument and
deriving the page locally from the PFN, after the deferred struct page
initialization check. This keeps the behavior unchanged while making
the helper harder to misuse.

Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Link: https://patch.msgid.link/tencent_F741CE6ECC49EE099736685E60C0DBD4A209@qq.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-01-09 11:53:51 +02:00
Yosry Ahmed
ca2eccb953 KVM: selftests: Extend vmx_set_nested_state_test to cover SVM
Add test cases for the validation checks in svm_set_nested_state(), and
allow the test to run with SVM as well as VMX. The SVM test also makes
sure that KVM_SET_NESTED_STATE accepts GIF being set or cleared if
EFER.SVME is cleared, verifying a recently fixed bug where GIF was
incorrectly expected to always be set when EFER.SVME is cleared.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251121204803.991707-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:54:19 -08:00
Yosry Ahmed
bda6ae6f29 KVM: selftests: Use TEST_ASSERT_EQ() in test_vmx_nested_state()
The assert messages do not add much value, so use TEST_ASSERT_EQ(),
which also nicely displays the addresses in hex. While at it, also
assert the values of state->flags.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251121204803.991707-4-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:54:19 -08:00
Sean Christopherson
e353850499 KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()
Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the
"page table entry" makes it unnecessarily difficult to quickly understand
what callers are doing.

No functional change intended.

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:17 -08:00
Yosry Ahmed
59eef1a47b KVM: selftests: Extend memstress to run on nested SVM
Add L1 SVM code and generalize the setup code to work for both VMX and
SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:16 -08:00
Yosry Ahmed
6794d916f8 KVM: selftests: Extend vmx_dirty_log_test to cover SVM
Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1
code, doing some renaming (e.g. EPT -> TDP), and having setup code for
both SVM and VMX in test_dirty_log().

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:16 -08:00
Yosry Ahmed
251e4849a7 KVM: selftests: Set the user bit on nested NPT PTEs
According to the APM, NPT walks are treated as user accesses. In
preparation for supporting NPT mappings, set the 'user' bit on NPTs by
adding a mask of bits to always be set on PTEs in kvm_mmu.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:15 -08:00
Yosry Ahmed
753c0d5a50 KVM: selftests: Add support for nested NPTs
Implement nCR3 and NPT initialization functions, similar to the EPT
equivalents, and create common TDP helpers for enablement checking and
initialization. Enable NPT for nested guests by default if the TDP MMU
was initialized, similar to VMX.

Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
and S bits related to confidential VMs.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-17-seanjc@google.com
[sean: apply Yosry's fixup for ncr3_gpa]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:14 -08:00
Yosry Ahmed
9cb1944f6b KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
In preparation for generalizing the nested dirty logging test, checking
if either EPT or NPT is enabled will be needed. To avoid needing to gate
the kvm_cpu_has_ept() call by the CPU type, make sure the function
returns false if VMX is not available instead of trying to read VMX-only
MSRs.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:14 -08:00
Sean Christopherson
07676c04bd KVM: selftests: Move TDP mapping functions outside of vmx.c
Now that the functions are no longer VMX-specific, move them to
processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:13 -08:00
Yosry Ahmed
508d1cc3ca KVM: selftests: Reuse virt mapping functions for nested EPTs
Rework tdp_map() and friends to use __virt_pg_map() and drop the custom
EPT code in __tdp_pg_map() and tdp_create_pte().  The EPT code and
__virt_pg_map() are practically identical, the main differences are:
  - EPT uses the EPT struct overlay instead of the PTE masks.
  - EPT always assumes 4-level EPTs.

To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and
X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass
in the EPT PTE masks along with the root page level (which is currently
hardcoded to '4').

Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's
no reason to multiplex bits in the selftests, e.g. selftests aren't trying
to shadow guest PTEs and thus don't care about funnelling protections into
a common permissions check.

Another benefit of reusing the code is having separate handling for
upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
large bit on a 4K PTE in the EPTs.

For all intents and purposes, no functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:12 -08:00
Sean Christopherson
8296b16c0a KVM: selftests: Add a stage-2 MMU instance to kvm_vm
Add a stage-2 MMU instance so that architectures that support nested
virtualization (more specifically, nested stage-2 page tables) can create
and track stage-2 page tables for running L2 guests.  Plumb the structure
into common code to avoid cyclical dependencies, and to provide some line
of sight to having common APIs for creating stage-2 mappings.

As a bonus, putting the member in common code justifies using stage2_mmu
instead of tdp_mmu for x86.

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:12 -08:00
Yosry Ahmed
e40e72fec0 KVM: selftests: Stop passing VMX metadata to TDP mapping functions
The root GPA is now retrieved from the nested MMU, stop passing VMX
metadata. This is in preparation for making these functions work for
NPTs as well.

Opportunistically drop tdp_pg_map() since it's unused.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:11 -08:00
Yosry Ahmed
f00f519ceb KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs
prepare_eptp() currently allocates new EPTs for each vCPU.  memstress has
its own hack to share the EPTs between vCPUs.  Currently, there is no
reason to have separate EPTs for each vCPU, and the complexity is
significant.  The only reason it doesn't matter now is because memstress
is the only user with multiple vCPUs.

Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use
it everywhere to replace prepare_eptp().  Drop 'eptp' and 'eptp_hva' from
'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built
from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need
to be passed in along with vmx_pages.  Dynamically allocate the TDP MMU
structure to avoid a cyclical dependency between kvm_util_arch.h and
kvm_util.h.

Remove the workaround in memstress to copy the EPT root between vCPUs
since that's now the default behavior.

Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid
recreating the same mess that KVM has with respect to "nested" MMUs, e.g.
does nested refer to the stage-2 page tables created by L1, or the stage-1
page tables created by L2?

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:10 -08:00
Yosry Ahmed
6dd7075721 KVM: selftests: Move PTE bitmasks to kvm_mmu
Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping
functions. Introduce helpers to read/write different PTE bits given a
kvm_mmu.

Drop the 'global' bit definition as it's currently unused, but leave the
'user' bit as it will be used in coming changes. Opportunisitcally
rename 'large' to 'huge' as it's more consistent with the kernel naming.

Leave PHYSICAL_PAGE_MASK alone, it's fixed in all page table formats and
a lot of other macros depend on it. It's tempting to move all the other
macros to be per-struct instead, but it would be too much noise for
little benefit.

Keep c_bit and s_bit in vm->arch as they used before the MMU is
initialized, through  __vmcreate() -> vm_userspace_mem_region_add() ->
vm_mem_add() -> vm_arch_has_protected_memory().

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rename accessors to is_<adjective>_pte()]
Link: https://patch.msgid.link/20251230230150.4150236-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:10 -08:00
Sean Christopherson
3d0e7595e8 KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
Add an arch structure+field in "struct kvm_mmu" so that architectures can
track arch-specific information for a given MMU.

No functional change intended.

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:09 -08:00
Sean Christopherson
11825209f5 KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs
In preparation for generalizing the x86 virt mapping APIs to work with
TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper
functions instead of operating on vm->mmu directly.

Opportunistically swap the order of the check in virt_get_pte() to first
assert that the parent is the PGD, and then check that the PTE is present,
as it makes more sense to check if the parent PTE is the PGD/root (i.e.
not a PTE) before checking that the PTE is PRESENT.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rebase on common kvm_mmu structure, rewrite changelog]
Link: https://patch.msgid.link/20251230230150.4150236-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:08 -08:00
Sean Christopherson
9f073ac25b KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance
Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1
MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for
both stage-1 and stage-2 MMUs, without creating the potential for subtle
bugs, e.g. due to consuming on vm->pgtable_levels when operating a stage-2
MMU.

Encapsulate the existing de facto MMU in "struct kvm_vm", e.g instead of
burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in
____vm_create(), and in the hopes that other architectures can utilize the
formalized MMU structure if/when they too support stage-2 page tables.

No functional change intended.

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:08 -08:00
Yosry Ahmed
3cd5002807 KVM: selftests: Stop setting A/D bits when creating EPT PTEs
Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
without bleeding the EPT hack into the common APIs.

While commit 0944442045 ("selftests: kvm: add test for dirty logging
inside nested guests") is _very_ light on details, the most likely
explanation is that vmx_dirty_log_test was attempting to avoid taking an
EPT Violation on the first _write_ from L2.

  static void l2_guest_code(u64 *a, u64 *b)
  {
	READ_ONCE(*a);
	WRITE_ONCE(*a, 1);   <===
	GUEST_SYNC(true);

	...
  }

When handling read faults in the shadow MMU, KVM opportunistically creates
a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
writes in order to emulate Dirty-bit updates.  By setting A/D bits in the
test's EPT entries, the above READ+WRITE will fault only on the read, and
in theory expose the bug fixed by KVM commit 1f4e5fc83a ("KVM: x86: fix
nested guest live migration with PML").  If the Dirty bit is NOT set, the
test will get a false pass due; though again, in theory.

However, the test is flawed (and always was, at least in the versions
posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
dirty (in the dirty bitmap) when creating the writable SPTE.  I.e. without
a check on the dirty bitmap after the READ_ONCE(), the check after the
first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
been updated by the read fault, not by PML.

Furthermore, the subsequent behavior in the test's l2_guest_code()
effectively hides the flawed test behavior, as the straight writes to a
new L2 GPA fault also trigger the KVM bug, and so the test will still
detect the failure due to lack of isolation between the two testcases
(Read=>Write vs. Write=>Write).

	WRITE_ONCE(*b, 1);
	GUEST_SYNC(true);
	WRITE_ONCE(*b, 1);
	GUEST_SYNC(true);
	GUEST_SYNC(false);

Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
properly fix the test once the TDP code uses the common MMU APIs, at which
point it will be trivially easy for the test to retrieve the EPT PTE and
set the Dirty bit as needed.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rewrite changelog to explain the situation]
Link: https://patch.msgid.link/20251230230150.4150236-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:07 -08:00
Yosry Ahmed
b320c03d68 KVM: selftests: Kill eptPageTablePointer
Replace the struct overlay with explicit bitmasks, which is clearer and
less error-prone. See commit f18b4aebe1 ("kvm: selftests: do not use
bitfields larger than 32-bits for PTEs") for an example of why bitfields
are not preferable.

Remove the unused PAGE_SHIFT_4K definition while at it.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:06 -08:00
Yosry Ahmed
60de423781 KVM: selftests: Rename nested TDP mapping functions
Rename the functions from nested_* to tdp_* to make their purpose
clearer.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:06 -08:00
Yosry Ahmed
97dfbdfea4 KVM: selftests: Stop passing a memslot to nested_map_memslot()
On x86, KVM selftests use memslot 0 for all the default regions used by
the test infrastructure. This is an implementation detail.
nested_map_memslot() is currently used to map the default regions by
explicitly passing slot 0, which leaks the library implementation into
the caller.

Rename the function to a very verbose
nested_identity_map_default_memslots() to reflect what it actually does.
Add an assertion that only memslot 0 is being used so that the
implementation does not change from under us.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:05 -08:00
Yosry Ahmed
69e81ed5e6 KVM: selftests: Make __vm_get_page_table_entry() static
The function is only used in processor.c, drop the declaration in
processor.h and make it static.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:04 -08:00
MJ Pooladkhay
7fe9f5366b KVM: selftests: Fix sign extension bug in get_desc64_base()
The function get_desc64_base() performs a series of bitwise left shifts on
fields of various sizes. More specifically, when performing '<< 24' on
'desc->base2' (which is a u8), 'base2' is promoted to a signed integer
before shifting.

In a scenario where base2 >= 0x80, the shift places a 1 into bit 31,
causing the 32-bit intermediate value to become negative. When this
result is cast to uint64_t or ORed into the return value, sign extension
occurs, corrupting the upper 32 bits of the address (base3).

Example:
Given:
  base0 = 0x5000
  base1 = 0xd6
  base2 = 0xf8
  base3 = 0xfffffe7c

Expected return: 0xfffffe7cf8d65000
Actual return:   0xfffffffff8d65000

Fix this by explicitly casting the fields to 'uint64_t' before shifting
to prevent sign extension.

Signed-off-by: MJ Pooladkhay <mj@pooladkhay.com>
Link: https://patch.msgid.link/20251222174207.107331-1-mj@pooladkhay.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:00:56 -08:00
Jakub Kicinski
59ba823e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.19-rc5).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-08 11:38:33 -08:00
Maciej S. Szmigiero
0b28194c4c KVM: selftests: Test TPR / CR8 sync and interrupt masking
Add a few extra TPR / CR8 tests to x86's xapic_state_test to see if:
  * TPR is 0 on reset,
  * TPR, PPR and CR8 are equal inside the guest,
  * TPR and CR8 read equal by the host after a VMExit
  * TPR borderline values set by the host correctly mask interrupts in the
    guest.

These hopefully will catch the most obvious cases of improper TPR sync or
interrupt masking.

Do these tests both in x2APIC and xAPIC modes.
The x2APIC mode uses SELF_IPI register to trigger interrupts to give it a
bit of exercise too.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
[sean: put code in separate test]
Link: https://patch.msgid.link/20251205224937.428122-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 10:50:50 -08:00
Linus Torvalds
f2a3b12b30 Including fixes from netfilter and wireless.
Current release - fix to a fix:
 
  - net: do not write to msg_get_inq in callee
 
  - arp: do not assume dev_hard_header() does not change skb->head
 
 Current release - regressions:
 
  - wifi: mac80211: don't iterate not running interfaces
 
  - eth: mlx5: fix NULL pointer dereference in ioctl module EEPROM
 
 Current release - new code bugs:
 
  - eth: bnge: add AUXILIARY_BUS to Kconfig dependencies
 
 Previous releases - regressions:
 
  - eth: mlx5: dealloc forgotten PSP RX modify header
 
 Previous releases - always broken:
 
  - ping: fix ICMP out SNMP stats double-counting with ICMP sockets
 
  - bonding: preserve NETIF_F_ALL_FOR_ALL across TSO updates
 
  - bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress
 
  - eth: bnxt: fix potential data corruption with HW GRO/LRO
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmlf6N8ACgkQMUZtbf5S
 Irt9lxAAote4zPojTEJvE1DkABVsx31gPm6DF8zEae39XsFKtOMfb876s5bchOUA
 Rkd27+k28l/U5HFrrhUhqDlaOikjXx6baT12qTPsuxGrOL6Um23EfIgVuNFzxMwE
 lcBgkSNe1tfT8WBirWEVWLc5xme+vvll7ViX7kkQJ7fEdk1mvTPZ5roIq1+pE1U0
 V5Gu4l9QBcWC/IymAO8Z2UE08terMcYt1G4H6mSIoooeMM1QElbPwVEiRWAzJ/NP
 9cTjvnHJDAdRnA4bMa76CGWxg4wgPhgj3+ydlouWjgJADL6hlMj4sIZxaXgjDuoE
 XyCEuk6Y/rUTSmX1yn7rha9FQwJOyMu9XlEjnNSvH0LRdnSa7xO7NzeXtrWv7HSg
 kQOMTnMVgVlabOuMbR6xNqY6UyulQgK/2E56RgOO4Iw6U7crZsbyZx3OFkIKhq8g
 ZWaRBQNdYBBftjJA7FwQSyj/K75sLfbYAS5YizguNyFPBCBhSBJdgFWoGb+XhT0/
 k0KwsX/NWN0apHmZNiD4mT/UdX2PhJRdiTWPczNyEzqJcxh1P2HMWHVZOsIyZQHT
 EK3w6LLqp1eshEERPqFsqCFYX4LUuifQbPPF0kBkL1hTMa2NuXnkIsOzbcWal88o
 qdej9TY9VC5ycabgDI4/9ZNhnAzho1Nk/e6YuHsBcu2pIVDKgNI=
 =vR9r
 -----END PGP SIGNATURE-----

Merge tag 'net-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and wireless.

  Current release - fix to a fix:

   - net: do not write to msg_get_inq in callee

   - arp: do not assume dev_hard_header() does not change skb->head

  Current release - regressions:

   - wifi: mac80211: don't iterate not running interfaces

   - eth: mlx5: fix NULL pointer dereference in ioctl module EEPROM

  Current release - new code bugs:

   - eth: bnge: add AUXILIARY_BUS to Kconfig dependencies

  Previous releases - regressions:

   - eth: mlx5: dealloc forgotten PSP RX modify header

  Previous releases - always broken:

   - ping: fix ICMP out SNMP stats double-counting with ICMP sockets

   - bonding: preserve NETIF_F_ALL_FOR_ALL across TSO updates

   - bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress

   - eth: bnxt: fix potential data corruption with HW GRO/LRO"

* tag 'net-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  arp: do not assume dev_hard_header() does not change skb->head
  net: enetc: fix build warning when PAGE_SIZE is greater than 128K
  atm: Fix dma_free_coherent() size
  tools: ynl: don't install tests
  net: do not write to msg_get_inq in callee
  bnxt_en: Fix NULL pointer crash in bnxt_ptp_enable during error cleanup
  net: usb: pegasus: fix memory leak in update_eth_regs_async()
  net: 3com: 3c59x: fix possible null dereference in vortex_probe1()
  net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset
  wifi: mac80211: collect station statistics earlier when disconnect
  wifi: mac80211: restore non-chanctx injection behaviour
  wifi: mac80211_hwsim: disable BHs for hwsim_radio_lock
  wifi: mac80211: don't iterate not running interfaces
  wifi: mac80211_hwsim: fix typo in frequency notification
  wifi: avoid kernel-infoleak from struct iw_point
  net: airoha: Fix schedule while atomic in airoha_ppe_deinit()
  selftests: netdevsim: add carrier state consistency test
  net: netdevsim: fix inconsistent carrier state after link/unlink
  selftests: drv-net: Bring back tool() to driver __init__s
  net/sched: act_api: avoid dereferencing ERR_PTR in tcf_idrinfo_destroy
  ...
2026-01-08 08:40:35 -10:00
Linus Torvalds
79b95d7447 hid-for-linus-2026010801
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmlfxBEACgkQpmLzj2vt
 YEl21w//SDcIehN5GWJn6pMKoS8mYj+S17vo9BqQ9S68dQbhTww8D1jZ9k50+xbU
 7YLzwiMpDEex8omY9zs5EYiEBf6RCMSb/y5dGt8eyqLj+cHjqmmUK+4yjq/mKTjz
 SWh7A9A/WqFVbiQz43wOqu/kCBg10cNxKuOXmZm9LknJSb5PJRd9bd9/HGKWFDg2
 VstoZC/n/+mQoMsvDvMtbOJw/iLSGr/wDkhFkIzSbHdCnRKRmCwBOrjGonwzPOet
 8c0kNSZt9kpzhZCmOXTDWhGDr+fpaxWtkpX+WkDZ7Guxjb/4wRkLEMDqs77WRwPx
 cI1ko1ug3oPv0e0dqG1+2VQ1nJ5h6Q83gp/l+R9H4P6aT874IAcmpGoXa04u82ob
 vlkOzfQjBb8qe/O4+KG9lj6Wclr3m3nvZ07vfjQRCnizwnAo5z+NjZqWJPJcuoDf
 7KInyQI2+2KHWqLmGtAXbXfK6xb7Kcxx9ysKcN32kWSubf6FBg0GYclhlAJyqRiC
 /Z1yPNFWroJYChNN3p4IIh4hr2SCMj/Qzil456/JeXjt/3Gqbg+DLmgY8IgUng3g
 M6OTMvXR1UP9i73mb0Z22oN0x4s4Ijysjnj4X2AhL+K+lCrZBJ84j4lt+zY/N39C
 IC5pqbvPKZAvZ5iSnz07g9QQ0CbDCMDzst96jigiMN/aTtx6mm8=
 =s/hE
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2026010801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - build fix for HID-BPF (Benjamin Tissoires)

 - fix for potential buffer overflow in i2c-hid (Kwok Kin Ming)

 - a couple of selftests/hid fixes (Peter Hutterer)

 - fix for handling pressure pads in hid-multitouch (Peter Hutterer)

 - fix for potential NULL pointer dereference in intel-thc-hid (Even Xu)

 - fix for interrupt delay control in intel-thc-hid (Even Xu)

 - fix finger release detection on some VTL-class touchpads (DaytonCL)

 - fix for correct enumeration on intel-ish-hid systems with no sensors
   (Zhang Lixu)

 - assorted device ID additions and device-specific quirks

* tag 'hid-for-linus-2026010801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (21 commits)
  HID: logitech: add HID++ support for Logitech MX Anywhere 3S
  HID: Elecom: Add support for ELECOM M-XT3DRBK (018C)
  HID: quirks: work around VID/PID conflict for appledisplay
  HID: Apply quirk HID_QUIRK_ALWAYS_POLL to Edifier QR30 (2d99:a101)
  HID: i2c-hid: fix potential buffer overflow in i2c_hid_get_report()
  selftests/hid: add a test for the Digitizer/Button Type pressurepad
  selftests/hid: use a enum class for the different button types
  selftests/hid: require hidtools 0.12
  HID: multitouch: set INPUT_PROP_PRESSUREPAD based on Digitizer/Button Type
  HID: quirks: Add another Chicony HP 5MP Cameras to hid_ignore_list
  HID: Intel-thc-hid: Intel-thc: Add safety check for reading DMA buffer
  hid: intel-thc-hid: Select SGL_ALLOC
  selftests/hid: fix bpf compilations due to -fms-extensions
  HID: bpf: fix bpf compilation with -fms-extensions
  HID: Intel-thc-hid: Intel-thc: Fix wrong register reading
  HID: multitouch: add MT_QUIRK_STICKY_FINGERS to MT_CLS_VTL
  HID: intel-ish-hid: Reset enum_devices_done before enumeration
  HID: intel-ish-hid: Update ishtp bus match to support device ID table
  HID: Intel-thc-hid: Intel-thc: fix dma_unmap_sg() nents value
  HID: playstation: Center initial joystick axes to prevent spurious events
  ...
2026-01-08 07:44:48 -08:00
Srinivas Pandruvada
8190b9ea30 thermal: intel: selftests: workload_hint: Support slow workload hints
Add option to enable slow workload type hints. User can specify
"slow" as the command line argument to enable slow workload type hints.
There are two slow workload type hints: "power" and "performance".

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251218222559.4110027-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-07 21:42:10 +01:00
Thomas Weißschuh
f126d68819 kunit: tool: test: Don't rely on implicit working directory change
If no kunitconfig_paths are passed to LinuxSourceTree() it falls back to
DEFAULT_KUNITCONFIG_PATH. This resolution only works when the current
working directory is the root of the source tree. This works by chance
when running the full testsuite through the default unittest runner, as
some tests will change the current working directory as a side-effect of
'kunit.main()'. When running a single testcase or using pytest, which
resets the working directory for each test, this assumption breaks.

Explicitly specify an empty kunitconfig for the affected tests.

Link: https://lore.kernel.org/r/20260107015936.2316047-2-davidgow@google.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-07 10:00:47 -07:00
Thomas Weißschuh
1cabad3a00 kunit: tool: test: Rename test_data_path() to _test_data_path()
Running the KUnit testsuite through pytest fails, as the function
test_data_path() is recognized as a test function. Its execution fails
as pytest tries to resolve the 'path' argument as a fixture which does
not exist.

Rename the function, so the helper function is not incorrectly
recognized as a test function.

Link: https://lore.kernel.org/r/20260107015936.2316047-1-davidgow@google.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-07 10:00:41 -07:00
Peter Hutterer
f287ba5951 selftests/hid: add a test for the Digitizer/Button Type pressurepad
We have to resort to a bit of a hack: python-libevdev gets the
properties from libevdev at module init time. If libevdev hasn't been
rebuilt with the new property it won't be automatically populated. So we
hack around this by constructing the property manually.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2026-01-07 15:28:09 +01:00
Peter Hutterer
4f36fdab08 selftests/hid: use a enum class for the different button types
Instead of multiple spellings of a string-provided argument, let's make
this a tad more type-safe and use an enum here.

And while we do this fix the two wrong devices:
- elan_04f3_313a (HP ZBook Fury 15) is discrete button pad
- dell_044e_1220 (Dell Precision 7740) is a discrete button pad

Equivalent hid-tools commit
8300a55bf4

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2026-01-07 15:28:09 +01:00
Peter Hutterer
1d6628f7f2 selftests/hid: require hidtools 0.12
Not all our tests really require it but since it's likely pip-installed
anyway it's trivial to require the new version, just in case we want to
start cleaning up other bits.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2026-01-07 15:28:09 +01:00
Benjamin Tissoires
e03fb369b0 selftests/hid: fix bpf compilations due to -fms-extensions
Similar to commit 835a507535 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f4 ("bpftool: Fix build warnings
due to MS extensions")

The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
   ..
   struct freelist_counters;
};

Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2026-01-07 15:03:49 +01:00
Jose E. Marchesi
681600647c bpf: GCC requires function attributes before the declarator
GCC insists in placing attributes before the declarators in function
declarations.  Now that GCC supports btf_decl_tag and therefore __tag1
and __tag2 expand to actual attributes, the compiler is complaining
about it for

  static __noinline int foo(int x __tag1 __tag2) __tag1 __tag2

progs/test_btf_decl_tag.c:36:1: error: attributes should be specified \
before the declarator in a function definition

This patch simply places the tags before the declarator.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-3-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 21:04:11 -08:00
Jose E. Marchesi
97fb54d86d bpf: adapt selftests to GCC 16 -Wunused-but-set-variable
GCC 16 has changed the semantics of -Wunused-but-set-variable, as well
as introducing new options -Wunused-but-set-variable={0,1,2,3} to
adjust the level of support.

One of the changes is that GCC now treats 'sum += 1' and 'sum++' as
non-usage, whereas clang (and GCC < 16) considers the first as usage
and the second as non-usage, which is sort of inconsistent.

The GCC 16 -Wunused-but-set-variable=2 option implements the previous
semantics of -Wunused-but-set-variable, but since it is a new option,
it cannot be used unconditionally for forward-compatibility, just for
backwards-compatibility.

So this patch adds pragmas to the two self-tests impacted by this,
progs/free_timer.c and progs/rcu_read_lock.c, to make gcc to ignore
-Wunused-but-set-variable warnings when compiling them with GCC > 15.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44677#c25 for details
on why this regression got introduced in GCC upstream.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-2-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 21:04:11 -08:00
Leon Hwang
07bf7aa58e selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags
Add test coverage for the new BPF_F_CPU and BPF_F_ALL_CPUS flags support
in percpu maps. The following APIs are exercised:

* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem()
* bpf_map__update_elem()
* bpf_map_lookup_elem_flags()
* bpf_map__lookup_elem()

For lru_percpu_hash map, set max_entries to
'libbpf_num_possible_cpus() + 1' and only use the first
'libbpf_num_possible_cpus()' entries. This ensures a spare entry is always
available in the LRU free list, avoiding eviction.

When updating an existing key in lru_percpu_hash map:

1. l_new = prealloc_lru_pop();  /* Borrow from free list */
2. l_old = lookup_elem_raw();   /* Found, key exists */
3. pcpu_copy_value();           /* In-place update */
4. bpf_lru_push_free();         /* Return l_new to free list */

Also add negative tests to verify that non-percpu array and hash maps
reject the BPF_F_CPU and BPF_F_ALL_CPUS flags.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260107022022.12843-8-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 20:48:32 -08:00
Yohei Kojima
75df712cdd selftests: netdevsim: add carrier state consistency test
This commit adds a test case for netdevsim carrier state consistency.
Specifically, the added test verifies the carrier state during the
following operations:

1. Unlink two netdevsims
2. ifdown one netdevsim, then ifup again
3. Link the netdevsims again
4. ifdown one netdevsim, then ifup again

These steps verifies that the carrier is UP iff two netdevsims are
linked and ifuped.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/481e2729e53b6074ebfc0ad85764d8feb244de8c.1767624906.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06 18:04:01 -08:00
Gal Pressman
353cfc0ef3 selftests: drv-net: Bring back tool() to driver __init__s
The pp_alloc_fail.py test (which doesn't run in NIPA CI?) uses tool, add
back the import.

Resolves:
  ImportError: cannot import name 'tool' from 'lib.py'

Fixes: 68a052239f ("selftests: drv-net: update remaining Python init files")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260105163319.47619-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06 17:52:08 -08:00
Emil Tsalapatis
b81d5e9d96 selftests/bpf: add tests for arena kfuncs under lock
Add selftests to ensure the verifier permits calling the arena
kfunc API while holding a lock.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260106-arena-under-lock-v2-3-378e9eab3066@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 17:44:13 -08:00
Willem de Bruijn
3b7a108c41 selftests/net: packetdrill: add minimal client and server tests
Introduce minimal tests. These can serve as simple illustrative
examples, and as templates when writing new tests.

When adding new cases, it can be easier to extend an existing base
test rather than start from scratch. The existing tests all focus on
real, often non-trivial, features. It is not obvious which to take as
starting point, and arguably none really qualify.

Add two tests
- the client test performs the active open and initial close
- the server test implements the passive open and final close

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260105172529.3514786-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06 17:38:26 -08:00
Jakub Kicinski
c4df070a57 selftests: hw-net: rss-input-xfrm: try to enable the xfrm at the start
The test currently SKIPs if the symmetric RSS xfrm is not enabled
by default. This leads to spurious SKIPs in the Intel CI reporting
results to NIPA.

Testing on CX7:

 # ./drivers/net/hw/rss_input_xfrm.py
  TAP version 13
  1..2
  ok 1 rss_input_xfrm.test_rss_input_xfrm_ipv4 # SKIP Test requires IPv4 connectivity
  # Sym input xfrm already enabled: {'sym-or-xor'}
  ok 2 rss_input_xfrm.test_rss_input_xfrm_ipv6
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0

 # ethtool -X eth0 xfrm none

 # ./drivers/net/hw/rss_input_xfrm.py
  TAP version 13
  1..2
  ok 1 rss_input_xfrm.test_rss_input_xfrm_ipv4 # SKIP Test requires IPv4 connectivity
  # Sym input xfrm configured: {'sym-or-xor'}
  ok 2 rss_input_xfrm.test_rss_input_xfrm_ipv6
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0

Link: https://patch.msgid.link/20260104184600.795280-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06 16:54:10 -08:00
Yumei Huang
cb3de96eea ipv6: preserve insertion order for same-scope addresses
IPv6 addresses with the same scope are returned in reverse insertion
order, unlike IPv4. For example, when adding a -> b -> c, the list is
reported as c -> b -> a, while IPv4 preserves the original order.

This behavior causes:

a. When using `ip -6 a save` and `ip -6 a restore`, addresses are restored
   in the opposite order from which they were saved. See example below
   showing addresses added as 1::1, 1::2, 1::3 but displayed and saved
   in reverse order.

   # ip -6 a a 1::1 dev x
   # ip -6 a a 1::2 dev x
   # ip -6 a a 1::3 dev x
   # ip -6 a s dev x
   2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
       inet6 1::3/128 scope global tentative
       valid_lft forever preferred_lft forever
       inet6 1::2/128 scope global tentative
       valid_lft forever preferred_lft forever
       inet6 1::1/128 scope global tentative
       valid_lft forever preferred_lft forever
   # ip -6 a save > dump
   # ip -6 a d 1::1 dev x
   # ip -6 a d 1::2 dev x
   # ip -6 a d 1::3 dev x
   # ip a d ::1 dev lo
   # ip a restore < dump
   # ip -6 a s dev x
   2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
       inet6 1::1/128 scope global tentative
       valid_lft forever preferred_lft forever
       inet6 1::2/128 scope global tentative
       valid_lft forever preferred_lft forever
       inet6 1::3/128 scope global tentative
       valid_lft forever preferred_lft forever
   # ip a showdump < dump
    if1:
        inet6 ::1/128 scope host proto kernel_lo
        valid_lft forever preferred_lft forever
    if2:
        inet6 1::3/128 scope global tentative
        valid_lft forever preferred_lft forever
    if2:
        inet6 1::2/128 scope global tentative
        valid_lft forever preferred_lft forever
    if2:
        inet6 1::1/128 scope global tentative
        valid_lft forever preferred_lft forever

b. Addresses in pasta to appear in reversed order compared to host
   addresses.

The ipv6 addresses were added in reverse order by commit e55ffac601
("[IPV6]: order addresses by scope"), then it was changed by commit
502a2ffd73 ("ipv6: convert idev_list to list macros"), and restored by
commit b54c9b98bb ("ipv6: Preserve pervious behavior in
ipv6_link_dev_addr()."). However, this reverse ordering within the same
scope causes inconsistency with IPv4 and the issues described above.

This patch aligns IPv6 address ordering with IPv4 for consistency
by changing the comparison from >= to > when inserting addresses
into the address list. Also updates the ioam6 selftest to reflect
the new address ordering behavior. Combine these two changes into
one patch for bisectability.

Link: https://bugs.passt.top/show_bug.cgi?id=175
Suggested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Acked-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260104032357.38555-1-yuhuang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-06 16:41:35 -08:00
Toke Høiland-Jørgensen
ab86d0bf01 selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
Update the selftest to check that the metadata size check takes the
xdp_frame size into account in bpf_prog_test_run. The original
check (for meta size 256) was broken because the data frame supplied was
smaller than this, triggering a different EINVAL return. So supply a
larger data frame for this test to make sure we actually exercise the
check we think we are.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 11:41:41 -08:00
Thomas Weißschuh
6b6dbf3e4e selftests/nolibc: always build sparc32 tests with -mcpu=v8
Since LLVM commit 39e30508a7f6 ("[Driver][Sparc] Default to -mcpu=v9 for
32-bit Linux/sparc64 (#109278)"), clang defaults to -mcpu=v9 for 32-bit
SPARC builds. -mcpu=v9 generates instructions which are not recognized
by qemu-sparc and qemu-system-sparc.

Explicitly enforce -mcpu=v8 to generate compatible code.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-sparc32-fix-v2-1-7c5cd6b175c2@weissschuh.net
2026-01-06 12:30:16 +01:00
Thomas Weißschuh
0313992485 selftests/nolibc: drop NOLIBC_SYSROOT=0 logic
This logic was added in commit 850fad7de8 ("selftests/nolibc: allow
test -include /path/to/nolibc.h") to allow the testing of -include
/path/to/nolibc.h. As it requires as special variable to activate, this
code is nearly never used. Furthermore it complicates the logic a bit.

Since commit a6a054c8ad ("tools/nolibc: add target to check header
usability") and commit 443c6467fc ("selftests/nolibc: always run
nolibc header check") the usability of -include /path/to/nolibc.h is
always checked anyways, making NOLIBC_SYSROOT=0 pointless.

Drop the special logic.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260104-nolibc-nolibc_sysroot-v1-1-98025ad99add@weissschuh.net
2026-01-06 12:08:09 +01:00
Thomas Weißschuh
ca7206b6ad selftests/nolibc: test compatibility of nolibc and kernel time types
Keeping 'struct timespec' and 'struct __kernel_timespec' compatible
allows the source code to stay simple.

Validate that the types stay compatible.

The test is specific to nolibc and does not compile on other libcs, so
skip it there.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251220-nolibc-uapi-types-v3-10-c662992f75d7@weissschuh.net
2026-01-06 12:08:04 +01:00
Ankit Khushwaha
2fa98059fd selftests: mptcp: Mark xerror and die_perror __noreturn
Compiler reports potential uses of uninitialized variables in
mptcp_connect.c when xerror() is called from failure paths.

mptcp_connect.c:1262:11: warning: variable 'raw_addr' is used
      uninitialized whenever 'if' condition is false
      [-Wsometimes-uninitialized]

xerror() terminates execution by calling exit(), but it is not visible
to the compiler & assumes control flow may continue past the call.

Annotate xerror() with __noreturn so the compiler can correctly reason
about control flow and avoid false-positive uninitialized variable
warnings.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20260101172840.90186-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-05 16:28:07 -08:00
Victor Nogueira
4bcd49a03b selftests/tc-testing: Add test case redirecting to self on egress
Add single mirred test case that attempts to redirect to self on egress
using clsact

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260101135608.253079-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-05 16:23:43 -08:00
Michal Luczaj
caa20e9e15 vsock/test: Test setting SO_ZEROCOPY on accept()ed socket
Make sure setsockopt(SOL_SOCKET, SO_ZEROCOPY) on an accept()ed socket is
handled by vsock's implementation.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-2-64778d6c4f88@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-05 16:14:51 -08:00
Thomas Weißschuh
ab150c2bba kunit: qemu_configs: Add 32-bit big endian ARM configuration
Add a basic config to run kunit tests on 32-bit big endian ARM.

Link: https://lore.kernel.org/r/20260102-kunit-armeb-v1-1-e8e5475d735c@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:40:32 -07:00
Thomas Weißschuh
85aff81b0d kunit: tool: Don't overwrite test status based on subtest counts
If a subtest itself reports success, but the outer testcase fails,
the whole testcase should be reported as a failure. However the status
is recalculated based on the test counts, overwriting the outer test
result. Synthesize a failed test in this case to make sure the failure
is not swallowed.

Link: https://lore.kernel.org/r/20251230-kunit-nested-failure-v1-2-98cfbeb87823@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:36:08 -07:00
Thomas Weißschuh
0c5b86c67f kunit: tool: Add test for nested test result reporting
Currently there is a lack of tests validating the result reporting from
nested tests. Add one, it will also be used to validate upcoming changes
to the nested test parsing.

Link: https://lore.kernel.org/r/20251230-kunit-nested-failure-v1-1-98cfbeb87823@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:35:49 -07:00
Ryota Sakamoto
5c7a474143 kunit: respect KBUILD_OUTPUT env variable by default
Currently, kunit.py ignores the KBUILD_OUTPUT env variable and always
defaults to .kunit in the working directory. This behavior is inconsistent
with standard Kbuild behavior, where KBUILD_OUTPUT defines the build
artifact location.

This patch modifies kunit.py to respect KBUILD_OUTPUT if set.  A .kunit
subdirectory is created inside KBUILD_OUTPUT to avoid polluting the build
directory.

Link: https://lore.kernel.org/r/20260106-kunit-kbuild_output-v2-1-582281797343@gmail.com
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-01-05 15:35:22 -07:00
Mark Brown
fb36d71308 kselftest/arm64: Support FORCE_TARGETS
The top level kselftest Makefile supports an option FORCE_TARGETS which
causes any failures during the build to be propagated to the exit status
of the top level make, useful during build testing. Currently the recursion
done by the arm64 selftests ignores this option, meaning arm64 failures are
not reported via this mechanism. Add the logic to implement FORCE_TARGETS
so that it works for arm64.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-05 21:16:46 +00:00
Dan Williams
29317f8dc6 cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
Unlike the cxl_pci class driver that opportunistically enables memory
expansion with no other dependent functionality, CXL accelerator drivers
have distinct PCIe-only and CXL-enhanced operation states. If CXL is
available some additional coherent memory/cache operations can be enabled,
otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.

This constitutes a new mode of operation where the caller of
devm_cxl_add_memdev() wants to make a "go/no-go" decision about running
in CXL accelerated mode or falling back to PCIe-only operation. Part of
that decision making process likely also includes additional
CXL-acceleration-specific resource setup. Encapsulate both of those
requirements into 'struct cxl_memdev_attach' that provides a ->probe()
callback. The probe callback runs in cxl_mem_probe() context, after the
port topology is successfully attached for the given memdev. It supports
a contract where, upon successful return from devm_cxl_add_memdev(),
everything needed for CXL accelerated operation has been enabled.

Additionally the presence of @cxlmd->attach indicates that the accelerator
driver be detached when CXL operation ends. This conceptually makes a CXL
link loss event mirror a PCIe link loss event which results in triggering
the ->remove() callback of affected devices+drivers. A driver can re-attach
to recover back to PCIe-only operation. Live recovery, i.e. without a
->remove()/->probe() cycle, is left as a future consideration.

[ dj: Repalce with updated commit log from Dan ]

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251216005616.3090129-7-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-05 10:58:04 -07:00
Dan Williams
f2546eba53 cxl/mem: Drop @host argument to devm_cxl_add_memdev()
In all cases the device that created the 'struct cxl_dev_state' instance is
also the device to host the devm cleanup of devm_cxl_add_memdev(). This
simplifies the function prototype, and limits a degree of freedom of the
API.

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Link: https://patch.msgid.link/20251216005616.3090129-6-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-05 10:14:53 -07:00
Boqun Feng
acb0b2f5d6 Merge branch 'rcu-torture.20260104a' into rcu-next
* rcu-torture.20260104a:
  rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
  rcutorture: Prevent concurrent kvm.sh runs on same source tree
  torture: Include commit discription in testid.txt
  torture: Make config2csv.sh properly handle comments in .boot files
  torture: Make kvm-series.sh give run numbers and totals
  torture: Make kvm-series.sh give build numbers and totals
  torture: Parallelize kvm-series.sh guest-OS execution
  rcutorture: Add context checks to rcu_torture_timer()
2026-01-04 18:53:06 +08:00
Joel Fernandes
cf587c6ff2 rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
When kvm.sh is killed, its child processes (make, gcc, qemu, etc.) may
continue running. This prevents new kvm.sh instances from starting even
though the parent is gone.

Add a --kill-previous option that uses fuser(1) to terminate all
processes holding the flock file before attempting to acquire it. This
provides a clean way to recover from stale/zombie kvm.sh runs which
sometimes may have lots of qemu and compiler processes still disturbing.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-04 18:51:14 +08:00
Joel Fernandes
a590a79d19 rcutorture: Prevent concurrent kvm.sh runs on same source tree
Add flock-based locking to kvm.sh to prevent multiple instances from
running concurrently on the same source tree. This prevents build
failures caused by one instance's "make clean" deleting generated files
while another instance is building causing build failures.

The lock file is placed in the rcutorture directory and added to
.gitignore.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-04 18:50:59 +08:00
Benjamin Berg
ec4bb8e8df tools/nolibc: add ptrace support
Add ptrace support, as it will be useful in UML.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
[Thomas: drop va_args usage and linux/uio.h inclusion]
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2026-01-04 10:28:44 +01:00
Linus Torvalds
3d35fa1190 linux_kselftest-fixes-6.19-rc4
-- Fix for build failures in tests that use an empty FIXTURE() seen in
    Android's build environment, which uses -D_FORTIFY_SOURCE=3), a build
    failure occurs in tests that use an empty FIXTURE().
 
 -- Fix func_traceonoff_triggers.tc sometimes failures on Kunpeng-920 board
    resulting from including transient trace file name in checksum compare.
 
 -- Fix to remove available_events requirement from toplevel-enable for
    instance as it isn't a valid requirement for this test.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmlX++YACgkQCwJExA0N
 QxyWOQ/+M/BWTZxNKpTVfdy3ouQC96u540/Aeu8F0muP8dL2kEPVbcv9kPcEdMR6
 uWoE5IRK1DYj1mKHU81VsW338IEbWWLQVDiPc+6fIoW2WoLAKbxj7IpOxopgk1tA
 gtFkXGwxkKTioNLtdCVmsMAcb+DRuroIpKNngIs0vn/yrZyR0ovuw8YuAAzBdXFM
 KaKMcDhEadbeRs9yLa3UTDHYS7y+7a+1ZvoUr5gM8L9rvNIGjnUpacXVNpdoscBw
 zQnd9Y0dWEKvjsCW9HGJVlAhNHm5agyL2omF5gjQBd7GQ9c+8udKvRUZ4HSpLHMd
 MGT5aQsvw4c+iDlkaI0oFitPN1HGkR5rDzrwrFnOEYZ+aZqs+5mCNpyS0vB4De77
 uLh1/AoO0dZ+tINQoGT7T4nLz5YYTBZBdTuuTNU292nDtvzegD+N82J8K/qt+Rcp
 +dYJQ5QUsJvhQXjgiO7EGXRt+p3Z+b4T9vyQbs0+jb0nXlLTfIZbpSAoKNFkVpIy
 l0G4f9zQf7DhEWghPh2lfwMZVH9FyBlEe9JkQfVQ1765Bd4mt3CM48o2so6op8cU
 N1SXSYKhwqXXH5HZBGIwDKsd+d3wB0JqsaGIt8NG0os/jpKzXtCVRZ+W9RZwukhx
 egk/kt51p3Gg+4wzFn5l0Dg0q62zfbWNecmdnBkTQMFceKUc5Ls=
 =UYYR
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

 - Fix for build failures in tests that use an empty FIXTURE() seen in
   Android's build environment, which uses -D_FORTIFY_SOURCE=3, a build
   failure occurs in tests that use an empty FIXTURE()

 - Fix func_traceonoff_triggers.tc sometimes failures on Kunpeng-920
   board resulting from including transient trace file name in checksum
   compare

 - Fix to remove available_events requirement from toplevel-enable for
   instance as it isn't a valid requirement for this test

* tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest/harness: Use helper to avoid zero-size memset warning
  selftests/ftrace: Test toplevel-enable for instance
  selftests/ftrace: traceonoff_triggers: strip off names
2026-01-02 12:21:34 -08:00
Linus Torvalds
bea82c80a5 block-6.19-20260102
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmlX7MMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvuYEACG0VFYmcqmB4JZygecJB3xaxhbVIrCbjFv
 Vmc0XNTkcCpjYAv1jpkS5F3nkJhzZlFNn9xOaP/O8E+6tSctFIre7qjMRpxZM3yl
 GA+MqPI+zNbpYMgsoAH/XTASTVfaTEPOlaoAPQeo8Ey3JRw3Ko1IDNU7zIYK94Xl
 rSAeT65W7vJ+HBjctBoCZYMsE2x0Sn0yrVctkL1mMusQwIg6oMhJ1w1p36P17Mc1
 YgLWQYtfK+eogdTM0Jh9RvDtVJL3WT1I2Ii3KBdCgryY7iSxFXvM0pm1lrOBH+kI
 4bKHTylBnjfmxv7dlz3jHwRmahwdXDk7rpq1EMPygDSj835h3SgAFz3rm9nCUjNI
 xWyEZeN6z4ykdOlqJ6ghTnZTroRdM/12HbSV46n69tczxepG3Mn1i3gBd4UQhn5T
 z6aqa7akIsynlzOnLgrwQjxgVhtfAHptrgAg7g7Kz9hq9xTAEPc2f9Nq7glmLP6f
 wPMoy2lla69vk4Tlzh8TZpTHRPLYLHTtL5OQPM6dnyQ6MzWm2/PHJ/MNfV7/o+VR
 W61BYXUz6d2q81c/I16VWVQvJ0nUa3v7hUGCLUeimQUg+ulyIlMX4wrOI7iYTFTy
 V/4c3DHKEh9y/ptmCgv0jDZdwSoUYvXkn0vFe0fcF3q/T7xea4dok8mcXLcKhMuc
 xPFtx92dhQ==
 =4NB3
 -----END PGP SIGNATURE-----

Merge tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - Scan partition tables asynchronously for ublk, similarly to how nvme
   does it. This avoids potential deadlocks, which is why nvme does it
   that way too. Includes a set of selftests as well.

 - MD pull request via Yu:
     - Fix null-pointer dereference in raid5 sysfs group_thread_cnt
       store (Tuo Li)
     - Fix possible mempool corruption during raid1 raid_disks update
       via sysfs (FengWei Shih)
     - Fix logical_block_size configuration being overwritten during
       super_1_validate() (Li Nan)
     - Fix forward incompatibility with configurable logical block size:
       arrays assembled on new kernels could not be assembled on older
       kernels (v6.18 and before) due to non-zero reserved pad rejection
       (Li Nan)
     - Fix static checker warning about iterator not incremented (Li Nan)

 - Skip CPU offlining notifications on unmapped hardware queues

 - bfq-iosched block stats fix

 - Fix outdated comment in bfq-iosched

* tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-02 12:15:59 -08:00
Puranjay Mohan
cf503eb2c6 selftests: bpf: Fix test_bpf_nf for trusted args becoming default
With trusted args now being the default, passing NULL to kfunc
parameters that are pointers causes verifier rejection rather than a
runtime error. The test_bpf_nf test was failing because it attempted to
pass NULL to bpf_xdp_ct_lookup() to verify runtime error handling.

Since the NULL check now happens at verification time, remove the
runtime test case that passed NULL to the bpf_tuple parameter and
instead add verification-time tests to ensure the verifier correctly
rejects programs that pass NULL to trusted arguments.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-11-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
cf82580c86 selftests: bpf: fix cgroup_hierarchical_stats
The cgroup_hierarchical_stats selftests uses an fentry program attached
to cgroup_attach_task and then passes the received &dst_cgrp->self to
the css_rstat_updated() kfunc. The verifier now assumes that all kfuncs
only takes trusted pointer arguments, and pointers received by fentry
are not marked trustes by default.

Use a tp_btf program in place for fentry for this test, pointers
received by tp_btf programs are marked trusted by the verifier.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-10-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
230b0118e4 selftests: bpf: fix test_kfunc_dynptr_param
As verifier now assumes that all kfuncs only takes trusted pointer
arguments, passing 0 (NULL) to a kfunc that doesn't mark the argument as
__nullable or __opt will be rejected with a failure message of: Possibly
NULL pointer passed to trusted arg<n>

Pass a non-null value to the kfunc to test the expected failure mode.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-9-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
03cc77b10e selftests: bpf: Update failure message for rbtree_fail
The rbtree_api_use_unchecked_remove_retval() selftest passes a pointer
received from bpf_rbtree_remove() to bpf_rbtree_add() without checking
for NULL, this was earlier caught by __check_ptr_off_reg() in the
verifier. Now the verifier assumes every kfunc only takes trusted pointer
arguments, so it catches this NULL pointer earlier in the path and
provides a more accurate failure message.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-8-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
df5004579b selftests: bpf: Update kfunc_param_nullable test for new error message
With trusted args now being the default, the NULL pointer check runs
before type-specific validation. Update test3 to expect the new error
message "Possibly NULL pointer passed to trusted arg0" instead of the
old dynptr-specific error message.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
7646c7afd9 bpf: Remove redundant KF_TRUSTED_ARGS flag from all kfuncs
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:28 -08:00
Florian Westphal
a675d1caa2 selftests: netfilter: nft_concat_range.sh: add check for overlap detection bug
without 'netfilter: nft_set_pipapo: fix range overlap detection':

  reject overlapping range on add       0s         [FAIL]
Returned success for add { 1.2.3.4 . 1.2.4.1-1.2.4.2 } given set:
table inet filter {
	[..]
       elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
                    1.2.3.0-1.2.3.4 . 1.2.4.2 counter packets 0 bytes 0 }
}

The element collides with existing ones and was not added, but kernel
returned success to userspace.

Signed-off-by: Florian Westphal <fw@strlen.de>
2026-01-01 11:31:48 +01:00
Paul E. McKenney
c89474b9b2 torture: Include commit discription in testid.txt
Currently, the testid.txt file in the top-level directory of the
rcutorture results contains the output of "git rev-parse HEAD", which
just gives the full SHA-1 of the current commit.  This is followed by
the output of "git status", which is further followed by the output of
"git diff".  This works, but is less than helpful to human readers
scanning a list of commits.

This commit therefore instead uses "git show --oneline --no-patch HEAD",
which provides a short SHA-1, but also the names of any branches and
the commit's title.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:44:37 +08:00
Paul E. McKenney
dcd6067322 torture: Make config2csv.sh properly handle comments in .boot files
As in strip the "#" and everything after it and *then* tokenize.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:44:36 +08:00
Paul E. McKenney
3d69b6beb8 torture: Make kvm-series.sh give run numbers and totals
The kvm-series.sh script can easily be convinced to run on the order of
1,000 guest OSes, so some sort of progress indicator would be helpful.
This commit therefore updates the "Starting" output lines to read as in
the following example, adding the ("3 of 4"):

Starting TREE02/1.7e0ad1b49057 using 8 CPUs (4 of 4) Sat Nov 8 10:51:06 PM PST 2025

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:44:36 +08:00
Paul E. McKenney
672621773f torture: Make kvm-series.sh give build numbers and totals
The kvm-series.sh script can easily be convinced to do on the order
of 1,000 builds, so some sort of progress indicator would be helpful.
This commit therefore updates the "Starting" output lines to read
as in the following example, adding the ("2 of 4"):

Starting TREE01/1.7e0ad1b49057 (2 of 4) at Sat Nov 8 10:08:21 PM PST 2025

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:44:36 +08:00
Paul E. McKenney
3ce40539cc torture: Parallelize kvm-series.sh guest-OS execution
Currently, kvm-series.sh builds and runs serially, which makes for
long execution times.  This commit changes its logic to build all of
the needed kernels serially, but then run the corresponding guest OSes
concurrently in batches using the entire machine.  On large systems,
this results in order-of-magnitude speedups of the guest-OS execution
portion of the runtime.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:44:34 +08:00
Paul E. McKenney
a73fc3dcc6 rcu: Clean up after the SRCU-fastification of RCU Tasks Trace
Now that RCU Tasks Trace has been re-implemented in terms of SRCU-fast,
the ->trc_ipi_to_cpu, ->trc_blkd_cpu, ->trc_blkd_node, ->trc_holdout_list,
and ->trc_reader_special task_struct fields are no longer used.

In addition, the rcu_tasks_trace_qs(), rcu_tasks_trace_qs_blkd(),
exit_tasks_rcu_finish_trace(), and rcu_spawn_tasks_trace_kthread(),
show_rcu_tasks_trace_gp_kthread(), rcu_tasks_trace_get_gp_data(),
rcu_tasks_trace_torture_stats_print(), and get_rcu_tasks_trace_gp_kthread()
functions and all the other functions that they invoke are no longer used.

Also, the TRC_NEED_QS and TRC_NEED_QS_CHECKED CPP macros are no longer used.
Neither are the rcu_tasks_trace_lazy_ms and rcu_task_ipi_delay rcupdate
module parameters and the TASKS_TRACE_RCU_READ_MB Kconfig option.

This commit therefore removes all of them.

[ paulmck: Apply Alexei Starovoitov feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:39:46 +08:00
Puranjay Mohan
c286e7e9d1 selftests/bpf: veristat: fix printing order in output_stats()
The order of the variables in the printf() doesn't match the text and
therefore veristat prints something like this:

Done. Processed 24 files, 0 programs. Skipped 62 files, 0 programs.

When it should print:

Done. Processed 24 files, 62 programs. Skipped 0 files, 0 programs.

Fix the order of variables in the printf() call.

Fixes: 518fee8bfa ("selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objects")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251231221052.759396-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 16:11:49 -08:00
Wake Liu
19b8a76cd9 kselftest/harness: Use helper to avoid zero-size memset warning
When building kselftests with a toolchain that enables source
fortification (e.g., Android's build environment, which uses
-D_FORTIFY_SOURCE=3), a build failure occurs in tests that use an
empty FIXTURE().

The root cause is that an empty fixture struct results in
`sizeof(self_private)` evaluating to 0. The compiler's fortification
checks then detect the `memset()` call with a compile-time constant size
of 0, issuing a `-Wuser-defined-warnings` which is promoted to an error
by `-Werror`.

An initial attempt to guard the call with `if (sizeof(self_private) > 0)`
was insufficient. The compiler's static analysis is aggressive enough
to flag the `memset(..., 0)` pattern before evaluating the conditional,
thus still triggering the error.

To resolve this robustly, this change introduces a `static inline`
helper function, `__kselftest_memset_safe()`. This function wraps the
size check and the `memset()` call. By replacing the direct `memset()`
in the `__TEST_F_IMPL` macro with a call to this helper, we create an
abstraction boundary. This prevents the compiler's static analyzer from
"seeing" the problematic pattern at the macro expansion site, resolving
the build failure.

Build Context:
Compiler: Android (14488419, +pgo, +bolt, +lto, +mlgo, based on r584948) clang version 22.0.0 (https://android.googlesource.com/toolchain/llvm-project 2d65e4108033380e6fe8e08b1f1826cd2bfb0c99)
Relevant Options: -O2 -Wall -Werror -D_FORTIFY_SOURCE=3 -target i686-linux-android10000

Test: m kselftest_futex_futex_requeue_pi

Removed Gerrit Change-Id
Shuah Khan <skhan@linuxfoundation.org>

Link: https://lore.kernel.org/r/20251224084120.249417-1-wakel@google.com
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 13:27:36 -07:00
Zheng Yejian
0eccd4acd6 selftests/ftrace: Test toplevel-enable for instance
'available_events' is actually not required by
'test.d/event/toplevel-enable.tc' and its Existence has been tested in
'test.d/00basic/basic4.tc'.

So the require of 'available_events' can be dropped and then we can add
'instance' flag to test 'test.d/event/toplevel-enable.tc' for instance.

Test result show as below:
 # ./ftracetest test.d/event/toplevel-enable.tc
 === Ftrace unit tests ===
 [1] event tracing - enable/disable with top level files [PASS]
 [2] (instance)  event tracing - enable/disable with top level files [PASS]

 # of passed:  2
 # of failed:  0
 # of unresolved:  0
 # of untested:  0
 # of unsupported:  0
 # of xfailed:  0
 # of undefined(test bug):  0

Link: https://lore.kernel.org/r/20230509203659.1173917-1-zhengyejian1@huawei.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 12:46:12 -07:00
Yipeng Zou
b889b4fb4c selftests/ftrace: traceonoff_triggers: strip off names
The func_traceonoff_triggers.tc sometimes goes to fail
on my board, Kunpeng-920.

[root@localhost]# ./ftracetest ./test.d/ftrace/func_traceonoff_triggers.tc -l fail.log
=== Ftrace unit tests ===
[1] ftrace - test for function traceon/off triggers     [FAIL]
[2] (instance)  ftrace - test for function traceon/off triggers [UNSUPPORTED]

I look up the log, and it shows that the md5sum is different between csum1 and csum2.

++ cnt=611
++ sleep .1
+++ cnt_trace
+++ grep -v '^#' trace
+++ wc -l
++ cnt2=611
++ '[' 611 -ne 611 ']'
+++ cat tracing_on
++ on=0
++ '[' 0 '!=' 0 ']'
+++ md5sum trace
++ csum1='76896aa74362fff66a6a5f3cf8a8a500  trace'
++ sleep .1
+++ md5sum trace
++ csum2='ee8625a21c058818fc26e45c1ed3f6de  trace'
++ '[' '76896aa74362fff66a6a5f3cf8a8a500  trace' '!=' 'ee8625a21c058818fc26e45c1ed3f6de  trace' ']'
++ fail 'Tracing file is still changing'
++ echo Tracing file is still changing
Tracing file is still changing
++ exit_fail
++ exit 1

So I directly dump the trace file before md5sum, the diff shows that:

[root@localhost]# diff trace_1.log trace_2.log -y --suppress-common-lines
dockerd-12285   [036] d.... 18385.510290: sched_stat | <...>-12285   [036] d.... 18385.510290: sched_stat
dockerd-12285   [036] d.... 18385.510291: sched_swit | <...>-12285   [036] d.... 18385.510291: sched_swit
<...>-740       [044] d.... 18385.602859: sched_stat | kworker/44:1-740 [044] d.... 18385.602859: sched_stat
<...>-740       [044] d.... 18385.602860: sched_swit | kworker/44:1-740 [044] d.... 18385.602860: sched_swit

And we can see that <...> filed be filled with names.

We can strip off the names there to fix that.

After strip off the names:

kworker/u257:0-12 [019] d..2.  2528.758910: sched_stat | -12 [019] d..2.  2528.758910: sched_stat_runtime: comm=k
kworker/u257:0-12 [019] d..2.  2528.758912: sched_swit | -12 [019] d..2.  2528.758912: sched_switch: prev_comm=kw
<idle>-0          [000] d.s5.  2528.762318: sched_waki | -0  [000] d.s5.  2528.762318: sched_waking: comm=sshd pi
<idle>-0          [037] dNh2.  2528.762326: sched_wake | -0  [037] dNh2.  2528.762326: sched_wakeup: comm=sshd pi
<idle>-0          [037] d..2.  2528.762334: sched_swit | -0  [037] d..2.  2528.762334: sched_switch: prev_comm=sw

Link: https://lore.kernel.org/r/20230818013226.2182299-1-zouyipeng@huawei.com
Fixes: d87b29179a ("selftests: ftrace: Use md5sum to take less time of checking logs")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 12:45:25 -07:00
Linus Torvalds
349bd28a86 VFIO fixes for v6.19-rc4
- Restrict ROM access to dword to resolve a regression introduced
    with qword access seen on some Intel NICs.  Update VGA region
    access to the same given lack of precedent for 64-bit users.
    (Kevin Tian)
 
  - Fix missing .get_region_info_caps callback in the xe-vfio-pci
    variant driver due to integration through the DRM tree.
    (Michal Wajdeczko)
 
  - Add aligned 64-bit access macros to tools/include/linux/types.h,
    allowing removal of uapi/linux/type.h includes from various
    vfio selftest, resolving redefinition warnings for integration
    with KVM selftests. (David Matlack)
 
  - Fix error path memory leak in pds-vfio-pci variant driver.
    (Zilin Guan)
 
  - Fix error path use-after-free in xe-vfio-pci variant driver.
    (Alper Ak)
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmlUP2oRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyKYsw//fEJ6vfn0TOP/ahfY9tRpw6suZJuEo5wx
 SJj57kMF85/V64iaewRq1G1hbrEAEcOgDtpcR3y57lHAS8METKxyjxL5YZdqvgX4
 kRvDwRJRcFz/kvmO0PZx6rn21ZxLv2d9RXahDwaqaQfw2pR2ZOtr/zaawMr6LPmw
 Z1dl0UhQnHIhw4kG1QKUhdCozhAgSV3/pmGV2bOjgXRS0rVUZ3UQZ0RprLe6uIEl
 hSWLmeWUtyrt30gVzoKPTWWuRvuIw2lnAH2PGhNtha70Djyx1EAUs7iqUA8XBsQh
 7JG/T1yibh9CzE5OzI+JBmix5s8zxd4q0RHNa9T31EMHSdzCJhXQfLiZVeJVv9O6
 EHsFVWHzE4CXgSMEpD+QjfCrEwBcF4n6W6N68BFAuAVN51+0DoFinFF0PaqpTivj
 U/Yh1erkfFhy8IlO33Q2dAOxBfy1aIkszKS2Xkc1pwX3vReMlHiWDmyM5ciB0VKJ
 GwslXQwGljSNuxE81e7EFI6g18FGGLGt5EkkYPhSS/hYAZQy0RpqPdRopcP85OiO
 HtyfZZZ/Ph0y13f7c7rH5awGm9NOc9W+2xNKxuKNkjwvEmjO12lRmrzGkKPu/OTi
 YluIWSFG7gBQfb8eFgcfJoM7vcrRCJ+JeSp5fa7u8QLUsTk+var+/WCxKs9BpX31
 hDb8rHj0Bwo=
 =a1By
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Restrict ROM access to dword to resolve a regression introduced with
   qword access seen on some Intel NICs. Update VGA region access to the
   same given lack of precedent for 64-bit users (Kevin Tian)

 - Fix missing .get_region_info_caps callback in the xe-vfio-pci variant
   driver due to integration through the DRM tree (Michal Wajdeczko)

 - Add aligned 64-bit access macros to tools/include/linux/types.h,
   allowing removal of uapi/linux/type.h includes from various vfio
   selftest, resolving redefinition warnings for integration with KVM
   selftests (David Matlack)

 - Fix error path memory leak in pds-vfio-pci variant driver (Zilin Guan)

 - Fix error path use-after-free in xe-vfio-pci variant driver (Alper Ak)

* tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio:
  vfio/xe: Fix use-after-free in xe_vfio_pci_alloc_file()
  vfio/pds: Fix memory leak in pds_vfio_dirty_enable()
  vfio: selftests: Drop <uapi/linux/types.h> includes
  tools include: Add definitions for __aligned_{l,b}e64
  vfio/xe: Add default handler for .get_region_info_caps
  vfio/pci: Disable qword access to the VGA region
  vfio/pci: Disable qword access to the PCI ROM bar
2025-12-31 10:38:48 -08:00
Clint George
3e6ad272bb kselftest/kublk: include message in _Static_assert for C11 compatibility
Add descriptive message in the _Static_assert to comply with the C11
standard requirement to prevent compiler from throwing out error. The
compiler throws an error when _Static_assert is used without a message as
that is a C23 extension.

[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture

[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/ublk$ make LLVM=1 W=1
  CC       kublk
In file included from kublk.c:6:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
In file included from null.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
In file included from file_backed.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
In file included from common.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
In file included from stripe.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
In file included from fault_inject.c:11:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
  220 |         _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
      |                                                  ^
      |                                                  , ""
1 error generated.
make: *** [../lib.mk:225: ~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/ublk/kublk] Error 1

Link: https://lore.kernel.org/r/20251215085022.7642-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 10:50:21 -07:00
Clint George
0aaff7b109 kselftest/anon_inode: replace null pointers with empty arrays
Make use of empty (NULL-terminated) array instead of NULL pointer to
avoid compiler errors while maintaining the behavior of the function
intact

[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture

[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/filesystems$ make LLVM=1 W=1
  CC       devpts_pts
  CC       file_stressor
  CC       anon_inode_test
anon_inode_test.c:45:37: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
   45 |         ASSERT_LT(execveat(fd_context, "", NULL, NULL, AT_EMPTY_PATH), 0);
      |                                            ^~~~
/usr/lib/llvm-18/lib/clang/18/include/__stddef_null.h:26:14: note: expanded from macro 'NULL'
   26 | #define NULL ((void*)0)
      |              ^~~~~~~~~~
../Desktop/kernel-dev/linux-v1/tools/testing/selftests/../../../tools/testing/selftests/kselftest_harness.h:535:11: note: expanded from macro 'ASSERT_LT'
  535 |         __EXPECT(expected, #expected, seen, #seen, <, 1)
      |                  ^~~~~~~~
../Desktop/kernel-dev/linux-v1/tools/testing/selftests/../../../tools/testing/selftests/kselftest_harness.h:758:33: note: expanded from macro '__EXPECT'
  758 |         __typeof__(_expected) __exp = (_expected); \
      |                                        ^~~~~~~~~
1 warning generated.

Link: https://lore.kernel.org/r/20251215084900.7590-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 10:50:13 -07:00
Clint George
673a55cc49 kselftest/coredump: use __builtin_trap() instead of null pointer
Use __builtin_trap() to truly crash the program instead of dereferencing
null pointer which may be optimized by the compiler preventing the crash
from occurring

[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture

[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/coredump$ make LLVM=1 W=1
  CC       stackdump_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
   59 |         i = *(int *)NULL;
      |             ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.
  CC       coredump_socket_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
   59 |         i = *(int *)NULL;
      |             ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.
  CC       coredump_socket_protocol_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
   59 |         i = *(int *)NULL;
      |             ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.

Link: https://lore.kernel.org/r/20251215084737.7504-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-12-31 10:49:57 -07:00
Ihor Solodrai
1a8fa7faf4 resolve_btfids: Implement --patch_btfids
Recent changes in BTF generation [1] rely on ${OBJCOPY} command to
update .BTF_ids section data in target ELF files.

This exposed a bug in llvm-objcopy --update-section code path, that
may lead to corruption of a target ELF file. Specifically, because of
the bug st_shndx of some symbols may be (incorrectly) set to 0xffff
(SHN_XINDEX) [2][3].

While there is a pending fix for LLVM, it'll take some time before it
lands (likely in 22.x). And the kernel build must keep working with
older LLVM toolchains in the foreseeable future.

Using GNU objcopy for .BTF_ids update would work, but it would require
changes to LLVM-based build process, likely breaking existing build
environments as discussed in [2].

To work around llvm-objcopy bug, implement --patch_btfids code path in
resolve_btfids as a drop-in replacement for:

    ${OBJCOPY} --update-section .BTF_ids=${btf_ids} ${elf}

Which works specifically for .BTF_ids section:

    ${RESOLVE_BTFIDS} --patch_btfids ${btf_ids} ${elf}

This feature in resolve_btfids can be removed at some point in the
future, when llvm-objcopy with a relevant bugfix becomes common.

[1] https://lore.kernel.org/bpf/20251219181321.1283664-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/bpf/20251224005752.201911-1-ihor.solodrai@linux.dev/
[3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251231012558.1699758-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 09:04:09 -08:00
Eduard Zingerman
4fd99103ee selftests/bpf: iterator based loop and STACK_MISC states pruning
The test case first initializes 9 stack slots as STACK_MISC,
then conditionally updates each of them to SCALAR spill inside an
iterator based loop. This leads to 2**9 combinations of MISC/SPILL
marks for these slots at the iterator next call.
The loop converges only if the verifier treats such states as
equivalent, otherwise visited states are evicted from the states cache
too quickly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251230-loop-stack-misc-pruning-v1-2-585cfd6cec51@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 09:01:13 -08:00
Eduard Zingerman
e6f2612f0e selftests/bpf: test cases for bpf_loop SCC and state graph backedges
Test for state graph backedges accumulation for SCCs formed by
bpf_loop(). Equivalent to the following C program:

  int main(void) {
    1: fp[-8] = bpf_get_prandom_u32();
    2: fp[-16] = -32;                       // used in a memory access below
    3: bpf_loop(7, loop_cb4, fp, 0);
    4: return 0;
  }

  int loop_cb4(int i, void *ctx) {
    5: if (unlikely(ctx[-8] > bpf_get_prandom_u32()))
    6:   *(u64 *)(fp + ctx[-16]) = 42;      // aligned access expected
    7: if (unlikely(fp[-8] > bpf_get_prandom_u32()))
    8:   ctx[-16] = -31;                    // makes said access unaligned
    9: return 0;
  }

If state graph backedges are not accumulated properly at the SCC
formed by loop_cb4() call from bpf_loop(), the state {ctx[-16]=-32}
injected at instruction 9 on verification path 1,2,3,5,7,9,4 would be
considered fully verified and would lack precision mark for ctx[-16].
This would lead to early pruning of verification path 1,2,3,5,7,8,9 in
state {ctx[-16]=-31}, which in turn leads to the incorrect assumption
that the above program is safe.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251229-scc-for-callbacks-v1-2-ceadfe679900@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-30 15:42:42 -08:00
Puranjay Mohan
317a5df78f selftests/bpf: Fix verifier_arena_large/big_alloc3 test
The big_alloc3() test tries to allocate 2051 pages at once in
non-sleepable context and this can fail sporadically on resource
contrained systems, so skip this test in case of such failures.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251230195134.599463-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-30 12:35:23 -08:00
Linus Torvalds
dbf8fe85a1 Including fixes from Bluetooth and WiFi. Notably this includes the fix
for the iwlwifi issue you reported.
 
 Current release - regressions:
 
   - core: avoid prefetching NULL pointers
 
   - wifi:
     - iwlwifi: implement settime64 as stub for MVM/MLD PTP
     - mac80211: fix list iteration in ieee80211_add_virtual_monitor()
 
   - handshake: fix null-ptr-deref in handshake_complete()
 
   - eth: mana: fix use-after-free in reset service rescan path
 
 Previous releases - regressions:
 
   - openvswitch: avoid needlessly taking the RTNL on vport destroy
 
   - dsa: properly keep track of conduit reference
 
   - ipv4:
     - fix reference count leak when using error routes with nexthop objects
     - fib: restore ECMP balance from loopback
 
   - mptcp: ensure context reset on disconnect()
 
   - bluetooth: fix potential UaF in btusb
 
   - nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
 
   - eth: gve: defer interrupt enabling until NAPI registration
 
   - eth: i40e: fix scheduling in set_rx_mode
 
   - eth: macb: relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
 
   - eth: rtl8150: fix memory leak on usb_submit_urb() failure
 
   - wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()
 
 Previous releases - always broken:
 
   - ip6_gre: make ip6gre_header() robust
 
   - ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
 
   - af_unix: don't post cmsg for SO_INQ unless explicitly asked for
 
   - phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration
 
   - wifi: mac80211: discard beacon frames to non-broadcast address
 
   - eth: iavf: fix off-by-one issues in iavf_config_rss_reg()
 
   - eth: stmmac: fix the crash issue for zero copy XDP_TX action
 
   - eth: team: fix check for port enabled in team_queue_override_port_prio_changed()
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlT43wSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkHhMP/jF3B8c3djMgpYpEwPRgqJlzdpBGQvcO
 UsN/8fYI/XowIcU6T/yC/KM5cABCWyfnj6yZe743wPrlj8DnWK7+Fezrfwx7l8e0
 0LH9kVwOIaQXg/QthtXaDHNB/9OanDtgcpitI209gENRjF81bYWCehImil6jbVnn
 DUnVmfZIQ6k3dFsAPC4W7uJdA2FORtQzEZ1dZ13Ivx9jmbazK81ptUbIMAAnyfIZ
 rUhv+UqaDIlflYwuay58ZPdu8no4nQlJMPiPybXiizfTVStEne9SQKOacP8j7XL0
 GSjEyDO8lJXCPVSVnGEyybBH50M0myGUSH73+56o2QRRLtrHLDfieOL/N8AarNDh
 7U2g9pq0+IFPuJsm9SFR14dIpUAvpKohc57ZvsmworC8NuENzl6H3b6/U4n/1oNE
 JCbcitl91GkyF0Bvyac5a9wfk8SsYEJEGLPrtNX8UwH0UJh8spkfoQq5oHkC3juQ
 77n//eOFSz8oPDlV7ayNv+W3CEzOW09mSYFu8bdjBKC5HeyBJsm3HnJdaAhSqNEH
 6duRvcMlUJQ0JPILJoS1Zoy166uIu8hs2mZtygcAzyacia8yckL+Oq2UCYMi2oKP
 psOIzfd6G/+f203w37jdSY0OVlQSHvmCFSeaY6FHs2LEApDrNWK1cMLYO+lMU8EE
 q0j2SmDNMHhM
 =P1hX
 -----END PGP SIGNATURE-----

Merge tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth and WiFi. Notably this includes the fix
  for the iwlwifi issue you reported.

  Current release - regressions:

   - core: avoid prefetching NULL pointers

   - wifi:
      - iwlwifi: implement settime64 as stub for MVM/MLD PTP
      - mac80211: fix list iteration in ieee80211_add_virtual_monitor()

   - handshake: fix null-ptr-deref in handshake_complete()

   - eth: mana: fix use-after-free in reset service rescan path

  Previous releases - regressions:

   - openvswitch: avoid needlessly taking the RTNL on vport destroy

   - dsa: properly keep track of conduit reference

   - ipv4:
      - fix error route reference count leak with nexthop objects
      - fib: restore ECMP balance from loopback

   - mptcp: ensure context reset on disconnect()

   - bluetooth: fix potential UaF in btusb

   - nfc: fix deadlock between nfc_unregister_device and
     rfkill_fop_write

   - eth:
      - gve: defer interrupt enabling until NAPI registration
      - i40e: fix scheduling in set_rx_mode
      - macb: relocate mog_init_rings() callback from macb_mac_link_up()
        to macb_open()
      - rtl8150: fix memory leak on usb_submit_urb() failure

   - wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()

  Previous releases - always broken:

   - ip6_gre: make ip6gre_header() robust

   - ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT

   - af_unix: don't post cmsg for SO_INQ unless explicitly asked for

   - phy: mediatek: fix nvmem cell reference leak in
     mt798x_phy_calibration

   - wifi: mac80211: discard beacon frames to non-broadcast address

   - eth:
      - iavf: fix off-by-one issues in iavf_config_rss_reg()
      - stmmac: fix the crash issue for zero copy XDP_TX action
      - team: fix check for port enabled when priority changes"

* tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
  net: rose: fix invalid array index in rose_kill_by_device()
  net: enetc: do not print error log if addr is 0
  net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
  selftests: fib_test: Add test case for ipv4 multi nexthops
  net: fib: restore ECMP balance from loopback
  selftests: fib_nexthops: Add test cases for error routes deletion
  ipv4: Fix reference count leak when using error routes with nexthop objects
  net: usb: sr9700: fix incorrect command used to write single register
  ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr()
  usbnet: avoid a possible crash in dql_completed()
  gve: defer interrupt enabling until NAPI registration
  net: stmmac: fix the crash issue for zero copy XDP_TX action
  octeontx2-pf: fix "UBSAN: shift-out-of-bounds error"
  af_unix: don't post cmsg for SO_INQ unless explicitly asked for
  net: mana: Fix use-after-free in reset service rescan path
  net: avoid prefetching NULL pointers
  net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct
  net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
  net: usb: asix: validate PHY address before use
  ...
2025-12-30 08:45:58 -08:00
Vadim Fedorenko
3be42c3b3d selftests: fib_test: Add test case for ipv4 multi nexthops
The test checks that with multi nexthops route the preferred route is the
one which matches source ip. In case when source ip is on dummy
interface, it checks that the routes are balanced.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251221192639.3911901-2-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30 11:07:38 +01:00
Ido Schimmel
44741e9de2 selftests: fib_nexthops: Add test cases for error routes deletion
Add test cases that check that error routes (e.g., blackhole) are
deleted when their nexthop is deleted.

Output without "ipv4: Fix reference count leak when using error routes
with nexthop objects":

 # ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

 IPv4 functional
 ----------------------
 [...]
       WARNING: Unexpected route entry
 TEST: Error route removed on nexthop deletion                       [FAIL]

 IPv6
 ----------------------
 [...]
 TEST: Error route removed on nexthop deletion                       [ OK ]

 Tests passed:  20
 Tests failed:   1
 Tests skipped:  0

Output with "ipv4: Fix reference count leak when using error routes
with nexthop objects":

 # ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

 IPv4 functional
 ----------------------
 [...]
 TEST: Error route removed on nexthop deletion                       [ OK ]

 IPv6
 ----------------------
 [...]
 TEST: Error route removed on nexthop deletion                       [ OK ]

 Tests passed:  21
 Tests failed:   0
 Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251221144829.197694-2-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30 10:39:22 +01:00
Linus Torvalds
0b34fd0fea 27 hotfixes. 12 are cc:stable, 18 are MM.
There's a three patch series from Jiayuan Chen which fixes some issues
 with KASAN and vmalloc.  Apart from that it's the usual shower of
 singletons - please see the respective changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaVIWxwAKCRDdBJ7gKXxA
 jt6HAP49s/mEIIbuZbqnX8hxDrvYdYffs+RsSLPEZoR+yKG/7gD/VqSZhMoPw53b
 rMZ56djXNjWxsOAfiVbZit3SFuQfnQ4=
 =5rGD
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-12-28-21-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "27 hotfixes.  12 are cc:stable, 18 are MM.

  There's a patch series from Jiayuan Chen which fixes some
  issues with KASAN and vmalloc. Apart from that it's the usual
  shower of singletons - please see the respective changelogs
  for details"

* tag 'mm-hotfixes-stable-2025-12-28-21-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (27 commits)
  mm/ksm: fix pte_unmap_unlock of wrong address in break_ksm_pmd_entry
  mm/page_owner: fix memory leak in page_owner_stack_fops->release()
  mm/memremap: fix spurious large folio warning for FS-DAX
  MAINTAINERS: notify the "Device Memory" community of memory hotplug changes
  sparse: update MAINTAINERS info
  mm/page_alloc: report 1 as zone_batchsize for !CONFIG_MMU
  mm: consider non-anon swap cache folios in folio_expected_ref_count()
  rust: maple_tree: rcu_read_lock() in destructor to silence lockdep
  mm: memcg: fix unit conversion for K() macro in OOM log
  mm: fixup pfnmap memory failure handling to use pgoff
  tools/mm/page_owner_sort: fix timestamp comparison for stable sorting
  selftests/mm: fix thread state check in uffd-unit-tests
  kernel/kexec: fix IMA when allocation happens in CMA area
  kernel/kexec: change the prototype of kimage_map_segment()
  MAINTAINERS: add ABI headers to KHO and LIVE UPDATE
  .mailmap: remove one of the entries for WangYuli
  mm/damon/vaddr: fix missing pte_unmap_unlock in damos_va_migrate_pmd_entry()
  MAINTAINERS: update one straggling entry for Bartosz Golaszewski
  mm/page_alloc: change all pageblocks migrate type on coalescing
  mm: leafops.h: correct kernel-doc function param. names
  ...
2025-12-29 11:40:38 -08:00
Tingmao Wang
55dc93a7c2
selftests/landlock: Use scoped_base_variants.h for ptrace_test
ptrace_test.c currently contains a duplicated version of the
scoped_domains fixture variants.  This patch removes that and make it use
the shared scoped_base_variants.h instead, like in
scoped_abstract_unix_test and scoped_signal_test.

This required renaming the hierarchy fixture to scoped_domains, but the
test is otherwise the same.

Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/48148f0134f95f819a25277486a875a6fd88ecf9.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-29 16:19:39 +01:00
Tingmao Wang
7aa593d8fb
selftests/landlock: Fix missing semicolon
Add missing semicolon after EXPECT_EQ(0, close(stream_server_child)) in
the scoped_vs_unscoped test.  I suspect currently it's just not executing
the close statement after the line, but this causes no observable
difference.

Fixes: fefcf0f7cf ("selftests/landlock: Test abstract UNIX socket scoping")
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/d9e968e4cd4ecc9bf487593d7b7220bffbb3b5f5.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-29 16:19:38 +01:00
Tingmao Wang
14c00e30d3
selftests/landlock: Fix typo in fs_test
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/62d18e06eeb26f62bc49d24c4467b3793c5ba32b.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-29 16:19:38 +01:00
Ming Lei
a2ce133969 selftests/ublk: fix Makefile to rebuild on header changes
Add header dependencies to kublk build rule so that changes to
kublk.h, ublk_dep.h, or utils.h trigger a rebuild.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-28 09:25:26 -07:00
Ming Lei
60cf863720 selftests/ublk: add test for async partition scan
Add test_generic_15.sh to verify that async partition scan prevents
IO hang when reading partition tables.

The test creates ublk devices with fault_inject target and very large
delay (60s) to simulate blocked partition table reads, then kills the
daemon to verify proper state transitions without hanging:

1. Without recovery support:
   - Create device with fault_inject and 60s delay
   - Kill daemon while partition scan may be blocked
   - Verify device transitions to DEAD state

2. With recovery support (-r 1):
   - Create device with fault_inject, 60s delay, and recovery
   - Kill daemon while partition scan may be blocked
   - Verify device transitions to QUIESCED state

Before the async partition scan fix, killing the daemon during
partition scan would cause deadlock as partition scan held ub->mutex
while waiting for IO. With the async fix, partition scan happens in
a work function and flush_work() ensures proper synchronization.

Add _add_ublk_dev_no_settle() helper function to skip udevadm settle,
which would otherwise hang waiting for partition scan events to
complete when partition table read is delayed.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-28 09:25:26 -07:00
Matthieu Buffet
e4aa4461d4
selftests/landlock: NULL-terminate unix pathname addresses
The size of Unix pathname addresses is computed in selftests using
offsetof(struct sockaddr_un, sun_path) + strlen(xxx). It should have
been that +1, which makes addresses passed to the libc and kernel
non-NULL-terminated. unix_mkname_bsd() fixes that in Linux so there is
no harm, but just using sizeof(the address struct) should improve
readability.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251202215141.689986-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-26 20:38:58 +01:00
Matthieu Buffet
e1a57c3359
selftests/landlock: Remove invalid unix socket bind()
Remove bind() call on a client socket that doesn't make sense.
Since strlen(cli_un.sun_path) returns a random value depending on stack
garbage, that many uninitialized bytes are read from the stack as an
unix socket address. This creates random test failures due to the bind
address being invalid or already in use if the same stack value comes up
twice.

Fixes: f83d51a5bd ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251201003631.190817-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-26 20:38:58 +01:00
Matthieu Buffet
6685201ebf
selftests/landlock: Add missing connect(minimal AF_UNSPEC) test
connect_variant(unspec_any0) is called twice. Both calls end
up in connect_variant_addrlen() with an address length of
get_addrlen(minimal=false).
However, the connect() syscall and its variants (e.g.
iouring/compat) accept much shorter addresses of 4 bytes
and that behaviour was not tested.

Replace one of these calls with one using a minimal address
length (just a bare sa_family=AF_UNSPEC field with no actual
address). Also add a call using a truncated address for good
measure.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-3-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-26 20:38:57 +01:00
Matthieu Buffet
bd09d9a05c
selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
The nominal error code for bind(AF_UNSPEC) on an IPv6 socket
is -EAFNOSUPPORT, not -EINVAL. -EINVAL is only returned when
the supplied address struct is too short, which happens to be
the case in current selftests because they treat AF_UNSPEC
like IPv4 sockets do: as an alias for AF_INET (which is a
16-byte struct instead of the 24 bytes required by IPv6
sockets).

Make the union large enough for any address (by adding struct
sockaddr_storage to the union), and make AF_UNSPEC addresses
large enough for any family.

Test for -EAFNOSUPPORT instead, and add a dedicated test case
for truncated inputs with -EINVAL.

Fixes: a549d055a2 ("selftests/landlock: Add network tests")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-12-26 20:38:57 +01:00
David Matlack
193120dddd vfio: selftests: Drop <uapi/linux/types.h> includes
Drop the <uapi/linux/types.h> includes now that <linux/types.h>
(tools/include/linux/types.h) has a definition for __aligned_le64, which
is needed by <linux/iommufd.h>.

Including <uapi/linux/types.h> is harmless but causes benign typedef
redefinitions. This is not a problem for VFIO selftests but becomes an
issue when the VFIO selftests library is built into KVM selftests, since
they are built with -std=gnu99 which does not allow typedef redifitions.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251219233818.1965306-3-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-12-23 14:17:13 -07:00
Puranjay Mohan
efecc9e825 selftests: bpf: test non-sleepable arena allocations
As arena kfuncs can now be called from non-sleepable contexts, test this
by adding non-sleepable copies of tests in verifier_arena, this is done
by using a socket program instead of syscall.

Add a new test case in verifier_arena_large to check that the
bpf_arena_alloc_pages() works for more than 1024 pages.
1024 * sizeof(struct page *) is the upper limit of kmalloc_nolock() but
bpf_arena_alloc_pages() should still succeed because it re-uses this
array in a loop.

Augment the arena_list selftest to also run in non-sleepable context by
taking rcu_read_lock.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251222195022.431211-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-23 11:30:00 -08:00
Wake Liu
632b874d59 selftests/mm: fix thread state check in uffd-unit-tests
In the thread_state_get() function, the logic to find the thread's state
character was using `sizeof(header) - 1` to calculate the offset from the
"State:\t" string.

The `header` variable is a `const char *` pointer.  `sizeof()` on a
pointer returns the size of the pointer itself, not the length of the
string literal it points to.  This makes the code's behavior dependent on
the architecture's pointer size.

This bug was identified on a 32-bit ARM build (`gsi_tv_arm`) for Android,
running on an ARMv8-based device, compiled with Clang 19.0.1.

On this 32-bit architecture, `sizeof(char *)` is 4.  The expression
`sizeof(header) - 1` resulted in an incorrect offset of 3, causing the
test to read the wrong character from `/proc/[tid]/status` and fail.

On 64-bit architectures, `sizeof(char *)` is 8, so the expression
coincidentally evaluates to 7, which matches the length of "State:\t". 
This is why the bug likely remained hidden on 64-bit builds.

To fix this and make the code portable and correct across all
architectures, this patch replaces `sizeof(header) - 1` with
`strlen(header)`.  The `strlen()` function correctly calculates the
string's length, ensuring the correct offset is always used.

Link: https://lkml.kernel.org/r/20251210091408.3781445-1-wakel@google.com
Fixes: f60b6634cd ("mm/selftests: add a test to verify mmap_changing race with -EAGAIN")
Signed-off-by: Wake Liu <wakel@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-12-23 11:23:14 -08:00
Matthew Wilcox (Oracle)
c6e8e595a0 idr: fix idr_alloc() returning an ID out of range
If you use an IDR with a non-zero base, and specify a range that lies
entirely below the base, 'max - base' becomes very large and
idr_get_free() can return an ID that lies outside of the requested range.

Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org
Fixes: 6ce711f275 ("idr: Make 1-based IDRs more efficient")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-12-23 11:23:11 -08:00
Alice C. Munduruca
472c5dd6b9 selftests: net: fix "buffer overflow detected" for tap.c
When the selftest 'tap.c' is compiled with '-D_FORTIFY_SOURCE=3',
the strcpy() in rtattr_add_strsz() is replaced with a checked
version which causes the test to consistently fail when compiled
with toolchains for which this option is enabled by default.

 TAP version 13
 1..3
 # Starting 3 tests from 1 test cases.
 #  RUN           tap.test_packet_valid_udp_gso ...
 *** buffer overflow detected ***: terminated
 # test_packet_valid_udp_gso: Test terminated by assertion
 #          FAIL  tap.test_packet_valid_udp_gso
 not ok 1 tap.test_packet_valid_udp_gso
 #  RUN           tap.test_packet_valid_udp_csum ...
 *** buffer overflow detected ***: terminated
 # test_packet_valid_udp_csum: Test terminated by assertion
 #          FAIL  tap.test_packet_valid_udp_csum
 not ok 2 tap.test_packet_valid_udp_csum
 #  RUN           tap.test_packet_crash_tap_invalid_eth_proto ...
 *** buffer overflow detected ***: terminated
 # test_packet_crash_tap_invalid_eth_proto: Test terminated by assertion
 #          FAIL  tap.test_packet_crash_tap_invalid_eth_proto
 not ok 3 tap.test_packet_crash_tap_invalid_eth_proto
 # FAILED: 0 / 3 tests passed.
 # Totals: pass:0 fail:3 xfail:0 xpass:0 skip:0 error:0

A buffer overflow is detected by the fortified glibc __strcpy_chk()
since the __builtin_object_size() of `RTA_DATA(rta)` is incorrectly
reported as 1, even though there is ample space in its bounding
buffer `req`.

Additionally, given that IFLA_IFNAME also expects a null-terminated
string, callers of rtaddr_add_str{,sz}() could simply use the
rtaddr_add_strsz() variant. (which has been renamed to remove the
trailing `sz`) memset() has been used for this function since it
is unchecked and thus circumvents the issue discussed in the
previous paragraph.

Fixes: 2e64fe4624 ("selftests: add few test cases for tap driver")
Signed-off-by: Alice C. Munduruca <alice.munduruca@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251216170641.250494-1-alice.munduruca@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23 12:30:23 +01:00
Daniel Zahka
f0e5126f5e selftests: drv-net: psp: fix test names in ipver_test_builder()
test_case will only take on the formatted name after being
called. This does not work with the way ksft_run() currently
works. Assign the name after the test_case is created.

Fixes: 81236c74db ("selftests: drv-net: psp: add test for auto-adjusting TCP MSS")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-2-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23 12:05:04 +01:00
Daniel Zahka
d52668cac3 selftests: drv-net: psp: fix templated test names in psp_ip_ver_test_builder()
test_case will only take on its formatted name after it is called by
the test runner. Move the assignment to test_case.__name__ to when the
test_case is constructed, not called.

Fixes: 8f90dc6e41 ("selftests: drv-net: psp: add basic data transfer and key rotation tests")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-1-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23 12:05:04 +01:00
Puranjay Mohan
83dd46ecb6 selftests: bpf: fix tests with raw_tp calling kfuncs
As the previous commit allowed raw_tp programs to call kfuncs, so of the
selftests that were expected to fail will now succeed.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251222133250.1890587-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22 22:23:38 -08:00
JP Kobryn
6bce6ddbe6 bpf: selftests: selftests for memcg stat kfuncs
Add test coverage for the kfuncs that fetch memcg stats. Using some common
stats, test scenarios ensuring that the given stat increases by some
arbitrary amount. The stats selected cover the three categories represented
by the enums: node_stat_item, memcg_stat_item, vm_event_item.

Since only a subset of all stats are queried, use a static struct made up
of fields for each stat. Write to the struct with the fetched values when
the bpf program is invoked and read the fields in the user mode program for
verification.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20251223044156.208250-6-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22 22:20:22 -08:00
Gopi Krishna Menon
42f53b3900 selftests/powerpc/pmu/: Add check_extended_reg_test to .gitignore
Add the check_extended_reg_test binary to .gitignore to avoid accidentally
staging the build artifact.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Tested-by: Aditya Bodkhe <adityab1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250922004439.2395-1-krishnagopi487@gmail.com
2025-12-22 17:55:07 +05:30
Matt Bobrowski
d2749ae85a selftests/bpf: add test case for BPF LSM hook bpf_lsm_mmap_file
Add a trivial test case asserting that the BPF verifier enforces
PTR_MAYBE_NULL semantics on the struct file pointer argument of BPF
LSM hook bpf_lsm_mmap_file().

Dereferencing the struct file pointer passed into bpf_lsm_mmap_file()
without explicitly performing a NULL check first should not be
permitted by the BPF verifier as it can lead to NULL pointer
dereferences and a kernel crash.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251216133000.3690723-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-21 10:56:33 -08:00
Linus Torvalds
18dfd1cbf6 Two left-over updates that could not go into -rc1 due to conflicts with
other series:
 
  - Simplify checks in arch_kfence_init_pool() since force_pte_mapping()
    already takes BBML2-noabort (break-before-make Level 2 with no aborts
    generated) into account
 
  - Remove unneeded SVE/SME fallback preserve/store handling in the arm64
    EFI. With the recent updates, the fallback path is only taken for EFI
    runtime calls from hardirq or NMI contexts. In practice, this only
    happens under panic/oops/emergency_restart() and no restoring of the
    user state expected. There's a corresponding lkdtm update to trigger
    a BUG() or panic() from hardirq context together with a fixup not to
    confuse clang/objtool about the control flow
 
 GCS (guarded control stacks) fix: flush the GCS locking state on exec,
 otherwise the new task will not be able to enable GCS (locked as
 disabled).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmlGb0YACgkQa9axLQDI
 XvGNTA/+MFft7b7Seb0DlanyxMfnKmLqoPbdciSCQh5I79fklWDMCqh/B1KVUxhG
 cnY45GyDEO/AC46wYvqqm6frS4sYHwiDybIJwLPGcxmPaoeNacVQkWYjiM6sU68m
 RjcFe5xN1+3y+d37ZPgoClU6YNDqNcxv1jLHzQly3eO/8LLOGi6S8bEX9v8SGLWR
 4PJki0YSOIJQYEehmAglPwroRgJXeHcaLboF/I/ULPZjhrDMIz2BSoNKreGfzgGI
 0+myd6CeGjxcAKf2JI9zmBHiCmgGGi9gMz5BXm6yo+jGoyeMaeTi5/w4r/D4GJlp
 kS2RbI+LpWEFN4yKk23YVMM7fYmI9ZCTEsRw0osHV2uqMSt2WNGadE4klG8uV6oK
 h0VMKY5QB0SZVc0uT+S44zkK9iLVPbn8FgCAbzMNBE9gTgf/nT6l09VsdgNrzK90
 CTwwvt9S0hiRk9VMa57WAWodb8lxp0IUE0eP4ptc/Y7EL8ibgTZcm0GVxG9OLMBD
 J+RXQZ1VugeUoYncEH23KYSo89kjRJ7k6tAkz1B6bK1ULnIoQsebyNFhqITCx9t8
 dc/3hqsgijiyMhZchLZtvqYh/tfSFBsKTN5cQj2qQCxTlnX7Pi1UvbnSwXeGGdki
 lEWTEtkqKQdE2wryYkMlvMllCLibmr7/FClMncj0pEPQjz/FJDA=
 =vAzy
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Two left-over updates that could not go into -rc1 due to conflicts
  with other series:

   - Simplify checks in arch_kfence_init_pool() since
     force_pte_mapping() already takes BBML2-noabort (break-before-make
     Level 2 with no aborts generated) into account

   - Remove unneeded SVE/SME fallback preserve/store handling in the
     arm64 EFI. With the recent updates, the fallback path is only taken
     for EFI runtime calls from hardirq or NMI contexts. In practice,
     this only happens under panic/oops/emergency_restart() and no
     restoring of the user state expected.

     There's a corresponding lkdtm update to trigger a BUG() or panic()
     from hardirq context together with a fixup not to confuse
     clang/objtool about the control flow

  GCS (guarded control stacks) fix: flush the GCS locking state on exec,
  otherwise the new task will not be able to enable GCS (locked as
  disabled)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  lkdtm/bugs: Do not confuse the clang/objtool with busy wait loop
  arm64/gcs: Flush the GCS locking state on exec
  arm64/efi: Remove unneeded SVE/SME fallback preserve/store handling
  lkdtm/bugs: Add cases for BUG and PANIC occurring in hardirq context
  arm64: mm: Simplify check in arch_kfence_init_pool()
2025-12-20 11:34:37 -08:00
Linus Torvalds
072c0b4f0f x86 fixes. Everyone else is already in holiday mood apparently.
- Add a missing "break" to fix param parsing in the rseq selftest.
 
 - Apply runtime updates to the _current_ CPUID when userspace is setting
   CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
   dropping the pending update.
 
 - Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
   supported by KVM and leads to a use-after-free due to KVM failing to unbind
   the memslot from the previously-associated guest_memfd instance.
 
 - Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
   flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.
 
 - Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
   SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
   as -1ull (a 64-bit value).
 
 - Update SVI when activating APICv to fix a bug where a post-activation EOI
   for an in-service IRQ would effective be lost due to SVI being stale.
 
 - Immediately refresh APICv controls (if necessary) on a nested VM-Exit
   instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
   effectively ignored because KVM thinks the vCPU already has the correct
   APICv settings.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmlEQXEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOscwf/ahimZsBtFTtd6yZpZ84E1O7P8g31
 aZ4p9CRcx+I0aMQKOfS98R3G1w8FtMpcX11aPWvIi4Vkx/4PVlZr+ds8cfiMV7Z+
 Hdd7k/uR4eC+KcL0XoGcmukBlHYi/8lZeijUcID+Iueo7MMlyX5G/VQ1RzBZwP++
 VFObgtzoAmlKrR5AaFeRMBnY+fiH6wMjPSwY7fI5pr4qNHrxB06wgtX5+C1Y3Poa
 7ETMrfX2XtyfG53N+1YJMGoaXqBkFaGCateMJQNOu/QQzOZhUuBCkC8muzbCTckO
 Y405j7GfpyUmhf/mdLI8BlMYrn5Mwdnvo9iJ3U5fGUN/Eh4XmPIYKy7sMg==
 =Fa0R
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm fixes from Paolo Bonzini:
 "x86 fixes.  Everyone else is already in holiday mood apparently.

   - Add a missing 'break' to fix param parsing in the rseq selftest

   - Apply runtime updates to the _current_ CPUID when userspace is
     setting CPUID, e.g. as part of vCPU hotplug, to fix a false
     positive and to avoid dropping the pending update

   - Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as
     it's not supported by KVM and leads to a use-after-free due to KVM
     failing to unbind the memslot from the previously-associated
     guest_memfd instance

   - Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for
     supporting flags-only changes on KVM_MEM_GUEST_MEMFD memlslots,
     e.g. for dirty logging

   - Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
     SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is
     defined as -1ull (a 64-bit value)

   - Update SVI when activating APICv to fix a bug where a
     post-activation EOI for an in-service IRQ would effective be lost
     due to SVI being stale

   - Immediately refresh APICv controls (if necessary) on a nested
     VM-Exit instead of deferring the update via KVM_REQ_APICV_UPDATE,
     as the request is effectively ignored because KVM thinks the vCPU
     already has the correct APICv settings"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit
  KVM: VMX: Update SVI during runtime APICv activation
  KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)
  KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits
  KVM: Harden and prepare for modifying existing guest_memfd memslots
  KVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot
  KVM: selftests: Add a CPUID testcase for KVM_SET_CPUID2 with runtime updates
  KVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}
  KVM: selftests: Add missing "break" in rseq_test's param parsing
2025-12-20 11:31:37 -08:00
Linus Torvalds
d8ba32c5a4 block-6.19-20251218
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmlEvbAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgphTLD/9Gx4wnirRvzo1Vz77Odhqqbipww/gjDV0G
 HASzF4KczfusTei4NABfuSVA4/LS28csZOYq0ZeDmFVg+YPQcmr4DNw/eJzzi1Xh
 TDmYuf8y7jAJjwWmX0fgVF2uDg3/kp3tWKfBxSFqpqHrZUkOLr8IY3y8/uJYKakm
 r8K9HsS7hyPhx4xdfRQixUil5TgQxBtCgwTX+JRv/K9AMnEVHkBkJ+m4GSmjJ3n8
 tr5lL8fQmIQcsSWqTvMH/m2xeLG7gr9LE33wfu2sCLvNJ4CAYFGT9aMf2F0nGbzK
 Uu5XOyq1rJ8vKWJuDVfFsjOca4fyyvaKX4bqycv89Z7MVzQOXsjbs+4sHFycsfYC
 P+OBmxLb2+rowNsawyzObT9wnd37xptl+tF5nXL0LL6W4PWHmUgnDuM6Vg5zpNv+
 n0nO6y7I0iL9o4hgqN8kUOr5e7IjvJccJbratOA/jVXmAw1I/Gg6HDo+W7rEqsVP
 0hrlE+RuJSRypbCO2pcvviahaDaCWo3gQmp9TZCBmgqgoiI0WStPAgYrG424+69Y
 Hoq5HBmtN4GV3EnU0Ybuawzl2OKtQwwr9DrRpvrxdN9qm4E44CD6eTmE5fEnrTnX
 Rcs5VosBAavPTbl8LFDtj2h6qPak+5VHHVZwimLNdOyCLTNf7D/kiAjmOHRBbmdt
 AopIuvCipA==
 =lk7H
 -----END PGP SIGNATURE-----

Merge tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - ublk selftests for missing coverage

 - two fixes for the block integrity code

 - fix for the newly added newly added PR read keys ioctl, limiting the
   memory that can be allocated

 - work around for a deadlock that can occur with ublk, where partition
   scanning ends up recursing back into file closure, which needs the
   same mutex grabbed. Not the prettiest thing in the world, but an
   acceptable work-around until we can eliminate the reliance on
   disk->open_mutex for this

 - fix for a race between enabling writeback throttling and new IO
   submissions

 - move a bit of bio flag handling code. No changes, but needed for a
   patchset for a future kernel

 - fix for an init time id leak failure in rnbd

 - loop/zloop state check fix

* tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block: validate interval_exp integrity limit
  block: validate pi_offset integrity limit
  block: rnbd-clt: Fix leaked ID in init_dev()
  ublk: fix deadlock when reading partition table
  block: add allocation size check in blkdev_pr_read_keys()
  Documentation: admin-guide: blockdev: replace zone_capacity with zone_capacity_mb when creating devices
  zloop: use READ_ONCE() to read lo->lo_state in queue_rq path
  loop: use READ_ONCE() to read lo->lo_state without locking
  block: fix race between wbt_enable_default and IO submission
  selftests: ublk: add user copy test cases
  selftests: ublk: add support for user copy to kublk
  selftests: ublk: forbid multiple data copy modes
  selftests: ublk: don't share backing files between ublk servers
  selftests: ublk: use auto_zc for PER_IO_DAEMON tests in stress_04
  selftests: ublk: fix fio arguments in run_io_and_recover()
  selftests: ublk: remove unused ios map in seq_io.bt
  selftests: ublk: correct last_rw map type in seq_io.bt
  selftests: ublk: fix overflow in ublk_queue_auto_zc_fallback()
  block: move around bio flagging helpers
2025-12-20 09:48:56 -08:00
Ihor Solodrai
522397d05e resolve_btfids: Change in-place update with raw binary output
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.

Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
  * ${1}.BTF with .BTF section data
  * ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
  * ${1}.BTF.base with .BTF.base section data for out-of-tree modules

The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.

The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.

There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.

For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.

Before this patch it worked as follows:

  * pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
    llvm-objcopy [3] to it
  * then everything except the .BTF section was stripped from .tmp_vmlinux1
    into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
  * resolve_btfids was executed later on vmlinux.unstripped [4],
    updating it in-place

After this patch gen-btf.sh implements the following:

  * pahole consumes .tmp_vmlinux1 and produces a *detached* file with
    raw BTF data
  * resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
    (potentially modified) .BTF, and .BTF_ids sections data
  * a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
    BTF output of resolve_btfids
  * .BTF_ids data gets embedded into vmlinux.unstripped in
    link-vmlinux.sh by objcopy --update-section

For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.

With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).

Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.

[1] https://lore.kernel.org/bpf/131b4190-9c49-4f79-a99d-c00fac97fa44@linux.dev/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110
[3] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/btf_encoder.c?h=v1.31#n1803
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n284
[5] https://lore.kernel.org/bpf/20200819092342.259004-1-jolsa@kernel.org/
[6] https://lore.kernel.org/bpf/cover.1707223196.git.vmalik@redhat.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Ihor Solodrai
014e1cdb5f selftests/bpf: Run resolve_btfids only for relevant .test.o objects
A selftest targeting resolve_btfids functionality relies on a resolved
.BTF_ids section to be available in the TRUNNER_BINARY. The underlying
BTF data is taken from a special BPF program (btf_data.c), and so
resolve_btfids is executed as a part of a TRUNNER_BINARY build recipe
on the final binary.

Subsequent patches in this series allow resolve_btfids to modify BTF
before resolving the symbols, which means that the test needs access
to that modified BTF [1]. Currently the test simply reads in
btf_data.bpf.o on the assumption that BTF hasn't changed.

Implement resolve_btfids call only for particular test objects (just
resolve_btfids.test.o for now). The test objects are linked into the
TRUNNER_BINARY, and so .BTF_ids section will be available there.

This will make it trivial for the resolve_btfids test to access BTF
modified by resolve_btfids.

[1] https://lore.kernel.org/bpf/CAErzpmvsgSDe-QcWH8SFFErL6y3p3zrqNri5-UHJ9iK2ChyiBw@mail.gmail.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-2-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Linus Torvalds
cf26839d7f iommufd 6.19 first rc pull request
Few minor fixes, other than the randconfig fix this is only relevant to
 test code, not releases.
 
 - Randconfig failure if CONFIG_DMA_SHARED_BUFFER is not set
 
 - Remove gcc warning in kselftest
 
 - Fix a refcount leak on an error path in the selftest support code
 
 - Fix missing overflow checks in the selftest support code
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaURMcAAKCRCFwuHvBreF
 YUUIAP4u6RUth7SkqxnoM8KOqsV3YaT721AFc+lH/NA4D+XX3gD/VRHwORq8RiCQ
 APzvUXP/+j+FBy2b0NINfGOdr+cfLAI=
 =0fWH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:
 "A few minor fixes, other than the randconfig fix this is only relevant
  to test code, not releases:

   - Randconfig failure if CONFIG_DMA_SHARED_BUFFER is not set

   - Remove gcc warning in kselftest

   - Fix a refcount leak on an error path in the selftest support code

   - Fix missing overflow checks in the selftest support code"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Check for overflow in IOMMU_TEST_OP_ADD_RESERVED
  iommufd/selftest: Do not leak the hwpt if IOMMU_TEST_OP_MD_CHECK_MAP fails
  iommufd/selftest: Make it clearer to gcc that the access is not out of bounds
  iommufd: Fix building without dmabuf
2025-12-19 08:11:53 +12:00
Linus Torvalds
7b8e9264f5 Including fixes from netfilter and CAN.
Current release - regressions:
 
   - netfilter: nf_conncount: fix leaked ct in error paths
 
   - sched: act_mirred: fix loop detection
 
   - sctp: fix potential deadlock in sctp_clone_sock()
 
   - can: fix build dependency
 
   - eth: mlx5e: do not update BQL of old txqs during channel reconfiguration
 
 Previous releases - regressions:
 
   - sched: ets: always remove class from active list before deleting it
 
   - inet: frags: flush pending skbs in fqdir_pre_exit()
 
   - netfilter:  nf_nat: remove bogus direction check
 
   - mptcp:
     - schedule rtx timer only after pushing data
     - avoid deadlock on fallback while reinjecting
 
   - can: gs_usb: fix error handling
 
   - eth: mlx5e:
     - avoid unregistering PSP twice
     - fix double unregister of HCA_PORTS component
 
   - eth: bnxt_en: fix XDP_TX path
 
   - eth: mlxsw: fix use-after-free when updating multicast route stats
 
 Previous releases - always broken:
 
   - ethtool: avoid overflowing userspace buffer on stats query
 
   - openvswitch: fix middle attribute validation in push_nsh() action
 
   - eth: mlx5: fw_tracer, validate format string parameters
 
   - eth: mlxsw: spectrum_router: fix neighbour use-after-free
 
   - eth: ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()
 
 Misc:
 
   - Jozsef Kadlecsik retires from maintaining netfilter
 
   - tools: ynl: fix build on systems with old kernel headers
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlEPS0SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkHLEP/3TnVI9qLivOGsZ48bn5UUcIaIlX0wiw
 i5gwpUhbtW3zcLSphO0Nh/CmProme6dFhaMOOkk48bKAdIxRpOMbCC20bfYDLyd0
 ZJTUQqheKpuI23vOnhfs2TTOkcqz6cM7txUq681taQ8Nfo8Dbf0fgOO2HT5ljRD+
 JXlxrdvicZDtve68sSdnsAj5M15EPQGKTPMyPkymBmCKnCMQK9SQKaeTEtaW6NIO
 yM0KqDSNoulDc/6LYMhx8DTtyE7yTiTxwe2NixjdyYljXsk0KiRDirCzxnZuSgh8
 oh+cFLdFDq4mwdYEKjgC5c3ifQyLEpZvwzlY5MKoobsVnT5SbigeSK53l5rEkO3V
 sM84lITHfqTJvlid0AF/ixEc6iWwV7nGRBHh2FXNbfoIKt45eF77jPi9YFsq6Z95
 vlCzYIbY0f2L1y3mPvZbzGQbh2Z12b5kyK8QA1j7SK+zNzxgXbf4+ZtYxHg7O3Ne
 gecmIpTKXMWodaZyfsRQPjR/F6UIlMqsgl9Ci9bfUw+XwL8x+7bJxQAQz+yVjLla
 ng14BItiYKBavcPZBjYlhGKqD1fzGhVZqQecrCkF0VTbMusRd9RcwytU1NG4QGDx
 V5aL28ht85KtMednEWOBkrg+PeXnNyZHzLAf2Xtx3UkaGgiDC8G4IxUv3orlOLD1
 sFPfZnSiGCof
 =6pla
 -----END PGP SIGNATURE-----

Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and CAN.

  Current release - regressions:

   - netfilter: nf_conncount: fix leaked ct in error paths

   - sched: act_mirred: fix loop detection

   - sctp: fix potential deadlock in sctp_clone_sock()

   - can: fix build dependency

   - eth: mlx5e: do not update BQL of old txqs during channel
     reconfiguration

  Previous releases - regressions:

   - sched: ets: always remove class from active list before deleting it

   - inet: frags: flush pending skbs in fqdir_pre_exit()

   - netfilter: nf_nat: remove bogus direction check

   - mptcp:
      - schedule rtx timer only after pushing data
      - avoid deadlock on fallback while reinjecting

   - can: gs_usb: fix error handling

   - eth:
      - mlx5e:
         - avoid unregistering PSP twice
         - fix double unregister of HCA_PORTS component
      - bnxt_en: fix XDP_TX path
      - mlxsw: fix use-after-free when updating multicast route stats

  Previous releases - always broken:

   - ethtool: avoid overflowing userspace buffer on stats query

   - openvswitch: fix middle attribute validation in push_nsh() action

   - eth:
      - mlx5: fw_tracer, validate format string parameters
      - mlxsw: spectrum_router: fix neighbour use-after-free
      - ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()

  Misc:

   - Jozsef Kadlecsik retires from maintaining netfilter

   - tools: ynl: fix build on systems with old kernel headers"

* tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: hns3: add VLAN id validation before using
  net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
  net: hns3: using the num_tqps in the vf driver to apply for resources
  net: enetc: do not transmit redirected XDP frames when the link is down
  selftests/tc-testing: Test case exercising potential mirred redirect deadlock
  net/sched: act_mirred: fix loop detection
  sctp: Clear inet_opt in sctp_v6_copy_ip_options().
  sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().
  net/handshake: duplicate handshake cancellations leak socket
  net/mlx5e: Don't include PSP in the hard MTU calculations
  net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
  net/mlx5e: Trigger neighbor resolution for unresolved destinations
  net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
  net/mlx5: Serialize firmware reset with devlink
  net/mlx5: fw_tracer, Handle escaped percent properly
  net/mlx5: fw_tracer, Validate format string parameters
  net/mlx5: Drain firmware reset in shutdown callback
  net/mlx5: fw reset, clear reset requested on drain_fw_reset
  net: dsa: mxl-gsw1xx: manually clear RANEG bit
  net: dsa: mxl-gsw1xx: fix .shutdown driver operation
  ...
2025-12-19 07:55:35 +12:00
Paolo Bonzini
0499add8ef KVM fixes for 6.19-rc1
- Add a missing "break" to fix param parsing in the rseq selftest.
 
  - Apply runtime updates to the _current_ CPUID when userspace is setting
    CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
    dropping the pending update.
 
  - Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
    supported by KVM and leads to a use-after-free due to KVM failing to unbind
    the memslot from the previously-associated guest_memfd instance.
 
  - Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
    flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.
 
  - Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
    SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
    as -1ull (a 64-bit value).
 
  - Update SVI when activating APICv to fix a bug where a post-activation EOI
    for an in-service IRQ would effective be lost due to SVI being stale.
 
  - Immediately refresh APICv controls (if necessary) on a nested VM-Exit
    instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
    effectively ignored because KVM thinks the vCPU already has the correct
    APICv settings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmk5p18ACgkQOlYIJqCj
 N/0YlBAAvnhGVmqVc3nhd313mo4YGk+Z1RxpO1sJAsGJu42Ir/QqMC9aPHy9ejcS
 hfoIXzPFdVJEztuBUWRje9mvocQnXSAWjXFTaoqJE/LXVnh96Txhh4nvCJSQzyrn
 5/0uk5ZD5dfPVyPtYk6G9w3q9kgYv3Q6O4UEU48ru0q6wcu5FmRshULfHVZnlyNa
 ZALY4k8QzsdSzB0XWusmD0OQpjGRyUR79mqEzUybg4E/b0LAK9Nv6Fr5YgGq7g0N
 AU1T1t+/hMN7x4/24RrxNBO+skPKmCi7nM3iVKqilQSO2fZzrNVUOkr/tGb+5EL6
 iw4JHJQjp9LlzRVxP3QNZp8Bg+knMVRdbSzAkdomDRguMgpu/TGAe7TtkzfEyVel
 VAQUVpDaThp0FK5wAdyMKvpOQqTGjl3KKtM2zs187v+eJJcjJQnIsTk4zW5mmvk4
 Y6YOqbulSNAqVOmJj7oqrDxWgjD75PtXlPFEoOsJM0AuL/sHBo8bKT18cGBwVGmH
 lJoNfkS45kofG4i0zIBwzQuvKjDIQU7ZdVXa2MLL1aqDlyu/66Hll0vCJLAMvHz5
 eb65WQ6Br97e0BNuzVJJNyTGQ3Pr9DdSkTpPkOalwQ3VEyZwcKm3OsB/N0FsgP2V
 ta7vZQ5b6Sn568A9LAgXGhcnQ7mA31VBoNCkLLowxIT4kxVKGUY=
 =4iuO
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.19-rc1' of https://github.com/kvm-x86/linux into HEAD

KVM fixes for 6.19-rc1

 - Add a missing "break" to fix param parsing in the rseq selftest.

 - Apply runtime updates to the _current_ CPUID when userspace is setting
   CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
   dropping the pending update.

 - Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
   supported by KVM and leads to a use-after-free due to KVM failing to unbind
   the memslot from the previously-associated guest_memfd instance.

 - Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
   flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.

 - Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
   SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
   as -1ull (a 64-bit value).

 - Update SVI when activating APICv to fix a bug where a post-activation EOI
   for an in-service IRQ would effective be lost due to SVI being stale.

 - Immediately refresh APICv controls (if necessary) on a nested VM-Exit
   instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
   effectively ignored because KVM thinks the vCPU already has the correct
   APICv settings.
2025-12-18 18:38:45 +01:00
Victor Nogueira
5cba412d6a selftests/tc-testing: Test case exercising potential mirred redirect deadlock
Add a test case that reproduces deadlock scenario where the user has
a drr qdisc attached to root and has a mirred action that redirects to
self on egress

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251210162255.1057663-2-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-18 16:42:18 +01:00
Alexei Starovoitov
ec439c3801 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.19-rc1
Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-16 21:29:38 -08:00
Linus Torvalds
ea1013c153 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmlCBmwACgkQ6rmadz2v
 bToUZA//ZY0IE1x1nCixEAqGF/nGpDzVX4YQQfjrUoXQOD4ykzt35yTNXl6B1IVA
 dliVSI6kUtdoThUa7xJUxMSkDsVBsEMT/zYXQEXJG1zXvJANCB9wTzsC3OCBWbXt
 BRczcEkq0OHC9/l5CrILR6ocwxKGDIMIysfeOSABgfqckSEhylWy3+EWZQCk08ka
 gNpXlDJUG7dYpcZD/zhuC7e5Rg1uNvN7WiTv+Biig8xZCsEtYOq+qC5C/sOnsypI
 nqfECfbx48cVl49SjatdgquuHn/INESdLRCHisshkurA2Mp5PQuCmrwlXbv4JG59
 v9b7lsFQlkpvEXMdo9VYe6K2gjfkOPRdWsVPu2oXA1qISRmrDqX8cKOpapUIwRhL
 p3ASruMOnz0KFqVaET8+5u2SwtALeW+c+1p1aHMfVGF/qbXuyG05qBkLoGFJR+Xr
 WznXUXY80Z7pjD57SpA6U3DigAkGqKCBXUwdifaOq8HQonwsnQGqkW/3NngNULGP
 IC4u0JXn61VgQsM/kAw+ucc4bdKI0g4oKJR56lT48elrj6Yxrjpde4oOqzZ0IQKu
 VQ0YnzWqqT2tjh4YNMOwkNPbFR4ALd329zI6TUkWib/jByEBNcfjSj9BRANd1KSx
 JgSHAE6agrbl6h3nOx584YCasX3Zq+nfv1Sj4Z/5GaHKKW3q/Vw=
 =wHLt
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix BPF builds due to -fms-extensions. selftests (Alexei
   Starovoitov), bpftool (Quentin Monnet).

 - Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
   (Geert Uytterhoeven)

 - Fix livepatch/BPF interaction and support reliable unwinding through
   BPF stack frames (Josh Poimboeuf)

 - Do not audit capability check in arm64 JIT (Ondrej Mosnacek)

 - Fix truncated dmabuf BPF iterator reads (T.J. Mercier)

 - Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)

 - Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
   C23 (Mikhail Gavrilov)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: add regression test for bpf_d_path()
  bpf: Fix verifier assumptions of bpf_d_path's output buffer
  selftests/bpf: Add test for truncated dmabuf_iter reads
  bpf: Fix truncated dmabuf iterator reads
  x86/unwind/orc: Support reliable unwinding through BPF stack frames
  bpf: Add bpf_has_frame_pointer()
  bpf, arm64: Do not audit capability check in do_jit()
  libbpf: Fix -Wdiscarded-qualifiers under C23
  bpftool: Fix build warnings due to MS extensions
  net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
  selftests/bpf: Add -fms-extensions to bpf build flags
2025-12-17 15:54:58 +12:00
Emil Tsalapatis
19f12431b6 selftests/bpf: Add tests for the arena offset of globals
Add tests for the new libbpf globals arena offset logic. The
tests cover the case of globals being as large as the arena
itself, and being smaller than the arena. In that case, the
data is placed at the end of the arena, and the beginning
of the arena is free.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-6-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
c1f61171d4 libbpf: Move arena globals to the end of the arena
Arena globals are currently placed at the beginning of the arena
by libbpf. This is convenient, but prevents users from reserving
guard pages in the beginning of the arena to identify NULL pointer
dereferences. Adjust the load logic to place the globals at the
end of the arena instead.

Also modify bpftool to set the arena pointer in the program's BPF
skeleton to point to the globals. Users now call bpf_map__initial_value()
to find the beginning of the arena mapping and use the arena pointer
in the skeleton to determine which part of the mapping holds the
arena globals and which part is free.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-5-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
12a1fe6e12 bpf/verifier: Do not limit maximum direct offset into arena map
The verifier currently limits direct offsets into a map to 512MiB
to avoid overflow during pointer arithmetic. However, this prevents
arena maps from using direct addressing instructions to access data
at the end of > 512MiB arena maps. This is necessary when moving
arena globals to the end of the arena instead of the front.

Refactor the verifier code to remove the offset calculation during
direct value access calculations. This is possible because the only
two map types that implement .map_direct_value_addr() are arrays and
arenas, and they both do their own internal checks to ensure the
offset is within bounds.

Adjust selftests that expect the old error. These tests still fail
because the verifier identifies the access as out of bounds for the
map, so change them to expect an "invalid access to map value pointer"
error instead.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-3-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
0355911ac0 selftests/bpf: Explicitly account for globals in verifier_arena_large
The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is on offset PAGE_SIZE instead of
0.

Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-2-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Linus Torvalds
dbf89321bf sched_ext: Fixes for v6.19-rc1
- Fix memory leak when destroying helper kthread workers during scheduler
   disable.
 
 - Fix bypass depth accounting on scx_enable() failure which could leave
   the system permanently in bypass mode.
 
 - Fix missing preemption handling when moving tasks to local DSQs via
   scx_bpf_dsq_move().
 
 - Misc fixes including NULL check for put_prev_task(), flushing stdout in
   selftests, and removing unused code.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaUA9aQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGXWKAP9VeREr2ceRFd3WK5vFsFzCGh1L8rRu+t352MGL
 qWXkzQD/apLFSo5RQZ0tE8qORwKM9hNhS8QkVJhlsv5VZRWMBQI=
 =ueoE
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix memory leak when destroying helper kthread workers during
   scheduler disable

 - Fix bypass depth accounting on scx_enable() failure which could leave
   the system permanently in bypass mode

 - Fix missing preemption handling when moving tasks to local DSQs via
   scx_bpf_dsq_move()

 - Misc fixes including NULL check for put_prev_task(), flushing stdout
   in selftests, and removing unused code

* tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Remove unused code in the do_pick_task_scx()
  selftests/sched_ext: flush stdout before test to avoid log spam
  sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()
  sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()
  sched_ext: Fix bypass depth leak on scx_enable() failure
  sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  sched_ext: Fix the memleak for sch->helper objects
2025-12-16 19:24:35 +12:00
Jason Gunthorpe
5b244b077c iommufd/selftest: Make it clearer to gcc that the access is not out of bounds
GCC gets a bit confused and reports:

   In function '_test_cmd_get_hw_info',
       inlined from 'iommufd_ioas_get_hw_info' at iommufd.c:779:3,
       inlined from 'wrapper_iommufd_ioas_get_hw_info' at iommufd.c:752:1:
>> iommufd_utils.h:804:37: warning: array subscript 'struct iommu_test_hw_info[0]' is partly outside array bounds of 'struct iommu_test_hw_info_buffer_smaller[1]' [-Warray-bounds=]
     804 |                         assert(!info->flags);
         |                                 ~~~~^~~~~~~
   iommufd.c: In function 'wrapper_iommufd_ioas_get_hw_info':
   iommufd.c:761:11: note: object 'buffer_smaller' of size 4
     761 |         } buffer_smaller;
         |           ^~~~~~~~~~~~~~

While it is true that "struct iommu_test_hw_info[0]" is partly out of
bounds of the input pointer, it is not true that info->flags is out of
bounds. Unclear why it warns on this.

Reuse an existing properly sized stack buffer and pass a truncated length
instead to test the same thing.

Fixes: af4fde93c3 ("iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl")
Link: https://patch.msgid.link/r/0-v1-63a2cffb09da+4486-iommufd_gcc_bounds_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512032344.kaAcKFIM-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-15 20:34:41 -04:00
Florian Westphal
fec7b07955 selftests: netfilter: packetdrill: avoid failure on HZ=100 kernel
packetdrill --ip_version=ipv4 --mtu=1500 --tolerance_usecs=1000000 --non_fatal packet conntrack_syn_challenge_ack.pkt
conntrack v1.4.8 (conntrack-tools): 1 flow entries have been shown.
conntrack_syn_challenge_ack.pkt:32: error executing `conntrack -f $NFCT_IP_VERSION \
-L -p tcp --dport 8080 | grep UNREPLIED | grep -q SYN_SENT` command: non-zero status 1

Affected kernel had CONFIG_HZ=100; reset packet was still sitting in
backlog.

Reported-by: Yi Chen <yiche@redhat.com>
Fixes: a8a388c2aa ("selftests: netfilter: add packetdrill based conntrack tests")
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-15 15:04:04 +01:00
Bhavik Sachdev
0c82fdbbbf
selftests: statmount: tests for STATMOUNT_BY_FD
Add tests for STATMOUNT_BY_FD flag, which adds support for passing a
file descriptors to statmount(). The fd can also be on a "unmounted"
mount (mount unmounted with MNT_DETACH), we also include tests for that.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Link: https://patch.msgid.link/20251129091455.757724-4-b.sachdev1904@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:13:14 +01:00
Ard Biesheuvel
eb972eab07 lkdtm/bugs: Add cases for BUG and PANIC occurring in hardirq context
Add lkdtm cases to trigger a BUG() or panic() from hardirq context. This
is useful for testing pstore behavior being invoked from such contexts.

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-12-15 12:05:37 +00:00
Caleb Sander Mateos
63276182c5 selftests: ublk: add user copy test cases
The ublk selftests cover every data copy mode except user copy. Add
tests for user copy based on the existing test suite:
- generic_14 ("basic recover function verification (user copy)") based
  on generic_04 and generic_05
- null_03 ("basic IO test with user copy") based on null_01 and null_02
- loop_06 ("write and verify over user copy") based on loop_01 and
  loop_03
- loop_07 ("mkfs & mount & umount with user copy") based on loop_02 and
  loop_04
- stripe_05 ("write and verify test on user copy") based on stripe_03
- stripe_06 ("mkfs & mount & umount on user copy") based on stripe_02
  and stripe_04
- stress_06 ("run IO and remove device (user copy)") based on stress_01
  and stress_03
- stress_07 ("run IO and kill ublk server (user copy)") based on
  stress_02 and stress_04

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:41 -07:00
Caleb Sander Mateos
b9f0a94c3b selftests: ublk: add support for user copy to kublk
The ublk selftests mock ublk server kublk supports every data copy mode
except user copy. Add support for user copy to kublk, enabled via the
--user_copy (-u) command line argument. On writes, issue pread() calls
to copy the write data into the ublk_io's buffer before dispatching the
write to the target implementation. On reads, issue pwrite() calls to
copy read data from the ublk_io's buffer before committing the request.
Copy in 2 KB chunks to provide some coverage of the offseting logic.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:41 -07:00
Caleb Sander Mateos
52bc483763 selftests: ublk: forbid multiple data copy modes
The kublk mock ublk server allows multiple data copy mode arguments to
be passed on the command line (--zero_copy, --get_data, and --auto_zc).
The ublk device will be created with all the requested feature flags,
however kublk will only use one of the modes to interact with request
data (arbitrarily preferring auto_zc over zero_copy over get_data). To
clarify the intent of the test, don't allow multiple data copy modes to
be specified. --zero_copy and --auto_zc are allowed together for
--auto_zc_fallback, which uses both copy modes.
Don't set UBLK_F_USER_COPY for zero_copy, as it's a separate feature.
Fix the test cases in test_stress_05 passing --get_data along with
--zero_copy or --auto_zc.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:41 -07:00
Caleb Sander Mateos
d8295408e0 selftests: ublk: don't share backing files between ublk servers
stress_04 is missing a wait between blocks of tests, meaning multiple
ublk servers will be running in parallel using the same backing files.
Add a wait after each section to ensure each backing file is in use by a
single ublk server at a time.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Caleb Sander Mateos
20da98a07b selftests: ublk: use auto_zc for PER_IO_DAEMON tests in stress_04
stress_04 is described as "run IO and kill ublk server(zero copy)" but
the --per_io_tasks tests cases don't use zero copy. Plus, one of the
test cases is duplicated. Add --auto_zc to these test cases and
--auto_zc_fallback to one of the duplicated ones. This matches the test
cases in stress_03.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Caleb Sander Mateos
58eec4f3fc selftests: ublk: fix fio arguments in run_io_and_recover()
run_io_and_recover() invokes fio with --size="${size}", but the variable
size doesn't exist. Thus, the argument expands to --size=, which causes
fio to exit immediately with an error without issuing any I/O. Pass the
value for size as the first argument to the function.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Caleb Sander Mateos
fe8c0182d4 selftests: ublk: remove unused ios map in seq_io.bt
The ios map populated by seq_io.bt is never read, so remove it.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Caleb Sander Mateos
1fd4b8d7e3 selftests: ublk: correct last_rw map type in seq_io.bt
The last_rw map is initialized with a value of 0 but later assigned the
value args.sector + args.nr_sector, which has type sector_t = u64.
bpftrace complains about the type mismatch between int64 and uint64:
trace/seq_io.bt:18:3-59: ERROR: Type mismatch for @last_rw: trying to assign value of type 'uint64' when map already contains a value of type 'int64'
        @last_rw[$dev, str($2)] = (args.sector + args.nr_sector);

Cast the initial value to uint64 so bpftrace will load the program.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Ming Lei
9637fc3bdd selftests: ublk: fix overflow in ublk_queue_auto_zc_fallback()
The functions ublk_queue_use_zc(), ublk_queue_use_auto_zc(), and
ublk_queue_auto_zc_fallback() were returning int, but performing
bitwise AND on q->flags which is __u64.

When a flag bit is set in the upper 32 bits (beyond INT_MAX), the
result of the bitwise AND operation could overflow when cast to int,
leading to incorrect boolean evaluation.

For example, if UBLKS_Q_AUTO_BUF_REG_FALLBACK is 0x8000000000000000:
  - (u64)flags & 0x8000000000000000 = 0x8000000000000000
  - Cast to int: undefined behavior / incorrect value
  - Used in if(): may evaluate incorrectly

Fix by:
1. Changing return type from int to bool for semantic correctness
2. Using !! to explicitly convert to boolean (0 or 1)

This ensures the functions return proper boolean values regardless
of which bit position the flags occupy in the 64-bit field.

Fixes: c3a6d48f86 ("selftests: ublk: remove ublk queue self-defined flags")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-12 12:50:40 -07:00
Emil Tsalapatis
579a3297b2 selftests/sched_ext: flush stdout before test to avoid log spam
The sched_ext selftests runner runs each test in the same process,
with each test possibly forking multiple times. When the main runner
has not flushed its stdout, the children inherit the buffered output
for previous tests and emit it during exit. This causes log spam.

Make sure stdout/stderr is fully flushed before each test.

Cc: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-12 08:18:32 -10:00
Florian Westphal
5ec8ca26fe netfilter: nf_nat: remove bogus direction check
Jakub reports spurious failures of the 'conntrack_reverse_clash.sh'
selftest.  A bogus test makes nat core resort to port rewrite even
though there is no need for this.

When the test is made, nf_nat_used_tuple() would already have caused us
to return if no other CPU had added a colliding entry.
Moreover, nf_nat_used_tuple() would have ignored the colliding entry if
their origin tuples had been the same.

All that is left to check is if the colliding entry in the hash table
is subject to NAT, and, if its not, if our entry matches in the reverse
direction, e.g. hash table has

addr1:1234 -> addr2:80, and we want to commit
addr2:80   -> addr1:1234.

Because we already checked that neither the new nor the committed entry is
subject to NAT we only have to check origin vs. reply tuple:
for non-nat entries, the reply tuple is always the inverted original.

Just in case there are more problems extend the error reporting
in the selftest while at it and dump conntrack table/stats on error.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251206175135.4a56591b@kernel.org/
Fixes: d8f84a9bc7 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-11 13:08:37 +01:00
Victor Nogueira
5914428e0e selftests/tc-testing: Create tests to exercise ets classes active list misplacements
Add a test case for a bug fixed by Jamal [1] and for scenario where an
ets drr class is inserted into the active list twice.

- Try to delete ets drr class' qdisc while still keeping it in the
  active list
- Try to add ets class to the active list twice

[1] https://lore.kernel.org/netdev/20251128151919.576920-1-jhs@mojatatu.com/

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251208190125.1868423-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-11 01:36:40 -08:00
Jakub Kicinski
898ae76a70 netfilter pull request nf-25-12-10
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmk5UkYNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2ABGKEAChqKDT9NB2rbp1uxyj26TE8KtWB90FioA/yZcV
 6H9lmTp5CXYRsYvkmRDM+xihYHSJpwNptxMVQJ/QonL+LPe8TwiP+THBBwknqrmY
 JY5rgcEbhAbYBnmHJ9T1soK2FHsisUoyNjrA/6rvQQpkoi9+2j/WPrDT765RdXmM
 I1fiF3qvKP1rYwpibOqXRozrlPVmBDB/ffibM7rNGR+K2N8B3vZPRbhQrc6B60ao
 gj6LLdTWZJ4wduPOVo03CzDqzu8hpL1HGwGPYqP1ol3TFlSy8+ByEM+92Nj4BEfm
 qX+3WDqxYc0U01pjz+TiXfI8nC1o5hzZiDWZ39eBp0zvxBoAyNCjc5ZKLE++EjvI
 6hdDcJCcaOipFDv5GI/UmPadYjXf33YWsPK/9mXdChagXvaXtEB/b2KwbHUO4wMT
 eJEtA9NmuvRAk6tGC9L2T7hurre/Dt8Jx35uFz0nJowqNfLxvFmAKeEB6v3yQKGv
 Jb+LD3cEY3a4q7o+Vq3AVUZTqQtifW5JYwIailWHWXtxbnEtADQdWFa7NZD1Nm12
 U2HFPN0z697WaDByNuK33Svc4CetE5oMXl+0WwE6Qa9hNCSnAkpyZKL3bTwJqEtQ
 Ij+4IXoQndgTLbr84koJYcwlTGBDu9li7ii59xh1B1K+BLxJmu/RULxlIR4x010v
 ygvezw==
 =LRTM
 -----END PGP SIGNATURE-----

Merge tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

1) Fix refcount leaks in nf_conncount, from Fernando Fernandez Mancera.
   This addresses a recent regression that came in the last -next
   pull request.

2) Fix a null dereference in route error handling in IPVS, from Slavin
   Liu.  This is an ancient issue dating back to 5.1 days.

3) Always set ifindex in route tuple in the flowtable output path, from
   Lorenzo Bianconi.  This bug came in with the recent output path refactoring.

4) Prefer 'exit $ksft_xfail' over 'exit $ksft_skip' when we fail to
   trigger a nat race condition to exercise the clash resolution path in
   selftest infra, $ksft_skip should be reserved for missing tooling,
   From myself.

* tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: prefer xfail in case race wasn't triggered
  netfilter: always set route tuple out ifindex
  ipvs: fix ipv4 null-ptr-deref in route error path
  netfilter: nf_conncount: fix leaked ct in error paths
====================

Link: https://patch.msgid.link/20251210110754.22620-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-11 00:58:38 -08:00
Petr Machata
514520b34b selftests: forwarding: vxlan_bridge_1q_mc_ul: Drop useless sleeping
After fixing traffic matching in the previous patch, the test does not need
to use the sleep anymore. So drop vx_wait() altogether, migrate all callers
of vx{10,20}_create_wait() to the corresponding _create(), and drop the now
unused _create_wait() helpers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/eabfe4fa12ae788cf3b8c5c876a989de81dfc3d3.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-11 00:53:16 -08:00
Petr Machata
0c8b9a68b3 selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness
This test runs an overlay traffic, forwarded over a multicast-routed VXLAN
underlay. In order to determine whether packets reach their intended
destination, it uses a TC match. For convenience, it uses a flower match,
which however does not allow matching on the encapsulated packet. So
various service traffic ends up being indistinguishable from the test
packets, and ends up confusing the test. To alleviate the problem, the test
uses sleep to allow the necessary service traffic to run and clear the
channel, before running the test traffic. This worked for a while, but
lately we have nevertheless seen flakiness of the test in the CI.

Fix the issue by using u32 to match the encapsulated packet as well. The
confusing packets seem to always be IPv6 multicast listener reports.
Realistically they could be ARP or other ICMP6 traffic as well. Therefore
look for ethertype IPv4 in the IPv4 traffic test, and for IPv6 / UDP
combination in the IPv6 traffic test.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/6438cb1613a2a667d3ff64089eb5994778f247af.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-11 00:53:15 -08:00
Petr Machata
0842e34849 selftests: net: lib: tc_rule_stats_get(): Don't hard-code array index
Flower is commonly used to match on packets in many bash-based selftests.
A dump of a flower filter including statistics looks something like this:

 [
   {
     "protocol": "all",
     "pref": 49152,
     "kind": "flower",
     "chain": 0
   },
   {
     ...
     "options": {
       ...
       "actions": [
         {
	   ...
           "stats": {
             "bytes": 0,
             "packets": 0,
             "drops": 0,
             "overlimits": 0,
             "requeues": 0,
             "backlog": 0,
             "qlen": 0
           }
         }
       ]
     }
   }
 ]

The JQ query in the helper function tc_rule_stats_get() assumes this form
and looks for the second element of the array.

However, a dump of a u32 filter looks like this:

 [
   {
     "protocol": "all",
     "pref": 49151,
     "kind": "u32",
     "chain": 0
   },
   {
     "protocol": "all",
     "pref": 49151,
     "kind": "u32",
     "chain": 0,
     "options": {
       "fh": "800:",
       "ht_divisor": 1
     }
   },
   {
     ...
     "options": {
       ...
       "actions": [
         {
	   ...
           "stats": {
             "bytes": 0,
             "packets": 0,
             "drops": 0,
             "overlimits": 0,
             "requeues": 0,
             "backlog": 0,
             "qlen": 0
           }
         }
       ]
     }
   },
 ]

There's an extra element which the JQ query ends up choosing.

Instead of hard-coding a particular index, look for the entry on which a
selector .options.actions yields anything.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/12982a44471c834511a0ee6c1e8f57e3a5307105.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-11 00:53:15 -08:00
Florian Westphal
b8a81b0ce5 selftests: netfilter: prefer xfail in case race wasn't triggered
Jakub says: "We try to reserve SKIP for tests skipped because tool is
missing in env, something isn't built into the kernel etc."

use xfail, we can't force the race condition to appear at will
so its expected that the test 'fails' occasionally.

Fixes: 78a5883635 ("selftests: netfilter: add conntrack clash resolution test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251206175647.5c32f419@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10 11:55:59 +01:00
Shuran Liu
79e247d660 selftests/bpf: add regression test for bpf_d_path()
Add a regression test for bpf_d_path() to cover incorrect verifier
assumptions caused by an incorrect function prototype. The test
attaches to the fallocate hook, calls bpf_d_path() and verifies that
a simple prefix comparison on the returned pathname behaves correctly
after the fix in patch 1. It ensures the verifier does not assume
the buffer remains unwritten.

Co-developed-by: Zesen Liu <ftyg@live.com>
Signed-off-by: Zesen Liu <ftyg@live.com>
Co-developed-by: Peili Gao <gplhust955@gmail.com>
Signed-off-by: Peili Gao <gplhust955@gmail.com>
Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Shuran Liu <electronlsr@gmail.com>
Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10 01:36:26 -08:00
Guenter Roeck
91dc09a609 selftests: net: tfo: Fix build warning
Fix

tfo.c: In function ‘run_server’:
tfo.c:84:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’

by evaluating the return value from read() and displaying an error message
if it reports an error.

Fixes: c65b5bb232 ("selftests: net: add passive TFO test binary")
Cc: David Wei <dw@davidwei.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-14-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 01:11:12 -08:00
Guenter Roeck
59546e8744 selftests: net: Fix build warnings
Fix

ksft.h: In function ‘ksft_ready’:
ksft.h:27:9: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’

ksft.h: In function ‘ksft_wait’:
ksft.h:51:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’

by checking the return value of the affected functions and displaying
an error message if an error is seen.

Fixes: 2b6d490b82 ("selftests: drv-net: Factor out ksft C helpers")
Cc: Joe Damato <jdamato@fastly.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-11-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 01:11:12 -08:00
Guenter Roeck
06f7cae92f selftest: af_unix: Support compilers without flex-array-member-not-at-end support
Fix:

gcc: error: unrecognized command-line option ‘-Wflex-array-member-not-at-end’

by making the compiler option dependent on its support.

Fixes: 1838731f10 ("selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS.")
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-7-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 01:08:51 -08:00
Ankit Khushwaha
9580f6d47d selftests: tls: fix warning of uninitialized variable
In 'poll_partial_rec_async' a uninitialized char variable 'token' with
is used for write/read instruction to synchronize between threads
via a pipe.

tls.c:2833:26: warning: variable 'token' is uninitialized
      		   when passed as a const pointer argument

Initialize 'token' to '\0' to silence compiler warning.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20251205163242.14615-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10 01:03:02 -08:00
Linus Torvalds
0048fbb401 Futex changes for v6.19:
- Standardize on ktime_t in restart_block::time as well
    (Thomas Weißschuh)
 
  - Futex selftests:
    - Add robust list testcases (André Almeida)
    - Formatting fixes/cleanups (Carlos Llamas)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmk5J9QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gimQ//VhPIVYqioY/opLXBWFAiyxWLF8RbzbsD
 URC4ppVqMa0d8np4VaoaNGTCyPOvhC5m/5q14BNgNhGfbvHOrx1W5cJ122v2vroC
 aNwBkapt655lHwgax4vLgM7jMPW1do9hoyqtjsUIQeKMDlyZZW22gY+ExRTPJZTm
 lP6PreWR74QTTKo5pTjNCoWSaLEmuoD5rbo52SkiTYHwBLL+Tqubl92mVFFUbE+Q
 9n/c/zDeRSaixKr64TvDAijGyjrH8XFhOoMyyOGCiiWyrXfjDPde0Cblt6OkBQeK
 /4/9uFo2TX3vFxQ2Zhj21gqMwQ84cdPvC86GkPh3FoxRNd66gArIpC14ME7rRn0V
 aDKmz+c8QhFdS9gp4+Gw2OkuzYylLl2RSoSjhz/ndrZh7OLn64cA9KjfhRscCfjD
 mcFMVFcjyK8BlgjXyTltC00tLftf6pyO7bqzO4kaYYbpls687ztfZ0mUSE60gLS4
 WuCnntD+Q2IADV0r5xU9ps1a37eaqiHfgtjON2wznJeyj46PbpkWxSpdTIogTRoS
 5JdLv+w1LdDMHaUuEPOgYYd6L69X0HpDauHotyRlGY+SoSoCoX6DbSXTg7CSqlqY
 VHeuLj2OKnELJpxLEo16pcW6qTd+BM/2jcG91H7VJxbb2sJnngT3XtgJagXE5KYR
 IYjcKq6DWig=
 =9KQQ
 -----END PGP SIGNATURE-----

Merge tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull futex updates from Ingo Molnar:

 - Standardize on ktime_t in restart_block::time as well (Thomas
   Weißschuh)

 - Futex selftests:
     - Add robust list testcases (André Almeida)
     - Formatting fixes/cleanups (Carlos Llamas)

* tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Store time as ktime_t in restart block
  selftests/futex: Create test for robust list
  selftests/futex: Skip tests if shmget unsupported
  selftests/futex: Add newline to ksft_exit_fail_msg()
  selftests/futex: Remove unused test_futex_mpol()
2025-12-10 17:21:30 +09:00
Cupertino Miranda
a5b4867fad selftests/bpf: add verifier sign extension bound computation tests.
This commit adds 3 tests to verify a common compiler generated
pattern for sign extension (r1 <<= 32; r1 s>>= 32).
The tests make sure the register bounds are correctly computed both for
positive and negative register values.

Signed-off-by: Cupertino Miranda  <cupertino.miranda@oracle.com>
Signed-off-by: Andrew Pinski  <andrew.pinski@oss.qualcomm.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: David Faust  <david.faust@oracle.com>
Cc: Jose Marchesi  <jose.marchesi@oracle.com>
Cc: Elena Zannoni  <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20251202180220.11128-3-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10 00:12:09 -08:00
Kohei Enju
18352f8fae selftests/bpf: add tests for attaching invalid fd
Add test cases for situations where adding the following types of file
descriptors to a cpumap entry should fail:
- Non-BPF file descriptor (expect -EINVAL)
- Nonexistent file descriptor (expect -EBADF)

Also tighten the assertion for the expected error when adding a
non-BPF_XDP_CPUMAP program to a cpumap entry.

Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20251208131449.73036-3-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09 23:53:27 -08:00
T.J. Mercier
9489d457d4 selftests/bpf: Add test for truncated dmabuf_iter reads
If many dmabufs are present, reads of the dmabuf iterator can be
truncated at PAGE_SIZE or user buffer size boundaries before the fix in
"bpf: Fix truncated dmabuf iterator reads". Add a test to
confirm truncation does not occur.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09 23:49:04 -08:00
Matthieu Baerts (NGI0)
29f4801e9c selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: the userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoints dumps.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-08 23:54:02 -08:00
H. Peter Anvin
c278d72b99 selftests/bpf: replace "__auto_type" with "auto"
Replace instances of "__auto_type" with "auto" in:

	tools/testing/selftests/bpf/prog_tests/socket_helpers.h

This file does not seem to be including <linux/compiler_types.h>
directly or indirectly, so copy the definition but guard it with
!defined(auto).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-12-08 15:32:15 -08:00
Guopeng Zhang
50133c09d1 selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
with the new helper `cg_read_key_long_poll()`. This change improves
the robustness of the test by polling the "nr_dying_descendants"
counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.

Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:
  - With 5-second timeout: 4/20 runs passed.
  - With 8-second timeout: 20/20 runs passed.

The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
under load: 5 seconds was occasionally not enough for reclaim of dying
descendants to complete, whereas 8 seconds consistently covered the observed
latencies. This value is intended as a generous upper bound for the
asynchronous reclaim and is not tied to any specific kernel constant, so it
can be adjusted in the future if reclaim behavior changes.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-08 08:44:54 -10:00
Guopeng Zhang
6360d444ae selftests: cgroup: make test_memcg_sock robust against delayed sock stats
test_memcg_sock() currently requires that memory.stat's "sock " counter
is exactly zero immediately after the TCP server exits. On a busy system
this assumption is too strict:

  - Socket memory may be freed with a small delay (e.g. RCU callbacks).
  - memcg statistics are updated asynchronously via the rstat flushing
    worker, so the "sock " value in memory.stat can stay non-zero for a
    short period of time even after all socket memory has been uncharged.

As a result, test_memcg_sock() can intermittently fail even though socket
memory accounting is working correctly.

Make the test more robust by polling memory.stat for the "sock "
counter and allowing it some time to drop to zero instead of checking
it only once. The timeout is set to 3 seconds to cover the periodic
rstat flush interval (FLUSH_TIME = 2*HZ by default) plus some
scheduling slack. If the counter does not become zero within the
timeout, the test still fails as before.

On my test system, running test_memcontrol 50 times produced:

  - Before this patch:  6/50 runs passed.
  - After this patch:  50/50 runs passed.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-08 08:44:54 -10:00
Guopeng Zhang
311ead1be0 selftests: cgroup: Add cg_read_key_long_poll() to poll a cgroup key with retries
Introduce a new helper function `cg_read_key_long_poll()` in
cgroup_util.h. This function polls the specified key in a cgroup file
until it matches the expected value or the retry limit is reached,
with configurable wait intervals between retries.

This helper is particularly useful for handling asynchronously updated
cgroup statistics (e.g., memory.stat), where immediate reads may
observe stale values, especially on busy systems. It allows tests and
other utilities to handle such cases more flexibly.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-08 08:44:54 -10:00
Linus Torvalds
edf602a17b TTY/Serial changes for 6.19-rc1
Here is the big set of tty/serial driver changes for 6.19-rc1.  Nothing
 major at all, just small constant churn to make the tty layer "cleaner"
 as well as serial driver updates and even a new test added!  Included in
 here are:
   - More tty/serial cleanups from Jiri
   - tty tiocsti test added to hopefully ensure we don't regress in this
     area again
   - sc16is7xx driver updates
   - imx serial driver updates
   - 8250 driver updates
   - new hardware device ids added
   - other minor serial/tty driver cleanups and tweaks
 
 All of these have been in linux-next for a while with no reported issues
 other than a merge-conflict that you will have when you merge into your
 tree (should be simple to resolve, just delete the code on both sides
 of the merge).
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaTTJ+g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymiEQCgweRHAivbv2avIlfsmpFewV27WMgAnR1aE5mW
 JhhopcWcdjnRJikavHGo
 =lhK0
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.19-rc1. Nothing
  major at all, just small constant churn to make the tty layer
  "cleaner" as well as serial driver updates and even a new test added!
  Included in here are:

   - More tty/serial cleanups from Jiri

   - tty tiocsti test added to hopefully ensure we don't regress in this
     area again

   - sc16is7xx driver updates

   - imx serial driver updates

   - 8250 driver updates

   - new hardware device ids added

   - other minor serial/tty driver cleanups and tweaks

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (60 commits)
  serial: sh-sci: Fix deadlock during RSCI FIFO overrun error
  dt-bindings: serial: rsci: Drop "uart-has-rtscts: false"
  LoongArch: dts: Add uart new compatible string
  serial: 8250: Add Loongson uart driver support
  dt-bindings: serial: 8250: Add Loongson uart compatible
  serial: 8250: add driver for KEBA UART
  serial: Keep rs485 settings for devices without firmware node
  serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms
  serial: qcom-geni: Enable PM runtime for serial driver
  serial: sprd: Return -EPROBE_DEFER when uart clock is not ready
  tty: serial: samsung: Declare earlycon for Exynos850
  serial: icom: Convert PCIBIOS_* return codes to errnos
  serial: 8250-of: Fix style issues in 8250_of.c
  serial: add support of CPCI cards
  serial: mux: Fix kernel doc for mux_poll()
  tty: replace use of system_unbound_wq with system_dfl_wq
  serial: 8250_platform: simplify IRQF_SHARED handling
  serial: 8250: make share_irqs local to 8250_platform
  serial: 8250: move skip_txen_test to core
  serial: drop SERIAL_8250_DEPRECATED_OPTIONS
  ...
2025-12-06 18:38:19 -08:00
Linus Torvalds
509d3f4584 Significant patch series in this pull request:
- The 6 patch series "panic: sys_info: Refactor and fix a potential
   issue" from Andy Shevchenko fixes a build issue and does some cleanup in
   ib/sys_info.c.
 
 - The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from
   David Laight enhances the 64-bit math code on behalf of a PWM driver and
   beefs up the test module for these library functions.
 
 - The 2 patch series "scripts/gdb/symbols: make BPF debug info available
   to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line
   numbers available to the GDB debugger.
 
 - The 4 patch series "Enable hung_task and lockup cases to dump system
   info on demand" from Feng Tang adds a sysctl which can be used to cause
   additional info dumping when the hung-task and lockup detectors fire.
 
 - The 6 patch series "lib/base64: add generic encoder/decoder, migrate
   users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/
   and migrates several users away from their private implementations.
 
 - The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric
   Dumazet makes TCP a little faster.
 
 - The 9 patch series "liveupdate: Rework KHO for in-kernel users" from
   Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for
   Live Update Orchestrator (LUO), and possibly for other future clients.
 
 - The 13 patch series "kho: simplify state machine and enable dynamic
   updates" from Pasha Tatashin increases the flexibility of KEXEC
   Handover.  Also preparation for LUO.
 
 - The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is
   a major new feature targeted at cloud environments.  Quoting the [0/N]:
 
     This series introduces the Live Update Orchestrator, a kernel subsystem
     designed to facilitate live kernel updates using a kexec-based reboot.
     This capability is critical for cloud environments, allowing hypervisors
     to be updated with minimal downtime for running virtual machines.  LUO
     achieves this by preserving the state of selected resources, such as
     memory, devices and their dependencies, across the kernel transition.
 
     As a key feature, this series includes support for preserving memfd file
     descriptors, which allows critical in-memory data, such as guest RAM or
     any other large memory region, to be maintained in RAM across the kexec
     reboot.
 
   Mike Rappaport merits a mention here, for his extensive review and
   testing work.
 
 - The 3 patch series "kexec: reorganize kexec and kdump sysfs" from
   Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/
   to /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day.
 
 - The 2 patch series "kho: fixes for vmalloc restoration" from Mike
   Rapoport fixes a BUG which was being hit during KHO restoration of
   vmalloc() regions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA
 jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52
 n0pVpQmG5P/KyOmpLztn96ejL4heKwQ=
 =JY96
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
2025-12-06 14:01:20 -08:00
Linus Torvalds
eee654ca9a Landlock update for v6.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCaTMgExAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSN0kBALPG/cpioGMk0j3DagnUtV6fPvGuux9YTmbe
 KpIWdsoCAQC5gO9nzHYIqBOL0CjMKjovljbN+W/AOiirJew95ocyAA==
 =msQS
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
 "This mainly fixes handling of disconnected directories and adds new
  tests"

* tag 'landlock-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add disconnected leafs and branch test suites
  selftests/landlock: Add tests for access through disconnected paths
  landlock: Improve variable scope
  landlock: Fix handling of disconnected directories
  selftests/landlock: Fix makefile header list
  landlock: Make docs in cred.h and domain.h visible
  landlock: Minor comments improvements
2025-12-06 09:52:41 -08:00
Linus Torvalds
56a1a04dc9 NVDIMM changes for 6.19
* nvdimm: Prevent integer overflow in ramdax_get_config_data()
 	* Documentation: btt: Unwrap bit 31-30 nested table
 	* nvdimm: replace use of system_wq with system_percpu_wq
 	* tools/testing/nvdimm: Use per-DIMM device handle
 	* nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCaTMAfRQcaXJhLndlaW55
 QGludGVsLmNvbQAKCRCebuN7TNx1MWGXAP4lGiQL2fMOpP0PnFVf6aiiNyaTzdaa
 47haKGmsvM6R1AEAqIPelP1tZ+zh6au2G27SJXzJAJk3Nxg1tpxQnZshmg0=
 =LhgA
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull nvdimm updates from Ira Weiny:
 "These are mainly bug fixes and code updates.

  There is a new feature to divide up memmap= carve outs and a fix
  caught in linux-next for that patch. Managing memmap memory on the fly
  for multiple VM's was proving difficult and Mike provided a driver
  which allows for the memory to be better manged.

  Summary:
   - Allow exposing RAM carveouts as NVDIMM DIMM devices
   - Prevent integer overflow in ramdax_get_config_data()
   - Replace use of system_wq with system_percpu_wq
   - Documentation: btt: Unwrap bit 31-30 nested table
   - tools/testing/nvdimm: Use per-DIMM device handle"

* tag 'libnvdimm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm: Prevent integer overflow in ramdax_get_config_data()
  Documentation: btt: Unwrap bit 31-30 nested table
  nvdimm: replace use of system_wq with system_percpu_wq
  tools/testing/nvdimm: Use per-DIMM device handle
  nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
2025-12-06 09:32:25 -08:00
Linus Torvalds
a7405aa92f dma-mapping updates for Linux 6.19:
- next part of DMA mapping API refactoring to physical addresses as the primary
 interface instead of page+offset parameters; this time dma_map_ops callbacks
 are converted to physical addresses, what in turn results also in some
 simplification of architecture specific code (Leon Romanovsky and Jason
 Gunthorpe)
 - clarify that dma_map_benchmark is not a kernel self-test, but standalone
 tool (Qinxin Xia)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaTLpeQAKCRCJp1EFxbsS
 RKlAAQCo/gVslheoJ+h4Hk5oUjLfVQaiamJOlzxw12EVHVAs3AEA7GILIAL1GbGn
 EpkgwjIuz/J/4aGhNbYO6J+C1qMAHww=
 =aM6P
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:

 - More DMA mapping API refactoring to physical addresses as the primary
   interface instead of page+offset parameters.

   This time dma_map_ops callbacks are converted to physical addresses,
   what in turn results also in some simplification of architecture
   specific code (Leon Romanovsky and Jason Gunthorpe)

 - Clarify that dma_map_benchmark is not a kernel self-test, but
   standalone tool (Qinxin Xia)

* tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: remove unused map_page callback
  xen: swiotlb: Convert mapping routine to rely on physical address
  x86: Use physical address for DMA mapping
  sparc: Use physical address DMA mapping
  powerpc: Convert to physical address DMA mapping
  parisc: Convert DMA map_page to map_phys interface
  MIPS/jazzdma: Provide physical address directly
  alpha: Convert mapping routine to rely on physical address
  dma-mapping: remove unused mapping resource callbacks
  xen: swiotlb: Switch to physical address mapping callbacks
  ARM: dma-mapping: Switch to physical address mapping callbacks
  ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*()
  dma-mapping: convert dummy ops to physical address mapping
  dma-mapping: prepare dma_map_ops to conversion to physical address
  tools/dma: move dma_map_benchmark from selftests to tools/dma
2025-12-06 09:25:05 -08:00
Linus Torvalds
51d90a15fe ARM:
- Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.
 
 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.
 
 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.
 
 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.
 
 - Minor fixes to KVM and selftests
 
 Loongarch:
 
 - Get VM PMU capability from HW GCFG register.
 
 - Add AVEC basic support.
 
 - Use 64-bit register definition for EIOINTC.
 
 - Add KVM timer test cases for tools/selftests.
 
 RISC/V:
 
 - SBI message passing (MPXY) support for KVM guest
 
 - Give a new, more specific error subcode for the case when in-kernel
   AIA virtualization fails to allocate IMSIC VS-file
 
 - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
   in small chunks
 
 - Fix guest page fault within HLV* instructions
 
 - Flush VS-stage TLB after VCPU migration for Andes cores
 
 s390:
 
 - Always allocate ESCA (Extended System Control Area), instead of
   starting with the basic SCA and converting to ESCA with the
   addition of the 65th vCPU.  The price is increased number of
   exits (and worse performance) on z10 and earlier processor;
   ESCA was introduced by z114/z196 in 2010.
 
 - VIRT_XFER_TO_GUEST_WORK support
 
 - Operation exception forwarding support
 
 - Cleanups
 
 x86:
 
 - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
   caching is disabled, as there can't be any relevant SPTEs to zap.
 
 - Relocate a misplaced export.
 
 - Fix an async #PF bug where KVM would clear the completion queue when the
   guest transitioned in and out of paging mode, e.g. when handling an SMI and
   then returning to paged mode via RSM.
 
 - Leave KVM's user-return notifier registered even when disabling
   virtualization, as long as kvm.ko is loaded.  On reboot/shutdown, keeping
   the notifier registered is ok; the kernel does not use the MSRs and the
   callback will run cleanly and restore host MSRs if the CPU manages to
   return to userspace before the system goes down.
 
 - Use the checked version of {get,put}_user().
 
 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
   timers can result in a hard lockup in the host.
 
 - Revert the periodic kvmclock sync logic now that KVM doesn't use a
   clocksource that's subject to NTP corrections.
 
 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
   behind CONFIG_CPU_MITIGATIONS.
 
 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path;
   the only reason they were handled in the fast path was to paper of a bug
   in the core #MC code, and that has long since been fixed.
 
 - Add emulator support for AVX MOV instructions, to play nice with emulated
   devices whose guest drivers like to access PCI BARs with large multi-byte
   instructions.
 
 x86 (AMD):
 
 - Fix a few missing "VMCB dirty" bugs.
 
 - Fix the worst of KVM's lack of EFER.LMSLE emulation.
 
 - Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
 
 - Fix incorrect handling of selective CR0 writes when checking intercepts
   during emulation of L2 instructions.
 
 - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
   VMRUN and #VMEXIT.
 
 - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
   interrupt if the guest patched the underlying code after the VM-Exit, e.g.
   when Linux patches code with a temporary INT3.
 
 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
   userspace, and extend KVM "support" to all policy bits that don't require
   any actual support from KVM.
 
 x86 (Intel):
 
 - Use the root role from kvm_mmu_page to construct EPTPs instead of the
   current vCPU state, partly as worthwhile cleanup, but mostly to pave the
   way for tracking per-root TLB flushes, and elide EPT flushes on pCPU
   migration if the root is clean from a previous flush.
 
 - Add a few missing nested consistency checks.
 
 - Rip out support for doing "early" consistency checks via hardware as the
   functionality hasn't been used in years and is no longer useful in general;
   replace it with an off-by-default module param to WARN if hardware fails
   a check that KVM does not perform.
 
 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
   on VM-Enter.
 
 - Misc cleanups.
 
 - Overhaul the TDX code to address systemic races where KVM (acting on behalf
   of userspace) could inadvertantly trigger lock contention in the TDX-Module;
   KVM was either working around these in weird, ugly ways, or was simply
   oblivious to them (though even Yan's devilish selftests could only break
   individual VMs, not the host kernel)
 
 - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU,
   if creating said vCPU failed partway through.
 
 - Fix a few sparse warnings (bad annotation, 0 != NULL).
 
 - Use struct_size() to simplify copying TDX capabilities to userspace.
 
 - Fix a bug where TDX would effectively corrupt user-return MSR values if the
   TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
 
 Selftests:
 
 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
 
 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
   ARCH=x86_64 on the command line.
 
 - Extend a bunch of nested VMX to validate nested SVM as well.
 
 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
   verify KVM can save/restore nested VMX state when L1 is using 5-level
   paging, but L2 is not.
 
 - Clean up the guest paging code in anticipation of sharing the core logic for
   nested EPT and nested NPT.
 
 guest_memfd:
 
 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of
   rough edges in guest_memfd along the way.
 
 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
   from a memslot to make it harder to leak references.
 
 - Enhance KVM selftests to make it easer to develop and debug selftests like
   those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
   often result in hard-to-debug SIGBUS errors.
 
 - Misc cleanups.
 
 Generic:
 
 - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
   irqfd cleanup.
 
 - Fix a goof in the dirty ring documentation.
 
 - Fix choice of target for directed yield across different calls to
   kvm_vcpu_on_spin(); the function was always starting from the first
   vCPU instead of continuing the round-robin search.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkvMa8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMlFwf+Ow7zOYUuELSQ+Jn+hOYXiCNrdBDx
 ZamvMU8kLPr7XX0Zog6HgcMm//qyA6k5nSfqCjfsQZrIhRA/gWJ61jz1OX/Jxq18
 pJ9Vz6epnEPYiOtBwz+v8OS8MqDqVNzj2i6W1/cLPQE50c1Hhw64HWS5CSxDQiHW
 A7PVfl5YU12lW1vG3uE0sNESDt4Eh/spNM17iddXdF4ZUOGublserjDGjbc17E7H
 8BX3DkC2plqkJKwtjg0ae62hREkITZZc7RqsnftUkEhn0N0H9+rb6NKUyzIVh9NZ
 bCtCjtrKN9zfZ0Mujnms3ugBOVqNIputu/DtPnnFKXtXWSrHrgGSNv5ewA==
 =PEcw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs which may occur on a separate vCPU than the
     one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  Loongarch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC/V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is increased number of exits
     (and worse performance) on z10 and earlier processor; ESCA was
     introduced by z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper of a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM corrupt the guest code stream when re-injecting
     a soft interrupt if the guest patched the underlying code after the
     VM-Exit, e.g. when Linux patches code with a temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertantly trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easer to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-05 17:01:20 -08:00
Linus Torvalds
07025b51c1 First set of RISC-V updates for v6.19-rc1
- Enable parallel hotplug for RISC-V
 
 - Optimize vector regset allocation for ptrace()
 
 - Add a kernel selftest for the vector ptrace interface
 
 - Enable the userspace RAID6 test to build and run using RISC-V
   vectors
 
 - Add initial support for the Zalasr RISC-V ratified ISA extension
 
 - For the Zicbop RISC-V ratified ISA extension to userspace, expose
   hardware and kernel support to userspace and add a kselftest for
   Zicbop
 
 - Convert open-coded instances of 'asm goto's that are controlled by
   runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
   following arm64's alternative_has_cap_{un,}likely()
 
 - Remove an unnecessary mask in the GFP flags used in some calls to
   pagetable_alloc()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmkx2zkACgkQx4+xDQu9
 KkvagQ/+Iv94J0S8VWS4vvRMtvPMkEHvPADCmEOrMrVsQ+hZdasKsSFBKY4DHZt6
 baGKEQoyJ0NDrIt51uNWzmR503/PGPwVYSDfrgrpS87IkcO9OWe/HiMSEAu0aCyp
 dhKGYnjWUtawWzFQg1GQ2n1vLOm5cQ2u1vTptISqU1yg9XlXUN0LNzIqNB8GpQv3
 Hm7qqNFyxZOyAC9/ZFGbX2/0KpKQh1xkXSEONxzRGADJ/bqHKKz3hdaOj3aVcoMM
 zObvHBm+I0U6AnGSgiZm71dvO0vlijg1RsuD/wd1DlcGO9QAuaPHX+RKqmJUJFe8
 d0JjOon+d6n/pBKxoPnZyfB1IHxFNb3kX3LjmKdP6NYeOmdHZ7LI3dzB+LFyPXZe
 mkC948+9GExEqmHQx5ZqyCWDwKtknIQPA45ZjBi8e5YOU8nJQiapShYWQu/6ybKV
 GFHRqjDIGfhpjZGVdxnUX2Iok1XcwaGKrI6/6P6WlQ/zHKGurtEEtGiI7XHDYS8m
 FSGxYay4GIWUsNbNSBWcZp5QaGH0jW7qiZle3DR+8gDLe1DJzbIo6pWMiArm5v7x
 fUmroR4FcF9x2X7A01IB2tEUGf/0nfuHgVfMNPNoHBGsiRs9mXzLMngJjbFXOGyF
 EP61/W+K+eaJ0jjkfmnscBQEy8URLSNoACnwQLT9SQcAIC4xWTs=
 =J1nv
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:

 - Enable parallel hotplug for RISC-V

 - Optimize vector regset allocation for ptrace()

 - Add a kernel selftest for the vector ptrace interface

 - Enable the userspace RAID6 test to build and run using RISC-V vectors

 - Add initial support for the Zalasr RISC-V ratified ISA extension

 - For the Zicbop RISC-V ratified ISA extension to userspace, expose
   hardware and kernel support to userspace and add a kselftest for
   Zicbop

 - Convert open-coded instances of 'asm goto's that are controlled by
   runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
   following arm64's alternative_has_cap_{un,}likely()

 - Remove an unnecessary mask in the GFP flags used in some calls to
   pagetable_alloc()

* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  selftests/riscv: Add Zicbop prefetch test
  riscv: hwprobe: Expose Zicbop extension and its block size
  riscv: Introduce Zalasr instructions
  riscv: hwprobe: Export Zalasr extension
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: Add ISA extension parsing for Zalasr
  selftests: riscv: Add test for the Vector ptrace interface
  riscv: ptrace: Optimize the allocation of vector regset
  raid6: test: Add support for RISC-V
  raid6: riscv: Allow code to be compiled in userspace
  raid6: riscv: Prevent compiler from breaking inline vector assembly code
  riscv: cmpxchg: Use riscv_has_extension_likely
  riscv: bitops: Use riscv_has_extension_likely
  riscv: hweight: Use riscv_has_extension_likely
  riscv: checksum: Use riscv_has_extension_likely
  riscv: pgtable: Use riscv_has_extension_unlikely
  riscv: Remove __GFP_HIGHMEM masking
  RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
2025-12-05 16:26:57 -08:00
Amery Hung
0e841d1926 selftests/bpf: Test getting associated struct_ops in timer callback
Make sure 1) a timer callback can also reference the associated
struct_ops, and then make sure 2) the timer callback cannot get a
dangled pointer to the struct_ops when the map is freed.

The test schedules a timer callback from a struct_ops program since
struct_ops programs do not pin the map. It is possible for the timer
callback to run after the map is freed. The timer callback calls a
kfunc that runs .test_1() of the associated struct_ops, which should
return MAP_MAGIC when the map is still alive or -1 when the map is
gone.

The first subtest added in this patch schedules the timer callback to
run immediately, while the map is still alive. The second subtest added
schedules the callback to run 500ms after syscall_prog runs and then
frees the map right after syscall_prog runs. Both subtests then wait
until the callback runs to check the return of the kfunc.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-7-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Amery Hung
04fd12df4e selftests/bpf: Test ambiguous associated struct_ops
Add a test to make sure implicit struct_ops association does not
break backward compatibility nor return incorrect struct_ops.
struct_ops programs should still be allowed to be reused in
different struct_ops map. The associated struct_ops map set implicitly
however will be poisoned. Trying to read it through the helper
bpf_prog_get_assoc_struct_ops() should result in a NULL pointer.

While recursion of test_1() cannot happen due to the associated
struct_ops being ambiguois, explicitly check for it to prevent stack
overflow if the test regresses.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-6-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Amery Hung
33a165f9c2 selftests/bpf: Test BPF_PROG_ASSOC_STRUCT_OPS command
Test BPF_PROG_ASSOC_STRUCT_OPS command that associates a BPF program
with a struct_ops. The test follows the same logic in commit
ba7000f1c3 ("selftests/bpf: Test multi_st_ops and calling kfuncs from
different programs"), but instead of using map id to identify a specific
struct_ops, this test uses the new BPF command to associate a struct_ops
with a program.

The test consists of two sets of almost identical struct_ops maps and BPF
programs associated with the map. Their only difference is the unique
value returned by bpf_testmod_multi_st_ops::test_1().

The test first loads the programs and associates them with struct_ops
maps. Then, it exercises the BPF programs. They will in turn call kfunc
bpf_kfunc_multi_st_ops_test_1_prog_arg() to trigger test_1() of the
associated struct_ops map, and then check if the right unique value is
returned.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-5-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Linus Torvalds
7203ca412f Significant patch series in this merge are as follows:
- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
   Uladzislau Rezki reworks the vmalloc() code to support non-blocking
   allocations (GFP_ATOIC, GFP_NOWAIT).
 
 - The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
   a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
   across fork/exec.
 
 - The 4 patch series "mm/zswap: misc cleanup of code and documentations"
   from SeongJae Park does some light maintenance work on the zswap code.
 
 - The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
   and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
   /sys/kernel/debug/page_owner debug feature.  It adds unique identifiers
   to differentiate the various stack traces so that userspace monitoring
   tools can better match stack traces over time.
 
 - The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
   Hahn makes some minor alterations to the page allocator's per-cpu-pages
   feature.
 
 - The 2 patch series "Improve UFFDIO_MOVE scalability by removing
   anon_vma lock" from Lokesh Gidra addresses a scalability issue in
   userfaultfd's UFFDIO_MOVE operation.
 
 - The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
   Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
 
 - The 2 patch series "drivers/base/node: fold node register and
   unregister functions" from Donet Tom cleans up the NUMA node handling
   code a little.
 
 - The 4 patch series "mm: some optimizations for prot numa" from Kefeng
   Wang provides some cleanups and small optimizations to the NUMA
   allocation hinting code.
 
 - The 5 patch series "mm/page_alloc: Batch callers of
   free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
   boot on large machines.  These were causing (harmless) softlockup
   warnings.
 
 - The 2 patch series "optimize the logic for handling dirty file folios
   during reclaim" from Baolin Wang removes some now-unnecessary work from
   page reclaim.
 
 - The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
   per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
   feature.
 
 - The 2 patch series "mm/damon: fixes for address alignment issues in
   DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
   and DAMON_RECLAIM with certain userspace configuration.
 
 - The 15 patch series "expand mmap_prepare functionality, port more
   users" from Lorenzo Stoakes enhances the new(ish)
   file_operations.mmap_prepare() method and ports additional callsites
   from the old ->mmap() over to ->mmap_prepare().
 
 - The 8 patch series "Fix stale IOTLB entries for kernel address space"
   from Lu Baolu fixes a bug (and possible security issue on non-x86) in
   the IOMMU code.  In some situations the IOMMU could be left hanging onto
   a stale kernel pagetable entry.
 
 - The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
   from Wei Yang cleans up and optimizes the folio splitting code.
 
 - The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
   Song implements some cleanups and a minor fix in the swap discard code.
 
 - The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
   Park does as advertised.
 
 - The 9 patch series "mm/damon: support pin-point targets removal" from
   SeongJae Park permits userspace to remove a specific monitoring target
   in the middle of the current targets list.
 
 - The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
   from Harry Yoo implements a couple of cleanups related to mm header file
   inclusion.
 
 - The 2 patch series "mm/swapfile.c: select swap devices of default
   priority round robin" from Baoquan He improves the selection of swap
   devices for NUMA machines.
 
 - The 3 patch series "mm: Convert memory block states (MEM_*) macros to
   enums" from Israel Batista changes the memory block labels from macros
   to enums so they will appear in kernel debug info.
 
 - The 3 patch series "ksm: perform a range-walk to jump over holes in
   break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
   unmerges an address range.
 
 - The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
   from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
   userspace unit tests.
 
 - The 2 patch series "some cleanups for pageout()" from Baolin Wang
   cleans up a couple of minor things in the page scanner's
   writeback-for-eviction code.
 
 - The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
   Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
 
 - The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
   Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
   and improves the mergeability of guarded VMAs.
 
 - The 2 patch series "mm: perform guard region install/remove under VMA
   lock" from Lorenzo Stoakes reduces mmap lock contention for callers
   performing VMA guard region operations.
 
 - The 2 patch series "vma_start_write_killable" from Matthew Wilcox
   starts work in permitting applications to be killed when they are
   waiting on a read_lock on the VMA lock.
 
 - The 11 patch series "mm/damon/tests: add more tests for online
   parameters commit" from SeongJae Park adds additional userspace testing
   of DAMON's "commit" feature.
 
 - The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
   that.
 
 - The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
   Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
   that VMA is merged with another.
 
 - The 16 patch series "mm: support device-private THP" from Balbir Singh
   introduces support for Transparent Huge Page (THP) migration in zone
   device-private memory.
 
 - The 3 patch series "Optimize folio split in memory failure" from Zi
   Yan optimizes folio split operations in the memory failure code.
 
 - The 2 patch series "mm/huge_memory: Define split_type and consolidate
   split support checks" from Wei Yang provides some more cleanups in the
   folio splitting code.
 
 - The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
   entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
   handling of pagetable leaf entries by introducing the concept of
   'software leaf entries', of type softleaf_t.
 
 - The 4 patch series "reparent the THP split queue" from Muchun Song
   reparents the THP split queue to its parent memcg.  This is in
   preparation for addressing the long-standing "dying memcg" problem,
   wherein dead memcg's linger for too long, consuming memory resources.
 
 - The 3 patch series "unify PMD scan results and remove redundant
   cleanup" from Wei Yang does a little cleanup in the hugepage collapse
   code.
 
 - The 6 patch series "zram: introduce writeback bio batching" from
   Sergey Senozhatsky improves zram writeback efficiency by introducing
   batched bio writeback support.
 
 - The 4 patch series "memcg: cleanup the memcg stats interfaces" from
   Shakeel Butt cleans up our handling of the interrupt safety of some
   memcg stats.
 
 - The 4 patch series "make vmalloc gfp flags usage more apparent" from
   Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
 
 - The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
   from Chunyan Zhang teches soft dirty and userfaultfd write protect
   tracking to use RISC-V's Svrsw60t59b extension.
 
 - The 5 patch series "mm: swap: small fixes and comment cleanups" from
   Youngjun Park fixes a small bug and cleans up some of the swap code.
 
 - The 4 patch series "initial work on making VMA flags a bitmap" from
   Lorenzo Stoakes starts work on converting the vma struct's flags to a
   bitmap, so we stop running out of them, especially on 32-bit.
 
 - The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
   from Youngjun Park addresses a possible bug in the swap discard code and
   cleans things up a little.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
 jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
 o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
 =IUzS
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

  "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
     Rework the vmalloc() code to support non-blocking allocations
     (GFP_ATOIC, GFP_NOWAIT)

  "ksm: fix exec/fork inheritance" (xu xin)
     Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
     inherited across fork/exec

  "mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
     Some light maintenance work on the zswap code

  "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
     Enhance the /sys/kernel/debug/page_owner debug feature by adding
     unique identifiers to differentiate the various stack traces so
     that userspace monitoring tools can better match stack traces over
     time

  "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
     Minor alterations to the page allocator's per-cpu-pages feature

  "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
     Address a scalability issue in userfaultfd's UFFDIO_MOVE operation

  "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)

  "drivers/base/node: fold node register and unregister functions" (Donet Tom)
     Clean up the NUMA node handling code a little

  "mm: some optimizations for prot numa" (Kefeng Wang)
     Cleanups and small optimizations to the NUMA allocation hinting
     code

  "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
     Address long lock hold times at boot on large machines. These were
     causing (harmless) softlockup warnings

  "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
     Remove some now-unnecessary work from page reclaim

  "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
     Enhance the DAMOS auto-tuning feature

  "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
     Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
     configuration

  "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
     Enhance the new(ish) file_operations.mmap_prepare() method and port
     additional callsites from the old ->mmap() over to ->mmap_prepare()

  "Fix stale IOTLB entries for kernel address space" (Lu Baolu)
     Fix a bug (and possible security issue on non-x86) in the IOMMU
     code. In some situations the IOMMU could be left hanging onto a
     stale kernel pagetable entry

  "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
     Clean up and optimize the folio splitting code

  "mm, swap: misc cleanup and bugfix" (Kairui Song)
     Some cleanups and a minor fix in the swap discard code

  "mm/damon: misc documentation fixups" (SeongJae Park)

  "mm/damon: support pin-point targets removal" (SeongJae Park)
     Permit userspace to remove a specific monitoring target in the
     middle of the current targets list

  "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
     A couple of cleanups related to mm header file inclusion

  "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
     improve the selection of swap devices for NUMA machines

  "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
     Change the memory block labels from macros to enums so they will
     appear in kernel debug info

  "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
     Address an inefficiency when KSM unmerges an address range

  "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
     Fix leaks and unhandled malloc() failures in DAMON userspace unit
     tests

  "some cleanups for pageout()" (Baolin Wang)
     Clean up a couple of minor things in the page scanner's
     writeback-for-eviction code

  "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
     Move hugetlb's sysfs/sysctl handling code into a new file

  "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
     Make the VMA guard regions available in /proc/pid/smaps and
     improves the mergeability of guarded VMAs

  "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
     Reduce mmap lock contention for callers performing VMA guard region
     operations

  "vma_start_write_killable" (Matthew Wilcox)
     Start work on permitting applications to be killed when they are
     waiting on a read_lock on the VMA lock

  "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
     Add additional userspace testing of DAMON's "commit" feature

  "mm/damon: misc cleanups" (SeongJae Park)

  "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
     Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
     VMA is merged with another

  "mm: support device-private THP" (Balbir Singh)
     Introduce support for Transparent Huge Page (THP) migration in zone
     device-private memory

  "Optimize folio split in memory failure" (Zi Yan)

  "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
     Some more cleanups in the folio splitting code

  "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
     Clean up our handling of pagetable leaf entries by introducing the
     concept of 'software leaf entries', of type softleaf_t

  "reparent the THP split queue" (Muchun Song)
     Reparent the THP split queue to its parent memcg. This is in
     preparation for addressing the long-standing "dying memcg" problem,
     wherein dead memcg's linger for too long, consuming memory
     resources

  "unify PMD scan results and remove redundant cleanup" (Wei Yang)
     A little cleanup in the hugepage collapse code

  "zram: introduce writeback bio batching" (Sergey Senozhatsky)
     Improve zram writeback efficiency by introducing batched bio
     writeback support

  "memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
     Clean up our handling of the interrupt safety of some memcg stats

  "make vmalloc gfp flags usage more apparent" (Vishal Moola)
     Clean up vmalloc's handling of incoming GFP flags

  "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
     Teach soft dirty and userfaultfd write protect tracking to use
     RISC-V's Svrsw60t59b extension

  "mm: swap: small fixes and comment cleanups" (Youngjun Park)
     Fix a small bug and clean up some of the swap code

  "initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
     Start work on converting the vma struct's flags to a bitmap, so we
     stop running out of them, especially on 32-bit

  "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
     Address a possible bug in the swap discard code and clean things
     up a little

[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
  register device memory for poison handling") because it looks
  broken to me, I've asked for clarification   - Linus ]

* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm: fix vma_start_write_killable() signal handling
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  mm/swapfile: fix list iteration when next node is removed during discard
  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
  mm/kfence: add reboot notifier to disable KFENCE on shutdown
  memcg: remove inc/dec_lruvec_kmem_state helpers
  selftests/mm/uffd: initialize char variable to Null
  mm: fix DEBUG_RODATA_TEST indentation in Kconfig
  mm: introduce VMA flags bitmap type
  tools/testing/vma: eliminate dependency on vma->__vm_flags
  mm: simplify and rename mm flags function for clarity
  mm: declare VMA flags by bit
  zram: fix a spelling mistake
  mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
  mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
  pagemap: update BUDDY flag documentation
  mm: swap: remove scan_swap_map_slots() references from comments
  mm: swap: change swap_alloc_slow() to void
  mm, swap: remove redundant comment for read_swap_cache_async
  mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
  ...
2025-12-05 13:52:43 -08:00
Linus Torvalds
2e8c1c6a50 ktest: Fix for v6.19:
- Fix incorrect variable in error message in config-bisect.pl
 
   If the old config file fails to get copied as the last good or bad
   config file, then it fails the program and prints an error message.
   But the variable used to print what the old config's name was incorrect.
   It was $config when it should have been $output_config.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTL5PxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnWBAQDtZ+UNtbeNR2eHzbcQ1+ENi0aWGwF9
 e93hKyvAWkMgWAEAklDIdstyCaSQQgq3X4ilv1kaG1eu+KSWNnyhmqnyqAM=
 =KPW2
 -----END PGP SIGNATURE-----

Merge tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest fix from Steven Rostedt:

 - Fix incorrect variable in error message in config-bisect.pl

   If the old config file fails to get copied as the last good or bad
   config file, then it fails the program and prints an error message.

   But the variable used to print what the old config's name was
   incorrect. It was $config when it should have been $output_config.

* tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest.pl: Fix uninitialized var in config-bisect.pl
2025-12-05 10:53:43 -08:00
Linus Torvalds
0b1b4a3d8e Runtime verifier updates for v6.19:
- Adapt the ftracetest script to be run from a different folder
 
   This uses the already existing OPT_TEST_DIR but extends it further to run
   independent tests, then add an --rv flag to allow using the script for
   testing RV (mostly) independently on ftrace.
 
 - Add basic RV selftests in selftests/verification for more validations
 
   Add more validations for available/enabled monitors and reactors. This
   could have caught the bug introducing kernel panic solved above. Tests use
   ftracetest.
 
 - Convert react() function in reactor to use va_list directly
 
   Use a central helper to handle the variadic arguments. Clean up macros
   and mark functions as static.
 
 - Add lockdep annotations to reactors to have lockdep complain of errors
 
   If the reactors are called from improper context. Useful to develop new
   reactors. This highlights a warning in the panic reactor that is related
   to the printk subsystem and not to RV.
 
 - Convert core RV code to use lock guards and __free helpers
 
   This completely removes goto statements.
 
 - Fix compilation if !CONFIG_RV_REACTORS
 
   Fix the warning by keeping LTL monitor variable as always static.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTBoVxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWpAQDxPQAJQvBZ41l9q9Cis7PqGGezT4Nv
 g6Fh/ydMOlJCsQD/R0Xd5JxPmBI8FLCwCfqHo7wYKUhP8GfL/ORPEWhU2gI=
 =EEot
 -----END PGP SIGNATURE-----

Merge tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull runtime verifier updates from Steven Rostedt:

 - Adapt the ftracetest script to be run from a different folder

   This uses the already existing OPT_TEST_DIR but extends it further to
   run independent tests, then add an --rv flag to allow using the
   script for testing RV (mostly) independently on ftrace.

 - Add basic RV selftests in selftests/verification for more validations

   Add more validations for available/enabled monitors and reactors.
   This could have caught the bug introducing kernel panic solved above.
   Tests use ftracetest.

 - Convert react() function in reactor to use va_list directly

   Use a central helper to handle the variadic arguments. Clean up
   macros and mark functions as static.

 - Add lockdep annotations to reactors to have lockdep complain of
   errors

   If the reactors are called from improper context. Useful to develop
   new reactors. This highlights a warning in the panic reactor that is
   related to the printk subsystem and not to RV.

 - Convert core RV code to use lock guards and __free helpers

   This completely removes goto statements.

 - Fix compilation if !CONFIG_RV_REACTORS

   Fix the warning by keeping LTL monitor variable as always static.

* tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix compilation if !CONFIG_RV_REACTORS
  rv: Convert to use __free
  rv: Convert to use lock guard
  rv: Add explicit lockdep context for reactors
  rv: Make rv_reacting_on() static
  rv: Pass va_list to reactors
  selftests/verification: Add initial RV tests
  selftest/ftrace: Generalise ftracetest to use with RV
2025-12-05 10:17:00 -08:00
Linus Torvalds
028bd4a146 Hi,
This pull request for TPM driver contains changes to unify TPM return
 code translation between trusted_tpm2 and TPM driver itself. Other than
 that the changes are either bug fixes or minor imrovements.
 
 Change log that should explain the previous iterations:
 
 1. "Documentation: tpm-security.rst: change title to section"
    https://lore.kernel.org/all/86514a6ab364e01f163470a91cacef120e1b8b47.camel@HansenPartnership.com/
 2. "drivers/char/tpm: use min() instead of min_t()"
    https://lore.kernel.org/all/20251201161228.3c09d88a@pumpkin/
 3. Removed spurious kfree(): https://lore.kernel.org/linux-integrity/aS+K5nO2MP7N+kxQ@ly-workstation/
 
 BR, Jarkko
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRE6pSOnaBC00OEHEIaerohdGur0gUCaTComgAKCRAaerohdGur
 0t38AQDThfcJhDmgfR3zYo0C8rtNQwM06fnooqsiDjTRHYXu6QEArRKJfR9B/vpN
 vIAloxIgIUxQbewBJ1DfxJ7OVO2kGwA=
 =hMwZ
 -----END PGP SIGNATURE-----

Merge tag 'tpmdd-next-6.19-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
 "This contains changes to unify TPM return code translation between
  trusted_tpm2 and TPM driver itself. Other than that the changes are
  either bug fixes or minor imrovements"

* tag 'tpmdd-next-6.19-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  KEYS: trusted: Use tpm_ret_to_err() in trusted_tpm2
  tpm: Use -EPERM as fallback error code in tpm_ret_to_err
  tpm: Cap the number of PCR banks
  tpm: Remove tpm_find_get_ops
  tpm: add WQ_PERCPU to alloc_workqueue users
  tpm_crb: add missing loc parameter to kerneldoc
  tpm_crb: Fix a spelling mistake
  selftests: tpm2: Fix ill defined assertions
2025-12-04 19:30:09 -08:00
Linus Torvalds
056daec292 iommufd 6.19 pull request
- Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO. This
   is the first step to broader DMABUF support in iommufd, right now it
   only works with VFIO. This closes the last functional gap with classic
   VFIO type 1 to safely support PCI peer to peer DMA by mapping the VFIO
   device's MMIO into the IOMMU.
 
 - Relax SMMUv3 restrictions on nesting domains to better support qemu's
   sequence to have an identity mapping before the vSID is established.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaS8DrgAKCRCFwuHvBreF
 YY/kAP0Q1s7LkVb83uQb8kIW3xKzEnFNTlhrSSGV5UBuYLbaDgD+J+y+4VrSkJem
 85LMipmzaoZdHqtxMhQWrlYbZMr9TAM=
 =BacK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "This is a pretty consequential cycle for iommufd, though this pull is
  not too big. It is based on a shared branch with VFIO that introduces
  VFIO_DEVICE_FEATURE_DMA_BUF a DMABUF exporter for VFIO device's MMIO
  PCI BARs. This was a large multiple series journey over the last year
  and a half.

  Based on that work IOMMUFD gains support for VFIO DMABUF's in its
  existing IOMMU_IOAS_MAP_FILE, which closes the last major gap to
  support PCI peer to peer transfers within VMs.

  In Joerg's iommu tree we have the "generic page table" work which aims
  to consolidate all the duplicated page table code in every iommu
  driver into a single algorithm. This will be used by iommufd to
  implement unique page table operations to start adding new features
  and improve performance.

  In here:

   - Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO.
     This is the first step to broader DMABUF support in iommufd, right
     now it only works with VFIO. This closes the last functional gap
     with classic VFIO type 1 to safely support PCI peer to peer DMA by
     mapping the VFIO device's MMIO into the IOMMU.

   - Relax SMMUv3 restrictions on nesting domains to better support
     qemu's sequence to have an identity mapping before the vSID is
     established"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases
  iommufd/selftest: Add some tests for the dmabuf flow
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Allow MMIO pages in a batch
  iommufd: Allow a DMABUF to be revoked
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Add DMABUF to iopt_pages
  vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
2025-12-04 18:50:11 -08:00
Linus Torvalds
a3ebb59eee VFIO updates for v6.19-rc1
- Move libvfio selftest artifacts in preparation of more tightly
    coupled integration with KVM selftests. (David Matlack)
 
  - Fix comment typo in mtty driver. (Chu Guangqing)
 
  - Support for new hardware revision in the hisi_acc vfio-pci variant
    driver where the migration registers can now be accessed via the PF.
    When enabled for this support, the full BAR can be exposed to the
    user. (Longfang Liu)
 
  - Fix vfio cdev support for VF token passing, using the correct size
    for the kernel structure, thereby actually allowing userspace to
    provide a non-zero UUID token.  Also set the match token callback for
    the hisi_acc, fixing VF token support for this this vfio-pci variant
    driver. (Raghavendra Rao Ananta)
 
  - Introduce internal callbacks on vfio devices to simplify and
    consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
    data, removing various ioctl intercepts with a more structured
    solution. (Jason Gunthorpe)
 
  - Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
    to be exposed through dma-buf objects with lifecycle managed through
    move operations.  This enables low-level interactions such as a
    vfio-pci based SPDK drivers interacting directly with dma-buf capable
    RDMA devices to enable peer-to-peer operations.  IOMMUFD is also now
    able to build upon this support to fill a long standing feature gap
    versus the legacy vfio type1 IOMMU backend with an implementation of
    P2P support for VM use cases that better manages the lifecycle of the
    P2P mapping. (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)
 
  - Convert eventfd triggering for error and request signals to use RCU
    mechanisms in order to avoid a 3-way lockdep reported deadlock issue.
    (Alex Williamson)
 
  - Fix a 32-bit overflow introduced via dma-buf support manifesting with
    large DMA buffers. (Alex Mastro)
 
  - Convert nvgrace-gpu vfio-pci variant driver to insert mappings on
    fault rather than at mmap time.  This conversion serves both to make
    use of huge PFNMAPs but also to both avoid corrected RAS events
    during reset by now being subject to vfio-pci-core's use of
    unmap_mapping_range(), and to enable a device readiness test after
    reset. (Ankit Agrawal)
 
  - Refactoring of vfio selftests to support multi-device tests and split
    code to provide better separation between IOMMU and device objects.
    This work also enables a new test suite addition to measure parallel
    device initialization latency. (David Matlack)
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkvV3IRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyIpIQ/9GwpjLH5Vdv0v2d9mkHmZIWFpG/tr3zJa
 +spQqOjO0etASc67PtIJArT9pWib+s6O8OaG7iFrdNR65HCSsXSZbIGbMThPODfy
 DdDj1ipAqMVwcaCZT8un2N8Sktu9YpFQMvc5IoXWWYhw88vili7bBx+OTrEFV2T0
 6qQijSBdhw1TXVFHG6BGSmqmisyMepIebA6GmPWdfYu6BfoWBYMdcMjDwd1J61Q5
 DDwFRzn/Dz2Tvb1jbXiiRMRuFIuegFQii+wtd30S/cRPFZhZLWzc+drimC6oOFiQ
 qL19vQQsBPnLtGvch40HsET/AbY5w0pLCkYX5qacxP3sq27+N+KuotzCvbnVMN+H
 e2BqOCujyoce8z1Br6BzV71Lr2yzPDcc5pXTuEuuBT+J/ptOY8hfEikOj85s5Wzj
 aKsTrdDRGMrn/o11NkGSzYwFcMs9MxCX9mo98U6OkWDr0+cmPLf4LGZgpJudWg4E
 POUlzPpnzJrTlX5d+OqCdKJG0a1hTlTa2udzRa5hCDANHaZWLoAssfgSEKfV9xt1
 PzOMf0UIJmPJmFcw/OpMO72/5xp8O4WslJS0ulSm6vrAJDtutLApHZ7bJ44KniNd
 4vte+gOjyZY8ibTDKRULhXVlCDxkEnZjRBbApgI9HJD61IElOzjqohRuRx77J09B
 7c8OSLI8d1U=
 =tpee
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Move libvfio selftest artifacts in preparation of more tightly
   coupled integration with KVM selftests (David Matlack)

 - Fix comment typo in mtty driver (Chu Guangqing)

 - Support for new hardware revision in the hisi_acc vfio-pci variant
   driver where the migration registers can now be accessed via the PF.
   When enabled for this support, the full BAR can be exposed to the
   user (Longfang Liu)

 - Fix vfio cdev support for VF token passing, using the correct size
   for the kernel structure, thereby actually allowing userspace to
   provide a non-zero UUID token. Also set the match token callback for
   the hisi_acc, fixing VF token support for this this vfio-pci variant
   driver (Raghavendra Rao Ananta)

 - Introduce internal callbacks on vfio devices to simplify and
   consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
   data, removing various ioctl intercepts with a more structured
   solution (Jason Gunthorpe)

 - Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
   to be exposed through dma-buf objects with lifecycle managed through
   move operations. This enables low-level interactions such as a
   vfio-pci based SPDK drivers interacting directly with dma-buf capable
   RDMA devices to enable peer-to-peer operations. IOMMUFD is also now
   able to build upon this support to fill a long standing feature gap
   versus the legacy vfio type1 IOMMU backend with an implementation of
   P2P support for VM use cases that better manages the lifecycle of the
   P2P mapping (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)

 - Convert eventfd triggering for error and request signals to use RCU
   mechanisms in order to avoid a 3-way lockdep reported deadlock issue
   (Alex Williamson)

 - Fix a 32-bit overflow introduced via dma-buf support manifesting with
   large DMA buffers (Alex Mastro)

 - Convert nvgrace-gpu vfio-pci variant driver to insert mappings on
   fault rather than at mmap time. This conversion serves both to make
   use of huge PFNMAPs but also to both avoid corrected RAS events
   during reset by now being subject to vfio-pci-core's use of
   unmap_mapping_range(), and to enable a device readiness test after
   reset (Ankit Agrawal)

 - Refactoring of vfio selftests to support multi-device tests and split
   code to provide better separation between IOMMU and device objects.
   This work also enables a new test suite addition to measure parallel
   device initialization latency (David Matlack)

* tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio: (65 commits)
  vfio: selftests: Add vfio_pci_device_init_perf_test
  vfio: selftests: Eliminate INVALID_IOVA
  vfio: selftests: Split libvfio.h into separate header files
  vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
  vfio: selftests: Rename vfio_util.h to libvfio.h
  vfio: selftests: Stop passing device for IOMMU operations
  vfio: selftests: Move IOVA allocator into iova_allocator.c
  vfio: selftests: Move IOMMU library code into iommu.c
  vfio: selftests: Rename struct vfio_dma_region to dma_region
  vfio: selftests: Upgrade driver logging to dev_err()
  vfio: selftests: Prefix logs with device BDF where relevant
  vfio: selftests: Eliminate overly chatty logging
  vfio: selftests: Support multiple devices in the same container/iommufd
  vfio: selftests: Introduce struct iommu
  vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
  vfio: selftests: Allow passing multiple BDFs on the command line
  vfio: selftests: Split run.sh into separate scripts
  vfio: selftests: Move run.sh into scripts directory
  vfio/nvgrace-gpu: wait for the GPU mem to be ready
  vfio/nvgrace-gpu: Inform devmem unmapped after reset
  ...
2025-12-04 18:42:48 -08:00
Linus Torvalds
ce5cfb0fa2 IOMMU Updates for Linux v6.19
Including:
 
 	- Introduction of the generic IO page-table framework with support for
 	  Intel and AMD IOMMU formats from Jason. This has good potential for
 	  unifying more IO page-table implementations and making future
 	  enhancements more easy. But this also needed quite some fixes during
 	  development. All known issues have been fixed, but my feeling is that
 	  there is a higher potential than usual that more might be needed.
 
 	- Intel VT-d updates:
 	  - Use right invalidation hint in qi_desc_iotlb().
 
 	  - Reduce the scope of INTEL_IOMMU_FLOPPY_WA.
 
 	- ARM-SMMU updates:
 	  - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
 	    and a new clock for the TBU.
 
 	  - Fix error handling if level 1 CD table allocation fails.
 
 	  - Permit more than the architectural maximum number of SMRs for funky
 	    Qualcomm mis-implementations of SMMUv2.
 
 	- Mediatek driver:
 	  - MT8189 iommu support.
 
 	- Move ARM IO-pgtable selftests to kunit.
 
 	- Device leak fixes for a couple of drivers.
 
 	- Random smaller fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmktjWgACgkQK/BELZcB
 GuMQgA//YljqMVbMmYpFQF/9nXsyTvkpzaaVqj3CscjfnLJQ7YNIrWHLMw4xcZP7
 c2zsSDMPc5LAe2nkyNPvMeFufsbDOYE1CcbqFfZhNqwKGWIVs7J1j73aqJXKfXB4
 RaVv5GA1NFeeIRRKPx/ZpD9W1WiR9PiUQXBxNTAJLzbBNhcTJzo+3YuSpJ6ckOak
 aG67Aox6Dwq/u0gHy8gWnuA2XL6Eit+nQbod7TfchHoRu+TUdbv8qWL+sUChj+u/
 IlyBt1YL/do3bJC0G1G2E81J1KGPU/OZRfB34STQKlopEdXX17ax3b2X0bt3Hz/h
 9Yk3BLDtGMBQ0aVZzAZcOLLlBlEpPiMKBVuJQj29kK9KJSYfmr2iolOK0cGyt+kv
 DfQ8+nv6HRFMbwjetfmhGYf6WemPcggvX44Hm/rgR2qbN3P+Q8/klyyH8MLuQeqO
 ttoQIwDd9DYKJelmWzbLgpb2vGE3O0EAFhiTcCKOk643PaudfengWYKZpJVIIqtF
 nEUEpk17HlpgFkYrtmIE7CMONqUGaQYO84R3j7DXcYXYAvqQJkhR3uJejlWQeh8x
 uMc9y04jpg3p5vC5c7LfkQ3at3p/jf7jzz4GuNoZP5bdVIUkwXXolir0ct3/oby3
 /bXXNA1pSaRuUADm7pYBBhKYAFKC7vCSa8LVaDR3CB95aNZnvS0=
 =KG7j
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:

 - Introduction of the generic IO page-table framework with support for
   Intel and AMD IOMMU formats from Jason.

   This has good potential for unifying more IO page-table
   implementations and making future enhancements more easy. But this
   also needed quite some fixes during development. All known issues
   have been fixed, but my feeling is that there is a higher potential
   than usual that more might be needed.

 - Intel VT-d updates:
    - Use right invalidation hint in qi_desc_iotlb()
    - Reduce the scope of INTEL_IOMMU_FLOPPY_WA

 - ARM-SMMU updates:
    - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
      and a new clock for the TBU.
    - Fix error handling if level 1 CD table allocation fails.
    - Permit more than the architectural maximum number of SMRs for
      funky Qualcomm mis-implementations of SMMUv2.

 - Mediatek driver:
    - MT8189 iommu support

 - Move ARM IO-pgtable selftests to kunit

 - Device leak fixes for a couple of drivers

 - Random smaller fixes and improvements

* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
  iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
  iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
  powerpc/pseries/svm: Make mem_encrypt.h self contained
  genpt: Make GENERIC_PT invisible
  iommupt: Avoid a compiler bug with sw_bit
  iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
  iommupt: Fix unlikely flows in increase_top()
  iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
  MAINTAINERS: Update my email address
  iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
  dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
  iommu/vt-d: Restore previous domain::aperture_end calculation
  iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
  iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
  iommu/tegra: fix device leak on probe_device()
  iommu/sun50i: fix device leak on of_xlate()
  iommu/omap: simplify probe_device() error handling
  iommu/omap: fix device leaks on probe_device()
  iommu/mediatek-v1: add missing larb count sanity check
  iommu/mediatek-v1: fix device leaks on probe()
  ...
2025-12-04 18:05:06 -08:00
Linus Torvalds
5797d10ea4 cxl for v6.19
Misc:
 - Remove incorrect page-allocator quirk section in documentation.
 - Remove unused devm_cxl_port_enumerate_dports() function.
 - Fix typo in cdat.c code comment.
 - Replace use of system_wq with system_percpu_wq
 - Add locked CXL decoder support for region removal.
 - Return when generic target updated
 - Rename region_res_match_cxl_range() to spa_maps_hpa()
 - Clarify comment in spa_maps_hpa()
 
 Enable unit testing for XOR address translation of SPA to DPA and vice versa.
 - Refactor address translation funcs for testing in cxl_region.
 - Make the XOR calculations available for testing.
 - Add cxl_translate module for address translation testing in cxl_test.
 
 Extended Linear Cache changes:
 - Add extended linear cache size sysfs attribute.
 - Adjust failure emission of extended linear cache detection in cxl_acpi.
 - Added extended linear cache unit testing support in cxl_test
 
 Preparation refactor patches for PRM translation support.
 - Simplify cxl_rd_ops allocation and handling.
 - Group xor arithmetric setup code in a single block.
 - Remove local variable @inc in cxl_port_setup_targets()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmktwo8ACgkQYGjFFmlT
 OEo+zw/+JvSlKu6H9XPB6eiMS4i7aWpo1/k8uqpejZp29jxmZFe4N+YYuvVaMlEB
 c7FhMd0e+jXC+/GocsleGyvXSXUJf+mgkrcsRHFNkwaZHSFud2LIbbyrwXRh3wdN
 KPViMdhrB12YuCASkZLMXf8PIQT5apDUhXoo9F/pxtlFjlpWnOjqewL7dmNNLK9i
 IIfdfrGUb90CyFDCLMreJOPnsRm4TFKaG4ITAImZOlj94Gx9KSdsS0Dcfzm0tZr/
 sYjSamgN3ctoOVbzEhEy3Kmy3NOSLyWD8MOUzSXggZo2dU9u1F1h7yu1RABNisCd
 jWq1x8/liFS0X1MPtHbjb9VVrgRNsemUrnIhGPTOG+iDWHk33vcftUulYtw1IZ14
 tepArmR/duTU9v7J7GLbshjUO8hyPhDvF0TZKV+/290GQyrV4Mkd7kZJ0zU0OrJQ
 Z1TiMne1e/RMWP/RNtpyOVif3hX71T3UeB28xo+HuElWd2jlZQQ1wh0JW7gZJzxD
 p4Npl1elY1j4lGsXjPrXrBjMJ6X95XTIwsYLdtaSJlKXxvByec47dVSOhJZyfVHG
 5UBsZZsAawS+bf6LFbNWH09MBwTZeS0K0i44Bf9ttiafkjSWCk4uJmhmkbGNOt49
 DYRYRHykIik64t+9AFzW0SjHcPTsKjfGgCBwsws+UilSIZl5HNk=
 =vT5l
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull compute express link (CXL) updates from Dave Jiang:
 "The additions of note are adding CXL region remove support for locked
  CXL decoders, adding unit testing support for XOR address translation,
  and adding unit testing support for extended linear cache.

  Misc:
   - Remove incorrect page-allocator quirk section in documentation
   - Remove unused devm_cxl_port_enumerate_dports() function
   - Fix typo in cdat.c code comment
   - Replace use of system_wq with system_percpu_wq
   - Add locked CXL decoder support for region removal
   - Return when generic target updated
   - Rename region_res_match_cxl_range() to spa_maps_hpa()
   - Clarify comment in spa_maps_hpa()

  Enable unit testing for XOR address translation of SPA to DPA and vice versa:
   - Refactor address translation funcs for testing in cxl_region
   - Make the XOR calculations available for testing
   - Add cxl_translate module for address translation testing in
     cxl_test

  Extended Linear Cache changes:
   - Add extended linear cache size sysfs attribute
   - Adjust failure emission of extended linear cache detection in
     cxl_acpi
   - Added extended linear cache unit testing support in cxl_test

  Preparation refactor patches for PRM translation support:
   - Simplify cxl_rd_ops allocation and handling
   - Group xor arithmetric setup code in a single block
   - Remove local variable @inc in cxl_port_setup_targets()"

* tag 'cxl-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (22 commits)
  cxl/test: Assign overflow_err_count from log->nr_overflow
  cxl/test: Remove ret_limit race condition in mock_get_event()
  cxl/test: remove unused mock function for cxl_rcd_component_reg_phys()
  cxl/test: Add support for acpi extended linear cache
  cxl/test: Add cxl_test CFMWS support for extended linear cache
  cxl/test: Standardize CXL auto region size
  cxl/region: Remove local variable @inc in cxl_port_setup_targets()
  cxl/acpi: Group xor arithmetric setup code in a single block
  cxl: Simplify cxl_rd_ops allocation and handling
  cxl: Clarify comment in spa_maps_hpa()
  cxl: Rename region_res_match_cxl_range() to spa_maps_hpa()
  acpi/hmat: Return when generic target is updated
  cxl: Add handling of locked CXL decoder
  cxl/region: Add support to indicate region has extended linear cache
  cxl: Adjust extended linear cache failure emission in cxl_acpi
  cxl/test: Add cxl_translate module for address translation testing
  cxl/acpi: Make the XOR calculations available for testing
  cxl/region: Refactor address translation funcs for testing
  cxl/pci: replace use of system_wq with system_percpu_wq
  cxl: fix typos in cdat.c comments
  ...
2025-12-04 17:55:18 -08:00
Linus Torvalds
fde4ce068d hid-for-linus-2025120201
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmku8DAACgkQpmLzj2vt
 YEk1oQ/+PTkkVxeXaSMuC5/xLPuvDOwOt6w9bN5RbUaqGhg15x67clZ8Ibi6nXYt
 cYxXR8rWMHFGujBdiP1UXRbYAAJEHnzYHAj2AQKFhPmSpeIm39TOcN2qEtSsYQ0r
 mgcNi/9SAf+M3iF/b5qmcPgnfqZ3SB8jt1AFMV6F5Mbv+9y5ZYLbCDDD6TylJOGM
 IU3VRt9B1ffSyc93SZkODXRNNGR/Iv0/0Yn1bPtNNSZeZ6ZNgrRPUukg0P+PsllY
 W/hAhuoISKW1oSojlSeF4x5BJR/Xt0Mym7/IdraBbgPJPCwIBxfxcatQiSFJlaMD
 GRFf0V1xD9gWxY/xNkmtjJhUjh6hmL6hTTX6lUVA+ltXKnUl7mfnFuMOt1naWKU+
 hRech8eFRxJ4ZfwYTpT6w8IXV1OI0MGhcL7uWmbhpPGDGRTxQOqQE5DFTBrf54GB
 OMlT0soDbyYQmCzZXaQyrn3HhCKpovmgjJRTMBvY1aQ6T707V/qu8C4eGNqlPjUn
 4ywxwPaJ6B4M2q/nSl1BQPakka47pTVZoLkmsJuDU2npskPOs9ZHvfUXgMs7eOri
 jS3BFoBOkrNAZ6YTiFtwjV0fw3CNHbSeicXV/fd1B3tB48NA52XBjq/uKiwOkZAJ
 3bcJ5K7fgJXwOkVTAQhfgCPplfI1bbdvei40gkyITP36Ye3IyxE=
 =7Jr3
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2025120201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID updates from Jiri Kosina:

 - Proper mapping of HID_GD_Z to ABS_DISTANCE for stylus/pen types of
   devices (Ping Cheng)

 - Power management/hibernation improvements in intel-ish (Zhang Lixu)

 - Improved support for several Logitech devices, e.g. G Pro X
   Superlight 2, new iteration of Lighspeed receiver, G13, G510 (Nathan
   Rossi, Mavroudis Chatzilazaridis, Leo L Schwab, Hans de Goede)

 - Support for UcLogic XP-PEN Artist 24 Pro (Joshua Goins)

 - WinWing Orion2 throttle support improvement (Ivan Gorinov)

 - other assorted small fixes and device ID additions

* tag 'hid-for-linus-2025120201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (37 commits)
  drivers: hid: renegotiate resolution multipliers with device after reset
  HID: evision: Fix Report Descriptor for Evision Wireless Receiver 320f:226f
  HID: logitech-dj: Fix probe failure when used with KVM
  HID: logitech-dj: Remove duplicate error logging
  HID: logitech-dj: Add support for G Pro X Superlight 2 receiver
  selftests/hid-tablet: add ABS_DISTANCE test for stylus/pen
  HID: input: map HID_GD_Z to ABS_DISTANCE for stylus/pen
  HID: bpf: fix typo in HID usage table
  HID: bpf: add the Huion Kamvas 27 Pro
  HID: bpf: add heuristics to the Huion Inspiroy 2S eraser button
  HID: bpf: Add support for XP-Pen Deco02
  HID: bpf: Add support for the XP-Pen Deco 01 V3
  HID: bpf: Add support for the Waltop Batteryless Tablet
  HID: bpf: Add fixup for Logitech SpaceNavigator variants
  HID: bpf: support for Huion Kamvas 16 Gen 3
  HID: bpf: add support for Huion Kamvas 13 (Gen 3) (model GS1333)
  HID: bpf: Add support for the Inspiroy 2M
  Documentation: hid-alps: Format DataByte* subsection headings
  Documentation: hid-alps: Fix packet format section headings
  HID: nintendo: add WQ_PERCPU to alloc_workqueue users
  ...
2025-12-04 15:44:48 -08:00
Linus Torvalds
2aa680df68 sound updates for 6.19-rc1
The majority of changes at this time were about ASoC with a lot of
 code refactoring works.  From the functionality POV, there aren't much
 to see, but we have a wide range of device-specific fixes and updates.
 Here are some highlights:
 
 - Continued ASoC API clean works, spanned over many files
 - Added a SoundWire SCDA generic class driver with regmap support
 - Enhancements and fixes for Cirrus, Intel, Maxim and Qualcomm.
 - Support for ASoC Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
   QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
   TAS5815, TAS5828 and TAS5830
 - Usual HD-audio and USB-audio quirks and fixups
 - Support for Onkyo SE-300PCIE, TASCAM IF-FW/DM MkII
 
 Some gpiolib changes for shared GPIOs are included along with this PR
 for covering ASoC drivers changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmkwQ2UOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8tIRAAjCHdIlMejNTCzGRlhsRSQVD6bo1wASXcjfJ6
 COH84akbnA0oT5z7H7JnzTOmfjzxLJpwC8j6IpZ/9CQazanT5IIVE41FZquXZ1JB
 RhQVzuGw9Pl4MaYVdFuRqIXjiP+msY1jpbo9/QXQo8D/B41wpmVTgzkFVW2rxPMy
 0aBOu4Wpu+11aBpNBy6dXDiKQ5kDqn7zOLoFGgcf5wlFIvOGZJ0Wg/i0kvCjl+ia
 xYiP+/F6xKOyTY1c98iqExbKzSSy4ddGFUwrkevm6bWpu8hkXiL1O0zMWOe769x6
 0wy0b5zvsbtOQOxbtK5+8gdjJw7ycgDa441hDtsaXBBROYZEV3D6+XZJCfq8Tz8F
 +vLH5lfZeLg+59eqt3GOMGlwBfuhH91qzukIYG3q9EQGOkNkZ19ySJnFMLom68Ei
 TCfNzh/ggSGXA9qAmfBcPoizgC/j9o+v4kbLRQteuRRWxES1FxqeN9Ba3d5JcHT3
 BQpz1bhUli73477D6voPcwXLiQlM+Alv4QUKTFr2nUnWUQKwMvkZFwiv2jTqVdDf
 f71Usv7xdyM7XijgmXuLg+3n0UvCwUPBB+bv3a1Bu7G4iTB1deNKU8t9k+sBJpcX
 aRs5ych3MiU/zG+KRMB5FEx31KzpKu+Kk9NQ207/1HLaNhTgD3cg2wS3T3qdRUPv
 Yf6wFHs=
 =1JUI
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "The majority of changes at this time were about ASoC with a lot of
  code refactoring works. From the functionality POV, there isn't much
  to see, but we have a wide range of device-specific fixes and updates.
  Here are some highlights:

   - Continued ASoC API cleanup work, spanned over many files

   - Added a SoundWire SCDA generic class driver with regmap support

   - Enhancements and fixes for Cirrus, Intel, Maxim and Qualcomm.

   - Support for ASoC Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
     QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
     TAS5815, TAS5828 and TAS5830

   - Usual HD-audio and USB-audio quirks and fixups

   - Support for Onkyo SE-300PCIE, TASCAM IF-FW/DM MkII

  Some gpiolib changes for shared GPIOs are included along with this PR
  for covering ASoC drivers changes"

* tag 'sound-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (739 commits)
  ALSA: hda/realtek: Add PCI SSIDs to HP ProBook quirks
  ALSA: usb-audio: Simplify with usb_endpoint_max_periodic_payload()
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for more HP laptops
  ALSA: rawmidi: Fix inconsistent indenting warning reported by smatch
  ALSA: dice: fix buffer overflow in detect_stream_formats()
  ASoC: codecs: Modify awinic amplifier dsp read and write functions
  ASoC: SDCA: Fixup some more Kconfig issues
  ASoC: cs35l56: Log a message if firmware is missing
  ASoC: nau8325: Delete a stray tab
  firmware: cs_dsp: Add test cases for client_ops == NULL
  firmware: cs_dsp: Don't require client to provide a struct cs_dsp_client_ops
  ASoC: fsl_micfil: Set channel range control
  ASoC: fsl_micfil: Add default quality for different platforms
  ASoC: intel: sof_sdw: Add codec_info for cs42l45
  ASoC: sdw_utils: Add cs42l45 support functions
  ASoC: intel: sof_sdw: Add ability to have auxiliary devices
  ASoC: sdw_utils: Move codec_name to dai info
  ASoC: sdw_utils: Add codec_conf for every DAI
  ASoC: SDCA: Add terminal type into input/output widget name
  ASoC: SDCA: Align mute controls to ALSA expectations
  ...
2025-12-04 10:08:40 -08:00
Alexei Starovoitov
835a507535 selftests/bpf: Add -fms-extensions to bpf build flags
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
   ..
   struct freelist_counters;
};

Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-03 20:20:11 -08:00
Linus Torvalds
cc25df3e2e for-6.19/block-20251201
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmktsoMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuiUD/92eivL+HmOh10o8trvxajB0yuyqfSjHHrL
 g+xUbF4s9bgAg/v+Upx7sTY8jdrTcMjKov+G9T6uPvBMqVmeVdZckA1PSAKQaIX1
 Zb7nS2LnO7F6JKbwpwVrrIaqVbcz8MfGIIMbN4yNNEOMCwdIVMp4fo7trPBknJNx
 WddNSGUFlIF3NqSI8AflSS/pYnGm+McfBHXBpJAKipI3iquKKubHv+FX9kLp7Tn4
 x27ZoCWOHglIBTJXU0mmXCVsLF8b5BA8DQcGtT62azb8+l0cRTkaHY0DFAv5BvhG
 TqcjrKdmR0cGSNt+nEmFrujE3atBRl0G0kiHA80YgA1MTtYzdPaUVOUtM9k/rEem
 gpiGMDpBypdxyJAyijPSaVJdfcg0psOlYbhIR4N2wbj/dq8268h+cWzXlF1spgVt
 /7ygoaCmfMNbTy9rKThTjH+es787AVXUAXXaPHhIFsnCKUj8xQl4pT7XltmgYeWx
 1/XD1NEJeLHHog5upAVlGX3H5tbvP1nIICxbZa9mDOJX1rwxxI7/s/RucPjbNXuY
 AiaKPTfxtB9+Ihd2HrJ/76RVMkckcOBc4GIKoFfwuKDbcdLXQ5FcZCmVRoI1V9SV
 KsH7JBgihLwR9XWKE1vp9+CBNe1Qlu3K4IjG/E7CNLeuDntIBu73ihqGP/DqV6Bq
 RX1Dc0OyAQ==
 =m22w
 -----END PGP SIGNATURE-----

Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Fix head insertion for mq-deadline, a regression from when priority
   support was added

 - Series simplifying and improving the ublk user copy code

 - Various ublk related cleanups

 - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the
   request is punted to a thread for handling

 - Merge and then later revert loop dio nowait support, as it ended up
   causing excessive stack usage for when the inline issue code needs to
   dip back into the full file system code

 - Improve auto integrity code, making it less deadlock prone

 - Speedup polled IO handling, but manually managing the hctx lookups

 - Fixes for blk-throttle for SSD devices

 - Small series with fixes for the S390 dasd driver

 - Add support for caching zones, avoiding unnecessary report zone
   queries

 - MD pull requests via Yu:
      - fix null-ptr-dereference regression for dm-raid0
      - fix IO hang for raid5 when array is broken with IO inflight
      - remove legacy 1s delay to speed up system shutdown
      - change maintainer's email address
      - data can be lost if array is created with different lbs devices,
        fix this problem and record lbs of the array in metadata
      - fix rcu protection for md_thread
      - fix mddev kobject lifetime regression
      - enable atomic writes for md-linear
      - some cleanups

 - bcache updates via Coly
      - remove useless discard and cache device code
      - improve usage of per-cpu workqueues

 - Reorganize the IO scheduler switching code, fixing some lockdep
   reports as well

 - Improve the block layer P2P DMA support

 - Add support to the block tracing code for zoned devices

 - Segment calculation improves, and memory alignment flexibility
   improvements

 - Set of prep and cleanups patches for ublk batching support. The
   actual batching hasn't been added yet, but helps shrink down the
   workload of getting that patchset ready for 6.20

 - Fix for how the ps3 block driver handles segments offsets

 - Improve how block plugging handles batch tag allocations

 - nbd fixes for use-after-free of the configuration on device clear/put

 - Set of improvements and fixes for zloop

 - Add Damien as maintainer of the block zoned device code handling

 - Various other fixes and cleanups

* tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
  block/rnbd: correct all kernel-doc complaints
  blk-mq: use queue_hctx in blk_mq_map_queue_type
  md: remove legacy 1s delay in md_notify_reboot
  md/raid5: fix IO hang when array is broken with IO inflight
  md: warn about updating super block failure
  md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid
  sbitmap: fix all kernel-doc warnings
  ublk: add helper of __ublk_fetch()
  ublk: pass const pointer to ublk_queue_is_zoned()
  ublk: refactor auto buffer register in ublk_dispatch_req()
  ublk: add `union ublk_io_buf` with improved naming
  ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
  kfifo: add kfifo_alloc_node() helper for NUMA awareness
  blk-mq: fix potential uaf for 'queue_hw_ctx'
  blk-mq: use array manage hctx map instead of xarray
  ublk: prevent invalid access with DEBUG
  s390/dasd: Use scnprintf() instead of sprintf()
  s390/dasd: Move device name formatting into separate function
  s390/dasd: Remove unnecessary debugfs_create() return checks
  s390/dasd: Fix gendisk parent after copy pair swap
  ...
2025-12-03 19:26:18 -08:00
Linus Torvalds
8f7aa3d3c7 Networking changes for 6.19.
Core & protocols
 ----------------
 
  - Replace busylock at the Tx queuing layer with a lockless list. Resulting
    in a 300% (4x) improvement on heavy TX workloads, sending twice the
    number of packets per second, for half the cpu cycles.
 
  - Allow constantly busy flows to migrate to a more suitable CPU/NIC
    queue. Normally we perform queue re-selection when flow comes out
    of idle, but under extreme circumstances the flows may be constantly
    busy. Add sysctl to allow periodic rehashing even if it'd risk packet
    reordering.
 
  - Optimize the NAPI skb cache, make it larger, use it in more paths.
 
  - Attempt returning Tx skbs to the originating CPU (like we already did
    for Rx skbs).
 
  - Various data structure layout and prefetch optimizations from Eric.
 
  - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
    quite expensive on recent AMD machines.
 
  - Extend threaded NAPI polling to allow the kthread busy poll for packets.
 
  - Make MPTCP use Rx backlog processing. This lowers the lock pressure,
    improving the Rx performance.
 
  - Support memcg accounting of MPTCP socket memory.
 
  - Allow admin to opt sockets out of global protocol memory accounting
    (using a sysctl or BPF-based policy). The global limits are a poor fit
    for modern container workloads, where limits are imposed using cgroups.
 
  - Improve heuristics for when to kick off AF_UNIX garbage collection.
 
  - Allow users to control TCP SACK compression, and default to 33% of RTT.
 
  - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
    aggressive rcvbuf growth and overshot when the connection RTT is low.
 
  - Preserve skb metadata space across skb_push / skb_pull operations.
 
  - Support for IPIP encapsulation in the nftables flowtable offload.
 
  - Support appending IP interface information to ICMP messages (RFC 5837).
 
  - Support setting max record size in TLS (RFC 8449).
 
  - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
 
  - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
 
  - Let users configure the number of write buffers in SMC.
 
  - Add new struct sockaddr_unsized for sockaddr of unknown length,
    from Kees.
 
  - Some conversions away from the crypto_ahash API, from Eric Biggers.
 
  - Some preparations for slimming down struct page.
 
  - YAML Netlink protocol spec for WireGuard.
 
  - Add a tool on top of YAML Netlink specs/lib for reporting commonly
    computed derived statistics and summarized system state.
 
 Driver API
 ----------
 
  - Add CAN XL support to the CAN Netlink interface.
 
  - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
    as defined by the OPEN Alliance's "Advanced diagnostic features
    for 100BASE-T1 automotive Ethernet PHYs" specification.
 
  - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
 
  - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
    and performs RSS.
 
  - Add info to devlink params whether the current setting is the default
    or a user override. Allow resetting back to default.
 
  - Add standard device stats for PSP crypto offload.
 
  - Leverage DSA frame broadcast to implement simple HSR frame duplication
    for a lot of switches without dedicated HSR offload.
 
  - Add uAPI defines for 1.6Tbps link modes.
 
 Device drivers
 --------------
 
  - Add Motorcomm YT921x gigabit Ethernet switch support.
 
  - Add MUCSE driver for N500/N210 1GbE NIC series.
 
  - Convert drivers to support dedicated ops for timestamping control,
    and away from the direct IOCTL handling. While at it support GET
    operations for PHY timestamping.
 
  - Add (and convert most drivers to) a dedicated ethtool callback
    for reading the Rx ring count.
 
  - Significant refactoring efforts in the STMMAC driver, which supports
    Synopsys turn-key MAC IP integrated into a ton of SoCs.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support PPS in/out on all pins
    - Intel (100G, ice, idpf):
      - ice: implement standard ethtool and timestamping stats
      - i40e: support setting the max number of MAC addresses per VF
      - iavf: support RSS of GTP tunnels for 5G and LTE deployments
    - nVidia/Mellanox (mlx5):
      - reduce downtime on interface reconfiguration
      - disable being an XDP redirect target by default (same as other
        drivers) to avoid wasting resources if feature is unused
    - Meta (fbnic):
      - add support for Linux-managed PCS on 25G, 50G, and 100G links
    - Wangxun:
      - support Rx descriptor merge, and Tx head writeback
      - support Rx coalescing offload
      - support 25G SPF and 40G QSFP modules
 
  - Ethernet virtual:
    - Google (gve):
      - allow ethtool to configure rx_buf_len
      - implement XDP HW RX Timestamping support for DQ descriptor format
    - Microsoft vNIC (mana):
      - support HW link state events
      - handle hardware recovery events when probing the device
 
  - Ethernet NICs consumer, and embedded:
    - usbnet: add support for Byte Queue Limits (BQL)
    - AMD (amd-xgbe):
      - add device selftests
    - NXP (enetc):
      - add i.MX94 support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - bcmasp: add support for PHY-based Wake-on-LAN
    - Broadcom switches (b53):
      - support port isolation
      - support BCM5389/97/98 and BCM63XX ARL formats
    - Lantiq/MaxLinear switches:
      - support bridge FDB entries on the CPU port
      - use regmap for register access
      - allow user to enable/disable learning
      - support Energy Efficient Ethernet
      - support configuring RMII clock delays
      - add tagging driver for MaxLinear GSW1xx switches
    - Synopsys (stmmac):
      - support using the HW clock in free running mode
      - add Eswin EIC7700 support
      - add Rockchip RK3506 support
      - add Altera Agilex5 support
    - Cadence (macb):
      - cleanup and consolidate descriptor and DMA address handling
      - add EyeQ5 support
    - TI:
      - icssg-prueth: support AF_XDP
    - Airoha access points:
      - add missing Ethernet stats and link state callback
      - add AN7583 support
      - support out-of-order Tx completion processing
    - Power over Ethernet:
      - pd692x0: preserve PSE configuration across reboots
      - add support for TPS23881B devices
 
  - Ethernet PHYs:
    - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
    - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
    - micrel:
      - support for non PTP SKUs of lan8814
      - enable in-band auto-negotiation on lan8814
    - realtek:
      - cable testing support on RTL8224
      - interrupt support on RTL8221B
    - motorcomm: support for PHY LEDs on YT853
    - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
    - mscc: support for PHY LED control
 
  - CAN drivers:
    - m_can: add support for optional reset and system wake up
    - remove can_change_mtu() obsoleted by core handling
    - mcp251xfd: support GPIO controller functionality
 
  - Bluetooth:
    - add initial support for PASTa
 
  - WiFi:
    - split ieee80211.h file, it's way too big
    - improvements in VHT radiotap reporting, S1G, Channel Switch
      Announcement handling, rate tracking in mesh networks
    - improve multi-radio monitor mode support, and add a cfg80211 debugfs
      interface for it
    - HT action frame handling on 6 GHz
    - initial chanctx work towards NAN
    - MU-MIMO sniffer improvements
 
  - WiFi drivers:
    - RealTek (rtw89):
      - support USB devices RTL8852AU and RTL8852CU
      - initial work for RTL8922DE
      - improved injection support
    - Intel:
      - iwlwifi: new sniffer API support
    - MediaTek (mt76):
      - WED support for >32-bit DMA
      - airoha NPU support
      - regdomain improvements
      - continued WiFi7/MLO work
    - Qualcomm/Atheros:
      - ath10k: factory test support
      - ath11k: TX power insertion support
      - ath12k: BSS color change support
      - ath12k: statistics improvements
    - brcmfmac: Acer A1 840 tablet quirk
    - rtl8xxxu: 40 MHz connection fixes/support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
 IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
 MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
 zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
 XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
 ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
 WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
 bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
 sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
 Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
 IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
 NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
 =UoDR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Replace busylock at the Tx queuing layer with a lockless list.

     Resulting in a 300% (4x) improvement on heavy TX workloads, sending
     twice the number of packets per second, for half the cpu cycles.

   - Allow constantly busy flows to migrate to a more suitable CPU/NIC
     queue.

     Normally we perform queue re-selection when flow comes out of idle,
     but under extreme circumstances the flows may be constantly busy.

     Add sysctl to allow periodic rehashing even if it'd risk packet
     reordering.

   - Optimize the NAPI skb cache, make it larger, use it in more paths.

   - Attempt returning Tx skbs to the originating CPU (like we already
     did for Rx skbs).

   - Various data structure layout and prefetch optimizations from Eric.

   - Remove ktime_get() from the recvmsg() fast path, ktime_get() is
     sadly quite expensive on recent AMD machines.

   - Extend threaded NAPI polling to allow the kthread busy poll for
     packets.

   - Make MPTCP use Rx backlog processing. This lowers the lock
     pressure, improving the Rx performance.

   - Support memcg accounting of MPTCP socket memory.

   - Allow admin to opt sockets out of global protocol memory accounting
     (using a sysctl or BPF-based policy). The global limits are a poor
     fit for modern container workloads, where limits are imposed using
     cgroups.

   - Improve heuristics for when to kick off AF_UNIX garbage collection.

   - Allow users to control TCP SACK compression, and default to 33% of
     RTT.

   - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
     unnecessarily aggressive rcvbuf growth and overshot when the
     connection RTT is low.

   - Preserve skb metadata space across skb_push / skb_pull operations.

   - Support for IPIP encapsulation in the nftables flowtable offload.

   - Support appending IP interface information to ICMP messages (RFC
     5837).

   - Support setting max record size in TLS (RFC 8449).

   - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.

   - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.

   - Let users configure the number of write buffers in SMC.

   - Add new struct sockaddr_unsized for sockaddr of unknown length,
     from Kees.

   - Some conversions away from the crypto_ahash API, from Eric Biggers.

   - Some preparations for slimming down struct page.

   - YAML Netlink protocol spec for WireGuard.

   - Add a tool on top of YAML Netlink specs/lib for reporting commonly
     computed derived statistics and summarized system state.

  Driver API:

   - Add CAN XL support to the CAN Netlink interface.

   - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
     defined by the OPEN Alliance's "Advanced diagnostic features for
     100BASE-T1 automotive Ethernet PHYs" specification.

   - Add DPLL phase-adjust-gran pin attribute (and implement it in
     zl3073x).

   - Refactor xfrm_input lock to reduce contention when NIC offloads
     IPsec and performs RSS.

   - Add info to devlink params whether the current setting is the
     default or a user override. Allow resetting back to default.

   - Add standard device stats for PSP crypto offload.

   - Leverage DSA frame broadcast to implement simple HSR frame
     duplication for a lot of switches without dedicated HSR offload.

   - Add uAPI defines for 1.6Tbps link modes.

  Device drivers:

   - Add Motorcomm YT921x gigabit Ethernet switch support.

   - Add MUCSE driver for N500/N210 1GbE NIC series.

   - Convert drivers to support dedicated ops for timestamping control,
     and away from the direct IOCTL handling. While at it support GET
     operations for PHY timestamping.

   - Add (and convert most drivers to) a dedicated ethtool callback for
     reading the Rx ring count.

   - Significant refactoring efforts in the STMMAC driver, which
     supports Synopsys turn-key MAC IP integrated into a ton of SoCs.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support PPS in/out on all pins
      - Intel (100G, ice, idpf):
         - ice: implement standard ethtool and timestamping stats
         - i40e: support setting the max number of MAC addresses per VF
         - iavf: support RSS of GTP tunnels for 5G and LTE deployments
      - nVidia/Mellanox (mlx5):
         - reduce downtime on interface reconfiguration
         - disable being an XDP redirect target by default (same as
           other drivers) to avoid wasting resources if feature is
           unused
      - Meta (fbnic):
         - add support for Linux-managed PCS on 25G, 50G, and 100G links
      - Wangxun:
         - support Rx descriptor merge, and Tx head writeback
         - support Rx coalescing offload
         - support 25G SPF and 40G QSFP modules

   - Ethernet virtual:
      - Google (gve):
         - allow ethtool to configure rx_buf_len
         - implement XDP HW RX Timestamping support for DQ descriptor
           format
      - Microsoft vNIC (mana):
         - support HW link state events
         - handle hardware recovery events when probing the device

   - Ethernet NICs consumer, and embedded:
      - usbnet: add support for Byte Queue Limits (BQL)
      - AMD (amd-xgbe):
         - add device selftests
      - NXP (enetc):
         - add i.MX94 support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - bcmasp: add support for PHY-based Wake-on-LAN
      - Broadcom switches (b53):
         - support port isolation
         - support BCM5389/97/98 and BCM63XX ARL formats
      - Lantiq/MaxLinear switches:
         - support bridge FDB entries on the CPU port
         - use regmap for register access
         - allow user to enable/disable learning
         - support Energy Efficient Ethernet
         - support configuring RMII clock delays
         - add tagging driver for MaxLinear GSW1xx switches
      - Synopsys (stmmac):
         - support using the HW clock in free running mode
         - add Eswin EIC7700 support
         - add Rockchip RK3506 support
         - add Altera Agilex5 support
      - Cadence (macb):
         - cleanup and consolidate descriptor and DMA address handling
         - add EyeQ5 support
      - TI:
         - icssg-prueth: support AF_XDP
      - Airoha access points:
         - add missing Ethernet stats and link state callback
         - add AN7583 support
         - support out-of-order Tx completion processing
      - Power over Ethernet:
         - pd692x0: preserve PSE configuration across reboots
         - add support for TPS23881B devices

   - Ethernet PHYs:
      - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
      - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
      - micrel:
         - support for non PTP SKUs of lan8814
         - enable in-band auto-negotiation on lan8814
      - realtek:
         - cable testing support on RTL8224
         - interrupt support on RTL8221B
      - motorcomm: support for PHY LEDs on YT853
      - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
      - mscc: support for PHY LED control

   - CAN drivers:
      - m_can: add support for optional reset and system wake up
      - remove can_change_mtu() obsoleted by core handling
      - mcp251xfd: support GPIO controller functionality

   - Bluetooth:
      - add initial support for PASTa

   - WiFi:
      - split ieee80211.h file, it's way too big
      - improvements in VHT radiotap reporting, S1G, Channel Switch
        Announcement handling, rate tracking in mesh networks
      - improve multi-radio monitor mode support, and add a cfg80211
        debugfs interface for it
      - HT action frame handling on 6 GHz
      - initial chanctx work towards NAN
      - MU-MIMO sniffer improvements

   - WiFi drivers:
      - RealTek (rtw89):
         - support USB devices RTL8852AU and RTL8852CU
         - initial work for RTL8922DE
         - improved injection support
      - Intel:
         - iwlwifi: new sniffer API support
      - MediaTek (mt76):
         - WED support for >32-bit DMA
         - airoha NPU support
         - regdomain improvements
         - continued WiFi7/MLO work
      - Qualcomm/Atheros:
         - ath10k: factory test support
         - ath11k: TX power insertion support
         - ath12k: BSS color change support
         - ath12k: statistics improvements
      - brcmfmac: Acer A1 840 tablet quirk
      - rtl8xxxu: 40 MHz connection fixes/support"

* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
  net: page_pool: sanitise allocation order
  net: page pool: xa init with destroy on pp init
  net/mlx5e: Support XDP target xmit with dummy program
  net/mlx5e: Update XDP features in switch channels
  selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
  net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
  wireguard: netlink: generate netlink code
  wireguard: uapi: generate header with ynl-gen
  wireguard: uapi: move flag enums
  wireguard: uapi: move enum wg_cmd
  wireguard: netlink: add YNL specification
  selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
  selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
  selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
  selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
  selftests: drv-net: introduce Iperf3Runner for measurement use cases
  selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
  net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
  Documentation: net: dsa: mention simple HSR offload helpers
  Documentation: net: dsa: mention availability of RedBox
  ...
2025-12-03 17:24:33 -08:00
Linus Torvalds
015e7b0b0e bpf-next-6.19
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmktzC4ACgkQ6rmadz2v
 bTpA1w/+PZ45N3y6O+NQVIpBlpnHG7DEMK7Lw19On0xVLwH+XPHz6J5PEfzjyJR1
 SCbsV30qkJ1YCtgRHHf+ZCuWPWm58hY8dXYwSDyjNavdQyVGOdf17aBu9pvH45NW
 K20OhwQHpCHWIfDlijjPkDdiHnYf5S7Xy6ctt/3ztF0pMDHIaghGxJymG4wULcDT
 iLKnT37kwO8b2ihmw/HbcZPQYMWfHRye7X009K+wCv0dnhJ6q/Ny1m+Pg4kF92e6
 ON/RY26ep2dq7LpaNWa1rI1yOgFlI7uUlVojqrAuAb+xrg+64wUDBxeijvE37EN1
 s/+PuEKAR6xwz1dbY2cWAI0D633saz24UdV6kCBW9HrjHKVRQ7ZSsBF9ENkS4DTK
 nowx4wOe1ZHc/6YgTktZp9LEn/0YrmQtFxjqEAJiYUgD18FrBrSjmhHpBiL+HghP
 sTqy41qDQGoKtg3bRu42Co9wmNeeLsnxT8NQExCmTYQ4ufpdA/VMQux9cBVX3GBq
 EchJb465+AcvvCJUiKbnHLxDsHCQz1YYytz3RqyFLgGDFZnHOE0FjwPJmM8I5kkK
 gvDB3ZYdO3Halm8BZfZZBnKv5uK7myuAWwqRLgMRanuZcRgmIV1oUP5EP88HdH75
 fB20vZSVcfzB17SLyhiM20ivEWodJa9VCLEw9WDOmDoml+33Pks=
 =kaCJ
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
   test_progs runner (Alexis Lothoré)

 - Convert selftests/bpf/test_xsk to test_progs runner (Bastien
   Curutchet)

 - Replace bpf memory allocator with kmalloc_nolock() in
   bpf_local_storage (Amery Hung), and in bpf streams and range tree
   (Puranjay Mohan)

 - Introduce support for indirect jumps in BPF verifier and x86 JIT
   (Anton Protopopov) and arm64 JIT (Puranjay Mohan)

 - Remove runqslower bpf tool (Hoyeon Lee)

 - Fix corner cases in the verifier to close several syzbot reports
   (Eduard Zingerman, KaFai Wan)

 - Several improvements in deadlock detection in rqspinlock (Kumar
   Kartikeya Dwivedi)

 - Implement "jmp" mode for BPF trampoline and corresponding
   DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
   from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)

 - Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
   Chaignon)

 - Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
   Borkmann)

 - Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)

 - Optimize bpf_map_update_elem() for map-in-map types (Ritesh
   Oedayrajsingh Varma)

 - Introduce overwrite mode for BPF ring buffer (Xu Kuohai)

* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
  bpf: optimize bpf_map_update_elem() for map-in-map types
  bpf: make kprobe_multi_link_prog_run always_inline
  selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
  selftests/bpf: remove test_tc_edt.sh
  selftests/bpf: integrate test_tc_edt into test_progs
  selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
  selftests/bpf: Add success stats to rqspinlock stress test
  rqspinlock: Precede non-head waiter queueing with AA check
  rqspinlock: Disable spinning for trylock fallback
  rqspinlock: Use trylock fallback when per-CPU rqnode is busy
  rqspinlock: Perform AA checks immediately
  rqspinlock: Enclose lock/unlock within lock entry acquisitions
  bpf: Remove runqslower tool
  selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
  bpf: Disable file_alloc_security hook
  bpf: check for insn arrays in check_ptr_alignment
  bpf: force BPF_F_RDONLY_PROG on insn array creation
  bpf: Fix exclusive map memory leak
  selftests/bpf: Make CS length configurable for rqspinlock stress test
  selftests/bpf: Add lock wait time stats to rqspinlock stress test
  ...
2025-12-03 16:54:54 -08:00
Steven Rostedt
d3042cbe84 ktest.pl: Fix uninitialized var in config-bisect.pl
The error path of copying the old config used the wrong variable in the
error message:

 $ mkdir /tmp/build
 $ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad
 $ chmod 0 /tmp/build
 $ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad good
 cp /tmp/build//.config config-good.tmp ... [0 seconds] FAILED!
 Use of uninitialized value $config in concatenation (.) or string at ./tools/testing/ktest/config-bisect.pl line 744.
 failed to copy  to config-good.tmp

When it should have shown:

 failed to copy /tmp/build//.config to config-good.tmp

Cc: stable@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Fixes: 0f0db06599 ("ktest: Add standalone config-bisect.pl program")
Link: https://patch.msgid.link/20251203180924.6862bd26@gandalf.local.home
Reported-by: "John W. Krahn" <jwkrahn@shaw.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2025-12-03 18:25:18 -05:00
Linus Torvalds
2488655b2f linux_kselftest-next-6.19-rc1
Adds basic test for trace_marker_raw file to tracing selftest.
 Fixes invalid array access in printf dma_map_benchmark selftest.
 Adds tprobe enable/disable testcase to tracing selftest.
 Updates fprobe selftest for ftrace based fprobe.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkvXJ0ACgkQCwJExA0N
 QxxLWg/+PjZJry0x3jzyQrYNzxewcjrtH8GwGXgBXJxBGwXZYPv1AubnC3o+Ru6Q
 AES5s5M3Fm+oMwPc/2VhVbSWLBX4HQthsiShTCZQf7eC0uuxkE/ZCQ4qqs1usT4G
 l6z+AecIJcflEjX/hwb9ALoKqniwuodrhIl8oNeFgQgB3VQ55Zt3LylxBvJAKACM
 cqJbDMxSPOqVcRF0xm+A/WFDU5GVmt79+XhxLLg6sZmLCmmjGjE/rtDxgLPZYdE4
 NGPghjYlDiAqLJZNUVbxpSKlurq1o36zOKowZO+rlhqOilxaRYoNvrll13u8zIjb
 Y1AkZeIU9ITzbdXfjWWgMoYOcrevfRygUGZ1R5BEwoGQ9jD0qMCDkaT5cpeBcZ53
 1oYcmyP65e2GWDSywVSKuetqtTCXT5xnGpo9TQmm9lNTasB7LwkEXbP9/4FH7BCl
 f52T14cclZHND3XbDBdaR1ZALj2Qb5CFfuWAG2pZCNdt/jrxyEqU2RdLG95S1yDz
 8diKhtflxOq627V5VtOj9VgS3AGj9pkVSSmWiyS0KjdV/v7vZWUbYhFPrJntoTiY
 GLAwkdGiIS6edDqQ7lqgpr8/NLxtsZwzYYIwxul+DJybAyJSt7Q3YFe9vSdDPHlf
 rgj1VeW+byotE3A+92nL17Yy0MhjUVS/M31EyzFupChwxScDhhE=
 =NOnZ
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:

 - Add basic test for trace_marker_raw file to tracing selftest

 - Fix invalid array access in printf dma_map_benchmark selftest

 - Add tprobe enable/disable testcase to tracing selftest

 - Update fprobe selftest for ftrace based fprobe

* tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: tracing: Update fprobe selftest for ftrace based fprobe
  selftests: tracing: Add tprobe enable/disable testcase
  selftests/run_kselftest.sh: exit with error if tests fail
  selftests/dma: fix invalid array access in printf
  selftests/tracing: Add basic test for trace_marker_raw file
2025-12-03 15:08:18 -08:00
Linus Torvalds
51ab33fc0a livepatching changes for 6.19
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmkumVobFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTypHMQALImoj39UZiLLVMn2d4P
 5droOQw+qMCfsVVY5m9Rqw3vrCsSds+7eC5Qa2GaXR480l4ZeMGyoYCr+2gDWcM3
 vwTWuj8UuUOxQlMXubIFyL9r/SUtYSiqfAPK5tGtzHuNNUEHWGed9r5RrOGKZOnj
 SDe6ZL5G2Kekuua1mtSZhw8qpW/nK2IuFTRPxN6O243dUGtX0wkzWr1tVRpDDzoT
 UNWZ2xoE1lzb/EAviyuSgSIc5YsT50bM+fMzOUHsY/E9u8fI42N1z5B+P3ZlTvGx
 Z5nJ79ZMtbF3/4o6OaUqvR9Y4zZ73TtkMoV4sjrjYg28q+n4/tqVlCsjy0Hys/dP
 MlaKbLSI54+RldTWx02Yu0jcn2Artua3N0bURyEenqkhDLW9RXVfx/h29QQYPpE9
 7FSlpUAQnFzFbGUGJcB02oClkxUSUcOuzVqC0n30wfdo+bVzDn19rr+Vi2bHe8TD
 MsASva96n2FhRVPkm8eBxnxpaYQj9Q6G1/7B1F30Et7PLzwD9qYUvXPXceyPATnd
 r3hdUiKgkPvUaDhApJRHwv/ZGnNDKujGBDc/YgBQrl3cPfG0ShGzJzWz892JjX0+
 NB+O+eabHVs6tFRFNvxS2fu/ROZxrGfBzZUKoaZjZh3JEgOhlLKVlL7aRqtctFvP
 Dz0QYiOqlkjho7FykCIEKXxQ
 =6o9X
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching updates from Petr Mladek:

 - Support both paths where tracefs is typically mounted in selftests

 - Make old_sympos 0 and 1 equal. They both are valid when there is only
   one symbol with the given name.

* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests: livepatch: use canonical ftrace path
  livepatch: Match old_sympos 0 and 1 in klp_find_func()
2025-12-03 13:46:48 -08:00
Linus Torvalds
02baaa67d9 sched_ext: Changes for v6.19
- Improve recovery from misbehaving BPF schedulers. When a scheduler puts many
   tasks with varying affinity restrictions on a shared DSQ, CPUs scanning
   through tasks they cannot run can overwhelm the system, causing lockups.
   Bypass mode now uses per-CPU DSQs with a load balancer to avoid this, and
   hooks into the hardlockup detector to attempt recovery. Add scx_cpu0 example
   scheduler to demonstrate this scenario.
 
 - Add lockless peek operation for DSQs to reduce lock contention for schedulers
   that need to query queue state during load balancing.
 
 - Allow scx_bpf_reenqueue_local() to be called from anywhere in preparation for
   deprecating cpu_acquire/release() callbacks in favor of generic BPF hooks.
 
 - Prepare for hierarchical scheduler support: add scx_bpf_task_set_slice() and
   scx_bpf_task_set_dsq_vtime() kfuncs, make scx_bpf_dsq_insert*() return bool,
   and wrap kfunc args in structs for future aux__prog parameter.
 
 - Implement cgroup_set_idle() callback to notify BPF schedulers when a cgroup's
   idle state changes.
 
 - Fix migration tasks being incorrectly downgraded from stop_sched_class to
   rt_sched_class across sched_ext enable/disable. Applied late as the fix is
   low risk and the bug subtle but needs stable backporting.
 
 - Various fixes and cleanups including cgroup exit ordering, SCX_KICK_WAIT
   reliability, and backward compatibility improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS4h1A4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGe/MAP9EZ0pLiTpmMtt6mI/11Fmi+aWfL84j1zt13cz9
 W4vb4gEA9eVEH6n9xyC4nhcOk9AQwSDuCWMOzLsnhW8TbEHVTww=
 =8W/B
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Improve recovery from misbehaving BPF schedulers.

   When a scheduler puts many tasks with varying affinity restrictions
   on a shared DSQ, CPUs scanning through tasks they cannot run can
   overwhelm the system, causing lockups.

   Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
   and hooks into the hardlockup detector to attempt recovery.

   Add scx_cpu0 example scheduler to demonstrate this scenario.

 - Add lockless peek operation for DSQs to reduce lock contention for
   schedulers that need to query queue state during load balancing.

 - Allow scx_bpf_reenqueue_local() to be called from anywhere in
   preparation for deprecating cpu_acquire/release() callbacks in favor
   of generic BPF hooks.

 - Prepare for hierarchical scheduler support: add
   scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
   make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
   structs for future aux__prog parameter.

 - Implement cgroup_set_idle() callback to notify BPF schedulers when a
   cgroup's idle state changes.

 - Fix migration tasks being incorrectly downgraded from
   stop_sched_class to rt_sched_class across sched_ext enable/disable.
   Applied late as the fix is low risk and the bug subtle but needs
   stable backporting.

 - Various fixes and cleanups including cgroup exit ordering,
   SCX_KICK_WAIT reliability, and backward compatibility improvements.

* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
  sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
  sched_ext: tools: Removing duplicate targets during non-cross compilation
  sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
  sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
  sched_ext: Update comments replacing breather with aborting mechanism
  sched_ext: Implement load balancer for bypass mode
  sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
  sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
  sched_ext: Add scx_cpu0 example scheduler
  sched_ext: Hook up hardlockup detector
  sched_ext: Make handle_lockup() propagate scx_verror() result
  sched_ext: Refactor lockup handlers into handle_lockup()
  sched_ext: Make scx_exit() and scx_vexit() return bool
  sched_ext: Exit dispatch and move operations immediately when aborting
  sched_ext: Simplify breather mechanism with scx_aborting flag
  sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
  sched_ext: Refactor do_enqueue_task() local and global DSQ paths
  sched_ext: Use shorter slice in bypass mode
  sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
  sched_ext: Minor cleanups to scx_task_iter
  ...
2025-12-03 13:25:39 -08:00
Linus Torvalds
8449d3252c cgroup: Changes for v6.19
- Defer task cgroup unlink until after the dying task's final context switch
   so that controllers see the cgroup properly populated until the task is
   truly gone.
 
 - cpuset cleanups and simplifications. Enforce that domain isolated CPUs
   stay in root or isolated partitions and fail if isolated+nohz_full would
   leave no housekeeping CPU. Fix sched/deadline root domain handling during
   CPU hot-unplug and race for tasks in attaching cpusets.
 
 - Misc fixes including memory reclaim protection documentation and selftest
   KTAP conformance.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS3pEQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGYbrAP9H0kVyWH5tK9VhjSZyqidic8NuvtmNOyhIRrg0
 8S8K0wD/YG9xlh2JUyRmS4B23ggc59+9y5xM2/sctrho51Pvsgg=
 =0MB+
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Defer task cgroup unlink until after the dying task's final context
   switch so that controllers see the cgroup properly populated until
   the task is truly gone

 - cpuset cleanups and simplifications.

   Enforce that domain isolated CPUs stay in root or isolated partitions
   and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
   sched/deadline root domain handling during CPU hot-unplug and race
   for tasks in attaching cpusets

 - Misc fixes including memory reclaim protection documentation and
   selftest KTAP conformance

* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: Treat cpusets in attaching as populated
  sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
  cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
  docs: cgroup: No special handling of unpopulated memcgs
  docs: cgroup: Note about sibling relative reclaim protection
  docs: cgroup: Explain reclaim protection target
  selftests/cgroup: conform test to KTAP format output
  cpuset: remove need_rebuild_sched_domains
  cpuset: remove global remote_children list
  cpuset: simplify node setting on error
  cgroup: include missing header for struct irq_work
  cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
  cgroup/cpuset: Globally track isolated_cpus update
  cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
  cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
  cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  cgroup: Defer task cgroup unlink until after the task is done switching out
  cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
  cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
  ...
2025-12-03 13:04:07 -08:00
Maurice Hieronymus
cffc934c0d selftests: tpm2: Fix ill defined assertions
Remove parentheses around assert statements in Python. With parentheses,
assert always evaluates to True, making the checks ineffective.

Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-12-03 22:55:27 +02:00
Linus Torvalds
98e7dcbb82 RCU pull request for v6.19
SRCU
 ----
 
 _ Properly handle SRCU readers within IRQ disabled sections in tiny SRCU
 
 - Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
 
     - Introduce API to expedite a grace period and test it through
       rcutorture.
 
     - Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
       Both are still targeted toward faster readers (without full
       barriers on LOCK and UNLOCK) at the expense of heavier write side
       (using full RCU grace period ordering instead of simply full
       ordering) as compared to "traditional" non-fast SRCU. But those
       srcu-fast flavours are going to be optimized in two different
       ways:
 
          - SRCU-fast will become the reimplementation basis for
            RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
            be NMI safe, SRCU-fast must be as well.
 
          - SRCU-fast-updown will be needed for uretprobes code in order
            to get rid of the read-side memory barriers while still
            allowing entering the reader at task level while exiting it
            in a timer handler. It is considered semaphore-like in that
            it can have different owners between LOCK and UNLOCK.
            However it is not NMI-safe.
 
       The actual optimizations are work in progress for the next cycle.
       Only the new interfaces are added for now, along with related
       torture and scalability test code.
 
 - Create/document/debug/torture new proper initializers for RCU fast:
   DEFINE_SRCU_FAST() and init_srcu_struct_fast()
 
   This allows for using right away the proper ordering on the write
   side (either full ordering or full RCU grace period ordering) without
   waiting for the read side to tell which to use. Also this optimizes
   the read side altogether with moving flavour debug checks under debug
   config and with removing a costly RmW operation on their first call.
 
 - Make some diagnostic functions tracing safe.
 
 REFSCALE
 -------
 
 Add performance testing for common context synchronizations
 (Preemption, IRQ, Softirq) and per-cpu increments. Those are
 relevant comparisons against SRCU-fast read side APIs, especially
 as they are planned to synchronize further tracing fast-path code.
 
 MISCELLANOUS
 ------
 
 - In order to prepare the layout for nohz_full work deferral to
   user exit, the context tracking state must shrink the counter
   of transitions to/from RCU not watching. The only possible hazard
   is to trigger wrap-around more easily, delaying a bit grace periods
   when that happens. This should be a rare event though. Yet add
   debugging and torture code to test that assumption.
 
 - Fix memory leak on locktorture module
 
 - Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
   On recent discussions, we also concluded that all those WRITE_ONCE()
   and READ_ONCE() on list APIs deserve appropriate comments. Something
   to be expected for the next cycle.
 
 - Provide a script to apply several configs to several commits with torture.
 
 - Allow torture to reuse a build directory in order to save needless
   rebuild time.
 
 - Various cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmksubgbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEIUkVEdQjox3MJ0P/Aq+AOPWuaE70B8SxVYY
 VfucjjzZwSyI30Vw4GE8skNVMOnyY2wPIrNRgmckghi9LM5luureMcks9esrdFfv
 csWj3AhnBu+/Lyd4EhP8AxMH2OPIX50FUrAQaY7LAcmc4nIng1MqRm35eT776XzR
 tN07+nsXjNumPcdN8JAv/VzInUf5DdMvhy+Qw0Krpvan/sUpzzpFpNUWdHsyZ2Re
 UCQr//AooQLfCIIIHUYlviThqigJETZlv0NKYrD0Zaxx5MCOwwWmz5otQzacSRk2
 Zgn6Bayp98s0972sOghiMRXEoSvqBSJe5eVTWcj1oNLruO3flpyREgE5Wuo8T8Bg
 BHvOiYaPNC5FW+btCWOQcKcSx1mxrgsX3/9OAth+kmtPqZdJSQEeY+zqWzdN7ZlV
 q5+oipsFONvWXL6+U6uZf44MG1pscrzn6QvXajjg98iaLZgY0+mkAhvtpjcfuTwZ
 2CXYQKacfwftpOXgM2QQGB4QE8CKXchlma381KbTHv0rX8zfK0BnWoe/oGPz22GL
 3kldmhfIC/87i8t9HHngkU6SZ+Mkvf9vC28eclnuXB1PKCRsBHePGvi3I671kbLj
 hycBwnwWriULEwKYTVQ8gmdBHSANB4IpmMYflADVqLuvp+7U6W5mNjWNTcXoSwFl
 NoR7AViEUNzOEoa8u6WMKoec
 =XnUt
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Frederic Weisbecker:
 "SRCU:

   - Properly handle SRCU readers within IRQ disabled sections in tiny
     SRCU

   - Preparation to reimplement RCU Tasks Trace on top of SRCU fast:

      - Introduce API to expedite a grace period and test it through
        rcutorture

      - Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.

        Both are still targeted toward faster readers (without full
        barriers on LOCK and UNLOCK) at the expense of heavier write
        side (using full RCU grace period ordering instead of simply
        full ordering) as compared to "traditional" non-fast SRCU. But
        those srcu-fast flavours are going to be optimized in two
        different ways:

          - SRCU-fast will become the reimplementation basis for
            RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
            be NMI safe, SRCU-fast must be as well.

          - SRCU-fast-updown will be needed for uretprobes code in order
            to get rid of the read-side memory barriers while still
            allowing entering the reader at task level while exiting it
            in a timer handler. It is considered semaphore-like in that
            it can have different owners between LOCK and UNLOCK.
            However it is not NMI-safe.

        The actual optimizations are work in progress for the next
        cycle. Only the new interfaces are added for now, along with
        related torture and scalability test code.

   - Create/document/debug/torture new proper initializers for RCU fast:
     DEFINE_SRCU_FAST() and init_srcu_struct_fast()

     This allows for using right away the proper ordering on the write
     side (either full ordering or full RCU grace period ordering)
     without waiting for the read side to tell which to use.

     This also optimizes the read side altogether with moving flavour
     debug checks under debug config and with removing a costly RmW
     operation on their first call.

   - Make some diagnostic functions tracing safe

  Refscale:

   - Add performance testing for common context synchronizations
     (Preemption, IRQ, Softirq) and per-cpu increments. Those are
     relevant comparisons against SRCU-fast read side APIs, especially
     as they are planned to synchronize further tracing fast-path code

  Miscellanous:

   - In order to prepare the layout for nohz_full work deferral to user
     exit, the context tracking state must shrink the counter of
     transitions to/from RCU not watching. The only possible hazard is
     to trigger wrap-around more easily, delaying a bit grace periods
     when that happens. This should be a rare event though. Yet add
     debugging and torture code to test that assumption

   - Fix memory leak on locktorture module

   - Annotate accesses in rculist_nulls.h to prevent from KCSAN
     warnings. On recent discussions, we also concluded that all those
     WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
     comments. Something to be expected for the next cycle

   - Provide a script to apply several configs to several commits with
     torture

   - Allow torture to reuse a build directory in order to save needless
     rebuild time

   - Various cleanups"

* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
  refscale: Add SRCU-fast-updown readers
  refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
  rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
  srcu: Create an SRCU-fast-updown API
  refscale: Do not disable interrupts for tests involving local_bh_enable()
  refscale: Add non-atomic per-CPU increment readers
  refscale: Add this_cpu_inc() readers
  refscale: Add preempt_disable() readers
  refscale: Add local_bh_disable() readers
  refscale: Add local_irq_disable() and local_irq_save() readers
  torture: Permit negative kvm.sh --kconfig numberic arguments
  srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
  rcu: Mark diagnostic functions as notrace
  rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
  rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
  rcutorture: Permit kvm-again.sh to re-use the build directory
  torture: Add kvm-series.sh to test commit/scenario combination
  rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
  locktorture: Fix memory leak in param_set_cpumask()
  doc: Update for SRCU-fast definitions and initialization
  ...
2025-12-03 12:18:07 -08:00
Linus Torvalds
f2310b6271 nolibc changes for v6.19
Highlights:
 
 * Preparations to the use of nolibc in UML.
   * Cleanup of sparse warnings.
   * Library mode without _start().
   * More consistency when disabling errno.
 * Unconditional installation of all architecture support files.
 * Always 64-bit wide ino_t and off_t.
 * Various cleanups and bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCaSwY2wAKCRDB+h1jyw5b
 OPpKAP9wvshTAWselixtw/xR8BXBIxHUh9y/NeT827Ut4IvpUgD/UH3duISRfB++
 31nIRPI5BSZpC4CgaN9QKChcmxR8rw4=
 =CbTX
 -----END PGP SIGNATURE-----

Merge tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc

Pull nolibc updates from Thomas Weißschuh:

 - Preparations to the use of nolibc in UML:
     - Cleanup of sparse warnings
     - Library mode without _start()
     - More consistency when disabling errno

 - Unconditional installation of all architecture support files

 - Always 64-bit wide ino_t and off_t

 - Various cleanups and bug fixes

* tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
  selftests/nolibc: error out on linker warnings
  selftests/nolibc: use lld to link loongarch binaries
  tools/nolibc: remove more __nolibc_enosys() fallbacks
  tools/nolibc: remove now superfluous overflow check in llseek
  tools/nolibc: use 64-bit off_t
  tools/nolibc: prefer the llseek syscall
  tools/nolibc: handle 64-bit off_t for llseek
  tools/nolibc: use 64-bit ino_t
  tools/nolibc: avoid using plain integer as NULL pointer
  tools/nolibc: add support for fchdir()
  tools/nolibc: clean up outdated comments in generic arch.h
  tools/nolibc: make the "headers" target install all supported archs
  tools/nolibc: add the more portable inttypes.h
  tools/nolibc: provide the portable sys/select.h
  tools/nolibc: add missing memchr() to string.h
  tools/nolibc: fix misleading help message regarding installation path
  tools/nolibc: add uio.h with readv and writev
  tools/nolibc: add option to disable runtime
  tools/nolibc: use __fallthrough__ rather than fallthrough
  tools/nolibc: implement %m if errno is not defined
  ...
2025-12-03 09:23:25 -08:00
Linus Torvalds
44fc84337b arm64 updates for 6.19:
Core features:
 
  - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
    driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
 
 Perf and PMU:
 
  - Avoid cycle counter on multi-threaded CPUs
 
  - Extend CSPMU device probing and add additional filtering support for
    NVIDIA implementations
 
  - Add support for the PMUs on the NoC S3 interconnect
 
  - Add additional compatible strings for new Cortex and C1 CPUs
 
  - Add support for data source filtering to the SPE driver
 
  - Add support for i.MX8QM and "DB" PMU in the imx PMU driver
 
 Memory managemennt:
 
  - Avoid broadcast TLBI if page reused in write fault
 
  - Elide TLB invalidation if the old PTE was not valid
 
  - Drop redundant cpu_set_*_tcr_t0sz() macros
 
  - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
 
  - Propagate return value from __change_memory_common()
 
 ACPI and EFI:
 
  - Call EFI runtime services without disabling preemption
 
  - Remove unused ACPI function
 
 Miscellaneous:
 
  - ptrace support to disable streaming on SME-only systems
 
  - Improve sysreg generation to include a 'Prefix' descriptor
 
  - Replace __ASSEMBLY__ with __ASSEMBLER__
 
  - Align register dumps in the kselftest zt-test
 
  - Remove some no longer used macros/functions
 
  - Various spelling corrections
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmkvMjkACgkQa9axLQDI
 XvGaGg//dtT/ZAqrWa6Yniv1LOlh837C07YdxAYTTuJ+I87DnrxIqjwbW+ye+bF+
 61RTkioeCUm3PH+ncO9gPVNi4ASZ1db3/Rc8Fb6rr1TYOI1sMIeBsbbVdRJgsbX6
 zu9197jOBHscTAeDceB6jZBDyW8iSLINPZ7LN6lGxXsZM/Vn5zfE0heKEEio6Fsx
 +AzO2vos0XcwBR9vFGXtiCDx57T+/cXUtrWfA0Cjz4nvHSgD8+ghS+Jwv+kHMt1L
 zrarqbeQfj+Iixm9PVHiazv+8THo9QdNl1yGLxDmJ4LEVPewjW5jBs8+5e8e3/Gj
 p5JEvmSyWvKTTbFoM5vhxC72A7yuT1QwAk2iCyFIxMbQ25PndHboKVp/569DzOkT
 +6CjI88sVSP6D7bVlN6pFlzc/Fa07YagnDMnMCSfk4LBjUfE3jYb+usaFydyv/rl
 jwZbJrnSF/H+uQlyoJFgOEXSoQdDsll3dv6yEsUCwbd8RqXbAe3svbguOUHSdvIj
 sCViezGZQ7Rkn6D21AfF9j6e7ceaSDaf5DWMxPI3dAxFKG8TJbCBsToR59NnoSj+
 bNEozbZ1mCxmwH8i43wZ6P0RkClvJnoXcvRA+TJj02fSZACO39d3XDNswfXWL41r
 KiWGUJZyn2lPKtiAWVX6pSBtDJ+5rFhuoFgADLX6trkxDe9/EMQ=
 =4Sb6
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "These are the arm64 updates for 6.19.

  The biggest part is the Arm MPAM driver under drivers/resctrl/.
  There's a patch touching mm/ to handle spurious faults for huge pmd
  (similar to the pte version). The corresponding arm64 part allows us
  to avoid the TLB maintenance if a (huge) page is reused after a write
  fault. There's EFI refactoring to allow runtime services with
  preemption enabled and the rest is the usual perf/PMU updates and
  several cleanups/typos.

  Summary:

  Core features:

   - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
     driver under drivers/resctrl/ which makes use of the fs/rectrl/ API

  Perf and PMU:

   - Avoid cycle counter on multi-threaded CPUs

   - Extend CSPMU device probing and add additional filtering support
     for NVIDIA implementations

   - Add support for the PMUs on the NoC S3 interconnect

   - Add additional compatible strings for new Cortex and C1 CPUs

   - Add support for data source filtering to the SPE driver

   - Add support for i.MX8QM and "DB" PMU in the imx PMU driver

  Memory managemennt:

   - Avoid broadcast TLBI if page reused in write fault

   - Elide TLB invalidation if the old PTE was not valid

   - Drop redundant cpu_set_*_tcr_t0sz() macros

   - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()

   - Propagate return value from __change_memory_common()

  ACPI and EFI:

   - Call EFI runtime services without disabling preemption

   - Remove unused ACPI function

  Miscellaneous:

   - ptrace support to disable streaming on SME-only systems

   - Improve sysreg generation to include a 'Prefix' descriptor

   - Replace __ASSEMBLY__ with __ASSEMBLER__

   - Align register dumps in the kselftest zt-test

   - Remove some no longer used macros/functions

   - Various spelling corrections"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
  arm64/pageattr: Propagate return value from __change_memory_common
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  Documentation/arm64: Fix the typo of register names
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  ...
2025-12-02 17:03:55 -08:00
Linus Torvalds
2547f79b0b s390 updates for 6.19 merge window
- Provide a new interface for dynamic configuration and deconfiguration of
   hotplug memory, allowing with and without memmap_on_memory support. This
   makes the way memory hotplug is handled on s390 much more similar to
   other architectures
 
 - Remove compat support. There shouldn't be any compat user space around
   anymore, therefore get rid of a lot of code which also doesn't need to be
   tested anymore
 
 - Add stackprotector support. GCC 16 will get new compiler options, which
   allow to generate code required for kernel stackprotector support
 
 - Merge pai_crypto and pai_ext PMU drivers into a new driver. This removes
   a lot of duplicated code. The new driver is also extendable and allows
   to support new PMUs
 
 - Add driver override support for AP queues
 
 - Rework and extend zcrypt and AP trace events to allow for tracing of
   crypto requests
 
 - Support block sizes larger than 65535 bytes for CCW tape devices
 
 - Since the rework of the virtual kernel address space the module area and
   the kernel image are within the same 4GB area. This eliminates the need
   of weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
 
 - Various other small improvements and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmktZioACgkQIg7DeRsp
 bsK4Rw//VzkvHyzOtGKZ8Hb4S+Sh/PFlaZQXNhj+Xt5gWoOhP1uPmmhBe6LxjYaB
 J9Ns3hpONQ1dTHV7VVkds8FvM/SBcGe8m5RpefmChC/bjm5UEOV/MppKtA0aLnEH
 hJmdubIrrRAXKggxlHEfRLzBsFvV/rJ9Xf16FhRxGDc4pgmgkI1NPQ41/dyCHklQ
 dB3YrFVPIETywVYYVB/G3h11JgF5Z6CKtjYCdSx72Fkbj65+6JPfcPgLKMpcJuPd
 UxUXtCo1FCXlP70jsz8JQI8cdieG0KDQTtnZP4P/pqjQ3wirOqvMewNa9t9xmQ2e
 p6Rc1Vx5DESkq9bRWtQEaprTVVzK7DDLH3RuZwB+uLrcLGD8JvVS6/m9n9CgzBMT
 BnJXG2sLZH+gdQy+DSD/fVDD7OvIk8TGrH+OFwVIKhrT/J3B2E7ZSYyZZCNIS7VG
 yiuypoDGYg3ZpYjH9+qOXWB3nc0vQWrlFzb1bsQu1omJGmunLv4jtTjAKGN82C33
 auBsIYAlQW20X7DV0vZa59PwqwtBqtdQQcTidwtSztzKogRXAdK8KKHtN60JM4S2
 7sWFOFCQaTChAeDNw6MF5EtULb551nwH2RtJ9x3CrJj+OGK6clbQNcxIA7Oy0veR
 Sl9v1lMfeKOgDrPdDy3ArQBJ8WLlF9qX9wLKbiaNyIKmkz2ymkg=
 =CNrb
 -----END PGP SIGNATURE-----

Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Provide a new interface for dynamic configuration and deconfiguration
   of hotplug memory, allowing with and without memmap_on_memory
   support. This makes the way memory hotplug is handled on s390 much
   more similar to other architectures

 - Remove compat support. There shouldn't be any compat user space
   around anymore, therefore get rid of a lot of code which also doesn't
   need to be tested anymore

 - Add stackprotector support. GCC 16 will get new compiler options,
   which allow to generate code required for kernel stackprotector
   support

 - Merge pai_crypto and pai_ext PMU drivers into a new driver. This
   removes a lot of duplicated code. The new driver is also extendable
   and allows to support new PMUs

 - Add driver override support for AP queues

 - Rework and extend zcrypt and AP trace events to allow for tracing of
   crypto requests

 - Support block sizes larger than 65535 bytes for CCW tape devices

 - Since the rework of the virtual kernel address space the module area
   and the kernel image are within the same 4GB area. This eliminates
   the need of weak per cpu variables. Get rid of
   ARCH_MODULE_NEEDS_WEAK_PER_CPU

 - Various other small improvements and fixes

* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
  watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
  s390/entry: Use lay instead of aghik
  s390/vdso: Get rid of -m64 flag handling
  s390/vdso: Rename vdso64 to vdso
  s390: Rename head64.S to head.S
  s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
  s390: Add stackprotector support
  s390/modules: Simplify module_finalize() slightly
  s390: Remove KMSG_COMPONENT macro
  s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
  s390/ap: Restrict driver_override versus apmask and aqmask use
  s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
  s390/ap: Support driver_override for AP queue devices
  s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
  s390/debug: Update description of resize operation
  s390/syscalls: Switch to generic system call table generation
  s390/syscalls: Remove system call table pointer from thread_struct
  s390/uapi: Remove 31 bit support from uapi header files
  s390: Remove compat support
  tools: Remove s390 compat support
  ...
2025-12-02 16:37:00 -08:00
Jakub Kicinski
4de4454299 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes in preparation for the net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-02 15:37:53 -08:00
Linus Torvalds
d61f1cc5db * Enable Linear Address Space Separation (LASS)
* Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmkuIXAACgkQaDWVMHDJ
 krAwoRAAqqavNrthj26XJHjR3x7FVGu11/rvYXAd1U2moN/dhM2w82HMHNFvPuQY
 3iq9GDRQdc2rKL7LTkREvN4ZM/rFvkFLt6a5Yv0eCRK8KAiSJEw6Yzu/qgG7kF+0
 9clujDUskjjHU0zR5v+o1RxirrLVQ+R50sMVI5uoFx6+WJRiW1BvMG4Csw4BgbvA
 AqgrZpyq1dQ/GQOW4f0yxBPH0z84wgUbdllYzQzE0GeUlGWQSI4lqa8GFMOmE/Gr
 7gBcKmyE0M/BycwTZW7tiMnjWgNL+Y5/RroQJ7hh6R+f5WOd+SpGvlyOihbF7GER
 L3yZfeQ+EWz1aY1QMWwOSvSawIPJo8EkSn3d9/JFq5Vl9zsFh+ZoPZfZ8bEi36U0
 inO93swDcyMkkfOTh4sIgxedLgHja5GFNCGPs0yblvLulWbw7yYVzzEmEjXnclzS
 fmmifsJjGrUpegEnWdEjAQzXkWPd/hKiAvpzDE/3thBal5NkOzFrudITFvCVuk8w
 uS2MW0U8VCskNoON0jjwnvv84p0XdHJOsgPB9WnsuMMASKC1RqKAJWXh8AXvZA+I
 TfCNdSyHDTm+o1e+SMQZRbqoE/r7MmAxUQOkKnlvpJDCz58tsLzW64hRXTe7QpCt
 rry9/wODswu+oaHoDgfjAmzYde2RhCjwWLzGmqmapNIYfCCVhYs=
 =5bcW
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU feature updates from Dave Hansen:
 "The biggest thing of note here is Linear Address Space Separation
  (LASS). It represents the first time I can think of that the
  upper=>kernel/lower=>user address space convention is actually
  recognized by the hardware on x86. It ensures that userspace can not
  even get the hardware to _start_ page walks for the kernel address
  space. This, of course, is a really nice generic side channel defense.

  This is really only a down payment on LASS support. There are still
  some details to work out in its interaction with EFI calls and
  vsyscall emulation. For now, LASS is disabled if either of those
  features is compiled in (which is almost always the case).

  There's also one straggler commit in here which converts an
  under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
  so more feature can be packed in there.

  Summary:

   - Enable Linear Address Space Separation (LASS)

   - Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"

* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Enable LASS during CPU initialization
  selftests/x86: Update the negative vsyscall tests to expect a #GP
  x86/traps: Communicate a LASS violation in #GP message
  x86/kexec: Disable LASS during relocate kernel
  x86/alternatives: Disable LASS when patching kernel code
  x86/asm: Introduce inline memcpy and memset
  x86/cpu: Add an LASS dependency on SMAP
  x86/cpufeatures: Enumerate the LASS feature bits
  x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
2025-12-02 14:48:08 -08:00
Paolo Bonzini
e0c26d47de - SCA rework
- VIRT_XFER_TO_GUEST_WORK support
 - Operation exception forwarding support
 - Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmktiX8ACgkQ41TmuOI4
 ufhozBAAuyPxu1cZqfAiuEpftR0fUFZeyqRLHqfFPNQUGW/kPZRz2uNd38qulboV
 gmbu5jcwf8SdbF+p8f7RLvkEyTEnzuXELrfSwcwyv9IUiK++p9gRNkuppHbNnTI7
 yK21hJz+jZmRzUrSxnLylTC3++RZczhVeHqHzwosnHcNerK6FLcIjjsl7YinJToI
 T3jiTmprXl5NzFu7O5N/3J2KAIqNr+3DfnOf2lnLzHeupc52Z6TtvdizypAAV7Yk
 qWQ/81HI8GtIPFWss1kNwrJXQBjgBObz3XBOtq0bw1Ycs+BijsQh424vFoetV1/n
 bdmEh38lfY3sbbSE3RomnEATRdzremiYb63v5E4Bg7/bpLPhXw+jMF2Hp8jNqOiZ
 jI7KpGPOA4+C1EzS+Uge81fksW+ylNEYk/dZgGQgOFtF8Vf+Ana0NloDAqMHUeXq
 gVI2Sd9nMR80WslVzs5DMj/XK86J2TsFxtKYPa1cHV9PkHegO+eJm2nWCRHbfddz
 iEymokTm9xmfykjFfKDwZ4EcB5vdV7cuNE8aedsp9NXgICrgDbPn8ualG6aZUB0c
 ScvfRuoiZT7e4D8UZ79uCOCPQqwGCffOfIOee3ocf/95ZVY+9xv7FTTh200DjBU2
 Jv1NoTe9ZOO4+dYWRsht0fzC7zBVDO3CEb6OcNRB9wgNidDQaeM=
 =PtzZ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- SCA rework
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
2025-12-02 18:58:47 +01:00
Linus Torvalds
d42e504a55 Update to the time/timers core:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
     and a large number of CPUs compete to acquire jiffies_lock to do the
     update. Limit it to one CPU with a separate "uncontended" atomic
     variable.
 
   - A set of improvements for the timer migration mechanism:
 
     - Support imbalanced NUMA trees correctly
 
     - Support dynamic exclusion of CPUs from the migrator duty to allow the
       cpuset/isolation mechanism to exclude them from handling timers of
       remote idle CPUs.
 
    - The usual small updates, cleanups and enhancements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmks7doTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaxrD/40nxx+8cEXsVbVLIkP2PQbd2Y8+7sk
 YbNu/Cb7j7Bg7R8YIs4p5GHk+7Yt/hNsW77SmbAzRPUyYYG6L3bUYlBa3yQlvIuo
 xRPbzGA+RJies9skIGHbQ8z6ig1zUASRJPcBYiuaVIAuQhCfLNc4Nii9cEWtjZ24
 +5gfRwV+vy74ArWwRkwaGejDK1tav+gd62OkFQZC8WtjQ08ozGZ6VBJNg7nYq/gH
 FYO1rH2tQ/ZyjlO/x5NF8gFcjYD8iv5PDp8oH35MPx+XTdDccf0G3QB7ug0ffVdV
 b4gA6lZTAmpsu/NHb6ByN4i/kf3wf8la/i+EaAh/Ov7NW078gunvVKVA7jStcbBl
 ZgG5SRHiKRvQF/WXLGVQAnilRDZwRuS0nmJlqfExa44v23l5o3768RwdRYwQlv8g
 X5KSRl0jlVgVtZHgNBlZtgX9+rnQSr9sB5sVGBP2a6a1WhVXQV/2kp0wjdnU0mPw
 jLCnSdsHqBlSf9V7O/na823WCnBFb7blrLBXUoSbHBnICqtVFzhE1kBXWw3S7Kqh
 CiaWM+S4WfR0HRnUlWMTS8BZ82MgiDnd7nGUXWwXBbdqWmoj/9CoU6SZRjbMBkzi
 EY1XvmoYf6eSzdxfydI1hFi0/bbb8K9umHQlrpW3HeN9uXnVc0/+TroVPLuaKUdi
 53ClqXjzE+CpJg==
 =lQKn
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - Prevent a thundering herd problem when the timekeeper CPU is delayed
   and a large number of CPUs compete to acquire jiffies_lock to do the
   update. Limit it to one CPU with a separate "uncontended" atomic
   variable.

 - A set of improvements for the timer migration mechanism:

     - Support imbalanced NUMA trees correctly

     - Support dynamic exclusion of CPUs from the migrator duty to allow
       the cpuset/isolation mechanism to exclude them from handling
       timers of remote idle CPUs

 - The usual small updates, cleanups and enhancements

* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Exclude isolated cpus from hierarchy
  cpumask: Add initialiser to use cleanup helpers
  sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  timers/migration: Use scoped_guard on available flag set/clear
  timers/migration: Add mask for CPUs available in the hierarchy
  timers/migration: Rename 'online' bit to 'available'
  selftests/timers/nanosleep: Add tests for return of remaining time
  selftests/timers: Clean up kernel version check in posix_timers
  time: Fix a few typos in time[r] related code comments
  time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
  hrtimer: Store time as ktime_t in restart block
  timers/migration: Remove dead code handling idle CPU checking for remote timers
  timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
  timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
  timers/migration: Fix imbalanced NUMA trees
  timers/migration: Remove locking on group connection
  timers/migration: Convert "while" loops to use "for"
  tick/sched: Limit non-timekeeper CPUs calling jiffies update
2025-12-02 09:58:33 -08:00
Paolo Bonzini
f58e70cc31 KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
    allowing the VMM to potentially handle the abort in a non-fatal
    manner.
 
  - Large rework of the VGIC's list register handling with the goal of
    supporting more active/pending IRQs than available list registers in
    hardware. In addition, the VGIC now supports EOImode==1 style
    deactivations for IRQs which may occur on a separate vCPU than the
    one that acked the IRQ.
 
  - Support for FEAT_XNX (user / privileged execute permissions) and
    FEAT_HAF (hardware update to the Access Flag) in the software page
    table walkers and shadow MMU.
 
  - Allow page table destruction to reschedule, fixing long need_resched
    latencies observed when destroying a large VM.
 
  - Minor fixes to KVM and selftests
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaS3m5RIcb3VwdG9uQGtl
 cm5lbC5vcmcACgkQor51iCR83Rb4NAD8C1fGoiCErb6htQMHf1I7ua0ThdIx7OnY
 Mk1EysNWu94BAI/VKEYgz+UC5uapHh+gnsoOdVTMJZedI/OPrnKa3QIA
 =/Vl1
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.19

 - Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.

 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.

 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.

 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.

 - Minor fixes to KVM and selftests
2025-12-02 18:36:26 +01:00
Paolo Bonzini
63a9b0bc65 KVM/riscv changes for 6.19
- SBI MPXY support for KVM guest
 - New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
   AIA virtualization fails to allocate IMSIC VS-file
 - Support enabling dirty log gradually in small chunks
 - Fix guest page fault within HLV* instructions
 - Flush VS-stage TLB after VCPU migration for Andes cores
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmkpa8kACgkQrUjsVaLH
 LAd3lBAAhNlBVnva6fZseKf1ICGpwclXT/Ndqhn6CKWAPuvqsZvApQzTkW6f/txI
 dwhu7SfAJeH62bQRHoyH/gpd5I1cplogp/xmUcAQJrzD4W0Wf0799hdFNOm9PAJf
 IWeMMXSvj4CT8s3xinoKPt1YbmNvdDq3KkK776CET5B0/mIaGi3zBWC9ThU0aMl9
 mlUTvIojApqmdhe6rXpjIZWj/nSP8XrDuYVmJS1Ys4xvCRW4Qyiu4QU1OKYMcwYR
 xh6fgXDYufxojMs+h59mL8HOqBO5Kf79aO4lvjesFfZiRIii0+BATf16InH3XPyn
 bkX3RD4LqgkU4q9I5TtwZ+UpxFvrkigliUewLYrxWFgLzJu6kSBpACduQYDyNSgm
 X33iAm+m8V2tbl0FLHWRQGw970H9z4ycmEa4eII//+AePGTeFlHK90Qy9As2uW4E
 XQet0Wqh/tw+qHRpy7Bls1k5MRtyYGJwi4fbSOp/g8Kjgg/DzSsF+qN2FyNE8GNj
 +w8044fNYpDqd13BsSR99K/cUtFiAOjWN+RiMsu1wM8MRXpAL1lgW01KWqcH/LaD
 gKZjmevETiWMKDUdERkXj+e7xZCb2cfyheJ+vw9Ds5u8Dwp9p8cga8dGyvgcUTEX
 gF+4dx+MoW6uirX+Cd/TJYluu5c19bYKhgEybVBG/5er24cnshE=
 =9ob6
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.19

- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
  AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
2025-12-02 18:35:25 +01:00
Paolo Bonzini
8040280405 LoongArch KVM changes for v6.19
1. Get VM PMU capability from HW GCFG register.
 2. Add AVEC basic support.
 3. Use 64-bit register definition for EIOINTC.
 4. Add KVM timer test cases for tools/selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmkpR/sWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImevPcD/9foNp5fo4MYnMe7WtRnWfjrAsY
 VLaNJclUr9tER7HGbRzfj//mx7JkTjCNqlD2Ii6r6N1tikU0o9OVAGVV4ROXbopJ
 efQxBZc5TfOrkecrCkKVJ634+tkwuf8Uea/jK2nxkE2UYCVIGPYlS0ZSkXB1lmi/
 YnYHGv7EOVAuJ64BsVOWfFQoKBD5AJtChibqTaUeZuq9Y6k087Ns3gPRS5AqjueG
 FFmKYO9pIZZV7hlV5+misR+UiKk7tk8p/7MjpBKN1fJ4P2j9dshfDb+uF1Ir671N
 F+ZxujYJkG+52NQuTSOq9q9EyWh7qzrlWRah/YpM3OMiRB9VpxuYvAthyN7o2NyA
 ftEmYYi+Ose24/ND6aeDQDKeoTtZm7UsfO5X4rMRC5VnrbHUH6d3ZlZQDpnfoeHA
 yw9eL4JI5i5DM8oFo/E8Ag38MUQ1o6btTgeQwXUTgGUZWGnNKfkdi3LTxKr2J18C
 5b2Pudhts6f8cL1pfNgbzbglkNtWdi2UBr7fwNZYHKK2i8JRX2rD9cfEdjWU0qxY
 Ybzqp6DL/+p38cGt29oQOv51+z/aEwOLTnnrf9wl7LBWRB/tbzuh6kIGGE6Ap9Wv
 qC+I0F/nitOSjmNmmb5HHOB4LnkjwRb6cJhzWZH1zrwz/ZkTQqyZqltOGsiHRo24
 z1TqIjJ0Er7CNfrb4Q==
 =880E
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.19

1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
2025-12-02 18:34:22 +01:00
Sean Christopherson
824d227324 KVM: selftests: Add a CPUID testcase for KVM_SET_CPUID2 with runtime updates
Add a CPUID testcase to verify that KVM allows KVM_SET_CPUID2 after (or in
conjunction with) runtime updates.  This is a regression test for the bug
introduced by commit 93da6af3ae ("KVM: x86: Defer runtime updates of
dynamic CPUID bits until CPUID emulation"), where KVM would incorrectly
reject KVM_SET_CPUID due to a not handling a pending runtime update on the
current CPUID, resulting in a false mismatch between the "old" and "new"
CPUID entries.

Link: https://lore.kernel.org/all/20251128123202.68424a95@imammedo
Link: https://patch.msgid.link/20251202015049.1167490-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-12-02 08:49:32 -08:00
Gavin Shan
1b9439c933 KVM: selftests: Add missing "break" in rseq_test's param parsing
In commit 0297cdc12a ("KVM: selftests: Add option to rseq test to
override /dev/cpu_dma_latency"), a 'break' is missed before the option
'l' in the argument parsing loop, which leads to an unexpected core
dump in atoi_paranoid(). It tries to get the latency from non-existent
argument.

  host$ ./rseq_test -u
  Random seed: 0x6b8b4567
  Segmentation fault (core dumped)

Add a 'break' before the option 'l' in the argument parsing loop to avoid
the unexpected core dump.

Fixes: 0297cdc12a ("KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Gavin Shan <gshan@redhat.com>
Link: https://patch.msgid.link/20251124050427.1924591-1-gshan@redhat.com
[sean: describe code change in shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-12-02 08:49:10 -08:00
Xiang Mei
108f9405ce selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
Add tests that trigger packet drops in cake_enqueue(): "CAKE with QFQ
Parent - CAKE enqueue with packets dropping". It forces CAKE_enqueue to
return NET_XMIT_CN after dropping the packets when it has a QFQ parent.

Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20251128001415.377823-3-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-02 13:28:00 +01:00
Takashi Iwai
9747b22a41 ASoC: Updates for v6.19
This is a very large set of updates, as well as some more extensive
 cleanup work from Morimto-san we've also added a generic SCDA class
 driver for SoundWire devices enabling us to support many chips with
 no custom code.  There's also a batch of new drivers added for both
 SoCs and CODECs.
 
  - Added a SoundWire SCDA generic class driver, pulling in a little
    regmap work to support it.
  - A *lot* of cleaup and API improvement work from Morimoto-san.
  - Lots of work on the existing Cirrus, Intel, Maxim and Qualcomm
    drivers.
  - Support for Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
    QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
    TAS5815, TAS5828 and TAS5830.
 
 This also pulls in some gpiolib changes supporting shared GPIOs in the
 core there so we can convert some of the ASoC drivers open coding
 handling of that to the core functionality.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmkt6lUACgkQJNaLcl1U
 h9D7dgf+JP2+yZIeRBud7CEO4Docda2uoRssT7GAIY/Rqrpem5FI0c0pWyZISvhn
 scyjkoCrQfHEoeYrtC3l5bDI7F8o5Tc91hGzhJiCi3mb8jSwi+CaNIpR0Cet3epV
 B9wQgzxlxbmKCxJRUYTPQF3n1uBJWc5EBHSc5QPddTZ0vdUfSlX0FAKHsabpmaOC
 TpkdJnOlH8WUokmP3kP3TpzlflmOSLehnWX4BelJe5Os5O0PQpiKh/JG3oCYHSmc
 yEbzCjOaya80HHn11FShOpy+B4b6sLUMcN+CAmDiLAdNFGvvjgmjpwwZtLYAm09Z
 zFhN7XuVk1vXf+Zx/jHqYKaZtvvAsQ==
 =Xwls
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Updates for v6.19

This is a very large set of updates, as well as some more extensive
cleanup work from Morimto-san we've also added a generic SCDA class
driver for SoundWire devices enabling us to support many chips with
no custom code.  There's also a batch of new drivers added for both
SoCs and CODECs.

 - Added a SoundWire SCDA generic class driver, pulling in a little
   regmap work to support it.
 - A *lot* of cleaup and API improvement work from Morimoto-san.
 - Lots of work on the existing Cirrus, Intel, Maxim and Qualcomm
   drivers.
 - Support for Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
   QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
   TAS5815, TAS5828 and TAS5830.

This also pulls in some gpiolib changes supporting shared GPIOs in the
core there so we can convert some of the ASoC drivers open coding
handling of that to the core functionality.
2025-12-02 07:12:56 +01:00
Carolina Jubran
5cc1bddcfe selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
Currently, tolerance is computed against the TC’s expected percentage,
making TC3 (20%) validation overly strict and TC4 (80%) overly loose.

Update BandwidthValidator to take a dict of shares and compute bounds
relative to the overall total, so that all shares are validated
consistently.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-7-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:41 -08:00
Carolina Jubran
9ecd05a2c8 selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
Correct the documented bandwidth distribution between TC3 and TC4
from 80/20 to 20/80. Update test descriptions and printed messages
to consistently reflect the intended split.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-6-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:41 -08:00
Carolina Jubran
3796e549e3 selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
Commit 7c32f7a2d3db ("selftests: net: py: don't default to shell=True")
changed the cmd() helper to avoid spawning a shell unless explicitly
requested.

The devlink_rate_tc_bw test enables SR-IOV by writing to the
sriov_numvfs sysfs attribute using redirection. Without shell=True the
redirection is not interpreted and the VF device never appears,
causing the test to fail.

Fix by explicitly passing shell=True in the two places that update
sriov_numvfs.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-5-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:41 -08:00
Carolina Jubran
cb1acbd30a selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
Replace the inline iperf3 subprocess and JSON parsing with
Iperf3Runner.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-4-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:41 -08:00
Carolina Jubran
2a60ce94c6 selftests: drv-net: introduce Iperf3Runner for measurement use cases
GenerateTraffic was added to spin up long-running iperf3 load, mainly
to drive high PPS background traffic. It was never meant to provide
stable throughput numbers, and trying to repurpose it for measurement
does not make sense.

Introduce Iperf3Runner to allow tests to split out server/client
configuration, control start/stop, and collect JSON output for
analysis. This makes it possible to measure bandwidth directly when
validating egress shaping.

GenerateTraffic stays as the background load generator, reusing the
common iperf3 helpers under the hood.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-3-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:41 -08:00
Carolina Jubran
a8658f7bb6 selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
This makes devlink_rate_tc_bw.py present in the Makefile under the same
directory.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-2-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:18:40 -08:00
Andre Carvalho
e3b8cbf40c selftests: netconsole: remove log noise due to socat exit
This removes some noise that can be distracting while looking at
selftests by redirecting socat stderr to /dev/null.

Before this commit, netcons_basic would output:

Running with target mode: basic (ipv6)
2025/11/29 12:08:03 socat[259] W exiting on signal 15
2025/11/29 12:08:03 socat[271] W exiting on signal 15
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
2025/11/29 12:08:05 socat[329] W exiting on signal 15
2025/11/29 12:08:05 socat[322] W exiting on signal 15
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[380] W exiting on signal 15
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
2025/11/29 12:08:10 socat[440] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
extended : ipv4 : Test passed

After these changes, output looks like:

Running with target mode: basic (ipv6)
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
extended : ipv4 : Test passed

Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251129-netcons-socat-noise-v1-1-605a0cea8fca@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 12:06:55 -08:00
Jakub Kicinski
aadff9f766 selftests: net: add a hint about MACAddressPolicy=persistent
New NIPA installation had been reporting a few flaky tests.
arp_ndisc_evict_nocarrier is most flaky of them all.
I suspect that the flakiness is due to udev swapping the MAC
addresses on the interfaces. Extend the message in
arp_ndisc_evict_nocarrier to hint at this potential issue.
Having the neigh get fail right after ping is rather unusual,
unless udev changes the MAC addr causing a flush in the meantime.

Link: https://patch.msgid.link/20251127194556.2409574-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 12:02:13 -08:00
Jakub Kicinski
4b1639cac0 selftests: net: py: handle interrupt during cleanup
Following up on the old discussion [1]. Let the BaseExceptions out of
defer()'ed cleanup. And handle it in the main loop. This allows us to
exit the tests if user hit Ctrl-C during defer().

Link: https://lore.kernel.org/20251119063228.3adfd743@kernel.org # [1]
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251128004846.2602687-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 12:01:29 -08:00
Linus Torvalds
212c4053a1 vfs-6.19-rc1.coredump
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 oji0AQC5jl35xh04fJKB343InVAxtRFp8mSkJJ9Bx6x7xA7a+QEAiBMxYilUgYIW
 bZMcI5LU+gNO/1y076QkVt84jTUQLww=
 =WIBZ
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfd and coredump updates from Christian Brauner:
 "Features:

   - Expose coredump signal via pidfd

     Expose the signal that caused the coredump through the pidfd
     interface. The recent changes to rework coredump handling to rely
     on unix sockets are in the process of being used in systemd. The
     previous systemd coredump container interface requires the coredump
     file descriptor and basic information including the signal number
     to be sent to the container. This means the signal number needs to
     be available before sending the coredump to the container.

   - Add supported_mask field to pidfd

     Add a new supported_mask field to struct pidfd_info that indicates
     which information fields are supported by the running kernel. This
     allows userspace to detect feature availability without relying on
     error codes or kernel version checks.

  Cleanups:

   - Drop struct pidfs_exit_info and prepare to drop exit_info pointer,
     simplifying the internal publication mechanism for exit and
     coredump information retrievable via the pidfd ioctl

   - Use guard() for task_lock in pidfs

   - Reduce wait_pidfd lock scope

   - Add missing PIDFD_INFO_SIZE_VER1 constant

   - Add missing BUILD_BUG_ON() assert on struct pidfd_info

  Fixes:

   - Fix PIDFD_INFO_COREDUMP handling

  Selftests:

   - Split out coredump socket tests and common helpers into separate
     files for better organization

   - Fix userspace coredump client detection issues

   - Handle edge-triggered epoll correctly

   - Ignore ENOSPC errors in tests

   - Add debug logging to coredump socket tests, socket protocol tests,
     and test helpers

   - Add tests for PIDFD_INFO_COREDUMP_SIGNAL

   - Add tests for supported_mask field

   - Update pidfd header for selftests"

* tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  pidfs: reduce wait_pidfd lock scope
  selftests/coredump: add second PIDFD_INFO_COREDUMP_SIGNAL test
  selftests/coredump: add first PIDFD_INFO_COREDUMP_SIGNAL test
  selftests/coredump: ignore ENOSPC errors
  selftests/coredump: add debug logging to coredump socket protocol tests
  selftests/coredump: add debug logging to coredump socket tests
  selftests/coredump: add debug logging to test helpers
  selftests/coredump: handle edge-triggered epoll correctly
  selftests/coredump: fix userspace coredump client detection
  selftests/coredump: fix userspace client detection
  selftests/coredump: split out coredump socket tests
  selftests/coredump: split out common helpers
  selftests/pidfd: add second supported_mask test
  selftests/pidfd: add first supported_mask test
  selftests/pidfd: update pidfd header
  pidfs: expose coredump signal
  pidfs: drop struct pidfs_exit_info
  pidfs: prepare to drop exit_info pointer
  pidfd: add a new supported_mask field
  pidfs: add missing BUILD_BUG_ON() assert on struct pidfd_info
  ...
2025-12-01 10:17:39 -08:00
Linus Torvalds
415d34b92c namespace-6.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
 oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
 =TExi
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains substantial namespace infrastructure changes including a new
  system call, active reference counting, and extensive header cleanups.
  The branch depends on the shared kbuild branch for -fms-extensions support.

  Features:

   - listns() system call

     Add a new listns() system call that allows userspace to iterate
     through namespaces in the system. This provides a programmatic
     interface to discover and inspect namespaces, addressing
     longstanding limitations:

     Currently, there is no direct way for userspace to enumerate
     namespaces. Applications must resort to scanning /proc/*/ns/ across
     all processes, which is:
      - Inefficient - requires iterating over all processes
      - Incomplete - misses namespaces not attached to any running
        process but kept alive by file descriptors, bind mounts, or
        parent references
      - Permission-heavy - requires access to /proc for many processes
      - No ordering or ownership information
      - No filtering per namespace type

     The listns() system call solves these problems:

       ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                      size_t nr_ns_ids, unsigned int flags);

       struct ns_id_req {
             __u32 size;
             __u32 spare;
             __u64 ns_id;
             struct /* listns */ {
                     __u32 ns_type;
                     __u32 spare2;
                     __u64 user_ns_id;
             };
       };

     Features include:
      - Pagination support for large namespace sets
      - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
      - Filtering by owning user namespace
      - Permission checks respecting namespace isolation

   - Active Reference Counting

     Introduce an active reference count that tracks namespace
     visibility to userspace. A namespace is visible in the following
     cases:
      - The namespace is in use by a task
      - The namespace is persisted through a VFS object (namespace file
        descriptor or bind-mount)
      - The namespace is a hierarchical type and is the parent of child
        namespaces

     The active reference count does not regulate lifetime (that's still
     done by the normal reference count) - it only regulates visibility
     to namespace file handles and listns().

     This prevents resurrection of namespaces that are pinned only for
     internal kernel reasons (e.g., user namespaces held by
     file->f_cred, lazy TLB references on idle CPUs, etc.) which should
     not be accessible via (1)-(3).

   - Unified Namespace Tree

     Introduce a unified tree structure for all namespaces with:
      - Fixed IDs assigned to initial namespaces
      - Lookup based solely on inode number
      - Maintained list of owned namespaces per user namespace
      - Simplified rbtree comparison helpers

   Cleanups

    - Header Reorganization:
      - Move namespace types into separate header (ns_common_types.h)
      - Decouple nstree from ns_common header
      - Move nstree types into separate header
      - Switch to new ns_tree_{node,root} structures with helper functions
      - Use guards for ns_tree_lock

   - Initial Namespace Reference Count Optimization
      - Make all reference counts on initial namespaces a nop to avoid
        pointless cacheline ping-pong for namespaces that can never go
        away
      - Drop custom reference count initialization for initial namespaces
      - Add NS_COMMON_INIT() macro and use it for all namespaces
      - pid: rely on common reference count behavior

   - Miscellaneous Cleanups
      - Rename exit_task_namespaces() to exit_nsproxy_namespaces()
      - Rename is_initial_namespace() and make argument const
      - Use boolean to indicate anonymous mount namespace
      - Simplify owner list iteration in nstree
      - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
      - nsfs: use inode_just_drop()
      - pidfs: raise DCACHE_DONTCACHE explicitly
      - pidfs: simplify PIDFD_GET__NAMESPACE ioctls
      - libfs: allow to specify s_d_flags
      - cgroup: add cgroup namespace to tree after owner is set
      - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

  Fixes:

   - setns(pidfd, ...) race condition

     Fix a subtle race when using pidfds with setns(). When the target
     task exits after prepare_nsset() but before commit_nsset(), the
     namespace's active reference count might have been dropped. If
     setns() then installs the namespaces, it would bump the active
     reference count from zero without taking the required reference on
     the owner namespace, leading to underflow when later decremented.

     The fix resurrects the ownership chain if necessary - if the caller
     succeeded in grabbing passive references, the setns() should
     succeed even if the target task exits or gets reaped.

   - Return EFAULT on put_user() error instead of success

   - Make sure references are dropped outside of RCU lock (some
     namespaces like mount namespace sleep when putting the last
     reference)

   - Don't skip active reference count initialization for network
     namespace

   - Add asserts for active refcount underflow

   - Add asserts for initial namespace reference counts (both passive
     and active)

   - ipc: enable is_ns_init_id() assertions

   - Fix kernel-doc comments for internal nstree functions

   - Selftests
      - 15 active reference count tests
      - 9 listns() functionality tests
      - 7 listns() permission tests
      - 12 inactive namespace resurrection tests
      - 3 threaded active reference count tests
      - commit_creds() active reference tests
      - Pagination and stress tests
      - EFAULT handling test
      - nsid tests fixes"

* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
  nstree: fix kernel-doc comments for internal functions
  nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
  selftests/namespaces: fix nsid tests
  ns: drop custom reference count initialization for initial namespaces
  pid: rely on common reference count behavior
  ns: add asserts for initial namespace active reference counts
  ns: add asserts for initial namespace reference counts
  ns: make all reference counts on initial namespace a nop
  ipc: enable is_ns_init_id() assertions
  fs: use boolean to indicate anonymous mount namespace
  ns: rename is_initial_namespace()
  ns: make is_initial_namespace() argument const
  nstree: use guards for ns_tree_lock
  nstree: simplify owner list iteration
  nstree: switch to new structures
  nstree: add helper to operate on struct ns_tree_{node,root}
  nstree: move nstree types into separate header
  nstree: decouple from ns_common header
  ns: move namespace types into separate header
  ...
2025-12-01 09:47:41 -08:00
Takashi Iwai
72987d2ddc Merge branch 'for-linus' into for-next
Pull remaining 6.18-devel changes.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-12-01 16:25:31 +01:00
Oliver Upton
3eef0c83c3 Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
* kvm-arm64/nv-xnx-haf: (22 commits)
  : Support for FEAT_XNX and FEAT_HAF in nested
  :
  : Add support for a couple of MMU-related features that weren't
  : implemented by KVM's software page table walk:
  :
  :  - FEAT_XNX: Allows the hypervisor to describe execute permissions
  :    separately for EL0 and EL1
  :
  :  - FEAT_HAF: Hardware update of the Access Flag, which in the context of
  :    nested means software walkers must also set the Access Flag.
  :
  : The series also adds some basic support for testing KVM's emulation of
  : the AT instruction, including the implementation detail that AT sets the
  : Access Flag in KVM.
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:41 -08:00
Oliver Upton
938309b028 Merge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/next
* kvm-arm64/vgic-lr-overflow: (50 commits)
  : Support for VGIC LR overflows, courtesy of Marc Zyngier
  :
  : Address deficiencies in KVM's GIC emulation when a vCPU has more active
  : IRQs than can be represented in the VGIC list registers. Sort the AP
  : list to prioritize inactive and pending IRQs, potentially spilling
  : active IRQs outside of the LRs.
  :
  : Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
  : which involves special consideration for SPIs being deactivated from a
  : different vCPU than the one that acked it.
  KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
  KVM: arm64: selftests: vgic_irq: Add timer deactivation test
  KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
  KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
  KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
  KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
  KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
  KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
  KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
  KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
  KVM: arm64: selftests: gic_v3: Add irq group setting helper
  KVM: arm64: GICv2: Always trap GICV_DIR register
  KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
  KVM: arm64: GICv2: Handle LR overflow when EOImode==0
  KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
  KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
  KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
  KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
  KVM: arm64: GICv3: Handle in-LR deactivation when possible
  KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:32 -08:00
Oliver Upton
11b8e6edc1 Merge branch 'kvm-arm64/sea-user' into kvmarm/next
* kvm-arm64/sea-user:
  : Userspace handling of SEAs, courtesy of Jiaqi Yan
  :
  : Add support for processing external aborts in userspace in situations
  : where the host has failed to do so, allowing the VMM to potentially
  : reinject an external abort into the VM.
  Documentation: kvm: new UAPI for handling SEA
  KVM: selftests: Test for KVM_EXIT_ARM_SEA
  KVM: arm64: VM exit to userspace to handle SEA

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:20 -08:00
Colin Ian King
05474b7bc7 KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
There is a spelling mistake in a TEST_FAIL message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://msgid.link/20251128175124.319094-1-colin.i.king@gmail.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Oliver Upton
66f1888583 KVM: arm64: selftests: Add test for AT emulation
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Frederic Weisbecker
9a08942f17 Merge branch 'rcu/misc' into next
- In order to prepare the layout for nohz_full work deferral to
  user exit, the context tracking state must shrink the counter
  of transitions to/from RCU not watching. The only possible hazard
  is to trigger wrap-around more easily, delaying a bit grace periods
  when that happens. This should be a rare event though. Yet add
  debugging and torture code to test that assumption.

- Fix memory leak on locktorture module

- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
  On recent discussions, we also concluded that all those WRITE_ONCE()
  and READ_ONCE() on list APIs deserve appropriate comments. Something
  to be expected for the next cycle.

- Provide a script to apply several configs to several commits with torture.

- Allow torture to reuse a build directory in order to save needless
  rebuild time.

- Various cleanups.
2025-11-30 22:20:33 +01:00
Ankit Khushwaha
0384c8ea96 selftests/mm/uffd: initialize char variable to Null
In "uffd-stress.c" & "uffd-unit-tests.c". address of char variable having
garbage value (uninitialized) is passed to 'write' syscall triggers
warning.

	uffd-stress.c:246:39: warning: variable 'c' is uninitialized when
	passed  as a const pointer argument here
	[-Wuninitialized-const-pointer]

	uffd-unit-tests.c:581:31: warning: variable 'c' is uninitialized
	when passed as a const pointer argument here
	[-Wuninitialized-const-pointer]

so the fix is to assign char variable to '\0' to prevent writing of
garbage value.

Link: https://lkml.kernel.org/r/20251126160830.52124-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
9ea35a25d5 mm: introduce VMA flags bitmap type
It is useful to transition to using a bitmap for VMA flags so we can avoid
running out of flags, especially for 32-bit kernels which are constrained
to 32 flags, necessitating some features to be limited to 64-bit kernels
only.

By doing so, we remove any constraint on the number of VMA flags moving
forwards no matter the platform and can decide in future to extend beyond
64 if required.

We start by declaring an opaque types, vma_flags_t (which resembles
mm_struct flags of type mm_flags_t), setting it to precisely the same size
as vm_flags_t, and place it in union with vm_flags in the VMA declaration.

We additionally update struct vm_area_desc equivalently placing the new
opaque type in union with vm_flags.

This change therefore does not impact the size of struct vm_area_struct or
struct vm_area_desc.

In order for the change to be iterative and to avoid impacting
performance, we designate VM_xxx declared bitmap flag values as those
which must exist in the first system word of the VMA flags bitmap.

We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(),
vma_flags_overwrite_word(), vma_flags_overwrite_word_once(),
vma_flags_set_word() and vma_flags_clear_word() in order to allow us to
update the existing vm_flags_*() functions to utilise these helpers.

This is a stepping stone towards converting users to the VMA flags bitmap
and behaves precisely as before.

By doing this, we can eliminate the existing private vma->__vm_flags field
in the vma->vm_flags union and replace it with the newly introduced opaque
type vma_flags, which we call flags so we refer to the new bitmap field as
vma->flags.

We update vma_flag_[test, set]_atomic() to account for the change also.

We adapt vm_flags_reset_once() to only clear those bits above the first
system word providing write-once semantics to the first system word (which
it is presumed the caller requires - and in all current use cases this is
so).

As we currently only specify that the VMA flags bitmap size is equal to
BITS_PER_LONG number of bits, this is a noop, but is defensive in
preparation for a future change that increases this.

We additionally update the VMA userland test declarations to implement the
same changes there.

Finally, we update the rust code to reference vma->vm_flags on update
rather than vma->__vm_flags which has been removed.  This is safe for now,
albeit it is implicitly performing a const cast.

Once we introduce flag helpers we can improve this more.

No functional change intended.

Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
4c613f518f tools/testing/vma: eliminate dependency on vma->__vm_flags
The userland VMA test code relied on an internal implementation detail -
the existence of vma->__vm_flags to directly access VMA flags.  There is
no need to do so when we have the vm_flags_*() helper functions available.

This is ugly, but also a subsequent commit will eliminate this field
altogether so this will shortly become broken.

This patch has us utilise the helper functions instead.

Link: https://lkml.kernel.org/r/6275c53a6bb20743edcbe92d3e130183b47d18d0.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:09 -08:00
Lorenzo Stoakes
2b6a3f061f mm: declare VMA flags by bit
Patch series "initial work on making VMA flags a bitmap", v3.

We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.

This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.

This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.

This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.

This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.

This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.

Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx fields but in terms of these newly introduced
VMA_xxx_BIT fields.

While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.

We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit bb6525f2f8 ("mm: add bitmap mm->flags
field"), which we establish in union with vma->vm_flags (but still set at
system word size meaning there is no functional or data type size change).

We update the vm_flags_xxx() helpers to use this new bitmap, introducing
sensible helpers to do so.

This series lays the foundation for further work to expand the use of
bitmap VMA flags and eventually eliminate these arbitrary restrictions.


This patch (of 4):

In order to lay the groundwork for VMA flags being a bitmap rather than a
system word in size, we need to be able to consistently refer to VMA flags
by bit number rather than value.

Take this opportunity to do so in an enum which we which is additionally
useful for tooling to extract metadata from.

This additionally makes it very clear which bits are being used for what
at a glance.

We use the VMA_ prefix for the bit values as it is logical to do so since
these reference VMAs.  We consistently suffix with _BIT to make it clear
what the values refer to.

We declare bit values even when the flags that use them would not be
enabled by config options as this is simply clearer and clearly defines
what bit numbers are used for what, at no additional cost.

We declare a sparse-bitwise type vma_flag_t which ensures that users can't
pass around invalid VMA flags by accident and prepares for future work
towards VMA flags being a bitmap where we want to ensure bit values are
type safe.

To make life easier, we declare some macro helpers - DECLARE_VMA_BIT()
allows us to avoid duplication in the enum bit number declarations (and
maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to
assist with declaration of flags.

Unfortunately we can't declare both in the enum, as we run into issue with
logic in the kernel requiring that flags are preprocessor definitions, and
additionally we cannot have a macro which declares another macro so we
must define each flag macro directly.

Additionally, update the VMA userland testing vma_internal.h header to
include these changes.

We also have to fix the parameters to the vma_flag_*_atomic() functions
since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will
complain otherwise.

We have to update some rather silly if-deffery found in mm/task_mmu.c
which would otherwise break.

Finally, we update the rust binding helper as now it cannot auto-detect
the flags at all.

Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com>	[rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-29 10:41:08 -08:00
Alexis Lothoré (eBPF Foundation)
1d17bcce6a selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
test_tc_edt currently defines the target rate in both the userspace and
BPF parts. This value could be defined once in the userspace part if we
make it able to configure the BPF program before starting the test.

Add a target_rate variable in the BPF part, and make the userspace part
set it to the desired rate before attaching the shaping program.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
50ce5ea5f7 selftests/bpf: remove test_tc_edt.sh
Now that test_tc_edt has been integrated in test_progs, remove the
legacy shell script.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
b0f82e7ab6 selftests/bpf: integrate test_tc_edt into test_progs
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX
veth to shape the traffic to 5MBps. It then checks that the amount of
received bytes (at interface level), compared to the TX duration, indeed
matches 5Mbps.

Convert this test script to the test_progs framework:
- keep the double veth setup, isolated in two veths
- run a small tcp server, and connect client to server
- push a pre-configured amount of bytes, and measure how much time has
  been needed to push those
- ensure that this rate is in a 2% error margin around the target rate

This two percent value, while being tight, is hopefully large enough to
not make the test too flaky in CI, while also turning it into a small
example of BPF-based shaping.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
4b4833acc6 selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
The test_tc_edt BPF program uses a custom section name, which works fine
when manually loading it with tc, but prevents it from being loaded with
libbpf.

Update the program section name to "tc" to be able to manipulate it with
a libbpf-based C test.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Kumar Kartikeya Dwivedi
3448375e71 selftests/bpf: Add success stats to rqspinlock stress test
Add stats to observe the success and failure rate of lock acquisition
attempts in various contexts.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:35:36 -08:00
Hangbin Liu
2c28ee720a selftests: bonding: add delay before each xvlan_over_bond connectivity check
Jakub reported increased flakiness in bond_macvlan_ipvlan.sh on regular
kernel, while the tests consistently pass on a debug kernel. This suggests
a timing-sensitive issue.

To mitigate this, introduce a short sleep before each xvlan_over_bond
connectivity check. The delay helps ensure neighbor and route cache
have fully converged before verifying connectivity.

The sleep interval is kept minimal since check_connection() is invoked
nearly 100 times during the test.

Fixes: 246af950b9 ("selftests: bonding: add macvlan over bond testing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251114082014.750edfad@kernel.org
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251127143310.47740-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:16:22 -08:00
Hoyeon Lee
bd5bdd200c bpf: Remove runqslower tool
runqslower was added in commit 9c01546d26 "tools/bpf: Add runqslower
tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf
workflows. runqslower continues to live in BCC (libbpf-tools), so there
is no need to keep building and maintaining it.

Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and
selftests accordingly.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:20:16 -08:00
Amery Hung
a3a60cc120 selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
file_alloc_security hook is disabled. Use other LSM hooks in selftests
instead.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:18:28 -08:00
Anton Protopopov
7feff23cdf bpf: force BPF_F_RDONLY_PROG on insn array creation
The original implementation added a hack to check_mem_access()
to prevent programs from writing into insn arrays. To get rid
of this hack, enforce BPF_F_RDONLY_PROG on map creation.

Also fix the corresponding selftest, as the error message changes
with this patch.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:15:43 -08:00
David Matlack
d721f52e31 vfio: selftests: Add vfio_pci_device_init_perf_test
Add a new VFIO selftest for measuring the time it takes to run
vfio_pci_device_init() in parallel for one or more devices.

This test serves as manual regression test for the performance
improvement of commit e908f58b6b ("vfio/pci: Separate SR-IOV VF
dev_set"). For example, when running this test with 64 VFs under the
same PF:

Before:

  $ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
  ...
  Wall time: 6.653234463s
  Min init time (per device): 0.101215344s
  Max init time (per device): 6.652755941s
  Avg init time (per device): 3.377609608s

After:

  $ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
  ...
  Wall time: 0.122978332s
  Min init time (per device): 0.108121915s
  Max init time (per device): 0.122762761s
  Avg init time (per device): 0.113816748s

This test does not make any assertions about performance, since any such
assertion is likely to be flaky due to system differences and random
noise. However this test can be fed into automation to detect
regressions, and can be used by developers in the future to measure
performance optimizations.

Suggested-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-19-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:07 -07:00
David Matlack
b8e96c8805 vfio: selftests: Eliminate INVALID_IOVA
Eliminate INVALID_IOVA as there are platforms where UINT64_MAX is a
valid iova.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-18-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:07 -07:00
David Matlack
5fabc49abf vfio: selftests: Split libvfio.h into separate header files
Split out the contents of libvfio.h into separate header files, but keep
libvfio.h as the top-level include that all tests can use.

Put all new header files into a libvfio/ subdirectory to avoid future
name conflicts in include paths when libvfio is used by other selftests
like KVM.

No functional change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-17-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:07 -07:00
David Matlack
19cf492c1b vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
Move the vfio_selftests_*() helpers into their own file libvfio.c. These
helpers have nothing to do with struct vfio_pci_device, so they don't
make sense in vfio_pci_device.c.

No functional change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-16-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:07 -07:00
David Matlack
657d241e69 vfio: selftests: Rename vfio_util.h to libvfio.h
Rename vfio_util.h to libvfio.h to match the name of libvfio.mk.

No functional change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-15-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:07 -07:00
David Matlack
831c37a5bf vfio: selftests: Stop passing device for IOMMU operations
Drop the struct vfio_pci_device wrappers for IOMMU map/unmap functions
and require tests to directly call iommu_map(), iommu_unmap(), etc. This
results in more concise code, and also makes it clear the map operations
are happening on a struct iommu, not necessarily on a specific device,
especially when multi-device tests are introduced.

Do the same for iova_allocator_init() as that function only needs the
struct iommu, not struct vfio_pci_device.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-14-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
2607a43613 vfio: selftests: Move IOVA allocator into iova_allocator.c
Move the IOVA allocator into its own file, to provide better separation
between the allocator and the struct vfio_pci_device helper code.

The allocator could go into iommu.c, but it is standalone enough that a
separate file seems cleaner. This also continues the trend of having a
.c for every major object in VFIO selftests (vfio_pci_device.c,
vfio_pci_driver.c, iommu.c, and now iova_allocator.c).

No functional change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-13-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
2aca571089 vfio: selftests: Move IOMMU library code into iommu.c
Move all the IOMMU related library code into their own file iommu.c.
This provides a better separation between the vfio_pci_device helper
code and the iommu code.

No function change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-12-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
9a659d74f2 vfio: selftests: Rename struct vfio_dma_region to dma_region
Rename struct vfio_dma_region to dma_region. This is in preparation for
separating the VFIO PCI device library code from the IOMMU library code.
This name change also better reflects the fact that DMA mappings can be
managed by either VFIO or IOMMUFD. i.e. the "vfio_" prefix is
misleading.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-11-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
28a84da744 vfio: selftests: Upgrade driver logging to dev_err()
Upgrade various logging in the VFIO selftests drivers from dev_info() to
dev_err(). All of these logs indicate scenarios that may be unexpected.
For example, the logging during probing indicates matching devices but
that aren't supported by the driver. And the memcpy errors can indicate
a problem if the caller was not trying to do something like exercise I/O
fault handling. Exercising I/O fault handling is certainly a valid thing
to do, but the driver can't infer the caller's expectations, so better
to just log with dev_err().

Suggested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-10-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
c48545442e vfio: selftests: Prefix logs with device BDF where relevant
Prefix log messages with the device's BDF where relevant. This will help
understanding VFIO selftests logs when tests are run with multiple
devices.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-9-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
6c74d9830d vfio: selftests: Eliminate overly chatty logging
Eliminate overly chatty logs that are printed during almost every test.
These logs are adding more noise than value. If a test cares about this
information it can log it itself. This is especially true as the VFIO
selftests gains support for multiple devices in a single test (which
multiplies all these logs).

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-8-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
d8470a775c vfio: selftests: Support multiple devices in the same container/iommufd
Support tests that want to add multiple devices to the same
container/iommufd by decoupling struct vfio_pci_device from
struct iommu.

Multi-devices tests can now put multiple devices in the same
container/iommufd like so:

  iommu = iommu_init(iommu_mode);

  device1 = vfio_pci_device_init(bdf1, iommu);
  device2 = vfio_pci_device_init(bdf2, iommu);
  device3 = vfio_pci_device_init(bdf3, iommu);

  ...

  vfio_pci_device_cleanup(device3);
  vfio_pci_device_cleanup(device2);
  vfio_pci_device_cleanup(device1);

  iommu_cleanup(iommu);

To account for the new separation of vfio_pci_device and iommu, update
existing tests to initialize and cleanup a struct iommu.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-7-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
c9756b4d27 vfio: selftests: Introduce struct iommu
Introduce struct iommu, which logically represents either a VFIO
container or an iommufd IOAS, depending on which IOMMU mode is used by
the test.

This will be used in a subsequent commit to allow devices to be added to
the same container/iommufd.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-6-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
dd56ef239d vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
Rename struct vfio_iommu_mode to struct iommu_mode since the mode can
include iommufd. This also prepares for splitting out all the IOMMU code
into its own structs/helpers/files which are independent from the
vfio_pci_device code.

No function change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-5-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
6282ca8585 vfio: selftests: Allow passing multiple BDFs on the command line
Add support for passing multiple device BDFs to a test via the command
line. This is a prerequisite for multi-device tests.

Single-device tests can continue using vfio_selftests_get_bdf(), which
will continue to return argv[argc - 1] (if it is a BDF string), or the
environment variable $VFIO_SELFTESTS_BDF otherwise.

For multi-device tests, a new helper called vfio_selftests_get_bdfs() is
introduced which will return an array of all BDFs found at the end of
argv[], as well as the number of BDFs found (passed back to the caller
via argument). The array of BDFs returned does not need to be freed by
the caller.

The environment variable VFIO_SELFTESTS_BDF continues to support only a
single BDF for the time being.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-4-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
fa246a1d06 vfio: selftests: Split run.sh into separate scripts
Split run.sh into separate scripts (setup.sh, run.sh, cleanup.sh) to
enable multi-device testing, and prepare for VFIO selftests
automatically detecting which devices to use for testing by storing
device metadata on the filesystem.

 - setup.sh takes one or more BDFs as arguments and sets up each device.
   Metadata about each device is stored on the filesystem in the
   directory:

	   ${TMPDIR:-/tmp}/vfio-selftests-devices

   Within this directory is a directory for each BDF, and then files in
   those directories that cleanup.sh uses to cleanup the device.

 - run.sh runs a selftest by passing it the BDFs of all set up devices.

 - cleanup.sh takes zero or more BDFs as arguments and cleans up each
   device. If no BDFs are provided, it cleans up all devices.

This split enables multi-device testing by allowing multiple BDFs to be
set up and passed into tests:

For example:

  $ tools/testing/selftests/vfio/scripts/setup.sh <BDF1> <BDF2>
  $ tools/testing/selftests/vfio/scripts/setup.sh <BDF3>
  $ tools/testing/selftests/vfio/scripts/run.sh echo
  <BDF1> <BDF2> <BDF3>
  $ tools/testing/selftests/vfio/scripts/cleanup.sh

In the future, VFIO selftests can automatically detect set up devices by
inspecting ${TMPDIR:-/tmp}/vfio-selftests-devices. This will avoid the
need for the run.sh script.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-3-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
David Matlack
2d5dbd3156 vfio: selftests: Move run.sh into scripts directory
Move run.sh in a new sub-directory scripts/. This directory will be used
to house various helper scripts to be used by humans and automation for
running VFIO selftests.

Opportunistically also switch run.sh from TEST_PROGS_EXTENDED to
TEST_FILES. The former is for actual test executables that are just not
run by default. TEST_FILES is a better fit for helper scripts.

No functional change intended.

Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-2-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:58:06 -07:00
Alex Williamson
a63a03afd8 VFIO fixes for v6.18-rc6
- Fix vfio selftests to remove the expectation that the IOMMU
    supports a 64-bit IOVA space.  These manifest both in the original
    set of tests introduced this development cycle in identity mapping
    the IOVA to buffer virtual address space, as well as the more
    recent boundary testing.  Implement facilities for collecting the
    valid IOVA ranges from the backend, implement a simple IOVA
    allocator, and use the information for determining extents.
    (Alex Mastro)
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkWXwYRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyK96w//ep97M1B5RSA4/AIP4j+fovnlM2RVUrPy
 RKxVYK+rzftO4Hq8sg1Co3aB68hItbYiw/0BDUgQBFFCEoInQ9zht89tsHhuXq55
 lYq9hyYMnp60FLzpX8Fi2HKf8SKTdD+6P1hX5UWZfYExc/tCD/9FPE2QsfUTbnh/
 u2aguEp0RhgC8xCcR/5EsGFzcBgioDcTBQq5UQ/5By51nUNvdmHKRHKRC7ZmQfiK
 bwFpYseWHRdzdBH2KggNaDZUh0q7Y9+cwL2UndQV5IfwGhGWODztOsVhhoxAAu0n
 c6gXYDg34i5i1epPrr8Yvv8trDzMFCRYJi4g8BsLPOBXe8Lr2bWj/4wA1ftdW6Ha
 0ii4Gx1UO1vz0KZQBajMHv5pO06lJZshzEQgHr3sYgjaRg8AVTV6hI3IVaty2JWk
 DPKPNSGKhjkWvWnx73KFiiepz5zXEL7r7ooFBimtBwFr/5bDB8hfPjpXMh+mhiZu
 QW6DV+Mrqg1G1KDCwdHSvbBwH8IvfX/tQbs9eKvGt+6ICpH1GJ1cUr58vjcc6ESi
 sJzKI0ceTjGZAhHJogO4xAzhI8H6kwtDHB1ZC9QLHIwbkXFN1m64VjOwwXuHk0HT
 559gjqjH0yKNSABwRMV1nKzn9EFnivQn1tAoGkUGSFE9ZI/VfCNNlspPziZEz0z9
 u14SILGMqWU=
 =ywvT
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.18-rc6' into v6.19/vfio/next

Merge mainline vfio-selftest updates for ongoing v6.19 work.

Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:54:22 -07:00
Mickaël Salaün
54f9baf537
selftests/landlock: Add disconnected leafs and branch test suites
Test disconnected directories with two test suites
(layout4_disconnected_leafs and layout5_disconnected_branch) and 43
variants to cover the main corner cases.

These tests are complementary to the previous commit.

Add test_renameat() and test_exchangeat() helpers.

Test coverage for security/landlock is 92.1% of 1927 lines according to
LLVM 20.

Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-11-28 18:27:07 +01:00
Tingmao Wang
a18ee3f31f
selftests/landlock: Add tests for access through disconnected paths
This adds tests for the edge case discussed in [1], with specific ones
for rename and link operations when the operands are through
disconnected paths, as that go through a separate code path in Landlock.

This has resulted in a warning, due to collect_domain_accesses() not
expecting to reach a different root from path->mnt:

  #  RUN           layout1_bind.path_disconnected ...
  #            OK  layout1_bind.path_disconnected
  ok 96 layout1_bind.path_disconnected
  #  RUN           layout1_bind.path_disconnected_rename ...
  [..] ------------[ cut here ]------------
  [..] WARNING: CPU: 3 PID: 385 at security/landlock/fs.c:1065 collect_domain_accesses
  [..] ...
  [..] RIP: 0010:collect_domain_accesses (security/landlock/fs.c:1065 (discriminator 2) security/landlock/fs.c:1031 (discriminator 2))
  [..] current_check_refer_path (security/landlock/fs.c:1205)
  [..] ...
  [..] hook_path_rename (security/landlock/fs.c:1526)
  [..] security_path_rename (security/security.c:2026 (discriminator 1))
  [..] do_renameat2 (fs/namei.c:5264)
  #            OK  layout1_bind.path_disconnected_rename
  ok 97 layout1_bind.path_disconnected_rename

Move the const char definitions a bit above so that we can use the path
for s4d1 in cleanup code.

Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-11-28 18:27:06 +01:00
Catalin Marinas
52c4d1d624 Merge branch 'for-next/sysreg' into for-next/core
* for-next/sysreg:
  : arm64 sysreg updates/cleanups
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64/sysreg: Add ICH_VMCR_EL2
  arm64/sysreg: Move generation of RES0/RES1/UNKN to function
  arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
  arm64/sysreg: Fix checks for incomplete sysreg definitions
  arm64/sysreg: Replace TCR_EL1 field macros
2025-11-28 15:47:53 +00:00
Catalin Marinas
17c05cb0ef Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
  perf/arm-ni: Fix and optimise register offset calculation
  perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
  perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
  perf/arm-ni: Add NoC S3 support
  perf/arm_cspmu: nvidia: Add pmevfiltr2 support
  perf/arm_cspmu: nvidia: Add revision id matching
  perf/arm_cspmu: Add pmpidr support
  perf/arm_cspmu: Add callback to reset filter config
  perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

* for-next/misc:
  : Miscellaneous patches
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  arm64: mm: make linear mapping permission update more robust for patial range
  arm64/mm: Elide TLB flush in certain pte protection transitions
  arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
  arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
  arm64: add unlikely hint to MTE async fault check in el0_svc_common
  arm64: acpi: add newline to deferred APEI warning
  arm64: entry: Clean out some indirection
  arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
  arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
  arm64: remove unused ARCH_PFN_OFFSET
  arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
  arm64: Remove assertion on CONFIG_VMAP_STACK

* for-next/kselftest:
  : arm64 kselftest patches
  kselftest/arm64: Align zt-test register dumps

* for-next/efi-preempt:
  : arm64: Make EFI calls preemptible
  arm64/efi: Call EFI runtime services without disabling preemption
  arm64/efi: Move uaccess en/disable out of efi_set_pgd()
  arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
  arm64/fpsimd: Permit kernel mode NEON with IRQs off
  arm64/fpsimd: Don't warn when EFI execution context is preemptible
  efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
  efi: Add missing static initializer for efi_mm::cpus_allowed_lock

* for-next/assembler-macro:
  : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
  arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
  arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers

* for-next/typos:
  : Random typo/spelling fixes
  arm64: Fix double word in comments
  arm64: Fix typos and spelling errors in comments

* for-next/sme-ptrace-disable:
  : Support disabling streaming mode via ptrace on SME only systems
  kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
  kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
  arm64/sme: Support disabling streaming mode via ptrace on SME only systems

* for-next/local-tlbi-page-reused:
  : arm64, mm: avoid TLBI broadcast if page reused in write fault
  arm64, tlbflush: don't TLBI broadcast if page reused in write fault
  mm: add spurious fault fixing support for huge pmd

* for-next/mpam: (34 commits)
  : Basic Arm MPAM driver (more to follow)
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Use long MBWU counters if supported
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Consider overflow in bandwidth counter state
  arm_mpam: Track bandwidth counter state for power management
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Allow configuration to be applied and restored during cpu online
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Register and enable IRQs
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Reset MSC controls from cpuhp callbacks
  arm_mpam: Merge supported features during mpam_enable() into mpam_class
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Add helpers for managing the locking around the mon_sel registers
  ...

* for-next/acpi:
  : arm64 acpi updates
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()

* for-next/documentation:
  : arm64 Documentation updates
  Documentation/arm64: Fix the typo of register names
2025-11-28 15:47:12 +00:00
Joerg Roedel
0d081b1694 Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'mediatek', 'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2025-11-28 08:44:21 +01:00
Bibo Mao
0f90fa6e2e KVM: LoongArch: selftests: Add time counter test case
With time counter test, it is to verify that time count starts from 0
and always grows up then.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:48 +08:00
Bibo Mao
4e88240940 KVM: LoongArch: selftests: Add SW emulated timer test case
This test case setup one-shot timer and execute idle instruction
immediately to indicate giving up CPU, hypervisor will emulate SW
hrtimer and wakeup vCPU when SW hrtimer is fired.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:47 +08:00
Bibo Mao
df41742343 KVM: LoongArch: selftests: Add timer interrupt test case
Add timer test case based on common arch_timer code, timer interrupt
with one-shot and period mode is tested.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:44 +08:00
Lorenzo Bianconi
c0bd21682a selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic
Introduce the capability to send TCP traffic over IPv6 to
nft_flowtable netfilter selftest.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:07:19 +00:00
Lorenzo Bianconi
fe8313316e selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
Introduce specific selftest for IPIP flowtable SW acceleration in
nft_flowtable.sh

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:51 +00:00
Pasha Tatashin
724bf8c559 selftests/liveupdate: add kexec test for multiple and empty sessions
Introduce a new kexec-based selftest, luo_kexec_multi_session, to validate
the end-to-end lifecycle of a more complex LUO scenario.

While the existing luo_kexec_simple test covers the basic end-to-end
lifecycle, it is limited to a single session with one preserved file. 
This new test significantly expands coverage by verifying LUO's ability to
handle a mixed workload involving multiple sessions, some of which are
intentionally empty.  This ensures that the LUO core correctly preserves
and restores the state of all session types across a reboot.

The test validates the following sequence:

Stage 1 (Pre-kexec):

  - Creates two empty test sessions (multi-test-empty-1,
    multi-test-empty-2).
  - Creates a session with one preserved memfd (multi-test-files-1).
  - Creates another session with two preserved memfds
    (multi-test-files-2), each containing unique data.
  - Creates a state-tracking session to manage the transition to
    Stage 2.
  - Executes a kexec reboot via the helper script.

Stage 2 (Post-kexec):

  - Retrieves the state-tracking session to confirm it is in the
    post-reboot stage.
  - Retrieves all four test sessions (both the empty and non-empty
    ones).
  - For the non-empty sessions, restores the preserved memfds and
    verifies their contents match the original data patterns.
  - Finalizes all test sessions and the state session to ensure a clean
    teardown and that all associated kernel resources are correctly
    released.

This test provides greater confidence in the robustness of the LUO
framework by validating its behavior in a more realistic, multi-faceted
scenario.

Link: https://lkml.kernel.org/r/20251125165850.3389713-19-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:42 -08:00
Pasha Tatashin
a003bdb9ec selftests/liveupdate: add simple kexec-based selftest for LUO
Introduce a kexec-based selftest, luo_kexec_simple, to validate the
end-to-end lifecycle of a Live Update Orchestrator session across a
reboot.

While existing tests verify the uAPI in a pre-reboot context, this test
ensures that the core functionality—preserving state via Kexec Handover
and restoring it in a new kernel—works as expected.

The test operates in two stages, managing its state across the reboot by
preserving a dedicated "state session" containing a memfd.  This mechanism
dogfoods the LUO feature itself for state tracking, making the test
self-contained.

The test validates the following sequence:

Stage 1 (Pre-kexec):
 - Creates a test session (test-session).
 - Creates and preserves a memfd with a known data pattern into the test
   session.
 - Creates the state-tracking session to signal progression to Stage 2.
 - Executes a kexec reboot via a helper script.

Stage 2 (Post-kexec):
 - Retrieves the state-tracking session to confirm it is in the
   post-reboot stage.
 - Retrieves the preserved test session.
 - Restores the memfd from the test session and verifies its contents
   match the original data pattern written in Stage 1.
 - Finalizes both the test and state sessions to ensure a clean
   teardown.

The test relies on a helper script (do_kexec.sh) to perform the reboot and
a shared utility library (luo_test_utils.c) for common LUO operations,
keeping the main test logic clean and focused.

Link: https://lkml.kernel.org/r/20251125165850.3389713-18-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:41 -08:00
Pasha Tatashin
80bab43f6f selftests/liveupdate: add userspace API selftests
Introduce a selftest suite for LUO.  These tests validate the core
userspace-facing API provided by the /dev/liveupdate device and its
associated ioctls.

The suite covers fundamental device behavior, session management, and the
file preservation mechanism using memfd as a test case.  This provides
regression testing for the LUO uAPI.

The following functionality is verified:

Device Access:
    Basic open and close operations on /dev/liveupdate.
    Enforcement of exclusive device access (verifying EBUSY on a
    second open).

Session Management:
    Successful creation of sessions with unique names.
    Failure to create sessions with duplicate names.

File Preservation:
    Preserving a single memfd and verifying its content remains
    intact post-preservation.
    Preserving multiple memfds within a single session, each with
    unique data.
    A complex scenario involving multiple sessions, each containing
    a mix of empty and data-filled memfds.

Note: This test suite is limited to verifying the pre-kexec functionality
of LUO (e.g., session creation, file preservation).  The post-kexec
restoration of resources is not covered, as the kselftest framework does
not currently support orchestrating a reboot and continuing execution in
the new kernel.

Link: https://lkml.kernel.org/r/20251125165850.3389713-17-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:41 -08:00
Pasha Tatashin
03d3963464 kho: make debugfs interface optional
Patch series "liveupdate: Rework KHO for in-kernel users", v9.

This series refactors the KHO framework to better support in-kernel users
like the upcoming LUO.  The current design, which relies on a notifier
chain and debugfs for control, is too restrictive for direct programmatic
use.

The core of this rework is the removal of the notifier chain in favor of a
direct registration API.  This decouples clients from the shutdown-time
finalization sequence, allowing them to manage their preserved state more
flexibly and at any time.

In support of this new model, this series also:
 - Makes the debugfs interface optional.
 - Introduces APIs to unpreserve memory and fixes a bug in the abort
   path where client state was being incorrectly discarded. Note that
   this is an interim step, as a more comprehensive fix is planned as
   part of the stateless KHO work [1].
 - Moves all KHO code into a new kernel/liveupdate/ directory to
   consolidate live update components.


This patch (of 9):

Currently, KHO is controlled via debugfs interface, but once LUO is
introduced, it can control KHO, and the debug interface becomes optional.

Add a separate config CONFIG_KEXEC_HANDOVER_DEBUGFS that enables the
debugfs interface, and allows to inspect the tree.

Move all debugfs related code to a new file to keep the .c files clear of
ifdefs.

Link: https://lkml.kernel.org/r/20251101142325.1326536-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20251101142325.1326536-2-pasha.tatashin@soleen.com
Link: https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com [1]
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:31 -08:00
Bala-Vignesh-Reddy
e6fbd1759c selftests: complete kselftest include centralization
This follow-up patch completes centralization of kselftest.h and
ksefltest_harness.h includes in remaining seltests files, replacing all
relative paths with a non-relative paths using shared -I include path in
lib.mk

Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on
riscv, arm64, x86_64 and powerpc arch.

[reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h]
  Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com
Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mickael Salaun <mic@digikod.net>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:31 -08:00
Andrew Morton
bc947af677 Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able
to merge "kho: make debugfs interface optional" into mm-nonmm-stable.
2025-11-27 14:17:02 -08:00
Jakub Kicinski
db4029859d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

net/xdp/xsk.c
  0ebc27a4c6 ("xsk: avoid data corruption on cq descriptor number")
  8da7bea7db ("xsk: add indirect call for xsk_destruct_skb")
  30ed05adca ("xsk: use a smaller new lock for shared pool case")
https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au
https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 12:19:08 -08:00
Ben Horgan
4138cc63d3 KVM: arm64: selftests: Consider all 7 possible levels of cache
In test_clidr() if an empty cache level is not found then the TEST_ASSERT
will not fire. Fix this by considering all 7 possible levels when iterating
through the hierarchy. Found by inspection.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:16:46 +00:00
Ben Horgan
bf09ee9180 KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
ARM64_FEATURE_FIELD_BITS is set to 4 but not all ID register fields are 4
bits. See for instance ID_AA64SMFR0_EL1. The last user of this define,
ARM64_FEATURE_FIELD_BITS, is the set_id_regs selftest. Its logic assumes
the fields aren't a single bits; assert that's the case and stop using the
define. As there are no more users, ARM64_FEATURE_FIELD_BITS is removed
from the arm64 tools sysreg.h header. A separate commit removes this from
the kernel version of the header.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:16:46 +00:00
Sunday Adelodun
5c9c1e78de selftests: af_unix: remove unused stdlib.h include
The unix_connreset.c test included <stdlib.h>, but no symbol from that
header is used. This causes a fatal build error under certain
linux-next configurations where stdlib.h is not available.

Remove the unused include to fix the build.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202511221800.hcgCKvVa-lkp@intel.com/
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Link: https://patch.msgid.link/20251125113648.25903-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27 12:42:51 +01:00
Bibo Mao
d84fe2f30b KVM: LoongArch: selftests: Add exception handler register interface
Add interrupt and exception handler register interface. When exception
happens, execute registered exception handler if exists, else report an
error.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
1c5d3a1eab KVM: LoongArch: selftests: Add basic interfaces
Add some basic function interfaces such as CSR register access, local
irq enable or disable APIs.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Bibo Mao
985a96983b KVM: LoongArch: selftests: Add system registers save/restore on exception
When system returns from exception with ertn instruction, PC comes from
LOONGARCH_CSR_ERA, and CSR.CRMD comes LOONGARCH_CSR_PRMD.

Here save CSR register CSR.ERA and CSR.PRMD into stack, and then restore
them from stack. So it can be modified by exception handlers in future.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-27 11:00:18 +08:00
Willem de Bruijn
c01a6e5b2e selftests/net: packetdrill: pass send_omit_free to MSG_ZEROCOPY tests
The --send_omit_free flag is needed for TCP zero copy tests, to ensure
that packetdrill doesn't free the send() buffer after the send() call.

Fixes: 1e42f73fd3 ("selftests/net: packetdrill: import tcp/zerocopy")
Closes: https://lore.kernel.org/netdev/20251124071831.4cbbf412@kernel.org/
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251125234029.1320984-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 15:57:55 -08:00
Ankit Khushwaha
af7273cc7a selftests/net: initialize char variable to null
char variable in 'so_txtime.c' & 'txtimestamp.c' were left uninitilized
when switch default case taken. which raises following warning.

	txtimestamp.c:240:2: warning: variable 'tsname' is used uninitialized
	whenever switch default is taken [-Wsometimes-uninitialized]

	so_txtime.c:210:3: warning: variable 'reason' is used uninitialized
	whenever switch default is taken [-Wsometimes-uninitialized]

initializing these variables to NULL to fix this.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20251125165302.20079-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 15:23:27 -08:00
Linus Torvalds
9eb220eddd 8 hotfixes. 4 are cc:stable, 7 are against mm/.
All are singletons - please see the respective changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaSdaWwAKCRDdBJ7gKXxA
 jlRmAP92O8ez4IeVq1VVhVfGVf9Xbjd8zFTykQXuGM/3yDRoTgEA+kLDFTTaJ5Wb
 MGlgQAeFXqlfMZfBljhyzbV8Ubz1BAc=
 =4UwS
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-11-26-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "8 hotfixes.  4 are cc:stable, 7 are against mm/.

  All are singletons - please see the respective changelogs for details"

* tag 'mm-hotfixes-stable-2025-11-26-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/filemap: fix logic around SIGBUS in filemap_map_pages()
  mm/huge_memory: fix NULL pointer deference when splitting folio
  MAINTAINERS: add test_kho to KHO's entry
  mailmap: add entry for Sam Protsenko
  selftests/mm: fix division-by-zero in uffd-unit-tests
  mm/mmap_lock: reset maple state on lock_vma_under_rcu() retry
  mm/memfd: fix information leak in hugetlb folios
  mm: swap: remove duplicate nr_swap_pages decrement in get_swap_page_of_type()
2025-11-26 12:38:05 -08:00
Matthieu Buffet
e61462232a
selftests/landlock: Fix makefile header list
Make all headers part of make's dependencies computations.
Otherwise, updating audit.h, common.h, scoped_base_variants.h,
scoped_common.h, scoped_multiple_domain_variants.h, or wrappers.h,
re-running make and running selftests could lead to testing stale headers.

Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Fixes: fefcf0f7cf ("selftests/landlock: Test abstract UNIX socket scoping")
Fixes: 5147779d5e ("selftests/landlock: Add wrappers.h")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027011440.1838514-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-11-26 20:20:23 +01:00
Jason Gunthorpe
5185c4d8a5 Merge branch 'iommufd_dmabuf' into k.o-iommufd/for-next
Jason Gunthorpe says:

====================
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:

   https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/

The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and if the underliyng alignment
requirements are met it will be placed in the iommu_domain.

Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.

The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.

Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. Eg when QEMU goes to issue FLR it should
do the map/unmap to iommufd.

Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.

The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().

Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().

I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.

The latest series for interconnect negotation to exchange a phys_addr is:
 https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com

And the discussion for design of revoke is here:
 https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
====================

Based on a shared branch with vfio.

* iommufd_dmabuf:
  iommufd/selftest: Add some tests for the dmabuf flow
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Allow MMIO pages in a batch
  iommufd: Allow a DMABUF to be revoked
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Add DMABUF to iopt_pages
  vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
  vfio/nvgrace: Support get_dmabuf_phys
  vfio/pci: Add dma-buf export support for MMIO regions
  vfio/pci: Enable peer-to-peer DMA transactions by default
  vfio/pci: Share the core device pointer while invoking feature functions
  vfio: Export vfio device get and put registration helpers
  dma-buf: provide phys_vec to scatter-gather mapping routine
  PCI/P2PDMA: Document DMABUF model
  PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
  PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
  PCI/P2PDMA: Simplify bus address mapping API
  PCI/P2PDMA: Separate the mmap() support from the core logic

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-26 14:04:10 -04:00
Paolo Bonzini
b0bf3d67a7 KVM selftests changes for 6.19:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
 
  - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
    ARCH=x86_64 on the command line.
 
  - Extend a bunch of nested VMX to validate nested SVM as well.
 
  - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
    verify KVM can save/restore nested VMX state when L1 is using 5-level
    paging, but L2 is not.
 
  - Clean up the guest paging code in anticipation of sharing the core logic for
    nested EPT and nested NPT.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmUnIACgkQOlYIJqCj
 N/0+Zg/+NOklTP01cgj49RDkkYyk43UQBf3VdwL72Yzq4KSHdZMx4Zggj/GbmtI6
 tioIbhprSPvEw880KkSzJTh+BeJHUo0sZT23hG65AeJdWGp/nMo95SHRiW6uB5mX
 g/R6RLsCUyu97rzuL+6MI9P46hBfXHCMvUh55M08UL3WAV3kX3wTh+9JSmW4e+Wc
 d9C+0vtZ2GcODGaD++GVlkVt+cSqPHKhEq9KTrbgIFY5oQKuPpuxaB0HshfBQ79L
 LPJozoA9qJ6fXUFWAt0QGyAXdYMg8Q1sG5PdnLwKvWUBXxYB2JMIbwFIhQqFr/NE
 lZ9bxrtIclQaNrtmpYCufzyNcWl93YserfS3CJY0OtnOW/2Tcic1WEdlw0cUTgpJ
 INKjwzu4CleevxpVsymHz7grZy/3qHcA8c0liX88GTk/o9r/wtK9Hgh6FpVo/6Gx
 DEbyZEw5vNg1mEGFIXtHxHbuyVqoR8/HLsPhMaMSL1R6A4fqVNAFjjxnydSKsUOr
 5gbqZp4R4iRan48iYrsgcb4gQ5bAXzedCEBzVnX7Gxp3wCW56YHqTMBJumBV2Dji
 jdYUKUgV0TCP2xgGaA5HRrTicxHXsr/wS2IojgoqCouvtHhrsDS036DhL2Wm4nf+
 f1I9nAaLdDxFhfB45I2CWjVj4KWkEZMUnr5vTbtWuwiZmVo3vMM=
 =vqnp
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-6.19' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.19:

 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.

 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
   ARCH=x86_64 on the command line.

 - Extend a bunch of nested VMX to validate nested SVM as well.

 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
   verify KVM can save/restore nested VMX state when L1 is using 5-level
   paging, but L2 is not.

 - Clean up the guest paging code in anticipation of sharing the core logic for
   nested EPT and nested NPT.
2025-11-26 09:35:40 +01:00
Paolo Bonzini
236831743c KVM guest_memfd changes for 6.19:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
    rough edges in guest_memfd along the way.
 
  - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
    from a memslot to make it harder to leak references.
 
  - Enhance KVM selftests to make it easer to develop and debug selftests like
    those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
    often result in hard-to-debug SIGBUS errors.
 
  - Misc cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmSgcACgkQOlYIJqCj
 N/0MBQ/+KzI/6q6AR2m9jYCbz8APcJpmAEJ4Ma0+Q4XrJfkymAmt9P4M46bRZl5G
 Aznaqq9vak1weIzlaFsOzYqEWvSN54P/EfZKkqh0kCDKXGzl1HkSCC7FeThyqZz0
 +uuWUiRtWI/dyNFEpIXB/G06DwqhMIKAlk421Zt84iBI/wz2oZeAgWoFCjPWca4a
 /L/ClmpzM6LnP/Hg2DyoZtfwAXIy65pb9h0IhKbvGcgSrS4sesZPiSV20KKvKVSp
 4+WVLHuNbjk9vWKkmV8IZH+BXAO2J2+y2JYckbx4DvUKQcauXUJjFjp8+wZ4gMrC
 SK3SuWTTc3oe7fhaZZ98KY3BO9dq68iCbxpmdkhYrEbNqcDbQA5GMhIUZ2TgqDX7
 KQ58s9zyHZvsX4cnL3XY9igsdl6tvqnFqVhGyBaxUrWNJDqAs3NPJr8Td3cnrMKg
 MRB2AaJN8xX6DK7JAxKQ4LC0Nb7w3hxF0t0XL2XzBw+6nsu67NjrM7EflIhG+mHJ
 YWhrnNh83Yut95cym5mpUU0kFqfHNB5SxRGGokDcQ1NAXBwwE2Tq+YopR9OKRfiZ
 52/hkC195D7Xaeo9raf6iKN+YfiZ0uj3TvxuxIHs/EIqmmjhsc8VSwkfFW5OIeYf
 Tulmh/ohkq1675By/dapyHvW97PSgd0IqbDzjDrlxmbZbuuS1F8=
 =j3On
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-gmem-6.19' of https://github.com/kvm-x86/linux into HEAD

KVM guest_memfd changes for 6.19:

 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of
   rough edges in guest_memfd along the way.

 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
   from a memslot to make it harder to leak references.

 - Enhance KVM selftests to make it easer to develop and debug selftests like
   those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
   often result in hard-to-debug SIGBUS errors.

 - Misc cleanups.
2025-11-26 09:32:44 +01:00
Kuniyuki Iwashima
ebe2f0b3cf selftest: af_unix: Extend recv() timeout in so_peek_off.c.
so_peek_off.c is reported to be flaky on NIPA:

  # # so_peek_off.c:149:two_chunks_overlap_blocking:Expected -1 (-1) != bytes (-1)
  # # two_chunks_overlap_blocking: Test terminated by assertion
  # #          FAIL  so_peek_off.stream.two_chunks_overlap_blocking

The test fork()s a child process to send() data after 1ms to
wake up the parent process being blocked (up to 3ms) on recv().

But, from the log, the parent woke up after 3ms timeout, so it
could be too short when the host is overloaded.

Let's extend it to 5s.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251124070722.1e828c53@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:52:28 -08:00
Kuniyuki Iwashima
adb6b68c50 selftest: af_unix: Create its own .gitignore.
Somehow AF_UNIX tests have reused ../.gitignore,
but now NIPA warns about it.

Let's create .gitignore under af_unix/.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:52:28 -08:00
Eric Dumazet
9a5e5334ad tcp: remove icsk->icsk_retransmit_timer
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet
08dfe37023 tcp: introduce icsk->icsk_keepalive_timer
sk->sk_timer has been used for TCP keepalives.

Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.

Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.

Added space is reclaimed in the following patch.

This includes changes to MPTCP, which was also using sk_timer.

Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Michal Luczaj
b796632fc8 vsock/test: Extend transport change null-ptr-deref test
syzkaller reported a lockdep lock order inversion warning[1] due to
commit 687aa0c558 ("vsock: Fix transport_* TOCTOU"). This was fixed in
commit f7c877e753 ("vsock: fix lock inversion in
vsock_assign_transport()").

Redo syzkaller's repro by piggybacking on a somewhat related test
implemented in commit 3a764d9338 ("vsock/test: Add test for null ptr
deref when transport changes").

[1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:16:21 -08:00
Kumar Kartikeya Dwivedi
88337b587b selftests/bpf: Make CS length configurable for rqspinlock stress test
Allow users to configure the critical section delay for both task/normal
and NMI contexts, and set to 20ms and 10ms as before by default.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:14 -08:00
Kumar Kartikeya Dwivedi
6173c1d620 selftests/bpf: Add lock wait time stats to rqspinlock stress test
Add statistics per-CPU broken down by context and various timing windows
for the time taken to acquire an rqspinlock. Cases where all
acquisitions fit into the 10ms window are skipped from printing,
otherwise the full breakdown is displayed when printing the summary.
This allows capturing precisely the number of times outlier attempts
happened for a given lock in a given context.

A critical detail is that time is captured regardless of success or
failure, which is important to capture events for failed but long
waiting timeout attempts.

Output:

[   64.279459] rqspinlock acquisition latency histogram (ms):
[   64.279472]  cpu1: total 528426 (normal 526559, nmi 1867)
[   64.279477]    0-1ms: total 524697 (normal 524697, nmi 0)
[   64.279480]    2-2ms: total 3652 (normal 1811, nmi 1841)
[   64.279482]    3-3ms: total 66 (normal 47, nmi 19)
[   64.279485]    4-4ms: total 2 (normal 1, nmi 1)
[   64.279487]    5-5ms: total 1 (normal 1, nmi 0)
[   64.279489]    6-6ms: total 1 (normal 0, nmi 1)
[   64.279490]    101-150ms: total 1 (normal 0, nmi 1)
[   64.279492]    >= 251ms: total 6 (normal 2, nmi 4)
...

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:14 -08:00
Kumar Kartikeya Dwivedi
224de8d5a3 selftests/bpf: Relax CPU requirements for rqspinlock stress test
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is
calculated nicely by adding to the mode enum. Enables running single CPU
AA tests.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:13 -08:00
Menglong Dong
f2cb0660ac selftests/bpf: Call bpf_get_numa_node_id() in trigger_count()
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the calling to bpf_get_numa_node_id()
should be considered as composition of the baseline. So, let's call it in
trigger_count(). Meanwhile, rename trigger_count() to
trigger_kernel_count() to make it easier understand.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
2025-11-25 14:32:50 -08:00
Jason Gunthorpe
d2041f1f11 iommufd/selftest: Add some tests for the dmabuf flow
Basic tests of establishing a dmabuf and revoking it. The selftest kernel
side provides a basic small dmabuf for this testing.

Link: https://patch.msgid.link/r/9-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25 11:30:16 -04:00
Jakub Kicinski
e254c212cd selftests: af_unix: don't use SKIP for expected failures
netdev CI reserves SKIP in selftests for cases which can't be executed
due to setup issues, like missing or old commands. Tests which are
expected to fail must use XFAIL.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251123021601.158709-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 19:07:51 -08:00
Andre Carvalho
00f3b32518 selftests: netconsole: ensure required log level is set on netcons_basic
This commit ensures that the required log level is set at the start of
the test iteration.

Part of the cleanup performed at the end of each test iteration resets
the log level (do_cleanup in lib_netcons.sh) to the values defined at the
time test script started. This may cause further test iterations to fail
if the default values are not sufficient.

Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-netcons-basic-loglevel-v1-1-577f8586159c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:52:20 -08:00
Jakub Kicinski
5aadc15584 selftests: hw-net: toeplitz: give the test up to 4 seconds
Increase the receiver timeout. When running between machines
in different geographic regions the test needs more than
a second to SSH across and send the frames.

The bkg() command that runs the receiver defaults to 5 sec timeout,
so using 4 sec sounds like a reasonable value for the receiver itself.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:51:41 -08:00
Jakub Kicinski
c0105ffc50 selftests: hw-net: toeplitz: read indirection table from the device
Replace the simple modulo math with the real indirection table
read from the device. This makes the tests pass for mlx5 and
bnxt NICs.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:51:41 -08:00
Jakub Kicinski
aa91dbf3ed selftests: hw-net: toeplitz: read the RSS key directly from C
Now that we have YNL support for RSS accessing the RSS info from
C is very easy. Instead of passing the RSS key from Python do it
directly in the C code.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:51:40 -08:00
Jakub Kicinski
27c512af19 selftests: hw-net: toeplitz: make sure NICs have pure Toeplitz configured
Make sure that the NIC under test is configured for pure Toeplitz
hashing, and no input key transform (no symmetric hashing).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:51:40 -08:00
Jakub Kicinski
f81171fecd selftests: hw-net: auto-disable building the iouring C code
Looks like the liburing is not updated by distros very aggressively.
Presumably because a lot of packages depend on it. I just updated
to Fedora 43 and it's still on liburing 2.9. The test is 9mo old,
at this stage I think this warrants handling the build failure
more gracefully.

Detect if iouring is recent enough and if not print a warning
and exclude the C prog from build. The Python test will just
fail since the binary won't exist. But it removes the major
annoyance of having to update liburing from sources when
developing other tests.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24 18:51:40 -08:00
Saket Kumar Bhaskar
590699d858 selftests/bpf: Fix htab_update/reenter_update selftest failure
Since commit 31158ad02d ("rqspinlock: Add deadlock detection
and recovery") the updated path on re-entrancy now reports deadlock
via -EDEADLK instead of the previous -EBUSY.

Also, the way reentrancy was exercised (via fentry/lookup_elem_raw)
has been fragile because lookup_elem_raw may be inlined
(find_kernel_btf_id() will return -ESRCH).

To fix this fentry is attached to bpf_obj_free_fields() instead of
lookup_elem_raw() and:

- The htab map is made to use a BTF-described struct val with a
  struct bpf_timer so that check_and_free_fields() reliably calls
  bpf_obj_free_fields() on element replacement.

- The selftest is updated to do two updates to the same key (insert +
  replace) in prog_test.

- The selftest is updated to align with expected errno with the
  kernel’s current behavior.

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24 16:48:28 -08:00
Lorenzo Stoakes
ccf9eb326b tools/testing/vma: add missing stub
vm_flags_reset() is not available in the userland VMA tests, so add a stub
which const-casts vma->vm_flags and avoids the upcoming removal of the
vma->__vm_flags field.

Link: https://lkml.kernel.org/r/4aff8bf7-d367-4ba3-90ad-13eef7a063fa@lucifer.local
Fixes: c5c67c1de357 ("tools/testing/vma: eliminate dependency on vma->__vm_flags")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:55 -08:00
Chunyan Zhang
277a1ae387 mm: softdirty: add pgtable_supports_soft_dirty()
Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15.

This patchset adds support for Svrsw60t59b [1] extension which is ratified
now, also add soft dirty and userfaultfd write protect tracking for
RISC-V.

The patches 1 and 2 add macros to allow architectures to define their own
checks if the soft-dirty / uffd_wp PTE bits are available, in other words
for RISC-V, the Svrsw60t59b extension is supported on which device the
kernel is running.  Also patch1-2 are removing "ifdef
CONFIG_MEM_SOFT_DIRTY" "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef
CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which if not overridden by
the architecture, no change in behavior is expected.

This patchset has been tested with kselftest mm suite in which soft-dirty,
madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass, and
no regressions are observed in any of the other tests.


This patch (of 6):

Some platforms can customize the PTE PMD entry soft-dirty bit making it
unavailable even if the architecture provides the resource.

Add an API which architectures can define their specific implementations
to detect if soft-dirty bit is available on which device the kernel is
running.

This patch is removing "ifdef CONFIG_MEM_SOFT_DIRTY" in favor of
pgtable_supports_soft_dirty() checks that defaults to
IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), if not overridden by the architecture,
no change in behavior is expected.

We make sure to never set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(),
so we will never run into VM_SOFTDIRTY checks.

[lorenzo.stoakes@oracle.com: fix VMA selftests]
  Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local
Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn
Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn
Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1]
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:54 -08:00
Peng Li
3e700b715e selftests/mm: gup_test: fix comment regarding origin of FOLL_WRITE
The 'FOLL_WRITE' of the copied source is located in mm_types.h of mm, not
mm.h, so fix it.

Link: https://lkml.kernel.org/r/20251117154012.197499-2-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Peng Li
218fbfad16 selftests/mm: gup_test: stop testing FOLL_TOUCH
commit 0f20bba168 ("mm/gup: explicitly define and check internal GUP
flags, disallow FOLL_TOUCH") marked FOLL_TOUCH as a GUP-internal flag.

This causes a warning to fire when running gup_test, for example:

    $ ./gup_test -L -r 100 -z

dmesg:
    WARNING: CPU: 1 PID: 117 at mm/gup.c:2512 is_valid_gup_args+0x66/0x8c

Therefore, remove the "FOLL_TOUCH" test code from gup_test.c.

Link: https://lkml.kernel.org/r/20251117154012.197499-1-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:53 -08:00
Balbir Singh
271a7b2e3c selftests/mm/hmm-tests: new throughput tests including THP
Add new benchmark style support to test transfer bandwidth for zone device
memory operations.

Link: https://lkml.kernel.org/r/20251001065707.920170-16-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Matthew Brost
24c2c5b8ff selftests/mm/hmm-tests: partial unmap, mremap and anon_write tests
Add partial unmap test case which munmaps memory while in the device.

Add tests exercising mremap on faulted-in memory (CPU and GPU) at various
offsets and verify correctness.

Update anon_write_child to read device memory after fork verifying this
flow works in the kernel.

Both THP and non-THP cases are updated.

Link: https://lkml.kernel.org/r/20251001065707.920170-15-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Balbir Singh
519071529d selftests/mm/hmm-tests: new tests for zone device THP migration
Add new tests for migrating anon THP pages, including anon_huge,
anon_huge_zero and error cases involving forced splitting of pages during
migration.

Link: https://lkml.kernel.org/r/20251001065707.920170-14-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 15:08:49 -08:00
Andrew Morton
87fcafc4e2 Merge branch 'mm-hotfixes-stable' into mm-stable in order to merge
"mm/huge_memory: only get folio_order() once during __folio_split()" into
mm-stable.
2025-11-24 15:07:34 -08:00
Marc Zyngier
de88423277 KVM: arm64: selftests: vgic_irq: Add timer deactivation test
Add a new test case that triggers the HW deactivation emulation path
when trapping ICV_DIR_EL1. This is obviously tied to the way KVM
works now, but the test follows the expected architectural behaviour.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-50-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
1c9c71ac1b KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
Add a new test case that inject a Group-0 interrupt together
with a bunch of Group-1 interrupts, Ack/EOI the G1 interrupts,
and only then enable G0, expecting to get the G0 interrupt.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-49-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
d2dee2e849 KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
Add a new test case that makes an interrupt pending on a vcpu,
activates it, do the priority drop, and then get *another* vcpu
to do the deactivation.

Special care is taken not to trigger an exit in the process, so
that we are sure that the active interrupt is in an LR. Joy.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-48-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
b6c68612ab KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
When EOImode==1, perform the deactivation in the order of activation,
just to make things a bit worse for KVM. Yes, I'm nasty.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-47-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
fd5fa1c8d0 KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
Good news: our GIC emulation is not completely broken, and we can
activate as many interrupts as we want.

Bump the test to cover all the SGIs, all the allowed PPIs, and
31 SPIs. Yes, 31, because we have 31 available priorities, and the
test is not happy with having two interrupts with the same priority.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-46-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
5053c2ab92 KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
The PPI injection API is clear that you can't inject the timer PPIs
from userspace, since they are controlled by the timers themselves.

Add an exclusion list for this purpose.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-45-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
8b7888c511 KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
The architecture is pretty clear that changing the configuration of
an enable interrupt is not OK. It doesn't really matter here, but
doing the right thing is not more expensive.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-44-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:15 -08:00
Marc Zyngier
27392612c8 KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
No, 0 is not a spurious INTID. Never been, never was.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-43-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
2366295c76 KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
Make sure G0 is disabled at the point of initialising the GIC.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Marc Zyngier
a1650de7c1 KVM: arm64: selftests: gic_v3: Add irq group setting helper
Being able to set the group of an interrupt is pretty useful.
Add such a helper.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:14 -08:00
Carlos Llamas
f0bb6dba3d selftests/mm: fix division-by-zero in uffd-unit-tests
Commit 4dfd4bba85 ("selftests/mm/uffd: refactor non-composite global
vars into struct") moved some of the operations previously implemented in
uffd_setup_environment() earlier in the main test loop.

The calculation of nr_pages, which involves a division by page_size, now
occurs before checking that default_huge_page_size() returns a non-zero
This leads to a division-by-zero error on systems with !CONFIG_HUGETLB.

Fix this by relocating the non-zero page_size check before the nr_pages
calculation, as it was originally implemented.

Link: https://lkml.kernel.org/r/20251113034623.3127012-1-cmllamas@google.com
Fixes: 4dfd4bba85 ("selftests/mm/uffd: refactor non-composite global vars into struct")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24 14:25:17 -08:00
Alan Maguire
ad93ba0267 selftests/bpf: Allow selftests to build with older xxd
Currently selftests require xxd with the "-n <name>" option
which allows the user to specify a name not derived from
the input object path.  Instead of relying on this newer
feature, older xxd can be used if we link our desired name
("test_progs_verification_cert") to the input object.

Many distros ship xxd in vim-common package and do not have
the latest xxd with -n support.

Fixes: b720903e2b ("selftests/bpf: Enable signature verification for some lskel tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24 10:00:16 -08:00
Anup Patel
d1c5620781 KVM: riscv: selftests: Add SBI MPXY extension to get-reg-list
The KVM RISC-V allows SBI MPXY extensions for Guest/VM so add
it to the get-reg-list test.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24 09:55:36 +05:30
Thomas Weißschuh
1d57346474 selftests/nolibc: error out on linker warnings
If the linker emits warnings these should abort the build.
Otherwise they will be swallowed by run-tests.sh and not shown.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-22 12:35:12 +01:00
Thomas Weißschuh
682bf67529 selftests/nolibc: use lld to link loongarch binaries
LLVM 21 switched to -mcmodel=medium for LoongArch64 compilations.
This code model uses R_LARCH_ECALL36 relocations which might not be
supported by GNU ld which to nolibc testsuite uses by default.
ld will not resolve the relocation and all function calls will end up
as busy loops.

Use lld instead.

We can not switch to lld for all LLVM builds, as it does not support all
necessary architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
2025-11-22 12:35:02 +01:00
Puranjay Mohan
cf49ec5705 selftests: bpf: Add tests for unbalanced rcu_read_lock
As verifier now supports nested rcu critical sections, add new test
cases to make sure unbalanced usage of rcu_read_lock()/unlock() is
rejected.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 18:34:59 -08:00
Puranjay Mohan
4167096cb9 bpf: support nested rcu critical sections
Currently, nested rcu critical sections are rejected by the verifier and
rcu_lock state is managed by a boolean variable. Add support for nested
rcu critical sections by make active_rcu_locks a counter similar to
active_preempt_locks. bpf_rcu_read_lock() increments this counter and
bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition
happens when active_rcu_locks drops to 0.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 18:34:59 -08:00
Eduard Zingerman
8f7cf305a1 bpf: test the correct stack liveness of tail calls
A new test is added: caller_stack_write_tail_call tests that the live
stack is correctly tracked for a tail call.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 17:45:30 -08:00
Martin Teichmann
978da762ea bpf: test the proper verification of tail calls
Three tests are added:

- invalidate_pkt_pointers_by_tail_call checks that one can use the
  packet pointer after a tail call. This was originally possible
  and also poses not problems, but was made impossible by 1a4607ffba.

- invalidate_pkt_pointers_by_static_tail_call tests a corner case
  found by Eduard Zingerman during the discussion of the original fix,
  which was broken in that fix.

- subprog_result_tail_call tests that precision propagation works
  correctly across tail calls. This did not work before.

Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 17:45:30 -08:00
Xing Guo
b7f7d76d6e selftests/bpf: Update test_tag to use sha256
commit 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
changed digest of prog_tag to SHA256 but forgot to update tests
correspondingly. Fix it.

Fixes: 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:56:08 -08:00
Matt Bobrowski
ae24fc8a16 selftests/bpf: Improve reliability of test_perf_branches_no_hw()
Currently, test_perf_branches_no_hw() relies on the busy loop within
test_perf_branches_common() being slow enough to allow at least one
perf event sample tick to occur before starting to tear down the
backing perf event BPF program. With a relatively small fixed
iteration count of 1,000,000, this is not guaranteed on modern fast
CPUs, resulting in the test run to subsequently fail with the
following:

bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:PASS:output not valid 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
test_perf_branches_no_hw:PASS:perf_event_open 0 nsec
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_bad_sample:FAIL:output not valid no valid sample from prog
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.

On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1
million increments of a volatile integer can take significantly less
than 1 millisecond. If the spin loop and detachment of the perf event
BPF program elapses before the first 1 ms sampling interval elapses,
the perf event will never end up firing. Fix this by bumping the loop
iteration counter a little within test_perf_branches_common(), along
with ensuring adding another loop termination condition which is
directly influenced by the backing perf event BPF program
executing. Notably, a concious decision was made to not adjust the
sample_freq value as that is just not a reliable way to go about
fixing the problem. It effectively still leaves the race window open.

Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:49:16 -08:00
Matt Bobrowski
27746aaf1b selftests/bpf: skip test_perf_branches_hw() on unsupported platforms
Gracefully skip the test_perf_branches_hw subtest on platforms that
do not support LBR or require specialized perf event attributes
to enable branch sampling.

For example, AMD's Milan (Zen 3) supports BRS rather than traditional
LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW,
attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the
generic setup used within this test. Notably, it also probably doesn't
hold much value to special case perf event configurations for selected
micro architectures.

Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:47:58 -08:00
Puranjay Mohan
d8774a3623 selftests: bpf: Enable gotox tests from arm64
arm64 JIT now supports gotox instruction and jumptables, so run tests in
verifier_gotox.c for arm64.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:40:21 -08:00
Hoyeon Lee
db354a1577 selftests/bpf: Use sockaddr_storage instead of sa46 in select_reuseport test
The select_reuseport selftest uses a custom sa46 union to represent
IPv4 and IPv6 addresses. This custom wrapper requires extra manual
handling for address family and field extraction.

Replace sa46 with sockaddr_storage and update the helper functions to
operate on native socket structures. This simplifies the code and
removes unnecessary custom address-handling logic. No functional
changes intended.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
2025-11-21 10:46:31 -08:00
Hoyeon Lee
fd6ed07a05 selftests/bpf: Use sockaddr_storage directly in cls_redirect test
The cls_redirect test uses a custom addr_port/tuple wrapper to represent
IPv4/IPv6 addresses and ports. This custom wrapper requires extra
conversion logic and specific helpers such as fill_addr_port(), which
are no longer necessary when using standard socket address structures.

This commit replaces addr_port/tuple with the standard sockaddr_storage
so test handles address families and ports using native socket types.
It removes the custom helper, eliminates redundant casts, and simplifies
the setup helpers without functional changes. set_up_conn() and
build_input() now take src/dst sockaddr_storage directly.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
2025-11-21 10:46:01 -08:00
Yosry Ahmed
d2e50389ab KVM: selftests: Make sure vm->vpages_mapped is always up-to-date
Call paths leading to __virt_pg_map() are currently:
(a) virt_pg_map() -> virt_arch_pg_map() -> __virt_pg_map()
(b) virt_map_level() -> __virt_pg_map()

For (a), calls to virt_pg_map() from kvm_util.c make sure they update
vm->vpages_mapped, but other callers do not. Move the sparsebit_set()
call into virt_pg_map() to make sure all callers are captured.

For (b), call sparsebit_set_num() from virt_map_level().

It's tempting to have a single the call inside __virt_pg_map(), however:
- The call path in (a) is not x86-specific, while (b) is. Moving the
  call into __virt_pg_map() would require doing something similar for
  other archs implementing virt_pg_map().

- Future changes will reusue __virt_pg_map() for nested PTEs, which should
  not update vm->vpages_mapped, i.e. a triple underscore version that does
  not update vm->vpages_mapped would need to be provided.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-12-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-21 10:17:05 -08:00
Yosry Ahmed
1de4dc15ba KVM: selftests: Stop using __virt_pg_map() directly in tests
Replace __virt_pg_map() calls in tests by high-level equivalent
functions, removing some loops in the process.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-11-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-21 10:15:14 -08:00
Janosch Frank
8e8678e740 KVM: s390: Add capability that forwards operation exceptions
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation
exceptions to user space. This also includes the 0x0000 instructions
managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants
to emulate instructions which do not (yet) have an opcode.

While we're at it refine the documentation for
KVM_CAP_S390_USER_INSTR0.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-21 10:26:03 +01:00
Daniel Zahka
8be656cfb9 selftest: netdevsim: test devlink default params
Test querying default values and resetting to default values for
netdevsim devlink params.

This should cover the basic paths of interest: driverinit and
non-driverinit cmodes, as well as bool and non-bool value
type. Default param values of type bool are encoded with u8 netlink
type as opposed to flag type, so that userspace can distinguish
"not-present" from false.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-7-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 19:01:23 -08:00
Gustavo Luiz Duarte
5ad9945341 netconsole: Increase MAX_USERDATA_ITEMS
Increase MAX_USERDATA_ITEMS from 16 to 256 entries now that the userdata
buffer is allocated dynamically.

The previous limit of 16 was necessary because the buffer was statically
allocated for all targets. With dynamic allocation, we can support more
entries without wasting memory on targets that don't use userdata.

This allows users to attach more metadata to their netconsole messages,
which is useful for complex debugging and logging scenarios.

Also update the testcase accordingly.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-4-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:47:18 -08:00
Jakub Kicinski
bd28e5bddc selftests: net: remove old setup_* scripts
gro.sh and toeplitz.sh used to source in one of two setup scripts
depending on whether the test was expected to be run against
veth or a real device. veth testing is replaced by netdevsim
and existing "remote endpoint" support in our Python tests.
Add a script which sets up loopback mode.

The usage is a little bit more complicated than running
the scripts used to be. Testing used to work like this:

  ./../gro.sh -i eth0 ...

now the "setup script" has to be run explicitly:

  NETIF=eth0 ./../ksft_setup_loopback.sh ./../gro.sh

But the functionality itself is retained.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:31 -08:00
Jakub Kicinski
9cf9aa77a1 selftests: drv-net: hw: convert the Toeplitz test to Python
Rewrite the existing toeplitz.sh test in Python. The conversion
is a lot less exact than the GRO one. We use Netlink APIs to
get the device RSS and IRQ information. We expect that the device
has neither RPS nor RFS configured, and set RPS up as part of
the test.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:30 -08:00
Jakub Kicinski
fdb0267d56 selftests: drv-net: add a Python version of the GRO test
Rewrite the existing gro.sh test in Python. The conversion
not exact, the changes are related to integrating the test
with our "remote endpoint" paradigm. The test now reads
the IP addresses from the user config. It resolves the MAC
address (including running over Layer 3 networks).

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:30 -08:00
Jakub Kicinski
15011a57d0 selftests: net: py: read ip link info about remote dev
We're already saving the info about the local dev in env.dev
for the tests, save remote dev as well. This is more symmetric,
env generally provides the same info for local and remote end.

While at it make sure that we reliably get the detailed info
about the local dev. nsim used to read the dev info without -d.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
e02b52ecef selftests: net: py: support ksft ready without wait
There's a common synchronization problem when a script (Python test)
uses a C program to set up some state (usually start a receiving
process for traffic). The script needs to know when the process
has fully initialized. The inverse of the problem exists for shutting
the process down - we need a reliable way to tell the process to exit.

We added helpers to do this safely in
commit 7147713799 ("selftests: drv-net: add a way to wait for a local process")
unfortunately the two operations (wait for init, and shutdown) are
controlled by a single parameter (ksft_wait). Add support for using
ksft_ready without using the second fd for exit.

This is useful for programs which wait for a specific number of packets
to rx so exit_wait is a good match, but we still need to wait for init.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251120021024.2944527-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
89268f7dbc selftests: net: relocate gro and toeplitz tests to drivers/net
The GRO test can run on a real device or a veth.
The Toeplitz hash test can only run on a real device.
Move them from net/ to drivers/net/ and drivers/net/hw/ respectively.

There are two scripts which set up the environment for these tests
setup_loopback.sh and setup_veth.sh. Move those scripts to net/lib.
The paths to the setup files are a little ugly but they will be
deleted shortly.

toeplitz_client.sh is not a test in itself, but rather a helper
to send traffic, so add it to TEST_FILES rather than TEST_PROGS.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
173227d7d6 selftests: drv-net: xdp: use variants for qstat tests
Use just-added ksft variants for XDP qstat tests.

While at it correct the number of packets, we're sending
1000 packets now.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
6ae67f1159 selftests: net: py: add test variants
There's a lot of cases where we try to re-run the same code with
different parameters. We currently need to either use a generator
method or create a "main" case implementation which then gets called
by trivial case functions:

  def _test(x, y, z):
     ...

  def case_int():
     _test(1, 2, 3)

  def case_str():
     _test('a', 'b', 'c')

Add support for variants, similar to kselftests_harness.h and
a lot of other frameworks. Variants can be added as decorator
to test functions:

  @ksft_variants([(1, 2, 3), ('a', 'b', 'c')])
  def case(x, y, z):
     ...

ksft_run() will auto-generate case names:
  case.1_2_3
  case.a_b_c

Because the names may not always be pretty (and to avoid forcing
classes to implement case-friendly __str__()) add a wrapper class
KsftNamedVariant which lets the user specify the name for the variant.

Note that ksft_run's args are still supported. ksft_run splices args
and variant params together.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
80970e0fc0 selftests: net: py: extract the case generation logic
In preparation for adding test variants move the test case
collection logic to a dedicated function. New helper returns

 (function, args, name, )

tuples. The main test loop can simply run them, not much
logic or discernment needed.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:29 -08:00
Jakub Kicinski
5cb7b71b76 selftests: net: py: coding style improvements
We're about to add more features here and finding new issues with old
ones in place is hard. Address ruff checks:
 - bare exceptions
 - f-string with no params
 - unused import

We need to use BaseException when handling defer(), as Petr points out.
This retains the old behavior of ignoring SIGTERM while running cleanups.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 18:19:28 -08:00
Jim Mattson
6a8818de21 KVM: selftests: Add a VMX test for LA57 nested state
Add a selftest that verifies KVM's ability to save and restore
nested state when the L1 guest is using 5-level paging and the L2
guest is using 4-level paging. Specifically, canonicality tests of
the VMCS12 host-state fields should accept 57-bit virtual addresses.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-5-jmattson@google.com
[sean: rename to vmx_nested_la57_state_test to prep nested_<test> namespace]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:21:52 -08:00
Jim Mattson
ec5806639e KVM: selftests: Change VM_MODE_PXXV48_4K to VM_MODE_PXXVYY_4K
Use 57-bit addresses with 5-level paging on hardware that supports
LA57. Continue to use 48-bit addresses with 4-level paging on hardware
that doesn't support LA57.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-4-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Jim Mattson
2103a8baf5 KVM: selftests: Use a loop to walk guest page tables
Walk the guest page tables via a loop when searching for a PTE,
instead of using unique variables for each level of the page tables.

This simplifies the code and makes it easier to support 5-level paging
in the future.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Jim Mattson
ae5b498b8d KVM: selftests: Use a loop to create guest page tables
Walk the guest page tables via a loop when creating new mappings,
instead of using unique variables for each level of the page tables.

This simplifies the code and makes it easier to support 5-level paging
in the future.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:59 -08:00
Yosry Ahmed
ff736dba47 KVM: selftests: Remove the unused argument to prepare_eptp()
eptp_memslot is unused, remove it. No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-10-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:57 -08:00
Yosry Ahmed
28b2dced8b KVM: selftests: Stop hardcoding PAGE_SIZE in x86 selftests
Use PAGE_SIZE instead of 4096.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-9-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:56 -08:00
Yosry Ahmed
3c40777f0e KVM: selftests: Extend vmx_tsc_adjust_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-8-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:56 -08:00
Yosry Ahmed
91423b041d KVM: selftests: Extend nested_invalid_cr3_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-7-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:55 -08:00
Yosry Ahmed
4d256d00e4 KVM: selftests: Move nested invalid CR3 check to its own test
vmx_tsc_adjust_test currently verifies that a nested VMLAUNCH fails with
an invalid CR3. This is irrelevant to TSC scaling, move it to a
standalone test.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-6-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
e6bcdd2122 KVM: selftests: Extend vmx_nested_tsc_scaling_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
0a9eb2afa1 KVM: selftests: Extend vmx_close_while_nested_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-4-yosry.ahmed@linux.dev
[sean: rename to "nested_close_kvm_test" to provide nested_* sorting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:53 -08:00
Ping Cheng
10c64d4ff4 selftests/hid-tablet: add ABS_DISTANCE test for stylus/pen
For pen and stylus, the ABS_Z event reports ABS_DISTANCE values
in the hid generic kernel driver. This test is to make sure that
the assignment is properly done for all pen and stylus tools.
Same as tilt, distance is an optional event.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Tatsunosuke Tobit <tatsunosuke.tobita@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-11-20 22:47:14 +01:00
Lorenzo Stoakes
c7ba92bcfe testing/selftests/mm: add soft-dirty merge self-test
Assert that we correctly merge VMAs containing VM_SOFTDIRTY flags now that
we correctly handle these as sticky.

In order to do so, we have to account for the fact the pagemap interface
checks soft dirty PTEs and additionally that newly merged VMAs are marked
VM_SOFTDIRTY.

We do this by using use unfaulted anon VMAs, establishing one and clearing
references on that one, before establishing another and merging the two
before checking that soft-dirty is propagated as expected.

We check that this functions correctly with mremap() and mprotect() as
sample cases, because VMA merge of adjacent newly mapped VMAs will
automatically be made soft-dirty due to existing logic which does so.

We are therefore exercising other means of merging VMAs.

Link: https://lkml.kernel.org/r/d5a0f735783fb4f30a604f570ede02ccc5e29be9.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
Lorenzo Stoakes
6707915e03 mm: propagate VM_SOFTDIRTY on merge
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2.

Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().

However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.

Now we have the concept of making VMA flags 'sticky', that is that they
both don't prevent merge and, importantly, are propagated to merged VMAs,
this seems a sensible alternative to the existing special-casing of
VM_SOFTDIRTY.

We additionally add a self-test that demonstrates that this logic behaves
as expected.


This patch (of 2):

Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().

However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.

This is because currently we simply ignore VM_SOFTDIRTY for the purposes
of merge, so one VMA may possess the flag and another not, and whichever
happens to be the target VMA will be the one upon which the merge is
performed which may or may not have VM_SOFTDIRTY set.

Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one
which solves this issue.

Additionally update VMA userland tests to propagate changes.

[akpm@linux-foundation.org: update comments, per Lorenzo]
  Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local
Link: https://lkml.kernel.org/r/cover.1763399675.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Andrey Vagin <avagin@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
675774adbe selftests/damon/sysfs.py: merge DAMON status dumping into commitment assertion
For each test case, sysfs.py makes changes to DAMON, dumps DAMON internal
status and asserts the expectation is met.  The dumping part should be the
same for all cases, so it is duplicated for each test case.  Which means
it is easy to make mistakes.  Actually a few of those duplicates are not
turning DAMON off in case of the dumping failure.  It makes following
selftests that need to turn DAMON on fails with -EBUSY.  Merge the status
dumping into commitment assertion with proper dumping failure handling, to
deduplicate and avoid the unnecessary following tests failures.

Link: https://lkml.kernel.org/r/20251112154114.66053-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
SeongJae Park
53298afe45 mm/damon: rename damos->filters to damos->core_filters
DAMOS filters that are handled by the ops layer are linked to
damos->ops_filters.  Owing to the ops_ prefix on the name, it is easy to
understand it is for ops layer handled filters.  The other types of
filters, which are handled by the core layer, are linked to
damos->filters.  Because of the name, it is easy to confuse the list is
there for not only core layer handled ones but all filters.  Avoid such
confusions by renaming the field to core_filters.

Link: https://lkml.kernel.org/r/20251112154114.66053-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:01 -08:00
Mehdi Ben Hadj Khelifa
1ec5d5810b selftests/mm/uffd: remove static address usage in shmem_allocate_area()
The current shmem_allocate_area() implementation uses a hardcoded virtual
base address (BASE_PMD_ADDR) as a hint for mmap() when creating
shmem-backed test areas.  This approach is fragile and may fail on systems
with ASLR or different virtual memory layouts, where the chosen address is
unavailable.

Replace the static base address with a dynamically reserved address range
obtained via mmap(NULL, ..., PROT_NONE).  The memfd-backed areas and their
alias are then mapped into that reserved region using MAP_FIXED,
preserving the original layout and aliasing semantics while avoiding
collisions with unrelated mappings.

This change improves robustness and portability of the test suite without
altering its behavior or coverage.

[mehdi.benhadjkhelifa@gmail.com: make cleanup code more clear, per Mike]
  Link: https://lkml.kernel.org/r/20251113142050.108638-1-mehdi.benhadjkhelifa@gmail.com
Link: https://lkml.kernel.org/r/20251111205739.420009-1-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
Matthew Wilcox (Oracle)
2197bb60f8 mm: add vma_start_write_killable()
Patch series "vma_start_write_killable"", v2.

When we added the VMA lock, we made a major oversight in not adding a
killable variant.  That can run us into trouble where a thread takes the
VMA lock for read (eg handling a page fault) and then goes out to lunch
for an hour (eg doing reclaim).  Another thread tries to modify the VMA,
taking the mmap_lock for write, then attempts to lock the VMA for write. 
That blocks on the first thread, and ensures that every other page fault
now tries to take the mmap_lock for read.  Because everything's in an
uninterruptible sleep, we can't kill the task, which makes me angry.

This patchset just adds vma_start_write_killable() and converts one caller
to use it.  Most users are somewhat tricky to convert, so expect follow-up
individual patches per call-site which need careful analysis to make sure
we've done proper cleanup.


This patch (of 2):

The vma can be held read-locked for a substantial period of time, eg if
memory allocation needs to go into reclaim.  It's useful to be able to
send fatal signals to threads which are waiting for the write lock.

Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Lorenzo Stoakes
c0ae966fac tools/testing/selftests/mm: add smaps visibility guard region test
Assert that we observe guard regions appearing in /proc/$pid/smaps as
expected, and when split/merge is performed too (with expected sticky
behaviour).

Also add handling for file systems which don't sanely handle mmap() VMA
merging so we don't incorrectly encounter a test failure in this
situation.

Link: https://lkml.kernel.org/r/059e62b8c67e55e6d849878206a95ea1d7c1e885.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
89330ec897 tools/testing/selftests/mm: add MADV_COLLAPSE test case
To ensure the retract_page_tables() logic functions correctly with the
introduction of VM_MAYBE_GUARD, add a test to assert that madvise collapse
fails when guard regions are established in the collapsed range in all
cases.

Unfortunately we cannot differentiate between e.g. 
CONFIG_READ_ONLY_THP_FOR_FS not being set vs.  a file-backed VMA having
collapse correctly disallowed, so in each instance we will get an assert
pass here.

We add an additional check to see whether guard regions are preserved
across collapse in case of a bug causing the collapse to succeed, which
will give us more data to debug with should this occur in future.

Link: https://lkml.kernel.org/r/0748beeb864525b8ddfa51adad7128dd32eb3ac4.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
29bef05e6d tools/testing/vma: add VMA sticky userland tests
Modify existing merge new/existing userland VMA tests to assert that
sticky VMA flags behave as expected.

We do so by generating every possible permutation of VMAs being
manipulated being sticky/not sticky and asserting that VMA flags with this
property retain are retained upon merge.

Link: https://lkml.kernel.org/r/5e2c7244485867befd052f8afc8188be6a4be670.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
ab04b530e7 mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
Gather all the VMA flags whose presence implies that page tables must be
copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this
rather than specifying individual flags in vma_needs_copy().

We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies
that there may be metadata contained in the page tables (that is - guard
markers) which would will not and cannot be propagated upon fork.

This was already being done manually previously in vma_needs_copy(), but
this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and
VM_UFFD_WP all of which imply the same.

Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too
- because equally a flag being VM_STICKY indicates that the VMA contains
metadat that is not propagated by being faulted in - i.e.  that the VMA
metadata does not fully describe the VMA alone, and thus we must propagate
whatever metadata there is on a fork.

However, for maximum flexibility, we do not make this necessarily the case
here.

Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
64212ba02e mm: implement sticky VMA flags
It is useful to be able to designate that certain flags are 'sticky', that
is, if two VMAs are merged one with a flag of this nature and one without,
the merged VMA sets this flag.

As a result we ignore these flags for the purposes of determining VMA flag
differences between VMAs being considered for merge.

This patch therefore updates the VMA merge logic to perform this action,
with flags possessing this property being described in the VM_STICKY
bitmap.

Those flags which ought to be ignored for the purposes of VMA merge are
described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also
updated to use.

As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
already had this behaviour, alongside VM_STICKY as sticky flags by
implication must not disallow merge.

Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its
own right, but this change is out of scope for this series.

The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result
of this change, once the VMA flag is set upon guard region installation,
VMAs with guard ranges will now not have their merge behaviour impacted as
a result and can be freely merged with other VMAs without VM_MAYBE_GUARD
set.

Also update the comments for vma_modify_flags() to directly reference
sticky flags now we have established the concept.

We also update the VMA userland tests to account for the changes.

Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
9119d6c209 mm: update vma_modify_flags() to handle residual flags, document
The vma_modify_*() family of functions each either perform splits, a merge
or no changes at all in preparation for the requested modification to
occur.

When doing so for a VMA flags change, we currently don't account for any
flags which may remain (for instance, VM_SOFTDIRTY) despite the requested
change in the case that a merge succeeded.

This is made more important by subsequent patches which will introduce the
concept of sticky VMA flags which rely on this behaviour.

This patch fixes this by passing the VMA flags parameter as a pointer and
updating it accordingly on merge and updating callers to accommodate for
this.

Additionally, while we are here, we add kdocs for each of the
vma_modify_*() functions, as the fact that the requested modification is
not performed is confusing so it is useful to make this abundantly clear.

We also update the VMA userland tests to account for this change.

Link: https://lkml.kernel.org/r/23b5b549b0eaefb2922625626e58c2a352f3e93c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Lorenzo Stoakes
5dba5cc2e0 mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.

Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.

This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.

This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.

The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.

It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.

To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:

a. if set in one VMA and not in another still permit those VMAs to be
   merged (if otherwise compatible).

b. When they are merged, the resultant VMA must have the flag set.

The VMA logic is updated to propagate these flags correctly.

Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.

We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.

Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.

The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.

This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.

Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.


This patch (of 9):

Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).

Since commit 8e2f2aeb8b ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.

This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.

This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).

We utilise 0x800 for this flag which makes it available to 32-bit
architectures also, a flag that was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5 ("mm: remove VM_DENYWRITE") and hasn't
bee reused yet.

We also update the smaps logic and documentation to identify these VMAs.

Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.

We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.

Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
Kefeng Wang
340b59816b mm: kill mm_wr_locked from unmap_vmas() and unmap_single_vma()
Kill mm_wr_locked since commit f8e97613fe ("mm: convert VM_PFNMAP
tracking to pfnmap_track() + pfnmap_untrack()") remove the user.

Link: https://lkml.kernel.org/r/20251104085709.2688433-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Ankit Khushwaha
3b12a53b64 selftest/mm: fix pointer comparison in mremap_test
Pointer arthemitic with 'void * addr' and 'ulong dest_alignment' triggers
following warning:

mremap_test.c:1035:31: warning: pointer comparison always evaluates to
false [-Wtautological-compare]
 1035 |                 if (addr + c.dest_alignment < addr) {
      |                                             ^

this warning is raised from clang version 20.1.8 (Fedora 20.1.8-4.fc42).

use 'void *tmp_addr' to do the pointer arthemitic.

Link: https://lkml.kernel.org/r/20251108161829.25105-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:57 -08:00
Matt Bobrowski
d088da9042 selftests/bpf: Use ASSERT_STRNEQ to factor in long slab cache names
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab
cache names parsed out from /proc/slabinfo against those stored within
struct kmem_cache_result. The current problem is that the slab cache
name within struct kmem_cache_result is stored within a bounded
fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name
parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ()
can certainly lead to test failures, particularly when dealing with
slab cache names that are longer than SLAB_NAME_MAX(32)
bytes. Notably, kmem_cache_create() allows callers to create slab
caches with somewhat arbitrarily sized names via its __name identifier
argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place
now can certainly happen.

Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only
checking up to sizeof(struct kmem_cache_result.name) - 1 using
ASSERT_STRNEQ().

Fixes: a496d0cdc8 ("selftests/bpf: Add a test for kmem_cache_iter")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
2025-11-20 09:26:06 -08:00
Jakub Kicinski
9e203721ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc7).

No conflicts, adjacent changes:

tools/testing/selftests/net/af_unix/Makefile
  e1bb28bf13 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
  45a1cd8346 ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 09:13:26 -08:00
Linus Torvalds
8e621c9a33 Including fixes from IPsec and wireless.
Previous releases - regressions:
 
  - prevent NULL deref in generic_hwtstamp_ioctl_lower(),
    newer APIs don't populate all the pointers in the request
 
  - phylink: add missing supported link modes for the fixed-link
 
  - mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
 
 Previous releases - always broken:
 
  - openvswitch: remove never-working support for setting NSH fields
 
  - xfrm: number of fixes for error paths of xfrm_state creation/
    modification/deletion
 
  - xfrm: fixes for offload
    - fix the determination of the protocol of the inner packet
    - don't push locally generated packets directly to L2 tunnel
      mode offloading, they still need processing from the standard
      xfrm path
 
  - mptcp: fix a couple of corner cases in fallback and fastclose
    handling
 
  - wifi: rtw89: hw_scan: prevent connections from getting stuck,
    work around apparent bug in FW by tweaking messages we send
 
  - af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
 
  - veth: more robust handing of race to avoid txq getting stuck
 
  - eth: ps3_gelic_net: handle skb allocation failures
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkfQv8ACgkQMUZtbf5S
 IruptQ//aa3wFISt/xfCxaygNnQAL1gxWbKg15cLnv88S5r5wrh+Mf/r+qHkT0IG
 qaxPToztcTKNGzRqMI/7x/OEXjCdfEmJtNcb5JfYRxGkYVcje81bhHrxZN+nLLcj
 jx1xD7rXCjfFNDVFSl+OzQkoTZtfGphX+JgCTKLPZ8t4qR/UhU1IwYgpoaiK0uCT
 gitXJPHrDGJLHud0ilaXbn+q0ZH9ZU08QSRhsaAq7C6YPafUseEr2fuQYgFr1kkp
 mtzFm84bEnRSneQ5+noVzoc5llbu3vf3Wd9Y5tr5sBaBjh5OpH/kK0kuPy6Lzhvd
 8y05jl8kyASMtRBoTvmpYOdi3xUNbge5AwJN3FIo4KPCmyr1bQoNoQVIivUpoiGz
 Zvv7QcceP7E5057CRuhi06krld/QaScHkmhUododbqJKmL8NWaSoXLRO7oYjB4kU
 1MuAwOJWVpsqnTUEbMw7dHJPZFTqRkLYv6oAmIqb/AWzTcqg9OXjLEc+OzUqWEl6
 mHv/SonSjAM81m3GxCxtNtH2ErKwdMIgI/ado39+KvksKTH1bhN9ZpmuyyktzpCj
 PBn9FVdd7zyct92u5xXtsKVkDWKGyD419Z2lFHC0FaAsNQmPrSqzi+U50pAtie5M
 JWABm8hsSdia4dTDrdldMgM50dzuBd5nSceY7XCqPA8nZ5+Af/8=
 =89Xl
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPsec and wireless.

  Previous releases - regressions:

   - prevent NULL deref in generic_hwtstamp_ioctl_lower(),
     newer APIs don't populate all the pointers in the request

   - phylink: add missing supported link modes for the fixed-link

   - mptcp: fix false positive warning in mptcp_pm_nl_rm_addr

  Previous releases - always broken:

   - openvswitch: remove never-working support for setting NSH fields

   - xfrm: number of fixes for error paths of xfrm_state creation/
     modification/deletion

   - xfrm: fixes for offload
      - fix the determination of the protocol of the inner packet
      - don't push locally generated packets directly to L2 tunnel
        mode offloading, they still need processing from the standard
        xfrm path

   - mptcp: fix a couple of corner cases in fallback and fastclose
     handling

   - wifi: rtw89: hw_scan: prevent connections from getting stuck,
     work around apparent bug in FW by tweaking messages we send

   - af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait

   - veth: more robust handing of race to avoid txq getting stuck

   - eth: ps3_gelic_net: handle skb allocation failures"

* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  vsock: Ignore signal/timeout on connect() if already established
  be2net: pass wrb_params in case of OS2BMC
  l2tp: reset skb control buffer on xmit
  net: dsa: microchip: lan937x: Fix RGMII delay tuning
  selftests: mptcp: add a check for 'add_addr_accepted'
  mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
  selftests: mptcp: join: userspace: longer timeout
  selftests: mptcp: join: endpoints: longer timeout
  selftests: mptcp: join: fastclose: remove flaky marks
  mptcp: fix duplicate reset on fastclose
  mptcp: decouple mptcp fastclose from tcp close
  mptcp: do not fallback when OoO is present
  mptcp: fix premature close in case of fallback
  mptcp: avoid unneeded subflow-level drops
  mptcp: fix ack generation for fallback msk
  wifi: rtw89: hw_scan: Don't let the operating channel be last
  net: phylink: add missing supported link modes for the fixed-link
  selftest: af_unix: Add test for SO_PEEK_OFF.
  af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
  net/mlx5: Clean up only new IRQ glue on request_irq() failure
  ...
2025-11-20 08:52:07 -08:00
Takashi Iwai
2e90ff5462 Merge branch 'for-linus' into for-next
Pull 6.18-devel branch for applying the further HD-audio fixups for HP.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-20 09:49:30 +01:00
Gang Yan
0eee0fdf9b selftests: mptcp: add a check for 'add_addr_accepted'
The previous patch fixed an issue with the 'add_addr_accepted' counter.
This was not spot by the test suite.

Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add
signal' test. This should help spotting similar regressions later on.
These counters are crucial for ensuring the MPTCP path manager correctly
handles the subflow creation via 'ADD_ADDR'.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:07:16 -08:00
Matthieu Baerts (NGI0)
0e4ec14dc1 selftests: mptcp: join: userspace: longer timeout
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.

To play it safe, all userspace tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.

The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.

Fixes: 290493078b ("selftests: mptcp: join: userspace: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:07:15 -08:00
Matthieu Baerts (NGI0)
fb13c6bb81 selftests: mptcp: join: endpoints: longer timeout
In rare cases, when the test environment is very slow, some endpoints
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.

To play it safe, all endpoints tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.

The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.

Fixes: 6457595db9 ("selftests: mptcp: join: endpoints: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:07:15 -08:00
Matthieu Baerts (NGI0)
efff6cd53a selftests: mptcp: join: fastclose: remove flaky marks
After recent fixes like the parent commit, and "selftests: mptcp:
connect: trunc: read all recv data", the two fastclose subtests no
longer look flaky any more.

It then feels fine to remove these flaky marks, to no longer ignore
these subtests in case of errors.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-7-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:07:15 -08:00
Masami Hiramatsu (Google)
a2f7990d33 selftests: tracing: Update fprobe selftest for ftrace based fprobe
Since the ftrace fprobe is both fgraph and ftrace based implemented,
the selftest needs to be updated. This does not count the actual
number of lines, but just check the differences.

Link: https://lore.kernel.org/r/176295318112.431538.11780280333728368327.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:55:14 -07:00
Masami Hiramatsu (Google)
a1ca238936 selftests: tracing: Add tprobe enable/disable testcase
Commit 2867495dea ("tracing: tprobe-events: Register tracepoint when
enable tprobe event") caused regression bug and tprobe did not work.
To prevent similar problems, add a testcase which enables/disables a
tprobe and check the results.

Link: https://lore.kernel.org/r/176252610176.214996.3978515319000806265.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:55:08 -07:00
Brendan Jackman
d9e6269e33 selftests/run_kselftest.sh: exit with error if tests fail
Parsing KTAP is quite an inconvenience, but most of the time the thing
you really want to know is "did anything fail"?

Let's give the user the his information without them needing
to parse anything.

Because of the use of subshells and namespaces, this needs to be
communicated via a file. Just write arbitrary data into the file and
treat non-empty content as a signal that something failed.

In case any user depends on the current behaviour, such as running this
from a script with `set -e` and parsing the result for failures
afterwards, add a flag they can set to get the old behaviour, namely
--no-error-on-fail.

Link: https://lore.kernel.org/r/20251111-b4-ksft-error-on-fail-v3-1-0951a51135f6@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:00:22 -07:00
Zhang Chujun
26347f8443 selftests/dma: fix invalid array access in printf
The printf statement attempts to print the DMA direction string using
the syntax 'dir[directions]', which is an invalid array access. The
variable 'dir' is an integer, and 'directions' is a char pointer array.
This incorrect syntax should be 'directions[dir]', using 'dir' as the
index into the 'directions' array. Fix this by correcting the array
access from 'dir[directions]' to 'directions[dir]'.

Link: https://lore.kernel.org/r/20251104025234.2363-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-19 15:00:14 -07:00
Maximilian Dittgen
85f329df29 KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stress
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to
map interrupt events to ITT entries and collection IDs to
redistributors, respectively.

We have no guarantee that the ITS will finish handling these mapping
commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the
guest. If LPIs are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.

Fix by adding a SYNC command to the selftests ITS library, then calling
SYNC after ITS mapping to ensure mapping completes before signal_lpi()
writes to GITS_TRANSLATER.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:38:59 -08:00
Maximilian Dittgen
31df012da4 KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU number
The selftests GIC library and tests assume that the
GICR_TYPER.Processor_number associated with a given CPU is the same as
the CPU's selftest index.

Since this assumption is not guaranteed by specification, add an assert
in gicv3_cpu_init() that validates this is true.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19 12:38:59 -08:00
Yao Zihong
a131fd6079 selftests/riscv: Add Zicbop prefetch test
Add selftests to cbo.c to verify Zicbop extension behavior, and split
the previous `--sigill` mode into two options so they can be tested
independently.

The test checks:
- That hwprobe correctly reports Zicbop presence and block size.
- That prefetch instructions execute without exception on valid and NULL
  addresses when Zicbop is present.

Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://patch.msgid.link/20251118162436.15485-3-zihong.plct@isrc.iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:29 -07:00
Yong-Xuan Wang
f0ae09a892 selftests: riscv: Add test for the Vector ptrace interface
Add a test case that does some basic verification of the Vector ptrace
interface. This forks a child process then using ptrace to inspect and
manipulate the v31 register of the child.

Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251013091318.467864-3-yongxuan.wang@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-11-19 09:19:28 -07:00
Fernando Fernandez Mancera
d7dbda8789 selftests: fib_tests: add fib6 from ra to static test
The new test checks that a route that has been promoted from RA-learned
to static does not switch back when a new RA message arrives. In
addition, it checks that the route is owned by RA again when the static
address is removed.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20251115095939.6967-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18 19:28:08 -08:00
Kuniyuki Iwashima
e1bb28bf13 selftest: af_unix: Add test for SO_PEEK_OFF.
The test covers various cases to verify SO_PEEK_OFF behaviour
for all AF_UNIX socket types.

two_chunks_blocking and two_chunks_overlap_blocking reproduce
the issue mentioned in the previous patch.

Without the patch, the two tests fail:

  #  RUN           so_peek_off.stream.two_chunks_blocking ...
  # so_peek_off.c:121:two_chunks_blocking:Expected 'bbbb' == 'aaaabbbb'.
  # two_chunks_blocking: Test terminated by assertion
  #          FAIL  so_peek_off.stream.two_chunks_blocking
  not ok 3 so_peek_off.stream.two_chunks_blocking

  #  RUN           so_peek_off.stream.two_chunks_overlap_blocking ...
  # so_peek_off.c:159:two_chunks_overlap_blocking:Expected 'bbbb' == 'aaaabbbb'.
  # two_chunks_overlap_blocking: Test terminated by assertion
  #          FAIL  so_peek_off.stream.two_chunks_overlap_blocking
  not ok 5 so_peek_off.stream.two_chunks_overlap_blocking

With the patch, all tests pass:

  # PASSED: 15 / 15 tests passed.
  # Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251117174740.3684604-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18 19:19:09 -08:00
Dave Jiang
ea5514e300 Merge branch 'for-6.19/cxl-misc' into cxl-for-next
- Remove ret_limit race condition in mock_get_event()
- Assign overflow_err_count from log->nr_overflow
2025-11-18 16:27:58 -07:00
Alison Schofield
f1840efdb2 cxl/test: Assign overflow_err_count from log->nr_overflow
mock_get_event() uses an uninitialized local variable, nr_overflow, to
populate the overflow_err_count field. That results in incorrect
overflow_err_count values in mocked cxl_overflow trace events, such as
this case where the records are reported as 0 and should be non-zero:

[] cxl_overflow: memdev=mem7 host=cxl_mem.6 serial=7: log=Failure : 0 records from 1763228189130895685 to 1763228193130896180

Fix by using log->nr_overflow and remove the unused local variable.

A follow-up change was considered in cxl_mem_get_records_log() to
confirm that the overflow_err_count is non-zero when the overflow flag
is set [1]. Since the driver has no functional dependency on this
constraint, and a device that violates this specific requirement does
not cause incorrect driver behavior, no validation check is added.

[1] CXL 3.2, Table 8-65 Get Event Records Output Payload

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251116013036.1713313-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-18 16:21:57 -07:00
Alison Schofield
b6369daf0d cxl/test: Remove ret_limit race condition in mock_get_event()
Commit 364ee9f326 ("cxl/test: Enhance event testing") changed the
loop iterator in mock_get_event() from a static constant,
CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
intent was to vary the number of events returned per call to simulate
events occurring while logs are being read.

However, ret_limit is modified without synchronization. When multiple
threads call mock_get_event() concurrently, one thread may read
ret_limit, another thread may increment it, and the first thread's
loop condition and size calculation see and use the updated value.

This is visible during cxl_test module load when all memdevs are
initializing simultaneously, which includes getting event records. It
is not tied to the cxl-events.sh unit test specifically, as that
operates on a single memdev.

While no actual harm results (the buffer is always large enough and
the record count fields correctly reflect what was written), this is
a correctness issue. The race creates an inconsistent state within
mock_get_event() and adding variability based on a race appears
unintended.

Make ret_limit a local variable populated from an atomic counter. Each
call gets a stable value that won't change during execution. That
preserves the intended behavior of varying the return counts across
calls while eliminating the race condition.

This implementation uses "+ 1" to produce the full range of 1 to
CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
produced.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251116013819.1713780-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-18 16:19:48 -07:00
Hoyeon Lee
ec12ab2cda selftests/bpf: Replace TCP CC string comparisons with bpf_strncmp
The connect4_prog and bpf_iter_setsockopt tests duplicate the same
open-coded TCP congestion control string comparison logic. Since
bpf_strncmp() provides the same functionality, use it instead to
avoid repeated open-coded loops.

This change applies only to functional BPF tests and does not affect
the verifier performance benchmarks (veristat.cfg). No functional
changes intended.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
2025-11-18 14:57:53 -08:00
Hoyeon Lee
f700b37314 selftests/bpf: Move common TCP helpers into bpf_tracing_net.h
Some BPF selftests contain identical copies of the min(), max(),
before(), and after() helpers. These repeated snippets are the same
across the tests and do not need to be defined separately.

Move these helpers into bpf_tracing_net.h so they can be shared by
TCP related BPF programs. This removes repeated code and keeps the
helpers in a single place.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
2025-11-18 14:57:45 -08:00
Dave Jiang
b5bea8cee5 Merge branch 'for-6.19/cxl-misc' into cxl-for-next
- remove unused mock function for cxl_rcd_component_reg_phys()
2025-11-18 15:41:53 -07:00
Alejandro Lucero
26c5b0d9c0 cxl/test: remove unused mock function for cxl_rcd_component_reg_phys()
Since commit 733b57f262 ("cxl/pci: Early setup RCH dport component registers from RCRB")
is not necessary under mocking tests.

[ dj: Fixup commit representation flagged by checkpatch. ]
[ dj: Ammend subject line to indicate which function. ]

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20251118182202.2083244-1-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-18 15:33:50 -07:00
Sohil Mehta
c9129cf0f0 selftests/x86: Update the negative vsyscall tests to expect a #GP
Some of the vsyscall selftests expect a #PF when vsyscalls are disabled.
However, with LASS enabled, an invalid access results in a SIGSEGV due
to a #GP instead of a #PF. One such negative test fails because it is
expecting X86_PF_INSTR to be set.

Update the failing test to expect either a #GP or a #PF. Also, update
the printed messages to show the trap number (denoting the type of
fault) instead of assuming a #PF.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-8-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
Sunday Adelodun
45a1cd8346 selftests: af_unix: Add tests for ECONNRESET and EOF semantics
Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes.
The tests verify that:

 1. SOCK_STREAM returns EOF when the peer closes normally.
 2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
 3. SOCK_SEQPACKET returns EOF when the peer closes normally.
 4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
 5. SOCK_DGRAM does not return ECONNRESET when the peer closes.

This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.

Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Link: https://patch.msgid.link/20251113112802.44657-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-18 11:33:17 +01:00
Ido Schimmel
bed22c7b90 selftests: net: lib: Do not overwrite error messages
ret_set_ksft_status() calls ksft_status_merge() with the current return
status and the last one. It treats a non-zero return code from
ksft_status_merge() as an indication that the return status was
overwritten by the last one and therefore overwrites the return message
with the last one.

Currently, ksft_status_merge() returns a non-zero return code even if
the current return status and the last one are equal. This results in
return messages being overwritten which is counter-productive since we
are more interested in the first failure message and not the last one.

Fix by changing ksft_status_merge() to only return a non-zero return
code if the current return status was actually changed.

Add a test case which checks that the first error message is not
overwritten.

Before:

 # ./lib_sh_test.sh
 [...]
 TEST: RET tfail2 tfail -> fail                                      [FAIL]
        retmsg=tfail expected tfail2
 [...]
 # echo $?
 1

After:

 # ./lib_sh_test.sh
 [...]
 TEST: RET tfail2 tfail -> fail                                      [ OK ]
 [...]
 # echo $?
 0

Fixes: 596c8819cb ("selftests: forwarding: Have RET track kselftest framework constants")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251116081029.69112-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:32:12 -08:00
Matthieu Baerts (NGI0)
eea2f44870 selftests: mptcp: get stats just before timing out
Recently, some debugging happened around a test that was timing out. The
stats were showing connections being closed which was confusing because
the closing state was caused by the timeout stopping the transfer.

To avoid such confusion, the timeout is no longer done per mptcp_connect
process, but separately. In case of timeout, the stats are now printed,
then the apps are killed.

The stats will still be printed after the kill, but that's fine, and
this might even be useful, just in case. Timeout should be exceptional.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-8-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:47 -08:00
Matthieu Baerts (NGI0)
39348f5f2f selftests: mptcp: wait for port instead of sleep
After having started mptcp_connect in listening mode,
'mptcp_lib_wait_local_port_listen' can be used to wait for the listening
socket to be ready.

This is better than using the 'sleep' command, not to pause for a fixed
amount of time, but waiting for an event. This helper is used in all
other MPTCP selftests, but not in these two.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-7-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:47 -08:00
Matthieu Baerts (NGI0)
8c1fe0a500 selftests: mptcp: connect: avoid double packet traces
When the same netns is used for the listener and the connector, no need
to take exactly the same packet trace twice, one is enough.

This avoids confusions when the traces are the same, and wasting
resources which might not help reproducing an issue.

While at it, avoid long lines and double spaces now that these lines are
no longer aligned.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-6-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:47 -08:00
Matthieu Baerts (NGI0)
71388a9f33 selftests: mptcp: lib: get counters from nstat history
Before, 'nstat' was used to retrieve each individual counter: this means
querying 4 different sources from /proc/net and iterating over 100+
counters each time. Instead, the stats could be retrieved once, and the
output file could be parsed for each counter. Even better, such file is
already present: the nstat history file.

To be able to get this working, the nstat history file also needs to
contains zero counters too, so it is still possible to know if a counter
is missing or set to 0.

This also simplifies mptcp_connect.sh: instead of checking multiple
counters before and after a test to compute the difference, the stats
history files can be reset before each test, and nstat can display only
the difference.

mptcp_lib_get_counter() continues to work when no history file is
available: by fetching nstat directly, like before. This is the case in
diag.sh and userspace_pm.sh where there is no need to save the history
file. This is also the case in mptcp_join.sh, when 'run_tests' is
executed in the background: easier to continue fetching counters than
updating the history each time it is needed.

Note: 'nstat' is called with '-s' in mptcp_lib_nstat_get(), so this
helper can be called multiple times during the test if needed.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-5-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:47 -08:00
Matthieu Baerts (NGI0)
658e531417 selftests: mptcp: join: dump stats from history
In case of errors, dump the stats from history instead of using nstat.

There are multiple advantages to that:

- The same filters from pr_err_stats are used, e.g. the unused 'rate'
  column is not displayed.

- The counters are closer to the ones from when the test stopped.

- While at it, the errors can be better presented: error colours, a
  small indentation to distinguish the different parts, extra new lines.

Even if it should only happen in rare cases -- internal errors, or netns
issues -- if no history is available, 'nstat' is used like before, just
in case.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-4-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:46 -08:00
Matthieu Baerts (NGI0)
2e6daf6b9b selftests: mptcp: lib: stats: remove nstat rate columns
With the MPTCP selftests, the nstat daemon is not used. It means that
the last column (the rate) is always 0.0, and that's not something
interesting to display.

Then, this last column can be filtered out.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-3-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:46 -08:00
Matthieu Baerts (NGI0)
a89fc262b6 selftests: mptcp: lib: remove stats files args
Now that these files are written from MPTCP lib helpers, the stats file
paths are uniformed. Then, no need to specify them from the each
selftest.

No behavioural changes intended.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-2-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:46 -08:00
Matthieu Baerts (NGI0)
d3305c016a selftests: mptcp: lib: introduce 'nstat_{init,get}'
These new helpers are easier to read than the long and multi lines
commands. Plus it will ease the addition of new features related to that
in the next commits.

No behavioural changes intended.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-1-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:27:46 -08:00
Mark Brown
a0245b42f8 kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
On a system which support SME but not SVE we can now disable streaming mode
via ptrace by writing FPSIMD formatted data through NT_ARM_SVE with a VL of
0. Extend fp-ptrace to cover rather than skip these cases, relax the check
for SVE writes of FPSIMD format data to not skip if SME is supported and
accept 0 as the VL when performing the ptrace write.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-17 20:11:54 +00:00
Mark Brown
eb9df6d69a kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
In order to allow exiting streaming mode on systems with SME but not SVE
we allow writes of FPSIMD format data via NT_ARM_SVE even when SVE is not
supported, add a test case that covers this to sve-ptrace.

We do not support reads.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-17 20:11:54 +00:00
Dave Jiang
7ec9db66cc Merge branch 'for-6.19/cxl-elc-test' into cxl-for-next
Extended linear cache unit testing support
  - Standardize CXL auto region size
  - Add cxl_test CFMWS support for extended linear cache
  - Add support for acpi extended linear cache
2025-11-17 11:05:47 -07:00
Dave Jiang
68f4a852e1 cxl/test: Add support for acpi extended linear cache
Add the mock wrappers for hmat_get_extended_linear_cache_size() in order
to emulate the ACPI helper function for the regions that are mock'd by
cxl_test.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17 09:46:44 -07:00
Dave Jiang
4b1c0466c8 cxl/test: Add cxl_test CFMWS support for extended linear cache
Add a module parameter to allow activation of extended linear cache
on the auto region for cxl_test. The current platform implementation
for extended linear cache is 1:1 of DRAM and CXL memory. A CFMWS is
created with the size of both memory together where DRAM takes the
first part of the memory range and CXL covers the second part. The
current CXL auto region on cxl_test consists of 2 256M devices that
creates a 512M region. The new extended linear cache setup will have
512M DRAM and 512M CXL memory for a total of 1G CFMWS. The hardware
decoders must have their starting offset moved to after the DRAM region
to handle the CXL regions.

[ dj: Fixup commenting style. (Jonathan) ]

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17 09:46:42 -07:00
Dave Jiang
fa59c35167 cxl/test: Standardize CXL auto region size
Create a global define for the size of the mock CXL auto region used
in cxl_test. Remove the declared size in mock_init_hdm_decoder()
function.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17 09:44:38 -07:00
Heiko Carstens
169ebcbb90 tools: Remove s390 compat support
Remove s390 compat support from everything within tools, since s390 compat
support will be removed from the kernel.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO
Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:38 +01:00
SeongJae Park
809ba69f9f selftests/damon/sysfs: add obsolete_target test
A new DAMON sysfs file for pin-point target removal, namely
obsolete_target, has been added.  Add a test for the functionality.  It
starts DAMON with three monitoring target processes, mark one in the
middle as obsolete, commit it, and confirm the internal DAMON status is
updated to remove the target in the middle.

Link: https://lkml.kernel.org/r/20251023012535.69625-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
SeongJae Park
65a9033db7 sysfs.py: extend assert_ctx_committed() for monitoring targets
assert_ctx_committed() is not asserting monitoring targets commitment,
since all existing callers of the function assume no target changes. 
Extend it for future usage.

Link: https://lkml.kernel.org/r/20251023012535.69625-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:25 -08:00
SeongJae Park
a00f18abef drgn_dump_damon_status: dump damon_target->obsolete
A new field of damon_target for pin-point target removal, namely obsolete,
has newly been added.  Extend drgn_dump_damon_status.py to dump it, for
easily writing a future DAMON selftests of it.

Link: https://lkml.kernel.org/r/20251023012535.69625-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
SeongJae Park
badfa4361c selftests/damon/_damon_sysfs: support obsolete_target file
A DAMON sysfs file, namely obsolete_target, has been newly introduced. 
Add a support of that file to _damon_sysfs.py so that DAMON selftests for
the file can be easily written.

Link: https://lkml.kernel.org/r/20251023012535.69625-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:24 -08:00
Lorenzo Stoakes
ac0a3fc9c0 mm: add ability to take further action in vm_area_desc
Some drivers/filesystems need to perform additional tasks after the VMA is
set up.  This is typically in the form of pre-population.

The forms of pre-population most likely to be performed are a PFN remap
or the insertion of normal folios and PFNs into a mixed map.

We start by implementing the PFN remap functionality, ensuring that we
perform the appropriate actions at the appropriate time - that is setting
flags at the point of .mmap_prepare, and performing the actual remap at the
point at which the VMA is fully established.

This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.

Unfortunately callers still do often require some kind of custom action,
so we add an optional success/error _hook to allow the caller to do
something after the action has succeeded or failed.

This is done at the point when the VMA has already been established, so
the harm that can be done is limited.

The error hook can be used to filter errors if necessary.

There may be cases in which the caller absolutely must hold the file rmap
lock until the operation is entirely complete. It is an edge case, but
certainly the hugetlbfs mmap hook requires it.

To accommodate this, we add the hide_from_rmap_until_complete flag to the
mmap_action type. In this case, if a new VMA is allocated, we will hold the
file rmap lock until the operation is entirely completed (including any
success/error hooks).

Note that we do not need to update __compat_vma_mmap() to accommodate this
flag, as this function will be invoked from an .mmap handler whose VMA is
not yet visible, so we implicitly hide it from the rmap.

If any error arises on these final actions, we simply unmap the VMA
altogether.

Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.

While we're here, rename __compat_vma_mmap_prepare() to __compat_vma_mmap()
as we are now performing actions invoked by the mmap_prepare in addition to
just the mmap_prepare hook.

Link: https://lkml.kernel.org/r/2601199a7b2eaeadfcd8ab6e199c6d1706650c94.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:28:12 -08:00
xu xin
bda7bf0684 selftests: update ksm inheritance tests for prctl fork/exec
To reproduce the issue mentioned by [1], this add a setting of
pages_to_scan and sleep_millisecs at the start of test_prctl_fork_exec(). 
The main change is just raise the scanning frequency of ksmd.

[1] https://lore.kernel.org/all/202510012256278259zrhgATlLA2C510DMD3qI@zte.com.cn/

Link: https://lkml.kernel.org/r/20251007182935207jm31wCIgLpZg5XbXQY64S@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16 17:27:55 -08:00
Linus Torvalds
7ba45f1504 7 hotfixes. 5 are cc:stable, 4 are against mm/.
All are singletons - please see the respective changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaRoauQAKCRDdBJ7gKXxA
 jtNFAQDEMH0+zRGz/Larkf9cgmdKcDgij1DP2gP/3i8PWAoaGQD8C9evZxu1h9wC
 rFbaSkPDeSdDafo3RZfpo1gqE0LdEA4=
 =oew8
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "7 hotfixes.  5 are cc:stable, 4 are against mm/

  All are singletons - please see the respective changelogs for details"

* tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm, swap: fix potential UAF issue for VMA readahead
  selftests/user_events: fix type cast for write_index packed member in perf_test
  lib/test_kho: check if KHO is enabled
  mm/huge_memory: fix folio split check for anon folios in swapcache
  MAINTAINERS: update David Hildenbrand's email address
  crash: fix crashkernel resource shrink
  mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
2025-11-16 13:31:14 -08:00
Ankit Khushwaha
216158f063 selftests/user_events: fix type cast for write_index packed member in perf_test
Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member
warning due to potential unaligned pointer access:

perf_test.c:239:38: warning: taking address of packed member 'write_index'
of class or structure 'user_reg' may result in an unaligned pointer value
[-Waddress-of-packed-member]
  239 |         ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
      |                                             ^~~~~~~~~~~~~~~

Since write(2) works with any alignment. Casting '&reg.write_index'
explicitly to 'void *' to suppress this warning.

Link: https://lkml.kernel.org/r/20251106095532.15185-1-ankitkhushwaha.linux@gmail.com
Fixes: 42187bdc3c ("selftests/user_events: Add perf self-test for empty arguments events")
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: sunliming <sunliming@kylinos.cn>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-15 10:52:02 -08:00
Martin KaFai Lau
6cc73f3540 selftests/bpf: Test bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) when transport_header is not set
Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is
rejected (-EINVAL) if skb->transport_header is not set. The test
needs to lower the MTU of the loopback device. Thus, take this
opportunity to run the test in a netns by adding "ns_" to the test
name. The "serial_" prefix can then be removed.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 18:49:18 -08:00
Mykyta Yatsenko
a4d31f451d selftests/bpf: Align kfuncs renamed in bpf tree
bpf_task_work_schedule_resume() and bpf_task_work_schedule_signal() have
been renamed in bpf tree to bpf_task_work_schedule_resume_impl() and
bpf_task_work_schedule_signal_impl() accordingly.
There are few uses of these kfuncs in selftests that are not in bpf
tree, so that when we port [1] into bpf-next, those BPF programs will
not compile.
This patch aligns those remaining callsites with the kfunc renaming.
It should go on top of [1] when applying on bpf-next.

1: https://lore.kernel.org/all/20251104-implv2-v3-0-4772b9ae0e06@meta.com/

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251105132105.597344-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 17:45:43 -08:00
Jakub Kicinski
eca8b8fc74 selftests: drv-net: xdp: make the XDP qstats tests less flaky
The XDP qstats tests send 2k packets over a single socket.
Looks like when netdev CI is busy running those tests in QEMU
occasionally flakes. The target doesn't get to run at all
before all 2000 packets are sent.

Lower the number of packets to 1000 and reopen the socket
every 50 packets, to give RSS a chance to spread the packets
to multiple queues.

For the netdev CI testing either lowering the count or using
multiple sockets is enough, but let's do both for extra resiliency.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20251113152703.3819756-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14 17:45:38 -08:00
Dimitri Daskalakis
e1215d1d38 selftests: drv-net: xdp: Fix register spill error with clang 20
On clang 20.1.8 the XDP program fails to load with a register spill error.
Since hdr_len is a __u32, the compiler decided it only needed the lower
32-bits of ctx->data, which later triggers the register spill verifier
error.

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251113043102.4062150-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14 17:45:07 -08:00
Jakub Kicinski
c7dc5b5228 ipv6: clean up routes when manually removing address with a lifetime
When an IPv6 address with a finite lifetime (configured with valid_lft
and preferred_lft) is manually deleted, the kernel does not clean up the
associated prefix route. This results in orphaned routes (marked "proto
kernel") remaining in the routing table even after their corresponding
address has been deleted.

This is particularly problematic on networks using combination of SLAAC
and bridges.

1. Machine comes up and performs RA on eth0.
2. User creates a bridge
   - does an ip -6 addr flush dev eth0;
   - adds the eth0 under the bridge.
3. SLAAC happens on br0.

Even tho the address has "moved" to br0 there will still be a route
pointing to eth0, but eth0 is not usable for IP any more.

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251113031700.3736285-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14 17:44:47 -08:00
Alexei Starovoitov
e47b68bda4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+
Cross-merge BPF and other fixes after downstream PR.

Minor conflict in kernel/bpf/helpers.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 17:43:41 -08:00
Paul Houssel
a69e09823e selftests/bpf: Add BTF dedup tests for recursive typedef definitions
Add several ./test_progs tests:
    1.  btf/dedup:recursive typedef ensures that deduplication no
	longer fails on recursive typedefs.
    2.  btf/dedup:typedef ensures that typedefs are deduplicated correctly
	just as they were before this patch.

Signed-off-by: Paul Houssel <paul.houssel@orange.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/9fac2f744089f6090257d4c881914b79f6cd6c6a.1763037045.git.paul.houssel@orange.com
2025-11-14 17:07:20 -08:00
Alexei Starovoitov
c133390398 selftests/bpf: Fix failure paths in send_signal test
When test_send_signal_kern__open_and_load() fails parent closes the
pipe which cases ASSERT_EQ(read(pipe_p2c...)) to fail, but child
continues and enters infinite loop, while parent is stuck in wait(NULL).
Other error paths have similar issue, so kill the child before waiting on it.

The bug was discovered while compiling all of selftests with -O1 instead of -O2
which caused progs/test_send_signal_kern.c to fail to load.

Fixes: ab8b7f0cb3 ("tools/bpf: Add self tests for bpf_send_signal_thread()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251113171153.2583-1-alexei.starovoitov@gmail.com
2025-11-14 17:02:25 -08:00
Linus Torvalds
cbba5d1b53 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmkXpZUACgkQ6rmadz2v
 bTrGCw//UCx+KBXbzvv7m0A1QGOUL3oHL/Qd+OJA3RW3B+saVbYYzn9jjl0SRgFP
 X0q/DwbDOjFtOSORV9oFgJkrucn7+BM/yxPaC4sE1SQZJAjDFA/CSaF0r8duuGsM
 Mvat9TTiwwetOMAkNB9WZ1e6AKGovBLguLFGAWZc6vLeQZopcER5+pFwS44a9RrK
 dq0Th8O/oY3VmUDgSKJ2KyY51KxpJU7k2ipifiIbu1M1MWZ7s2vERkMEkzJ/lB8/
 nldMsTZUdknGFzVH/W6Rc9ScFYlH+h/x1gkOHwTibMsqDBm92mWVo6O7hvuUbsEO
 NlPDgMtkhBp7PDSx9SA0UBcriMs1M6ovNBOpj/cI4AL1k8WNubf/FHZtrBwoy8C9
 3HaM+8lkA2uiHVPUvT5dImzWqshweN0GXoXAoa9xPSQPchJ38UdzCHqYRAg/kWFZ
 5jUK2j4e5+yyII44pD7Xti0PrfoP81giliqmTbGFV8+Y89dQnk+WK12vnbv34ER7
 unLwId8HLtq0ZN7FVG4F6s/4qNdEMKqXbAkve0WWFXn4vKZMCju4ol6NYVGisRAg
 zcn7Yk+weSuY3UOzC+/4SxhfTEAD0Kg6fUoG/1JdflgNsm8XhLBja0DZaAlIVO0p
 xz5UaljwcNvjAKGGMYbCGrf3XN2tOmGpVyJkMj17Vcq88y3bJBU=
 =JJui
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix interaction between livepatch and BPF fexit programs (Song Liu)
   With Steven and Masami acks.

 - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa)
   With Steven and Masami acks.

 - Fix out of bounds access in widen_imprecise_scalars() in the verifier
   (Eduard Zingerman)

 - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen)

 - Fix net_sched storage collision with BPF data_meta/data_end (Eric
   Dumazet)

 - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking
   them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Test widen_imprecise_scalars() with different stack depth
  bpf: account for current allocated stack depth in widen_imprecise_scalars()
  bpf: Add bpf_prog_run_data_pointers()
  selftests/bpf: Add mptcp test with sockmap
  mptcp: Fix proto fallback detection with BPF
  mptcp: Disallow MPTCP subflows from sockmap
  selftests/bpf: Add stacktrace ips test for raw_tp
  selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
  x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
  Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
  bpf: add _impl suffix for bpf_stream_vprintk() kfunc
  bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
  selftests/bpf: Add tests for livepatch + bpf trampoline
  ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
  ftrace: Fix BPF fexit with livepatch
2025-11-14 15:39:39 -08:00
Alexei Starovoitov
63066b7a8e selftests/bpf: Convert glob_match() to bpf arena
Increase arena test coverage.
Convert glob_match() to bpf arena in two steps:
1.
Copy paste lib/glob.c into bpf_arena_strsearch.h
Copy paste lib/globtests.c into progs/arena_strsearch.c

2.
Add __arena to pointers
Add __arg_arena to global functions that accept arena pointers
Add cond_break to loops

The test also serves as a good example of what's possible
with bpf arena and how existing algorithms can be converted.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251111032931.21430-1-alexei.starovoitov@gmail.com
2025-11-14 13:57:28 -08:00
Thomas Weißschuh
308bc2e338 selftests/timers/nanosleep: Add tests for return of remaining time
If interrupted by a signal clock_nanosleep() returns the remaining time
into the structure pointed to by the rmtp parameter. So far this
functionality was not tested by the timer selftests.

Extend the nanosleep selftest to cover this feature.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106-nanosleep-rtmp-selftest-v1-1-f9212fb295fe@linutronix.de
2025-11-14 20:34:50 +01:00
Wake Liu
05d89fe7e4 selftests/timers: Clean up kernel version check in posix_timers
Several tests in the posix_timers selftest which test timer behavior
related to SIG_IGN fail on kernels older than 6.13. This is due to
a refactoring of signal handling in commit caf77435dd ("signal:
Handle ignored signals in do_sigaction(action != SIG_IGN)").

A previous attempt to fix this by adding a kernel version check to each
of the nine affected tests was suboptimal, as it resulted in emitting
the same skip message nine times.

Following the suggestion from Thomas Gleixner, this is refactored to
perform a single version check in main(). To satisfy the kselftest
framework's requirement for the test count to match the declared plan,
the plan is now conditionally set to 10 (for older kernels) or 19.

While setting the plan conditionally may seem complex, it is the
better approach to avoid the alternatives: either running tests on
unsupported kernels that are known to fail, or emitting a noisy series
of nine identical skip messages. A single informational message is now
printed instead when the tests are skipped.

Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250807085042.1690931-1-wakel@google.com/
Link: https://patch.msgid.link/20251103114502.584940-1-wakel@google.com
2025-11-14 20:34:50 +01:00
Guopeng Zhang
1dc830ee4c selftests/cgroup: conform test to KTAP format output
Conform the layout, informational and status messages to KTAP.  No
functional change is intended other than the layout of output messages.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Sebastian Chlad <sebastian.chlad@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14 08:12:57 -10:00
Eduard Zingerman
6c762611fe selftests/bpf: Test widen_imprecise_scalars() with different stack depth
A test case for a situation when widen_imprecise_scalars() is called
with old->allocated_stack > cur->allocated_stack. Test structure:

    def widening_stack_size_bug():
      r1 = 0
      for r6 in 0..1:
        iterator_with_diff_stack_depth(r1)
        r1 = 42

    def iterator_with_diff_stack_depth(r1):
      if r1 != 42:
        use 128 bytes of stack
      iterator based loop

iterator_with_diff_stack_depth() is verified with r1 == 0 first and
r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8
bytes on a second. Such arrangement triggered a KASAN error in
widen_imprecise_scalars().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 09:26:28 -08:00
André Almeida
cd91b502f1 selftests/futex: Create test for robust list
Create a test for the robust list mechanism. Test the following uAPI
operations:

- Creating a robust mutex where the lock waiter is wake by the kernel
  when the lock owner died
- Setting a robust list to the current task
- Getting a robust list from the current task
- Getting a robust list from another task
- Using the list_op_pending field from robust_list_head struct to test
  robustness when the lock owner dies before completing the locking
- Setting a invalid size for syscall argument `len`
- Adding multiple elements to a robust list wait waiting for each of
  them
- Creating a circular list and checking that the kernel does not get
  stuck in an infinity loop

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251110224130.3044761-1-andrealmeid@igalia.com
2025-11-14 14:39:37 +01:00
Carlos Llamas
275498b881 selftests/futex: Skip tests if shmget unsupported
On systems where the shmget() syscall is not supported, tests like
anon_page and shared_waitv will fail. Skip these tests in such cases to
allow the rest of the test suite to run.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251016162009.3270784-1-cmllamas@google.com
2025-11-14 14:39:36 +01:00
Carlos Llamas
9407d138b8 selftests/futex: Add newline to ksft_exit_fail_msg()
This was missed in commit e5c04d0f3e ("selftests/futex: Refactor
futex_wait with kselftest_harness.h") while replacing previous perror()
calls, which automatically append the newline character.

Fixes: e5c04d0f3e ("selftests/futex: Refactor futex_wait with kselftest_harness.h")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251015173556.2899646-1-cmllamas@google.com
2025-11-14 14:39:36 +01:00
André Almeida
2d98144440 selftests/futex: Remove unused test_futex_mpol()
Commit ed323aeda5 ("selftest/futex: Compile also with libnuma <
2.0.16") removed the unused function test_futex_mpol() and commit
d35ca2f642 ("selftests/futex: Refactor futex_numa_mpol with
kselftest_harness.h") added it back by accident. Remove it again.

Fixes: d35ca2f642 ("selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h")
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251001220438.66227-1-andrealmeid@igalia.com
2025-11-14 14:39:36 +01:00
Linus Torvalds
6da43bbeb6 VFIO fixes for v6.18-rc6
- Fix vfio selftests to remove the expectation that the IOMMU
    supports a 64-bit IOVA space.  These manifest both in the original
    set of tests introduced this development cycle in identity mapping
    the IOVA to buffer virtual address space, as well as the more
    recent boundary testing.  Implement facilities for collecting the
    valid IOVA ranges from the backend, implement a simple IOVA
    allocator, and use the information for determining extents.
    (Alex Mastro)
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkWXwYRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyK96w//ep97M1B5RSA4/AIP4j+fovnlM2RVUrPy
 RKxVYK+rzftO4Hq8sg1Co3aB68hItbYiw/0BDUgQBFFCEoInQ9zht89tsHhuXq55
 lYq9hyYMnp60FLzpX8Fi2HKf8SKTdD+6P1hX5UWZfYExc/tCD/9FPE2QsfUTbnh/
 u2aguEp0RhgC8xCcR/5EsGFzcBgioDcTBQq5UQ/5By51nUNvdmHKRHKRC7ZmQfiK
 bwFpYseWHRdzdBH2KggNaDZUh0q7Y9+cwL2UndQV5IfwGhGWODztOsVhhoxAAu0n
 c6gXYDg34i5i1epPrr8Yvv8trDzMFCRYJi4g8BsLPOBXe8Lr2bWj/4wA1ftdW6Ha
 0ii4Gx1UO1vz0KZQBajMHv5pO06lJZshzEQgHr3sYgjaRg8AVTV6hI3IVaty2JWk
 DPKPNSGKhjkWvWnx73KFiiepz5zXEL7r7ooFBimtBwFr/5bDB8hfPjpXMh+mhiZu
 QW6DV+Mrqg1G1KDCwdHSvbBwH8IvfX/tQbs9eKvGt+6ICpH1GJ1cUr58vjcc6ESi
 sJzKI0ceTjGZAhHJogO4xAzhI8H6kwtDHB1ZC9QLHIwbkXFN1m64VjOwwXuHk0HT
 559gjqjH0yKNSABwRMV1nKzn9EFnivQn1tAoGkUGSFE9ZI/VfCNNlspPziZEz0z9
 u14SILGMqWU=
 =ywvT
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio

Pull VFIO seftest fixes from Alex Williamson:

 - Fix vfio selftests to remove the expectation that the IOMMU supports
   a 64-bit IOVA space.

   These manifest both in the original set of tests introduced this
   development cycle in identity mapping the IOVA to buffer virtual
   address space, as well as the more recent boundary testing.

   Implement facilities for collecting the valid IOVA ranges from the
   backend, implement a simple IOVA allocator, and use the information
   for determining extents (Alex Mastro)

* tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio:
  vfio: selftests: replace iova=vaddr with allocated iovas
  vfio: selftests: add iova allocator
  vfio: selftests: fix map limit tests to use last available iova
  vfio: selftests: add iova range query helpers
2025-11-13 17:00:40 -08:00
Matt Bobrowski
93ce3bee31 selftests/bpf: retry bpf_map_update_elem() when E2BIG is returned
Executing the test_maps binary on platforms with extremely high core
counts may cause intermittent assertion failures in
test_update_delete() (called via test_map_parallel()). This can occur
because bpf_map_update_elem() under some circumstances (specifically
in this case while performing bpf_map_update_elem() with BPF_NOEXIST
on a BPF_MAP_TYPE_HASH with its map_flags set to BPF_F_NO_PREALLOC)
can return an E2BIG error code i.e.

error -7 7 tools/testing/selftests/bpf/test_maps.c:#: void
test_update_delete(unsigned int, void *): Assertion `err == 0' failed.
tools/testing/selftests/bpf/test_maps.c:#: void
__run_parallel(unsigned int, void (*)(unsigned int, void *), void *):
Assertion `status == 0' failed.

As it turns out, is_map_full() which is called from alloc_htab_elem()
can take on a conservative approach when htab->use_percpu_counter is
true (which is the case here because the percpu_counter is used when a
BPF_MAP_TYPE_HASH is created with its map_flags set to
BPF_F_NO_PREALLOC). This conservative approach prioritizes preventing
over-allocation and potential issues that could arise from possibly
exceeding htab->map.max_entries in highly concurrent environments,
even if it means slightly under-utilizing the htab map's capacity.

Given that bpf_map_update_elem() from test_update_delete() can return
E2BIG, update can_retry() such that it also accounts for the E2BIG
error code (specifically only when running with map_flags being set to
BPF_F_NO_PREALLOC). The retry loop will allow the global count
belonging to the percpu_counter to become synchronized and better
reflect the current htab map's capacity.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251113092519.2632079-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13 14:36:28 -08:00
Jiayuan Chen
cb730e4ac1 selftests/bpf: Add mptcp test with sockmap
Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
2025-11-13 13:18:25 -08:00
Jakub Kicinski
c99ebb6132 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc6).

No conflicts, adjacent changes in:

drivers/net/phy/micrel.c
  96a9178a29 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
  61b7ade9ba ("net: phy: micrel: Add support for non PTP SKUs for lan8814")

and a trivial one in tools/testing/selftests/drivers/net/Makefile.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13 12:35:38 -08:00
Linus Torvalds
8b4a014e28 linux_kselftest-fixes-6.18-rc6
Fixes event-filter-function.tc tracing test failure caused when a first
 run to sample events triggers kmem_cache_free which interferes with the
 rest of the test. Fix this calling sample_events twice to eliminate the
 kmem_cache_free related noise from the sampling.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkWC6YACgkQCwJExA0N
 QxzbIQ/9G93/8N7G9otEc5MRalvJlv2+NZmD2DOvbitc1zpKyYlGJc2ZakBkuQrA
 z3WBa1bei4cZfrWEZDtkU7YzcRiCmLFPV1cDzAudc30InnJDLVN8IwMpFZY61Gwp
 gBanPFHpxpcccolFzDQ5rtP2pJL6/XFsHqDZ5wUFgV8MbosgNskpIeGk7f33JZrh
 xqUi+fBTOnR1TszsDCT4pRr1bM2jB/HhcJIQNPLkD75VIOXer+6DGenREkbiMubf
 Wss9JdpA8x2L40GgkMRn6dqXW/1KSL7CT9UgMoPxgs6F1s1R6iDd4f5PO4IXa2sU
 FnD8x4QIh1cZHirtRC+ChQwO9KKCoHq5CeEXtr2zGs3Xus2YbS+2OBGFCVoxivQp
 BPE093Auglunk3Zf4ZQSophYoc8IC1xC8ie99TOJexxCANpXAcVd5mDoK9tIjsKu
 mCmY0ng9yVsHaNyiX/NcsViyp5LcIRlL3P9q0ZKXwOxB4MB0c4IOJeAyN8KtTH/4
 GXjYzIq0+u7Pk47rMvkHHOi3myIYm1mUfMQsWtIqXracsFN824AXeoWtcxuHoZ9Q
 puobPEwNBKcj61JYymogS58Eu7+3AFWLagO+U3zKORDBq3OAq7nb2qF/VMcpGlF6
 LQEo7wdzcrRmaU4OVw7NTIddy1E035rjFsuKNjBls5PK5NSfAHc=
 =OZMi
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "Fixes event-filter-function.tc tracing test failure caused when a
  first run to sample events triggers kmem_cache_free which interferes
  with the rest of the test.

  Fix this by calling sample_events twice to eliminate the
  kmem_cache_free related noise from the sampling"

* tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Run sample events to clear page cache events
2025-11-13 11:37:40 -08:00
Linus Torvalds
d0309c0543 Including fixes from Bluetooth and Wireless. No known outstanding
regressions.
 
 Current release - regressions:
 
   - eth: bonding: fix mii_status when slave is down
 
   - eth: mlx5e: fix missing error assignment in mlx5e_xfrm_add_state()
 
 Previous releases - regressions:
 
   - sched: limit try_bulk_dequeue_skb() batches
 
   - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe
 
   - af_unix: initialise scc_index in unix_add_edge()
 
   - netpoll: fix incorrect refcount handling causing incorrect cleanup
 
   - bluetooth: don't hold spin lock over sleeping functions
 
   - hsr: Fix supervision frame sending on HSRv0
 
   - sctp: prevent possible shift out-of-bounds
 
   - tipc: fix use-after-free in tipc_mon_reinit_self().
 
   - dsa: tag_brcm: do not mark link local traffic as offloaded
 
   - eth: virtio-net: fix incorrect flags recording in big mode
 
 Previous releases - always broken:
 
   - sched: initialize struct tc_ife to fix kernel-infoleak
 
   - wifi:
     - mac80211: reject address change while connecting
     - iwlwifi: avoid toggling links due to wrong element use
 
   - bluetooth: cancel mesh send timer when hdev removed
 
   - strparser: fix signed/unsigned mismatch bug
 
   - handshake: fix memory leak in tls_handshake_accept()
 
 Misc:
 
   - selftests: mptcp: fix some flaky tests
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmkV/O8SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkeBQQALgJv+sfJ/8H+BA3US7BU9kwydXtJneo
 mHKnse3NCiV3kXyJu7p+CIBU4xP8LMMmpVFZQ0dPKnbFyfGWjCX+UOEMU4NBnhG7
 dfSGWAxR8iS0gUi/5K4dT55LZjeQ392Zu2OqrBRjAAYG0s5vXbAcCx0jUhfmG+ZD
 rJupFYuRuF7W3UvWC/TQviY4oJ0goaZnrm6Y5ADX9rblPlzD5xgOTqZzSuUpLxQM
 Q37IGjW3F12FrNmqabC3MBQcNfWNpqUgkDfFxMJbxGPDe9CjvSAAmMVNmDDkr+EH
 eL5+M1cdH+L9RRBNSQ4SX9dsNwhyyU04wrEADGf39jcgw/ACXI+t0laj/Hm0O1xg
 vMOtqiQYSe8lVVdCswmi6BdYxBsNU6l2gx0evq/qGztFbDW9s5rn26uXy5CIqlJa
 04k1lmT5EeMpE9opwxPJpw+5LyjjtCsD6AGOlLb8DU6cbXKVUhxCZVHdDNnfxt4J
 ZfQo8aUx6X3vnDGMzPWEZbYMqd4va6hVPTJdUuqk1enuE6KfhKMhWbP5D9a/qiqM
 lhinWYdmBP6bKJlxdfsH7kgwhfuoi/jT54VYu33l5LmrOk7+tO7gLJcRhNZW9jsl
 KkJ+Wk1JQ7/oiQI8tcUCc+LEwwY54F+34HAHjFLNELW+bp/vvoMEVeZKku2YM3Gy
 xW+7WYdrx2RK
 =JAyr
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth and Wireless. No known outstanding
  regressions.

  Current release - regressions:

   - eth:
      - bonding: fix mii_status when slave is down
      - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state()

  Previous releases - regressions:

   - sched: limit try_bulk_dequeue_skb() batches

   - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe

   - af_unix: initialise scc_index in unix_add_edge()

   - netpoll: fix incorrect refcount handling causing incorrect cleanup

   - bluetooth: don't hold spin lock over sleeping functions

   - hsr: Fix supervision frame sending on HSRv0

   - sctp: prevent possible shift out-of-bounds

   - tipc: fix use-after-free in tipc_mon_reinit_self().

   - dsa: tag_brcm: do not mark link local traffic as offloaded

   - eth: virtio-net: fix incorrect flags recording in big mode

  Previous releases - always broken:

   - sched: initialize struct tc_ife to fix kernel-infoleak

   - wifi:
      - mac80211: reject address change while connecting
      - iwlwifi: avoid toggling links due to wrong element use

   - bluetooth: cancel mesh send timer when hdev removed

   - strparser: fix signed/unsigned mismatch bug

   - handshake: fix memory leak in tls_handshake_accept()

  Misc:

   - selftests: mptcp: fix some flaky tests"

* tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
  hsr: Follow standard for HSRv0 supervision frames
  hsr: Fix supervision frame sending on HSRv0
  virtio-net: fix incorrect flags recording in big mode
  ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
  wifi: iwlwifi: mld: always take beacon ies in link grading
  wifi: iwlwifi: mvm: fix beacon template/fixed rate
  wifi: iwlwifi: fix aux ROC time event iterator usage
  net_sched: limit try_bulk_dequeue_skb() batches
  selftests: mptcp: join: properly kill background tasks
  selftests: mptcp: connect: trunc: read all recv data
  selftests: mptcp: join: userspace: longer transfer
  selftests: mptcp: join: endpoints: longer transfer
  selftests: mptcp: join: rm: set backup flag
  selftests: mptcp: connect: fix fallback note due to OoO
  ethtool: fix incorrect kernel-doc style comment in ethtool.h
  mlx5: Fix default values in create CQ
  Bluetooth: btrtl: Avoid loading the config file on security chips
  net/mlx5e: Fix potentially misleading debug message
  net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps
  net/mlx5e: Fix maxrate wraparound in threshold between units
  ...
2025-11-13 11:20:25 -08:00
Leon Hwang
c1cbf0d21c selftests/bpf: Add test to verify freeing the special fields in pcpu maps
Add test to verify that updating [lru_,]percpu_hash maps decrements
refcount when BPF_KPTR_REF objects are involved.

The tests perform the following steps:
. Call update_elem() to insert an initial value.
. Use bpf_refcount_acquire() to increment the refcount.
. Store the node pointer in the map value.
. Add the node to a linked list.
. Probe-read the refcount and verify it is *2*.
. Call update_elem() again to trigger refcount decrement.
. Probe-read the refcount and verify it is *1*.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251105151407.12723-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13 09:15:33 -08:00
Dave Jiang
87c69670da Merge branch 'for-6.19/cxl-addr-xlat' into cxl-for-next
Enable unit testing for XOR address translation of SPA to DPA and vice versa.
2025-11-13 08:44:00 -07:00
Dave Jiang
2be575434e Merge branch 'for-6.19/cxl-misc' into cxl-for-next
Misc patches for CXL 6.19
- Remove incorrect page-allocator quirk section in documentation.
- Remove unused devm_cxl_port_enumerate_dports() function.
- Fix typo in cdat.c code comment.
- Replace use of system_wq with system_percpu_wq
- Add locked decoder support
- Return when generic target updated
- Rename region_res_match_cxl_range() to spa_maps_hpa()
- Clarify comment in spa_maps_hpa()
2025-11-13 08:42:46 -07:00
Dimitri Daskalakis
f766f8cdde selftests: drv-net: Limit the max number of queues in procfs_downup_hammer
For NICs with a large (1024+) number of queues, this test can cause
excessive memory fragmentation. This results in OOM errors, and in the
worst case driver/kernel crashes. We don't need to test with the max number
of queues, just enough to create a high likelihood of races between
reconfiguration and stats getting read.

Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251111225319.3019542-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 18:04:18 -08:00
Li RongQing
9544f9e694 hung_task: panic when there are more than N hung tasks at the same time
The hung_task_panic sysctl is currently a blunt instrument: it's all or
nothing.

Panicking on a single hung task can be an overreaction to a transient
glitch.  A more reliable indicator of a systemic problem is when
multiple tasks hang simultaneously.

Extend hung_task_panic to accept an integer threshold, allowing the
kernel to panic only when N hung tasks are detected in a single scan. 
This provides finer control to distinguish between isolated incidents
and system-wide failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.

[lance.yang@linux.dev: new changelog]
Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Wesphal <fw@strlen.de>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12 10:00:14 -08:00
Alex Mastro
d323ad7396 vfio: selftests: replace iova=vaddr with allocated iovas
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr
as part of DMA mapping operations. However, not all IOMMUs support the
same virtual address width as the processor. For instance, older Intel
consumer platforms only support 39-bits of IOMMU address space. On such
platforms, using the virtual address as the IOVA fails.

Make the tests more robust by using iova_allocator to vend IOVAs, which
queries legally accessible IOVAs from the underlying IOMMUFD or VFIO
container.

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro
ce0e3c403e vfio: selftests: add iova allocator
Add struct iova_allocator, which gives tests a convenient way to generate
legally-accessible IOVAs to map. This allocator traverses the sorted
available IOVA ranges linearly, requires power-of-two size allocations,
and does not support freeing iova allocations. The assumption is that
tests are not IOVA space-bounded, and will not need to recycle IOVAs.

This is based on Alex Williamson's patch series for adding an IOVA
allocator [1].

[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-3-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro
a77fa0b922 vfio: selftests: fix map limit tests to use last available iova
Use the newly available vfio_pci_iova_ranges() to determine the last
legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests.

Fixes: de8d1f2fd5 ("vfio: selftests: add end of address space DMA map/unmap tests")
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro
7c44656ab3 vfio: selftests: add iova range query helpers
VFIO selftests need to map IOVAs from legally accessible ranges, which
could vary between hardware. Tests in vfio_dma_mapping_test.c are making
excessively strong assumptions about which IOVAs can be mapped.

Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the
IOMMUFD or VFIO container associated with the device. The queried ranges
are normalized to IOMMUFD's iommu_iova_range representation so that
handling of IOVA ranges up the stack can be implementation-agnostic.
iommu_iova_range and vfio_iova_range are equivalent, so bias to using the
new interface's struct.

Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES.
Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.

The underlying vfio_iommu_type1_info buffer-related functionality has
been kept generic so the same helpers can be used to query other
capability chain information, if needed.

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Bobby Eshleman
99f932c905 selftests/vsock: disable shellcheck SC2317 and SC2119
Disable shellcheck rules SC2317 and SC2119. These rules are being
triggered due to false positives. For SC2317, many `return
"${KSFT_PASS}"` lines are reported as unreachable, even though they are
executed during normal runs. For SC2119, the fact that
log_guest/log_host accept either stdin or arguments triggers SC2119,
despite being valid.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-12-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:40 -08:00
Bobby Eshleman
338c5ddf4c selftests/vsock: add vsock_loopback module loading
Add vsock_loopback module loading to the loopback test so that vmtest.sh
can be used for kernels built with loopback as a module.

This is not technically a fix as kselftest expects loopback to be
built-in already (defined in selftests/vsock/config). This is useful
only for using vmtest.sh outside of kselftest.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-11-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:39 -08:00
Bobby Eshleman
67422ef38f selftests/vsock: add 1.37 to tested virtme-ng versions
Testing with 1.37 shows all tests passing but emits the warning:

warning: vng version 'virtme-ng 1.37' has not been tested and may not function properly.
	The following versions have been tested: 1.33 1.36

This patch adds 1.37 to the virtme-ng versions to get rid of the above
warning.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-10-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:39 -08:00
Bobby Eshleman
592e3d14ce selftests/vsock: add BUILD=0 definition
Add the definition for BUILD and initialize it to zero. This avoids
'bash -u vmtest.sh` from throwing 'unbound variable' when BUILD is not
set to 1 and is later checked for its value.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-9-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:39 -08:00
Bobby Eshleman
d13fb04a4b selftests/vsock: identify and execute tests that can re-use VM
In preparation for future patches that introduce tests that cannot
re-use the same VM, add functions to identify those that *can* re-use a
VM.

By continuing to re-use the same VM for these tests we can save time by
avoiding the delay of booting a VM for every test.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-8-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:39 -08:00
Bobby Eshleman
7fea50dff9 selftests/vsock: add check_result() for pass/fail counting
Add check_result() function to reuse logic for incrementing the
pass/fail counters. This function will get used by different callers as
we add different types of tests in future patches (namely, namespace and
non-namespace tests will be called at different places, and re-use this
function).

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-7-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:39 -08:00
Bobby Eshleman
9e2ad0bc36 selftests/vsock: speed up tests by reducing the QEMU pidfile timeout
Reduce the time waiting for the QEMU pidfile from three minutes to five
seconds. The three minute time window was chosen to make sure QEMU had
enough time to fully boot up. This, however, is an unreasonably long
delay for QEMU to write the pidfile, which happens earlier when the QEMU
process starts (not after VM boot). The three minute delay becomes
noticeably wasteful in future tests that expect QEMU to fail and wait a
full three minutes for a pidfile that will never exist.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-6-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:38 -08:00
Bobby Eshleman
c7df4adc06 selftests/vsock: do not unconditionally die if qemu fails
If QEMU fails to boot, then set the returncode (via timeout) instead of
unconditionally dying. This is in preparation for tests that expect QEMU
to fail to boot. In that case, we just want to know if the boot failed
or not so we can test the pass/fail criteria, and continue executing the
next test.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-5-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:38 -08:00
Bobby Eshleman
ac8997e943 selftests/vsock: avoid multi-VM pidfile collisions with QEMU
Change QEMU to use generated pidfile names instead of just a single
globally-defined pidfile. This allows multiple QEMU instances to
co-exist with different pidfiles. This is required for future tests that
use multiple VMs to check for CID collissions.

Additionally, this also places the burden of killing the QEMU process
and cleaning up the pidfile on the caller of vm_start(). To help with
this, a function terminate_pidfiles() is introduced that callers use to
perform the cleanup. The terminate_pidfiles() function supports multiple
pidfile removals because future patches will need to process two
pidfiles at a time.

Change QEMU_OPTS to be initialized inside the vm_start(). This allows
the generated pidfile to be passed to the string assignment, and
prepares for future vm-specific options as well (e.g., cid).

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-4-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:38 -08:00
Bobby Eshleman
4f76ff14d3 selftests/vsock: reuse logic for vsock_test through wrapper functions
Add wrapper functions vm_vsock_test() and host_vsock_test() to invoke
the vsock_test binary. This encapsulates several items of repeat logic,
such as waiting for the server to reach listening state and
enabling/disabling the bash option pipefail to avoid pipe-style logging
from hiding failures.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-3-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:38 -08:00
Bobby Eshleman
2ed3ce7efb selftests/vsock: make wait_for_listener() work even if pipefail is on
Rewrite wait_for_listener()'s pattern matching to avoid tripping the
if-condition when pipefail is on.

awk doesn't gracefully handle SIGPIPE with a non-zero exit code, so grep
exiting upon finding a match causes false-positives when the pipefail
option is used (grep exits, SIGPIPE emits, and awk complains with a
non-zero exit code). Instead, move all of the pattern matching into awk
so that SIGPIPE cannot happen and the correct exit code is returned.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-2-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:38 -08:00
Bobby Eshleman
d9cac93cd1 selftests/vsock: improve logging in vmtest.sh
Improve usability of logging functions. Remove the test name prefix from
logging functions so that logging calls can be made deeper into the call
stack without passing down the test name or setting some global. Teach
log function to accept a LOG_PREFIX variable to avoid unnecessary
argument shifting.

Remove log_setup() and instead use log_host(). The host/guest prefixes
are useful to show whether a failure happened on the guest or host side,
but "setup" doesn't really give additional useful information. Since all
log_setup() calls happen on the host, lets just use log_host() instead.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-1-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:19:37 -08:00
Jiaqi Yan
feee9ef7ac KVM: selftests: Test for KVM_EXIT_ARM_SEA
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
   has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
   guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before guest can consume
  injected UER, and making test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12 01:27:16 -08:00
Matthieu Baerts (NGI0)
852b644acb selftests: mptcp: join: properly kill background tasks
The 'run_tests' function is executed in the background, but killing its
associated PID would not kill the children tasks running in the
background.

To properly kill all background tasks, 'kill -- -PID' could be used, but
this requires kill from procps-ng. Instead, all children tasks are
listed using 'ps', and 'kill' is called with all PIDs of this group.

Fixes: 31ee4ad86a ("selftests: mptcp: join: stop transfer when check is done (part 1)")
Cc: stable@vger.kernel.org
Fixes: 04b57c9e09 ("selftests: mptcp: join: stop transfer when check is done (part 2)")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)
ee79980f7a selftests: mptcp: connect: trunc: read all recv data
MPTCP Join "fastclose server" selftest is sometimes failing because the
client output file doesn't have the expected size, e.g. 296B instead of
1024B.

When looking at a packet trace when this happens, the server sent the
expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE.
It is then strange to see the client only receiving 296B, which would
mean it only got a part of the second packet. The problem is then not on
the networking side, but rather on the data reception side.

When mptcp_connect is launched with '-f -1', it means the connection
might stop before having sent everything, because a reset has been
received. When this happens, the program was directly stopped. But it is
also possible there are still some data to read, simply because the
previous 'read' step was done with a buffer smaller than the pending
data, see do_rnd_read(). In this case, it is important to read what's
left in the kernel buffers before stopping without error like before.

SIGPIPE is now ignored, not to quit the app before having read
everything.

Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)
290493078b selftests: mptcp: join: userspace: longer transfer
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications

To play it safe, all userspace tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.

Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Fixes: b2e2248f36 ("selftests: mptcp: userspace pm create id 0 subflow")
Fixes: e3b47e460b ("selftests: mptcp: userspace pm remove initial subflow")
Fixes: b9fb176081 ("selftests: mptcp: userspace pm send RM_ADDR for ID 0")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)
6457595db9 selftests: mptcp: join: endpoints: longer transfer
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications

To play it safe, all endpoints tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.

Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Fixes: b5e2fb832f ("selftests: mptcp: add explicit test case for remove/readd")
Fixes: e06959e9ee ("selftests: mptcp: join: test for flush/re-add endpoints")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:48 -08:00
Matthieu Baerts (NGI0)
aea73bae66 selftests: mptcp: join: rm: set backup flag
Some of these 'remove' tests rarely fail because a subflow has been
reset instead of cleanly removed. This can happen when one extra subflow
which has never carried data is being closed (FIN) on one side, while
the other is sending data for the first time.

To avoid such subflows to be used right at the end, the backup flag has
been added. With that, data will be only carried on the initial subflow.

Fixes: d2c4333a80 ("selftests: mptcp: add testcases for removing addrs")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-2-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:48 -08:00
Matthieu Baerts (NGI0)
63c643aa7b selftests: mptcp: connect: fix fallback note due to OoO
The "fallback due to TCP OoO" was never printed because the stat_ooo_now
variable was checked twice: once in the parent if-statement, and one in
the child one. The second condition was then always true then, and the
'else' branch was never taken.

The idea is that when there are more ACK + MP_CAPABLE than expected, the
test either fails if there was no out of order packets, or a notice is
printed.

Fixes: 69ca3d29a7 ("mptcp: update selftest for fallback due to OoO")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-1-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11 17:49:47 -08:00
Gabriele Monaco
0c0cd931a0 selftests/verification: Add initial RV tests
Add a series of tests to validate the RV tracefs API and basic
functionality.

* available monitors:
    Check that all monitors (from the monitors folder) appear as
    available and have a description. Works with nested monitors.

* enable/disable:
    Enable and disable all monitors and validate both the enabled file
    and the enabled_monitors. Check that enabling container monitors
    enables all nested monitors.

* reactors:
    Set all reactors and validate the setting, also for nested monitors.

* wwnr with printk:
    wwnr is broken on purpose, run it with a load and check that the
    printk reactor works. Also validate disabling reacting_on or
    monitoring_on prevents reactions.

These tests use the ftracetest suite.

Acked-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20251017115203.140080-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 13:18:55 +01:00
Gabriele Monaco
a0aa283c53 selftest/ftrace: Generalise ftracetest to use with RV
The ftracetest script is a fairly complete test framework for tracefs-like
subsystem, but it can only be used for ftrace selftests.

If OPT_TEST_DIR is provided and includes a function file, use that as
test directory going forward rather than just grabbing tests from it.

Generalise function names like initialize_ftrace to initialize_system.

Add the --rv argument to set up the test for rv, basically changing the
trace directory to $TRACING_DIR/rv and displaying an error if that
cannot be found.

This prepares for rv selftests inclusion.

Link: https://lore.kernel.org/r/20251017115203.140080-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 12:40:15 +01:00
Christian Brauner
6453937581
selftests/namespaces: fix nsid tests
Ensure that we always kill and cleanup all processes.

Link: https://patch.msgid.link/20251110-work-namespace-nstree-fixes-v1-17-e8a9264e0fb9@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-11 10:01:32 +01:00
Christian Brauner
a67ee4e2ba
Merge branch 'kbuild-6.19.fms.extension'
Bring in the shared branch with the kbuild tree to enable
'-fms-extensions' for 6.19. Further namespace cleanup work
requires this extension.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-11 09:59:08 +01:00
Breno Leitao
236682db3b selftest: netcons: add test for netconsole over bonded interfaces
This patch adds a selftest that verifies netconsole functionality
over bonded network interfaces using netdevsim. It sets up two bonded
interfaces acting as transmit (TX) and receive (RX) ends, placed in
separate network namespaces. The test sends kernel log messages and
verifies that they are properly received on the bonded RX interfaces
with both IPv4 and IPv6, and using basic and extended netconsole
formats.

This patchset aims to test a long-standing netpoll subsystem where
netpoll has multiple users. (in this case netconsole and bonding). A
similar selftest has been discussed in [1] and [2].

This test also tries to enable bonding and netpoll in different order,
just to guarantee that all the possibilities are exercised.

Link: https://lore.kernel.org/all/20250905-netconsole_torture-v3-0-875c7febd316@debian.org/ [1]
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/ [2]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-4-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10 18:34:44 -08:00
Breno Leitao
6701896eb9 selftest: netcons: create a torture test
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
messages are being sent, as creating parallel targets.

The code launches three background jobs on distinct schedules:

 * Toggle netcons target every 30 iterations
 * create and delete random_target every 50 iterations
 * toggle iface every 70 iterations

This creates multiple concurrency sources that interact with netconsole
states. This is good practice to simulate stress, and exercise netpoll
and netconsole locks.

This test already found an issue as reported in [1]

Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-3-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10 18:34:44 -08:00
Breno Leitao
39acc6a95e selftest: netcons: refactor target creation
Extract the netconsole target creation from create_dynamic_target(), by
moving it from create_dynamic_target() into a new helper function. This
enables other tests to use the creation of netconsole targets with
arbitrary parameters and no sleep.

The new helper will be utilized by forthcoming torture-type selftests
that require dynamic target management.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-2-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10 18:34:44 -08:00
Steven Rostedt
dd4adb986a selftests/tracing: Run sample events to clear page cache events
The tracing selftest "event-filter-function.tc" was failing because it
first runs the "sample_events" function that triggers the kmem_cache_free
event and it looks at what function was used during a call to "ls".

But the first time it calls this, it could trigger events that are used to
pull pages into the page cache.

The rest of the test uses the function it finds during that call to see if
it will be called in subsequent "sample_events" calls. But if there's no
need to pull pages into the page cache, it will not trigger that function
and the test will fail.

Call the "sample_events" twice to trigger all the page cache work before
it calls it to find a function to use in subsequent checks.

Cc: stable@vger.kernel.org
Fixes: eb50d0f250 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-10 18:00:07 -07:00
Victor Nogueira
60260ad935 selftests/tc-testing: Create tests trying to add children to clsact/ingress qdiscs
In response to Wang's bug report [1], add the following test cases:

- Try and fail to add an fq child to an ingress qdisc
- Try and fail to add an fq child to a clsact qdisc

[1] https://lore.kernel.org/netdev/20251105022213.1981982-1-wangliang74@huawei.com/

Reviewed-by: Pedro Tammela <pctammela@mojatatu.ai>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Cong Wang <cwang@multikernel.io>
Link: https://patch.msgid.link/20251106205621.3307639-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10 16:57:56 -08:00
D. Wythe
beb3c67297 bpf/selftests: Add selftest for bpf_smc_hs_ctrl
This tests introduces a tiny smc_hs_ctrl for filtering SMC connections
based on IP pairs, and also adds a realistic topology model to verify it.

Also, we can only use SMC loopback under CI test, so an additional
configuration needs to be enabled.

Follow the steps below to run this test.

make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc

Results shows:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com
2025-11-10 12:00:45 -08:00
Jakub Sitnicki
d2c5cca3fb selftests/bpf: Cover skb metadata access after bpf_skb_change_proto
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
85d454afef selftests/bpf: Cover skb metadata access after change_head/tail helper
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
29960e635b selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
354d020c29 selftests/bpf: Cover skb metadata access after vlan push/pop helper
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
1e1357fde8 selftests/bpf: Expect unclone to preserve skb metadata
Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
9ef9ac15a5 selftests/bpf: Dump skb metadata on verification failure
Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
967534e57c selftests/bpf: Verify skb metadata in BPF instead of userspace
Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Linus Torvalds
4ea7c1717f Arm:
- Fix trapping regression when no in-kernel irqchip is present
 
 - Check host-provided, untrusted ranges and offsets in pKVM
 
 - Fix regression restoring the ID_PFR1_EL1 register
 
 - Fix vgic ITS locking issues when LPIs are not directly injected
 
 Arm selftests:
 
 - Correct target CPU programming in vgic_lpi_stress selftest
 
 - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
 
 RISC-V:
 
 - Fix check for local interrupts on riscv32
 
 - Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
 
 - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
 
 x86:
 
 - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
   doesn't support virtualization the instructions, but the instructions
   are gated only by VMXON.  That is, they will VM-Exit instead of taking
   a #UD and until now this resulted in KVM exiting to userspace with an
   emulation error.
 
 - Unload the "FPU" when emulating INIT of XSTATE features if and only if
   the FPU is actually loaded, instead of trying to predict when KVM will
   emulate an INIT (CET support missed the MP_STATE path).  Add sanity
   checks to detect and harden against similar bugs in the future.
 
 - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.
 
 - Use a raw spinlock for svm->ir_list_lock as the lock is taken during
   schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.
 
 - Remove guest_memfd bindings on memslot deletion when a gmem file is dying
   to fix a use-after-free race found by syzkaller.
 
 - Fix a goof in the EPT Violation handler where KVM checks the wrong
   variable when determining if the reported GVA is valid.
 
 - Fix and simplify the handling of LBR virtualization on AMD, which was made
   buggy and unnecessarily complicated by nested VM support
 
 Misc:
 
 - Update Oliver's email address
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkQSAAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMDzQf9GW6F8yAo1QTnLNfSIvHMQcJAKwr3
 pd+bOLtMN/OzdQO/dvGVdkYuYYg/PQv/+zw6dVIeiLjtlrsTg40rsqVZEAXYCsbB
 q7TFM634thgON6R6fD6eLa+72UP0wMai7xqEfEyXVW3enAEEe+lrPWC9BiwJRqKZ
 pY1MpVIdYa0XNfUBeiOhO5AH+y9OmUDq5AHptrYn9X5xdsU2OWqQCyHW/1RLPWvA
 9bkyuz1A8+EQ20ngHUd0hrQx4UeJ7jvPblbryUxXaMwqahPC9sA2iDI12gAu8a84
 skvWbIPHSMgj5qDO/CkAsHb47GiqudU4LH7LniDZNsq21iTekiURSbdklw==
 =SM6O
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Arm:

   - Fix trapping regression when no in-kernel irqchip is present

   - Check host-provided, untrusted ranges and offsets in pKVM

   - Fix regression restoring the ID_PFR1_EL1 register

   - Fix vgic ITS locking issues when LPIs are not directly injected

  Arm selftests:

   - Correct target CPU programming in vgic_lpi_stress selftest

   - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest

  RISC-V:

   - Fix check for local interrupts on riscv32

   - Read HGEIP CSR on the correct cpu when checking for IMSIC
     interrupts

   - Remove automatic I/O mapping from kvm_arch_prepare_memory_region()

  x86:

   - Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
     KVM doesn't support virtualization the instructions, but the
     instructions are gated only by VMXON. That is, they will VM-Exit
     instead of taking a #UD and until now this resulted in KVM exiting
     to userspace with an emulation error.

   - Unload the "FPU" when emulating INIT of XSTATE features if and only
     if the FPU is actually loaded, instead of trying to predict when
     KVM will emulate an INIT (CET support missed the MP_STATE path).
     Add sanity checks to detect and harden against similar bugs in the
     future.

   - Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
     unloaded.

   - Use a raw spinlock for svm->ir_list_lock as the lock is taken
     during schedule(), and "normal" spinlocks are sleepable locks when
     PREEMPT_RT=y.

   - Remove guest_memfd bindings on memslot deletion when a gmem file is
     dying to fix a use-after-free race found by syzkaller.

   - Fix a goof in the EPT Violation handler where KVM checks the wrong
     variable when determining if the reported GVA is valid.

   - Fix and simplify the handling of LBR virtualization on AMD, which
     was made buggy and unnecessarily complicated by nested VM support

  Misc:

   - Update Oliver's email address"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: nSVM: Fix and simplify LBR virtualization handling with nested
  KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
  KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
  MAINTAINERS: Switch myself to using kernel.org address
  KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
  KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
  KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
  KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
  KVM: arm64: Make all 32bit ID registers fully writable
  KVM: VMX: Fix check for valid GVA on an EPT violation
  KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
  KVM: SVM: switch to raw spinlock for svm->ir_list_lock
  KVM: SVM: Make avic_ga_log_notifier() local to avic.c
  KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
  KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
  KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
  KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
  KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
  KVM: arm64: Check the untrusted offset in FF-A memory share
  KVM: arm64: Check range args for pKVM mem transitions
  ...
2025-11-10 08:54:36 -08:00
Christian Brauner
07d7ad46da
selftests/namespaces: test for efault
Ensure that put_user() can fail and that namespace cleanup works
correctly.

Link: https://patch.msgid.link/20251109-namespace-6-19-fixes-v1-8-ae8a4ad5a3b3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-10 15:53:56 +01:00
Christian Brauner
88efd7c699
selftests/namespaces: add active reference count regression test
Add a regression test for setns() with pidfd.

Link: https://patch.msgid.link/20251109-namespace-6-19-fixes-v1-7-ae8a4ad5a3b3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-10 10:20:54 +01:00
Paolo Bonzini
ca00c3af8e KVM/arm654 fixes for 6.18, take #2
* Core fixes
 
   - Fix trapping regression when no in-kernel irqchip is present
     (20251021094358.1963807-1-sascha.bischoff@arm.com)
 
   - Check host-provided, untrusted ranges and offsets in pKVM
     (20251016164541.3771235-1-vdonnefort@google.com)
     (20251017075710.2605118-1-sebastianene@google.com)
 
   - Fix regression restoring the ID_PFR1_EL1 register
     (20251030122707.2033690-1-maz@kernel.org
 
   - Fix vgic ITS locking issues when LPIs are not directly injected
     (20251107184847.1784820-1-oupton@kernel.org)
 
 * Test fixes
 
   - Correct target CPU programming in vgic_lpi_stress selftest
     (20251020145946.48288-1-mdittgen@amazon.de)
 
   - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
     (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org)
     (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org)
 
 * Misc
 
   - Update Oliver's email address
     (20251107012830.1708225-1-oupton@kernel.org)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmkPMCoACgkQI9DQutE9
 ekPbaBAAuunvToIxTQoa1hf8+9qlCl/uQNO0R1Q/72N2sLv02kyma9xNA6UGuGFX
 a5AURJItw3RzEDnsJ3Grvu/2IgRgHUrTOqpwPSdGzN/t5wq046Rjp7tWpyQVlwDn
 hax1y7xWpHgC2pDpVran+I7fkC9SZlw5FXGTyEyyGhhYsw41aT6DwYJNyv4TKlxQ
 btXkFN0dhJGw0GjqFvUsZoWBvdWYfhZpMwt24eJ0H8uZXgVImG2TeIcRdwF7A4zr
 0Z3eE17uIochflpWARydX2bAejWWkBGXwFMLtE8SpbFz5W6SpA6e0kkaUsfwIM3c
 d29azeTSspneqjl4CqT7r28ea3F4pOUzCSSsVMWZ9CplRef1/lqIQAQ+sjVWebE/
 gonzdGiRkhylb3lKMPuaip8tnoaLx+KLYGNkY2uBWUtC1C182d+Jy8W49POZHfSn
 01k0VoiwwUiyCA152n2VGThzwmNyfGImNKLnPBlUZpsFGHbdMdKgHr6IBIGkCK8C
 K9ylq++xoQ5KFpDqkE2/mcv75FAJyFJDb7TavtHl6b6H/LjglSYhhperkXbWhx3K
 +GYfsG/m7eEaro41vfLdqZEJt0mgLJaX+asruA+lFHqWTfnY8QFyczFJs8doox+o
 EqIf6bmlDmSE8Waz0YMiPBFl3fgvj3yvMuceDmV3fL8gmQpPa+0=
 =ptQ6
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm654 fixes for 6.18, take #2

* Core fixes

  - Fix trapping regression when no in-kernel irqchip is present
    (20251021094358.1963807-1-sascha.bischoff@arm.com)

  - Check host-provided, untrusted ranges and offsets in pKVM
    (20251016164541.3771235-1-vdonnefort@google.com)
    (20251017075710.2605118-1-sebastianene@google.com)

  - Fix regression restoring the ID_PFR1_EL1 register
    (20251030122707.2033690-1-maz@kernel.org

  - Fix vgic ITS locking issues when LPIs are not directly injected
    (20251107184847.1784820-1-oupton@kernel.org)

* Test fixes

  - Correct target CPU programming in vgic_lpi_stress selftest
    (20251020145946.48288-1-mdittgen@amazon.de)

  - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
    (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org)
    (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org)

* Misc

  - Update Oliver's email address
    (20251107012830.1708225-1-oupton@kernel.org)
2025-11-09 08:07:55 +01:00
Thomas Weißschuh
107eb8336e tools/nolibc: add support for fchdir()
Add support for the file descriptor based variant of chdir().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-08 14:54:25 +01:00
Daniel Zahka
2098cec328 selftests: drv-net: psp: add assertions on core-tracked psp dev stats
Add assertions to existing test cases to cover key rotations and
'stale-events'.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:53:56 -08:00
Alexander Sverdlin
57531b3416 selftests: net: local_termination: Wait for interfaces to come up
It seems that most of the tests prepare the interfaces once before the test
run (setup_prepare()), rely on setup_wait() to wait for link and only then
run the test(s).

local_termination brings the physical interfaces down and up during test
run but never wait for them to come up. If the auto-negotiation takes
some seconds, first test packets are being lost, which leads to
false-negative test results.

Use setup_wait() in run_test() to make sure auto-negotiation has been
completed after all simple_if_init() calls on physical interfaces and test
packets will not be lost because of the race against link establishment.

Fixes: 90b9566aa5 ("selftests: forwarding: add a test for local_termination.sh")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20251106161213.459501-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:46:36 -08:00
Kuniyuki Iwashima
ffc56c9081 selftest: packetdrill: Add max RTO test for SYN+ACK.
This script sets net.ipv4.tcp_rto_max_ms to 1000 and checks
if SYN+ACK RTO is capped at 1s for TFO and non-TFO.

Without the previous patch, the max RTO is applied to TFO
SYN+ACK only, and non-TFO SYN+ACK RTO increases exponentially.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.091936 sec but happened at 6.107826 sec; tolerance 0.127974 sec
  # script packet:  5.091936 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.107826 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 1 ipv4
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.075901 sec but happened at 6.091841 sec; tolerance 0.127976 sec
  # script packet:  5.075901 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.091841 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 2 ipv6
  # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
  not ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt # exit=1

With the previous patch, all SYN+ACKs are retransmitted
after 1s.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # ok 1 ipv4
  # ok 2 ipv6
  # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
  ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:05:26 -08:00
Linus Torvalds
a2e33fb926 iommufd 6.18 first rc pull
Three fixes:
 
 - Syzkaller found a case where maths overflows can cause divide by 0
 
 - Typo in a compiler bug warning fix in the selftests broke the selftests
 
 - type1 compatability had a mismatch when unmapping an already unmaped
   range, it should succeed
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaQ473AAKCRCFwuHvBreF
 Ya8gAP9Qifu4AYobrhgD9rKKXLniGRejXUHBQ0G8y8y9sBHJdgEApwxzCUcI1jLZ
 GTaPEABblBBWxUmN+Psya2Hm3Oa53AA=
 =TJHp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:

 - Syzkaller found a case where maths overflows can cause divide by 0

 - Typo in a compiler bug warning fix in the selftests broke the
   selftests

 - type1 compatability had a mismatch when unmapping an already unmapped
   range, it should succeed

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Make vfio_compat's unmap succeed if the range is already empty
  iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
  iommufd: Don't overflow during division for dirty tracking
2025-11-07 13:13:09 -08:00
Mark Rutland
a7717cad61 kselftest/arm64: Align zt-test register dumps
The zt-test output is awkward to read, as the 'Expected' value isn't
dumped on its own line and isn't aligned with the 'Got' value beneath.
For example:

  Mismatch: PID=5281, iteration=3270249   Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
          Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
          SVCR: 2

Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get
output that can be read more easily:

  Mismatch: PID=5281, iteration=3270249
          Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
          Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
          SVCR: 2

Admittedly this isn't all that important when the 'Got' value is all
zeroes, but otherwise this would be a major help for identifying which
portion of the 'Got' value is not as expected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 20:00:49 +00:00
Paul E. McKenney
f2b7d6252c torture: Permit negative kvm.sh --kconfig numberic arguments
This commit loosens the kvm.sh script's regular expressions to permit
negative-valued Kconfig options, for example:

	--kconfig CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN=-1

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-07 13:57:38 +01:00
Alexis Lothoré (eBPF Foundation)
5b7d6c9198 selftests/bpf: Use start_server_str rather than start_reuseport_server in tc_tunnel
Now that start_server_str enforces SO_REUSEADDR, there's no need to keep
using start_reusport_server in tc_tunnel, especially since it only uses
one server at a time.

Replace start_reuseport_server with start_server_str in tc_tunnel test.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-2-1bbd9c1f8d65@bootlin.com
2025-11-06 15:23:04 -08:00
Alexis Lothoré (eBPF Foundation)
38e36514fc selftests/bpf: Systematically add SO_REUSEADDR in start_server_addr
Some tests have to stop/start a server multiple time with the same
listening address. Doing so without SO_REUSADDR leads to failures due to
the socket still being in TIME_WAIT right after the first instance
stop/before the second instance start. Instead of letting each test
manually set SO_REUSEADDR on their servers, it can be done automatically
by start_server_addr for all tests (and without any major downside).

Enforce SO_REUSEADDR in start_server_addr for all tests.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-1-1bbd9c1f8d65@bootlin.com
2025-11-06 15:23:00 -08:00
Steven Rostedt
37f4660138 selftests/tracing: Add basic test for trace_marker_raw file
Commit 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read
user space") made an update that fixed both trace_marker and
trace_marker_raw. But the small difference made to trace_marker_raw had a
blatant bug in it that any basic testing would have uncovered.
Unfortunately, the self tests have tests for trace_marker but nothing for
trace_marker_raw which allowed the bug to get upstream.

Add basic selftests to test trace_marker_raw so that this doesn't happen
again.

Link: https://lore.kernel.org/r/20251014145149.3e3c1033@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-06 15:23:50 -07:00
Jakub Kicinski
1ec9871fbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc5).

Conflicts:

drivers/net/wireless/ath/ath12k/mac.c
  9222582ec5 ("Revert "wifi: ath12k: Fix missing station power save configuration"")
  6917e268c4 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon")
https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net

Adjacent changes:

drivers/net/ethernet/intel/Kconfig
  b1d16f7c00 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG")
  93f53db9f9 ("ice: switch to Page Pool")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06 09:27:40 -08:00
Linus Torvalds
c2c2ccfd4b Including fixes from bluetooth and wireless.
Current release - new code bugs:
 
  - ptp: expose raw cycles only for clocks with free-running counter
 
  - bonding: fix null-deref in actor_port_prio setting
 
  - mdio: ERR_PTR-check regmap pointer returned by device_node_to_regmap()
 
  - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG
 
 Previous releases - regressions:
 
  - virtio_net: fix perf regression due to bad alignment of
    virtio_net_hdr_v1_hash
 
  - Revert "wifi: ath10k: avoid unnecessary wait for service ready message"
    caused regressions for QCA988x and QCA9984
 
  - Revert "wifi: ath12k: Fix missing station power save configuration"
    caused regressions for WCN7850
 
  - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
    corruptions after kexec
 
 Previous releases - always broken:
 
  - virtio-net: fix received packet length check for big packets
 
  - sctp: fix races in socket diag handling
 
  - wifi: add an hrtimer-based delayed work item to avoid low granularity
    of timers set relatively far in the future, and use it where it matters
    (e.g. when performing AP-scheduled channel switch)
 
  - eth: mlx5e:
    - correctly propagate error in case of module EEPROM read failure
    - fix HW-GRO on systems with PAGE_SIZE == 64kB
 
  - dsa: b53: fixes for tagging, link configuration / RMII, FDB, multicast
 
  - phy: lan8842: implement latest errata
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkMyx0ACgkQMUZtbf5S
 Irs4gA//fRGPHfEksbVO0lAR9QVUKq/Wvokt6DnLXQsi7js7kun7ymVFyU8tVacP
 sGoYbTTXO34XpijKVMTFe5WnnnSF3cri1/e+PYNXndKGHOheLvz0aq+CniiLexaT
 ag/opHX9+Io6x8DyOVkkRJYByvEcXgy1vFjhk5O04wn7tGPBNzvaKQJzjfVIiiWP
 thma/e77jVUqsNptS/VzJNCDwPB3Qi0fqylRovu7A59Kmr7GLZxaqZQpvM5/XjU3
 s3NhNjNDawmiwxN2AIztXo+vMReqFaiCr+z36OcruE04CFslkQGmE/5g3FdDGg+Q
 RE7Wfgk2UJ+Q5Y1jnQWNYOkGaMd6bMgPQD/8zZvwEf163Gh1YBh0rtDY07DWV5FR
 wp5cmeNDo2Mf+nnCM2UaoUQS2AAmkub2aAIjnlvrefeJMKciyMqtQe+V02aIxuvK
 BPmeSCN1IOyAIDCRE3Fd6JFyk+4Z4YC6Hdm/WOG517eGrDh3yV9JE/+C3jdh3JwT
 lvaScKhCdyq/9mFqEcU/yR5Kc4Od6Rw0K0s5DM4LAbKSsxI6hp6SkuoFHUsqFsRR
 HL3ucy4c4hmLxz6AA5/+UVIPzFsY5cwOnhs3AU2UVuA95fH0QWqX2u2ll0rC9mAW
 xDXa/ai0/KAGvVYqjC+MDyF8zrtzSG7C40ZsxhdcU84s94ZNVFI=
 =a/Ml
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
  Including fixes from bluetooth and wireless.

  Current release - new code bugs:

   - ptp: expose raw cycles only for clocks with free-running counter

   - bonding: fix null-deref in actor_port_prio setting

   - mdio: ERR_PTR-check regmap pointer returned by
     device_node_to_regmap()

   - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG

  Previous releases - regressions:

   - virtio_net: fix perf regression due to bad alignment of
     virtio_net_hdr_v1_hash

   - Revert "wifi: ath10k: avoid unnecessary wait for service ready
     message" caused regressions for QCA988x and QCA9984

   - Revert "wifi: ath12k: Fix missing station power save configuration"
     caused regressions for WCN7850

   - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
     corruptions after kexec

  Previous releases - always broken:

   - virtio-net: fix received packet length check for big packets

   - sctp: fix races in socket diag handling

   - wifi: add an hrtimer-based delayed work item to avoid low
     granularity of timers set relatively far in the future, and use it
     where it matters (e.g. when performing AP-scheduled channel switch)

   - eth: mlx5e:
       - correctly propagate error in case of module EEPROM read failure
       - fix HW-GRO on systems with PAGE_SIZE == 64kB

   - dsa: b53: fixes for tagging, link configuration / RMII, FDB,
     multicast

   - phy: lan8842: implement latest errata"

* tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  selftests/vsock: avoid false-positives when checking dmesg
  net: bridge: fix MST static key usage
  net: bridge: fix use-after-free due to MST port state bypass
  lan966x: Fix sleeping in atomic context
  bonding: fix NULL pointer dereference in actor_port_prio setting
  net: dsa: microchip: Fix reserved multicast address table programming
  net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
  net: libwx: fix device bus LAN ID
  net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
  net/mlx5e: SHAMPO, Fix skb size check for 64K pages
  net/mlx5e: SHAMPO, Fix header mapping for 64K pages
  net: ti: icssg-prueth: Fix fdb hash size configuration
  net/mlx5e: Fix return value in case of module EEPROM read error
  net: gro_cells: Reduce lock scope in gro_cell_poll
  libie: depend on DEBUG_FS when building LIBIE_FWLOG
  wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
  netpoll: Fix deadlock in memory allocation under spinlock
  net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
  virtio-net: fix received length check in big packets
  bnxt_en: Fix warning in bnxt_dl_reload_down()
  ...
2025-11-06 08:52:30 -08:00
Bobby Eshleman
3534e03e0e selftests/vsock: avoid false-positives when checking dmesg
Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.

Fixes: a4a65c6fe0 ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06 07:34:50 -08:00
Zhang Chujun
79d062bdef selftest/alsa: correct grammar in conf_get_bool error string
The phrase "an bool" is grammatically incorrect; it should be
"a bool".

Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>

Link: https://patch.msgid.link/20251106055819.1996-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-06 11:47:36 +01:00
Anton Protopopov
ac4d838ce1 selftests/bpf: add C-level selftests for indirect jumps
Add C-level selftests for indirect jumps to validate LLVM and libbpf
functionality. The tests are intentionally disabled, to be run
locally by developers, but will not make the CI red.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-13-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:56 -08:00
Anton Protopopov
ccbdb48ce5 selftests/bpf: add new verifier_gotox test
Add a set of tests to validate core gotox functionality
without need to rely on compilers.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-12-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:23 -08:00
Anton Protopopov
ae48162a66 selftests/bpf: test instructions arrays with blinding
Add a specific test for instructions arrays with blinding enabled.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-7-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:22 -08:00
Anton Protopopov
218edd6db6 selftests/bpf: add selftests for new insn_array map
Add the following selftests for new insn_array map:

  * Incorrect instruction indexes are rejected
  * Two programs can't use the same map
  * BPF progs can't operate the map
  * no changes to code => map is the same
  * expected changes when instructions are added
  * expected changes when instructions are deleted
  * expected changes when multiple functions are present

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-5-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:18 -08:00
Jiri Olsa
3490d29964 selftests/bpf: Add stacktrace ips test for raw_tp
Adding test that verifies we get expected initial 2 entries from
stacktrace for rawtp probe via ORC unwind.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Jiri Olsa
c9e208fa93 selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
Adding test that attaches kprobe/kretprobe multi and verifies the
ORC stacktrace matches expected functions.

Adding bpf_testmod_stacktrace_test function to bpf_testmod kernel
module which is called through several functions so we get reliable
call path for stacktrace.

The test is only for ORC unwinder to keep it simple.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Valentin Schneider
82a2244980 rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
We now have an RCU_EXPERT config for testing small-sized RCU dynticks
counter:  CONFIG_RCU_DYNTICKS_TORTURE.

Modify scenario TREE04 to exercise to use this config in order to test a
ridiculously small counter (2 bits).

Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:09:16 +01:00
Paul E. McKenney
f121fbbdaf rcutorture: Permit kvm-again.sh to re-use the build directory
This commit adds "inplace" and "inplace-force" values to the kvm-again.sh
"--link" argument, which causes the run's output to be placed into the
build directory.  This could be used to save build time if the machine
went down partway into a run, but it can also be used to do a large
number of builds, and run the resulting kernels concurrently even if the
builds are based on different commits.  A later commit will add this
latter capability to kvm-series.sh in order to produce large speedups
for branch-checking operations.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:12 +01:00
Paul E. McKenney
515a48fedc torture: Add kvm-series.sh to test commit/scenario combination
This commit adds a kvm-series.sh script that takes a list of scenarios and
a list of commits, and then runs each scenario on all of the commits.
A given scenario is run on all the commits before advancing to the
next scenario to minimize build times.  The successes and failures are
summarized at the end of the run.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-06 00:03:10 +01:00
Jason Gunthorpe
afb47765f9 iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd returns ENOENT when attempting to unmap a range that is already
empty, while vfio type1 returns success. Fix vfio_compat to match.

Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Reported-by: Alex Mastro <amastro@fb.com>
Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05 15:11:26 -04:00
David Matlack
a52b1a7112 vfio: selftests: Store libvfio build outputs in $(OUTPUT)/libvfio
Store the tools/testing/selftests/vfio/lib outputs (e.g. object files)
in $(OUTPUT)/libvfio rather than in $(OUTPUT)/lib. This is in
preparation for building the VFIO selftests library into the KVM
selftests (see Link below).

Specifically this will avoid name conflicts between
tools/testing/selftests/{vfio,kvm/lib and also avoid leaving behind
empty directories under tools/testing/selftests/kvm after a make clean.

Link: https://lore.kernel.org/kvm/20250912222525.2515416-2-dmatlack@google.com/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250922224857.2528737-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-05 12:02:14 -07:00
Jason Gunthorpe
e93d5945ed iommufd: Change the selftest to use iommupt instead of xarray
The iommufd self test uses an xarray to store the pfns and their orders to
emulate a page table. Make it act more like a real iommu driver by
replacing the xarray with an iommupt based page table. The new AMDv1 mock
format behaves similarly to the xarray.

Add set_dirty() as a iommu_pt operation to allow the test suite to
simulate HW dirty.

Userspace can select between several formats including the normal AMDv1
format and a special MOCK_IOMMUPT_HUGE variation for testing huge page
dirty tracking. To make the dirty tracking test work the page table must
only store exactly 2M huge pages otherwise the logic the test uses
fails. They cannot be broken up or combined.

Aside from aligning the selftest with a real page table implementation,
this helps test the iommupt code itself.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05 09:07:13 +01:00
Kees Cook
85cb0757d7 net: Convert proto_ops connect() callbacks to use sockaddr_unsized
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Kees Cook
0e50474fa5 net: Convert proto_ops bind() callbacks to use sockaddr_unsized
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Mykyta Yatsenko
137cc92ffe bpf: add _impl suffix for bpf_stream_vprintk() kfunc
Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl().

This makes bpf_stream_vprintk() follow the already established "_impl"
suffix-based naming convention for kfuncs with the bpf_prog_aux
argument provided by the verifier implicitly. This convention will be
taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to
preserve backwards compatibility to BPF programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
2025-11-04 17:50:25 -08:00
Mykyta Yatsenko
ea0714d61d bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
Rename:
bpf_task_work_schedule_resume()->bpf_task_work_schedule_resume_impl()
bpf_task_work_schedule_signal()->bpf_task_work_schedule_signal_impl()

This aligns task work scheduling kfuncs with the established naming
scheme for kfuncs with the bpf_prog_aux argument provided by the
verifier implicitly. This convention will be taken advantage of with the
upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to
BPF programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-1-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
2025-11-04 17:50:25 -08:00
Matthieu Baerts (NGI0)
5c59df126b selftests: mptcp: join: validate extra bind cases
By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.

In other words, it means that if the 'server' listens on a specific
address / device, it cannot accept MP_JOIN sent to a different address /
device. Except if there is another MPTCP listening socket accepting
them.

This is what the new tests are validating:

 - Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
   to announced addresses are not accepted.

 - Also forcing a bind on the main v4/v6 address, but before, another
   listening socket is created to accept additional subflows. Note that
   'mptcpize run nc -l' -- or something else only doing: socket(MPTCP),
   bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is
   reused not to depend on another tool just for that.

 - Same as the previous one, but using v6 link-local addresses: this is
   a bit particular because it is required to specify the outgoing
   network interface when connecting to a link-local address announced
   by the other peer. When using the routing rules, this doesn't work
   (the outgoing interface is not known) ; but it does work with a
   'laminar' endpoint having a specified interface.

Note that extra small modifications are needed for these tests to work:

 - mptcp_connect's check_getpeername_connect() check should strip the
   specified interface when comparing addresses.

 - With IPv6 link-local addresses, it is required to wait for them to
   be ready (no longer in 'tentative' mode) before using them, otherwise
   the bind() will not be allowed.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:07 -08:00
Matthieu Baerts (NGI0)
4a6220a453 selftests: mptcp: join: do_transfer: reduce code dup
The same extra long commands are present twice, with small differences:
the variable for the stdin file is different.

Use new dedicated variables in one command to avoid this code
duplication.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-3-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:06 -08:00
Matthieu Baerts (NGI0)
e461e8a799 mptcp: pm: in kernel: only use fullmesh endp if any
Our documentation is saying that the in-kernel PM is only using fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.

This is confusing, and it seems clearer not to have differences
depending on the address family.

So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.

One selftest needs to be adapted for this behaviour change.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 17:15:06 -08:00
Alan Maguire
cc77a20389 selftests/bpf: Test parsing of (multi-)split BTF
Write raw BTF to files, parse it and compare to original;
this allows us to test parsing of (multi-)split BTF code.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251104203309.318429-3-alan.maguire@oracle.com
2025-11-04 13:44:12 -08:00
Christian Brauner
cbb842548a
selftests/coredump: add second PIDFD_INFO_COREDUMP_SIGNAL test
Verify that when using simple socket-based coredump (@ pattern),
the coredump_signal field is correctly exposed as SIGABRT.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-22-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:05:01 +01:00
Christian Brauner
619e2227cc
selftests/coredump: add first PIDFD_INFO_COREDUMP_SIGNAL test
Verify that when using simple socket-based coredump (@ pattern),
the coredump_signal field is correctly exposed as SIGSEGV.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-21-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:59 +01:00
Christian Brauner
32ae33f796
selftests/coredump: ignore ENOSPC errors
If we crash multiple processes at the same time we may run out of space.
Just ignore those errors. They're not actually all that relevant for the
test.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-20-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:57 +01:00
Christian Brauner
408a0ed9ee
selftests/coredump: add debug logging to coredump socket protocol tests
So it's easier to figure out bugs.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-19-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:55 +01:00
Christian Brauner
2343cbee9f
selftests/coredump: add debug logging to coredump socket tests
So it's easier to figure out bugs.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-18-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:53 +01:00
Christian Brauner
d5694db5dc
selftests/coredump: add debug logging to test helpers
so we can easily figure out why something failed.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-17-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:51 +01:00
Christian Brauner
305e6b167c
selftests/coredump: handle edge-triggered epoll correctly
by putting the file descriptor into non-blocking mode.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-16-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:48 +01:00
Christian Brauner
8b64f54c81
selftests/coredump: fix userspace coredump client detection
PIDFD_INFO_COREDUMP is only retrievable until the task has exited. After
it has exited task->mm is NULL. So if the task didn't actually coredump
we can't retrieve it's dumpability settings anymore. Only if the task
did coredump will we have stashed the coredump information in the
respective struct pid.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-15-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:46 +01:00
Christian Brauner
32ae9fa406
selftests/coredump: fix userspace client detection
We need to request PIDFD_INFO_COREDUMP in the first place.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-14-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:44 +01:00
Christian Brauner
c09ea6659e
selftests/coredump: split out coredump socket tests
Split the coredump socket tests into separate files.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-13-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:42 +01:00
Christian Brauner
c71147f42b
selftests/coredump: split out common helpers
into separate files.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-12-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:40 +01:00
Christian Brauner
2593deaac8
selftests/pidfd: add second supported_mask test
Verify that supported_mask is returned even when other fields are
requested.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-11-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:38 +01:00
Christian Brauner
e12f734208
selftests/pidfd: add first supported_mask test
Verify that when PIDFD_INFO_SUPPORTED_MASK is requested, the kernel
returns the supported_mask field indicating which flags the kernel
supports.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-10-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:36 +01:00
Christian Brauner
a945535dfd
selftests/pidfd: update pidfd header
Include the new defines and members.

Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-9-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 22:04:32 +01:00
Samiullah Khawaja
add3c1324a selftests: Add napi threaded busy poll test in busy_poller
Add testcase to run busy poll test with threaded napi busy poll enabled.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03 18:11:40 -08:00
KaFai Wan
9f32bfec54 selftests/bpf: Add test for conditional jumps on same scalar register
Add test cases to verify the correctness of the BPF verifier's branch analysis
when conditional jumps are performed on the same scalar register. And make sure
that JGT does not trigger verifier BUG.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251103063108.1111764-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-03 17:44:53 -08:00
Song Liu
62d2d0a338 selftests/bpf: Add tests for livepatch + bpf trampoline
Both livepatch and BPF trampoline use ftrace. Special attention is needed
when livepatch and fexit program touch the same function at the same
time, because livepatch updates a kernel function and the BPF trampoline
need to call into the right version of the kernel function.

Use samples/livepatch/livepatch-sample.ko for the test.

The test covers two cases:
  1) When a fentry program is loaded first. This exercises the
     modify_ftrace_direct code path.
  2) When a fentry program is loaded first. This exercises the
     register_ftrace_direct code path.

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-03 17:22:06 -08:00
Alexei Starovoitov
5dae7453ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc4
Cross-merge BPF and other fixes after downstream PR.
No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-03 14:59:55 -08:00
Alison Schofield
f59b701b46 tools/testing/nvdimm: Use per-DIMM device handle
KASAN reports a global-out-of-bounds access when running these nfit
tests: clear.sh, pmem-errors.sh, pfn-meta-errors.sh, btt-errors.sh,
daxdev-errors.sh, and inject-error.sh.

[] BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x769f/0x7840 [nfit_test]
[] Read of size 4 at addr ffffffffc03ea01c by task ndctl/1215
[] The buggy address belongs to the variable:
[] handle+0x1c/0x1df4 [nfit_test]

nfit_test_search_spa() uses handle[nvdimm->id] to retrieve a device
handle and triggers a KASAN error when it reads past the end of the
handle array. It should not be indexing the handle array at all.

The correct device handle is stored in per-DIMM test data. Each DIMM
has a struct nfit_mem that embeds a struct acpi_nfit_memdev that
describes the NFIT device handle. Use that device handle here.

Fixes: 10246dc84d ("acpi nfit: nfit_test supports translate SPA")
Cc: stable@vger.kernel.org
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251031234227.1303113-1-alison.schofield@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2025-11-03 16:47:13 -06:00
Tejun Heo
587eb08a5f sched_ext: Merge branch 'for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-6.19
Pull cgroup/for-6.19 to receive:

 16dad7801a ("cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()")
 260fbcb92b ("cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()")
 d245698d72 ("cgroup: Defer task cgroup unlink until after the task is done switching out")

These are needed for the sched_ext cgroup exit ordering fix.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:57:26 -10:00
Sean Christopherson
83e0e12219 KVM: selftests: Rename "guest_paddr" variables to "gpa"
Rename "guest_paddr" variables in vm_userspace_mem_region_add() and
vm_mem_add() to KVM's de facto standard "gpa", both for consistency and
to shorten line lengths.

Opportunistically fix the indentation of the
vm_userspace_mem_region_add() declaration.

Link: https://patch.msgid.link/20251007223625.369939-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-03 12:54:21 -08:00
Christian Brauner
2cc1c01fe9
selftests/namespace: test listns() pagination
Minimal test case to reproduce KASAN out-of-bounds in listns pagination.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-72-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:25 +01:00
Christian Brauner
fc85885692
selftests/namespace: add stress test
Stress tests for namespace active reference counting.

These tests validate that the active reference counting system can
handle high load scenarios including rapid namespace
creation/destruction, large numbers of concurrent namespaces, and
various edge cases under stress.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-71-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:25 +01:00
Christian Brauner
d18cf3f9a4
selftests/namespace: commit_creds() active reference tests
Test credential changes and their impact on namespace active references.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-70-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
80fedf8168
selftests/namespace: third threaded active reference count test
Test that namespaces become inactive after subprocess with multiple
threads exits. Create a subprocess that unshares user and network
namespaces, then creates two threads that share those namespaces. Verify
that after all threads and subprocess exit, the namespaces are no longer
listed by listns() and cannot be opened by open_by_handle_at().

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-69-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
ee86103238
selftests/namespace: second threaded active reference count test
Test that a namespace remains active while a thread holds an fd to it.
Even after the thread exits, the namespace should remain active as long
as another thread holds a file descriptor to it.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-68-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
29f083c499
selftests/namespace: first threaded active reference count test
Test that namespace becomes inactive after thread exits. This verifies
active reference counting works with threads, not just processes.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-67-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
c89d100f6a
selftests/namespaces: twelth inactive namespace resurrection test
Test multi-level namespace resurrection across three user namespace levels.

This test creates a complex namespace hierarchy with three levels of user
namespaces and a network namespace at the deepest level. It verifies that
the resurrection semantics work correctly when SIOCGSKNS is called on a
socket from an inactive namespace tree, and that listns() and
open_by_handle_at() correctly respect visibility rules.

Hierarchy after child processes exit (all with 0 active refcount):

         net_L3A (0)                <- Level 3 network namespace
             |
             +
         userns_L3 (0)              <- Level 3 user namespace
             |
             +
         userns_L2 (0)              <- Level 2 user namespace
             |
             +
         userns_L1 (0)              <- Level 1 user namespace
             |
             x
         init_user_ns

The test verifies:
1. SIOCGSKNS on a socket from inactive net_L3A resurrects the entire chain
2. After resurrection, all namespaces are visible in listns()
3. Resurrected namespaces can be reopened via file handles
4. Closing the netns FD cascades down: the entire ownership chain
   (userns_L3 -> userns_L2 -> userns_L1) becomes inactive again
5. Inactive namespaces disappear from listns() and cannot be reopened
6. Calling SIOCGSKNS again on the same socket resurrects the tree again
7. After second resurrection, namespaces are visible and can be reopened

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-66-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
c80168b677
selftests/namespaces: eleventh inactive namespace resurrection test
Test combined listns() and file handle operations with socket-kept
netns. Create a netns, keep it alive with a socket, verify it appears in
listns(), then reopen it via file handle obtained from listns() entry.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-65-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
3798991a9f
selftests/namespaces: tenth inactive namespace resurrection test
Test that socket-kept netns can be reopened via file handle.
Verify that a network namespace kept alive by a socket FD can be
reopened using file handles even after the creating process exits.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-64-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
b9d09f568b
selftests/namespaces: ninth inactive namespace resurrection test
Test that socket-kept netns appears in listns() output.
Verify that a network namespace kept alive by a socket FD appears in
listns() output even after the creating process exits, and that it
disappears when the socket is closed.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-63-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:24 +01:00
Christian Brauner
6de17ec3cc
selftests/namespaces: eigth inactive namespace resurrection test
Test IPv6 sockets also work with SIOCGSKNS.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-62-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
54a29d1233
selftests/namespaces: seventh inactive namespace resurrection test
Test socket keeps netns active after creating process exits. Verify that
as long as the socket FD exists, the namespace remains active.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-61-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
aec2237695
selftests/namespaces: sixth inactive namespace resurrection test
Test multiple sockets keep the same network namespace active. Create
multiple sockets, verify closing some doesn't affect others.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-60-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
2b9fa5bf0c
selftests/namespaces: fifth inactive namespace resurrection test
Test SIOCGSKNS fails on non-socket file descriptors.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-59-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
40226da471
selftests/namespaces: fourth inactive namespace resurrection test
Test SIOCGSKNS across setns. Create a socket in netns A, switch to netns
B, verify SIOCGSKNS still returns netns A.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-58-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
5aec9f455c
selftests/namespaces: third inactive namespace resurrection test
Test SIOCGSKNS with different socket types (TCP, UDP, RAW).

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-57-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
c0f06da568
selftests/namespaces: second inactive namespace resurrection test
Test that socket file descriptors keep network namespaces active. Create
a network namespace, create a socket in it, then exit the namespace. The
namespace should remain active while the socket FD is held.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-56-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:23 +01:00
Christian Brauner
a1e49d8d18
selftests/namespaces: first inactive namespace resurrection test
Test basic SIOCGSKNS functionality. Create a socket and verify SIOCGSKNS
returns the correct network namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-55-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
39bcc7ae57
selftests/namespaces: seventh listns() permission test
Test that dropping CAP_SYS_ADMIN restricts what we can see.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-54-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
cff66421ee
selftests/namespaces: sixth listns() permission test
Test that we can see user namespaces we have CAP_SYS_ADMIN inside of.
This is different from seeing namespaces owned by a user namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-53-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
1c28817eb3
selftests/namespaces: fifth listns() permission test
Test that CAP_SYS_ADMIN in parent user namespace allows seeing
child user namespace's owned namespaces.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-52-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
6f360f2b2f
selftests/namespaces: fourth listns() permission test
Test permission checking with LISTNS_CURRENT_USER.
Verify that listing with LISTNS_CURRENT_USER respects permissions.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-51-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
2635f93989
selftests/namespaces: third listns() permission test
Test that users cannot see namespaces from unrelated user namespaces.
Create two sibling user namespaces, verify they can't see each other's
owned namespaces.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-50-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
ec38237731
selftests/namespaces: second listns() permission test
Test that users with CAP_SYS_ADMIN in a user namespace can see
all namespaces owned by that user namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-49-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
1f8ee4a1f9
selftests/namespaces: first listns() permission test
Test that unprivileged users can only see namespaces they're currently
in. Create a namespace, drop privileges, verify we can only see our own
namespaces.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-48-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:22 +01:00
Christian Brauner
674294a479
selftests/namespaces: ninth listns() test
Test error cases for listns().

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-47-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
b0de4c80fb
selftests/namespaces: eigth listns() test
Test that hierarchical active reference propagation keeps parent
user namespaces visible in listns().

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-46-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
6aeca1dd49
selftests/namespaces: seventh listns() test
Test listns() with multiple namespace types filter.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-45-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
bc8da67e0e
selftests/namespaces: sixth listns() test
Test listns() with specific user namespace ID.
Create a user namespace and list namespaces it owns.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-44-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
4080b9d946
selftests/namespaces: fifth listns() test
Test that listns() only returns active namespaces.
Create a namespace, let it become inactive, verify it's not listed.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-43-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
abac8de3e5
selftests/namespaces: fourth listns() test
Test listns() with LISTNS_CURRENT_USER.
List namespaces owned by current user namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-42-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
46909d1343
selftests/namespaces: third listns() test
Test listns() pagination.
List namespaces in batches.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-41-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
6a68c7f919
selftests/namespaces: second listns() test
test listns() with type filtering.
List only network namespaces.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-40-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:21 +01:00
Christian Brauner
e2ff8d8864
selftests/namespaces: first listns() test
Test basic listns() functionality with the unified namespace tree.
List all active namespaces globally.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-39-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
158c5c786e
selftests/namespaces: add listns() wrapper
Add a wrapper for the listns() system call.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-38-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
da3c02b70c
selftests/namespaces: fifteenth active reference count tests
Test different namespace types (net, uts, ipc) all contributing
active references to the same owning user namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-37-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
a9d84bf7bf
selftests/namespaces: fourteenth active reference count tests
Test that user namespace as a child also propagates correctly.
Create user_A -> user_B, verify when user_B is active that user_A
is also active. This is different from non-user namespace children.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-36-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
2a94bf7bb8
selftests/namespaces: thirteenth active reference count tests
Test that parent stays active as long as ANY child is active.
Create parent user namespace with two child net namespaces.
Parent should remain active until BOTH children are inactive.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-35-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
04aee1a346
selftests/namespaces: twelth active reference count tests
Test hierarchical propagation with deep namespace hierarchy.
Create: init_user_ns -> user_A -> user_B -> net_ns
When net_ns is active, both user_A and user_B should be active.
This verifies the conditional recursion in __ns_ref_active_put() works.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-34-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
26d238ea6a
selftests/namespaces: eleventh active reference count tests
Test that different namespace types with same owner all contribute
active references to the owning user namespace.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-33-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
e7585a9ef5
selftests/namespaces: tenth active reference count tests
Test multiple children sharing same parent.
Parent should stay active as long as ANY child is active.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-32-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:20 +01:00
Christian Brauner
a8ce47a1ac
selftests/namespaces: ninth active reference count tests
Test multi-level hierarchy (3+ levels deep).
Grandparent → Parent → Child
When child is active, both parent AND grandparent should be active.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-31-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
94f8711080
selftests/namespaces: eigth active reference count tests
Test that bind mounts keep namespaces in the tree even when inactive

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-30-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
4b971b07e4
selftests/namespaces: seventh active reference count tests
Test hierarchical active reference propagation.
When a child namespace is active, its owning user namespace should also
be active automatically due to hierarchical active reference propagation.
This ensures parents are always reachable when children are active.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-29-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
47a5fd8ce1
selftests/namespaces: sixth active reference count tests
Test that an open file descriptor keeps a namespace active.
Even after the creating process exits, the namespace should remain
active as long as an fd is held open.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-28-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
c4803b255f
selftests/namespaces: fifth active reference count tests
Test PID namespace active ref tracking

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-27-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
28655ff253
selftests/namespaces: fourth active reference count tests
Test user namespace active ref tracking via credential lifecycle.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-26-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
c6e25d930b
selftests/namespaces: third active reference count tests
Test that a namespace remains active while a process is using it,
even after the creating process exits.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-25-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
721c7e41b1
selftests/namespaces: second active reference count tests
Test namespace lifecycle: create a namespace in a child process, get a
file handle while it's active, then try to reopen after the process
exits (namespace becomes inactive).

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-24-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:19 +01:00
Christian Brauner
6bdce845fd
selftests/namespaces: first active reference count tests
Test that initial namespaces can be reopened via file handle. Initial
namespaces should always have a ref count of one from boot.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-23-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:18 +01:00
Christian Brauner
e2b6e5eadc
selftests/filesystems: remove CLONE_NEWPIDNS from setup_userns() helper
This is effectively unused and doesn't really server any purpose after
having reviewed all of the tests that rely on it.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-22-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:18 +01:00
Alison Schofield
06377c54a1 cxl/test: Add cxl_translate module for address translation testing
Add a loadable test module that validates CXL address translation
calculations using parameterized test vectors. The module tests both
host-to-device and device-to-host address translations for Modulo and
XOR interleave arithmetic.

Two types of testing are provided:

1. Parameterized test vectors:
   Test vectors are passed as module parameters in the format:
	"dpa pos r_eiw r_eig hb_ways math expected_spa".
   Round-trip validation is performed:
   - Translate a DPA and position to a SPA
   - Verify the result matches expected SPA
   - Translate that SPA back to a DPA and position
   - Verify round-trip consistency

2. Internal validation testing:
   When no test vectors are provided, the module performs validation
   of the translation functions by checking parameter boundaries and
   running 10,000 iterations of randomly generated valid parameters
   to exercise the core calculation functions.

The module uses the CXL Driver translation functions through symbols
exported exclusively for cxl_translate.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-03 09:27:32 -07:00
Li Ming
3f5b8f7f34 cxl/port: Remove devm_cxl_port_enumerate_dports()
devm_cxl_port_enumerate_dports() is not longer used after below commit
commit 4f06d81e7c ("cxl: Defer dport allocation for switch ports")

Delete it and the relevant interface implemented in cxl_test.

Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-03 09:16:02 -07:00
Ming Lei
3f5b1169d2 selftests: ublk: make ublk_thread thread-local variable
Refactor ublk_thread to be a thread-local variable instead of storing
it in ublk_dev:

- Remove pthread_t thread field from struct ublk_thread and move it to
  struct ublk_thread_info

- Remove struct ublk_thread array from struct ublk_dev, reducing memory
  footprint

- Define struct ublk_thread as local variable in __ublk_io_handler_fn()
  instead of accessing it from dev->threads[]

- Extract main IO handling logic into __ublk_io_handler_fn() which is
  marked as noinline

- Move CPU affinity setup to ublk_io_handler_fn() before calling
  __ublk_io_handler_fn()

- Update ublk_thread_set_sched_affinity() to take struct ublk_thread_info *
  instead of struct ublk_thread *, and use pthread_setaffinity_np()
  instead of sched_setaffinity()

- Reorder struct ublk_thread fields to group related state together

This change makes each thread's ublk_thread structure truly local to
the thread, improving cache locality and reducing memory usage.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-03 08:34:59 -07:00
Ming Lei
0123bb91f4 selftests: ublk: set CPU affinity before thread initialization
Move ublk_thread_set_sched_affinity() call before ublk_thread_init()
to ensure memory allocations during thread initialization occur on
the correct NUMA node. This leverages Linux's first-touch memory
policy for better NUMA locality.

Also convert ublk_thread_set_sched_affinity() to use
pthread_setaffinity_np() instead of sched_setaffinity(), as the
pthread API is the proper interface for setting thread affinity in
multithreaded programs.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-03 08:34:59 -07:00
Willy Tarreau
db75042e93 tools/nolibc: add missing memchr() to string.h
Surprisingly we forgot to add this common one. It was added with a
per-arch guard allowing to later implement it in arch-specific asm
code like was done for a few other ones.

The test verifies that we don't search past the indicated length.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-11-02 12:11:48 +01:00
Wang Liang
d01f8136d4 selftests: netdevsim: Fix ethtool-coalesce.sh fail by installing ethtool-common.sh
The script "ethtool-common.sh" is not installed in INSTALL_PATH, and
triggers some errors when I try to run the test
'drivers/net/netdevsim/ethtool-coalesce.sh':

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
  # ./ethtool-coalesce.sh: line 25: make_netdev: command not found
  # ethtool: bad command line argument(s)
  # ./ethtool-coalesce.sh: line 124: check: command not found
  # ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
  # FAILED /0 checks
  not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1

Install this file to avoid this error. After this patch:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # PASSED all 22 checks
  ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh

Fixes: fbb8531e58 ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251030040340.3258110-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:41:54 -07:00
Anubhav Singh
f8e8486702 selftests/net: use destination options instead of hop-by-hop
The GRO self-test, gro.c, currently constructs IPv6 packets containing a
Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path
correctly handles IPv6 extension headers.

However, network elements may be configured to drop packets with the
Hop-by-Hop Options header (HBH). This causes the self-test to fail
in environments where such network elements are present.

To improve the robustness and reliability of this test in diverse
network environments, switch from using IPPROTO_HOPOPTS to
IPPROTO_DSTOPTS (Destination Options).

The Destination Options header is less likely to be dropped by
intermediate routers and still serves the core purpose of the test:
validating GRO's handling of an IPv6 extension header. This change
ensures the test can execute successfully without being incorrectly
failed by network policies outside the kernel's control.

Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030060436.1556664-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:33:17 -07:00
Anubhav Singh
02d064de05 selftests/net: fix out-of-order delivery of FIN in gro:tcp test
Due to the gro_sender sending data packets and FIN packets
in very quick succession, these are received almost simultaneously
by the gro_receiver. FIN packets are sometimes processed before the
data packets leading to intermittent (~1/100) test failures.

This change adds a delay of 100ms before sending FIN packets
in gro:tcp test to avoid the out-of-order delivery. The same
mitigation already exists for the gro:ip test.

Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030062818.1562228-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 17:32:24 -07:00
Alexis Lothoré (eBPF Foundation)
e6e10c51fb selftests/bpf: Add checks in tc_tunnel when entering net namespaces
test_tc_tunnel is missing checks on any open_netns. Add those checks
anytime we try to enter a net namespace, and skip the related operations
if we fail. While at it, reduce the number of open_netns/close_netns for
cases involving operations in two distinct namespaces: the test
currently does the following:

  nstoken = open_netns("foo")
  do_operation();
  close(nstoken);
  nstoken = open_netns("bar")
  do_another_operation();
  close(nstoken);

As already stated in reviews for the initial test, we don't need to go
back to the root net namespace to enter a second namespace, so just do:

  ntoken_client = open_netns("foo")
  do_operation();
  nstoken_server = open_netns("bar")
  do_another_operation();
  close(nstoken_server);
  close(nstoken_client);

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-2-0ffe44d27eda@bootlin.com
2025-10-31 15:00:30 -07:00
Alexis Lothoré (eBPF Foundation)
c076fd5bb4 selftests/bpf: Skip tc_tunnel subtest if its setup fails
A subtest setup can fail in a wide variety of ways, so make sure not to
run it if an issue occurs during its setup. The return value is
already representing whether the setup succeeds or fails, it is just
about wiring it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-1-0ffe44d27eda@bootlin.com
2025-10-31 15:00:30 -07:00
Linus Torvalds
39bcf0f7d4 VFIO update for v6.18-rc4
- Fix overflows in vfio type1 backend for mappings at the end of the
    64-bit address space, resulting in leaked pinned memory.  New
    selftest support included to avoid such issues in the future.
    (Alex Mastro)
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkFF0YRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyLZsQ/+OzFOEOOw5UIpVrMsnp7DmLOSYgiorlpS
 q9jtsP99dh4U1JqJaONAGb6uu1SL6yX6VL3QZNEppPY83WXVSxX44lo3qpTZ90iq
 xpeUTNHv/YOhAYJ+eXbGfjoJsxQqRLECkWyJwQAQl227yRsqWzAR5twRhBrLi9Gf
 hd6d+ZQuTbEQo3oa7bNA2Q+EUevn7gEiNEBAz6L+zz0DWiVrd+XEk4N+37MR7UKr
 6og/p3lI6AyjFG3dqUGBB2uUVfCGzdCXi6F01reZm3q7aPUdO+PuP7MnYjpJ5j4n
 Ph3qN5iZ+bF24wXtZEGs+NDvaNXsfk5RoPxjc62DeYRG7WfH83KZCyrQju6vuCsd
 CFirCf/97MK/LdDaj46nOi37vNQGap4EibmJTI9hMRh2x2RRlxTVdOv+j/lkQAEr
 GgNwPMZO1yp2tJKQ5DJEgaXgQyOfEpWAtG2qbVCXHUc6lzsN2HQTpIDSbXsZdpfK
 IuCLtmaR20tfUoRQJYCryb2hww2+6Z5j6W+CDdZpCfkqmQnsC0duoebru1g32vc0
 20PrVE/xMMtw+I1CBTDLWFDLs/M8sg0AbBicoHFXh1Ea8n9EC3b/aczPk0ff0b9g
 x2BcDBaoCyslE7pSpeCwemCrACtreXu7Ttzb5CgAIQAuBcZCvKAFGC80KGOt8Tbh
 M0bQYDkeouM=
 =sfMw
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix overflows in vfio type1 backend for mappings at the end of the
   64-bit address space, resulting in leaked pinned memory.

   New selftest support included to avoid such issues in the future
   (Alex Mastro)

* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: selftests: add end of address space DMA map/unmap tests
  vfio: selftests: update DMA map/unmap helpers to support more test kinds
  vfio/type1: handle DMA map/unmap up to the addressable limit
  vfio/type1: move iova increment to unmap_unpin_*() caller
  vfio/type1: sanitize for overflow using check_*_overflow()
2025-10-31 14:20:09 -07:00
Bastien Curutchet (eBPF Foundation)
d1aec26fce selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
test_xsk.c isn't part of the test_progs framework.

Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.

Move test_xsk{.c/.h} to prog_tests/.

Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-15-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
75fc630867 selftests/bpf: test_xsk: Isolate non-CI tests
Following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their
  flakyness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long

Remove these tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a skip_ci_tests table to hold them.
Use this skip_ci table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-14-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
7a96615f2e selftests/bpf: test_xsk: Don't exit immediately on allocation failures
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.

Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.

Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-13-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
844b13a9ff selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-12-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
5b2a757a16 selftests/bpf: test_xsk: Don't exit immediately when workers fail
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.

Add return value to functions that can fail.
Handle failures more smoothly through report_failure().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-11-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
3f09728f90 selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_on_error().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-10-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
f12f1b5d14 selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Add a return value to the functions handling XDP attachments to handle
errors more smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-9-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
e645bcfb16 selftests/bpf: test_xsk: Add return value to init_iface()
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI

Add a return value to init_iface() so errors can be handled more
smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-8-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
f477b0fd75 selftests/bpf: test_xsk: Release resources when swap fails
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.

Clean the sockets and the umem in case of swap_xsk_resources() failure.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-7-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
e3dfa0faf1 selftests/bpf: test_xsk: Wrap test clean-up in functions
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.

Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-6-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
bea4f03897 selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.

Release the missing pkt_stream at the end of testapp_xdp_shared_umem()

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-5-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
d66e49ffa0 selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.

Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-4-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
cadc0c1fd7 selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.

Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-3-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
2233ef8bba selftests/bpf: test_xsk: Initialize bitmap before use
bitmap is used before being initialized.

Initialize it to zero before using it.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-2-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
3ab77f35a7 selftests/bpf: test_xsk: Split xskxceiver
AF_XDP features are tested by the test_xsk.sh script but not by the
test_progs framework. The tests used by the script are defined in
xksxceiver.c which can't be integrated in the test_progs framework as is.

Extract these test definitions from xskxceiver{.c/.h} to put them in new
test_xsk{.c/.h} files.
Keep the main() function and its unshared dependencies in xksxceiver to
avoid impacting the test_xsk.sh script which is often used to test real
hardware.
Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's
report valid

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-1-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Jakub Kicinski
1a2352ad82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc4).

No conflicts, adjacent changes:

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
  ded9813d17 ("net: stmmac: Consider Tx VLAN offload tag length for maxSDU")
  26ab9830be ("net: stmmac: replace has_xxxx with core_type")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 06:46:03 -07:00
Linus Torvalds
d127176862 linux_kselftest-fixes-6.18-rc4
Fixes build warning in cachestat found during clang build and adds
 tmpshmcstat to .gitignore.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkDy4wACgkQCwJExA0N
 Qxz/zw//XkD/6mI0KHL88ko6i9GGcbN7QI0RAZpJFHmKTCRkZ59eXrA1h0eeZGwB
 KOt9JzhGkeYHeFYVXV6ghzlV8zXig5romxyBZ7f3WTtPDIFobNJQqnV8ZcOMe95A
 PtdxNtR1tXK7VYP6MlUETCcKEjO1USrbaHMBHSPpDhoO+uysiW2XptqVjVMWbaCZ
 9yydd4bCsIKSWHc4O2AaOOdA45A1ta+DZFGwvvq4Y0RrRUrLYsvgN/EcC5KGd9yO
 2wqbQ9pmNdqZgrVWXYGNjGoNAlDsyc0vBSDBEYzvEUvcASQw9MG62xXHF6C23TMM
 szNg4QR0xwWXB3EHceHJYUqmiPLWsfjk1az2mkkMKE0f+84JWdKWGCTb2hyU+vjl
 dzUGDRLKuRcrfmAcgGxa1ARuJFElo4H4iAf76/dXVCS7rOciPx2Xc2J2yP2y7wsb
 rh17+k5fmpeOIrAHju4tiSOejDG9bxr27wj/VsL3qUxOcxh20tAk/0D6ojtuyzWF
 Zg247AQT+X/7C4lKwwKMZKdOTqZN5iIEOb4T1vm3dC8rY6KI2hzwyt3TlDyN7rsS
 q84/zP7h3z95LhgQ5WCb3Z7x3vo0gkIirpooD81XVN9X6nQJg73TpcNkzrGcnY9K
 rKALuPtPeaX9b6Bc2tuns10HWMtBCQLI+8sSayMSoz/IFqN1HAU=
 =XQbQ
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fix build warning in cachestat found during clang build and add
  tmpshmcstat to .gitignore"

* tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: cachestat: Fix warning on declaration under label
  selftests/cachestat: add tmpshmcstat file to .gitignore
2025-10-30 19:48:13 -07:00
Jakub Kicinski
ecca75ae5a selftests: drv-net: replace the nsim ring test with a drv-net one
We are trying to move away from netdevsim-only tests and towards
tests which can be run both against netdevsim and real drivers.

Replace the simple bash script we have for checking ethtool -g/-G
on netdevsim with a Python test tweaking those params as well
as channel count.

The new test is not exactly equivalent to the netdevsim one,
but real drivers don't often support random ring sizes,
let alone modifying max values via debugfs.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251029164930.2923448-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-30 17:32:18 -07:00
Mark Brown
a186fbcfd8 KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-list
get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks
for when NV is enabled but does not have any feature gate for this register,
meaning that testing any combination of features that includes EL2 but does
not include SVE will result in a test failure due to a missing register
being reported:

| The following lines are missing registers:
|
|	ARM64_SYS_REG(3, 4, 1, 2, 0),

Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it
without SVE being enabled.

Fixes: 3a90b6f279 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:13:27 +00:00
Mark Brown
92e781c93e KVM: arm64: selftests: Add SCTLR2_EL2 to get-reg-list
We recently added support for SCTLR2_EL2 to the kernel but did not add it
to get-reg-list, resulting in it reporting the missing register when it
is available. Add it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:13:04 +00:00
Maximilian Dittgen
a24f7afce0 KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress
Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID,
rather than a physical redistributor address, for its RDbase
command argument.

As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates
over CPU IDs in the range [0, nr_cpus), passing them as the RDbase
vcpu_id argument to its_send_mapc_cmd().

However, its_encode_target() in the its_send_mapc_cmd() selftest
handler expects RDbase arguments to be formatted with a 16 bit
offset, as shown by the 16-bit target_addr right shift its implementation:

        its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)

At the moment, all CPU IDs passed into its_send_mapc_cmd() have no
offset, therefore becoming 0x0 after the bit shift. Thus, when
vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c,
it always interprets the RDbase target CPU as CPU 0. All interrupts
sent to collections will be processed by vCPU 0, which defeats the
purpose of this multi-vCPU test.

Fix by creating procnum_to_rdbase() helper function, which left-shifts
the vCPU parameter received by its_send_mapc_cmd 16 bits before passing
it to its_encode_target for encoding.

Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://patch.msgid.link/20251020145946.48288-1-mdittgen@amazon.de
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30 16:12:30 +00:00
Ido Schimmel
02da595751 selftests: traceroute: Add ICMP extensions tests
Test that ICMP extensions are reported correctly when enabled and not
reported when disabled. Test both IPv4 and IPv6 and using different
packet sizes, to make sure trimming / padding works correctly.

Disable ICMP rate limiting (defaults to 1 per-second per-target) so that
the kernel will always generate ICMP errors when needed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251027082232.232571-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29 18:28:30 -07:00
Kumar Kartikeya Dwivedi
a8a0abf097 selftests/bpf: Add ABBCCA case for rqspinlock stress test
Introduce a new mode for the rqspinlock stress test that exercises a
deadlock that won't be detected by the AA and ABBA checks, such that we
always reliably trigger the timeout fallback. We need 4 CPUs for this
particular case, as CPU 0 is untouched, and three participant CPUs for
triggering the ABBCCA case.

Refactor the lock acquisition paths in the module to better reflect the
three modes and choose the right lock depending on the context.

Also drop ABBA case from running by default as part of test progs, since
the stress test can consume a significant amount of time.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251029181828.231529-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-29 18:17:56 -07:00
Mykyta Yatsenko
5913e936f6 selftests/bpf: Fix intermittent failures in file_reader test
file_reader/on_open_expect_fault intermittently fails when test_progs
runs tests in parallel, because it expects a page fault on first read.
Another file_reader test running concurrently may have already pulled
the same pages into the page cache, eliminating the fault and causing a
spurious failure.

Make file_reader/on_open_expect_fault read from a file region that does
not overlap with other file_reader tests, so the initial access still
faults even under parallel execution.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251029195907.858217-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-29 18:15:30 -07:00
Po-Hsu Lin
9311e9540a selftests: net: use BASH for bareudp testing
In bareudp.sh, this script uses /bin/sh and it will load another lib.sh
BASH script at the very beginning.

But on some operating systems like Ubuntu, /bin/sh is actually pointed to
DASH, thus it will try to run BASH commands with DASH and consequently
leads to syntax issues:
  # ./bareudp.sh: 4: ./lib.sh: Bad substitution
  # ./bareudp.sh: 5: ./lib.sh: source: not found
  # ./bareudp.sh: 24: ./lib.sh: Syntax error: "(" unexpected

Fix this by explicitly using BASH for bareudp.sh. This fixes test
execution failures on systems where /bin/sh is not BASH.

Reported-by: Edoardo Canepa <edoardo.canepa@canonical.com>
Link: https://bugs.launchpad.net/bugs/2129812
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20251027095710.2036108-2-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29 17:56:03 -07:00
Ankit Khushwaha
afb8f6567a selftest: net: fix socklen_t type mismatch in sctp_collision test
Socket APIs like recvfrom(), accept(), and getsockname() expect socklen_t*
arg, but tests were using int variables. This causes -Wpointer-sign
warnings on platforms where socklen_t is unsigned.

Change the variable type from int to socklen_t to resolve the warning and
ensure type safety across platforms.

warning fixed:

sctp_collision.c:62:70: warning: passing 'int *' to parameter of
type 'socklen_t *' (aka 'unsigned int *') converts between pointers to
integer types with different sign [-Wpointer-sign]
   62 |                 ret = recvfrom(sd, buf, sizeof(buf),
									0, (struct sockaddr *)&daddr, &len);
      |                                                           ^~~~
/usr/include/sys/socket.h:165:27: note: passing argument to
parameter '__addr_len' here
  165 |                          socklen_t *__restrict __addr_len);
      |                                                ^

Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251028172947.53153-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29 17:39:26 -07:00
Alexis Lothoré (eBPF Foundation)
5d3591607d selftests/bpf: Remove test_tc_tunnel.sh
Now that test_tc_tunnel.sh scope has been ported to the test_progs
framework, remove it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-4-505c12019f9d@bootlin.com
2025-10-29 12:17:24 -07:00
Alexis Lothoré (eBPF Foundation)
8517b1abe5 selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs
The test_tc_tunnel.sh script checks that a large variety of tunneling
mechanisms handled by the kernel can be handled as well by eBPF
programs. While this test shares similarities with test_tunnel.c (which
is already integrated in test_progs), those are testing slightly
different things:
- test_tunnel.c creates a tunnel interface, and then get and set tunnel
  keys in packet metadata, from BPF programs.
- test_tc_tunnels.sh manually parses/crafts packets content

Bring the tests covered by test_tc_tunnel.sh into the test_progs
framework, by creating a dedicated test_tc_tunnel.sh. This new test
defines a "generic" runner which, for each test configuration:
- will configure the relevant veth pair, each of those isolated in a
  dedicated namespace
- will check that traffic will fail if there is only an encapsulating
  program attached to one veth egress
- will check that traffic succeed if we enable some decapsulation module
  on kernel side
- will check that traffic still succeeds if we replace the kernel
  decapsulation with some eBPF ingress decapsulation.

Example of the new test execution:

  # ./test_progs -a tc_tunnel
  #447/1   tc_tunnel/ipip_none:OK
  #447/2   tc_tunnel/ipip6_none:OK
  #447/3   tc_tunnel/ip6tnl_none:OK
  #447/4   tc_tunnel/sit_none:OK
  #447/5   tc_tunnel/vxlan_eth:OK
  #447/6   tc_tunnel/ip6vxlan_eth:OK
  #447/7   tc_tunnel/gre_none:OK
  #447/8   tc_tunnel/gre_eth:OK
  #447/9   tc_tunnel/gre_mpls:OK
  #447/10  tc_tunnel/ip6gre_none:OK
  #447/11  tc_tunnel/ip6gre_eth:OK
  #447/12  tc_tunnel/ip6gre_mpls:OK
  #447/13  tc_tunnel/udp_none:OK
  #447/14  tc_tunnel/udp_eth:OK
  #447/15  tc_tunnel/udp_mpls:OK
  #447/16  tc_tunnel/ip6udp_none:OK
  #447/17  tc_tunnel/ip6udp_eth:OK
  #447/18  tc_tunnel/ip6udp_mpls:OK
  #447     tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-3-505c12019f9d@bootlin.com
2025-10-29 12:17:22 -07:00
Alexis Lothoré (eBPF Foundation)
86433db932 selftests/bpf: Make test_tc_tunnel.bpf.c compatible with big endian platforms
When trying to run bpf-based encapsulation in a s390x environment, some
parts of test_tc_tunnel.bpf.o do not encapsulate correctly the traffic,
leading to tests failures. Adding some logs shows for example that
packets about to be sent on an interface with the ip6vxlan_eth program
attached do not have the expected value 5 in the ip header ihl field,
and so are ignored by the program.

This phenomenon appears when trying to cross-compile the selftests,
rather than compiling it from a virtualized host: the selftests build
system may then wrongly pick some host headers. If <asm/byteorder.h>
ends up being picked on the host (and if the host has a endianness
different from the target one), it will then expose wrong endianness
defines (e.g __LITTLE_ENDIAN_BITFIELD instead of __BIT_ENDIAN_BITFIELD),
and it will for example mess up the iphdr structure layout used in the
ebpf program.

To prevent this, directly use the vmlinux.h header generated by the
selftests build system rather than including directly specific kernel
headers. As a consequence, add some missing definitions that are not
exposed by vmlinux.h, and adapt the bitfield manipulations to allow
building and using the program on both types of platforms.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-2-505c12019f9d@bootlin.com
2025-10-29 11:07:26 -07:00
Alexis Lothoré (eBPF Foundation)
1d5137c8d1 selftests/bpf: Add tc helpers
The test_tunnel.c file defines small fonctions to easily attach eBPF
programs to tc hooks, either on egress, ingress or both.

Create a shared helper in network_helpers.c so that other tests can
benefit from it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-1-505c12019f9d@bootlin.com
2025-10-29 11:07:24 -07:00
Benjamin Berg
4bb30188c7 tools/nolibc: add uio.h with readv and writev
This is generally useful and struct iovec is also needed for other
purposes such as ptrace.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-10-29 16:29:19 +01:00
Qinxin Xia
f74ee32963 tools/dma: move dma_map_benchmark from selftests to tools/dma
dma_map_benchmark is a standalone developer tool rather than an
automated selftest. It has no pass/fail criteria, expects manual
invocation, and is built as a normal userspace binary. Move it to
tools/dma/ and add a minimal Makefile.

Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Suggested-by: Barry Song <baohua@kernel.org>
Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251028120900.2265511-3-xiaqinxin@huawei.com
2025-10-29 09:41:40 +01:00
Alex Mastro
de8d1f2fd5 vfio: selftests: add end of address space DMA map/unmap tests
Add tests which validate dma map/unmap at the end of address space. Add
negative test cases for checking that overflowing ioctl args fail with
the expected errno.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251028-fix-unmap-v6-5-2542b96bcc8e@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-28 15:54:41 -06:00
Alex Mastro
16950b60c1 vfio: selftests: update DMA map/unmap helpers to support more test kinds
Add __vfio_pci_dma_*() helpers which return -errno from the underlying
ioctls.

Add __vfio_pci_dma_unmap_all() to test more unmapping code paths. Add an
out unmapped arg to report the unmapped byte size.

The existing vfio_pci_dma_*() functions, which are intended for
happy-path usage (assert on failure) are now thin wrappers on top of the
double-underscore helpers.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251028-fix-unmap-v6-4-2542b96bcc8e@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-28 15:54:41 -06:00
Gopi Krishna Menon
02227b97a8 selftests: tty: add tty_tiocsti_test to .gitignore
Building the tty selftests generates the tty_tiocsti_test binary, which
appears as untracked file in git. As mentioned in the kselftest
documentation, all the generated objects must be placed inside
.gitignore. This prevents the generated objects from accidentally
getting staged and keeps the working tree clean.

Add the tty_tiocsti_test binary to .gitignore to avoid accidentally
staging the build artifact and maintain a clean working tree.

Link: https://docs.kernel.org/dev-tools/kselftest.html#contributing-new-tests-details

Fixes: 7553f5173e ("selftests/tty: add TIOCSTI test suite")
Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Suggested-by: David Hunter <david.hunter.linux@gmail.com>
Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Link: https://patch.msgid.link/20251026100104.3354-1-krishnagopi487@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-28 15:24:43 +01:00
Xu Kuohai
f9db3a3822 selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer
Add --rb-overwrite option to benchmark BPF ring buffer in overwrite mode.
Since overwrite mode is not yet supported by libbpf for consumer, also add
--rb-bench-producer option to benchmark producer directly without a consumer.

Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.

- AMD EPYC 9654 (x86_64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    32.180 ± 0.033M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    9.617 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    8.810 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    9.272 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    9.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.086 ± 0.032M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   2.945 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   2.519 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   2.545 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.363 ± 0.024M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   2.357 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.267 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.284 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.215 ± 0.025M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.193 ± 0.023M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.208 ± 0.024M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    14.478 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    21.787 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    6.045 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    5.352 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    4.850 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.542 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   3.509 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   3.171 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   3.154 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.974 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   3.167 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.903 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.866 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.914 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.806 ± 0.012M/s (drops 0.000 ± 0.000M/s)
Rb-prod nr_prod 52   2.840 ± 0.012M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-4-xukuohai@huaweicloud.com
2025-10-27 19:47:32 -07:00
Xu Kuohai
8f7a86ecde selftests/bpf: Add overwrite mode test for BPF ring buffer
Add overwrite mode test for BPF ring buffer. The test creates a BPF ring
buffer in overwrite mode, then repeatedly reserves and commits records
to check if the ring buffer works as expected both before and after
overwriting occurs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-3-xukuohai@huaweicloud.com
2025-10-27 19:46:32 -07:00
Petr Machata
d10920607f selftests: bridge_mdb: Add a test for MDB flush on snooping disable
Check that non-permanent MDB entries are removed as IGMP / MLD snooping is
disabled.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/9420dfbcf26c8e1134d31244e9e7d6a49d677a69.1761228273.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 17:57:21 -07:00
Wilfred Mallawa
5f30bc4706 selftests: tls: add tls record_size_limit test
Test that outgoing plaintext records respect the tls TLS_TX_MAX_PAYLOAD_LEN
set using setsockopt(). The limit is set to be 128, thus, in all received
records, the plaintext must not exceed this amount.

Also test that setting a new record size limit whilst a pending open
record exists is handled correctly by discarding the request.

Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251022001937.20155-2-wilfred.opensource@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27 16:13:43 -07:00
Mykyta Yatsenko
784cdf9315 selftests/bpf: add file dynptr tests
Introducing selftests for validating file-backed dynptr works as
expected.
 * validate implementation supports dynptr slice and read operations
 * validate destructors should be paired with initializers
 * validate sleepable progs can page in.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-11-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:27 -07:00
Mykyta Yatsenko
531b87d865 bpf: widen dynptr size/offset to 64 bit
Dynptr currently caps size and offset at 24 bits, which isn’t sufficient
for file-backed use cases; even 32 bits can be limiting. Refactor dynptr
helpers/kfuncs to use 64-bit size and offset, ensuring consistency
across the APIs.

This change does not affect internals of xdp, skb or other dynptrs,
which continue to behave as before. Also it does not break binary
compatibility.

The widening enables large-file access support via dynptr, implemented
in the next patches.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:26 -07:00
Mykyta Yatsenko
a61a257ff5 selftests/bpf: remove unnecessary kfunc prototypes
Remove unnecessary kfunc prototypes from test programs, these are
provided by vmlinux.h

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:26 -07:00
Greg Kroah-Hartman
4e68ae3642 Merge 6.18-rc3 into tty-next
We need the tty/serial fixes in here as well to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-27 08:17:40 +01:00
Alessandro Zanni
13cb6ac5b5 selftest: net: prevent use of uninitialized variable
Fix to avoid the usage of the `ret` variable uninitialized in the
following macro expansions.

It solves the following warning:

In file included from netlink-dumps.c:21:
netlink-dumps.c: In function ‘dump_extack’:
../kselftest_harness.h:788:35: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized]
  788 |                         intmax_t  __exp_print = (intmax_t)__exp; \
      |                                   ^~~~~~~~~~~
../kselftest_harness.h:631:9: note: in expansion of macro ‘__EXPECT’
  631 |         __EXPECT(expected, #expected, seen, #seen, ==, 0)
      |         ^~~~~~~~
netlink-dumps.c:169:9: note: in expansion of macro ‘EXPECT_EQ’
  169 |         EXPECT_EQ(ret, FOUND_EXTACK);
      |         ^~~~~~~~~

The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=net

Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Link: https://patch.msgid.link/20251023205354.28249-1-alessandro.zanni87@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24 18:43:37 -07:00
Jakub Kicinski
2b7553db91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23 10:53:08 -07:00
Linus Torvalds
ab431bc397 Including fixes from can. Slim pickings, I'm guessing people haven't
really started testing.
 
 Current release - new code bugs:
 
  - eth: mlx5e:
    - psp: avoid 'accel' NULL pointer dereference
    - skip PPHCR register query for FEC histogram if not supported
 
 Previous releases - regressions:
 
  - bonding: update the slave array for broadcast mode
 
  - rtnetlink: re-allow deleting FDB entries in user namespace
 
  - eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path
 
 Previous releases - always broken:
 
  - can: drop skb on xmit if device is in listen-only mode
 
  - gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb()
 
  - eth: mlx5e
    - RX, fix generating skb from non-linear xdp_buff if program
      trims frags
    - make devcom init failures non-fatal, fix races with IPSec
 
 Misc:
 
  - some documentation formatting "fixes"
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmj6QcgACgkQMUZtbf5S
 IrsRvw/+N4rc/M7X6kYXftxW+DchJ9daXVFFXDR6/L05LyOhNL6wDi0TNT0MM2f0
 RLQQ6ReOyCGhDGDzUJN6FNdn6B75HSzoeZKDoCVBoCpHg8C55X2YXMpTZILlKxH+
 A7yI8H6Zb5ljiXlOwhqSrIzth6orNxjqUBAOIiu0JUhvfvuYnRh1RZPYX+3babjS
 2DcEGK36GLe671PR6S33tu2q/xcmPonbBtQjplEtDKyGrV6+/BtvHSJ/bnKjtnar
 5DR1yJE8PdtujdXyxzaS7x2xfVnbnX/huNepqY3K0d5ZXJphQGxCEfY5DTWAsooF
 zX7Q/lRe56bmNzjg+LRhYzIeKaSFlXw23FQe2b+Dpz/qPMKkm4F2LdIq1C9GOXom
 StX0zz7v6C332N/fABAzCUbklaclZA+HyVmjEJYIdK037xaIpYdh9AfkX0Byd8xP
 JOGFeVtqlW/Vg7yqmiJ8N66yaMKOtmzfA3yyQid0F0CVwy/h3z/fDp78rfTjgORp
 tXX2DO9jF5aUZXE5p6VAWQGHOylpVP83GHKzTZtLxDgzK0d4VBXUGslwifn1rhTz
 yzr//qwt18Rcbv7ezmpsLT+Q2S7pTXtEblWZgf/IM5HE0AGdJIItbhYm3sjUYMiM
 UGZgBr0AwWywXcUmtiD/IdA85UJLUGjDKx88Du356VGPBizIG3A=
 =rD+V
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can. Slim pickings, I'm guessing people haven't
  really started testing.

  Current release - new code bugs:

   - eth: mlx5e:
       - psp: avoid 'accel' NULL pointer dereference
       - skip PPHCR register query for FEC histogram if not supported

  Previous releases - regressions:

   - bonding: update the slave array for broadcast mode

   - rtnetlink: re-allow deleting FDB entries in user namespace

   - eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path

  Previous releases - always broken:

   - can: drop skb on xmit if device is in listen-only mode

   - gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb()

   - eth: mlx5e
       - RX, fix generating skb from non-linear xdp_buff if program
         trims frags
       - make devcom init failures non-fatal, fix races with IPSec

  Misc:

   - some documentation formatting 'fixes'"

* tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net/mlx5: Fix IPsec cleanup over MPV device
  net/mlx5: Refactor devcom to return NULL on failure
  net/mlx5e: Skip PPHCR register query if not supported by the device
  net/mlx5: Add PPHCR to PCAM supported registers mask
  virtio-net: zero unused hash fields
  net: phy: micrel: always set shared->phydev for LAN8814
  vsock: fix lock inversion in vsock_assign_transport()
  ovpn: use datagram_poll_queue for socket readiness in TCP
  espintcp: use datagram_poll_queue for socket readiness
  net: datagram: introduce datagram_poll_queue for custom receive queues
  net: bonding: fix possible peer notify event loss or dup issue
  net: hsr: prevent creation of HSR device with slaves from another netns
  sctp: avoid NULL dereference when chunk data buffer is missing
  ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop
  net: ravb: Ensure memory write completes before ringing TX doorbell
  net: ravb: Enforce descriptor type ordering
  net: hibmcge: select FIXED_PHY
  net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb
  Documentation: networking: ax25: update the mailing list info.
  net: gro_cells: fix lock imbalance in gro_cells_receive()
  ...
2025-10-23 07:03:18 -10:00
Sidharth Seela
920aa3a770 selftests: cachestat: Fix warning on declaration under label
Fix warning caused from declaration under a case label. The proper way
is to declare variable at the beginning of the function. The warning
came from running clang using LLVM=1; and is as follows:

-test_cachestat.c:260:3: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
  260 |                 char *map = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
      |

Link: https://lore.kernel.org/r/20250929115405.25695-2-sidharthseela@gmail.com
Signed-off-by: Sidharth Seela <sidharthseela@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-10-22 09:23:18 -06:00
Madhur Kumar
b90cafb438 selftests/cachestat: add tmpshmcstat file to .gitignore
Add the tmpshmcstat file to .gitignore to avoid
accidentally staging the build artifact

Link: https://lore.kernel.org/r/20251013095149.1386628-1-madhurkumar004@gmail.com
Signed-off-by: Madhur Kumar <madhurkumar004@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-10-22 09:23:05 -06:00
Abhinav Saxena
7553f5173e selftests/tty: add TIOCSTI test suite
TIOCSTI is a TTY ioctl command that allows inserting characters into
the terminal input queue, making it appear as if the user typed those
characters. This functionality has behavior that varies based on system
configuration and process credentials.

The dev.tty.legacy_tiocsti sysctl introduced in commit 83efeeeb3d
("tty: Allow TIOCSTI to be disabled") controls TIOCSTI usage. When
disabled, TIOCSTI requires CAP_SYS_ADMIN capability.

The current implementation checks the current process's credentials via
capable(CAP_SYS_ADMIN), but does not validate against the file opener's
credentials stored in file->f_cred. This creates different behavior when
file descriptors are passed between processes via SCM_RIGHTS.

Add a test suite with 16 test variants using fixture variants to verify
TIOCSTI behavior when dev.tty.legacy_tiocsti is enabled/disabled:

- Basic TIOCSTI tests (8 variants): Direct testing with different
  capability and controlling terminal combinations
- FD passing tests (8 variants): Test behavior when file descriptors
  are passed between processes with different capabilities

The FD passing tests document this behavior - some tests show different
results than expected based on file opener credentials, demonstrating
that TIOCSTI uses current process credentials rather than file opener
credentials.

The tests validate proper enforcement of the legacy_tiocsti sysctl. Test
implementation uses openpty(3) with TIOCSCTTY for isolated PTY
environments. See tty_ioctl(4) for details on TIOCSTI behavior and
security requirements.

Signed-off-by: Abhinav Saxena <xandfury@gmail.com>
Link: https://patch.msgid.link/20250903-toicsti-bug-v4-1-4894b6649ef8@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-22 12:09:07 +02:00
Matthieu Baerts (NGI0)
a9649dfbe5 selftests: mptcp: join: mark laminar tests as skipped if not supported
The call to 'continue_if' was missing: it properly marks a subtest as
'skipped' if the attached condition is not valid.

Without that, the test is wrongly marked as passed on older kernels.

Fixes: c912f935a5 ("selftests: mptcp: join: validate new laminar endp")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-5-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21 17:36:47 -07:00
Matthieu Baerts (NGI0)
c3496c052a selftests: mptcp: join: mark 'delete re-add signal' as skipped if not supported
The call to 'continue_if' was missing: it properly marks a subtest as
'skipped' if the attached condition is not valid.

Without that, the test is wrongly marked as passed on older kernels.

Fixes: b5e2fb832f ("selftests: mptcp: add explicit test case for remove/readd")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-4-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21 17:36:46 -07:00
Matthieu Baerts (NGI0)
973f80d715 selftests: mptcp: join: mark implicit tests as skipped if not supported
The call to 'continue_if' was missing: it properly marks a subtest as
'skipped' if the attached condition is not valid.

Without that, the test is wrongly marked as passed on older kernels.

Fixes: 36c4127ae8 ("selftests: mptcp: join: skip implicit tests if not supported")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-3-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21 17:36:46 -07:00
Matthieu Baerts (NGI0)
d68460bc31 selftests: mptcp: join: mark 'flush re-add' as skipped if not supported
The call to 'continue_if' was missing: it properly marks a subtest as
'skipped' if the attached condition is not valid.

Without that, the test is wrongly marked as passed on older kernels.

Fixes: e06959e9ee ("selftests: mptcp: join: test for flush/re-add endpoints")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-2-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21 17:36:46 -07:00
Xin Long
a73ca0449b selftests: net: fix server bind failure in sctp_vrf.sh
sctp_vrf.sh could fail:

  TEST 12: bind vrf-2 & 1 in server, connect from client 1 & 2, N [FAIL]
  not ok 1 selftests: net: sctp_vrf.sh # exit=3

The failure happens when the server bind in a new run conflicts with an
existing association from the previous run:

[1] ip netns exec $SERVER_NS ./sctp_hello server ...
[2] ip netns exec $CLIENT_NS ./sctp_hello client ...
[3] ip netns exec $SERVER_NS pkill sctp_hello ...
[4] ip netns exec $SERVER_NS ./sctp_hello server ...

It occurs if the client in [2] sends a message and closes immediately.
With the message unacked, no SHUTDOWN is sent. Killing the server in [3]
triggers a SHUTDOWN the client also ignores due to the unacked message,
leaving the old association alive. This causes the bind at [4] to fail
until the message is acked and the client responds to a second SHUTDOWN
after the server’s T2 timer expires (3s).

This patch fixes the issue by preventing the client from sending data.
Instead, the client blocks on recv() and waits for the server to close.
It also waits until both the server and the client sockets are fully
released in stop_server and wait_client before restarting.

Additionally, replace 2>&1 >/dev/null with -q in sysctl and grep, and
drop other redundant 2>&1 >/dev/null redirections, and fix a typo from
N to Y (connect successfully) in the description of the last test.

Fixes: a61bd7b9fe ("selftests: add a selftest for sctp vrf")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/be2dacf52d0917c4ba5e2e8c5a9cb640740ad2b6.1760731574.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20 16:41:33 -07:00
Nicolin Chen
b09ed52db1 iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
The ioctl returns 0 upon success, so !0 returning -1 breaks the selftest.

Drop the '!' to fix it.

Fixes: 1d235d8494 ("iommu/selftest: prevent use of uninitialized variable")
Link: https://patch.msgid.link/r/20251014214847.1113759-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-10-20 20:01:23 -03:00
Linus Torvalds
6548d364a3 cgroup: Fixes for v6.18-rc2
- Fix seqcount lockdep assertion failure in cgroup freezer on PREEMPT_RT.
   Plain seqcount_t expects preemption disabled, but PREEMPT_RT spinlocks
   don't disable preemption. Switch to seqcount_spinlock_t to properly
   associate css_set_lock with the freeze timing seqcount.
 
 - Misc changes including kernel-doc warning fix for misc_res_type enum and
   improved selftest diagnostics.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaPZ7wg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGTiuAP4swnb+BmcqOu37rxIuKXS+6FwRb0Om2wBs4m5W
 bgASMAEAva+DWpWx0l8GFV4kf14wRV6rYlkMRcP3KSyqXNLmLg4=
 =8A4u
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.18-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix seqcount lockdep assertion failure in cgroup freezer on
   PREEMPT_RT.

   Plain seqcount_t expects preemption disabled, but PREEMPT_RT
   spinlocks don't disable preemption. Switch to seqcount_spinlock_t to
   properly associate css_set_lock with the freeze timing seqcount.

 - Misc changes including kernel-doc warning fix for misc_res_type enum
   and improved selftest diagnostics.

* tag 'cgroup-for-6.18-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/misc: fix misc_res_type kernel-doc warning
  selftests: cgroup: Use values_close_report in test_cpu
  selftests: cgroup: add values_close_report helper
  cgroup: Fix seqcount lockdep assertion in cgroup freezer
2025-10-20 09:41:27 -10:00
Sean Christopherson
9e4ce7a89e KVM: selftests: Use "gpa" and "gva" for local variable names in pre-fault test
Rename guest_test_{phys,virt}_mem to g{p,v}a in the pre-fault memory test
to shorten line lengths and to use standard terminology.

Opportunsitically use "base_gva" in the guest code instead of "base_gpa"
to match the host side code, which now passes in "gva" (and because
referencing the virtual address avoids having to know that the data is
identity mapped).

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20251007224515.374516-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:59:30 -07:00
Sean Christopherson
17e5a9b777 KVM: selftests: Forcefully override ARCH from x86_64 to x86
Forcefully override ARCH from x86_64 to x86 to handle the scenario where
the user specifies ARCH=x86_64 on the command line.

Fixes: 9af04539d4 ("KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR")
Cc: stable@vger.kernel.org
Reported-by: David Matlack <dmatlack@google.com>
Closes: https://lore.kernel.org/all/20250724213130.3374922-1-dmatlack@google.com
Link: https://lore.kernel.org/r/20251007223057.368082-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:55:24 -07:00
Brendan Jackman
b146b289f7 KVM: selftests: Don't fall over in mmu_stress_test when only one CPU is present
Running mmu_stress_test on a system with only one CPU is not a recipe for
success. However, there's no clear-cut reason why it absolutely
shouldn't work, so the test shouldn't completely reject such a platform.

At present, the *3/4 calculation will return zero on these platforms and
the test fails. So, instead just skip that calculation.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/20251007-b4-kvm-mmu-stresstest-1proc-v1-1-8c95aa0e30b6@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 08:55:23 -07:00
Shivank Garg
38ccc50ac0 KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
Add tests for NUMA memory policy binding and NUMA aware allocation in
guest_memfd. This extends the existing selftests by adding proper
validation for:
  - KVM GMEM set_policy and get_policy() vm_ops functionality using
    mbind() and get_mempolicy()
  - NUMA policy application before and after memory allocation

Run the NUMA mbind() test with and without INIT_SHARED, as KVM should allow
doing mbind(), madvise(), etc. on guest-private memory, e.g. so that
userspace can set NUMA policy for CoCo VMs.

Run the NUMA allocation test only for INIT_SHARED, i.e. if the host can't
fault-in memory (via direct access, madvise(), etc.) as move_pages()
returns -ENOENT if the page hasn't been faulted in (walks the host page
tables to find the associated folio)

[sean: don't skip entire test when running on non-NUMA system, test mbind()
       with private memory, provide more info in assert messages]

Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:45 -07:00
Shivank Garg
e698e89b3e KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems
Add NUMA helpers to probe for support/availability and to check if the
test is running on a multi-node system.  The APIs will be used to verify
guest_memfd NUMA support.

Signed-off-by: Shivank Garg <shivankg@amd.com>
[sean: land helpers in numaif.h, add comments, tweak names]
Link: https://lore.kernel.org/r/20251016172853.52451-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:44 -07:00
Sean Christopherson
fe7baebb99 KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are
defined by the already-included, kernel-provided mempolicy.h.  The only
reason the duplicate definitions don't cause compiler warnings is because
they are identical, but only on x86-64!  The syscall numbers in particular
are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is
used outsize of x86.

Opportunistically clean up the file comment as the license information is
covered by the SPDX header, the path is superfluous, and as above the
comment about the contents is flat out wrong.

Fixes: 346b59f220 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests")
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:44 -07:00
Sean Christopherson
2189d78269 KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
Add APIs for all syscalls defined in the kernel's mm/mempolicy.c to match
those that would be provided by linking to libnuma.  Opportunistically use
the recently inroduced KVM_SYSCALL_DEFINE() builders to take care of the
boilerplate, and to fix a flaw where the two existing wrappers would
generate multiple symbols if numaif.h were to be included multiple times.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:43 -07:00
Sean Christopherson
29dc539d74 KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
Register handlers for signals for all selftests that are likely happen due
to test (or kernel) bugs, and explicitly fail tests on unexpected signals
so that users get a stack trace, i.e. don't have to go spelunking to do
basic triage.

Register the handlers as early as possible, to catch as many unexpected
signals as possible, and also so that the common code doesn't clobber a
handler that's installed by test (or arch) code.

Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:42 -07:00
Sean Christopherson
3223560c93 KVM: selftests: Define wrappers for common syscalls to assert success
Add kvm_<sycall> wrappers for munmap(), close(), fallocate(), and
ftruncate() to cut down on boilerplate code when a sycall is expected
to succeed, and to make it easier for developers to remember to assert
success.

Implement and use a macro framework similar to the kernel's SYSCALL_DEFINE
infrastructure to further cut down on boilerplate code, and to drastically
reduce the probability of typos as the kernel's syscall definitions can be
copy+paste almost verbatim.

Provide macros to build the raw <sycall>() wrappers as well, e.g. to
replace hand-coded wrappers (NUMA) or pure open-coded calls.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lore.kernel.org/r/20251016172853.52451-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:42 -07:00
Puranjay Mohan
7361c86485 selftests/bpf: Fix list_del() in arena list
The __list_del fuction doesn't set the previous node's next pointer to
the next node of the node to be deleted. It just updates the local variable
and not the actual pointer in the previous node.

The test was passing up till now because the bpf code is doing bpf_free()
after list_del and therfore reading head->first from the userspace will
read all zeroes. But after arena_list_del() is finished, head->first should
point to NULL;

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017141727.51355-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:27:26 -07:00
Yonghong Song
4f8543b5f2 selftests/bpf: Fix selftest verif_scale_strobemeta failure with llvm22
With latest llvm22, I hit the verif_scale_strobemeta selftest failure
below:
  $ ./test_progs -n 618
  libbpf: prog 'on_event': BPF program load failed: -E2BIG
  libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
  BPF program is too large. Processed 1000001 insn
  verification time 7019091 usec
  stack depth 488
  processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'on_event': failed to load: -E2BIG
  libbpf: failed to load object 'strobemeta.bpf.o'
  scale_test:FAIL:expect_success unexpected error: -7 (errno 7)
  #618     verif_scale_strobemeta:FAIL

But if I increase the verificaiton insn limit from 1M to 10M, the above
test_progs run actually will succeed. The below is the result from veristat:
  $ ./veristat strobemeta.bpf.o
  Processing 'strobemeta.bpf.o'...
  File              Program   Verdict  Duration (us)    Insns  States  Program size  Jited size
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  strobemeta.bpf.o  on_event  success       90250893  9777685  358230         15954       80794
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.

Further debugging shows the llvm commit [1] is responsible for the verificaiton
failure as it tries to convert certain switch statement to if-condition. Such
change may cause different transformation compared to original switch statement.

In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is
  define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2,
      ptr noundef %3, ptr noundef %4) #2 !dbg !535 {
    %6 = alloca ptr, align 8
    %7 = alloca i64, align 8
    %8 = alloca ptr, align 8
    %9 = alloca ptr, align 8
    %10 = alloca ptr, align 8
    %11 = alloca ptr, align 8
    %12 = alloca i32, align 4
    ...
    %20 = icmp ne ptr %19, null, !dbg !561
    br i1 %20, label %22, label %21, !dbg !562

  21:                                               ; preds = %5
    store i32 1, ptr %12, align 4
    br label %48, !dbg !563

  22:
    %23 = load ptr, ptr %9, align 8, !dbg !564
    ...

  47:                                               ; preds = %38, %22
    store i32 0, ptr %12, align 4, !dbg !588
    br label %48, !dbg !588

  48:                                               ; preds = %47, %21
    call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588
    %49 = load i32, ptr %12, align 4
    switch i32 %49, label %51 [
      i32 0, label %50
      i32 1, label %50
    ]

  50:                                               ; preds = %48, %48
    ret void, !dbg !589

  51:                                               ; preds = %48
    unreachable
  }

Note that the above 'switch' statement is added by clang frontend.
Without [1], the switch statement will survive until SelectionDag,
so the switch statement acts like a 'barrier' and prevents some
transformation involved with both 'before' and 'after' the switch statement.

But with [1], the switch statement will be removed during middle end
optimization and later middle end passes (esp. after inlining) have more
freedom to reorder the code.

The following is the related source code:

  static void *calc_location(struct strobe_value_loc *loc, void *tls_base):
        bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv);
        /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */
        return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;

  In read_int_var() func, we have:
        void *location = calc_location(&cfg->int_locs[idx], tls_base);
        if (!location)
                return;

        bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location);
        ...

The static func calc_location() is called inside read_int_var(). The asm code
without [1]:
     77: .123....89 (85) call bpf_probe_read_user#112
     78: ........89 (79) r1 = *(u64 *)(r10 -368)
     79: .1......89 (79) r2 = *(u64 *)(r10 -8)
     80: .12.....89 (bf) r3 = r2
     81: .123....89 (0f) r3 += r1
     82: ..23....89 (07) r2 += 1
     83: ..23....89 (79) r4 = *(u64 *)(r10 -464)
     84: ..234...89 (a5) if r2 < 0x2 goto pc+13
     85: ...34...89 (15) if r3 == 0x0 goto pc+12
     86: ...3....89 (bf) r1 = r10
     87: .1.3....89 (07) r1 += -400
     88: .1.3....89 (b4) w2 = 16
In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place,
so the verifier actually prefers to do verification first at 'r1 = r10' etc.

The asm code with [1]:
    119: .123....89 (85) call bpf_probe_read_user#112
    120: ........89 (79) r1 = *(u64 *)(r10 -368)
    121: .1......89 (79) r2 = *(u64 *)(r10 -8)
    122: .12.....89 (bf) r3 = r2
    123: .123....89 (0f) r3 += r1
    124: ..23....89 (07) r2 += -1
    125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6
    126: ........89 (05) goto pc+17
    ...
    144: ........89 (b4) w1 = 0
    145: .1......89 (6b) *(u16 *)(r8 +80) = r1
In this case, if 'r2 < 0xfffffffe' is true, the control will go to
non-null 'location' branch, so 'goto pc+17' will actually go to
null 'location' branch. This seems causing tremendous amount of
verificaiton state.

To fix the issue, rewrite the following code
  return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;
to if/then statement and hopefully these explicit if/then statements
are sticky during middle-end optimizations.

Test with llvm20 and llvm21 as well and all strobemeta related selftests
are passed.

  [1] https://github.com/llvm/llvm-project/pull/161000

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251014051639.1996331-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:25:03 -07:00
Yafang Shao
7484e7cd8a bpf: mark vma->{vm_mm,vm_file} as __safe_trusted_or_null
The vma->vm_mm might be NULL and it can be accessed outside of RCU. Thus,
we can mark it as trusted_or_null. With this change, BPF helpers can safely
access vma->vm_mm to retrieve the associated mm_struct from the VMA.
Then we can make policy decision from the VMA.

The "trusted" annotation enables direct access to vma->vm_mm within kfuncs
marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and
bpf_task_under_cgroup(). Conversely, "null" enforcement requires all
callsites using vma->vm_mm to perform NULL checks.

The lsm selftest must be modified because it directly accesses vma->vm_mm
without a NULL pointer check; otherwise it will break due to this
change.

For the VMA based THP policy, the use case is as follows,

  @mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
  if (!@mm)
      return;
  bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
    goto out;
  @cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);

  /* make the decision based on the @cgroup1 attribute */

  bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
  bpf_rcu_read_unlock();

PSI memory information can be obtained from the associated cgroup to inform
policy decisions. Since upstream PSI support is currently limited to cgroup
v2, the following example demonstrates cgroup v2 implementation:

  @owner = @mm->owner;
  if (@owner) {
      // @ancestor_cgid is user-configured
      @ancestor = bpf_cgroup_from_id(@ancestor_cgid);
      if (bpf_task_under_cgroup(@owner, @ancestor)) {
          @psi_group = @ancestor->psi;

          /* Extract PSI metrics from @psi_group and
           * implement policy logic based on the values
           */

      }
  }

The vma::vm_file can also be marked with __safe_trusted_or_null.

No additional selftests are required since vma->vm_file and vma->vm_mm are
already validated in the existing selftest suite.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20251016063929.13830-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:23:08 -07:00
Tiezhu Yang
c67f4ae737 selftests/bpf: Silence unused-but-set build warnings
There are some set but not used build errors when compiling bpf selftests
with the latest upstream mainline GCC, at the beginning add the attribute
__maybe_unused for the variables, but it is better to just add the option
-Wno-unused-but-set-variable to CFLAGS in Makefile to disable the errors
instead of hacking the tests.

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:36:
  error: variable ‘n_matches_after_delete’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:25:
  error: variable ‘n_matches’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/bpf_cookie.c:426:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/find_vma.c:52:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_branches.c:67:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_link.c:15:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20251018082815.20622-1-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:21:29 -07:00
Alexei Starovoitov
50de48a4dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf at 6.18-rc2
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 18:20:57 -07:00
Linus Torvalds
2953fb6548 hid-for-linus-2025101701
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmjyYSUACgkQpmLzj2vt
 YEnGlxAAukp9PohVJYj5XXfIZXd6nl9wQyTLfVxCaw03R4IBSwq6/SpbSFP3Khr0
 nR6wss6GdpiHGc8WA7Yo3EIZGsZtSolaOrg2pozKop24FD42S5bgq/qWeUNERhGF
 LU6+zpZbYhZ5Joo2Q57vRZ4D9UDYg7CoU1jVAkscEuvQhishLFKpG88VE9Jy/2Eu
 SGQ9Aqq/0NAC/msUHtVWn90EiLIQBS4izWO02Js6JObf8mdisH1MQscOeJ2rfJhV
 7WcdMX/LkqL7DQ9dFFo3j5NqE3SF1aJFjhMY+xivQaYeuwPSffCuWrbVq6TM74hP
 uRfIII1YUYAEyh+69KgQbQLGl0OqZbDhtEZ5p2SCn3DaMwXu+sraxfWRQ8nkG55w
 Gw74ibc3F9pUwwls9i7FUJE3CzoNBcN/FdxvWST5izni4rC2XAC7pzeSmyO3jiQO
 Zvpwa0VkaOeZ1OmVqo4YQBZoNlgBIaBPxqtoHfHxwvEvqUmivSzU+//XFGbvXgK1
 XL8hgYd7P1P6g9UVSvsgiVA6FIdR4EtpbOV3YxQpMVv8uAOCN5omeOAd9jRImN2a
 BxpKr5bcB4JhZcji/E4T4IPegA+8LUniZ6AW5KIUCCz34uDyeDSHsWB+cZiy31zc
 gGa0NqGRfhKOzqj9ExnnRDoZWyhnoHZ3yxUb41TPQfS2p5uZepM=
 =UpoT
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - fix for sticky fingers handling in hid-multitouch (Benjamin
   Tissoires)

 - fix for reporting of 0 battery levels (Dmitry Torokhov)

 - build fix for hid-haptic in certain configurations (Jonathan Denose)

 - improved probe and avoiding spamming kernel log by hid-nintendo
   (Vicki Pfau)

 - fix for OOB in hid-cp2112 (Deepak Sharma)

 - interrupt handling fix for intel-thc-hid (Even Xu)

 - a couple of new device IDs and device-specific quirks

* tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL
  selftests/hid: add tests for missing release on the Dell Synaptics
  HID: multitouch: fix sticky fingers
  HID: multitouch: fix name of Stylus input devices
  HID: hid-input: only ignore 0 battery events for digitizers
  HID: hid-debug: Fix spelling mistake "Rechargable" -> "Rechargeable"
  HID: Kconfig: Fix build error from CONFIG_HID_HAPTIC
  HID: nintendo: Rate limit IMU compensation message
  HID: nintendo: Wait longer for initial probe
  HID: core: Add printk_ratelimited variants to hid_warn() etc
  HID: quirks: Add ALWAYS_POLL quirk for VRS R295 steering wheel
  HID: quirks: avoid Cooler Master MM712 dongle wakeup bug
  HID: cp2112: Add parameter validation to data length
  HID: intel-thc-hid: intel-quickspi: Add ARL PCI Device Id's
  HID: intel-thc-hid: Intel-quickspi: switch first interrupt from level to edge detection
  HID: intel-thc-hid: intel-quicki2c: Fix wrong type casting
2025-10-18 08:18:18 -10:00
Linus Torvalds
d303caf5ca bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjykZUACgkQ6rmadz2v
 bTppcA/+LKOzgJVJd9q3tbPeXb+xJOqiCGPDuv3tfmglHxD+rR5n2lsoTkryL4PZ
 k5yJhA95Z5RVeV3cevl8n4HlTJipm795k+fKz1YRbe9gB49w2SqqDf5s7LYuABOm
 YqdUSCWaLyoBNd+qi9SuyOVXSg0mSbJk0bEsXsgTp/5rUt9v6cz+36BE1Pkcp3Aa
 d6y2I2MBRGCADtEmsXh7+DI0aUvgMi2zlX8veQdfAJZjsQFwbr1ZxSxHYqtS6gux
 wueEiihzipakhONACJskQbRv80NwT5/VmrAI/ZRVzIhsywQhDGdXtQVNs6700/oq
 QmIZtgAL17Y0SAyhzQsQuGhJGKdWKhf3hzDKEDPslmyJ6OCpMAs/ttPHTJ6s/MmG
 6arSwZD/GAIoWhvWYP/zxTdmjqKX13uradvaNTv55hOhCLTTnZQKRSLk1NabHl7e
 V0f7NlZaVPnLerW/90+pn5pZFSrhk0Nno9to+yXaQ9TYJlK4e6dcB9y+rButQNrr
 7bTRyQ55fQlyS+NnaD85wL41IIX4WmJ3ATdrKZIMvGMJaZxjzXRvW4AjGHJ6hbbt
 GATdtISkYqZ4AdlSf2Vj9BysZkf7tS83SyRlM6WDm3IaAS3v5E/e1Ky2Kn6gWu70
 MNcSW/O0DSqXRkcDkY/tSOlYJBZJYo6ZuWhNdQAERA3OxldBSFM=
 =p8bI
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
   imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)

 - Make selftests/bpf arg_parsing.c more robust to errors (Andrii
   Nakryiko)

 - Fix redefinition of 'off' as different kind of symbol when I40E
   driver is builtin (Brahmajit Das)

 - Do not disable preemption in bpf_test_run (Sahil Chandna)

 - Fix memory leak in __lookup_instance error path (Shardul Bankar)

 - Ensure test data is flushed to disk before reading it (Xing Guo)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix redefinition of 'off' as different kind of symbol
  bpf: Do not disable preemption in bpf_test_run().
  bpf: Fix memory leak in __lookup_instance error path
  selftests: arg_parsing: Ensure data is flushed to disk before reading.
  bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
  selftests/bpf: make arg_parsing.c more robust to crashes
  bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
2025-10-18 08:00:43 -10:00
Linus Torvalds
02e5f74ef0 ARM:
- Fix the handling of ZCR_EL2 in NV VMs
 
 - Pick the correct translation regime when doing a PTW on
   the back of a SEA
 
 - Prevent userspace from injecting an event into a vcpu that isn't
   initialised yet
 
 - Move timer save/restore to the sysreg handling code, fixing EL2 timer
   access in the process
 
 - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
 
 - Fix trapping configuration when the host isn't GICv3
 
 - Improve the detection of HCR_EL2.E2H being RES1
 
 - Drop a spurious 'break' statement in the S1 PTW
 
 - Don't try to access SPE when owned by EL3
 
 Documentation updates:
 
 - Document the failure modes of event injection
 
 - Document that a GICv3 guest can be created on a GICv5 host
   with FEAT_GCIE_LEGACY
 
 Selftest improvements:
 
 - Add a selftest for the effective value of HCR_EL2.AMO
 
 - Address build warning in the timer selftest when building with clang
 
 - Teach irqfd selftests about non-x86 architectures
 
 - Add missing sysregs to the set_id_regs selftest
 
 - Fix vcpu allocation in the vgic_lpi_stress selftest
 
 - Correctly enable interrupts in the vgic_lpi_stress selftest
 
 x86:
 
 - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
   bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return -EAGAIN if userspace
   deletes/moves memslot during prefault")
 
 - Don't try to get PMU capabilities from perf when running a CPU with hybrid
   CPUs/PMUs, as perf will rightly WARN.
 
 guest_memfd:
 
 - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
   generic KVM_CAP_GUEST_MEMFD_FLAGS
 
 - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
   said flag to initialize memory as SHARED, irrespective of MMAP.  The
   behavior merged in 6.18 is that enabling mmap() implicitly initializes
   memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
   as their memory is currently always initialized PRIVATE.
 
 - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
   memory, to enable testing such setups, i.e. to hopefully flush out any
   other lurking ABI issues before 6.18 is officially released.
 
 - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
   and host userspace accesses to mmap()'d private memory.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjzqVIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO+qQgArc7XXmoiHQfTmdqbFL+1ipzfqd/c
 SHJghONWVNKaSm0EsH72iEokmUyI8HssllaBuaGEAT/1F6YmRFwSSFgUG+N02rah
 pL5ShCG2fPVxHal9ZJ04M4DYWPPClmcE2myfQ6k9kwcMgCRK2BdSRRnKH3XfOKrY
 jAFNZVBCeODcnSvjOyxK2QFEt7J97H1AoAxOORvdqFmRqVIEQNJA/3Hx51wPfkwD
 UnCQiNaPinDMxuuwvcmlYsIrQhGaqO4de1Kx0A4ZkSQqFUcyhvB6Qa+DoApz/IBw
 qsFLqoR/1XXJ90wxutSTFzfjHM/SU6fhj57Cl9dAHI3pgnssC1iUvEt9Iw==
 =dvAj
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the handling of ZCR_EL2 in NV VMs

   - Pick the correct translation regime when doing a PTW on the back of
     a SEA

   - Prevent userspace from injecting an event into a vcpu that isn't
     initialised yet

   - Move timer save/restore to the sysreg handling code, fixing EL2
     timer access in the process

   - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug

   - Fix trapping configuration when the host isn't GICv3

   - Improve the detection of HCR_EL2.E2H being RES1

   - Drop a spurious 'break' statement in the S1 PTW

   - Don't try to access SPE when owned by EL3

  Documentation updates:

   - Document the failure modes of event injection

   - Document that a GICv3 guest can be created on a GICv5 host with
     FEAT_GCIE_LEGACY

  Selftest improvements:

   - Add a selftest for the effective value of HCR_EL2.AMO

   - Address build warning in the timer selftest when building with
     clang

   - Teach irqfd selftests about non-x86 architectures

   - Add missing sysregs to the set_id_regs selftest

   - Fix vcpu allocation in the vgic_lpi_stress selftest

   - Correctly enable interrupts in the vgic_lpi_stress selftest

  x86:

   - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
     for the bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return
     -EAGAIN if userspace deletes/moves memslot during prefault")

   - Don't try to get PMU capabilities from perf when running a CPU with
     hybrid CPUs/PMUs, as perf will rightly WARN.

  guest_memfd:

   - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
     more generic KVM_CAP_GUEST_MEMFD_FLAGS

   - Add a guest_memfd INIT_SHARED flag and require userspace to
     explicitly set said flag to initialize memory as SHARED,
     irrespective of MMAP.

     The behavior merged in 6.18 is that enabling mmap() implicitly
     initializes memory as SHARED, which would result in an ABI
     collision for x86 CoCo VMs as their memory is currently always
     initialized PRIVATE.

   - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
     private memory, to enable testing such setups, i.e. to hopefully
     flush out any other lurking ABI issues before 6.18 is officially
     released.

   - Add testcases to the guest_memfd selftest to cover guest_memfd
     without MMAP, and host userspace accesses to mmap()'d private
     memory"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
  arm64: Revamp HCR_EL2.E2H RES1 detection
  KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
  KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
  KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
  KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
  KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
  KVM: arm64: Kill leftovers of ad-hoc timer userspace access
  KVM: arm64: Fix WFxT handling of nested virt
  KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
  KVM: arm64: Make timer_set_offset() generally accessible
  KVM: arm64: Replace timer context vcpu pointer with timer_id
  KVM: arm64: Introduce timer_context_to_vcpu() helper
  KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
  Documentation: KVM: Update GICv3 docs for GICv5 hosts
  KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
  KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
  KVM: arm64: selftests: Allocate vcpus with correct size
  ...
2025-10-18 07:07:14 -10:00
Paolo Bonzini
4361f5aa8b KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
    bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return -EAGAIN if userspace
    deletes/moves memslot during prefault")
 
  - Don't try to get PMU capabbilities from perf when running a CPU with hybrid
    CPUs/PMUs, as perf will rightly WARN.
 
  - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
    generic KVM_CAP_GUEST_MEMFD_FLAGS
 
  - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
    said flag to initialize memory as SHARED, irrespective of MMAP.  The
    behavior merged in 6.18 is that enabling mmap() implicitly initializes
    memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
    as their memory is currently always initialized PRIVATE.
 
  - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
    memory, to enable testing such setups, i.e. to hopefully flush out any
    other lurking ABI issues before 6.18 is officially released.
 
  - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
    and host userspace accesses to mmap()'d private memory.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjyszoACgkQOlYIJqCj
 N/03+g/9FPoIPxGL9+tJyGahdH2Mygiip8Q3tUTlGVkskp+dplf+6T51ogdqBOkS
 tvlGjccxAOVW73ijn8ox7UtGSY9B1IJ+rj/uhsBOMFQptyUJEv4iEujFKB/t5RIF
 gOTVVR6Z/mcrYJY7F21qBPCHPbz++rEXrfgyAsosz6tpS/nL6vrNdp1LZlcsLM/k
 5DuhkQHLEwJoUXO5VUsBWRa3gRdOuZ+SJGd4C1mfDwXagxe8uFYRO/vraOV9dKJx
 l9RxKPIhf42cfu9tIeJYDJyXIDgzXUEND4smVw/ito1SeakBlC9rQ15ya8ETLAEX
 tHzmj38RPOfjsycGKIRhlLzQx77b+t7FhNdse19QC3p3u9Jn8FuMyFcGZuiaP5kK
 e9Xrp/zcaCNfHB6gGKEud7h7fpV9yB8SNlCsP81if73buq3qHx0+jeVg3jCS4mkb
 zGY7CG+oKJOdTSN8JAh8nJMM4bUv5m8myr6yYGU+SsGBQsyPyqQdRWuKdQyOQIVC
 RZSWXSjKfmT5FwSs0KRI0yUjMeUYbgwrsysFuS3qX62mZpr0vLaPoiYFvuffe9gB
 W3Tt98QNPtBmJdINNncgKvbn3sp/CzUHirygaI0APZwh6QQAkL1Dp2beA1i95uVI
 8uin4zUjRbRjUpMUaEtAQIaVMTVQqgfiPAvtcDCRBBeOGaNr81M=
 =lBtj
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.18:

 - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
   bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return -EAGAIN if userspace
   deletes/moves memslot during prefault")

 - Don't try to get PMU capabbilities from perf when running a CPU with hybrid
   CPUs/PMUs, as perf will rightly WARN.

 - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
   generic KVM_CAP_GUEST_MEMFD_FLAGS

 - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
   said flag to initialize memory as SHARED, irrespective of MMAP.  The
   behavior merged in 6.18 is that enabling mmap() implicitly initializes
   memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
   as their memory is currently always initialized PRIVATE.

 - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
   memory, to enable testing such setups, i.e. to hopefully flush out any
   other lurking ABI issues before 6.18 is officially released.

 - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
   and host userspace accesses to mmap()'d private memory.
2025-10-18 10:25:43 +02:00
Jakub Kicinski
e90576829c bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaPFVxwAKCRAraS+Z+3y5
 E1yDAP98nxKLBFb9auLj8AV1zUNl4lKEpOIhH7ceXaDT7ucsagEAwvSbGAN5sun/
 M0+sOpKEZwF8JBoRU8L61uYPYSu7XAk=
 =2K9O
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-10-16

We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 18 files changed, 577 insertions(+), 38 deletions(-).

The main changes are:

1) Bypass the global per-protocol memory accounting either by setting
   a netns sysctl or using bpf_setsockopt in a bpf program,
   from Kuniyuki Iwashima.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add test for sk->sk_bypass_prot_mem.
  bpf: Introduce SK_BPF_BYPASS_PROT_MEM.
  bpf: Support bpf_setsockopt() for BPF_CGROUP_INET_SOCK_CREATE.
  net: Introduce net.core.bypass_prot_mem sysctl.
  net: Allow opt-out from global protocol memory accounting.
  tcp: Save lock_sock() for memcg in inet_csk_accept().
====================

Link: https://patch.msgid.link/20251016204539.773707-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-17 17:20:42 -07:00
Carlos Llamas
f578ff4c53 selftests/net: io_uring: fix unknown errnum values
The io_uring functions return negative error values, but error() expects
these to be positive to properly match them to an errno string. Fix this
to make sure the correct error descriptions are displayed upon failure.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20251016182538.3790567-1-cmllamas@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-17 16:57:53 -07:00
Brahmajit Das
a1e83d4c03 selftests/bpf: Fix redefinition of 'off' as different kind of symbol
This fixes the following build error

   CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o
progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as
different kind of symbol
   228 | u32 off;
       |     ^

The symbol 'off' was previously defined in
tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an
enum i40e_ptp_gpio_pin_state from
drivers/net/ethernet/intel/i40e/i40e_ptp.c:

	enum i40e_ptp_gpio_pin_state {
		end = -2,
		invalid = -1,
		off = 0,
		in_A = 1,
		in_B = 2,
		out_A = 3,
		out_B = 4,
	};

This enum is included when CONFIG_I40E is enabled. As of commit
032676ff82 ("LoongArch: Update Loongson-3 default config file"),
CONFIG_I40E is set in the defconfig, which leads to the conflict.

Renaming the local variable avoids the redefinition and allows the
build to succeed.

Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-17 11:33:23 -07:00
Eric Dumazet
56cef47c28 selftests/net: packetdrill: unflake tcp_user_timeout_user-timeout-probe.pkt
This test fails the first time I am running it after a fresh virtme-ng boot.

tcp_user_timeout_user-timeout-probe.pkt:33: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory)

Tweaks the timings a bit, to reduce flakiness.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soham Chakradeo <sohamch@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251014171907.3554413-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16 16:25:09 -07:00
Kuniyuki Iwashima
5f941dd87b selftests/bpf: Add test for sk->sk_bypass_prot_mem.
The test does the following for IPv4/IPv6 x TCP/UDP sockets
with/without sk->sk_bypass_prot_mem, which can be turned on by
net.core.bypass_prot_mem or bpf_setsockopt(SK_BPF_BYPASS_PROT_MEM).

  1. Create socket pairs
  2. Send NR_PAGES (32) of data (TCP consumes around 35 pages,
     and UDP consuems 66 pages due to skb overhead)
  3. Read memory_allocated from sk->sk_prot->memory_allocated and
     sk->sk_prot->memory_per_cpu_fw_alloc
  4. Check if unread data is charged to memory_allocated

If sk->sk_bypass_prot_mem is set, memory_allocated should not be
changed, but we allow a small error (up to 10 pages) in case
other processes on the host use some amounts of TCP/UDP memory.

The amount of allocated pages are buffered to per-cpu variable
{tcp,udp}_memory_per_cpu_fw_alloc up to +/- net.core.mem_pcpu_rsv
before reported to {tcp,udp}_memory_allocated.

At 3., memory_allocated is calculated from the 2 variables at
fentry of socket create function.

We drain the receive queue only for UDP before close() because UDP
recv queue is destroyed after RCU grace period.  When I printed
memory_allocated, UDP bypass cases sometimes saw the no-bypass
case's leftover, but it's still in the small error range (<10 pages).

  bpf_trace_printk: memory_allocated: 0   <-- TCP no-bypass
  bpf_trace_printk: memory_allocated: 35
  bpf_trace_printk: memory_allocated: 0   <-- TCP w/ sysctl
  bpf_trace_printk: memory_allocated: 0
  bpf_trace_printk: memory_allocated: 0   <-- TCP w/ bpf
  bpf_trace_printk: memory_allocated: 0
  bpf_trace_printk: memory_allocated: 0   <-- UDP no-bypass
  bpf_trace_printk: memory_allocated: 66
  bpf_trace_printk: memory_allocated: 2   <-- UDP w/ sysctl (2 pages leftover)
  bpf_trace_printk: memory_allocated: 2
  bpf_trace_printk: memory_allocated: 2   <-- UDP w/ bpf (2 pages leftover)
  bpf_trace_printk: memory_allocated: 2

We prefer finishing tests faster than oversleeping for call_rcu()
 + sk_destruct().

The test completes within 2s on QEMU (64 CPUs) w/ KVM.

  # time ./test_progs -t sk_bypass
  #371/1   sk_bypass_prot_mem/TCP  :OK
  #371/2   sk_bypass_prot_mem/UDP  :OK
  #371/3   sk_bypass_prot_mem/TCPv6:OK
  #371/4   sk_bypass_prot_mem/UDPv6:OK
  #371     sk_bypass_prot_mem:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

  real	0m1.481s
  user	0m0.181s
  sys	0m0.441s

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://patch.msgid.link/20251014235604.3057003-7-kuniyu@google.com
2025-10-16 12:04:47 -07:00
Linus Torvalds
634ec1fc79 Including fixes from CAN
Current release - regressions:
 
   - udp: do not use skb_release_head_state() before skb_attempt_defer_free()
 
   - gro_cells: use nested-BH locking for gro_cell
 
   - dpll: zl3073x: increase maximum size of flash utility
 
 Previous releases - regressions:
 
   - core: fix lockdep splat on device unregister
 
   - tcp: fix tcp_tso_should_defer() vs large RTT
 
   - tls:
     - don't rely on tx_work during send()
     - wait for pending async decryptions if tls_strp_msg_hold fails
 
   - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
 
   - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom
 
 Previous releases - always broken:
 
   - ip6_tunnel: prevent perpetual tunnel growth
 
   - dpll: zl3073x: handle missing or corrupted flash configuration
 
   - can: m_can: fix pm_runtime and CAN state handling
 
   - eth:ixgbe: fix too early devlink_free() in ixgbe_remove()
 
   - eth: ixgbevf: fix mailbox API compatibility
 
   - eth: gve: Check valid ts bit on RX descriptor before hw timestamping
 
   - eth: idpf: cleanup remaining SKBs in PTP flows
 
   - eth: r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjxBUASHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkTfwP/0I5aKZHbZ9I3gHL6Dw9TtcYPsP8cZDK
 Iu7nOtgyD8AKSnX+3ynABg4K3WbE6HFi1BQzv63uO9G2UYLZSMdU16uytUKbstR6
 HWw+r2N0NMb+qacb++nbMugmYczSWUYKRyrpWHrNtqbKi3aQRJt5lQ81AGwgMITX
 mx8VILjMTXIt/K055LVnF9z2c4ltFmLTGiODKqHjQQirNbBiRJFNYn4qtw3BkbdS
 OwxNtTqOozQppRf+1l4MRHHXNbwGe7W9hOnALbiHpwtXxvG/+NVRu8IkNcD8vMD9
 I7aThpkOt7B9xcWqLb2pAAk976MuKQr6BUGKQrP26vZb9W0NIJbew1Y8d74Zld5d
 Z9bD5Uqr85Ge4Xp3/mWnaQZWk2OTsVdOOxkUoPWilFByULGfJORMvu6+y6/RWpZt
 dezE7WL9o+K1AJLBWnZ3RTWI8vmSEDup4WzPSP1v7paAFcvb9Ub4pBghX9+RGCfE
 ZmCecQPZfVzTe08PvAysc9VR7o3jGx5FpCLhinLxaHaSrV4YvALyV8iYagrmtQWY
 iWVkMYEIxh5B6KLmIVVu5cQJJjuRLm6k+nCTemBJaAItu2YLdy4ZeLeFWUhVgmxM
 SIsueBmTGorRqCbSed7L17yMApsMaBzE775+qOFoK7tLoKVTm22AMA3DzQtKGeIv
 OFlNreoOWzMX
 =jVlx
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN

  Current release - regressions:

    - udp: do not use skb_release_head_state() before
      skb_attempt_defer_free()

    - gro_cells: use nested-BH locking for gro_cell

    - dpll: zl3073x: increase maximum size of flash utility

  Previous releases - regressions:

    - core: fix lockdep splat on device unregister

    - tcp: fix tcp_tso_should_defer() vs large RTT

    - tls:
        - don't rely on tx_work during send()
        - wait for pending async decryptions if tls_strp_msg_hold fails

    - can: j1939: add missing calls in NETDEV_UNREGISTER notification
      handler

    - eth: lan78xx: fix lost EEPROM write timeout in
      lan78xx_write_raw_eeprom

  Previous releases - always broken:

    - ip6_tunnel: prevent perpetual tunnel growth

    - dpll: zl3073x: handle missing or corrupted flash configuration

    - can: m_can: fix pm_runtime and CAN state handling

    - eth:
        - ixgbe: fix too early devlink_free() in ixgbe_remove()
        - ixgbevf: fix mailbox API compatibility
        - gve: Check valid ts bit on RX descriptor before hw timestamping
        - idpf: cleanup remaining SKBs in PTP flows
        - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"

* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  udp: do not use skb_release_head_state() before skb_attempt_defer_free()
  net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
  netdevsim: set the carrier when the device goes up
  selftests: tls: add test for short splice due to full skmsg
  selftests: net: tls: add tests for cmsg vs MSG_MORE
  tls: don't rely on tx_work during send()
  tls: wait for pending async decryptions if tls_strp_msg_hold fails
  tls: always set record_type in tls_process_cmsg
  tls: wait for async encrypt in case of error during latter iterations of sendmsg
  tls: trim encrypted message to match the plaintext on short splice
  tg3: prevent use of uninitialized remote_adv and local_adv variables
  MAINTAINERS: new entry for IPv6 IOAM
  gve: Check valid ts bit on RX descriptor before hw timestamping
  net: core: fix lockdep splat on device unregister
  MAINTAINERS: add myself as maintainer for b53
  selftests: net: check jq command is supported
  net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
  tcp: fix tcp_tso_should_defer() vs large RTT
  r8152: add error handling in rtl8152_driver_init
  usbnet: Fix using smp_processor_id() in preemptible code warnings
  ...
2025-10-16 09:41:21 -07:00
Xing Guo
0c1999ed33 selftests: arg_parsing: Ensure data is flushed to disk before reading.
test_parse_test_list_file writes some data to
/tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read
the data back.  However, after writing data to that file, we forget to
call fsync() and it's causing testing failure in my laptop.  This patch
helps fix it by adding the missing fsync() call.

Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com
2025-10-16 09:34:39 -07:00
Sabrina Dubroca
3667e9b442 selftests: tls: add test for short splice due to full skmsg
We don't have a test triggering a partial splice caused by a full
skmsg. Add one, based on a program by Jann Horn.

Use MAX_FRAGS=48 to make sure the skmsg will be full for any allowed
value of CONFIG_MAX_SKB_FRAGS (17..45).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/1d129a15f526ea3602f3a2b368aa0b6f7e0d35d5.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:46 -07:00
Sabrina Dubroca
f95fce1e95 selftests: net: tls: add tests for cmsg vs MSG_MORE
We don't have a test to check that MSG_MORE won't let us merge records
of different types across sendmsg calls.

Add new tests that check:
 - MSG_MORE is only allowed for DATA records
 - a pending DATA record gets closed and pushed before a non-DATA
   record is processed

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/b34feeadefe8a997f068d5ed5617afd0072df3c0.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Ryan Newton
5aff3b3199 sched_ext: Add a selftest for scx_bpf_dsq_peek
This commit adds two tests. The first is the most basic unit test:
make sure an empty queue peeks as empty, and when we put one element
in the queue, make sure peek returns that element.

However, even this simple test is a little complicated by the different
behavior of scx_bpf_dsq_insert in different calling contexts:
 - insert is for direct dispatch in enqueue
 - insert is delayed when called from select_cpu

In this case we split the insert and the peek that verifies the
result between enqueue/dispatch.

Note: An alternative would be to call `scx_bpf_dsq_move_to_local` on an
empty queue, which in turn calls `flush_dispatch_buf`, in order to flush
the buffered insert. Unfortunately, this is not viable within the
enqueue path, as it attempts a voluntary context switch within an RCU
read-side critical section.

The second test is a stress test that performs many peeks on all DSQs
and records the observed tasks.

Signed-off-by: Ryan Newton <newton@meta.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 06:46:36 -10:00
Benjamin Tissoires
d9b3014a7f selftests/hid: add tests for missing release on the Dell Synaptics
Add a simple test for the corner case not currently covered by the
sticky fingers quirk. Because it's a corner case test, we only test this
on a couple of devices, not on all of them because the value of adding
the same test over and over is rather moot.

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-10-15 17:27:06 +02:00
Sebastian Chlad
4cdde87d72 selftests: cgroup: Use values_close_report in test_cpu
Convert test_cpu to use the newly added values_close_report() helper
to print detailed diagnostics when a tolerance check fails. This
provides clearer insight into deviations while run in the CI.

Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 05:00:59 -10:00
Sebastian Chlad
3f9c60f4d3 selftests: cgroup: add values_close_report helper
Some cgroup selftests, such as test_cpu, occasionally fail by a very
small margin and if run in the CI context, it is useful to have detailed
diagnostic output to understand the deviation.

Introduce a values_close_report() helper which performs the same
comparison as values_close(), but prints detailed information when the
values differ beyond the allowed tolerance.

Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15 05:00:49 -10:00
Fushuai Wang
5cb5575308 selftests: livepatch: use canonical ftrace path
Since v4.1 kernel, a new interface for ftrace called "tracefs" was
introduced, which is usually mounted in /sys/kernel/tracing. Therefore,
tracing files can now be accessed via either the legacy path
/sys/kernel/debug/tracing or the newer path /sys/kernel/tracing.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-10-15 14:50:46 +02:00
Andrii Nakryiko
e603a342cf selftests/bpf: make arg_parsing.c more robust to crashes
We started getting a crash in BPF CI, which seems to originate from
test_parse_test_list_file() test and is happening at this line:

  ASSERT_OK(strcmp("test_with_spaces", set.tests[0].name), "test 0 name");

One way we can crash there is if set.cnt zero, which is checked for with
ASSERT_EQ() above, but we proceed after this regardless of the outcome.
Instead of crashing, we should bail out with test failure early.

Similarly, if parse_test_list_file() fails, we shouldn't be even looking
at set, so bail even earlier if ASSERT_OK() fails.

Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251014202037.72922-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-14 16:39:33 -07:00
Wang Liang
4f86eb0a38 selftests: net: check jq command is supported
The jq command is used in vlan_bridge_binding.sh, if it is not supported,
the test will spam the following log.

  # ./vlan_bridge_binding.sh: line 51: jq: command not found
  # ./vlan_bridge_binding.sh: line 51: jq: command not found
  # ./vlan_bridge_binding.sh: line 51: jq: command not found
  # ./vlan_bridge_binding.sh: line 51: jq: command not found
  # ./vlan_bridge_binding.sh: line 51: jq: command not found
  # TEST: Test bridge_binding on->off when lower down                   [FAIL]
  #       Got operstate of , expected 0

The rtnetlink.sh has the same problem. It makes sense to check if jq is
installed before running these tests. After this patch, the
vlan_bridge_binding.sh skipped if jq is not supported:

  # timeout set to 3600
  # selftests: net: vlan_bridge_binding.sh
  # TEST: jq not installed                                              [SKIP]

Fixes: dca12e9ab7 ("selftests: net: Add a VLAN bridge binding selftest")
Fixes: 6a414fd77f ("selftests: rtnetlink: Add an address proto test")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251013080039.3035898-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-14 15:12:18 +02:00
Marc Zyngier
5c7cf1e44e KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
The userspace-visible encoding for CNTV_CVAL_EL0 and CNTVCNT_EL0
have been swapped for as long as usersapce has had access to the
registers. This is documented in arch/arm64/include/uapi/asm/kvm.h.

Despite that, the get_reg_list test has unhelpful comments indicating
the wrong register for the encoding.

Replace this with definitions exposed in the include file, and
a comment explaining again the brokenness.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:43:12 +01:00
Marc Zyngier
4da5a9af78 KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
Add yet another configuration, this time dealing E2H=0.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:42:41 +01:00
Marc Zyngier
6418330c84 KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
The hyp virtual timer registers only exist when VHE is present,
Similarly, VNCR_EL2 only exists when NV2 is present.

Make these dependencies explicit.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:42:41 +01:00
Oliver Upton
d5e6310a0d KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
vgic_lpi_stress rather hilariously leaves IRQs disabled for the duration
of the test. While the ITS translation of MSIs happens regardless of
this, for completeness the guest should actually handle the LPIs.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:28:27 +01:00
Zenghui Yu
2192d348c0 KVM: arm64: selftests: Allocate vcpus with correct size
vcpus array contains pointers to struct kvm_vcpu {}. It is way overkill
to allocate the array with (nr_cpus * sizeof(struct kvm_vcpu)). Fix the
allocation by using the correct size.

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:27:55 +01:00
Zenghui Yu
9a7f87eb58 KVM: arm64: selftests: Sync ID_AA64PFR1, MPIDR, CLIDR in guest
We forgot to sync several registers (ID_AA64PFR1, MPIDR, CLIDR) in guest to
make sure that the guest had seen the written value.

Add them to the list.

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Reviewed-By: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:17:03 +01:00
Oliver Upton
a133052666 KVM: selftests: Fix irqfd_test for non-x86 architectures
The KVM_IRQFD ioctl fails if no irqchip is present in-kernel, which
isn't too surprising as there's not much KVM can do for an IRQ if it
cannot resolve a destination.

As written the irqfd_test assumes that a 'default' VM created in
selftests has an in-kernel irqchip created implicitly. That may be the
case on x86 but it isn't necessarily true on other architectures.

Add an arch predicate indicating if 'default' VMs get an irqchip and
make the irqfd_test depend on it. Work around arm64 VGIC initialization
requirements by using vm_create_with_one_vcpu(), ignoring the created
vCPU as it isn't used for the test.

Reported-by: Sebastian Ott <sebott@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Fixes: 7e9b231c40 ("KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:17:03 +01:00
Sean Christopherson
cb49b7b862 KVM: arm64: selftests: Track width of timer counter as "int", not "uint64_t"
Store the width of arm64's timer counter as an "int", not a "uint64_t".
ilog2() returns an "int", and more importantly using what is an "unsigned
long" under the hood makes clang unhappy due to a type mismatch when
clamping the width to a sane value.

  arm64/arch_timer_edge_cases.c:1032:10: error: comparison of distinct pointer types
     ('typeof (width) *' (aka 'unsigned long *') and 'typeof (56) *' (aka 'int *'))
     [-Werror,-Wcompare-distinct-pointer-types]
   1032 |         width = clamp(width, 56, 64);
        |                 ^~~~~~~~~~~~~~~~~~~~
  tools/include/linux/kernel.h:47:45: note: expanded from macro 'clamp'
     47 | #define clamp(val, lo, hi)      min((typeof(val))max(val, lo), hi)
        |                                                  ^~~~~~~~~~~~
  tools/include/linux/kernel.h:33:17: note: expanded from macro 'max'
     33 |         (void) (&_max1 == &_max2);              \
        |                 ~~~~~~ ^  ~~~~~~
  tools/include/linux/kernel.h:39:9: note: expanded from macro 'min'
     39 |         typeof(x) _min1 = (x);                  \
        |                ^

Fixes: fad4cf9448 ("KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases")
Cc: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:17:03 +01:00
Oliver Upton
890c608b4d KVM: arm64: selftests: Test effective value of HCR_EL2.AMO
A defect against the architecture now allows an implementation to treat
AMO as 1 when HCR_EL2.{E2H, TGE} = {1, 0}. KVM now takes advantage of
this interpretation to address a quality of emulation issue w.r.t.
SError injection.

Add a corresponding test case and expect a pending SError to be taken.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13 14:17:03 +01:00
Jakub Kicinski
68a052239f selftests: drv-net: update remaining Python init files
Convert remaining __init__ files similar to what we did in
commit b615879dbf ("selftests: drv-net: make linters happy with our imports")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-10-12 19:03:53 +01:00
Alexei Starovoitov
39e9d5f630 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf before 6.18-rc1
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-11 18:27:47 -07:00
Linus Torvalds
fbde105f13 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
 bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
 /hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
 DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
 +Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
 vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
 o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
 EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
 3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
 nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
 aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
 Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
 =77JH
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Finish constification of 1st parameter of bpf_d_path() (Rong Tao)

 - Harden userspace-supplied xdp_desc validation (Alexander Lobakin)

 - Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
   Borkmann)

 - Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)

 - Use correct context to unpin bpf hash map with special types (KaFai
   Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add test for unpinning htab with internal timer struct
  bpf: Avoid RCU context warning when unpinning htab with internal structs
  xsk: Harden userspace-supplied xdp_desc validation
  bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
  libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
  bpf: Finish constification of 1st parameter of bpf_d_path()
2025-10-11 10:31:38 -07:00
Sean Christopherson
505f5224b1 KVM: selftests: Verify that reads to inaccessible guest_memfd VMAs SIGBUS
Expand the guest_memfd negative testcases for overflow and MAP_PRIVATE to
verify that reads to inaccessible memory also get a SIGBUS.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Lisa Wang <wyihan@google.com>
Tested-by: Lisa Wang <wyihan@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:30 -07:00
Sean Christopherson
19942d4fd9 KVM: selftests: Verify that faulting in private guest_memfd memory fails
Add a guest_memfd testcase to verify that faulting in private memory gets
a SIGBUS.  For now, test only the case where memory is private by default
since KVM doesn't yet support in-place conversion.

Deliberately run the CoW test with and without INIT_SHARED set as KVM
should disallow MAP_PRIVATE regardless of whether the memory itself is
private from a CoCo perspective.

Cc: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:30 -07:00
Sean Christopherson
f91187c0ec KVM: selftests: Add wrapper macro to handle and assert on expected SIGBUS
Extract the guest_memfd test's SIGBUS handling functionality into a common
TEST_EXPECT_SIGBUS() macro in anticipation of adding more SIGBUS testcases.
Eating a SIGBUS isn't terrible difficult, but it requires a non-trivial
amount of boilerplate code, and using a macro allows selftests to print
out the exact action that failed to generate a SIGBUS without the developer
needing to remember to add a useful error message.

Explicitly mark the SIGBUS handler as "used", as gcc-14 at least likes to
discard the function before linking.

Opportunistically use TEST_FAIL(...) instead of TEST_ASSERT(false, ...),
and fix the write path of the guest_memfd test to use the local "val"
instead of hardcoding the literal value a second time.

Suggested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Lisa Wang <wyihan@google.com>
Tested-by: Lisa Wang <wyihan@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:29 -07:00
Sean Christopherson
505c953009 KVM: selftests: Isolate the guest_memfd Copy-on-Write negative testcase
Move the guest_memfd Copy-on-Write (CoW) testcase to its own function to
better separate positive testcases from negative testcases.

No functional change intended.

Suggested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:28 -07:00
Sean Christopherson
61cee97f40 KVM: selftests: Add wrappers for mmap() and munmap() to assert success
Add and use wrappers for mmap() and munmap() that assert success to reduce
a significant amount of boilerplate code, to ensure all tests assert on
failure, and to provide consistent error messages on failure.

No functional change intended.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:28 -07:00
Ackerley Tng
df0d9923f7 KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP
If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will
run all test cases with GUEST_MEMFD_FLAG_MMAP set.  This leaves the code
path for creating a non-mmap()-able guest_memfd on a VM that supports
mappable guest memfds untested.

Refactor the test to run the main test suite with a given set of flags.
Then, for VM types that support the mappable capability, invoke the test
suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP
set.

This ensures both creation paths are properly exercised on capable VMs.

Run test_guest_memfd_flags() only once per VM type since it depends only
on the set of valid/supported flags, i.e. iterating over an arbitrary set
of flags is both unnecessary and wrong.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: use double-underscores for the inner helper]
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:27 -07:00
Sean Christopherson
21d602ed61 KVM: selftests: Create a new guest_memfd for each testcase
Refactor the guest_memfd selftest to improve test isolation by creating a
a new guest_memfd for each testcase.  Currently, the test reuses a single
guest_memfd instance for all testcases, and thus creates dependencies
between tests, e.g. not truncating folios from the guest_memfd instance
at the end of a test could lead to unexpected results (see the PUNCH_HOLE
purging that needs to done by in-flight the NUMA testcases[1]).

Invoke each test via a macro wrapper to create and close a guest_memfd
to cut down on the boilerplate copy+paste needed to create a test.

Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com
Reported-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:26 -07:00
Sean Christopherson
3a6c08538c KVM: selftests: Stash the host page size in a global in the guest_memfd test
Use a global variable to track the host page size in the guest_memfd test
so that the information doesn't need to be constantly passed around.  The
state is purely a reflection of the underlying system, i.e. can't be set
by the test and is constant for a given invocation of the test, and thus
explicitly passing the host page size to individual testcases adds no
value, e.g. doesn't allow testing different combinations.

Making page_size a global will simplify an upcoming change to create a new
guest_memfd instance per testcase.

No functional change intended.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:26 -07:00
Sean Christopherson
fe2bf6234e KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set
Add a guest_memfd flag to allow userspace to state that the underlying
memory should be configured to be initialized as shared, and reject user
page faults if the guest_memfd instance's memory isn't shared.  Because
KVM doesn't yet support in-place private<=>shared conversions, all
guest_memfd memory effectively follows the initial state.

Alternatively, KVM could deduce the initial state based on MMAP, which for
all intents and purposes is what KVM currently does.  However, implicitly
deriving the default state based on MMAP will result in a messy ABI when
support for in-place conversions is added.

For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
by default (otherwise the memory would be unusable).  If MMAP implies
memory is shared by default, then the default state for CoCo VMs will vary
based on MMAP, and from userspace's perspective, will change when in-place
conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
would need to immediately convert all memory from shared=>private, which
is both ugly and inefficient.  The inefficiency could be avoided by adding
a flag to state that memory is _private_ by default, irrespective of MMAP,
but that would lead to an equally messy and hard to document ABI.

Bite the bullet and immediately add a flag to control the default state so
that the effective behavior is explicit and straightforward.

Fixes: 3d3a04fad2 ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:23 -07:00
Sean Christopherson
d2042d8f96 KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS
Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
require a new capability, and so that developers aren't tempted to bundle
multiple flags into a single capability.

Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
value, but that limitation can be easily circumvented by adding e.g.
KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
than 32 flags.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:22 -07:00
Mykyta Yatsenko
bca2b74ea9 selftests/bpf: Add more bpf_wq tests
Add bpf_wq selftests to verify:
 * BPF program using non-constant offset of struct bpf_wq is rejected
 * BPF program using map with no BTF for storing struct bpf_wq is rejected

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251010164606.147298-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 11:12:51 -07:00
Paul Chaignon
bc3eeb4259 selftests/bpf: Test direct packet access on non-linear skbs
This patch adds new selftests in the direct packet access suite, to
cover the non-linear case. The first six tests cover the behavior of
the bounds check with a non-linear skb. The last test adds a call to
bpf_skb_pull_data() to be able to access the packet.

Note that the size of the linear area includes the L2 header, but for
some program types like cgroup_skb, ctx->data points to the L3 header.
Therefore, a linear area of 22 bytes will have only 8 bytes accessible
to the BPF program (22 - ETH_HLEN). For that reason, the cgroup_skb test
cases access the packet at an offset of 8 bytes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ceedbfd719e58f0d49dcceb8592f5e6bd38ce5fe.1760037899.git.paul.chaignon@gmail.com
2025-10-10 10:43:04 -07:00
Paul Chaignon
8d45d0398d selftests/bpf: Support non-linear flag in test loader
This patch adds support for a new tag __linear_size in the test loader,
to specify the size of the linear area in case of non-linear skbs. If
the tag is absent or null, a linear skb is crafted.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/7ad928ec7591daef4f1b84032aeb86c918b3e5a7.1760037899.git.paul.chaignon@gmail.com
2025-10-10 10:43:04 -07:00
KaFai Wan
accb9a7e87 selftests/bpf: Add test for unpinning htab with internal timer struct
Add test to verify that unpinning hash tables containing internal timer
structures does not trigger context warnings.

Each subtest (timer_prealloc and timer_no_prealloc) can trigger the
context warning when unpinning, but the warning cannot be triggered
twice within a short time interval (a HZ), which is expected behavior.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:10:08 -07:00
Rong Tao
eca0b643ef selftests/bpf: Test bpf_strcasestr,bpf_strncasestr kfuncs
Add tests for new kfuncs bpf_strcasestr() and bpf_strncasestr().

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_4F1A340A8966155C52AA9CBDB68FD221FE0A@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:05:32 -07:00
Kumar Kartikeya Dwivedi
5b1b5d380a selftests/bpf: Add tests for async cb context
Add tests to verify that async callback's sleepable attribute is
correctly determined by the callback type, not the arming program's
context, reflecting its true execution context.

Introduce verifier_async_cb_context.c with tests for all three async
callback primitives: bpf_timer, bpf_wq, and bpf_task_work. Each
primitive is tested when armed from both sleepable (lsm.s/file_open) and
non-sleepable (fentry) programs.

Test coverage:
- bpf_timer callbacks: Verify they are never sleepable, even when armed
  from sleepable programs. Both tests should fail when attempting to use
  sleepable helper bpf_copy_from_user() in the callback.

- bpf_wq callbacks: Verify they are always sleepable, even when armed
  from non-sleepable programs. Both tests should succeed when using
  sleepable helpers in the callback.

- bpf_task_work callbacks: Verify they are always sleepable, even when
  armed from non-sleepable programs. Both tests should succeed when
  using sleepable helpers in the callback.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251007220349.3852807-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:04:51 -07:00
Linus Torvalds
18a7e218cf Including fixes from netfilter.
Current release - regressions:
 
   - mlx5: fix pre-2.40 binutils assembler error
 
 Current release - new code bugs:
 
   - net: psp: don't assume reply skbs will have a socket
 
   - eth: fbnic: fix missing programming of the default descriptor
 
 Previous releases - regressions:
 
   - page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
 
   - tcp:
     - take care of zero tp->window_clamp in tcp_set_rcvlowat()
     - don't call reqsk_fastopen_remove() in tcp_conn_request().
 
   - eth: ice: release xa entry on adapter allocation failure
 
   - eth: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
 
 Previous releases - always broken:
 
   - netfilter: validate objref and objrefmap expressions
 
   - sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
 
   - eth: mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
 
   - eth: mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
 
   - eth: ocelot: fix use-after-free caused by cyclic delayed work
 
 Misc:
 
   -  add support for MediaTek PCIe 5G HP DRMR-H01
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjnlD0SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkQnYP/iQAwtLSyE2lqdP+OsfnwhdpeeMpy2B/
 rfN7hk4OS1Rs1WBEf+Y45Lm5yC2poRleEf3QrixcqAHrtr4SED1JgZynjzjtYJ6c
 mmCdzZPZMfBEVG7rcOusSMI+ebWCA1yvSzjqw6CctwzW6Sac4S3KRyYzUNuTJ+De
 3APf6NptD324veJuyP01opafaExBpu21DfaEXeu4kP030S74J5TMfjt7dNvVpYDt
 qwb7s6syJ9O+U3INmi6SVxJY9711Cj8tjUXzz94FWc/kC28D3UnIKUO1tL/qNthe
 do0OhNa1YlsmoGJuRthxQa9ubyScnP6hKcy8VlanvuDya0JVyzdod/RDO6A2L9bp
 bdJIM2wnLd4Sqe7QcGPBFj0KNahgqG1lzj0uHKhrWBetp+5wWySuk9Mohg8yZaGP
 w53X0+WhAeferpcdMPFbwBL9pcgmHo2FxDMaHKcIrIAFb+TIzSZA6CXWB2RFRSgH
 smgQ9EYUbfUtP6nZCs1lfOux7CdN94tuKA9jnUXFrlIda44akxVe/wVi9bRazrVx
 TcpF0BEeQTyM4wCoVsk/JzTNeK2VZtj8i1B7TrZvR68UkwGrXm8DB4xG42LxdRaw
 PEtu8mbUGiWtkojmUAVpYc/TfMdQOVyBUijd2uAe1pXV4yPDz5cWNkEPX8vvAiRu
 rIe8v1VAz+gg
 =LlXo
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull  networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mlx5: fix pre-2.40 binutils assembler error

  Current release - new code bugs:

   - net: psp: don't assume reply skbs will have a socket

   - eth: fbnic: fix missing programming of the default descriptor

  Previous releases - regressions:

   - page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches

   - tcp:
       - take care of zero tp->window_clamp in tcp_set_rcvlowat()
       - don't call reqsk_fastopen_remove() in tcp_conn_request()

   - eth:
       - ice: release xa entry on adapter allocation failure
       - usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock

  Previous releases - always broken:

   - netfilter: validate objref and objrefmap expressions

   - sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()

   - eth:
       - mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
       - mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
       - ocelot: fix use-after-free caused by cyclic delayed work

  Misc:

   -  add support for MediaTek PCIe 5G HP DRMR-H01"

* tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  net: airoha: Fix loopback mode configuration for GDM2 port
  selftests: drv-net: pp_alloc_fail: add necessary optoins to config
  selftests: drv-net: pp_alloc_fail: lower traffic expectations
  selftests: drv-net: fix linter warnings in pp_alloc_fail
  eth: fbnic: fix reporting of alloc_failed qstats
  selftests: drv-net: xdp: add test for interface level qstats
  selftests: drv-net: xdp: rename netnl to ethnl
  eth: fbnic: fix saving stats from XDP_TX rings on close
  eth: fbnic: fix accounting of XDP packets
  eth: fbnic: fix missing programming of the default descriptor
  selftests: netfilter: query conntrack state to check for port clash resolution
  selftests: netfilter: nft_fib.sh: fix spurious test failures
  bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
  netfilter: nft_objref: validate objref and objrefmap expressions
  net: pse-pd: tps23881: Fix current measurement scaling
  net/mlx5: fix pre-2.40 binutils assembler error
  net/mlx5e: Do not fail PSP init on missing caps
  net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed
  net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables
  net: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
  ...
2025-10-09 11:13:08 -07:00
Jakub Kicinski
5d683e5505 selftests: drv-net: pp_alloc_fail: add necessary optoins to config
Add kernel config for error injection as needed by pp_alloc_fail.py

Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 9da271f825 ("selftests: drv-net-hw: add test for memory allocation failures with page pool")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-10-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09 11:10:02 +02:00
Jakub Kicinski
fbb467f0ed selftests: drv-net: pp_alloc_fail: lower traffic expectations
Lower the expected level of traffic in the pp_alloc_fail test
and calculate failure counter thresholds based on the traffic
rather than using a fixed constant.

We only have "QEMU HW" in NIPA right now, and the test (due to
debug dependencies) only works on debug kernels in the first place.
We need some place for it to pass otherwise it seems to be bit
rotting. So lower the traffic threshold so that it passes on QEMU
and with a debug kernel...

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-9-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09 11:10:02 +02:00
Jakub Kicinski
0be740fb22 selftests: drv-net: fix linter warnings in pp_alloc_fail
Fix linter warnings, it's a bit hard to check for new ones otherwise.

  W0311: Bad indentation. Found 16 spaces, expected 12 (bad-indentation)
  C0114: Missing module docstring (missing-module-docstring)
  W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
  C0116: Missing function or method docstring (missing-function-docstring)

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-8-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09 11:10:02 +02:00
Jakub Kicinski
27ba92560b selftests: drv-net: xdp: add test for interface level qstats
Send a non-trivial number of packets and make sure that they
are counted correctly in qstats. Per qstats specification
XDP is the first layer of the stack so we should see Rx and Tx
counters go up for packets which went thru XDP.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-6-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09 11:10:02 +02:00
Jakub Kicinski
1ad3f62089 selftests: drv-net: xdp: rename netnl to ethnl
Test uses "netnl" for the ethtool family which is quite confusing
(one would expect netdev family would use this name).

No functional changes.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251007232653.2099376-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09 11:10:02 +02:00
Florian Westphal
e84945bdc6 selftests: netfilter: query conntrack state to check for port clash resolution
Jakub reported this self test flaking occasionally (fails, but passes on
re-run) on debug kernels.

This is because the test checks for elapsed time to determine if both
connections were established in parallel.

Rework this to no longer depend on timing.
Use busywait helper to check that both sockets have moved to established
state and then query the conntrack engine for the two entries.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/20250926163318.40d1a502@kernel.org/
Fixes: 117e149e26 ("selftests: netfilter: test nat source port clash resolution interaction with tcp early demux")
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-10-08 13:17:31 +02:00
Florian Westphal
a126ab6b26 selftests: netfilter: nft_fib.sh: fix spurious test failures
Jakub reports spurious failure of nft_fib.sh test.
This is caused by a subtle bug inherited when i moved faulty ping
from one test case to another.

nft_fib.sh not only checks that the fib expression matched, it also
records the number of matches and then validates we have the expected
count.  When I did this it was under the assumption that we would
have 0 to n matching packets.  In case of the failure, the entry has
n+1 matching packets.

This happens because ping_unreachable helper uses "ping -c 1 -w 1",
instead of the intended "-W".  -w alters the meaning of -c (count),
namely, its then treated as number of wanted *replies* instead of
"number of packets to send".

So, in some cases, ping -c 1 -w 1 ends up sending two packets which then
makes the test fail due to the higher-than-expected packet count.

Fix the actual bug (s/-w/-W) and also change the error handling:
1. Show the number of expected packets in the error message
2. Always try to delete the key from the set.
   Else, later test that makes sure we don't have unexpected keys
   in there will always fail as well.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/20250927090709.0b3cd783@kernel.org/
Fixes: 98287045c9 ("selftests: netfilter: move fib vrf test to nft_fib.sh")
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-10-08 13:17:31 +02:00
Yan Zhao
1bcc3f8791 KVM: selftests: Test prefault memory during concurrent memslot removal
Expand the prefault memory selftest to add a regression test for a KVM bug
where KVM's retry logic would result in (breakable) deadlock due to the
memslot deletion waiting on prefaulting to release SRCU, and prefaulting
waiting on the memslot to fully disappear (KVM uses a two-step process to
delete memslots, and KVM x86 retries page faults if a to-be-deleted, a.k.a.
INVALID, memslot is encountered).

To exercise concurrent memslot remove, spawn a second thread to initiate
memslot removal at roughly the same time as prefaulting.  Test memslot
removal for all testcases, i.e. don't limit concurrent removal to only the
success case.  There are essentially three prefault scenarios (so far)
that are of interest:

 1. Success
 2. ENOENT due to no memslot
 3. EAGAIN due to INVALID memslot

For all intents and purposes, #1 and #2 are mutually exclusive, or rather,
easier to test via separate testcases since writing to non-existent memory
is trivial.  But for #3, making it mutually exclusive with #1 _or_ #2 is
actually more complex than testing memslot removal for all scenarios.  The
only requirement to let memslot removal coexist with other scenarios is a
way to guarantee a stable result, e.g. that the "no memslot" test observes
ENOENT, not EAGAIN, for the final checks.

So, rather than make memslot removal mutually exclusive with the ENOENT
scenario, simply restore the memslot and retry prefaulting.  For the "no
memslot" case, KVM_PRE_FAULT_MEMORY should be idempotent, i.e. should
always fail with ENOENT regardless of how many times userspace attempts
prefaulting.

Pass in both the base GPA and the offset (instead of the "full" GPA) so
that the worker can recreate the memslot.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250924174255.2141847-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-07 09:18:22 -07:00
Jakub Kicinski
b615879dbf selftests: drv-net: make linters happy with our imports
Linters are still not very happy with our __init__ files,
which was pointed out in recent review (see Link).

We have previously started importing things one by one to
make linters happy with the test files (which import from __init__).
But __init__ file itself still makes linters unhappy.

To clean it up I believe we must completely remove the wildcard
imports, and assign the imported modules to __all__.

hds.py needs to be fixed because it seems to be importing
the Python standard random from lib.net.

We can't use ksft_pr() / ktap_result() in case importing
from net.lib fails. Linters complain that those helpers
themselves may not have been imported.

Link: https://lore.kernel.org/9d215979-6c6d-4e9b-9cdd-39cff595866e@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20251003164748.860042-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-07 12:12:44 +02:00
Jakub Kicinski
f07f91a360 selftests: net: unify the Makefile formats
We get a significant number of conflicts between net and net-next
because of selftests Makefile changes. People tend to append new
test cases at the end of the Makefile when there's no clear sort
order. Sort all networking selftests Makefiles, use the following
format:

 VAR_NAME := \
	 entry1 \
	 entry2 \
	 entry3 \
 # end of VAR_NAME

Some Makefiles are already pretty close to this.

Acked-by: Antonio Quartulli <antonio@openvpn.net>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251003210127.1021918-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 13:14:06 -07:00
Jakub Kicinski
2aa74c6258 selftests: net: sort configs
Sort config files for networking selftests. This should help us
avoid merge conflicts between net and net-next. patchwork check
will be added to prevent new issues.

Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20251003205736.1019673-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 13:12:59 -07:00
Linus Torvalds
256e341706 Generic:
* Rework almost all of KVM's exports to expose symbols only to KVM's x86
   vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
 
 x86:
 
 * Rework almost all of KVM x86's exports to expose symbols only to KVM's
   vendor modules, i.e. to kvm-{amd,intel}.ko.
 
 * Add support for virtualizing Control-flow Enforcement Technology (CET) on
   Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
   It's worth noting that while SHSTK and IBT can be enabled separately in CPUID,
   it is not really possible to virtualize them separately.  Therefore, Intel
   processors will really allow both SHSTK and IBT under the hood if either is
   made visible in the guest's CPUID.  The alternative would be to intercept
   XSAVES/XRSTORS, which is not feasible for performance reasons.
 
 * Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when
   completing userspace I/O.  KVM has already committed to allowing L2 to
   to perform I/O at that point.
 
 * Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.
 
 * Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
 
 * Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).
 
 * Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.
 
 * Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.
 
 * Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits
   needed to faithfully emulate an I/O APIC for such guests.
 
 * Many cleanups and minor fixes.
 
 * Recover possible NX huge pages within the TDP MMU under read lock to
   reduce guest jitter when restoring NX huge pages.
 
 * Return -EAGAIN during prefault if userspace concurrently deletes/moves the
   relevant memslot, to fix an issue where prefaulting could deadlock with the
   memslot update.
 
 x86 (AMD):
 
 * Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported.
 
 * Require a minimum GHCB version of 2 when starting SEV-SNP guests via
   KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
   instead of latent guest failures.
 
 * Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
   unauthorized CPU accesses from reading the ciphertext of SNP guest private
   memory, e.g. to attempt an offline attack.  This feature splits the shared
   SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests,
   therefore a new module parameter is needed to control the number of ASIDs
   that can be used for VMs with CipherText Hiding vs. how many can be used to
   run SEV-ES guests.
 
 * Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
   host from tampering with the guest's TSC frequency, while still allowing the
   the VMM to configure the guest's TSC frequency prior to launch.
 
 * Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs
   resulting from bogus XCR0 values.
 
 * Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
   avoid leaving behind stale state (thankfully not consumed in KVM).
 
 * Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
   instead of subtly relying on guest_memfd to deal with them.
 
 * Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
   desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's
   TSC_AUX in the host MSR until return to userspace.
 
 KVM (Intel):
 
 * Preparation for FRED support.
 
 * Don't retry in TDX's anti-zero-step mitigation if the target memslot is
   invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
   to the aforementioned prefaulting case.
 
 * Misc bugfixes and minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjjx/0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMLFwf9HXZdqBn6VvkbSL/HIGdNG1BEzeJ0
 MQVEMMdmWJ72JtI6soJ6oN5NWTIJJeMTPuCgRrNxFbIivSdm9vYPTSCNwNBhKb+H
 FEsr62a9T4XgnTqy20h+yZJiKNvwtaggdTWFnUAUqsBSFkEtksAP72odvZx+GNv/
 cndqtxy/84TcJ4ZXFdxElylCcQ9xRoRkqkU8KaVfg88wqMIMbSR3OBSH/g8bqR+3
 cjvDGNC7TPHPEN2Wmq2AYluRlBxB2ZhsOauArsdidPXHAevO+AFnbS27fz6bixZK
 LTS/qwKOsvhFzyHngemuG6s6HgkgBEshfcKk5i7d2ReRjaGP4EvkhmlImA==
 =k49c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "Generic:

   - Rework almost all of KVM's exports to expose symbols only to KVM's
     x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko

  x86:

   - Rework almost all of KVM x86's exports to expose symbols only to
     KVM's vendor modules, i.e. to kvm-{amd,intel}.ko

   - Add support for virtualizing Control-flow Enforcement Technology
     (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
     (Shadow Stacks).

     It is worth noting that while SHSTK and IBT can be enabled
     separately in CPUID, it is not really possible to virtualize them
     separately. Therefore, Intel processors will really allow both
     SHSTK and IBT under the hood if either is made visible in the
     guest's CPUID. The alternative would be to intercept
     XSAVES/XRSTORS, which is not feasible for performance reasons

   - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
     when completing userspace I/O. KVM has already committed to
     allowing L2 to to perform I/O at that point

   - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
     MSR is supposed to exist for v2 PMUs

   - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs

   - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
     emulator support (KVM should never need to emulate the MSRs outside
     of forced emulation and other contrived testing scenarios)

   - Clean up the MSR APIs in preparation for CET and FRED
     virtualization, as well as mediated vPMU support

   - Clean up a pile of PMU code in anticipation of adding support for
     mediated vPMUs

   - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
     vmexits needed to faithfully emulate an I/O APIC for such guests

   - Many cleanups and minor fixes

   - Recover possible NX huge pages within the TDP MMU under read lock
     to reduce guest jitter when restoring NX huge pages

   - Return -EAGAIN during prefault if userspace concurrently
     deletes/moves the relevant memslot, to fix an issue where
     prefaulting could deadlock with the memslot update

  x86 (AMD):

   - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
     supported

   - Require a minimum GHCB version of 2 when starting SEV-SNP guests
     via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
     errors instead of latent guest failures

   - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
     prevents unauthorized CPU accesses from reading the ciphertext of
     SNP guest private memory, e.g. to attempt an offline attack. This
     feature splits the shared SEV-ES/SEV-SNP ASID space into separate
     ranges for SEV-ES and SEV-SNP guests, therefore a new module
     parameter is needed to control the number of ASIDs that can be used
     for VMs with CipherText Hiding vs. how many can be used to run
     SEV-ES guests

   - Add support for Secure TSC for SEV-SNP guests, which prevents the
     untrusted host from tampering with the guest's TSC frequency, while
     still allowing the the VMM to configure the guest's TSC frequency
     prior to launch

   - Validate the XCR0 provided by the guest (via the GHCB) to avoid
     bugs resulting from bogus XCR0 values

   - Save an SEV guest's policy if and only if LAUNCH_START fully
     succeeds to avoid leaving behind stale state (thankfully not
     consumed in KVM)

   - Explicitly reject non-positive effective lengths during SNP's
     LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
     them

   - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
     host's desired TSC_AUX, to fix a bug where KVM was keeping a
     different vCPU's TSC_AUX in the host MSR until return to userspace

  KVM (Intel):

   - Preparation for FRED support

   - Don't retry in TDX's anti-zero-step mitigation if the target
     memslot is invalid, i.e. is being deleted or moved, to fix a
     deadlock scenario similar to the aforementioned prefaulting case

   - Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: x86: Export KVM-internal symbols for sub-modules only
  KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
  KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
  KVM: Export KVM-internal symbols for sub-modules only
  KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
  KVM: VMX: Make CR4.CET a guest owned bit
  KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: SVM: Enable shadow stack virtualization for SVM
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  ...
2025-10-06 12:37:34 -07:00
Linus Torvalds
ba9dac9873 libnvdimm for 6.18
nvdimm:
         - ndtest: Return -ENOMEM if devm_kcalloc() fails in ndtest_probe()
         - Clean up __nd_ioctl() and remove gotos
                 - Remove duplicate linux/slab.h header
         - Introduce guard() for nvdimm_bus_lock
         - Use str_plural() to simplify the code
 
 ACPI:
         - NFIT: Fix incorrect ndr_desc being reportedin dev_err message
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCaOPt+RQcaXJhLndlaW55
 QGludGVsLmNvbQAKCRCebuN7TNx1MS7qAQDG+AJt2PalEoCxRRUURdvWOHpUHYho
 V8/tGmkUo0HWAQEA4Ar9W2rXjO2pmlSu9ocRWona3FqVy/LOBj6pBQHVqg8=
 =Rx3m
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Ira Weiny:
 "Primarily bug fixes. Dave introduced the usage of cleanup.h a bit late
  in the cycle to help with the new label work required within CXL [1]

  nvdimm:
   - Return -ENOMEM if devm_kcalloc() fails in ndtest_probe()
   - Clean up __nd_ioctl() and remove gotos
   - Remove duplicate linux/slab.h header
   - Introduce guard() for nvdimm_bus_lock
   - Use str_plural() to simplify the code

  ACPI:
   - NFIT: Fix incorrect ndr_desc being reportedin dev_err message"

Link: https://lore.kernel.org/all/20250917134116.1623730-1-s.neeraj@samsung.com/ [1]

* tag 'libnvdimm-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm: Remove duplicate linux/slab.h header
  nvdimm: ndtest: Return -ENOMEM if devm_kcalloc() fails in ndtest_probe()
  nvdimm: Clean up __nd_ioctl() and remove gotos
  nvdimm: Introduce guard() for nvdimm_bus_lock
  ACPI: NFIT: Fix incorrect ndr_desc being reportedin dev_err message
  nvdimm: Use str_plural() to simplify the code
2025-10-06 11:17:18 -07:00
Sidharth Seela
7fc25c5a5a selftest: net: ovpn: Fix uninit return values
Fix functions that return undefined values. These issues were caught by
running clang using LLVM=1 option.

Clang warnings are as follows:
ovpn-cli.c:1587:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
 1587 |         if (!sock) {
      |             ^~~~~
ovpn-cli.c:1635:9: note: uninitialized use occurs here
 1635 |         return ret;
      |                ^~~
ovpn-cli.c:1587:2: note: remove the 'if' if its condition is always false
 1587 |         if (!sock) {
      |         ^~~~~~~~~~~~
 1588 |                 fprintf(stderr, "cannot allocate netlink socket\n");
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1589 |                 goto err_free;
      |                 ~~~~~~~~~~~~~~
 1590 |         }
      |         ~
ovpn-cli.c:1584:15: note: initialize the variable 'ret' to silence this warning
 1584 |         int mcid, ret;
      |                      ^
      |                       = 0
ovpn-cli.c:2107:7: warning: variable 'ret' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
 2107 |         case CMD_INVALID:
      |              ^~~~~~~~~~~
ovpn-cli.c:2111:9: note: uninitialized use occurs here
 2111 |         return ret;
      |                ^~~
ovpn-cli.c:1939:12: note: initialize the variable 'ret' to silence this warning
 1939 |         int n, ret;
      |                   ^
      |

Fixes: 959bc330a4 ("testing/selftests: add test tool and scripts for ovpn module")
Signed-off-by: Sidharth Seela <sidharthseela@gmail.com>
Link: https://patch.msgid.link/20251001123107.96244-2-sidharthseela@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 11:14:07 -07:00
Linus Torvalds
2f2c725493 pci-v6.18-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmjgOAkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxzlA//QxoUF4p1cN7+rPwuzCPNi2ZmKNyU
 T7mLfUciV/t8nPLPFdtxdttHB3F+BsA/E9WYFiUUGBzvdYafnoZ/Qnio1WdMIIYz
 0eVrTpnMUMBXrUwGFnnIER3b4GCJb2WR3RPfaBrbqQRHoAlDmv/ijh7rIKhgWIeR
 NsCmPiFnsxPjgVusn2jXWLheUHEbZh2dVTk9lceQXFRdrUELC9wH7zigAA6GviGO
 ssPC1pKfg5DrtuuM6k9JCcEYibQIlynxZ8sbT6YfQ2bs1uSEd2pEcr7AORb4l2yQ
 rcirHwGTpvZ/QvzKpDY8FcuzPFRP7QPd+34zMEQ2OW04y1k61iKE/4EE2Z9w/OoW
 esFQXbevy9P5JHu6DBcaJ2uwvnLiVesry+9CmkKCc6Dxyjbcbgeta1LR5dhn1Rv0
 dMtRnkd/pxzIF5cRnu+WlOFV2aAw2gKL9pGuimH5TO4xL2qCZKak0hh8PAjUN2c/
 12GAlrwAyBK1FeY2ZflTN7Vr8o2O0I6I6NeaF3sCW1VO2e6E9/bAIhrduUO4lhGq
 BHTVRBefFRtbFVaxTlUAj+lSCyqES3Wzm8y/uLQvT6M3opunTziSDff1aWbm1Y2t
 aASl1IByuKsGID8VrT5khHeBKSWtnd/v7LLUjCeq+g6eKdfN2arInPvw5X1NpVMj
 tzzBYqwHgBoA4u8=
 =BUw/
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
     take config space accessor functions.

     Implement pci_find_capability(), pci_find_ext_capability(), and
     dwc, dwc endpoint, and cadence capability search interfaces with
     them (Hans Zhang)

   - Leave parent unit address 0 in 'interrupt-map' so that when we
     build devicetree nodes to describe PCI functions that contain
     multiple peripherals, we can build this property even when
     interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)

   - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
     Request Size to 128B to avoid a performance issue (Ilpo Järvinen)

   - Add sysfs 'serial_number' file to expose the Device Serial Number
     (Matthew Wood)

   - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

  Resource management:

   - Align m68k pcibios_enable_device() with other arches (Ilpo
     Järvinen)

   - Remove sparc pcibios_enable_device() implementations that don't do
     anything beyond what pci_enable_resources() does (Ilpo Järvinen)

   - Remove mips pcibios_enable_resources() and use
     pci_enable_resources() instead (Ilpo Järvinen)

   - Clean up bridge window sizing and assignment (Ilpo Järvinen),
     including:

       - Leave non-claimed bridge windows disabled

       - Enable bridges even if a window wasn't assigned because not all
         windows are required by downstream devices

       - Preserve bridge window type when releasing the resource, since
         the type is needed for reassignment

       - Consolidate selection of bridge windows into two new
         interfaces, pbus_select_window() and
         pbus_select_window_for_type(), so this is done consistently

       - Compute bridge window start and end earlier to avoid logging
         stale information

  MSI:

   - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
     Vives)

  Error handling:

   - Align AER with EEH by allowing drivers to request a Bus Reset on
     Non-Fatal Errors (in addition to the reset on Fatal Errors that we
     already do) (Lukas Wunner)

   - If error recovery fails, emit FAILED_RECOVERY uevents for the
     devices, not for the bridge leading to them.

     This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)

   - Align AER with EEH by calling err_handler.error_detected()
     callbacks to notify drivers if error recovery fails (Lukas Wunner)

   - Align AER with EEH by restoring device error_state to
     pci_channel_io_normal before the err_handler.slot_reset() callback.

     This is earlier than before the err_handler.resume() callback
     (Lukas Wunner)

   - Emit a BEGIN_RECOVERY uevent when driver's
     err_handler.error_detected() requests a reset, as well as when it
     says recovery is complete or can be done without a reset (Niklas
     Schnelle)

   - Align s390 with AER and EEH by emitting uevents during error
     recovery (Niklas Schnelle)

   - Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
     SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
     result of err_handler.error_detected() (Niklas Schnelle)

   - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
     error information identifies a device without an AER Capability
     (Breno Leitao)

   - Update error decoding and TLP Log printing for new errors in
     current PCIe base spec (Lukas Wunner)

   - Update error recovery documentation to match the current code
     and use consistent nomenclature (Lukas Wunner)

  ASPM:

   - Enable all ClockPM and ASPM states for devicetree platforms, since
     there's typically no firmware that enables ASPM

     This is a risky change that may uncover hardware or configuration
     defects at boot-time rather than when users enable ASPM via sysfs
     later. Booting with "pcie_aspm=off" prevents this enabling
     (Manivannan Sadhasivam)

   - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

  Power management:

   - If a device has already been disconnected, e.g., by a hotplug
     removal, don't bother trying to resume it to D0 when detaching the
     driver.

     This avoids annoying "Unable to change power state from D3cold to
     D0" messages (Mario Limonciello)

   - Ensure devices are powered up before config reads for
     'max_link_width', 'current_link_speed', 'current_link_width',
     'secondary_bus_number', and 'subordinate_bus_number' sysfs files.

     This prevents using invalid data (~0) in drivers or lspci and,
     depending on how the PCIe controller reports errors, may avoid
     error interrupts or crashes (Brian Norris)

  Virtualization:

   - Add rescan/remove locking when enabling/disabling SR-IOV, which
     avoids list corruption on s390, where disabling SR-IOV also
     generates hotplug events (Niklas Schnelle)

  Peer-to-peer DMA:

   - Free struct p2p_pgmap, not a member within it, in the
     pci_p2pdma_add_resource() error path (Sungho Kim)

  Endpoint framework:

   - Document sysfs interface for BAR assignment of vNTB endpoint
     functions (Jerome Brunet)

   - Fix array underflow in endpoint BAR test case (Dan Carpenter)

   - Skip endpoint IRQ test if the IRQ is out of range to avoid false
     errors (Christian Bruel)

   - Fix endpoint test case for controllers with fixed-size BARs smaller
     than requested by the test (Marek Vasut)

   - Restore inbound translation when disabling doorbell so the endpoint
     doorbell test case can be run more than once (Niklas Cassel)

   - Avoid a NULL pointer dereference when releasing DMA channels in
     endpoint DMA test case (Shin'ichiro Kawasaki)

   - Convert tegra194 interrupt number to MSI vector to fix endpoint
     Kselftest MSI_TEST test case (Niklas Cassel)

   - Reset tegra194 BARs when running in endpoint mode so the BAR tests
     don't overwrite the ATU settings in BAR4 (Niklas Cassel)

   - Handle errors in tegra194 BPMP transactions so we don't mistakenly
     skip future PERST# assertion (Vidya Sagar)

  AMD MDB PCIe controller driver:

   - Update DT binding example to separate PERST# to a Root Port stanza
     to make multiple Root Ports possible in the future (Sai Krishna
     Musham)

   - Add driver support for PERST# being described in a Root Port
     stanza, falling back to the host bridge if not found there (Sai
     Krishna Musham)

  Freescale i.MX6 PCIe controller driver:

   - Enable the 3.3V Vaux supply if available so devices can request
     wakeup with either Beacon or WAKE# (Richard Zhu)

  MediaTek PCIe Gen3 controller driver:

   - Add optional sys clock ready time setting to avoid sys_clk_rdy
     signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)

   - Add DT binding and driver support for MT6991 and MT8196
     (AngeloGioacchino Del Regno)

  NVIDIA Tegra PCIe controller driver:

   - When asserting PERST#, disable the controller instead of mistakenly
     disabling the PLL twice (Nagarjuna Kristam)

   - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  Qualcomm PCIe controller driver:

   - Select PCI Power Control Slot driver so slot voltage rails can be
     turned on/off if described in Root Port devicetree node (Qiang Yu)

   - Parse only PCI bridge child nodes in devicetree, skipping unrelated
     nodes such as OPP (Operating Performance Points), which caused
     probe failures (Krishna Chaitanya Chundru)

   - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)

   - Consolidate Root Port 'phy' and 'reset' properties in struct
     qcom_pcie_port, regardless of whether we got them from the Root
     Port node or the host bridge node (Manivannan Sadhasivam)

   - Fetch and map the ELBI register space in the DWC core rather than
     in each driver individually (Krishna Chaitanya Chundru)

   - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
     Shift Feature' and use this in the qcom driver (Krishna Chaitanya
     Chundru)

   - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
     Chundru)

   - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
     Glymur, which is compatible with X1E80100 but doesn't have the
     cnoc_sf_axi clock (Qiang Yu)

  Renesas R-Car PCIe controller driver:

   - Fix a typo that prevented correct PHY initialization (Marek Vasut)

   - Add a missing 1ms delay after PWR reset assertion as required by
     the V4H manual (Marek Vasut)

   - Assure reset has completed before DBI access to avoid SError (Marek
     Vasut)

   - Fix inverted PHY initialization check, which sometimes led to
     timeouts and failure to start the controller (Marek Vasut)

   - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
     regression when converting to msi_create_parent_irq_domain()
     (Claudiu Beznea)

   - Drop the spinlock protecting the PMSR register - it's no longer
     required since pci_lock already serializes accesses (Marek Vasut)

   - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  SOPHGO PCIe controller driver:

   - Check for existence of struct cdns_pcie.ops before using it to
     allow Cadence drivers that don't need to supply ops (Chen Wang)

   - Add DT binding and driver for the SOPHGO SG2042 PCIe controller
     (Chen Wang)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Update pinctrl documentation of initial states and use in runtime
     suspend/resume (Christian Bruel)

   - Add pinctrl_pm_select_init_state() for use by stm32 driver, which
     needs it during resume (Christian Bruel)

   - Add devicetree bindings and drivers for the STMicroelectronics
     STM32MP25 in host and endpoint modes (Christian Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Add support for x16 in devicetree 'num-lanes' property (Konrad
     Dybcio)

   - Verify that if DT specifies a single IRQ for all eDMA channels, it
     is named 'dma' (Niklas Cassel)

  TI J721E PCIe driver:

   - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
     Vadapalli)

   - Power controller off before configuring the glue layer so the
     controller latches the correct values on power-on (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
     exits with error (Siddharth Vadapalli)

   - Add Peripheral Virtualization Unit (PVU), which restricts DMA from
     PCIe devices to specific regions of host memory, to the ti,am65
     binding (Jan Kiszka)

  Xilinx NWL PCIe controller driver:

   - Clear bootloader E_ECAM_CONTROL before merging in the new driver
     value to avoid writing invalid values (Jani Nurminen)"

* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
  PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
  dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
  PCI: stm32: Add PCIe host support for STM32MP25
  PCI: xilinx-nwl: Fix ECAM programming
  PCI: j721e: Fix incorrect error message in probe()
  PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
  dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
  PCI: dwc: Support 16-lane operation
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
  PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
  PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
  PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
  PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  PCI: rcar-gen4: Assure reset occurs before DBI access
  PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
  PCI: Set up bridge resources earlier
  PCI: rcar-host: Drop PMSR spinlock
  ...
2025-10-06 10:41:03 -07:00
Linus Torvalds
6093a688a0 Char/Misc/IIO/Binder changes for 6.18-rc1
Here is the big set of char/misc/iio and other driver subsystem changes
 for 6.18-rc1.  Loads of different stuff in here, it was a busy
 development cycle in lots of different subsystems, with over 27k new
 lines added to the tree.  Included in here are:
   - IIO updates including new drivers, reworking of existing apis, and
     other goodness in the sensor subsystems
   - MEI driver updates and additions
   - NVMEM driver updates
   - slimbus removal for an unused driver and some other minor
     updates
   - coresight driver updates and additions
   - MHI driver updates
   - comedi driver updates and fixes
   - extcon driver updates
   - interconnect driver additions
   - eeprom driver updates and fixes
   - minor UIO driver updates
   - tiny W1 driver updates
 
 But the majority of new code is in the rust bindings and additions,
 which includes:
   - misc driver rust binding updates for read/write support, we can now
     write "normal" misc drivers in rust fully, and the sample driver
     shows how this can be done.
   - Initial framework for USB driver rust bindings, which are disabled
     for now in the build, due to limited support, but coming in through
     this tree due to dependencies on other rust binding changes that
     were in here.  I'll be enabling these back on in the build in the
     usb.git tree after -rc1 is out so that developers can continue to
     work on these in linux-next over the next development cycle.
   - Android Binder driver implemented in Rust.  This is the big one, and
     was driving a huge majority of the rust binding work over the past
     years.  Right now there are 2 binder drivers in the kernel, selected
     only at build time as to which one to use as binder wants to be
     included in the system at boot time.  The binder C maintainers all
     agreed on this, as eventually, they want the C code to be removed from
     the tree, but it will take a few releases to get there while both
     are maintained to ensure that the rust implementation is fully
     stable and compliant with the existing userspace apis.
 
 All of these have been in linux-next for a while, with only minor merge
 issues showing up (you will hit them as well.)  Just accept both sides
 of the merge, it's just some header and include file lines, nothing
 major.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaOEffA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynI/wCgjLFWH9B+huZI5JQb06NShggZod4AnjFFJ4ID
 macHNv5/SjpAh7H5ssBU
 =cjWS
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull Char/Misc/IIO/Binder updates from Greg KH:
 "Here is the big set of char/misc/iio and other driver subsystem
  changes for 6.18-rc1.

  Loads of different stuff in here, it was a busy development cycle in
  lots of different subsystems, with over 27k new lines added to the
  tree.

  Included in here are:

   - IIO updates including new drivers, reworking of existing apis, and
     other goodness in the sensor subsystems

   - MEI driver updates and additions

   - NVMEM driver updates

   - slimbus removal for an unused driver and some other minor updates

   - coresight driver updates and additions

   - MHI driver updates

   - comedi driver updates and fixes

   - extcon driver updates

   - interconnect driver additions

   - eeprom driver updates and fixes

   - minor UIO driver updates

   - tiny W1 driver updates

  But the majority of new code is in the rust bindings and additions,
  which includes:

   - misc driver rust binding updates for read/write support, we can now
     write "normal" misc drivers in rust fully, and the sample driver
     shows how this can be done.

   - Initial framework for USB driver rust bindings, which are disabled
     for now in the build, due to limited support, but coming in through
     this tree due to dependencies on other rust binding changes that
     were in here. I'll be enabling these back on in the build in the
     usb.git tree after -rc1 is out so that developers can continue to
     work on these in linux-next over the next development cycle.

   - Android Binder driver implemented in Rust.

     This is the big one, and was driving a huge majority of the rust
     binding work over the past years. Right now there are two binder
     drivers in the kernel, selected only at build time as to which one
     to use as binder wants to be included in the system at boot time.

     The binder C maintainers all agreed on this, as eventually, they
     want the C code to be removed from the tree, but it will take a few
     releases to get there while both are maintained to ensure that the
     rust implementation is fully stable and compliant with the existing
     userspace apis.

  All of these have been in linux-next for a while"

* tag 'char-misc-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (320 commits)
  rust: usb: keep usb::Device private for now
  rust: usb: don't retain device context for the interface parent
  USB: disable rust bindings from the build for now
  samples: rust: add a USB driver sample
  rust: usb: add basic USB abstractions
  coresight: Add label sysfs node support
  dt-bindings: arm: Add label in the coresight components
  coresight: tnoc: add new AMBA ID to support Trace Noc V2
  coresight: Fix incorrect handling for return value of devm_kzalloc
  coresight: tpda: fix the logic to setup the element size
  coresight: trbe: Return NULL pointer for allocation failures
  coresight: Refactor runtime PM
  coresight: Make clock sequence consistent
  coresight: Refactor driver data allocation
  coresight: Consolidate clock enabling
  coresight: Avoid enable programming clock duplicately
  coresight: Appropriately disable trace bus clocks
  coresight: Appropriately disable programming clocks
  coresight: etm4x: Support atclk
  coresight: catu: Support atclk
  ...
2025-10-04 16:26:32 -07:00
Linus Torvalds
54ba6d9b13 hid-for-linus-2025093001
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEoEVH9lhNrxiMPSyI7MXwXhnZSjYFAmjb8vETHGJlbnRpc3NA
 a2VybmVsLm9yZwAKCRDsxfBeGdlKNhMhD/42hnNxVA+pRYUdwgX+5fGPkKpM6ayS
 z627qOGgFsrwidipRmx68VZpHnKTpZMKyqxYQYEQ0SExVviXaIohQGu04OF+N+En
 3fo9zHJiXxnlxVv9R+7LiV/S74yN1nLM+nURnvG7NjIqOeLfxO0NnNdh+MIGmYzv
 OdUdAO7p84F5aadKPjyFTHbO7Y2bAT2XT0xTJR+SZ7MnP8rWRbN3l/BfrkTL+D3X
 Bhg9Xdgoqst6f1Be5m6oYthHaXbs3yjqGYMRkhFRKlEsd/ArBRudyMzqAQ602WMl
 gHMcHwpc5ztJTIOBZU6QzBAbJBozrPWzr6Lvih3jE2HZBs7B2mw0NcnE4+7xkaYs
 llNcrwlBTRuS0qwvrJcNvstV7z9toCx4PjVCYiMhTAdkq1m3J1KqCAPA5OvIELLq
 xXjHmmNRYe9bvAik4cuVxeuw5azvgxYd/lFJx7OmENTw/pczXot5N3+bAqdl+4ka
 yFF+yUvpYTdrYXQmj74FxJODlErHUSRxNtFeAk6nJ1HLZB/lcl6XF9YnCqpE4TVe
 WPzuzrpoJvOeUvqBHtXYQ2/g8argaOAQu62NpfjF8/OA1nBF4Jg/n/lRGy4Mmmec
 ar7w4+3y72KSVmLJ7F775Kakrod/gT8tA1RPtHGFFXbayGqw6Cz9E6E0P0q08n0y
 v0CUWASm2YIEWA==
 =8Nwa
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2025093001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID updates from Benjamin Tissoires:

 - haptic touchpad support (Angela Czubak and Jonathan Denose)

 - support for audio jack handling on DualSense Playstation controllers
   (Cristian Ciocaltea)

 - allow HID-BPF to rebind a driver to hid-multitouch (Benjamin
   Tissoires)

 - rework hidraw ioctls to make them safer (and tested) (Benjamin
   Tissoires)

 - various PIDFF and universal-PIDFF fixes/improvements (Tomasz Pakuła)

 - better configuration of Intel QuickI2C through ACPI (Xinpeng Sun)

 - other assorted cleanups and fixes

* tag 'hid-for-linus-2025093001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (58 commits)
  HID: playstation: Switch to scoped_guard() in {dualsense|dualshock4}_output_worker()
  HID: playstation: Silence sparse warnings for locking context imbalances
  HID: playstation: Update SP preamp gain comment line
  HID: intel-thc-hid: intel-quicki2c: support ACPI config for advanced features
  HID: core: Change hid_driver to use a const char* for name
  HID: hidraw: tighten ioctl command parsing
  selftests/hid: hidraw: forge wrong ioctls and tests them
  selftests/hid: hidraw: add more coverage for hidraw ioctls
  selftests/hid: update vmtest.sh for virtme-ng
  HID: playstation: Support DualSense audio jack event reporting
  HID: playstation: Support DualSense audio jack hotplug detection
  HID: playstation: Redefine DualSense input report status field
  HID: playstation: Prefer kzalloc(sizeof(*buf)...)
  HID: playstation: Document spinlock_t usage
  HID: playstation: Fix all alignment and line length issues
  HID: playstation: Correct spelling in comment sections
  HID: playstation: Replace uint{32,16,8}_t with u{32,16,8}
  HID: playstation: Simplify locking with guard() and scoped_guard()
  HID: playstation: Add spaces around arithmetic operators
  HID: playstation: Make use of bitfield macros
  ...
2025-10-04 15:38:04 -07:00
Linus Torvalds
d104e3d17f CXL changes for v6.18
Misc changes:
 - Use str_plural() instead of open code for emitting strings.
 - Use str_enabled_disabled() instead of ternary operator
 - Fix emit of type resource_size_t argument for validate_region_offset()
 - Typo fixup in CXL driver-api documentation
 - Rename CFMWS coherency restriction defines
 - Add convention doc describe dealing with x86 low memory hole and CXL
 
 Poison Inject support series:
 - Move hpa_to_spa callback to new reoot decoder ops structure
 - Define a SPA to HPA callback for interleave calculation with XOR math
 - Add support for SPA to DPA address translation with XOR
 - Add locked variants of poison inject and clear functions
 - Add inject and clear poison support by region offset
 
 CXL access coordinates update fix series:
 - A comment update for hotplug memory callback prority defines
 - Add node_update_perf_attrs() for updating perf attrs on a node
 - Update cxl_access_coordinates() to use the new node update function
 - Remove hmat_update_target_coordinates() and related code
 
 CXL delayed downstream port enumeration and initialization series
 - Add helper to detect top of CXL device topology and remove open coding
 - Add helper to delete single dport
 - Add a cached copy of target_map to cxl_decoder
 - Refactor decoder setup to reduce cxl_test burden
 - Defer dport allocation for switch ports
 - Add mock version of devm_cxl_add_dport_by_dev() for cxl_test
 - Adjust the mock version of devm_cxl_switch_port_decoders_setup() due to
   cxl core usage
 - Setup target_map for cxl_test decoder initialization
 - Change SSLBIS handler to handle single dport
 - Move port register setup to when first dport appears
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmjdqxAACgkQYGjFFmlT
 OEoayhAAqW6nPPM2XNiNigGp5oxQTt0GiblhS/PDAq+VHZaFQtrM6lvbrvqj1Gus
 g49ID4SkKVq2SKZlGVk5xcPE2BEeKp4YOF6mmAqKNy6geeG0mVXf/gNbhd/8pnpm
 zmbX9FdAR2x4ZimPzBZZO0vlm5NG61sVHWyz1VcU9rQUpB8shSQF3QIoKypq1MpU
 G7PgN92Pc8Ztr1cI9RSFXV6p5Bd26IMt7Bi3Wub5z4rtnQAFzhtQ5oFpen6Dc4Gj
 py+BwY9x25HsVCWD6oQIFvDfH5iiZfSbL62h2ttbalkqM0dFJedKmHq1rNMpsV/4
 mNY2COr2uTBOB7Zht10+Q46pAAYdBTVKFIhRAEUidnCmzF8PPEAEYISo4vE5Oqih
 lUJYhU8tREacLJ9jR4ro0NwgM43mESX4Aj84CV+BtPA2SyI2qsqY8xCHXyyiaLsn
 GUGSbVXRbhtAWs+gM8ciERx9U/AE+yV8oABaz/zeUO5RMSB2ho6y9XD6PFHfp5Fb
 w+Ud6CNkk00HR8Api38zfHJeMOR+GzsaebZgW8pOcfucC6dxS1rhUa3iN0ifLMIC
 QdRSemwBjbPWJ21JwHxJCGVv/OUocziTg9H5ydZfaxOoXjIKYZcKo5ePUe4KH7bi
 2tNjlA8BCycBiwaUMIMEHcIZNNm2GGddeN6TwP8QLVAevj3Hkl0=
 =VYmp
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL updates from Dave Jiang:
 "The changes include adding poison injection support, fixing CXL access
  coordinates when onlining CXL memory, and delaing the enumeration of
  downstream switch ports for CXL hierarchy to ensure that the CXL link
  is established at the time of enumeration to address a few issues
  observed on AMD and Intel platforms.

  Misc changes:
   - Use str_plural() instead of open code for emitting strings.
   - Use str_enabled_disabled() instead of ternary operator
   - Fix emit of type resource_size_t argument for
     validate_region_offset()
   - Typo fixup in CXL driver-api documentation
   - Rename CFMWS coherency restriction defines
   - Add convention doc describe dealing with x86 low memory hole
     and CXL

  Poison Inject support:
   - Move hpa_to_spa callback to new reoot decoder ops structure
   - Define a SPA to HPA callback for interleave calculation with
     XOR math
   - Add support for SPA to DPA address translation with XOR
   - Add locked variants of poison inject and clear functions
   - Add inject and clear poison support by region offset

  CXL access coordinates update fix:
   - A comment update for hotplug memory callback prority defines
   - Add node_update_perf_attrs() for updating perf attrs on a node
   - Update cxl_access_coordinates() to use the new node update function
   - Remove hmat_update_target_coordinates() and related code

  CXL delayed downstream port enumeration and initialization:
   - Add helper to detect top of CXL device topology and remove
     open coding
   - Add helper to delete single dport
   - Add a cached copy of target_map to cxl_decoder
   - Refactor decoder setup to reduce cxl_test burden
   - Defer dport allocation for switch ports
   - Add mock version of devm_cxl_add_dport_by_dev() for cxl_test
   - Adjust the mock version of devm_cxl_switch_port_decoders_setup()
     due to cxl core usage
   - Setup target_map for cxl_test decoder initialization
   - Change SSLBIS handler to handle single dport
   - Move port register setup to when first dport appears"

* tag 'cxl-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (25 commits)
  cxl: Move port register setup to when first dport appear
  cxl: Change sslbis handler to only handle single dport
  cxl/test: Setup target_map for cxl_test decoder initialization
  cxl/test: Adjust the mock version of devm_cxl_switch_port_decoders_setup()
  cxl/test: Add mock version of devm_cxl_add_dport_by_dev()
  cxl: Defer dport allocation for switch ports
  cxl/test: Refactor decoder setup to reduce cxl_test burden
  cxl: Add a cached copy of target_map to cxl_decoder
  cxl: Add helper to delete dport
  cxl: Add helper to detect top of CXL device topology
  cxl: Documentation/driver-api/cxl: Describe the x86 Low Memory Hole solution
  cxl/acpi: Rename CFMW coherency restrictions
  Documentation/driver-api: Fix typo error in cxl
  acpi/hmat: Remove now unused hmat_update_target_coordinates()
  cxl, acpi/hmat: Update CXL access coordinates directly instead of through HMAT
  drivers/base/node: Add a helper function node_update_perf_attrs()
  mm/memory_hotplug: Update comment for hotplug memory callback priorities
  cxl: Fix emit of type resource_size_t argument for validate_region_offset()
  cxl/region: Add inject and clear poison by region offset
  cxl/core: Add locked variants of the poison inject and clear funcs
  ...
2025-10-04 12:02:50 -07:00
Linus Torvalds
67da125e30 RCU pull request for v6.18
This pull request contains the following branches, non-octopus merged:
 
 Documentation updates:
 
   - Update whatisRCU.rst and checklist.rst for recent RCU API additions.
 
   - Fix RCU documentation formatting and typos.
 
   - Replace dead Ottawa Linux Symposium links in RTFP.txt.
 
 Miscellaneous RCU updates:
 
   - Document that rcu_barrier() hurries RCU_LAZY callbacks.
 
   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler().
 
   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly.
 
   - Make initial set of changes to accommodate upcoming system_percpu_wq
     changes.
 
 SRCU updates:
 
   - Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
     including adding guards.
 
   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast().
 
   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers.
 
   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed().
 
 Torture-test updates:
 
   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary.
 
   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels without
     doing non-debug kernels.
 
   - Fix a number of false-positive diagnostics that were being triggered
     by rcutorture starting before boot completed.  Running multiple
     near-CPU-bound rcutorture processes when there is only the boot CPU
     is after all a bit excessive.
 
   - Substitute kcalloc() for kzalloc().
 
   - Remove a redundant kfree() and NULL out kfree()ed objects.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmjWzFcTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jAUeD/4xp/3rvVlfX6UB1ax/lYbQopOm2Hns
 7DVO/lp8ih6jUFCRappw7do+jbU3EcP+76sGUd5qkKlRbJIierTHBrolULpKph/p
 hQ/LddPYIHg+bPtbq/vA6fFwFI/xwbBNQOMS1bWzzU5iWn/p/ETS1kWptz+g2wk5
 pOxx9PnH52Ls5BgCCnb8kGpUuG3cy63sFb52ORrN206EaUq59Q12CLA7aTge4QGV
 3fvBIv+U+chagELG0usoPPTC+fMUt8oj8vfVYyDqUPjoyxATDtvxAv7ORFI7ZmEm
 CVemKrHWEaVEERfXSSbMp0TpPrNnkB4SoYOGT6vakyjgpcQHLh0GMaUZfluvyROt
 nQNaQFbOLfh9GGBjZQpAu1Aa2aAvMcOxyrL1JPhvwFVT0G4hF6s4Zs0zLY7+MAPD
 XT0wSf79U4lTzlg4eNrwZazoFGIeUhwyH2X59yc04yXytM7QUBFw+7XIK/PA8Wn4
 LgJYrwn6poFinGT2HlwKPrIUKSfCpLS8ePutmMgMUqLhiVtKvoaE6S38h2T97d9i
 OuLFDVMm+j0awMadS3cJD5kqtGVbqnPsUuaC3OFlpyng8K+OHHTNCfcFMiMbhgdw
 w6cMioYdq/a9mLoN6T50ylvTOKeoKWxXI4X6q6x7vZYUztMpFby+b5mTTbR12dJs
 LSqHR8QGSIpNmQ==
 =FEf1
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Paul McKenney:
 "Documentation updates:

   - Update whatisRCU.rst and checklist.rst for recent RCU API additions

   - Fix RCU documentation formatting and typos

   - Replace dead Ottawa Linux Symposium links in RTFP.txt

  Miscellaneous RCU updates:

   - Document that rcu_barrier() hurries RCU_LAZY callbacks

   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler()

   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly

   - Make initial set of changes to accommodate upcoming
     system_percpu_wq changes

  SRCU updates:

   - Create an srcu_read_lock_fast_notrace() for eventual use in
     tracing, including adding guards

   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast()

   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers

   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed()

  Torture-test updates:

   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary

   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels
     without doing non-debug kernels

   - Fix a number of false-positive diagnostics that were being
     triggered by rcutorture starting before boot completed. Running
     multiple near-CPU-bound rcutorture processes when there is only the
     boot CPU is after all a bit excessive

   - Substitute kcalloc() for kzalloc()

   - Remove a redundant kfree() and NULL out kfree()ed objects"

* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  rcu: WQ_UNBOUND added to sync_wq workqueue
  rcu: WQ_PERCPU added to alloc_workqueue users
  rcu: replace use of system_wq with system_percpu_wq
  refperf: Set reader_tasks to NULL after kfree()
  refperf: Remove redundant kfree() after torture_stop_kthread()
  srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
  srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
  srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
  rculist: move list_for_each_rcu() to where it belongs
  refscale: Use kcalloc() instead of kzalloc()
  rcutorture: Use kcalloc() instead of kzalloc()
  docs: rcu: Replace multiple dead OLS links in RTFP.txt
  doc: Fix typo in RCU's torture.rst documentation
  Documentation: RCU: Retitle toctree index
  Documentation: RCU: Reduce toctree depth
  Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
  rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
  doc: Add RCU guards to checklist.rst
  doc: Update whatisRCU.rst for recent RCU API additions
  rcutorture: Delay forward-progress testing until boot completes
  ...
2025-10-04 11:28:45 -07:00
Rong Tao
de7342228b bpf: Finish constification of 1st parameter of bpf_d_path()
The commit 1b8abbb121 ("bpf...d_path(): constify path argument")
constified the first parameter of the bpf_d_path(), but failed to
update it in all places. Finish constification.

Otherwise the selftest fail to build:
.../selftests/bpf/bpf_experimental.h:222:12: error: conflicting types for 'bpf_path_d_path'
  222 | extern int bpf_path_d_path(const struct path *path, char *buf, size_t buf__sz) __ksym;
      |            ^
.../selftests/bpf/tools/include/vmlinux.h:153922:12: note: previous declaration is here
 153922 | extern int bpf_path_d_path(struct path *path, char *buf, size_t buf__sz) __weak __ksym;

Fixes: 1b8abbb121 ("bpf...d_path(): constify path argument")
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-04 09:05:23 -07:00
Linus Torvalds
f3826aa996 guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
   types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
   precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
   no way to detect private vs. shared).
 
   This lays the groundwork for removal of guest memory from the kernel direct
   map, as well as for limited mmap() for guest_memfd-backed memory.
 
   For more information see:
   * a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
   * https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
     (guest_memfd in Firecracker)
   * https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
     (direct map removal)
   * https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
     (mmap support)
 
 ARM:
 
 * Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 * Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 * Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 * Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 * Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 * Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 * Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 * Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 * Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 * Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 
 LoongArch:
 
 * Detect page table walk feature on new hardware
 
 * Add sign extension with kernel MMIO/IOCSR emulation
 
 * Improve in-kernel IPI emulation
 
 * Improve in-kernel PCH-PIC emulation
 
 * Move kvm_iocsr tracepoint out of generic code
 
 RISC-V:
 
 * Added SBI FWFT extension for Guest/VM with misaligned delegation and
   pointer masking PMLEN features
 
 * Added ONE_REG interface for SBI FWFT extension
 
 * Added Zicbop and bfloat16 extensions for Guest/VM
 
 * Enabled more common KVM selftests for RISC-V
 
 * Added SBI v3.0 PMU enhancements in KVM and perf driver
 
 s390:
 
 * Improve interrupt cpu for wakeup, in particular the heuristic to decide
   which vCPU to deliver a floating interrupt to.
 
 * Clear the PTE when discarding a swapped page because of CMMA; this
   bug was introduced in 6.16 when refactoring gmap code.
 
 x86 selftests:
 
 * Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).
 
 * Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).
 
 * Minor cleanups and improvements
 
 x86 (guest side):
 
 * For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.
 
 * Make kvm_async_pf_task_wake() a local static helper and remove its
   export.
 
 * Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
 
 Generic:
 
 * Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
   now included in GFP_NOWAIT.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
 FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
 6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
 F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
 i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
 FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
 =fyov
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This excludes the bulk of the x86 changes, which I will send
  separately. They have two not complex but relatively unusual conflicts
  so I will wait for other dust to settle.

  guest_memfd:

   - Add support for host userspace mapping of guest_memfd-backed memory
     for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
     (which isn't precisely the same thing as CoCo VMs, since x86's
     SEV-MEM and SEV-ES have no way to detect private vs. shared).

     This lays the groundwork for removal of guest memory from the
     kernel direct map, as well as for limited mmap() for
     guest_memfd-backed memory.

     For more information see:
       - commit a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD")
       - guest_memfd in Firecracker:
           https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
       - direct map removal:
           https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
       - mmap support:
           https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

  ARM:

   - Add support for FF-A 1.2 as the secure memory conduit for pKVM,
     allowing more registers to be used as part of the message payload.

   - Change the way pKVM allocates its VM handles, making sure that the
     privileged hypervisor is never tricked into using uninitialised
     data.

   - Speed up MMIO range registration by avoiding unnecessary RCU
     synchronisation, which results in VMs starting much quicker.

   - Add the dump of the instruction stream when panic-ing in the EL2
     payload, just like the rest of the kernel has always done. This
     will hopefully help debugging non-VHE setups.

   - Add 52bit PA support to the stage-1 page-table walker, and make use
     of it to populate the fault level reported to the guest on failing
     to translate a stage-1 walk.

   - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
     feature parity for guests, irrespective of the host platform.

   - Fix some really ugly architecture problems when dealing with debug
     in a nested VM. This has some bad performance impacts, but is at
     least correct.

   - Add enough infrastructure to be able to disable EL2 features and
     give effective values to the EL2 control registers. This then
     allows a bunch of features to be turned off, which helps cross-host
     migration.

   - Large rework of the selftest infrastructure to allow most tests to
     transparently run at EL2. This is the first step towards enabling
     NV testing.

   - Various fixes and improvements all over the map, including one BE
     fix, just in time for the removal of the feature.

  LoongArch:

   - Detect page table walk feature on new hardware

   - Add sign extension with kernel MMIO/IOCSR emulation

   - Improve in-kernel IPI emulation

   - Improve in-kernel PCH-PIC emulation

   - Move kvm_iocsr tracepoint out of generic code

  RISC-V:

   - Added SBI FWFT extension for Guest/VM with misaligned delegation
     and pointer masking PMLEN features

   - Added ONE_REG interface for SBI FWFT extension

   - Added Zicbop and bfloat16 extensions for Guest/VM

   - Enabled more common KVM selftests for RISC-V

   - Added SBI v3.0 PMU enhancements in KVM and perf driver

  s390:

   - Improve interrupt cpu for wakeup, in particular the heuristic to
     decide which vCPU to deliver a floating interrupt to.

   - Clear the PTE when discarding a swapped page because of CMMA; this
     bug was introduced in 6.16 when refactoring gmap code.

  x86 selftests:

   - Add #DE coverage in the fastops test (the only exception that's
     guest- triggerable in fastop-emulated instructions).

   - Fix PMU selftests errors encountered on Granite Rapids (GNR),
     Sierra Forest (SRF) and Clearwater Forest (CWF).

   - Minor cleanups and improvements

  x86 (guest side):

   - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
     overriding guest MTRR for TDX/SNP to fix an issue where ACPI
     auto-mapping could map devices as WB and prevent the device drivers
     from mapping their devices with UC/UC-.

   - Make kvm_async_pf_task_wake() a local static helper and remove its
     export.

   - Use native qspinlocks when running in a VM with dedicated
     vCPU=>pCPU bindings even when PV_UNHALT is unsupported.

  Generic:

   - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
     __GFP_NOWARN is now included in GFP_NOWAIT.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
  KVM: s390: Fix to clear PTE when discarding a swapped page
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
  KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  selftests/kvm: remove stale TODO in xapic_state_test
  KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
  ...
2025-10-04 08:52:16 -07:00
Linus Torvalds
55a42f78ff VFIO updates for v6.18-rc1
- Use fdinfo to expose the sysfs path of a device represented by a
    vfio device file. (Alex Mastro)
 
  - Mark vfio-fsl-mc, vfio-amba, and the reset functions for
    vfio-platform for removal as these are either orphaned or believed
    to be unused. (Alex Williamson)
 
  - Add reviewers for vfio-platform to save it from also being marked
    for removal. (Mostafa Saleh, Pranjal Shrivastava)
 
  - VFIO selftests, including basic sanity testing and minimal userspace
    drivers for testing against real hardware.  This is also expected to
    provide integration with KVM selftests for KVM-VFIO interfaces.
    (David Matlack, Josh Hilke)
 
  - Fix drivers/cdx and vfio/cdx to build without CONFIG_GENERIC_MSI_IRQ.
    (Nipun Gupta)
 
  - Fix reference leak in hisi_acc. (Miaoqian Lin)
 
  - Use consistent return for unsupported device feature. (Alex Mastro)
 
  - Unwind using the correct memory free callback in vfio/pds.
    (Zilin Guan)
 
  - Use IRQ_DISABLE_LAZY flag to improve handling of pre-PCI2.3 INTx
    and resolve stalled interrupt on ppc64. (Timothy Pearson)
 
  - Enable GB300 in nvgrace-gpu vfio-pci variant driver. (Tushar Dave)
 
  - Misc:
    - Drop unnecessary ternary conversion in vfio/pci. (Xichao Zhao)
    - Grammatical fix in nvgrace-gpu. (Morduan Zang)
    - Update Shameer's email address. (Shameer Kolothum)
    - Fix document build warning. (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmjbCBcbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsik80P/2GQQ25ipSDadWAMGE2f
 ylA03/rPJ0OoE4H09bvHELcrZEV0LvaaOpaT0xZfxLa/TuiYyY7h+Yi30BgVLZNQ
 pvD2RhKWhRheneFllaPCcYwfK80lntnmOHd6ZjKvpddXKEwoksXUq657yWtBqnvK
 fjIjLPx/gFpfvAFM3miHPHhPtURi3utTvKKF2U34qWPSYSqlczRqzHx+c0gyqMVQ
 iDYlKRbbpDIuTgh1MpL26Ia6xKsOUOKBe9pOh12pbB3Hix8ZWCDIVhPbUIj9uFoB
 uTrftguba9SMV1iMtr/aqiwImxEwp9gR3t6b0MRVWlHqx3QKn1/EgNWOI6ybRsfL
 FEspW4dPl9ruUTMbZ83fzvpJGihPx/nxoOnpSPd/cCCNLOyXbhSmZWA+3CgBdXME
 vu614SEyRqtdJSQY+RfVr0cM9yImWal0PLJeU2+/VII/Sp+kqYEm4mwVzxTwrrjk
 vSsLjg8Ch4zv/dNFnDikcHRpkxYmS5NLUeP2Htyfl1BVxHNLCATZWgSKzG3fFzV0
 jWP6yG27/dVrVhKXb9X+yrPFE9/2Uq9pUkIdDR/Mfb54GJtSXmJIQVIzgmSQpSlQ
 qXCZufVLY38xtLvl+hGpWa/DBBhnItRzkTwL7gkBlzz1L4Ajy4T84QqKLxmyxsIj
 bsmaPit0CQI0iOzZ1xMrJlq3
 =E6Zs
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.18-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Use fdinfo to expose the sysfs path of a device represented by a vfio
   device file (Alex Mastro)

 - Mark vfio-fsl-mc, vfio-amba, and the reset functions for
   vfio-platform for removal as these are either orphaned or believed to
   be unused (Alex Williamson)

 - Add reviewers for vfio-platform to save it from also being marked for
   removal (Mostafa Saleh, Pranjal Shrivastava)

 - VFIO selftests, including basic sanity testing and minimal userspace
   drivers for testing against real hardware. This is also expected to
   provide integration with KVM selftests for KVM-VFIO interfaces (David
   Matlack, Josh Hilke)

 - Fix drivers/cdx and vfio/cdx to build without CONFIG_GENERIC_MSI_IRQ
   (Nipun Gupta)

 - Fix reference leak in hisi_acc (Miaoqian Lin)

 - Use consistent return for unsupported device feature (Alex Mastro)

 - Unwind using the correct memory free callback in vfio/pds (Zilin
   Guan)

 - Use IRQ_DISABLE_LAZY flag to improve handling of pre-PCI2.3 INTx and
   resolve stalled interrupt on ppc64 (Timothy Pearson)

 - Enable GB300 in nvgrace-gpu vfio-pci variant driver (Tushar Dave)

 - Misc:
    - Drop unnecessary ternary conversion in vfio/pci (Xichao Zhao)
    - Grammatical fix in nvgrace-gpu (Morduan Zang)
    - Update Shameer's email address (Shameer Kolothum)
    - Fix document build warning (Alex Williamson)

* tag 'vfio-v6.18-rc1' of https://github.com/awilliam/linux-vfio: (48 commits)
  vfio/nvgrace-gpu: Add GB300 SKU to the devid table
  vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices
  vfio/pds: replace bitmap_free with vfree
  vfio: return -ENOTTY for unsupported device feature
  hisi_acc_vfio_pci: Fix reference leak in hisi_acc_vfio_debug_init
  vfio/platform: Mark reset drivers for removal
  vfio/amba: Mark for removal
  MAINTAINERS: Add myself as VFIO-platform reviewer
  MAINTAINERS: Add myself as VFIO-platform reviewer
  docs: proc.rst: Fix VFIO Device title formatting
  vfio: selftests: Fix .gitignore for already tracked files
  vfio/cdx: update driver to build without CONFIG_GENERIC_MSI_IRQ
  cdx: don't select CONFIG_GENERIC_MSI_IRQ
  MAINTAINERS: Update Shameer Kolothum's email address
  vfio: selftests: Add a script to help with running VFIO selftests
  vfio: selftests: Make iommufd the default iommu_mode
  vfio: selftests: Add iommufd mode
  vfio: selftests: Add iommufd_compat_type1{,v2} modes
  vfio: selftests: Add vfio_type1v2_mode
  vfio: selftests: Replicate tests across all iommu_modes
  ...
2025-10-04 08:24:54 -07:00
Linus Torvalds
cbf33b8e0b bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjfBtMACgkQ6rmadz2v
 bToTuA/+JizieMIW6eqzCDrrhj45DmuKu6gUMrkmM0xte3QCoLly5FDGJ9HMmrqf
 L5xfz/TAExg/Nh9a0zpCGE7/fMr9P10Z5vVHMe/I06rcW+/fDrAERXDurlfpfliv
 /7KqF/lq1D6kp7bcBzWX4dzkT66Nh/Ax6ZJKIXF/K3iwFXBb1A/95Wwgqxuc3xNM
 n0kthEhnX6x5cojy6fla2zYNoebNLVxPvafTcAtnE0QM8vxdOl0Q7KWrdQpVE5V/
 z6WF2M822eTQeUIy6k/ZRckFlmEwa++I+w5EnygBFP8INUHARdYNBQpQVFkg31jz
 GYGlbTs7jaDbMNWkEhKyDbuhL5Xq9/5FGOaLI9CvDtpPjbfYTZMGtIn9L902XtqE
 VEfBcTA/g/qd8Urx5622sfPwau64LrSAKBRE/4CJ7/dsnPx4BdKavsA5DMVuyRIm
 Au6tOxlRFhPWhultnmmbFBjwn33xTleYNUF9wpWQnijNq+Ph29gKZk8Ac5eKKS6A
 dTSCxv7LGafezN/evr779vXohZyF9vWfqHdITVzTo40tsSbi5v4uyfJ2qzoaS7Gf
 UQ4F2nkjqddhG/O1W2TY8aFzmWXP2q92V6yqkcq5zjA3D1Ue6JOxRzG/3kNzgsLX
 Gf1DmPosUyeLNpwdDYRD83Dk7tJ7rAI+x3i6Iwh8qivIhC4CzrU=
 =Kzf9
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix selftests/bpf (typo, conflicts) and unbreak BPF CI (Jiri Olsa)

 - Remove linux/unaligned.h dependency for libbpf_sha256 (Andrii
   Nakryiko) and add a test (Eric Biggers)

 - Reject negative offsets for ALU operations in the verifier (Yazhou
   Tang) and add a test (Eduard Zingerman)

 - Skip scalar adjustment for BPF_NEG operation if destination register
   is a pointer (Brahmajit Das) and add a test (KaFai Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: Fix missing #pragma in libbpf_utils.c
  selftests/bpf: Add tests for rejection of ALU ops with negative offsets
  selftests/bpf: Add test for libbpf_sha256()
  bpf: Reject negative offsets for ALU ops
  libbpf: remove linux/unaligned.h dependency for libbpf_sha256()
  libbpf: move libbpf_sha256() implementation into libbpf_utils.c
  libbpf: move libbpf_errstr() into libbpf_utils.c
  libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE
  libbpf: make libbpf_errno.c into more generic libbpf_utils.c
  selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP
  bpf: Skip scalar adjustment for BPF_NEG if dst is a pointer
  selftests/bpf: Fix realloc size in bpf_get_addrs
  selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict
  selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests
2025-10-03 19:38:19 -07:00
Linus Torvalds
e56ebe27a0 iommufd 6.18 merge window pull
Two minor fixes
 
 - Make the selftest work again on x86 platforms with iommus enabled
 
 - Fix a compiler warning in the userspace kselftest
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaN6InwAKCRCFwuHvBreF
 YXsrAQCu42l0LtnavsGRgF4v5BhE9dx+WqSI41bIq3jqUPOHPgEAn4xpNyNbuDA0
 3R31E9C3exWSzsyp1XABfSz13bNklQU=
 =7ZTd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two minor fixes:

   - Make the selftest work again on x86 platforms with iommus enabled

   - Fix a compiler warning in the userspace kselftest"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Register iommufd mock devices with fwspec
  iommu/selftest: prevent use of uninitialized variable
2025-10-03 18:18:48 -07:00
Linus Torvalds
50647a1176 file->f_path constification
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaN3daAAKCRBZ7Krx/gZQ
 6zNWAP9kD6rOJRNqDgea4pibDPa47Tps/WM5tsDv3dsLliY29gEA6sveOWZ3guAj
 4oY3ts/NtHLWXvhI7Vd/1mr2aTKEZQk=
 =YNK+
 -----END PGP SIGNATURE-----

Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file->f_path constification from Al Viro:
 "Only one thing was modifying ->f_path of an opened file - acct(2).

  Massaging that away and constifying a bunch of struct path * arguments
  in functions that might be given &file->f_path ends up with the
  situation where we can turn ->f_path into an anon union of const
  struct path f_path and struct path __f_path, the latter modified only
  in a few places in fs/{file_table,open,namei}.c, all for struct file
  instances that are yet to be opened"

* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
  Have cc(1) catch attempts to modify ->f_path
  kernel/acct.c: saner struct file treatment
  configfs:get_target() - release path as soon as we grab configfs_item reference
  apparmor/af_unix: constify struct path * arguments
  ovl_is_real_file: constify realpath argument
  ovl_sync_file(): constify path argument
  ovl_lower_dir(): constify path argument
  ovl_get_verity_digest(): constify path argument
  ovl_validate_verity(): constify {meta,data}path arguments
  ovl_ensure_verity_loaded(): constify datapath argument
  ksmbd_vfs_set_init_posix_acl(): constify path argument
  ksmbd_vfs_inherit_posix_acl(): constify path argument
  ksmbd_vfs_kern_path_unlock(): constify path argument
  ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
  check_export(): constify path argument
  export_operations->open(): constify path argument
  rqst_exp_get_by_name(): constify path argument
  nfs: constify path argument of __vfs_getattr()
  bpf...d_path(): constify path argument
  done_path_create(): constify path argument
  ...
2025-10-03 16:32:36 -07:00
Linus Torvalds
6238729bfc fuse update for 6.18
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCaNuYpQAKCRDh3BK/laaZ
 PBF6APoDWzheST2Gw3LPkU03xF6Yn+ZMP0p3L676yYBb07LrWQD8CBsfOkMei6W/
 L+/GaVAbpIsuCQ6GTPE/6HTLWevU2Q0=
 =kR5v
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Extend copy_file_range interface to be fully 64bit capable (Miklos)

 - Add selftest for fusectl (Chen Linxuan)

 - Move fuse docs into a separate directory (Bagas Sanjaya)

 - Allow fuse to enter freezable state in some cases (Sergey
   Senozhatsky)

 - Clean up writeback accounting after removing tmp page copies (Joanne)

 - Optimize virtiofs request handling (Li RongQing)

 - Add synchronous FUSE_INIT support (Miklos)

 - Allow server to request prune of unused inodes (Miklos)

 - Fix deadlock with AIO/sync release (Darrick)

 - Add some prep patches for block/iomap support (Darrick)

 - Misc fixes and cleanups

* tag 'fuse-update-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (26 commits)
  fuse: move CREATE_TRACE_POINTS to a separate file
  fuse: move the backing file idr and code into a new source file
  fuse: enable FUSE_SYNCFS for all fuseblk servers
  fuse: capture the unique id of fuse commands being sent
  fuse: fix livelock in synchronous file put from fuseblk workers
  mm: fix lockdep issues in writeback handling
  fuse: add prune notification
  fuse: remove redundant calls to fuse_copy_finish() in fuse_notify()
  fuse: fix possibly missing fuse_copy_finish() call in fuse_notify()
  fuse: remove FUSE_NOTIFY_CODE_MAX from <uapi/linux/fuse.h>
  fuse: remove fuse_readpages_end() null mapping check
  fuse: fix references to fuse.rst -> fuse/fuse.rst
  fuse: allow synchronous FUSE_INIT
  fuse: zero initialize inode private data
  fuse: remove unused 'inode' parameter in fuse_passthrough_open
  virtio_fs: fix the hash table using in virtio_fs_enqueue_req()
  mm: remove BDI_CAP_WRITEBACK_ACCT
  fuse: use default writeback accounting
  virtio_fs: Remove redundant spinlock in virtio_fs_request_complete()
  fuse: remove unneeded offset assignment when filling write pages
  ...
2025-10-03 12:48:18 -07:00
Linus Torvalds
e406d57be7 Patch series in this pull request:
- The 3 patch series "ida: Remove the ida_simple_xxx() API" from
   Christophe Jaillet completes the removal of this legacy IDR API.
 
 - The 9 patch series "panic: introduce panic status function family"
   from Jinchao Wang provides a number of cleanups to the panic code and
   its various helpers, which were rather ad-hoc and scattered all over the
   place.
 
 - The 5 patch series "tools/delaytop: implement real-time keyboard
   interaction support" from Fan Yu adds a few nice user-facing usability
   changes to the delaytop monitoring tool.
 
 - The 3 patch series "efi: Fix EFI boot with kexec handover (KHO)" from
   Evangelos Petrongonas fixes a panic which was happening with the
   combination of EFI and KHO.
 
 - The 2 patch series "Squashfs: performance improvement and a sanity
   check" from Phillip Lougher teaches squashfs's lseek() about
   SEEK_DATA/SEEK_HOLE.  A mere 150x speedup was measured for a well-chosen
   microbenchmark.
 
 - Plus another 50-odd singleton patches all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN78zwAKCRDdBJ7gKXxA
 jhLeAQCddTv0XtSUTrvBvmrJVUBrQQeJc+LtNopMIjfAF/WAWAEAogSVKxg+HHEB
 GaVixx4zDriNzEqrqiCx9rm4l+YooQA=
 =XRe0
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
   completes the removal of this legacy IDR API

 - "panic: introduce panic status function family" from Jinchao Wang
   provides a number of cleanups to the panic code and its various
   helpers, which were rather ad-hoc and scattered all over the place

 - "tools/delaytop: implement real-time keyboard interaction support"
   from Fan Yu adds a few nice user-facing usability changes to the
   delaytop monitoring tool

 - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
   Petrongonas fixes a panic which was happening with the combination of
   EFI and KHO

 - "Squashfs: performance improvement and a sanity check" from Phillip
   Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
   150x speedup was measured for a well-chosen microbenchmark

 - plus another 50-odd singleton patches all over the place

* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
  Squashfs: reject negative file sizes in squashfs_read_inode()
  kallsyms: use kmalloc_array() instead of kmalloc()
  MAINTAINERS: update Sibi Sankar's email address
  Squashfs: add SEEK_DATA/SEEK_HOLE support
  Squashfs: add additional inode sanity checking
  lib/genalloc: fix device leak in of_gen_pool_get()
  panic: remove CONFIG_PANIC_ON_OOPS_VALUE
  ocfs2: fix double free in user_cluster_connect()
  checkpatch: suppress strscpy warnings for userspace tools
  cramfs: fix incorrect physical page address calculation
  kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
  Squashfs: fix uninit-value in squashfs_get_parent
  kho: only fill kimage if KHO is finalized
  ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
  kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
  sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
  coccinelle: platform_no_drv_owner: handle also built-in drivers
  coccinelle: of_table: handle SPI device ID tables
  lib/decompress: use designated initializers for struct compress_format
  efi: support booting with kexec handover (KHO)
  ...
2025-10-02 18:44:54 -07:00
Linus Torvalds
8804d970fa Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from
   Kairui Song improves performance and reduces the failure rate of swap
   cluster allocation.
 
 - The 4 patch series "support large align and nid in Rust allocators"
   from Vitaly Wool permits Rust allocators to set NUMA node and large
   alignment when perforning slub and vmalloc reallocs.
 
 - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from
   Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets
   for virtual address spaces for ops-level DAMOS filters.
 
 - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock"
   from Suren Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps.
 
 - The 2 patch series "mm/mincore: minor clean up for swap cache
   checking" from Kairui Song performs some cleanup in the swap code.
 
 - The 11 patch series "mm: vm_normal_page*() improvements" from David
   Hildenbrand provides code cleanup in the pagemap code.
 
 - The 5 patch series "add persistent huge zero folio support" from
   Pankaj Raghav provides a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero.
 
 - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a
   few touchups to the recently added Kexec Handover feature.
 
 - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all
   arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap.  To
   end the constant struggle with space shortage on 32-bit conflicting with
   64-bit's needs.
 
 - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li
   cleans up some swap code.
 
 - The 7 patch series "selftests/mm: Fix false positives and skip
   unsupported tests" from Donet Tom fixes a few things in our selftests
   code.
 
 - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide
   THPs when advised" from David Hildenbrand "allows individual processes
   to opt-out of THP=always into THP=madvise, without affecting other
   workloads on the system".
 
   It's a long story - the [1/N] changelog spells out the considerations.
 
 - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox
   gets us started on the memdesc project.  Please see
   https://kernelnewbies.org/MatthewWilcox/Memdescs and
   https://blogs.oracle.com/linux/post/introducing-memdesc.
 
 - The 3 patch series "Tiny optimization for large read operations" from
   Chi Zhiling improves the efficiency of the pagecache read path.
 
 - The 5 patch series "Better split_huge_page_test result check" from Zi
   Yan improves our folio splitting selftest code.
 
 - The 2 patch series "test that rmap behaves as expected" from Wei Yang
   adds some rmap selftests.
 
 - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig
   removes that function and converts its two remaining callers.
 
 - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain
   fixes some UFFD selftests issues.
 
 - The 3 patch series "introduce kernel file mapped folios" from Boris
   Burkov introduces the concept of "kernel file pages".  Using these
   permits btrfs to account its metadata pages to the root cgroup, rather
   than to the cgroups of random inappropriate tasks.
 
 - The 2 patch series "mm/pageblock: improve readability of some
   pageblock handling" from Wei Yang provides some readability improvements
   to the page allocator code.
 
 - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae
   Park teaches DAMON to understand arm32 highmem.
 
 - The 4 patch series "tools: testing: Use existing atomic.h for
   vma/maple tests" from Brendan Jackman performs some code cleanups and
   deduplication under tools/testing/.
 
 - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from
   Liam Howlett fixes a couple of 32-bit issues in
   tools/testing/radix-tree.c.
 
 - The 2 patch series "kasan: unify kasan_enabled() and remove
   arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN
   arch-specific initialization code into a common arch-neutral
   implementation.
 
 - The 3 patch series "mm: remove zpool" from Johannes Weiner removes
   zspool - an indirection layer which now only redirects to a single thing
   (zsmalloc).
 
 - The 2 patch series "mm: task_stack: Stack handling cleanups" from
   Pasha Tatashin makes a couple of cleanups in the fork code.
 
 - The 37 patch series "mm: remove nth_page()" from David Hildenbrand
   makes rather a lot of adjustments at various nth_page() callsites,
   eventually permitting the removal of that undesirable helper function.
 
 - The 2 patch series "introduce kasan.write_only option in hw-tags" from
   Yeoreum Yun creates a KASAN read-only mode for ARM, using that
   architecture's memory tagging feature.  It is felt that a read-only mode
   KASAN is suitable for use in production systems rather than debug-only.
 
 - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation"
   from Kefeng Wang does some tidying in the hugetlb folio allocation code.
 
 - The 12 patch series "mm: establish const-correctness for pointer
   parameters" from Max Kellermann makes quite a number of the MM API
   functions more accurate about the constness of their arguments.  This
   was getting in the way of subsystems (in this case CEPH) when they
   attempt to improving their own const/non-const accuracy.
 
 - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola
   fixes a number of code sites which were confused over when to use
   free_pages() vs __free_pages().
 
 - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice
   Ryhl makes the mapletree code accessible to Rust.  Required by nouveau
   and by its forthcoming successor: the new Rust Nova driver.
 
 - The 2 patch series "selftests/mm: split_huge_page_test:
   split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and
   some cleanups to the thp selftesting code.
 
 - The 14 patch series "mm, swap: introduce swap table as swap cache
   (phase I)" from Chris Li and Kairui Song is the first step along the
   path to implementing "swap tables" - a new approach to swap allocation
   and state tracking which is expected to yield speed and space
   improvements.  This patchset itself yields a 5-20% performance benefit
   in some situations.
 
 - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes
   the new memdesc layer to clean up the ptdesc code a little.
 
 - The 3 patch series "Fix va_high_addr_switch.sh test failure" from
   Chunyu Hu fixes some issues in our 5-level pagetable selftesting code.
 
 - The 2 patch series "Minor fixes for memory allocation profiling" from
   Suren Baghdasaryan addresses a couple of minor issues in relatively new
   memory allocation profiling feature.
 
 - The 3 patch series "Small cleanups" from Matthew Wilcox has a few
   cleanups in preparation for more memdesc work.
 
 - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and
   DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in
   furtherance of supporting arm highmem.
 
 - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix
   warnings" from Muhammad Anjum adds that compiler check to selftests code
   and fixes the fallout, by removing dead code.
 
 - The 10 patch series "Improvements to Victim Process Thawing and OOM
   Reaper Traversal Order" from zhongjinji makes a number of improvements
   in the OOM killer: mainly thawing a more appropriate group of victim
   threads so they can release resources.
 
 - The 5 patch series "mm/damon: misc fixups and improvements for 6.18"
   from SeongJae Park is a bunch of small and unrelated fixups for DAMON.
 
 - The 7 patch series "mm/damon: define and use DAMON initialization
   check function" from SeongJae Park implement reliability and
   maintainability improvements to a recently-added bug fix.
 
 - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and
   non-idle ages" from SeongJae Park provides additional transparency to
   userspace clients of the DAMON_STAT information.
 
 - The 2 patch series "Expand scope of khugepaged anonymous collapse"
   from Dev Jain removes some constraints on khubepaged's collapsing of
   anon VMAs.  It also increases the success rate of MADV_COLLAPSE against
   an anon vma.
 
 - The 2 patch series "mm: do not assume file == vma->vm_file in
   compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards
   removal of file_operations.mmap().  This patchset concentrates upon
   clearing up the treatment of stacked filesystems.
 
 - The 6 patch series "mm: Improve mlock tracking for large folios" from
   Kiryl Shutsemau provides some fixes and improvements to mlock's tracking
   of large folios.  /proc/meminfo's "Mlocked" field became more accurate.
 
 - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters
   during fork" from Donet Tom fixes several user-visible KSM stats
   inaccuracies across forks and adds selftest code to verify these
   counters.
 
 - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei
   Yang addresses some potential but presently benign issues in KSM's
   mm_slot handling.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA
 jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1
 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA=
 =4mhY
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Linus Torvalds
24d9e8b3c9 slab updates for 6.18
-----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCAA5FiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmja74IbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJELvgsHXSRYiacR4H/04aBsr7LZnTJVeZLQwK
 HKoOwXBqiQyqPdjKXGKnp7Mh9gRp2W3V11VsYTuDJNUS+Vz5YXW0z8cRnUfZ3SYs
 l+GZC3vZeAy2EVJE1U6Mb673hU8vziI80IO2q/tGzaj9a+wC3L0lemc+YFQTwG+u
 pMtt8zU2vHRjgkx8TNNqJBBOLLDV+RzIl8pqXVnh4eju6x6ZdreGnjXaePYMdjG0
 fXLf9XwIeWREqbfeOCEOB50Ts71kkdiOeskwnJyfCTDT8WTu3zC/dICqfh66e3Gg
 8hQKvMsuKpm/FwbtgdB0WvaDjENH6PmY+ubLYVxwvNpcsTSqfe0IYGm+HpUP+TPf
 m+Y=
 =w+JL
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - A new layer for caching objects for allocation and free via percpu
   arrays called sheaves.

   The aim is to combine the good parts of SLAB (lower-overhead and
   simpler percpu caching, compared to SLUB) without the past issues
   with arrays for freeing remote NUMA node objects and their flushing.

   It also allows more efficient kfree_rcu(), and cheaper object
   preallocations for cases where the exact number of objects is
   unknown, but an upper bound is.

   Currently VMAs and maple nodes are using this new caching, with a
   plan to enable it for all caches and remove the complex SLUB fastpath
   based on cpu (partial) slabs and this_cpu_cmpxchg_double().
   (Vlastimil Babka, with Liam Howlett and Pedro Falcato for the maple
   tree changes)

 - Re-entrant kmalloc_nolock(), which allows opportunistic allocations
   from NMI and tracing/kprobe contexts.

   Building on prior page allocator and memcg changes, it will result in
   removing BPF-specific caches on top of slab (Alexei Starovoitov)

 - Various fixes and cleanups. (Kuan-Wei Chiu, Matthew Wilcox, Suren
   Baghdasaryan, Ye Liu)

* tag 'slab-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (40 commits)
  slab: Introduce kmalloc_nolock() and kfree_nolock().
  slab: Reuse first bit for OBJEXTS_ALLOC_FAIL
  slab: Make slub local_(try)lock more precise for LOCKDEP
  mm: Introduce alloc_frozen_pages_nolock()
  mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().
  locking/local_lock: Introduce local_lock_is_locked().
  maple_tree: Convert forking to use the sheaf interface
  maple_tree: Add single node allocation support to maple state
  maple_tree: Prefilled sheaf conversion and testing
  tools/testing: Add support for prefilled slab sheafs
  maple_tree: Replace mt_free_one() with kfree()
  maple_tree: Use kfree_rcu in ma_free_rcu
  testing/radix-tree/maple: Hack around kfree_rcu not existing
  tools/testing: include maple-shim.c in maple.c
  maple_tree: use percpu sheaves for maple_node_cache
  mm, vma: use percpu sheaves for vm_area_struct cache
  tools/testing: Add support for changes to slab for sheaves
  slab: allow NUMA restricted allocations to use percpu sheaves
  tools/testing/vma: Implement vm_refcnt reset
  slab: skip percpu sheaves for remote object freeing
  ...
2025-10-02 15:58:05 -07:00
Linus Torvalds
07fdad3a93 Networking changes for 6.18.
Core & protocols
 ----------------
 
  - Improve drop account scalability on NUMA hosts for RAW and UDP sockets
    and the backlog, almost doubling the Pps capacity under DoS.
 
  - Optimize the UDP RX performance under stress, reducing contention,
    revisiting the binary layout of the involved data structs and
    implementing NUMA-aware locking. This improves UDP RX performance by
    an additional 50%, even more under extreme conditions.
 
  - Add support for PSP encryption of TCP connections; this mechanism has
    some similarities with IPsec and TLS, but offers superior HW offloads
    capabilities.
 
  - Ongoing work to support Accurate ECN for TCP. AccECN allows more than
    one congestion notification signal per RTT and is a building block for
    Low Latency, Low Loss, and Scalable Throughput (L4S).
 
  - Reorganize the TCP socket binary layout for data locality, reducing
    the number of touched cachelines in the fastpath.
 
  - Refactor skb deferral free to better scale on large multi-NUMA hosts,
    this improves TCP and UDP RX performances significantly on such HW.
 
  - Increase the default socket memory buffer limits from 256K to 4M to
    better fit modern link speeds.
 
  - Improve handling of setups with a large number of nexthop, making dump
    operating scaling linearly and avoiding unneeded synchronize_rcu() on
    delete.
 
  - Improve bridge handling of VLAN FDB, storing a single entry per bridge
    instead of one entry per port; this makes the dump order of magnitude
    faster on large switches.
 
  - Restore IP ID correctly for encapsulated packets at GSO segmentation
    time, allowing GRO to merge packets in more scenarios.
 
  - Improve netfilter matching performance on large sets.
 
  - Improve MPTCP receive path performance by leveraging recently
    introduced core infrastructure (skb deferral free) and adopting recent
    TCP autotuning changes.
 
  - Allow bridges to redirect to a backup port when the bridge port is
    administratively down.
 
  - Introduce MPTCP 'laminar' endpoint that con be used only once per
    connection and simplify common MPTCP setups.
 
  - Add RCU safety to dst->dev, closing a lot of possible races.
 
  - A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing
    code duplication.
 
  - Supports pulling data from an skb frag into the linear area of an XDP
    buffer.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Generate netlink documentation from YAML using an integrated
    YAML parser.
 
 Driver API
 ----------
 
  - Support using IPv6 Flow Label in Rx hash computation and RSS queue
    selection.
 
  - Introduce API for fetching the DMA device for a given queue, allowing
    TCP zerocopy RX on more H/W setups.
 
  - Make XDP helpers compatible with unreadable memory, allowing more
    easily building DevMem-enabled drivers with a unified XDP/skbs
    datapath.
 
  - Add a new dedicated ethtool callback enabling drivers to provide the
    number of RX rings directly, improving efficiency and clarity in RX
    ring queries and RSS configuration.
 
  - Introduce a burst period for the health reporter, allowing better
    handling of multiple errors due to the same root cause.
 
  - Support for DPLL phase offset exponential moving average, controlling
    the average smoothing factor.
 
 Device drivers
 --------------
 
  - Add a new Huawei driver for 3rd gen NIC (hinic3).
 
  - Add a new SpacemiT driver for K1 ethernet MAC.
 
  - Add a generic abstraction for shared memory communication devices
    (dibps)
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - Use multiple per-queue doorbell, to avoid MMIO contention issues
      - support adjacent functions, allowing them to delegate their
        SR-IOV VFs to sibling PFs
      - support RSS for IPSec offload
      - support exposing raw cycle counters in PTP and mlx5
      - support for disabling host PFs.
    - Intel (100G, ice, idpf):
      - ice: support for SRIOV VFs over an Active-Active link aggregate
      - ice: support for firmware logging via debugfs
      - ice: support for Earliest TxTime First (ETF) hardware offload
      - idpf: support basic XDP functionalities and XSk
    - Broadcom (bnxt):
      - support Hyper-V VF ID
      - dynamic SRIOV resource allocations for RoCE
    - Meta (fbnic):
      - support queue API, zero-copy Rx and Tx
      - support basic XDP functionalities
      - devlink health support for FW crashes and OTP mem corruptions
      - expand hardware stats coverage to FEC, PHY, and Pause
    - Wangxun:
      - support ethtool coalesce options
      - support for multiple RSS contexts
 
  - Ethernet virtual:
    - Macsec:
      - replace custom netlink attribute checks with policy-level checks
    - Bonding:
      - support aggregator selection based on port priority
    - Microsoft vNIC:
      - use page pool fragments for RX buffers instead of full pages to
        improve memory efficiency
 
  - Ethernet NICs consumer, and embedded:
    - Qualcomm: support Ethernet function for IPQ9574 SoC
    - Airoha: implement wlan offloading via NPU
    - Freescale
      - enetc: add NETC timer PTP driver and add PTP support
      - fec: enable the Jumbo frame support for i.MX8QM
    - Renesas (R-Car S4): support HW offloading for layer 2 switching
      - support for RZ/{T2H, N2H} SoCs
    - Cadence (macb): support TAPRIO traffic scheduling
    - TI:
      - support for Gigabit ICSS ethernet SoC (icssm-prueth)
    - Synopsys (stmmac): a lot of cleanups
 
  - Ethernet PHYs:
    - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
      driver
    - Support bcm63268 GPHY power control
    - Support for Micrel lan8842 PHY and PTP
    - Support for Aquantia AQR412 and AQR115
 
  - CAN:
    - a large CAN-XL preparation work
    - reorganize raw_sock and uniqframe struct to minimize memory usage
    - rcar_canfd: update the CAN-FD handling
 
  - WiFi:
    - extended Neighbor Awareness Networking (NAN) support
    - S1G channel representation cleanup
    - improve S1G support
 
  - WiFi drivers:
    - Intel (iwlwifi):
      - major refactor and cleanup
    - Broadcom (brcm80211):
      - support for AP isolation
    - RealTek (rtw88/89) rtw88/89:
      - preparation work for RTL8922DE support
    - MediaTek (mt76):
      - HW restart improvements
      - MLO support
    - Qualcomm/Atheros (ath10k_
      - GTK rekey fixes
 
  - Bluetooth drivers:
    - btusb: support for several new IDs for MT7925
    - btintel: support for BlazarIW core
    - btintel_pcie: support for _suspend() / _resume()
    - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjdBdsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkWnAP/2Jz0ExnYMDxLxkkKqMte+3JNeF/KNjB
 zVIslYN/5ekkd1TMXyDLFlWHqUOUMSF9+cK5ZRMHG6/lzOueSzFuuuDFsWs6Kf2f
 q7Q9MzlXhR9++YCsdES1uS5x3PPjMInzo2ZivMFa6fUDLLFSzeAOKL9k+RS0EggU
 VlXv2Wkm73R0O6KAssgDsHke9cnNz+F0DzhQ0S3qkyZF9tS5NrDeUl7fZ47XZgwb
 ZuXdEzKmTTepo2XvZGxJoF2D7nekWFlHhLjEPpDJtET19nwhukCry41/FplJwlKR
 dWsYkqLkrOEQKFQ+q++/5c3BgFOG+IrQLbP5bGQF1tZRMl4Dqp9PDxK5fKJfccbS
 0VY3Y2qWOSYejT2Ji2kEHR5E5rPyFm3Y5A4Rnpz0yEHj14vL2v4zf7CZRkCyyDfD
 doIZXFGkM0+N7QeXLEN833cje9zjaXuP9GAE+2bb+wAWAZAqof4KX8JgHh+y5Rwm
 pvUtvFxmEtntlMwNBap8aT3FquGtfZncU8pzAE5kvWjuMvyF0NVWiE5rB2kSQM/X
 NLmUdvDyjwwJqthXAJG0fl+a0mNJ/kOAqSOKJDJrfKDGWa+ovwY0iY06SpK0wIbO
 Wz7tpMk5MSlYXW8xWMlbyhvvU6T9xuoQ2KV4QTdMxc6Ir3sNX6YkQr+gjQjxB0gx
 ST2QF6lZeWFh
 =w2Kz
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core & protocols:

   - Improve drop account scalability on NUMA hosts for RAW and UDP
     sockets and the backlog, almost doubling the Pps capacity under DoS

   - Optimize the UDP RX performance under stress, reducing contention,
     revisiting the binary layout of the involved data structs and
     implementing NUMA-aware locking. This improves UDP RX performance
     by an additional 50%, even more under extreme conditions

   - Add support for PSP encryption of TCP connections; this mechanism
     has some similarities with IPsec and TLS, but offers superior HW
     offloads capabilities

   - Ongoing work to support Accurate ECN for TCP. AccECN allows more
     than one congestion notification signal per RTT and is a building
     block for Low Latency, Low Loss, and Scalable Throughput (L4S)

   - Reorganize the TCP socket binary layout for data locality, reducing
     the number of touched cachelines in the fastpath

   - Refactor skb deferral free to better scale on large multi-NUMA
     hosts, this improves TCP and UDP RX performances significantly on
     such HW

   - Increase the default socket memory buffer limits from 256K to 4M to
     better fit modern link speeds

   - Improve handling of setups with a large number of nexthop, making
     dump operating scaling linearly and avoiding unneeded
     synchronize_rcu() on delete

   - Improve bridge handling of VLAN FDB, storing a single entry per
     bridge instead of one entry per port; this makes the dump order of
     magnitude faster on large switches

   - Restore IP ID correctly for encapsulated packets at GSO
     segmentation time, allowing GRO to merge packets in more scenarios

   - Improve netfilter matching performance on large sets

   - Improve MPTCP receive path performance by leveraging recently
     introduced core infrastructure (skb deferral free) and adopting
     recent TCP autotuning changes

   - Allow bridges to redirect to a backup port when the bridge port is
     administratively down

   - Introduce MPTCP 'laminar' endpoint that con be used only once per
     connection and simplify common MPTCP setups

   - Add RCU safety to dst->dev, closing a lot of possible races

   - A significant crypto library API for SCTP, MPTCP and IPv6 SR,
     reducing code duplication

   - Supports pulling data from an skb frag into the linear area of an
     XDP buffer

  Things we sprinkled into general kernel code:

   - Generate netlink documentation from YAML using an integrated YAML
     parser

  Driver API:

   - Support using IPv6 Flow Label in Rx hash computation and RSS queue
     selection

   - Introduce API for fetching the DMA device for a given queue,
     allowing TCP zerocopy RX on more H/W setups

   - Make XDP helpers compatible with unreadable memory, allowing more
     easily building DevMem-enabled drivers with a unified XDP/skbs
     datapath

   - Add a new dedicated ethtool callback enabling drivers to provide
     the number of RX rings directly, improving efficiency and clarity
     in RX ring queries and RSS configuration

   - Introduce a burst period for the health reporter, allowing better
     handling of multiple errors due to the same root cause

   - Support for DPLL phase offset exponential moving average,
     controlling the average smoothing factor

  Device drivers:

   - Add a new Huawei driver for 3rd gen NIC (hinic3)

   - Add a new SpacemiT driver for K1 ethernet MAC

   - Add a generic abstraction for shared memory communication
     devices (dibps)

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - Use multiple per-queue doorbell, to avoid MMIO contention
           issues
         - support adjacent functions, allowing them to delegate their
           SR-IOV VFs to sibling PFs
         - support RSS for IPSec offload
         - support exposing raw cycle counters in PTP and mlx5
         - support for disabling host PFs.
      - Intel (100G, ice, idpf):
         - ice: support for SRIOV VFs over an Active-Active link
           aggregate
         - ice: support for firmware logging via debugfs
         - ice: support for Earliest TxTime First (ETF) hardware offload
         - idpf: support basic XDP functionalities and XSk
      - Broadcom (bnxt):
         - support Hyper-V VF ID
         - dynamic SRIOV resource allocations for RoCE
      - Meta (fbnic):
         - support queue API, zero-copy Rx and Tx
         - support basic XDP functionalities
         - devlink health support for FW crashes and OTP mem corruptions
         - expand hardware stats coverage to FEC, PHY, and Pause
      - Wangxun:
         - support ethtool coalesce options
         - support for multiple RSS contexts

   - Ethernet virtual:
      - Macsec:
         - replace custom netlink attribute checks with policy-level
           checks
      - Bonding:
         - support aggregator selection based on port priority
      - Microsoft vNIC:
         - use page pool fragments for RX buffers instead of full pages
           to improve memory efficiency

   - Ethernet NICs consumer, and embedded:
      - Qualcomm: support Ethernet function for IPQ9574 SoC
      - Airoha: implement wlan offloading via NPU
      - Freescale
         - enetc: add NETC timer PTP driver and add PTP support
         - fec: enable the Jumbo frame support for i.MX8QM
      - Renesas (R-Car S4):
         - support HW offloading for layer 2 switching
         - support for RZ/{T2H, N2H} SoCs
      - Cadence (macb): support TAPRIO traffic scheduling
      - TI:
         - support for Gigabit ICSS ethernet SoC (icssm-prueth)
      - Synopsys (stmmac): a lot of cleanups

   - Ethernet PHYs:
      - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
        driver
      - Support bcm63268 GPHY power control
      - Support for Micrel lan8842 PHY and PTP
      - Support for Aquantia AQR412 and AQR115

   - CAN:
      - a large CAN-XL preparation work
      - reorganize raw_sock and uniqframe struct to minimize memory
        usage
      - rcar_canfd: update the CAN-FD handling

   - WiFi:
      - extended Neighbor Awareness Networking (NAN) support
      - S1G channel representation cleanup
      - improve S1G support

   - WiFi drivers:
      - Intel (iwlwifi):
         - major refactor and cleanup
      - Broadcom (brcm80211):
         - support for AP isolation
      - RealTek (rtw88/89) rtw88/89:
         - preparation work for RTL8922DE support
      - MediaTek (mt76):
         - HW restart improvements
         - MLO support
      - Qualcomm/Atheros (ath10k):
         - GTK rekey fixes

   - Bluetooth drivers:
      - btusb: support for several new IDs for MT7925
      - btintel: support for BlazarIW core
      - btintel_pcie: support for _suspend() / _resume()
      - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"

* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
  net: stmmac: Add support for Allwinner A523 GMAC200
  dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
  Revert "Documentation: net: add flow control guide and document ethtool API"
  octeontx2-pf: fix bitmap leak
  octeontx2-vf: fix bitmap leak
  net/mlx5e: Use extack in set rxfh callback
  net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
  net/mlx5e: Introduce mlx5e_rss_init_params
  net/mlx5e: Remove unused mdev param from RSS indir init
  net/mlx5: Improve QoS error messages with actual depth values
  net/mlx5e: Prevent entering switchdev mode with inconsistent netns
  net/mlx5: HWS, Generalize complex matchers
  net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  selftests/net: add tcp_port_share to .gitignore
  Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
  net: add NUMA awareness to skb_attempt_defer_free()
  net: use llist for sd->defer_list
  net: make softnet_data.defer_count an atomic
  selftests: drv-net: psp: add tests for destroying devices
  selftests: drv-net: psp: add test for auto-adjusting TCP MSS
  ...
2025-10-02 15:17:01 -07:00
Eduard Zingerman
2ce61c63e7 selftests/bpf: Add tests for rejection of ALU ops with negative offsets
Define a simple program using MOD, DIV, ADD instructions,
make sure that the program is rejected if invalid offset
field is used for instruction.

These are test cases for
commit 55c0ced59f ("bpf: Reject negative offsets for ALU ops")
Link: https://lore.kernel.org/all/tencent_70D024BAE70A0A309A4781694C7B764B0608@qq.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 13:34:08 -07:00
Linus Torvalds
05a54fa773 sound updates for 6.18-rc1
It's been relatively calm in this cycle from the feature POV, but
 there were lots of cleanup works in the wide-range of code for
 converting with the auto-cleanup macros like guard().
 The mostly user-visible changes are the support of a couple of new
 compress-offload API extensions, and the support of new ASoC codec /
 platform drivers as well as USB-audio quirks.
 
 Here we go with some highlights:
 
 Core:
  - Compress-offload API extension for 64bit timestamp support
  - Compress-offload API extension for OPUS codec support
  - Workaround for PCM locking issue with PREEMPT_RT and softirq
  - KCSAN warning fix for ALSA sequencer core
 
 ASoC:
  - Continued cleanup works for ASoC core APIs
  - Lots of cleanups and conversions of DT bindings
  - Substantial maintainance work on the Intel AVS drivers
  - Support for Qualcomm Glymur and PM4125, Realtek RT1321, Shanghai
    FourSemi FS2104/5S, Texas Instruments PCM1754 and TAS2783A
  - Remove support for TI WL1273 for old Nokia systems
 
 USB-audio:
  - Support for Tascam US-144mkII, Presonus S1824c support
  - More flexible quirk option handling
  - Fix for USB MIDI timer bug triggered by fuzzer
 
 Others:
  - A large series of cleanups with guard() & co macros over (non-ASoC)
    sound drivers (PCI, ISA, HD-audio, USB-audio, drivers, etc)
  - TAS5825 HD-audio side-codec support
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmjby2wOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+MKQ//cD8GYtfavLC6/mpW2jftcm08Zhzxi8AyuVzC
 0Wr2kwdNvK1F6zhzkXOx6TEQz0PAXzdVsqkmxsBEHGKHxGVNYr5wQ2ITqkm9eR6h
 el2JhajzLM988kMgJi/hGsTPxz2wJk4wuhUT3kST5GHpecPC/X/3r4WRIpMBoDBA
 y9KjEGJoSZCg7uBVoWBDRSHRpvbgmKrp4QpMCfcZ+DGy8fA3t+WGL1py9xxYQYug
 nGf4Q6Qto9Gj/lVefhm85vd1B+AHN4AgS21KLAyOGBIpu7kPmw1ujG/A8tsEbhaU
 DHSZusqqsWEHIy2XYBoVOeMaYcB94Ik3A4snzUe5/TbQkmM4MCQbhJ0euiGNHAzB
 e/mNUP0lFbX595gAK8AVsVnvz7Jzw00ov9b4w66g5Xq/EjM5pb0R8RyPooEujbw2
 ZbNI5SHuJ1i7v3Kqfoh6pUPPu2d4dlLxY68xDAID/DvP2DHcjYyiVE+RjLX/4b7D
 RzDKPqt0Pmckwx0FQyRuGCQWnqyoQ93bp84R29PxfT15Lot6gHdJh84guYqMLnWH
 B7VdV++O9UeS+6DmZfveDTvU+NcuRCGnnaadyJeuqB9qEqqbrXsTzr4XCZE1Hwrs
 WY9QmQeVYU4SPlT6r22Q19dQLToFRQoemHMTc4Q+hRx7YAKebynXQIzM1s3d+D/X
 NviZ8nk=
 =3Dqh
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "It's been relatively calm in this cycle from the feature POV, but
  there were lots of cleanup works in the wide-range of code for
  converting with the auto-cleanup macros like guard().

  The mostly user-visible changes are the support of a couple of new
  compress-offload API extensions, and the support of new ASoC codec /
  platform drivers as well as USB-audio quirks.

  Here we go with some highlights:

  Core:
   - Compress-offload API extension for 64bit timestamp support
   - Compress-offload API extension for OPUS codec support
   - Workaround for PCM locking issue with PREEMPT_RT and softirq
   - KCSAN warning fix for ALSA sequencer core

  ASoC:
   - Continued cleanup works for ASoC core APIs
   - Lots of cleanups and conversions of DT bindings
   - Substantial maintainance work on the Intel AVS drivers
   - Support for Qualcomm Glymur and PM4125, Realtek RT1321, Shanghai
     FourSemi FS2104/5S, Texas Instruments PCM1754 and TAS2783A
   - Remove support for TI WL1273 for old Nokia systems

  USB-audio:
   - Support for Tascam US-144mkII, Presonus S1824c support
   - More flexible quirk option handling
   - Fix for USB MIDI timer bug triggered by fuzzer

  Others:
   - A large series of cleanups with guard() & co macros over (non-ASoC)
     sound drivers (PCI, ISA, HD-audio, USB-audio, drivers, etc)
   - TAS5825 HD-audio side-codec support"

* tag 'sound-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (454 commits)
  ALSA: usb-audio: don't hardcode gain for output channel of Presonus Studio
  ALSA: usb-audio: add the initial mix for Presonus Studio 1824c
  ALSA: doc: improved docs about quirk_flags in snd-usb-audio
  ALSA: usb-audio: make param quirk_flags change-able in runtime
  ALSA: usb-audio: improve module param quirk_flags
  ALSA: usb-audio: add two-way convert between name and bit for QUIRK_FLAG_*
  ALSA: usb-audio: fix race condition to UAF in snd_usbmidi_free
  ALSA: usb-audio: add mono main switch to Presonus S1824c
  ALSA: compress: document 'chan_map' member in snd_dec_opus
  ASoC: cs35l56: Add support for CS35L56 B2 silicon
  ASoC: cs35l56: Set fw_regs table after getting REVID
  ALSA: hda/realtek: Add quirk for HP Spectre 14t-ea100
  ASoc: tas2783A: Fix an error code in probe()
  ASoC: tlv320aic3x: Fix class-D initialization for tlv320aic3007
  ASoC: qcom: sc8280xp: use sa8775p/ subdir for QCS9100 / QCS9075
  ASoC: stm32: sai: manage context in set_sysclk callback
  ASoC: renesas: msiof: ignore 1st FSERR
  ASoC: renesas: msiof: Add note for The possibility of R/L opposite Capture
  ASoC: renesas: msiof: setup both (Playback/Capture) in the same time
  ASoC: renesas: msiof: tidyup DMAC stop timing
  ...
2025-10-02 11:37:19 -07:00
Linus Torvalds
e1b1d03cee for-6.18/block-20250929
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmjbLCgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoY0D/9J+11BC88pBxCrLKv/V2TwCNokRMi0dU3L
 r3EUdA46k0oXmvb6ueZqIcfY2e+IX7rdQkaRbh1zRdsNejqHo4548C3ePWGdBAcM
 OdNEGfpehO0aD0td1+mK/NxoJMLhbs5QraPanz+SOkGZOKeF+vGCga5PUDivsr5J
 16T9yb7i+isENLdAc2RJbZVyAphqHQlo5GHi5ZIKOVi5cNt8GU/q2sQl7NYmGvHd
 aq37svvZHFOhLRajP959Fw9WOxEYITewzQ4UYf1FZjUodJUxO+vCnP0ooBQRlyu8
 1B4PYWwSE+Vn3GkQE0Om+mzo9AVPOiLmoAWGxdgJBMyEkZndocr46XEslXOufQ1Z
 T3Gu19G6jCxcyByNVhjVnaajYKmvSQAy1w75m4XlfqTRm4f9Om+LAJavUk3RuaOL
 7lXKQ7Ql1/Tby9Jmf8afjYYXXotNDNku6rz2P3qtOwAA26mNJfgVt0rO+8XGRDe9
 ioLbCkTjslYMc/Oh4jSsbrspsVALbaQMq/Dmah8k0EWb4QAHVgCJyGBoff3hOboI
 jD6B1enaKOQVgcjWcjm/FjOk3jv2h3v4X26YWQZTvEc/1PnSnST78Zi/ePhzDdmt
 sBALUAS37TfTgNMzrhbHl5Zs13k0C0XyANuayuKuo5hlNnC1wbdap+5FZJOmpuOB
 YT+VkYnaOA==
 =kOmc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - NVMe pull request via Keith:
     - FC target fixes (Daniel)
     - Authentication fixes and updates (Martin, Chris)
     - Admin controller handling (Kamaljit)
     - Target lockdep assertions (Max)
     - Keep-alive updates for discovery (Alastair)
     - Suspend quirk (Georg)

 - MD pull request via Yu:
     - Add support for a lockless bitmap.

       A key feature for the new bitmap are that the IO fastpath is
       lockless. If a user issues lots of write IO to the same bitmap
       bit in a short time, only the first write has additional overhead
       to update bitmap bit, no additional overhead for the following
       writes.

       By supporting only resync or recover written data, means in the
       case creating new array or replacing with a new disk, there is no
       need to do a full disk resync/recovery.

 - Switch ->getgeo() and ->bios_param() to using struct gendisk rather
   than struct block_device.

 - Rust block changes via Andreas. This series adds configuration via
   configfs and remote completion to the rnull driver. The series also
   includes a set of changes to the rust block device driver API: a few
   cleanup patches, and a few features supporting the rnull changes.

   The series removes the raw buffer formatting logic from
   `kernel::block` and improves the logic available in `kernel::string`
   to support the same use as the removed logic.

 - floppy arch cleanups

 - Reduce the number of dereferencing needed for ublk commands

 - Restrict supported sockets for nbd. Mostly done to eliminate a class
   of issues perpetually reported by syzbot, by using nonsensical socket
   setups.

 - A few s390 dasd block fixes

 - Fix a few issues around atomic writes

 - Improve DMA interation for integrity requests

 - Improve how iovecs are treated with regards to O_DIRECT aligment
   constraints.

   We used to require each segment to adhere to the constraints, now
   only the request as a whole needs to.

 - Clean up and improve p2p support, enabling use of p2p for metadata
   payloads

 - Improve locking of request lookup, using SRCU where appropriate

 - Use page references properly for brd, avoiding very long RCU sections

 - Fix ordering of recursively submitted IOs

 - Clean up and improve updating nr_requests for a live device

 - Various fixes and cleanups

* tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits)
  s390/dasd: enforce dma_alignment to ensure proper buffer validation
  s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request
  ublk: remove redundant zone op check in ublk_setup_iod()
  nvme: Use non zero KATO for persistent discovery connections
  nvmet: add safety check for subsys lock
  nvme-core: use nvme_is_io_ctrl() for I/O controller check
  nvme-core: do ioccsz/iorcsz validation only for I/O controllers
  nvme-core: add method to check for an I/O controller
  blk-cgroup: fix possible deadlock while configuring policy
  blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path
  blk-mq: Fix more tag iteration function documentation
  selftests: ublk: fix behavior when fio is not installed
  ublk: don't access ublk_queue in ublk_unmap_io()
  ublk: pass ublk_io to __ublk_complete_rq()
  ublk: don't access ublk_queue in ublk_need_complete_req()
  ublk: don't access ublk_queue in ublk_check_commit_and_fetch()
  ublk: don't pass ublk_queue to ublk_fetch()
  ublk: don't access ublk_queue in ublk_config_io_buf()
  ublk: don't access ublk_queue in ublk_check_fetch_buf()
  ublk: pass q_id and tag to __ublk_check_and_get_req()
  ...
2025-10-02 10:16:56 -07:00
Linus Torvalds
c0f53f0d2e linux_kselftest-next-6.18-rc1
- Fixes watchdog test to exit when device doesn't support keep alive
 - Fix missing header build complaints during out of tree build
 - A few minor fixes to git ignore
 - MAINTAINERS file change to update dma_map_benchmark
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmjbD5YACgkQCwJExA0N
 QxzUPw//U/sUb/+3eAOvF3A698j54u7Phtm9VTzyXdGmZ3tXDsnkP1spRoK32F5d
 kO2qhTIGPoo8FKS8CDL8a8UUr3TvqEd7+8Hcv37s/JluYCDnLuRJ87XTAXoH2vWy
 LkLD41Gt2BySH//99ciBB+N3DuAKc/+Ln5YXgPTIkzOEm+8a6Ux0lW5RjF+RzkH9
 jGXyoSFdeQcWKEz/CJ9Mp7gwgwig9KTAIL62hB69aP35Wjbw+le1POCs6bo4UuzU
 A+XdNit6Hau/JKS+Sw2/lpMcMz+Aev4gdFo2FukPuJSLJMR7KVNyiCVuIHIVI1oC
 v4yj5/QfUgZzWojweI9IVvt4BCuiSh3ke26cxO+xYSWTIlSbBDpAYRe3AVSp5xGq
 aPhaviqlgDLzxVuXTmGCyrxhpybgxGOQGmo6NulJ8wGaQ1GLllptB6hKnKrLBTrp
 jIwKUVBcl/xIeP4sd8vIvTNTyL56/XJYnouZYF+IXkX/8BY9L4OJycjPNgvZ0+IK
 LsTVXb2gdMghEGUFR2XNwoQljpyPSUFFcqoteOCaVhHaeD0CEAGBVSQNa7m3MtkO
 0q6pu/XfQAziVd6/dllDGueWTjzqvNChTipSe9dSgolZgKrvc2t4bw48rB+oORPe
 f/cBIWjpyZlwpggb8aSC4KAIkTG8ugz4EIOV6t3a7IAx6K71zwQ=
 =Z4UE
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:

 - Fix watchdog test to exit when device doesn't support keep-alive

 - Fix missing header build complaints during out of tree build

 - A few minor fixes to git ignore

 - MAINTAINERS file change to update dma_map_benchmark

* tag 'linux_kselftest-next-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  MAINTAINERS: add myself and Barry to dma_map_benchmark maintainers
  selftests/kexec: Ignore selftest binary
  selftests: always install UAPI headers to the correct directory
  selftests/kselftest_harness: Add harness-selftest.expected to TEST_FILES
  selftests: watchdog: skip ping loop if WDIOF_KEEPALIVEPING not supported
2025-10-01 20:41:10 -07:00
Linus Torvalds
30bbcb4470 linux_kselftest-kunit-6.18-rc1
- A seven patch series adds a new parameterized test features
   KUnit parameterized tests currently support two primary methods for
   getting parameters:
     1.  Defining custom logic within a generate_params() function.
     2.  Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC()
         macros with a pre-defined static array and passing
         the created *_gen_params() to KUNIT_CASE_PARAM().
 
     These methods present limitations when dealing with dynamically
     generated parameter arrays, or in scenarios where populating parameters
     sequentially via generate_params() is inefficient or overly complex.
 
     These limitations are fixed with a parameterized test method.
 
 - Fixes issues in kunit build artifacts cleanup,
 - Fixes parsing skipped test problem in kselftest framework,
 - Enables PCI on UML without triggering WARN()
 - a few other fixes and adds support for new configs such as MIPS
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmjbB0QACgkQCwJExA0N
 QxxpUQ/8CslEjTv2+/LA122eGDtg8np61W1MTNEQslYyiobhZ9CXeFw4yJlTTb0w
 hShFK1pGAWgkVCVKIJOaeItY0hF3BIxyVtlQxDUKvukpMnLvZRI0KhG7p8aYnCq+
 jfQRy4gqwjaHyLQekQ1v6vRdHdTfh5mB6OUOCYa7QPQuKhOkBOaq2ZJ+9eFnkPXl
 KUGTcC3pL0jQQmaSgo7zFUbdGSq0JZkNbpMj0fAYB0zs+MSpfkAnQYEw3qaxU1/Z
 lu+CQdom+77Kt0r7UyEb6xUBeRveqYAWv6oFKq3CBzhlEGCkqVv6zgNg48qMC0QO
 cVW0E61u8bVAalhb7hOT3QsEp1k01Lr2N9/VBLP4LRb2HMlD2qlXDo6ftBjNgx/y
 m5Kukh1wQoOTaTJekdDMlaRXwApdn3MegheXE83n4dr44C5oyhlR8fFk/0Y6gSEU
 RKUg8ZwUNUhCdw4VgBn8zoSkK18D74feRoustmuMS00I/8to3iGvQ7kn3nz2fPMb
 AaZ6a2K95gk9TrD4UZdHF1aPHDudn3e8g6LdSOV/NoRNI9H+fXfhvxcLwW8WCqOw
 u5qfTxNUk4mPQrCs9hcDnWm9Ppuu5YeY989rKkA7j4z71dhyMOPynpOM+vXRijB8
 9zRcvyls87tgDAquIZvTp6Q/oEmi60OF40DRC9MjgOw0Iw7feIc=
 =VfHu
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit updates from Shuah Khan:

 - New parameterized test features

   KUnit parameterized tests supported two primary methods for getting
   parameters:

    - Defining custom logic within a generate_params() function.

    - Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC() macros
      with a pre-defined static array and passing the created
      *_gen_params() to KUNIT_CASE_PARAM().

   These methods present limitations when dealing with dynamically
   generated parameter arrays, or in scenarios where populating
   parameters sequentially via generate_params() is inefficient or
   overly complex.

   These limitations are fixed with a parameterized test method

 - Fix issues in kunit build artifacts cleanup

 - Fix parsing skipped test problem in kselftest framework

 - Enable PCI on UML without triggering WARN()

 - a few other fixes and adds support for new configs such as MIPS

* tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Extend kconfig help text for KUNIT_UML_PCI
  rust: kunit: allow `cfg` on `test`s
  kunit: qemu_configs: Add MIPS configurations
  kunit: Enable PCI on UML without triggering WARN()
  Documentation: kunit: Document new parameterized test features
  kunit: Add example parameterized test with direct dynamic parameter array setup
  kunit: Add example parameterized test with shared resource management using the Resource API
  kunit: Enable direct registration of parameter arrays to a KUnit test
  kunit: Pass parameterized test context to generate_params()
  kunit: Introduce param_init/exit for parameterized test context management
  kunit: Add parent kunit for parameterized test context
  kunit: tool: Accept --raw_output=full as an alias of 'all'
  kunit: tool: Parse skipped tests from kselftest.h
  kunit: Always descend into kunit directory during build
2025-10-01 19:15:11 -07:00
Linus Torvalds
f13ee7cc2d Thermal control updates for 6.18-rc1
- Add new thermal driver for the Renesas RZ/G3S SoC (Claudiu Beznea)
 
  - Add new thermal driver for the Renesas RZ/G3E SoC (John Madieu)
 
  - Add support for new platform power slider feature to the Intel
    int340x driver (Srinivas Pandruvada).
 
  - Add new Tegra114-specific SOCTHERM driver and document Tegra114
    SOCTHERM Thermal Management System in DT bindings (Svyatoslav Ryhel)
 
  - Add temperature sensor channel to thermal-generic-adc (Svyatoslav
    Ryhel)
 
  - Add support for per-SoC default trim values to the Renesas rcar_gen3
    thermal driver, use it for adding R-Car V4H default trim values, fix
    a comment typo in that driver and document Gen4 support in its
    Kconfig entry (Marek Vasut)
 
  - Fix mapping SoCs to generic Gen4 entry in the Renesas rcar_gen3
    thermal driver (Wolfram Sang)
 
  - Document the TSU unit in the r9a08g045-tsu and r9a09g047-tsu DT
    bindings (Claudiu Beznea, John Madieu)
 
  - Make LMH select QCOM_SCM and add missing IRQ includes to the
    qcom/lmh thermal driver (Dmitry Baryshkov)
 
  - Fix incorrect error message in the qcom/lmh thermal driver (Sumeet
    Pawnikar)
 
  - Add QCS615 compatible to tsens thermal DT bindings (Gaurav Kohli)
 
  - Document the Glymur temperature sensor in qcom-tsens thermal DT
    bindings (Manaf Meethalavalappu Pallikunhi)
 
  - Make k3_j72xx_bandgap thermal driver register the thermal sensor
    with hwmon (Michael Walle)
 
  - Tighten GRF requirements in the rockchip thermal DT bindings,
    silence a GRF warning in the rockchip thermal driver and unify
    struct rockchip_tsadc_chip format in it (Sebastian Reichel)
 
  - Update the Step-wise thermal governor to allow it to reduce the
    cooling level earlier if thermal zone temperature is dropping
    and clean it up (Rafael Wysocki)
 
  - Clean up the thermal testing code (Rafael Wysocki)
 
  - Assorted cleanups of thermal drivers (Jiapeng Chong, Salah Triki,
    Osama Abdelkader)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmjamT4SHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1BOwH/3ZL/GN+iJ7oFU/jOW4EktnLowCMgIKF
 DURhmrZRoqJzZUY8PXysUekXYmniT6Itzv2h4psKMihn/khtapmkRkE23exmR9Gy
 h5V4Nxbx4Ero6YfZ3KT4+AY1x2eRO6aDY29EdIAVeWacwOleTPhUvP70d5dZTHdy
 QYi21qZarrkG3uiiNrBfgRW2IAcvQ37G1dy6fxJrURhUG3E9u2P9C9REBwQfkxBj
 LxaV42oiQNU0S5+r4/0f1UbqR88FTh/X4xmpEFHK+zWewKYdYhXHBdCYlcegH7oO
 WwQW8u3ofR/COpFNPdWrpDb9fYia0fbcJ131CVmAy8RySVd4dBAPKDM=
 =okDy
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "These are mostly thermal driver updates, including new thermal drivers
  for Renesas RZ/G3S and Renesas RZ/G3E SoCs, a new power slider
  platform feature support in the Intel int340x thermal driver, a new
  Tegra114- specific SOCTHERM driver and more.

  There is also a Step-wise thermal governor update allowing it to start
  reducing cooling somewhat earlier if the temperature of the given
  thermal zone is falling down and a thermal testing code cleanup.

  Specifics:

   - Add new thermal driver for the Renesas RZ/G3S SoC (Claudiu Beznea)

   - Add new thermal driver for the Renesas RZ/G3E SoC (John Madieu)

   - Add support for new platform power slider feature to the Intel
     int340x driver (Srinivas Pandruvada).

   - Add new Tegra114-specific SOCTHERM driver and document Tegra114
     SOCTHERM Thermal Management System in DT bindings (Svyatoslav
     Ryhel)

   - Add temperature sensor channel to thermal-generic-adc (Svyatoslav
     Ryhel)

   - Add support for per-SoC default trim values to the Renesas
     rcar_gen3 thermal driver, use it for adding R-Car V4H default trim
     values, fix a comment typo in that driver and document Gen4 support
     in its Kconfig entry (Marek Vasut)

   - Fix mapping SoCs to generic Gen4 entry in the Renesas rcar_gen3
     thermal driver (Wolfram Sang)

   - Document the TSU unit in the r9a08g045-tsu and r9a09g047-tsu DT
     bindings (Claudiu Beznea, John Madieu)

   - Make LMH select QCOM_SCM and add missing IRQ includes to the
     qcom/lmh thermal driver (Dmitry Baryshkov)

   - Fix incorrect error message in the qcom/lmh thermal driver (Sumeet
     Pawnikar)

   - Add QCS615 compatible to tsens thermal DT bindings (Gaurav Kohli)

   - Document the Glymur temperature sensor in qcom-tsens thermal DT
     bindings (Manaf Meethalavalappu Pallikunhi)

   - Make k3_j72xx_bandgap thermal driver register the thermal sensor
     with hwmon (Michael Walle)

   - Tighten GRF requirements in the rockchip thermal DT bindings,
     silence a GRF warning in the rockchip thermal driver and unify
     struct rockchip_tsadc_chip format in it (Sebastian Reichel)

   - Update the Step-wise thermal governor to allow it to reduce the
     cooling level earlier if thermal zone temperature is dropping and
     clean it up (Rafael Wysocki)

   - Clean up the thermal testing code (Rafael Wysocki)

   - Assorted cleanups of thermal drivers (Jiapeng Chong, Salah Triki,
     Osama Abdelkader)"

* tag 'thermal-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (37 commits)
  thermal/drivers/renesas/rzg3e: Fix add thermal driver for the Renesas RZ/G3E SoC
  dt-bindings: thermal: qcom-tsens: Document the Glymur temperature Sensor
  thermal/drivers/renesas/rzg3e: Add thermal driver for the Renesas RZ/G3E SoC
  dt-bindings: thermal: r9a09g047-tsu: Document the TSU unit
  thermal/drivers/thermal-generic-adc: Add temperature sensor channel
  dt-bindings: thermal: rockchip: Tighten grf requirements
  thermal/drivers/rockchip: Shut up GRF warning
  thermal/drivers/rockchip: Unify struct rockchip_tsadc_chip format
  thermal/drivers/renesas/rzg3s: Add thermal driver for the Renesas RZ/G3S SoC
  dt-bindings: thermal: r9a08g045-tsu: Document the TSU unit
  thermal/drivers/k3_j72xx_bandgap: Register sensors with hwmon
  thermal/drivers/rcar_gen3: Fix mapping SoCs to generic Gen4 entry
  thermal/drivers/tegra: Add Tegra114 specific SOCTHERM driver
  dt-bindings: thermal: add Tegra114 soctherm header
  thermal/drivers/tegra/soctherm-fuse: Prepare calibration for Tegra114 support
  dt-bindings: thermal: Document Tegra114 SOCTHERM Thermal Management System
  thermal/drivers/rcar_gen3: Document Gen4 support in Kconfig entry
  thermal/drivers/rcar_gen3: Fix comment typo
  drivers/thermal/qcom/lmh: Fix incorrect error message
  thermal/drivers/qcom/lmh: Add missing IRQ includes
  ...
2025-10-01 16:33:14 -07:00
Eric Biggers
f09f57c746 selftests/bpf: Add test for libbpf_sha256()
Test that libbpf_sha256() calculates SHA-256 digests correctly.

Tested with:
    make -C tools/testing/selftests/bpf/
    ./tools/testing/selftests/bpf/test_progs -t sha256 -v

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 16:00:25 -07:00
KaFai Wan
8709c16852 selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP
Add a test case for BPF_NEG operation on CONST_PTR_TO_MAP. Tests if
BPF_NEG operation on map_ptr is rejected in unprivileged mode and is a
scalar value and do not trigger Oops in privileged mode.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251001191739.2323644-3-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:55:00 -07:00
Jiri Olsa
0c342bfc99 selftests/bpf: Fix realloc size in bpf_get_addrs
We will segfault once we call realloc in bpf_get_addrs due to
wrong size argument.

Fixes: 6302bdeb91 ("selftests/bpf: Add a kprobe_multi subtest to use addrs instead of syms")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:30 -07:00
Jiri Olsa
4b2b38ea20 selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict
Use proper 'called' variable name.

Fixes: ae28ed4578 ("Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/aN0JVRynHxqKy4lw@krava/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:30 -07:00
Jiri Olsa
f83fcec784 selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests
Commit 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall
invocations") addressed the issue that older libc may not have a gettid()
function call wrapper for the associated syscall.

The uprobe syscall tests got in from tip tree, using sys_gettid in there.

Fixes: 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall invocations")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:29 -07:00
Paolo Abeni
f1455695d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

tools/testing/selftests/drivers/net/bonding/Makefile
  87951b5664 selftests: bonding: add test for passive LACP mode
  c2377f1763 selftests: bonding: add test for LACP actor port priority

Adjacent changes:

drivers/net/ethernet/cadence/macb.h
  fca3dc859b net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf16 net: macb: Add TAPRIO traffic scheduling support

drivers/net/ethernet/cadence/macb_main.c
  fca3dc859b net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf16 net: macb: Add TAPRIO traffic scheduling support

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-01 10:14:49 +02:00
Linus Torvalds
50c19e20ed nolibc changes for v6.18
Only contains small bugfixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCaNmibgAKCRDB+h1jyw5b
 OP8VAQDzrMGp77iFvfAFRsFuGYXLpWgNd8aWzFVMmPz/Kts3yAD/X1RnJsak5l5g
 iiuZTPijrZeyQaO0uX4UV/XVHzxU2Ac=
 =IenX
 -----END PGP SIGNATURE-----

Merge tag 'nolibc-20250928-for-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc

Pull nolibc updates from Thomas Weißschuh:
 "Only small bugfixes and cleanups"

* tag 'nolibc-20250928-for-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc:
  tools/nolibc: add stdbool.h to nolibc includes
  tools/nolibc: make time_t robust if __kernel_old_time_t is missing in host headers
  selftests/nolibc: remove outdated comment about construct order
  selftests/nolibc: fix EXPECT_NZ macro
  tools/nolibc: drop wait4() support
  kselftest/arm64: tpidr2: Switch to waitpid() over wait4()
  tools/nolibc: fold llseek fallback into lseek()
  tools/nolibc: remove __nolibc_enosys() fallback from fork functions
  tools/nolibc: remove __nolibc_enosys() fallback from dup2()
  tools/nolibc: remove __nolibc_enosys() fallback from *at() functions
  tools/nolibc: remove __nolibc_enosys() fallback from time64-related functions
  tools/nolibc: use tabs instead of spaces for indentation
  tools/nolibc: avoid error in dup2() if old fd equals new fd
  selftests/nolibc: always compile the kernel with GCC
  selftests/nolibc: don't pass CC to toplevel Makefile
  selftests/nolibc: deduplicate invocations of toplevel Makefile
  selftests/nolibc: be more specific about variables affecting nolibc-test
  tools/nolibc: fix error return value of clock_nanosleep()
2025-09-30 19:18:17 -07:00
Linus Torvalds
ae28ed4578 bpf-next-6.18
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v
 bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/
 UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX
 +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9
 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT
 tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG
 TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH
 Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB
 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy
 n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p
 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd
 c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg=
 =LeQi
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/marge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
2025-09-30 17:58:11 -07:00
Linus Torvalds
4b81e2eb9e Updates for the VDSO subsystem:
- Further consolidation of the VDSO infrastructure and the common data
     store.
 
   - Simplification of the related Kconfig logic
 
   - Improve the VDSO selftest suite
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaUJgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoT5VD/9l+TP42vJzwygPArefVM9QijAc4k+g
 iptaz5M+Lhk0BlwMUu502Tp1zcwGq5529LlE2P72DuYoZRtZKSFCGES2mye0JiAs
 A5Vk8d1zicg3IIGjYgw39shA9ghzuzV3gZdgHVqQ6024uSeKxUY3SqF156GGfPFG
 azXWDV7i4RGQvKLm2ye8m7JNYCE3P9f8Wh+GvPaq7qBlW1MsMCWC4FP5qu4iStf3
 mol9uA/EM2zSDwkpIpq6P5L2TpIH1dGDNGxVf/HG3J16UhNXzzXv/rxuKHKPNGWm
 MQZ8WdPjF+uwdUsNXaImC6aCE0hoQ1Ga7GE2YA3pMQYfySeiJ/fA5I1KfhlUCiHD
 oUgnb6PQfZjaBDrIODKcrFHAYc5pTOyJXnJ6q1Lc1yrOfoVcIpMaRXHO/phCeuwI
 61gfb9ggbFEH/7cL26Z9XGSqh4BDCKVP5Z+y98OnNEzV74ScpX13/szB3VgdN6LW
 tmTOeMPYCo4FQoPw8OV1xkeYOKD2JyqH+garnxDZNFjaVaeUkzlAMKz+1JUjxRZj
 UpUhZ8y5KFVjgKkqZHxhucr9cnwV/xd4n+IzZeG4l6XZp4VyirG+5QpAikGYe+Jq
 WFU4Ao1xvevyizrJEhoGJKznwtrNrsy6AQ+wLQxYiRR4aTnng8n9T/ry2DVvhJf8
 diYWxuRTR7KMxg==
 =i3UB
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO updates from Thomas Gleixner:

 - Further consolidation of the VDSO infrastructure and the common data
   store

 - Simplification of the related Kconfig logic

 - Improve the VDSO selftest suite

* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: vDSO: Drop vdso_test_clock_getres
  selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
  selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
  selftests: vDSO: vdso_test_abi: Use explicit indices for name array
  selftests: vDSO: vdso_test_abi: Drop clock availability tests
  selftests: vDSO: vdso_test_abi: Use ksft_finished()
  selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
  selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
  vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
  vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
  vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
  vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
  vdso: Drop kconfig GENERIC_COMPAT_VDSO
  vdso: Drop kconfig GENERIC_VDSO_32
  riscv: vdso: Untangle Kconfig logic
  time: Build generic update_vsyscall() only with generic time vDSO
  vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
  vdso: Move ENABLE_COMPAT_VDSO from core to arm64
  ARM: VDSO: Remove cntvct_ok global variable
  vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30 16:58:21 -07:00
Linus Torvalds
c574fb2ed7 A set of updates for futexes and related selftests:
- Plug the ptrace_may_access() race against a concurrent exec() which
     allows to pass the check before the target's process transition in
     exec() by taking a read lock on signal->ext_update_lock.
 
   - A large set of cleanups and enhancement to the futex selftests. The
     bulk of changes is the conversion to the kselftest harness.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaS8QTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYof44EAC7amBzdJlaluVjLTpCV07G8K4bDkzz
 iN5vOm+vrxJlg7kq9L4CffuGKS8yFj7sgNkSexOv/+JUFp/E9WpV3CT+kqJNm71x
 bDPDAFR3reZQ7u27dzpciOTgtCEnq+hqeAovx2c1v2GuESvnTfqWTFSQBLfzqGW7
 leCiaR6VJ87+RB10ZrlxJDdrFnVbKBa48k1dZjy97ivkw+SUS5xSEadSBG2vw7Rd
 C+SQDa8Hhh7xAMJvySWkES6rfzZvC/ubkpq5DmjRZvHu/wS4wDNaKj06wEtbLxEA
 5eaemJVBeEbjC6GNBgba6cUmtrT45PQG7wNFoPmgJsO0nlnfwfmm713CAa9vK+dT
 MTUnGWKSpf0Vj1yHkzLeHer226jqfOe1lA4anRzD6FoZy2XaJiNCymQysfIrIWTG
 BCcA8za3W4PTiUHxDe5JGsbA4f57xDV+bL0SmjaXv5wSo/FX21iJBisBI2FFl531
 9XlgU3ssvUUfQfH6bPIdgcZrWD2Uc9W5aj2dZv+zN6LT8EhvLaQwcM2bxWbuJfWu
 wmU07LMZS9zVrC8B6T3KvrYwnIuETWRZiPhenDy+Shtn8UM1N/MmlNNmAEvdOYue
 V3o1BTm9wDWfSMJAq7q7erjl15QOgUyOpte1gTuzDbqtcWj132q31HDegtxlGIyu
 NLOx8otYdLEOqQ==
 =vUpN
 -----END PGP SIGNATURE-----

Merge tag 'locking-futex-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull futex updates from Thomas Gleixner:
 "A set of updates for futexes and related selftests:

   - Plug the ptrace_may_access() race against a concurrent exec() which
     allows to pass the check before the target's process transition in
     exec() by taking a read lock on signal->ext_update_lock.

   - A large set of cleanups and enhancement to the futex selftests. The
     bulk of changes is the conversion to the kselftest harness"

* tag 'locking-futex-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  selftest/futex: Fix spelling mistake "boundarie" -> "boundary"
  selftests/futex: Remove logging.h file
  selftests/futex: Drop logging.h include from futex_numa
  selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h
  selftests/futex: Refactor futex_priv_hash with kselftest_harness.h
  selftests/futex: Refactor futex_waitv with kselftest_harness.h
  selftests/futex: Refactor futex_requeue with kselftest_harness.h
  selftests/futex: Refactor futex_wait with kselftest_harness.h
  selftests/futex: Refactor futex_wait_private_mapped_file with kselftest_harness.h
  selftests/futex: Refactor futex_wait_unitialized_heap with kselftest_harness.h
  selftests/futex: Refactor futex_wait_wouldblock with kselftest_harness.h
  selftests/futex: Refactor futex_wait_timeout with kselftest_harness.h
  selftests/futex: Refactor futex_requeue_pi_signal_restart with kselftest_harness.h
  selftests/futex: Refactor futex_requeue_pi_mismatched_ops with kselftest_harness.h
  selftests/futex: Refactor futex_requeue_pi with kselftest_harness.h
  selftests: kselftest: Create ksft_print_dbg_msg()
  futex: Don't leak robust_list pointer on exec race
  selftest/futex: Compile also with libnuma < 2.0.16
  selftest/futex: Reintroduce "Memory out of range" numa_mpol's subtest
  selftest/futex: Make the error check more precise for futex_numa_mpol
  ...
2025-09-30 16:07:10 -07:00
Linus Torvalds
1d17e808cf Two fixes for RSEQ:
1) Protect the event mask modification against the membarrier() IPI as
      otherwise the RmW operation is unprotected and events might be lost.
 
   2) Fix the weak symbol reference in rseq selftests
 
      The current weak RSEQ symbols definitions which were added to allow
      static linkage are not working correctly as the effectively re-define
      the glibc symbols leading to multiple versions of the symbols when
      compiled with -fno-common. Mark them as 'extern' to convert them from
      weak symbol definitions to weak symbol references. That works with
      static and dynamic linkage independent of -fcommon and -fno-common.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaQc4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTj1D/0Un97b1GJRNUjmvrhSE8uFy5RxgYdX
 R2dJkfuQSF0fx4MzcrUg9b2KeklwDkYtq0YkmbStQeihp4jYpgODNFAdFdB4Cfhv
 e1JfgQ6iTS9zWq625ImygHNZv5NOiDBxf9jtsRTUA3lv804taoS/jHost4Qi9HwO
 jO9YoBrSmxZIZA1cPO9mA/AEpAdhKFxpwZK6yNNA4lH83gVhhSXKyEOFJQ0nWlZJ
 pycEmwM+v9iW67QZEbWkEBPukflnMXYtcUDLmawYDsMZ2gsB4yPYvZwHNTeiB7gu
 n0dfZH4jfw4DkO7MuU1CV1xlXhTO4sJQjmRsJuu2ypnZFVPh6iR+J2DxnZ5PcgvR
 LifuEa/+mN/LM5/gjohDJwny10EXysrEDJ/vZUR4BBTdTsWK6FwLdSnTuP8WF3wo
 LyY+PfeUmosHYOAa6Q8pLJUJzgXHtzlJLjVhYgb61dQjlgFuRB/0ksA8VoVeVhSz
 6A5v/rcVoZIhRj/fxzsJOXtfP24ghIXtRFyfeDeNsWWlqEuL/X28xYf45YBjws7F
 zApbCqeg+JYwJRaWmCDxjmmLsyMvpH3yujvlPZxuit6TX2PfxYG5CNpIxt192wZ2
 fFk8qcsgJ5x/9ah5mpio7wHDESiDkdaPanYA+ercdXogbzI/8Y7fUua8RgJKm4Mj
 3eo3tY3e8q9A3A==
 =VV+N
 -----END PGP SIGNATURE-----

Merge tag 'core-rseq-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull rseq updates from Thomas Gleixner:
 "Two fixes for RSEQ:

   - Protect the event mask modification against the membarrier() IPI as
     otherwise the RmW operation is unprotected and events might be lost

   - Fix the weak symbol reference in rseq selftests

     The current weak RSEQ symbols definitions which were added to allow
     static linkage are not working correctly as they effectively
     re-define the glibc symbols leading to multiple versions of the
     symbols when compiled with -fno-common.

     Mark them as 'extern' to convert them from weak symbol definitions
     to weak symbol references. That works with static and dynamic
     linkage independent of -fcommon and -fno-common"

* tag 'core-rseq-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq/selftests: Use weak symbol reference, not definition, to link with glibc
  rseq: Protect event mask against membarrier IPI
2025-09-30 15:06:33 -07:00
Linus Torvalds
e4dcbdff11 Performance events updates for v6.18:
Core perf code updates:
 
  - Convert mmap() related reference counts to refcount_t. This
    is in reaction to the recently fixed refcount bugs, which
    could have been detected earlier and could have mitigated
    the bug somewhat. (Thomas Gleixner, Peter Zijlstra)
 
  - Clean up and simplify the callchain code, in preparation
    for sframes. (Steven Rostedt, Josh Poimboeuf)
 
 Uprobes updates:
 
  - Add support to optimize usdt probes on x86-64, which
    gives a substantial speedup. (Jiri Olsa)
 
  - Cleanups and fixes on x86 (Peter Zijlstra)
 
 PMU driver updates:
 
  - Various optimizations and fixes to the Intel PMU driver
    (Dapeng Mi)
 
 Misc cleanups and fixes:
 
  - Remove redundant __GFP_NOWARN (Qianfeng Rong)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWpGIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iHvxAAvO8qWbbhUdF3EZaFU0Wx6oh5KBhImU49
 VZ107xe9llA0Szy3hIl1YpdOQA2NAHtma6We/ebonrPVTTkcSCGq8absc+GahA3I
 CHIomx2hjD0OQ01aHvTqgHJUdFUQQ0yzE3+FY6Tsn05JsNZvDmqpAMIoMQT0LuuG
 7VvVRLBuDXtuMtNmGaGCvfDGKTZkGGxD6iZS1iWHuixvVAz4IECK0vYqSyh31UGA
 w9Jwa0thwjKm2EZTmcSKaHSM2zw3N8QXJ3SNPPThuMrtO6QDz2+3Da9kO+vhGcRP
 Jls9KnWC2wxNxqIs3dr80Mzn4qMplc67Ekx2tUqX4tYEGGtJQxW6tm3JOKKIgFMI
 g/KF9/WJPXp0rVI9mtoQkgndzyswR/ZJBAwfEQu+nAqlp3gmmQR9+MeYPCyNnyhB
 2g22PTMbXkihJmRPAVeH+WhwFy1YY3nsRhh61ha3/N0ULXTHUh0E+hWwUVMifYSV
 SwXqQx4srlo6RJJNTji1d6R3muNjXCQNEsJ0lCOX6ajVoxWZsPH2x7/W1A8LKmY+
 FLYQUi6X9ogQbOO3WxCjUhzp5nMTNA2vvo87MUzDlZOCLPqYZmqcjntHuXwdjPyO
 lPcfTzc2nK1Ud26bG3+p2Bk3fjqkX9XcTMFniOvjKfffEfwpAq4xRPBQ3uRlzn0V
 pf9067JYF+c=
 =sVXH
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Core perf code updates:

   - Convert mmap() related reference counts to refcount_t. This is in
     reaction to the recently fixed refcount bugs, which could have been
     detected earlier and could have mitigated the bug somewhat (Thomas
     Gleixner, Peter Zijlstra)

   - Clean up and simplify the callchain code, in preparation for
     sframes (Steven Rostedt, Josh Poimboeuf)

  Uprobes updates:

   - Add support to optimize usdt probes on x86-64, which gives a
     substantial speedup (Jiri Olsa)

   - Cleanups and fixes on x86 (Peter Zijlstra)

  PMU driver updates:

   - Various optimizations and fixes to the Intel PMU driver (Dapeng Mi)

  Misc cleanups and fixes:

   - Remove redundant __GFP_NOWARN (Qianfeng Rong)"

* tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value
  uprobes/x86: Return error from uprobe syscall when not called from trampoline
  perf: Skip user unwind if the task is a kernel thread
  perf: Simplify get_perf_callchain() user logic
  perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
  perf: Have get_perf_callchain() return NULL if crosstask and user are set
  perf: Remove get_perf_callchain() init_nr argument
  perf/x86: Print PMU counters bitmap in x86_pmu_show_pmu_cap()
  perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
  perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48)
  perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
  perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
  perf/x86/intel: Use early_initcall() to hook bts_init()
  uprobes: Remove redundant __GFP_NOWARN
  selftests/seccomp: validate uprobe syscall passes through seccomp
  seccomp: passthrough uprobe systemcall without filtering
  selftests/bpf: Fix uprobe syscall shadow stack test
  selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
  selftests/bpf: Add uprobe_regs_equal test
  selftests/bpf: Add optimized usdt variant for basic usdt test
  ...
2025-09-30 11:11:21 -07:00
Paolo Bonzini
12abeb81c8 KVM x86 CET virtualization support for 6.18
Add support for virtualizing Control-flow Enforcement Technology (CET) on
 Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
 
 CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
 Branch Tracking (IBT), that can be utilized by software to help provide
 Control-flow integrity (CFI).  SHSTK defends against backward-edge attacks
 (a.k.a. Return-oriented programming (ROP)), while IBT defends against
 forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).
 
 Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
 flow to unauthorized targets in order to execute small snippets of code,
 a.k.a. gadgets, of the attackers choice.  By chaining together several gadgets,
 an attacker can perform arbitrary operations and circumvent the system's
 defenses.
 
 SHSTK defends against backward-edge attacks, which execute gadgets by modifying
 the stack to branch to the attacker's target via RET, by providing a second
 stack that is used exclusively to track control transfer operations.  The
 shadow stack is separate from the data/normal stack, and can be enabled
 independently in user and kernel mode.
 
 When SHSTK is is enabled, CALL instructions push the return address on both the
 data and shadow stack. RET then pops the return address from both stacks and
 compares the addresses.  If the return addresses from the two stacks do not
 match, the CPU generates a Control Protection (#CP) exception.
 
 IBT defends against backward-edge attacks, which branch to gadgets by executing
 indirect CALL and JMP instructions with attacker controlled register or memory
 state, by requiring the target of indirect branches to start with a special
 marker instruction, ENDBRANCH.  If an indirect branch is executed and the next
 instruction is not an ENDBRANCH, the CPU generates a #CP.  Note, ENDBRANCH
 behaves as a NOP if IBT is disabled or unsupported.
 
 From a virtualization perspective, CET presents several problems.  While SHSTK
 and IBT have two layers of enabling, a global control in the form of a CR4 bit,
 and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
 respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
 Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
 option due to complexity, and outright disallowing use of XSTATE to context
 switch SHSTK/IBT state would render the features unusable to most guests.
 
 To limit the overall complexity without sacrificing performance or usability,
 simply ignore the potential virtualization hole, but ensure that all paths in
 KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
 hardware, and the guest has access to at least one of SHSTK or IBT.  I.e. allow
 userspace to advertise one of SHSTK or IBT if both are supported in hardware,
 even though doing so would allow a misbehaving guest to use the unadvertised
 feature.
 
 Fully emulating SHSTK and IBT would also require significant complexity, e.g.
 to track and update branch state for IBT, and shadow stack state for SHSTK.
 Given that emulating large swaths of the guest code stream isn't necessary on
 modern CPUs, punt on emulating instructions that meaningful impact or consume
 SHSTK or IBT.  However, instead of doing nothing, explicitly reject emulation
 of such instructions so that KVM's emulator can't be abused to circumvent CET.
 Disable support for SHSTK and IBT if KVM is configured such that emulation of
 arbitrary guest instructions may be required, specifically if Unrestricted
 Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
 is smaller than host.MAXPHYADDR.
 
 Lastly disable SHSTK support if shadow paging is enabled, as the protections
 for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
 that they can't be directly modified by software), i.e. would require
 non-trivial support in the Shadow MMU.
 
 Note, AMD CPUs currently only support SHSTK.  Explicitly disable IBT support
 so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
 in SVM requires KVM modifications.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXbisACgkQOlYIJqCj
 N/373w//ckB4c9MjS6eDRp+LtTXQfXyAs8eMcs9YTs7yD3uMvqcbaNuDsf1U2cI6
 i2qcuOdxlnKSJphn6oH2JKDWPjRAfHhCqmYghUPaJwgeYqsTfork9s8rzU2tC82q
 38mQ6BhAuOwa/plodvDp/+POEIoXUyexSoWX+cngGVTmFWdbfA4NNGjWMZOl1XG2
 qLBck6t+IxxUTs1Ij+OsexlAKdY7FcZZ85Ok6I/VE4/lITEhuTJkwkYdh8td3KK/
 IVVk1jb1Z7t8lGQ5fi3+N/D8iHJ/0ladmOux6Yxzw88uyj6XLIFOOFsdK09GyhUS
 QzV06syFkV2vU68VDYiOcMZIdeGmYR5jDpmy9N+o0s86YLU6rKKEaXRP7vW5yHj/
 99AU+DfRHvhqKwWyQ51B+rhr80F3EQrkZXI0QBr8KO7sseFZvZNNVozwKjSyZtNH
 VBhxjIlVQm5Z1rjucKjc573sONK95z9XUSZjYnCUwB1NH7VsvdULQmJBucCmzW/p
 9j49CpmShwggceV6LcYg4Miuvjl/bL1B8Go5Fg+1Fdg7L6Nepi16yywxHmyPqreJ
 Wx/6N0gqZ3LKDdl5CFYxAxvJoldJR6lbw/AGjvFkre8A+TGGRdz3uS9XXqGHvtbu
 W5wKhnvGov69lm4xYbxbI+rvxYmmQLm9SgQXel23icbKJ5kmE48=
 =zsBl
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 CET virtualization support for 6.18

Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).

CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI).  SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).

Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice.  By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.

SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations.  The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.

When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses.  If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.

IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH.  If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP.  Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.

From a virtualization perspective, CET presents several problems.  While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.

To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT.  I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.

Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT.  However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.

Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.

Note, AMD CPUs currently only support SHSTK.  Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
2025-09-30 13:37:14 -04:00
Paolo Bonzini
d05ca6b793 KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
    where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
    intercepts and trigger a variety of WARNs in KVM.
 
  - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
    supposed to exist for v2 PMUs.
 
  - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
 
  - Clean up KVM's vector hashing code for delivering lowest priority IRQs.
 
  - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
    actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
    in the process of doing so add fastpath support for TSC_DEADLINE writes on
    AMD CPUs.
 
  - Clean up a pile of PMU code in anticipation of adding support for mediated
    vPMUs.
 
  - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
    emulator support (KVM should never need to emulate the MSRs outside of
    forced emulation and other contrived testing scenarios).
 
  - Clean up the MSR APIs in preparation for CET and FRED virtualization, as
    well as mediated vPMU support.
 
  - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
    as KVM can't faithfully emulate an I/O APIC for such guests.
 
  - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
    for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
    response to PMU refreshes for guests with mediated vPMUs.
 
  - Misc cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXIr0ACgkQOlYIJqCj
 N/1bbhAAxHzxN7IcizgAYf1BZWMjRU4zJgwlkoGuBeH/IgUOODPjs93L9kyrzvVL
 tcFgIe9o5fZRGmUfyZbCKnJaQi/4u/2QPRSGhsYt7vyDjCoXzO5CJPMYIqDz5Z2r
 qg+GNMlLtWI8EbcDd4qT22SWC8GufoXFEQnX6PUNhasOHeKit5ye8wmttcG+zvYV
 KeIkPluddQkQ2JKyG53IFNmm1lkY05oAibv61hkxqUSwCIJKsQFuDjl4GVouAd/H
 eu0+pzNmzPUTQ/qJzr2cNL5Nqz08DGp2OCFFRO6bgXaWkvHnFG3EAEHlhTAUh92t
 LPJxmhb6R8SUc+z8rYTgyF/zVpgeJcJO7F44FrXa7r2iV58ds3TfuO53hVaEfyNp
 1GUMH0m8N2vfjtFyUVP1KwZHuFxiGKLd1wZ1h0yKpj1Eg1FjR2cEontqwH44tHn2
 ENq8MIbWIBhvCsz5fIbM4y591JSevJUrDlYu60Lz7VyXHAw8Cq92t/dN9O7oH5mJ
 pIyoracU1g0Q6bbATZYsOGhkCTYLtdelZaBb5AYIgQ+U4C1TA4GpgEBUSVH8HXDy
 kXzVqSFlL0v5rrFkBPjiNFb5WD3iLjJIM3DLGoNegOM8+79r/USGHUY+XU3z/kCH
 rV8JBlTnLBCrNOHEiHJUI2kwBQ9C9/l88X/VwvRUNv7SthuExSo=
 =9IB0
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.18

 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
   where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
   intercepts and trigger a variety of WARNs in KVM.

 - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
   supposed to exist for v2 PMUs.

 - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.

 - Clean up KVM's vector hashing code for delivering lowest priority IRQs.

 - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
   actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
   in the process of doing so add fastpath support for TSC_DEADLINE writes on
   AMD CPUs.

 - Clean up a pile of PMU code in anticipation of adding support for mediated
   vPMUs.

 - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
   emulator support (KVM should never need to emulate the MSRs outside of
   forced emulation and other contrived testing scenarios).

 - Clean up the MSR APIs in preparation for CET and FRED virtualization, as
   well as mediated vPMU support.

 - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
   as KVM can't faithfully emulate an I/O APIC for such guests.

 - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
   for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
   response to PMU refreshes for guests with mediated vPMUs.

 - Misc cleanups and minor fixes.
2025-09-30 13:36:41 -04:00
Paolo Bonzini
473badf5c4 KVM selftests changes for 6.18
- Add #DE coverage in the fastops test (the only exception that's guest-
    triggerable in fastop-emulated instructions).
 
  - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
    Forest (SRF) and Clearwater Forest (CWF).
 
  - Minor cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXfNMACgkQOlYIJqCj
 N/1Y2xAAgCTIpiarbgqbeyeH/ugTMmy861i00bdbzlMG5ViUSrMJ/XNBZcXFDoA5
 WUpnKVno0n7FItqYtt1UTpsw6RU2FjqoZhL8GAfgTzwkvSy1xF/tEm8VtyBcdzQ5
 V/V/H9wCINkGuAU/t7NdiAKOT8IqOuwIjQlv69UOHCzvCRA4OBr1h/d8o6DTGhpf
 zmdLNBoHYV4jTW/mWsKdgsOPgAXj3Id7OgjvNF5kq4KQ1jZ+DtMm9N+PW1M4PjA7
 7Bx6k0HwmeC9UXzsfpnfMWkmRtNEk5DEf2T01qFO5Ij/+JlXPkMw8wmkDx4udpRF
 0tVd6FxnGdy62NAxHETOs8ZOacRvs0SR4MHJRepIeO4Ey+4vTN3NEIu5/U24VwG+
 4oKKJ9XlYKHzY/RwgVtnqIPXcHixHt+1uGjX7dBnp8Wku5Oh9i1v3Hwbi0iDXKwu
 CdKCb4hry/PtKc4FkFhNFPIQTSKxrqj84GRGC/BHFdCSVLib0crFfFF+HHGsygnB
 BbkeOjPt/+DvflkI55EyxionDcojhjosuB5hHJoQr5RXwR0wfbDnYcIQ8ZeutMSR
 TiK2CpnEteHCKLDOF13Lydw9E1Byo9no62qzmmOPYpcuxS/VfXcKDK1BNzJ/WHti
 rrxy9mEWJ/tDsSDkq/MLLbWAhRldpmUumpfMLjseSay1jK2z6Gc=
 =A/nr
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-6.18' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.18

 - Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).

 - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).

 - Minor cleanups and improvements
2025-09-30 13:23:54 -04:00
Paolo Bonzini
6a13749717 LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
 2. Add sign extension with kernel MMIO/IOCSR emulation.
 3. Improve in-kernel IPI emulation.
 4. Improve in-kernel PCH-PIC emulation.
 5. Move kvm_iocsr tracepoint out of generic code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmjTd/wWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImep7+D/4uA7triOH1eB4TikB6NlF3T+ry
 1qLycMy+O8mm24UxT0217sqOtKiZHrglyJB+cCmxPNjq0vWX/F08VgNoh8+6feWL
 6X28YlJ8qcRGCTUxlaaleEGIiXt7lbbnZKFfiBn8Ibb9rHn3tE8V738Kzm+SV1Dr
 bSZlAGnAqp/pRI1UFBQ0T+GpqQz+UvDw8JCOJj5Vs5UylhDY3atPaNhLjfR2tkUh
 AFrqr87gKPHdJxmk//7u+e6QLGViBB9aO1fNMP6y8gViJRfkCEbbm8XYe0qe+SmO
 QpLKuHBEVo7C8vOzemEVieQX2VujDcGSDDRGCU3wKbIpbIQgmOGbbsKfrKf9FxaR
 8ieNyP3UExr2ZvYV9SOLqeLD2K2yox9EkO7tD2CM9kwev9HUtr+6e/OaIP3Sjth2
 rd7V47x/8bCdgt0grQXIxejebbO5NawnFjXlFS7M2SRQXLMtyfFbiuZVGw0kZ+nn
 rzMeodQVmGi518nmmc9YcyW0/R8qer9DaaiN1ybgVF/4ZSK+LhlZo7xv4Dv5bWuv
 ThR89Iz09xUmmGYDniAR6q3/zH/52lJAuNU4tQGwB6O/+z8qJR0tZFM86KnApyLU
 pGI3q1s0g9ZK9mouaM+jcV5/fzFGoTXGkIQqaaXOqob97klagudAdBWKN74YynT2
 rAU7sWXF6F9WsKX2sA==
 =keSO
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.18

1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
2025-09-30 13:23:44 -04:00
Paolo Bonzini
924ccf1d09 KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned
   delegation and pointer masking PMLEN features
 - Added ONE_REG interface for SBI FWFT extension
 - Added Zicbop and bfloat16 extensions for Guest/VM
 - Enabled more common KVM selftests for RISC-V such as
   access_tracking_perf_test, dirty_log_perf_test,
   memslot_modification_stress_test, memslot_perf_test,
   mmu_stress_test, and rseq_test
 - Added SBI v3.0 PMU enhancements in KVM and perf driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmjU1XEACgkQrUjsVaLH
 LAfahQ//aPSwG0jbPwRSmWtoqdm38VvKYuWOOxeCjKyas5w0cZPkMQCh3zV/Ks92
 YKNIe6wlDeLIlT+qIzAi5VLwHHiC3ggoJsuNhf7/qmJJvks+1quvl1clUDkVGbRu
 G+FKFkRpBa1HF2O3NASnvBnRLYzXa/wKOuArwsY5DEG4+irnJU3AegbqRuhyPBFr
 VX+LwpfUUHipFGg/0ivflsVBjcz42Y3i1VQ4oXzuHqsRn3Ig3eSy9dGTHIOqJ0A3
 leNbDUSdZxnMlj3Sfab7hH4Vxlr8kiCEgMogHyYpnlyPvG8zJobMWDE9Ziy86D7f
 130zMcHuF0ZkeuaKX+3o2L7yGNTnN/JNtns7VtClRShGqSA8Dtn2xLhnvyray7RH
 CIYjv//34z1BjWyCBfuN5kFIfX5K7cNQHlDP7RVfgw91k7rbCGCJF743nkxz1WBX
 98G05/Rnmn+KCtU8t2pRoG9Qkq+bE3iz8Ka8thmiUNciqO78nn/p7a6OEMcjn7jH
 C+VUNST519UBGCjLehBrCZuUOtrPRozHz7Adx3VTJT5wBX2hd/QLoNoMssEi9VmS
 1j/abh7HJcbe7aTUDWUhOK9I+bPHJRsbeNyL5r9jpoC2Xg5Njh/V3aQZ/uuLzO5+
 pYjciRrrmdZ30xkInulWI6wORHkmki34RRmsM9gqYIJBMV+Rcxg=
 =BCyU
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.18

- Added SBI FWFT extension for Guest/VM with misaligned
  delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
  access_tracking_perf_test, dirty_log_perf_test,
  memslot_modification_stress_test, memslot_perf_test,
  mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30 13:23:36 -04:00
Paolo Bonzini
924ebaefce KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 - Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 - Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 - Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 - Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 - Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 - Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 - Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 - Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmjVhSsACgkQI9DQutE9
 ekOSIg//YdbA/zo17GrLnJnlGdUnS0e3/357n3e5Lypx1UFRDTmNacpVPw4VG/jt
 eVpQn7AgYwyvKfCq46eD+hBBqNv1XTn4DeXttv7CmVqhCRythsEvDkTBWSt7oYUZ
 xfYXCMKNhqUElH4AbYYx3y7nb2E9/KVGr+NBn6Vf5c14OZ3MGVc/fyp4jM1ih5dR
 kcV2onAYohlIGvFEyZMBtJ+jkYkqfIbfxfqCL0RAET0aEBFcmM1aXybWZj47hlLM
 f2j+E6cFQ0ZzUt+3pFhT75wo43lHGtIFDjVd60uishyU+NXTVvqRmXDTRU4k546W
 18HHX1yijbzuXIatVhVRo2hIq3jKU37T9wtj46BejbDHRdAPENEyN/Qopm7rNS+X
 mCwOT7He6KR+H4rU6nFaTcsS7bNRCvIbZP9i9zb6NElbvXu5QnM8BUQsYFCDUa/n
 xtbtQlckbo/7zeoUsBDrGj2XmCf0d45FTHb7fdWOYEmMSmJhXYpUKdM4JcLyhKoQ
 DD0ox2S+pt2lwNw3XOSABdES0KJxCvDASAMIgn2h2sGpY8FxsBcVW/BufXopdafG
 UeInxWaILp2iCDM4tH2GLjKqlvMAOwcA+mAEZToXypxlJAnYA6J1pXCF8WEaM6+D
 BGrLli8Zd8JRs87byq6K7tp8oLNzZdliJp73j5jfOHTJA4MnvFI=
 =iAL3
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.18

- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
  allowing more registers to be used as part of the message payload.

- Change the way pKVM allocates its VM handles, making sure that the
  privileged hypervisor is never tricked into using uninitialised
  data.

- Speed up MMIO range registration by avoiding unnecessary RCU
  synchronisation, which results in VMs starting much quicker.

- Add the dump of the instruction stream when panic-ing in the EL2
  payload, just like the rest of the kernel has always done. This will
  hopefully help debugging non-VHE setups.

- Add 52bit PA support to the stage-1 page-table walker, and make use
  of it to populate the fault level reported to the guest on failing
  to translate a stage-1 walk.

- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
  feature parity for guests, irrespective of the host platform.

- Fix some really ugly architecture problems when dealing with debug
  in a nested VM. This has some bad performance impacts, but is at
  least correct.

- Add enough infrastructure to be able to disable EL2 features and
  give effective values to the EL2 control registers. This then allows
  a bunch of features to be turned off, which helps cross-host
  migration.

- Large rework of the selftest infrastructure to allow most tests to
  transparently run at EL2. This is the first step towards enabling
  NV testing.

- Various fixes and improvements all over the map, including one BE
  fix, just in time for the removal of the feature.
2025-09-30 13:23:28 -04:00
Paolo Bonzini
8cbb0df294 KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
    visiting from an MMU notifier
 
  - Fixes to the TLB match process and TLB invalidation range for
    managing the VCNR pseudo-TLB
 
  - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
    values in PMSCR_EL1
 
  - Fix save/restore of host MDCR_EL2 to account for eagerly programming
    at vcpu_load() on VHE systems
 
  - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
    where an xarray's spinlock was nested with a *raw* spinlock
 
  - Permit stage-2 read permission aborts which are possible in the case
    of NV depending on the guest hypervisor's stage-2 translation
 
  - Call raw_spin_unlock() instead of the internal spinlock API
 
  - Fix parameter ordering when assigning VBAR_EL1
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaMHCvxccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFqbCAP9ygpb3tkpAPkbaB68IwyJgGD/C
 59f2NTUGkzak20SjHAD+IcvQV8GemDsGNvSvjpb08KrNWMUxeuOBiAp/IvqrqwU=
 =VrcD
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #3

 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

 - Fixes to the TLB match process and TLB invalidation range for
   managing the VCNR pseudo-TLB

 - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

 - Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

 - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

 - Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

 - Call raw_spin_unlock() instead of the internal spinlock API

 - Fix parameter ordering when assigning VBAR_EL1

[Pull into kvm/master to fix conflicts. - Paolo]
2025-09-30 13:23:06 -04:00
Linus Torvalds
755fa5b4fb cgroup: Changes for v6.18
- Extensive cpuset code cleanup and refactoring work with no functional
   changes: CPU mask computation logic refactoring, introducing new helpers,
   removing redundant code paths, and improving error handling for better
   maintainability.
 
 - A few bug fixes to cpuset including fixes for partition creation failures
   when isolcpus is in use, missing error returns, and null pointer access
   prevention in free_tmpmasks().
 
 - Core cgroup changes include replacing the global percpu_rwsem with
   per-threadgroup rwsem when writing to cgroup.procs for better scalability,
   workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for
   workqueue default switching from percpu to unbound, and removal of unused
   code including the post_attach callback.
 
 - New cgroup.stat.local time accounting feature that tracks frozen time
   duration.
 
 - Misc changes including selftests updates (new freezer time tests and
   backward compatibility fixes), documentation sync, string function safety
   improvements, and 64-bit division fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaNb1Sg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfLMAPwKwkvUg9DPJEuECRfM9woOOHyIWLp1DwUhpg1v
 Zq0lkAEAmo/+IkJXGZ7TGF+wzSj7GFIugrILu3upzLCHzgYoDgs=
 =39KF
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Extensive cpuset code cleanup and refactoring work with no functional
   changes: CPU mask computation logic refactoring, introducing new
   helpers, removing redundant code paths, and improving error handling
   for better maintainability.

 - A few bug fixes to cpuset including fixes for partition creation
   failures when isolcpus is in use, missing error returns, and null
   pointer access prevention in free_tmpmasks().

 - Core cgroup changes include replacing the global percpu_rwsem with
   per-threadgroup rwsem when writing to cgroup.procs for better
   scalability, workqueue conversions to use WQ_PERCPU and
   system_percpu_wq to prepare for workqueue default switching from
   percpu to unbound, and removal of unused code including the
   post_attach callback.

 - New cgroup.stat.local time accounting feature that tracks frozen time
   duration.

 - Misc changes including selftests updates (new freezer time tests and
   backward compatibility fixes), documentation sync, string function
   safety improvements, and 64-bit division fixes.

* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits)
  cpuset: remove is_prs_invalid helper
  cpuset: remove impossible warning in update_parent_effective_cpumask
  cpuset: remove redundant special case for null input in node mask update
  cpuset: fix missing error return in update_cpumask
  cpuset: Use new excpus for nocpu error check when enabling root partition
  cpuset: fix failure to enable isolated partition when containing isolcpus
  Documentation: cgroup-v2: Sync manual toctree
  cpuset: use partition_cpus_change for setting exclusive cpus
  cpuset: use parse_cpulist for setting cpus.exclusive
  cpuset: introduce partition_cpus_change
  cpuset: refactor cpus_allowed_validate_change
  cpuset: refactor out validate_partition
  cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers
  cpuset: refactor CPU mask buffer parsing logic
  cpuset: Refactor exclusive CPU mask computation logic
  cpuset: change return type of is_partition_[in]valid to bool
  cpuset: remove unused assignment to trialcs->partition_root_state
  cpuset: move the root cpuset write check earlier
  cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock
  cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock
  ...
2025-09-30 09:55:41 -07:00
Gopi Krishna Menon
03faea8466 selftests/net: add tcp_port_share to .gitignore
Add the tcp_port_share test binary to .gitignore to avoid
accidentally staging the build artifact.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250929163140.122383-1-krishnagopi487@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-30 08:30:26 -07:00
Benjamin Tissoires
6a88bb252b Merge branch 'for-6.18/selftests' into for-linus
- update vmtest.sh (Benjamin Tissoires)
2025-09-30 16:51:38 +02:00
Jakub Kicinski
b3820e0e6c selftests: drv-net: psp: add tests for destroying devices
Add tests for making sure device can disappear while associations
exist. This is netdevsim-only since destroying real devices is
more tricky.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-9-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
81236c74db selftests: drv-net: psp: add test for auto-adjusting TCP MSS
Test TCP MSS getting auto-adjusted. PSP adds an encapsulation overhead
of 40B per packet, when used in transport mode without any
virtualization cookie or other optional PSP header fields. The kernel
should adjust the MSS for a connection after PSP tx state is reached.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-8-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
2748087cf1 selftests: drv-net: psp: add connection breaking tests
Add test checking conditions which lead to connections breaking.
Using bad key or connection gets stuck if device key is rotated
twice.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-7-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
81b8908531 selftests: drv-net: psp: add association tests
Add tests for exercising PSP associations for TCP sockets.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-6-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
8f90dc6e41 selftests: drv-net: psp: add basic data transfer and key rotation tests
Add basic tests for sending data over PSP and making sure that key
rotation toggles the MSB of the spi.

Deploy PSP responder on the remote end. We also need a healthy dose
of common helpers for setting up the connections, assertions and
interrogating socket state on the Python side.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-5-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
2aeb71b2f9 selftests: drv-net: add PSP responder
PSP tests need the remote system to support PSP, and some PSP capable
application to exchange data with. Create a simple PSP responder app
which we can build and deploy to the remote host. The tests themselves
can be written in Python but for ease of deploying the responder is in C
(using C YNL).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-4-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski
8a5f956a9f selftests: drv-net: base device access API test
Simple PSP test to getting info about PSP devices.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-3-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:21 +02:00
Hangbin Liu
99e4c35ead selftests: bonding: add ipsec offload test
This introduces a test for IPSec offload over bonding, utilizing netdevsim
for the testing process, as veth interfaces do not support IPSec offload.
The test will ensure that the IPSec offload functionality remains operational
even after a failover event occurs in the bonding configuration.

Here is the test result:

TEST: bond_ipsec_offload (active_slave eth0)                        [ OK ]
TEST: bond_ipsec_offload (active_slave eth1)                        [ OK ]

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250925023304.472186-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 09:55:12 +02:00
Linus Torvalds
417552999d powerpc updates for 6.18
- powerpc support for BPF arena and arena atomics
  - Patches to switch to msi parent domain (per-device MSI domains)
  - Add a lock contention tracepoint in the queued spinlock slowpath
  - Fixes for underflow in pseries/powernv msi and pci paths
  - Switch from legacy-of-mm-gpiochip dependency to platform driver
  - Fixes for handling TLB misses
  - Introduce support for powerpc papr-hvpipe
  - Add vpa-dtl PMU driver for pseries platform
  - Misc fixes and cleanups
 
 Thanks to: Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy,
 Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart,
 Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
 Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum,
 Tyrel Datwyler, Venkat Rao Bagalkote,
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmjatlYACgkQpnEsdPSH
 ZJRQRQ//fCFV+ZMAgg54N/PMaobYSQdjvAGLGj+ynTfLas0qkhV62x3X1R85pLbW
 cqiDTcBeVT9nW7+2w+eIcS4kWku2/oorlPnsDtKTadLa78Sk9r12kzsPn+vOobwA
 /dril1kRHNqdpqUG8P751EUIFWULsKFOwNMdp24q8inpr2YOy3csVbAxUqVkSVbk
 8aafr5/hl5uvj38KdE4ditLlphNenA+kB1Tf67/R4T7MjHTqZtLPd+crqrLtEQlk
 eYvcJ1c5t/+ltgiSnuR5Go48CKDaw4YKCtqnaBiOBxMad2xnzqRo+NWmcxH1HmwI
 MZsSyHeri/sF7zBRCpytqGlci/wFQwiNeI3bpE31lV0wwu8Bmoypommxly/fPUSU
 23Qi7UuTmOXu537aFcMAfqmA05JlJh79JCEuWVN5LUgQUIalATeiyoKNPAuYYw2V
 FNYyURBBe1FNVhGyJdTg+766npUX8eqOWMImcyuAzdAV4MuBPzUbhVKJazHax+4r
 BzqDD5DeApl53+wUzLjWnN4s1rKxOuRUdWgOsrzD3DQZJkIP3p8WDZkgQY4YiI3P
 EWbHEx7q1XSiMJD4lhyAYHY01yc1bwYrrZ0/NCpl4G1trbdLb0sZmGEH9ykJlzSC
 cVV/kvH9tyy95I7HbDKc4lK1c0dC5kwFmcArTaHnDaRpRvc+eBQ=
 =U2s/
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - powerpc support for BPF arena and arena atomics

 - Patches to switch to msi parent domain (per-device MSI domains)

 - Add a lock contention tracepoint in the queued spinlock slowpath

 - Fixes for underflow in pseries/powernv msi and pci paths

 - Switch from legacy-of-mm-gpiochip dependency to platform driver

 - Fixes for handling TLB misses

 - Introduce support for powerpc papr-hvpipe

 - Add vpa-dtl PMU driver for pseries platform

 - Misc fixes and cleanups

Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.

* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
  powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
  genirq/msi: Remove msi_post_free()
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  powerpc/time: Expose boot_tb via accessor
  powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
  powerpc/fprobe: fix updated fprobe for function-graph tracer
  powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
  powerpc64/modules: replace stub allocation sentinel with an explicit counter
  powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
  powerpc/ftrace: ensure ftrace record ops are always set for NOPs
  powerpc/603: Really copy kernel PGD entries into all PGDIRs
  powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
  powerpc/pseries: HVPIPE changes to support migration
  powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
  powerpc/pseries: Enable HVPIPE event message interrupt
  ...
2025-09-29 19:28:50 -07:00
Linus Torvalds
cb7e3669c6 RISC-V updates for the v6.18 merge window (part one)
First set of RISC-V updates for the v6.18 merge window, including:
 
 - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other
   architectures have already merged this type of cleanup)
 
 - The introduction of ioremap_wc() for RISC-V
 
 - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than
   open code
 
 - A RISC-V kprobes unit test
 
 - An architecture-specific endianness swap macro set implementation,
   leveraging some dedicated RISC-V instructions for this purpose if they
   are available
 
 - The ability to identity and communicate to userspace the presence of a
   MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE
   implementation in cpu_relax()
 
 - Several other miscellaneous cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmjaMVIACgkQx4+xDQu9
 Kkva4g/9GE4gzwM+KHlc4e1sfsaTo4oXe9eV+Hj3gUJdM+g8dCtNFchPRCKFjHYb
 X9lm2YVL9Q3cZ7yWWy/DJtZ66Yz9foQ4laX7uYbjsdbdjbsAXLXhjNp6uk4nBrqb
 uW7Uq+Qel8qq66J/B4Z/U0UzF3e5MaptQ6sLuNRONY9OJUxG76zTSJVKwiVNsGaX
 8W59b7ALFlCIwCVyXGm/KO1EELjl8FKDENWXFE1v6T8XvfXfYuhcvUMp84ebbzHV
 D4kQrO3nxKQVgKEdCtW8xxt0/aWkYQ8zbQg8bD0gDDzMYb5uiJDcMFaa8ZuEYiLg
 EcAJX8LmE5GGlTcJf8/jxvaA87hisNsGFvNPXX1OuI26w2TUNC2u80wE6Q6a3aNu
 74oUtZEaPhdFuB31A0rALC8Zb2zalnwwjAL7xRlyAozUuye8Ej7mE7w97WTjIfWz
 7ZL19/+C1uawljzYLn+FJBIcl1wTyvOgx6T4TJlmOsa4OFnCreFx+a0Az6+Rx1GC
 XGsRyDkGPSabIbNVGxUCyTr4w+VpW1WlDjrjwyLC0DQTFW8wyP44W9/K2scp+CqJ
 bSCcAz8QtGAeZ5UlSmXYTOV69xXqZPaom7fVk5RFHoy24en5DSo7kj1NJ3ChupRD
 8ACpALcIgw/VEo0Tyqqy3dyVhPDuYaZMY3WgGGj9Cz18U37e/Ho=
 =nK3D
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley

 - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other
   architectures have already merged this type of cleanup)

 - The introduction of ioremap_wc() for RISC-V

 - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather
   than open code

 - A RISC-V kprobes unit test

 - An architecture-specific endianness swap macro set implementation,
   leveraging some dedicated RISC-V instructions for this purpose if
   they are available

 - The ability to identity and communicate to userspace the presence
   of a MIPS P8700-specific ISA extension, and to leverage its
   MIPS-specific PAUSE implementation in cpu_relax()

 - Several other miscellaneous cleanups

* tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits)
  riscv: errata: Fix the PAUSE Opcode for MIPS P8700
  riscv: hwprobe: Document MIPS xmipsexectl vendor extension
  riscv: hwprobe: Add MIPS vendor extension probing
  riscv: Add xmipsexectl instructions
  riscv: Add xmipsexectl as a vendor extension
  dt-bindings: riscv: Add xmipsexectl ISA extension description
  riscv: cpufeature: add validation for zfa, zfh and zfhmin
  perf: riscv: skip empty batches in counter start
  selftests: riscv: Add README for RISC-V KSelfTest
  riscv: sbi: Switch to new sys-off handler API
  riscv: Move vendor errata definitions to new header
  RISC-V: ACPI: enable parsing the BGRT table
  riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG
  riscv: pi: use 'targets' instead of extra-y in Makefile
  riscv: introduce asm/swab.h
  riscv: mmap(): use unsigned offset type in riscv_sys_mmap
  drivers/perf: riscv: Remove redundant ternary operators
  riscv: mm: Use mmu-type from FDT to limit SATP mode
  riscv: mm: Return intended SATP mode for noXlvl options
  riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM
  ...
2025-09-29 19:01:08 -07:00
Linus Torvalds
feafee2845 arm64 updates for 6.18
Confidential computing:
  - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
    and mapping them with appropriate attributes.
 
 CPU features:
  - Advertise atomic floating-point instructions to userspace.
 
  - Extend Spectre workarounds to cover additional Arm CPU variants.
 
  - Extend list of CPUs that support break-before-make level 2 and
    guarantee not to generate TLB conflict aborts for changes of mapping
    granularity (BBML2_NOABORT).
 
  - Add GCS support to our uprobes implementation.
 
 Documentation:
  - Remove bogus SME documentation concerning register state when
    entering/exiting streaming mode.
 
 Entry code:
  - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY).
 
  - Micro-optimise syscall entry path with a compiler branch hint.
 
 Memory management:
  - Enable huge mappings in vmalloc space even when kernel page-table
    dumping is enabled.
 
  - Tidy up the types used in our early MMU setup code.
 
  - Rework rodata= for closer parity with the behaviour on x86.
 
  - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
    linear map even when rodata= applies to virtual aliases.
 
  - Don't re-allocate the virtual region between '_text' and '_stext',
    as doing so confused tools parsing /proc/vmcore.
 
 Miscellaneous:
  - Clean-up Kconfig menuconfig text for architecture features.
 
  - Avoid redundant bitmap_empty() during determination of supported
    SME vector lengths.
 
  - Re-enable warnings when building the 32-bit vDSO object.
 
  - Avoid breaking our eggs at the wrong end.
 
 Perf and PMUs:
  - Support for v3 of the Hisilicon L3C PMU.
 
  - Support for Hisilicon's MN and NoC PMUs.
 
  - Support for Fujitsu's Uncore PMU.
 
  - Support for SPE's extended event filtering feature.
 
  - Preparatory work to enable data source filtering in SPE.
 
  - Support for multiple lanes in the DWC PCIe PMU.
 
  - Support for i.MX94 in the IMX DDR PMU driver.
 
  - MAINTAINERS update (Thank you, Yicong).
 
  - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
 
 Selftests:
  - Add basic LSFE check to the existing hwcaps test.
 
  - Support nolibc in GCS tests.
 
  - Extend SVE ptrace test to pass unsupported regsets and invalid vector
    lengths.
 
  - Minor cleanups (typos, cosmetic changes).
 
 System registers:
  - Fix ID_PFR1_EL1 definition.
 
  - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1.
 
  - Sync TCR_EL1 definition with the latest Arm ARM (L.b).
 
  - Be stricter about the input fed into our AWK sysreg generator script.
 
  - Typo fixes and removal of redundant definitions.
 
 ACPI, EFI and PSCI:
  - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
    support from the ACPI GHES code so that it can be used by platforms
    booted with device-tree.
 
  - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
    runtime calls.
 
  - Fix a node refcount imbalance in the PSCI device-tree code.
 
 CPU Features:
  - Ensure register sanitisation is applied to fields in ID_AA64MMFR4.
 
  - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests
    can reliably query the underlying CPU types from the VMM.
 
  - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
    to our context-switching, signal handling and ptrace code.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmjWlMIQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNIVYB/sHaFYmToEaK4ldMfTn4ZQ8yr9ZuTMLr/ji
 zOm7wULP08wKIcp6t+1N6e7A8Wb3YSErxTywc4b9MqctIel1WvBMewjJd38xb2qO
 hFCuRuWemfyt6Rw/EGMCP54yueLzKbnN4q+/Aks8jedWtS+AXY6McGCJOzMjuECL
 zKb68Hqqb09YQ2b2BPSAiU9g42rotiIYppLaLbEdxUnw/eqfEN3wSrl92/LuBlqH
 r2wZDnJwA5q8Iy9HWlnhg4Jax0Cs86jSJPIQLB6v3WWhc3HR49pm5cqYzYkUht9L
 nA6eaWJiBl/+k3S+ftPKNzU8n04r15lqyNZE2FYfRk+0BPSacJOf
 =kO15
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm
  improvements for CPUs with the 'noabort' BBML2 feature and a clever
  patch to allow ptdump to play nicely with block mappings in the
  vmalloc area.

  Confidential computing:

   - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
     and mapping them with appropriate attributes.

  CPU features:

   - Advertise atomic floating-point instructions to userspace

   - Extend Spectre workarounds to cover additional Arm CPU variants

   - Extend list of CPUs that support break-before-make level 2 and
     guarantee not to generate TLB conflict aborts for changes of
     mapping granularity (BBML2_NOABORT)

   - Add GCS support to our uprobes implementation.

  Documentation:

   - Remove bogus SME documentation concerning register state when
     entering/exiting streaming mode.

  Entry code:

   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)

   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:

   - Enable huge mappings in vmalloc space even when kernel page-table
     dumping is enabled

   - Tidy up the types used in our early MMU setup code

   - Rework rodata= for closer parity with the behaviour on x86

   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
     linear map even when rodata= applies to virtual aliases

   - Don't re-allocate the virtual region between '_text' and '_stext',
     as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:

   - Clean-up Kconfig menuconfig text for architecture features

   - Avoid redundant bitmap_empty() during determination of supported
     SME vector lengths

   - Re-enable warnings when building the 32-bit vDSO object

   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:

   - Support for v3 of the Hisilicon L3C PMU

   - Support for Hisilicon's MN and NoC PMUs

   - Support for Fujitsu's Uncore PMU

   - Support for SPE's extended event filtering feature

   - Preparatory work to enable data source filtering in SPE

   - Support for multiple lanes in the DWC PCIe PMU

   - Support for i.MX94 in the IMX DDR PMU driver

   - MAINTAINERS update (Thank you, Yicong)

   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:

   - Add basic LSFE check to the existing hwcaps test

   - Support nolibc in GCS tests

   - Extend SVE ptrace test to pass unsupported regsets and invalid
     vector lengths

   - Minor cleanups (typos, cosmetic changes).

  System registers:

   - Fix ID_PFR1_EL1 definition

   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1

   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)

   - Be stricter about the input fed into our AWK sysreg generator
     script

   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:

   - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
     support from the ACPI GHES code so that it can be used by platforms
     booted with device-tree

   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
     runtime calls

   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:

   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
     guests can reliably query the underlying CPU types from the VMM

   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
     to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29 18:48:39 -07:00
Kuniyuki Iwashima
9b62d53cc8 selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt
This also does not have the non-experimental version, so converted to FO.

The comment in .pkt explains the detailed scenario.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
05b9f505fb selftest: packetdrill: Import sockopt-fastopen-key.pkt
sockopt-fastopen-key.pkt does not have the non-experimental
version, so the Experimental version is converted, FOEXP -> FO.

The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead
sets another key via setsockopt(TCP_FASTOPEN_KEY).

The first listener generates a valid cookie in response to TFO
option without cookie, and the second listner creates a TFO socket
using the valid cookie.

TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh
so that we can use TFO_COOKIE and support dualstack.  Similarly,
TFO_COOKIE_ZERO for the 0-0-0-0 key is defined.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
be90c7b3d5 selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.
These changes are applied to follow the imported packetdrill tests.

  * Call setsockopt(TCP_FASTOPEN)
  * Remove unnecessary accept() delay
  * Add assertion for TCP states
  * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
21f7fb31ae selftest: packetdrill: Import opt34/*-trigger-rst.pkt.
This imports the non-experimental version of opt34/*-trigger-rst.pkt.

                                     | accept() | SYN data |
  -----------------------------------+----------+----------+
  listener-closed-trigger-rst.pkt    |    no    |  unread  |
  unread-data-closed-trigger-rst.pkt |   yes    |  unread  |

Both files test that close()ing a SYN_RECV socket with unread SYN data
triggers RST.

The files are renamed to have the common prefix, trigger-rst.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
5920f154e1 selftest: packetdrill: Import opt34/reset-* tests.
This imports the non-experimental version of opt34/reset-*.pkt.

                                   |  Child  |              RST              | sk_err  |
  ---------------------------------+---------+-------------------------------+---------+
  reset-after-accept.pkt           |   TFO   |   after accept(), SYN_RECV    |  read() |
  reset-close-with-unread-data.pkt |   TFO   |   after accept(), SYN_RECV    | write() |
  reset-before-accept.pkt          |   TFO   |  before accept(), SYN_RECV    |  read() |
  reset-non-tfo-socket.pkt         | non-TFO |  before accept(), ESTABLISHED | write() |

The first 3 files test scenarios where a SYN_RECV socket receives RST
before/after accept() and data in SYN must be read() without error,
but the following read() or fist write() will return ECONNRESET.

The last test is similar but with non-TFO socket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
a8b1750e68 selftest: packetdrill: Import opt34/icmp-before-accept.pkt.
This imports the non-experimental version of icmp-before-accept.pkt.

This file tests the scenario where an ICMP unreachable packet for a
not-yet-accept()ed socket changes its state to TCP_CLOSE, but the
SYN data must be read without error, and the following read() returns
EHOSTUNREACH.

Note that this test support only IPv4 as icmp is used.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
5ed080f85a selftest: packetdrill: Import opt34/fin-close-socket.pkt.
This imports the non-experimental version of fin-close-socket.pkt.

This file tests the scenario where a TFO child socket's state
transitions from SYN_RECV to CLOSE_WAIT before accept()ed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
e57b3933ab selftest: packetdrill: Add test for experimental option.
The only difference between non-experimental vs experimental TFO
option handling is SYN+ACK generation.

When tcp_parse_fastopen_option() parses a TFO option, it sets
tcp_fastopen_cookie.exp to false if the option number is 34,
and true if 255.

The value is carried to tcp_options_write() to generate a TFO option
with the same option number.

Other than that, all the TFO handling is the same and the kernel must
generate the same cookie regardless of the option number.

Let's add a test for the handling so that we can consolidate
fastopen/server/ tests and fastopen/server/opt34 tests.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima
399e0a7ed9 selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.
TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and
each server test requires setsockopt(TCP_FASTOPEN).

Let's add a basic test for TFO_SERVER_WO_SOCKOPT1.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima
0b8f164eb2 selftest: packetdrill: Import TFO server basic tests.
This imports basic TFO server tests from google/packetdrill.

The repository has two versions of tests for most scenarios; one uses
the non-experimental option (34), and the other uses the experimental
option (255) with 0xF989.

This only imports the following tests of the non-experimental version
placed in [0].  I will add a specific test for the experimental option
handling later.

                             | TFO | Cookie | Payload |
  ---------------------------+-----+--------+---------+
  basic-rw.pkt               | yes |  yes   |   yes   |
  basic-zero-payload.pkt     | yes |  yes   |    no   |
  basic-cookie-not-reqd.pkt  | yes |   no   |   yes   |
  basic-non-tfo-listener.pkt |  no |  yes   |   yes   |
  pure-syn-data.pkt          | yes |   no   |   yes   |

The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and did
not test TFO server in some scenarios unintentionally, so setsockopt()
is added where needed.  In addition, non-TFO scenario is stripped as
it is covered by basic-non-tfo-listener.pkt.  Also, I added basic- prefix.

Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima
97b3b8306f selftest: packetdrill: Define common TCP Fast Open cookie.
TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher().

The cookie value is generated from src/dst IPs and a key configured by
setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key.

The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill
defines the corresponding cookie as TFO_COOKIE in run_all.py. [0]

Then, each test does not need to care about the value, and we can easily
update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the
algorithm.

However, some tests use the bare hex value for specific IPv4 addresses
and do not support IPv6.

Let's define the same TFO_COOKIE in ksft_runner.sh.

We will replace such bare hex values with TFO_COOKIE except for a single
test for setsockopt(TCP_FASTOPEN_KEY).

Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:19 -07:00
Kuniyuki Iwashima
261cb8b123 selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).
To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must
have 0x2 (TFO_SERVER_ENABLE), and we need to do either

  1. Call setsockopt(TCP_FASTOPEN) for the socket
  2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen

The default.sh sets 0x70403 so that each test does not need setsockopt().
(0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???)

However, some tests overwrite net.ipv4.tcp_fastopen without
TFO_SERVER_WO_SOCKOPT1 and forgot setsockopt(TCP_FASTOPEN).

For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally,
except in the first scenario.

To prevent such an accident, let's require explicit setsockopt().

TFO_CLIENT_ENABLE is necessary for
tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt.

Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:07 -07:00
Kuniyuki Iwashima
70dd4775db selftest: packetdrill: Set ktap_set_plan properly for single protocol test.
The cited commit forgot to update the ktap_set_plan call.

ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must
match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP +
KTAP_CNT_XFAIL) in ktap_finished.

Otherwise, the selftest exit()s with 1.

Let's adjust KSFT_NUM_TESTS based on supported protocols.

While at it, misalignment is fixed up.

Fixes: a5c10aa3d1 ("selftests/net: packetdrill: Support single protocol test.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:06 -07:00
Matthieu Baerts (NGI0)
c912f935a5 selftests: mptcp: join: validate new laminar endp
Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar'
endpoint type.

In a setup where subflows created using the routing rules would be
rejected by the listener, and where the latter announces one IP address,
some cases are verified:

- Without any 'laminar' endpoints: no new subflows are created.

- With one 'laminar' endpoint: a second subflow is created.

- With multiple 'laminar' endpoints: 2 IPv4 subflows are created.

- With one 'laminar' endpoint, but the server announcing a second IP
  address, only one subflow is created.

- With one 'laminar' + 'subflow' endpoint, the same endpoint is only
  used once.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:23:36 -07:00
Linus Torvalds
a240a79d43 seccomp updates for v6.18-rc1
- Fix race with WAIT_KILLABLE_RECV (Johannes Nixdorf)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNrY3wAKCRA2KwveOeQk
 u5vmAP9pH7LAt7zTWgleIxWPYuUvELS+zB9oK9EukyTZNbAVoAD8Do5ZpFP51Llk
 a6mxvmi808aYytROWM5c8O4tk5CFDQQ=
 =Vkfy
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp update from Kees Cook:

 - Fix race with WAIT_KILLABLE_RECV (Johannes Nixdorf)

* tag 'seccomp-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
  seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
2025-09-29 17:44:09 -07:00
Linus Torvalds
18b19abc37 namespace-6.18-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQgQAKCRCRxhvAZXjc
 oiFXAQCpbLvkWbld9wLgxUBhq+q+kw5NvGxzpvqIhXwJB9F9YAEA44/Wevln4xGx
 +kRUbP+xlRQqenIYs2dLzVHzAwAdfQ4=
 =EO4Y
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains a larger set of changes around the generic namespace
  infrastructure of the kernel.

  Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
  ns_common which carries the reference count of the namespace and so
  on.

  We open-coded and cargo-culted so many quirks for each namespace type
  that it just wasn't scalable anymore. So given there's a bunch of new
  changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every
  namespace uniformly and derive the correct initialization settings
  from the type of the namespace such as namespace operations, namespace
  type and so on. This leaves the new ns_common_init() function with a
  single parameter which is the specific namespace type which derives
  the correct parameters statically. This also means the compiler will
  yell as soon as someone does something remotely fishy.

  The ns_common_init() addition also allows us to remove ns_alloc_inum()
  and drops any special-casing of the initial network namespace in the
  network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference
  counting was open-coded and copy-pasted for each namespace type even
  though they all followed the same rules. This also removes all open
  accesses to the reference count and makes it private and only uses a
  very small set of dedicated helpers to manipulate them just like we do
  for e.g., files.

  In addition this generalizes the mount namespace iteration
  infrastructure introduced a few cycles ago. As reminder, the vfs makes
  it possible to iterate sequentially and bidirectionally through all
  mount namespaces on the system or all mount namespaces that the caller
  holds privilege over. This allow userspace to iterate over all mounts
  in all mount namespaces using the listmount() and statmount() system
  call.

  Each mount namespace has a unique identifier for the lifetime of the
  systems that is exposed to userspace. The network namespace also has a
  unique identifier working exactly the same way. This extends the
  concept to all other namespace types.

  The new nstree type makes it possible to lookup namespaces purely by
  their identifier and to walk the namespace list sequentially and
  bidirectionally for all namespace types, allowing userspace to iterate
  through all namespaces. Looking up namespaces in the namespace tree
  works completely locklessly.

  This also means we can move the mount namespace onto the generic
  infrastructure and remove a bunch of code and members from struct
  mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for
  now this uses the generic namespace tree to extend a concept
  introduced first for pidfs a few cycles ago. For a while now we have
  supported pidfs file handles for pidfds. This has proven to be very
  useful.

  This extends the concept to cover namespaces as well. It is possible
  to encode and decode namespace file handles using the common
  name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive,
  meaning it is not required to actually hold a reference to nsfs in
  able to decode aka open_by_handle_at() a namespace file handle.
  Instead the FD_NSFS_ROOT constant can be passed which will let the
  kernel grab a reference to the root of nsfs internally and thus decode
  the file handle.

  Namespaces file descriptors can already be derived from pidfds which
  means they aren't subject to overmount protection bugs. IOW, it's
  irrelevant if the caller would not have access to an appropriate
  /proc/<pid>/ns/ directory as they could always just derive the
  namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for
  the lifetime of the system refer to a namespace without pinning any
  resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the
  namespace the file handle refers to they are able to open it otherwise
  they must hold privilege over the owning namespace of the relevant
  namespace.

  The namespace file handle layout is exposed as uapi and has a stable
  and extensible format. For now it simply contains the namespace
  identifier, the namespace type, and the inode number. The stable
  format means that userspace may construct its own namespace file
  handles without going through name_to_handle_at() as they are already
  allowed for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...
2025-09-29 11:20:29 -07:00
Linus Torvalds
3a2a5b278f vfs-6.18-rc1.mount
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQOwAKCRCRxhvAZXjc
 oth/AQDvlOo+23/f2djgDGS8akjkBYVLW14OYzC0q5cbEnnGgAEAycHL50pp3n1o
 3jMYlCByuv507vpCsDupo7QcJapmQAk=
 =/NwE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
 "This contains some work around mount api handling:

   - Output the warning message for mnt_too_revealing() triggered during
     fsmount() to the fscontext log. This makes it possible for the
     mount tool to output appropriate warnings on the command line.

     For example, with the newest fsopen()-based mount(8) from
     util-linux, the error messages now look like:

       # mount -t proc proc /tmp
       mount: /tmp: fsmount() failed: VFS: Mount too revealing.
              dmesg(1) may have more information after failed mount system call.

   - Do not consume fscontext log entries when returning -EMSGSIZE

     Userspace generally expects APIs that return -EMSGSIZE to allow for
     them to adjust their buffer size and retry the operation.

     However, the fscontext log would previously clear the message even
     in the -EMSGSIZE case.

     Given that it is very cheap for us to check whether the buffer is
     too small before we remove the message from the ring buffer, let's
     just do that instead.

   - Drop an unused argument from do_remount()"

* tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: fs/namespace.c: remove ms_flags argument from do_remount
  selftests/filesystems: add basic fscontext log tests
  fscontext: do not consume log entries when returning -EMSGSIZE
  vfs: output mount_too_revealing() errors to fscontext
  docs/vfs: Remove mentions to the old mount API helpers
  fscontext: add custom-prefix log helpers
  fs: Remove mount_bdev
  fs: Remove mount_nodev
2025-09-29 09:32:34 -07:00
Linus Torvalds
b7ce6fa90f vfs-6.18-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQMQAKCRCRxhvAZXjc
 omNLAQCgrwzd9sa1JTlixweu3OAxQlSEbLuMpEv7Ztm+B7Wz0AD9HtwPC44Kev03
 GbMcB2DCFLC4evqYECj6IG7NBmoKsAs=
 =1ICf
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Liam R. Howlett
6bf377b06c maple_tree: Add single node allocation support to maple state
The fast path through a write will require replacing a single node in
the tree.  Using a sheaf (32 nodes) is too heavy for the fast path, so
special case the node store operation by just allocating one node in the
maple state.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:40:46 +02:00
Liam R. Howlett
9b05890a25 maple_tree: Prefilled sheaf conversion and testing
Use prefilled sheaves instead of bulk allocations. This should speed up
the allocations and the return path of unused allocations.

Remove the push and pop of nodes from the maple state as this is now
handled by the slab layer with sheaves.

Testing has been removed as necessary since the features of the tree
have been reduced.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:31:41 +02:00
Liam R. Howlett
fdbebab19f tools/testing: Add support for prefilled slab sheafs
Add the prefilled sheaf structs to the slab header and the associated
functions to the testing/shared/linux.c file.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:25:00 +02:00
Liam R. Howlett
551a6e757a testing/radix-tree/maple: Hack around kfree_rcu not existing
liburcu doesn't have kfree_rcu (or anything similar). Despite that, we
can hack around it in a trivial fashion, by adding a wrapper.

The wrapper only works for maple_nodes because we cannot get the
kmem_cache pointer any other way in the test code.

Link: https://lore.kernel.org/all/20250812162124.59417-1-pfalcato@suse.de/
Suggested-by: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:23:52 +02:00
Vlastimil Babka
9f910f7d3d tools/testing: include maple-shim.c in maple.c
There's some duplicated code and we are about to add more functionality
in maple-shared.h that we will need in the userspace maple test to be
available, so include it via maple-shim.c

Co-developed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:23:08 +02:00
Liam R. Howlett
c4fb7f0a79 tools/testing: Add support for changes to slab for sheaves
The slab changes for sheaves requires more effort in the testing code.
Unite all the kmem_cache work into the tools/include slab header for
both the vma and maple tree testing.

The vma test code also requires importing more #defines to allow for
seamless use of the shared kmem_cache code.

This adds the pthread header to the slab header in the tools directory
to allow for the pthread_mutex in linux.c.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:22:44 +02:00
Liam R. Howlett
d09a61a3aa tools/testing/vma: Implement vm_refcnt reset
Add the reset of the ref count in vma_lock_init().  This is needed if
the vma memory is not zeroed on allocation.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:21:16 +02:00
Liam R. Howlett
e3852a1213 maple_tree: Drop bulk insert support
Bulk insert mode was added to facilitate forking faster, but forking now
uses __mt_dup() to duplicate the tree.

The addition of sheaves has made the bulk allocations difficult to
maintain - since the expected entries would preallocate into the maple
state.  A big part of the maple state node allocation was the ability to
push nodes back onto the state for later use, which was essential to the
bulk insert algorithm.

Remove mas_expected_entries() and mas_destroy_rebalance() functions as
well as the MA_STATE_BULK and MA_STATE_REBALANCE maple state flags since
there are no users anymore.  Drop the associated testing as well.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:21:16 +02:00
Lorenzo Stoakes
da577f1fcb tools/testing/vma: clean up stubs in vma_internal.h
We do not need to references arguments just to avoid compiler warnings,
the warning in question does not arise here, so remove all of the
instances of '(void)xxx' introduced purely to avoid this warning.

As reported by WagYuli in the referenced mail, GCC 8.3 and before will
have issues compiling this file if parameter names are not provided, so
ensure these are always provided.

Finally, perform a trivial fix up of kmem_cache_alloc() which technically
has parameters in the incorrect order (as reported by Vlastimil Babka
off-list).

Link: https://lkml.kernel.org/r/20250826102824.22730-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: WangYuli <wangyuli@uniontech.com>
Closes: https://lore.kernel.org/linux-mm/EFCEBE7E301589DE+20250729084700.208767-1-wangyuli@uniontech.com/
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29 09:21:16 +02:00
Donet Tom
08ff89b565 selftests/mm: add fork inheritance test for ksm_merging_pages counter
Add a new selftest to verify whether the `ksm_merging_pages` counter in
`mm_struct` is not inherited by a child process after fork.  This helps
ensure correctness of KSM accounting across process creation.

Link: https://lkml.kernel.org/r/e7bb17d374133bd31a3e423aa9e46e1122e74971.1758648700.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-28 11:51:32 -07:00
Takashi Iwai
6b9c4a05ae ASoC: Updates for v6.18 round 2
Some more updates for v6.18, mostly fixes for the earlier pull request
 with some cleanups and more minor fixes for older code.  We do have one
 new driver, the TI TAS2783A, and some quirks for new platforms.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmjZIZAACgkQJNaLcl1U
 h9DZqgf+L+U/ysqynLO6NncBatsP3QHBxL8Op8DhOUG8cmtKwUHgeNJPaNDPA/rF
 b6WLe6yZBXpBLyDtWo/eHwxC3pdOPJ7JcFWhpZcYMBKfxwszzki72OiZerxkwuUS
 mFVr0EFkB/cbqkw9D8AF5tOMlauSDNjJQUiuNDOKD7BYvFNtuY3wOYvIA/kaW86l
 SlAb3dD4pOhYP8mgPP8v8h83LyHr3NnIEX4uVSQTxZ98zZmL08FB2sCv4bzqZuag
 pI6cDF1qlOT5y1CKLXIZ1JoflrUiEYyB9X1eb/rW2+8/USGzC4enr2cEm4phU3te
 X4MV3N3BulSq03xewX0Jw19mdQE0Hg==
 =91Ur
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v6.18-2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next

ASoC: Updates for v6.18 round 2

Some more updates for v6.18, mostly fixes for the earlier pull request
with some cleanups and more minor fixes for older code.  We do have one
new driver, the TI TAS2783A, and some quirks for new platforms.
2025-09-28 15:41:17 +02:00
Kumar Kartikeya Dwivedi
15cf39221e selftests/bpf: Add stress test for rqspinlock in NMI
Introduce a kernel module that will exercise lock acquisition in the NMI
path, and bias toward creating contention such that NMI waiters end up
being non-head waiters. Prior to the rqspinlock fix made in the commit
0d80e7f951 ("rqspinlock: Choose trylock fallback for NMI waiters"), it
was possible for the queueing path of non-head waiters to get stuck in
NMI, which this stress test reproduces fairly easily with just 3 CPUs.

Both AA and ABBA flavors are supported, and it will serve as a test case
for future fixes that address this corner case. More information about
the problem in question is available in the commit cited above. When the
fix is reverted, this stress test will lock up the system.

To enable this test automatically through the test_progs infrastructure,
add a load_module_params API to exercise both AA and ABBA cases when
running the test.

Note that the test runs for at most 5 seconds, and becomes a noop after
that, in order to allow the system to make forward progress. In
addition, CPU 0 is always kept untouched by the created threads and
NMIs. The test will automatically scale to the number of available
online CPUs.

Note that at least 3 CPUs are necessary to run this test, hence skip the
selftest in case the environment has less than 3 CPUs available.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250927205304.199760-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-28 03:18:40 -07:00
Daniel Borkmann
0e8e60e86c selftests/bpf: Add test case for different expected_attach_type
Add a small test case which adds two programs - one calling the other
through a tailcall - and check that BPF rejects them in case of different
expected_attach_type values:

  # ./vmtest.sh -- ./test_progs -t xdp_devmap
  [...]
  #641/1   xdp_devmap_attach/DEVMAP with programs in entries:OK
  #641/2   xdp_devmap_attach/DEVMAP with frags programs in entries:OK
  #641/3   xdp_devmap_attach/Verifier check of DEVMAP programs:OK
  #641/4   xdp_devmap_attach/DEVMAP with programs in entries on veth:OK
  #641     xdp_devmap_attach:OK
  Summary: 2/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250926171201.188490-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-27 06:24:27 -07:00
Petr Machata
fca6ff9191 selftests: forwarding: README: Mention defer, adf_
Mention how it would be nice if new code used defer. Also if it does that
in dirtying helpers, how it would be nice if these were named adf_*.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0764bdb9266cd516da23ddeec110e01118cf981e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:41 -07:00
Petr Machata
040a6cbead selftests: forwarding: lib: Add an autodefer variant of forwarding_enable()
Most forwarding tests invoke forwarding_enable() to enable the router and
forwarding_restore() to restore the original configuration. Add a helper,
adf_forwarding_enable(), which is like forwarding_enable(), but takes care
of scheduling the cleanup automatically.

Convert the tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/78b752c40069cde21c44dcf4c7b966a76a0eef2c.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata
f53748d56d selftests: forwarding: lib: Add an autodefer variant of simple_if_init()
Most forwarding tests invoke simple_if_init() to set up a VRF-based "host"
and simple_if_fini() to tear it down again. Add a helper,
adf_simple_if_init(), which is like simple_if_fini(), but takes care of
scheduling the cleanup automatically.

Convert the tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/6b9ee1a7946a36fd32a47fdb1aa9325198ffc695.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata
02aabe00b2 selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
Most forwarding tests invoke vrf_prepare() to set up VRF forwarding and
vrf_cleanup() to restore the original configuration. Add a helper,
adf_vrf_prepare(), which is like vrf_prepare(), but takes care of
scheduling the cleanup automatically.

Convert a number of tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2f2000e54ae700d560a8d6128322dade3bd2207e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata
14b72996ae selftests: net: vlan_bridge_binding: Rename dfr_set_binding_*() to adf_*
This test contains two autodefer-like helpers, but namespaces them as dfr_*
instead of adf_* like this patchset. Rename them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5f0c81b39e9e1f56f706cc4b53f82238a1d1e2f9.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata
b628dfcd54 selftests: net: lib: Rename bridge_vlan_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/93526ce79e635a3ec34753c796edf0c96711547d.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata
d85bcf6505 selftests: net: lib: Rename ip_route_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/403143183373419e4a31df4665d6bfaa273eb761.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata
773603d6db selftests: net: lib: Rename ip_addr_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/706327a5db660c7f18ba9fbfba7ce913da065e3e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:38 -07:00
Petr Machata
a55f9fb343 selftests: net: lib: Rename ip_link_set_down() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/e5bf4cb3405fb50fe6e217a04268952e97410dc2.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:38 -07:00
Petr Machata
34d3f8b75e selftests: net: lib: Rename ip_link_set_up() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/475716ef792f5bd42e5c8ef1c3e287b1294f1630.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata
beb98a3477 selftests: net: lib: Rename ip_link_set_addr() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5318e90f7f491f9f397ac221a8b47fdbedd0d3b2.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata
c3cbd21fe1 selftests: net: lib: Rename ip_link_set_master() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/53ce64231faa1396a968b2869af5f1c0aebec2c9.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata
191c4912f9 selftests: net: lib: Rename ip_link_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0b163cca1bf2ec44270e0fc89108f488d99d9c9d.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:36 -07:00
Matthieu Baerts (NGI0)
c5273f6ca1 mptcp: pm: rename 'subflows' to 'extra_subflows'
A few variables linked to the Path-Managers are confusing, and it would
help current and future developers, to clarify them.

One of them is 'subflows', which in fact represents the number of extra
subflows: all the additional subflows created after the initial one, and
not the total number of subflows.

While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_extra_subflows. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.

No functional changes intended.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-5-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:44:04 -07:00
Matthieu Baerts (NGI0)
008385efd0 selftests: mptcp: join: validate C-flag + def limit
The previous commit adds an exception for the C-flag case. The
'mptcp_join.sh' selftest is extended to validate this case.

In this subtest, there is a typical CDN deployment with a client where
MPTCP endpoints have been 'automatically' configured:

- the server set net.mptcp.allow_join_initial_addr_port=0

- the client has multiple 'subflow' endpoints, and the default limits:
  not accepting ADD_ADDRs.

Without the parent patch, the client is not able to establish new
subflows using its 'subflow' endpoints. The parent commit fixes that.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-2-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:44:03 -07:00
Vadim Fedorenko
ed3d74a754 selftests: net-drv: stats: sanity check FEC histogram
Simple tests to validate kernel's output. FEC bin range should be valid
means high boundary should be not less than low boundary. Bin boundaries
have to be provided as well as error counter value. Per-plane value
should match bin's value.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20250924124037.1508846-6-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 16:49:29 -07:00
Alessandro Zanni
81dcfdd21d selftest: net: Fix error message if empty variable
Fix to avoid cases where the `res` shell variable is
empty in script comparisons.
The comparison has been modified into string comparison to
handle other possible values the variable could assume.

The issue can be reproduced with the command:
make kselftest TARGETS=net

It solves the error:
./tfo_passive.sh: line 98: [: -eq: unary operator expected

Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250925132832.9828-1-alessandro.zanni87@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 15:23:33 -07:00
Stanislav Fomichev
47f78a67d3 selftests: drv-net: Enable BTF
Commit fec2e55bdef ("selftests: drv-net: Pull data before parsing headers")
added __ksym external symbol to xdp_native.bpf.c which now requires
a kernel with BTF. Enable BTF for driver selftests.

Before:

  # TAP version 13
  # 1..10
  # # Exception| Traceback (most recent call last):
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/ksft.py", line 244, in ksft_run
  # # Exception|     case(*args)
  # # Exception|     ~~~~^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 231, in test_xdp_native_pass_sb
  # # Exception|     _test_pass(cfg, bpf_info, 256)
  # # Exception|     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 209, in _test_pass
  # # Exception|     prog_info = _load_xdp_prog(cfg, bpf_info)
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 114, in _load_xdp_prog
  # # Exception|     cmd(
  # # Exception|     ~~~^
  # # Exception|     f"ip link set dev {cfg.ifname} mtu {bpf_info.mtu} xdpdrv obj {abs_path} sec {bpf_info.xdp_sec}",
  # # Exception|     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  # # Exception|     shell=True
  # # Exception|     ^^^^^^^^^^
  # # Exception|     )
  # # Exception|     ^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/utils.py", line 75, in __init__
  # # Exception|     self.process(terminate=False, fail=fail, timeout=timeout)
  # # Exception|     ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/utils.py", line 95, in process
  # # Exception|     raise CmdExitFailure("Command failed: %s\nSTDOUT: %s\nSTDERR: %s" %
  # # Exception|                          (self.proc.args, stdout, stderr), self)
  # # Exception| net.lib.py.utils.CmdExitFailure: Command failed: ip link set dev eni30773np1 mtu 1500 xdpdrv obj /home/sdf/src/linux/tools/testing/selftests/net/lib/xdp_native.bpf.o sec xdp
  # # Exception| STDOUT: b''
  # # Exception| STDERR: b"libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?\nlibbpf: failed to find '.BTF' ELF section in /lib/modules/6.17.0-rc6-virtme/build/vmlinux\nlibbpf: failed to find valid kernel BTF\nlib
  bpf: Error loading vmlinux BTF: -3\nlibbpf: failed to load object '/home/sdf/src/linux/tools/testing/selftests/net/lib/xdp_native.bpf.o'\n"
  # not ok 1 xdp.test_xdp_native_pass_sb
  ...

After:

  # TAP version 13
  # 1..10
  # ok 1 xdp.test_xdp_native_pass_sb
  # ok 2 xdp.test_xdp_native_pass_mb
  # ok 3 xdp.test_xdp_native_drop_sb
  # ok 4 xdp.test_xdp_native_drop_mb
  # ok 5 xdp.test_xdp_native_tx_sb
  # ok 6 xdp.test_xdp_native_tx_mb
  # # Ignoring SIGTERM (cnt: 2), already exiting...
  # # Ignoring SIGTERM (cnt: 3), already exiting...
  # # Exception| Traceback (most recent call last):
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/ksft.py", line 244, in ksft_run
  # # Exception|     case(*args)
  # # Exception|     ~~~~^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 506, in test_xdp_native_adjst_taa
  # # Exception|     res = _test_xdp_native_tail_adjst(
  # # Exception|         cfg,
  # # Exception|         pkt_sz_lst,
  # # Exception|         offset_lst,
  # # Exception|     )
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 467, in _test_xdp_native_tail_adt
  # # Exception|     recvd_str = _exchg_udp(cfg, port, test_str)
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/drivers/net/./xdp.py", line 72, in _exchg_udp
  # # Exception|     with bkg(rx_udp_cmd, exit_wait=True) as nc:
  # # Exception|          ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/utils.py", line 137, in __exit__
  # # Exception|     return self.process(terminate=terminate, fail=self.check_fail)
  # # Exception|            ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/utils.py", line 85, in process
  # # Exception|     stdout, stderr = self.proc.communicate(timeout)
  # # Exception|                      ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  # # Exception|   File "/usr/lib/python3.13/subprocess.py", line 1222, in communicate
  # # Exception|     stdout, stderr = self._communicate(input, endtime, timeout)
  # # Exception|                      ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  # # Exception|   File "/usr/lib/python3.13/subprocess.py", line 2128, in _communicate
  # # Exception|     ready = selector.select(timeout)
  # # Exception|   File "/usr/lib/python3.13/selectors.py", line 398, in select
  # # Exception|     fd_event_list = self._selector.poll(timeout)
  # # Exception|   File "/home/sdf/src/linux/tools/testing/selftests/net/lib/py/ksft.py", line 208, in _ksft_intr
  # # Exception|     raise KsftTerminate()
  # # Exception| net.lib.py.ksft.KsftTerminate
  # # Stopping tests due to KsftTerminate.
  # not ok 7 xdp.test_xdp_native_adjst_tail_grow_data
  # # Totals: pass:6 fail:1 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250924222518.1826863-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 14:23:42 -07:00
Amery Hung
11ae737efe selftests: drv-net: Reload pkt pointer after calling filter_udphdr
Fix a verification failure. filter_udphdr() calls bpf_xdp_pull_data(),
which will invalidate all pkt pointers. Therefore, all ctx->data loaded
before filter_udphdr() cannot be used. Reload it to prevent verification
errors.

The error may not appear on some compiler versions if they decide to
load ctx->data after filter_udphdr() when it is first used.

Fixes: efec2e55bd ("selftests: drv-net: Pull data before parsing headers")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250925161452.1290694-1-ameryhung@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 13:54:46 -07:00
Amery Hung
991e555eff selftests/bpf: Test changing packet data from kfunc
bpf_xdp_pull_data() is the first kfunc that changes packet data. Make
sure the verifier clear all packet pointers after calling packet data
changing kfunc.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250926164142.1850176-1-ameryhung@gmail.com
2025-09-26 10:44:51 -07:00
Alessandro Zanni
1d235d8494 iommu/selftest: prevent use of uninitialized variable
Fix to avoid the usage of the `res` variable uninitialized in the
following macro expansions.

It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
  inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized]
  760 |   if (!(__exp _t __seen)) { \
      |      ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
  513 |   __EXPECT(expected, #expected, seen, #seen, ==, 1)
      |   ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
 1057 |   ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id, nvevents))
      |   ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro ‘test_cmd_trigger_vevents’
 2924 |   test_cmd_trigger_vevents(dev_id, 3);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~

The issue can be reproduced, building the tests, with the command: make -C
tools/testing/selftests TARGETS=iommu

Link: https://patch.msgid.link/r/20250924171629.50266-1-alessandro.zanni87@gmail.com
Fixes: 97717a1f28 ("iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage")
Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-26 10:00:27 -03:00
Tao Chen
d43029ff7d selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
Add tests for stacktrace map lookup and delete:
1. use bpf_map_lookup_and_delete_elem to lookup and delete the target
   stack_id,
2. lookup the deleted stack_id again to double check.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-3-chen.dylane@linux.dev
2025-09-25 16:17:30 -07:00
Tao Chen
363b17e273 selftests/bpf: Refactor stacktrace_map case with skeleton
The loading method of the stacktrace_map test case looks too outdated,
refactor it with skeleton, and we can use global variable feature in
the next patch.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-2-chen.dylane@linux.dev
2025-09-25 16:17:14 -07:00
Mykyta Yatsenko
105eb5dc74 selftests/bpf: Fix flaky bpf_cookie selftest
bpf_cookie can fail on perf_event_open(), when it runs after the task_work
selftest. The task_work test causes perf to lower
sysctl_perf_event_sample_rate, and bpf_cookie uses sample_freq,
which is validated against that sysctl. As a result,
perf_event_open() rejects the attr if the (now tighter) limit is
exceeded.

>From perf_event_open():
if (attr.freq) {
	if (attr.sample_freq > sysctl_perf_event_sample_rate)
		return -EINVAL;
} else {
	if (attr.sample_period & (1ULL << 63))
		return -EINVAL;
}

Switch bpf_cookie to use sample_period, which is not checked against
sysctl_perf_event_sample_rate.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925215230.265501-1-mykyta.yatsenko5@gmail.com
2025-09-25 15:55:44 -07:00
Amery Hung
1193c46c17 selftests/bpf: Test changing packet data from global functions with a kfunc
The verifier should invalidate all packet pointers after a packet data
changing kfunc is called. So, similar to commit 3f23ee5590
("selftests/bpf: test for changing packet data from global functions"),
test changing packet data from global functions to make sure packet
pointers are indeed invalidated.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250925170013.1752561-2-ameryhung@gmail.com
2025-09-25 14:52:09 -07:00
Dylan Yudaken
21bbcdf669 selftests/kexec: Ignore selftest binary
Add a .gitignore for the test case build object.

Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-09-25 13:43:28 -06:00
Jakub Kicinski
203e3beb73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

drivers/net/can/spi/hi311x.c
  6b69680847 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled")
  27ce71e1ce ("net: WQ_PERCPU added to alloc_workqueue users")
https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org

net/smc/smc_loopback.c
drivers/dibs/dibs_loopback.c
  a35c04de25 ("net/smc: fix warning in smc_rx_splice() when calling get_page()")
  cc21191b58 ("dibs: Move data path to dibs layer")
https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org

Adjacent changes:

drivers/net/dsa/lantiq/lantiq_gswip.c
  c0054b25e2 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()")
  7a1eaef0a7 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-25 11:00:59 -07:00
Mykyta Yatsenko
5730dacb3f selftests/bpf: Task_work selftest cleanup fixes
task_work selftest does not properly handle cleanup during failures:
 * destroy bpf_link
 * perf event fd is passed to bpf_link, no need to close it if link was
 created successfully
 * goto cleanup if fork() failed, close pipe.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250924142954.129519-2-mykyta.yatsenko5@gmail.com
2025-09-25 11:00:01 -07:00
Guangshuo Li
a9e6aa9949 nvdimm: ndtest: Return -ENOMEM if devm_kcalloc() fails in ndtest_probe()
devm_kcalloc() may fail. ndtest_probe() allocates three DMA address
arrays (dcr_dma, label_dma, dimm_dma) and later unconditionally uses
them in ndtest_nvdimm_init(), which can lead to a NULL pointer
dereference under low-memory conditions.

Check all three allocations and return -ENOMEM if any allocation fails,
jumping to the common error path. Do not emit an extra error message
since the allocator already warns on allocation failure.

Fixes: 9399ab61ad ("ndtest: Add dimms to the two buses")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2025-09-25 12:49:46 -05:00
Linus Torvalds
4ff71af020 Including fixes from Bluetooth, IPsec and CAN.
No known regressions at this point.
 
 Current release - regressions:
 
   - xfrm: xfrm_alloc_spi shouldn't use 0 as SPI
 
 Previous releases - regressions:
 
   - xfrm: fix offloading of cross-family tunnels
 
   - bluetooth: fix several races leading to UaFs
 
   - dsa: lantiq_gswip: fix FDB entries creation for the CPU port
 
   - eth: tun: update napi->skb after XDP process
 
   - eth: mlx: fix UAF in flow counter release
 
 Previous releases - always broken:
 
   - core: forbid FDB status change while nexthop is in a group
 
   - smc: fix warning in smc_rx_splice() when calling get_page()
 
   - can: provide missing ndo_change_mtu(), to prevent buffer overflow.
 
   - eth: i40e: fix VF config validation
 
   - eth: broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjVKrYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkttwQAIErYmSE3aGyQ2ps9iDYFzM0BJyE2dZg
 BfZDdpL3PiRJkckltGzQSE4/Lw4ypIIxDuWrXmjbaaMwg7j6Y1hIe1KS81NnptyR
 2slfubXVWsmO6KVl9UHauxnsewCbozTemWI+2bRf6Urc2j8OpAkyHCPWT2NBzFv2
 U7ykcSLGDqrMa0NH3UD+ymmGfpdZe0x9RFzkBTUQIxS/Oo7EgGD3jgY0h85n04C5
 bTTOeWbMopS10L0Ow79xH4MnT4iPenxW7ABRhMwyres4Mm9WA39nP6Iep8VtmCKX
 51xLK4NBvPR0OMipJZj3hD+5RCWsLeGW+AVY3HoMPDSvetxjDlPcLJ1yMW6zP4Qa
 ZwW1OhMTSqSGviJ+9NSb849I+RRLcIs5206dXYfRLTXZl/DTcBvYCcFsh0QohHOa
 OYXarf9wXK2kekNukJY5isqgvNNZWsah4Mvjv4NcjF9xxg9NHQaD0QRI6NyFQAU+
 4OfH1bzCV20hLiUjEI3U5t+EN9gJl5bNloEmHPkgW7NnkRoobjnCRwzTMY+UeF57
 55P4AmD6gAHSSE7ZtaHRLt5ZV4TP7ySYWDEgRCXA/DgtdsQLPUhB4sw6xkz7j1a+
 3k9AeSXu87khyXpqRxI/lPb+xqZ52wPHhlRwDlKD3dLnCd+y+YuT7+dLlvnrPNEF
 pYsu6LF0yTQY
 =M0Pd
 -----END PGP SIGNATURE-----

Merge tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth, IPsec and CAN.

  No known regressions at this point.

  Current release - regressions:

   - xfrm: xfrm_alloc_spi shouldn't use 0 as SPI

  Previous releases - regressions:

   - xfrm: fix offloading of cross-family tunnels

   - bluetooth: fix several races leading to UaFs

   - dsa: lantiq_gswip: fix FDB entries creation for the CPU port

   - eth:
       - tun: update napi->skb after XDP process
       - mlx: fix UAF in flow counter release

  Previous releases - always broken:

   - core: forbid FDB status change while nexthop is in a group

   - smc: fix warning in smc_rx_splice() when calling get_page()

   - can: provide missing ndo_change_mtu(), to prevent buffer overflow.

   - eth:
       - i40e: fix VF config validation
       - broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl"

* tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()
  net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port
  net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()
  libie: fix string names for AQ error codes
  net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD
  net/mlx5: HWS, ignore flow level for multi-dest table
  net/mlx5: fs, fix UAF in flow counter release
  selftests: fib_nexthops: Add test cases for FDB status change
  selftests: fib_nexthops: Fix creation of non-FDB nexthops
  nexthop: Forbid FDB status change while nexthop is in a group
  net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS
  bnxt_en: correct offset handling for IPv6 destination address
  ptp: document behavior of PTP_STRICT_FLAGS
  broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
  broadcom: fix support for PTP_PEROUT_DUTY_CYCLE
  Bluetooth: MGMT: Fix possible UAFs
  Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync
  Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue
  Bluetooth: hci_sync: Fix hci_resume_advertising_sync
  Bluetooth: Fix build after header cleanup
  ...
2025-09-25 08:23:52 -07:00
Richard Gobert
5e9ff9378a selftests/net: test ipip packets in gro.sh
Add IPIP test-cases to the GRO selftest.

This selftest already contains IP ID test-cases. They are now
also tested for encapsulated packets.

This commit also fixes ipip packet generation in the test.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-6-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25 12:42:49 +02:00
Richard Gobert
f095a358fa net: gro: remove unnecessary df checks
Currently, packets with fixed IDs will be merged only if their
don't-fragment bit is set. This restriction is unnecessary since
packets without the don't-fragment bit will be forwarded as-is even
if they were merged together. The merged packets will be segmented
into their original forms before being forwarded, either by GSO or
by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
is set, in which case the IDs can become incrementing, which is also fine.

Clean up the code by removing the unnecessary don't-fragment checks.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-5-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25 12:42:49 +02:00
Vadim Fedorenko
b9c8a2c567 selftests: drv-net: add HW timestamping tests
Add simple tests to validate that the driver sets up timestamping
configuration according to what is reported in capabilities.

For RX timestamping we allow driver to fallback to wider scope for
timestamping if filter is applied. That actually means that driver
can enable ptpv2-event when it reports ptpv2-l4-event is supported,
but not vice versa.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250923173310.139623-5-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 18:14:22 -07:00
Jakub Kicinski
c7ab8024ca netfilter pull request nf-next-25-09-24
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmjT71ENHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AN7OEAClGm5eABrjTlHq8REn7S4xUBo9/jOESTvg22Bg
 A4qu14OV7rGMT/mLyV0kuNzHs18NT9bhpTwANL0bjz7YOK+v8QRlsHzNaJ7dzJ00
 WUPPYFnAhCA3qTCWLLvCnzpZgAopnRpFAgpXP6ddZ2kpsT+fo5B67kcasOQoVJzz
 9Kk98r0kgo6YKghmaIhBgchTt+rxZbo4R0Jb9BoBPXeQ32AghacOlngXPiAJA2d/
 Yeqet5p0gg8oI62KhhBC+uXyR29DG4c3iD2pUUhPnxV1Gt3S6AG1PTkt0KYzqVKm
 XEIxjrmtn3+2NYcaaH340mQwBbBV1EdZroHeBPUN9gYr5MpkHjppBJyrQ76cXXcr
 xKw2C357H4O8tT2nxz4LvJagRKhnZa0P6vWL6HeFIlFHXOqEJXYG6X2jbDzomUXd
 NPQrwKJnf2iYiRcc4U5xyABZE98/kVmqcclJDs0Zxz0eX8N8e26d/wjFKAZxfJyY
 pCUjF7lo7Doq6sOiB25OZjbPN3Fz/BcBTYTCTK+8MuQM1EaKXm33RcWjtO2gDUoe
 vyGfQResdwFELj9niPf27Nymezm1nX27x6f/jqCmioLxqQDGAGPsgJIMHy9XO8wz
 3YU/92dbwBdOQ6laKT7f5ClfPGDQlpqQTXXyxVHLFUd3T6JHJGUv0AplwVNZk6l8
 KCXEWg==
 =uK7h
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: fixes for net-next

These fixes target next because the bug is either not severe or has
existed for so long that there is no reason to cram them in at the last
minute.

1) Fix IPVS ftp unregistering during netns cleanup, broken since netns
   support was introduced in 2011 in the 2.6.39 kernel.
   From Slavin Liu.

2) nfnetlink must reset the 'nlh' pointer back to the original
   address when a batch is replayed, else we emit bogus ACK messages
   and conceal real errno from userspace.
   From Fernando Fernandez Mancera.  This was broken since 6.10.

3) Recent fix for nftables 'pipapo' set type was incomplete, it only
   made things work for the AVX2 version of the algorithm.

4) Testing revealed another problem with avx2 version that results in
   out-of-bounds read access, this bug always existed since feature was
   added in 5.7 kernel.  This also comes with a selftest update.

Last fix resolves a long-standing bug (since 4.9) in conntrack /proc
interface:
Decrease skip count when we reap an expired entry during dump.
As-is we erronously elide one conntrack entry from dump for every expired
entry seen.  From Eric Dumazet.

* tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_conntrack: do not skip entries in /proc/net/nf_conntrack
  selftests: netfilter: nft_concat_range.sh: add check for double-create bug
  netfilter: nft_set_pipapo_avx2: fix skip of expired entries
  netfilter: nft_set_pipapo: use 0 genmask for packetpath lookups
  netfilter: nfnetlink: reset nlh pointer during batch replay
  ipvs: Defer ip_vs_ftp unregister during netns cleanup
====================

Link: https://patch.msgid.link/20250924140654.10210-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 17:45:15 -07:00
Marc Zyngier
10fd028530 Merge branch kvm-arm64/selftests-6.18 into kvmarm-master/next
* kvm-arm64/selftests-6.18:
  : .
  : KVM/arm64 selftest updates for 6.18:
  :
  : - Large update to run EL1 selftests at EL2 when possible
  :   (20250917212044.294760-1-oliver.upton@linux.dev)
  :
  : - Work around lack of ID_AA64MMFR4_EL1 trapping on CPUs
  :   without FEAT_FGT
  :   (20250923173006.467455-1-oliver.upton@linux.dev)
  :
  : - Additional fixes and cleanups
  :   (20250920-kvm-arm64-id-aa64isar3-el1-v1-0-1764c1c1c96d@kernel.org)
  : .
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:35:50 +01:00
Mark Brown
b02a2c060b KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
We have a couple of writable bitfields in ID_AA64ISAR3_EL1 but the
set_id_regs selftest does not cover this register at all, add coverage.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:24:57 +01:00
Mark Brown
5a070fc376 KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
Currently we list the main set of registers with bits we test three
times, once in the test_regs array which is used at runtime, once in the
guest code and once in a list of ARRAY_SIZE() operations we use to tell
kselftest how many tests we plan to execute. This is needlessly fiddly,
when adding new registers as the test_cnt calculation is formatted with
two registers per line. Instead count the number of bitfields in the
register arrays at runtime.

The existing code subtracts ARRAY_SIZE(test_regs) from the number of
tests to account for the terminating FTR_REG_END entries in the per
register arrays, the new code accounts for this when enumerating.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:24:57 +01:00
Oliver Upton
75b2fdc1a8 KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
Implementations without FEAT_FGT aren't required to trap the entire ID
register space when HCR_EL2.TID3 is set. This is a terrible idea, as the
hypervisor may need to advertise the absence of a feature to the VM
using a negative value in a signed field, FEAT_E2H0 being a great
example of this.

Cope with uncooperative implementations in the EL2 selftest by accepting
a zero value when FEAT_FGT is absent and otherwise only tolerating the
expected nonzero value.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:24:02 +01:00
Oliver Upton
f677b0efa9 KVM: arm64: selftests: Add basic test for running in VHE EL2
Add an embarrassingly simple selftest for sanity checking KVM's VHE EL2
and test that the ID register bits are consistent with HCR_EL2.E2H being
RES1.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
2de21fb623 KVM: arm64: selftests: Enable EL2 by default
Take advantage of VHE to implicitly promote KVM selftests to run at EL2
with only slight modification. Update the smccc_filter test to account
for this now that the EL2-ness of a VM is visible to tests.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
05c93cbe66 KVM: arm64: selftests: Initialize HCR_EL2
Initialize HCR_EL2 such that EL2&0 is considered 'InHost', allowing the
use of (mostly) unmodified EL1 selftests at EL2.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
7ae44d1cda KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
Configuring the number of implemented counters via PMCR_EL0.N was a bad
idea in retrospect as it interacts poorly with nested. Migrate the
selftest to use the vCPU attribute instead of the KVM_SET_ONE_REG
mechanism.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
0910778e49 KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
Arch timer registers are redirected to their hypervisor counterparts
when running in VHE EL2. This is great, except for the fact that the
hypervisor timers use different PPIs. Use the correct INTIDs when that
is the case.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
d72543ac72 KVM: arm64: selftests: Select SMCCC conduit based on current EL
HVCs are taken within the VM when EL2 is in use. Ensure tests use the
SMC instruction when running at EL2 to interact with the host.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
a1b91ac238 KVM: arm64: selftests: Provide helper for getting default vCPU target
The default vCPU target in KVM selftests is pretty boring in that it
doesn't enable any vCPU features. Expose a helper for getting the
default target to prepare for cramming in more features. Call
KVM_ARM_PREFERRED_TARGET directly from get-reg-list as it needs
fine-grained control over feature flags.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
1c9604ba23 KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
FEAT_VHE has the somewhat nice property of implicitly redirecting EL1
register aliases to their corresponding EL2 representations when E2H=1.
Unfortunately, there's no such abstraction for userspace and EL2
registers are always accessed by their canonical encoding.

Introduce a helper that applies EL2 redirections to sysregs and use
aggressive inlining to catch misuse at compile time. Go a little past
the architectural definition for ease of use for test authors (e.g. the
stack pointer).

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
8911c7dbc6 KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
Start creating a VGICv3 by default unless explicitly opted-out by the
test. While having an interrupt controller is nice, the real benefit
here is clearing a hurdle for EL2 VMs which mandate the presence of a
VGIC.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
b8daa7ceac KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
vgic_v3_setup() has a good bit of sanity checking internally to ensure
that vCPUs have actually been created and match the dimensioning of the
vgic itself. Spin off an unsanitised setup and initialization helper so
vgic initialization can be wired in around a 'default' VM's vCPU
creation.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
b712afa7a1 KVM: arm64: selftests: Add helper to check for VGICv3 support
Introduce a proper predicate for probing VGICv3 by performing a 'test'
creation of the device on a dummy VM.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Oliver Upton
a5022da5f9 KVM: arm64: selftests: Initialize VGICv3 only once
vgic_v3_setup() unnecessarily initializes the vgic twice. Keep the
initialization after configuring MMIO frames and get rid of the other.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:31 +01:00
Oliver Upton
7326348209 KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
In order to compel the default usage of EL2 in selftests, move
kvm_arch_vm_post_create() to library code and expose an opt-in for using
MTE by default.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:31 +01:00
Jakub Kicinski
5e3fee34f6 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaNNwBQAKCRAraS+Z+3y5
 E8heAQDdJTR9rwAL7gD79cldlHP5PTmjyidLIoFG/efaGSbN1AD9EdvrykDU4xOG
 aGaO8TooGUZf7vAL8tIFuMeydYvi/gM=
 =Qu4T
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-09-23

We've added 9 non-merge commits during the last 33 day(s) which contain
a total of 10 files changed, 480 insertions(+), 53 deletions(-).

The main changes are:

1) A new bpf_xdp_pull_data kfunc that supports pulling data from
   a frag into the linear area of a xdp_buff, from Amery Hung.

   This includes changes in the xdp_native.bpf.c selftest, which
   Nimrod's future work depends on.

   It is a merge from a stable branch 'xdp_pull_data' which has
   also been merged to bpf-next.

   There is a conflict with recent changes in 'include/net/xdp.h'
   in the net-next tree that will need to be resolved.

2) A compiler warning fix when CONFIG_NET=n in the recent dynptr
   skb_meta support, from Jakub Sitnicki.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests: drv-net: Pull data before parsing headers
  selftests/bpf: Test bpf_xdp_pull_data
  bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
  bpf: Make variables in bpf_prog_test_run_xdp less confusing
  bpf: Clear packet pointers after changing packet data in kfuncs
  bpf: Support pulling non-linear xdp data
  bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail
  bpf: Clear pfmemalloc flag when freeing all fragments
  bpf: Return an error pointer for skb metadata when CONFIG_NET=n
====================

Link: https://patch.msgid.link/20250924050303.2466356-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 10:22:37 -07:00
Will Deacon
712f4ee70a Merge branch 'for-next/selftests' into for-next/core
* for-next/selftests:
  kselftest/arm64: Add lsfe to the hwcaps test
  kselftest/arm64: Check that unsupported regsets fail in sve-ptrace
  kselftest/arm64: Verify that we reject out of bounds VLs in sve-ptrace
  kselftest/arm64/gcs/basic-gcs: Respect parent directory CFLAGS
  selftests/arm64: Fix grammatical error in string literals
  kselftest/arm64: Add parentheses around sizeof for clarity
  kselftest/arm64: Supress warning and improve readability
  kselftest/arm64: Remove extra blank line
  kselftest/arm64/gcs: Use nolibc's getauxval()
  kselftest/arm64/gcs: Correctly check return value when disabling GCS
  selftests: arm64: Fix -Waddress warning in tpidr2 test
  kselftest/arm64: Log error codes in sve-ptrace
  selftests: arm64: Check fread return value in exec_target
2025-09-24 16:34:56 +01:00
Florian Westphal
94bd247bc2 selftests: netfilter: nft_concat_range.sh: add check for double-create bug
Add a test case for bug resolved with:
'netfilter: nft_set_pipapo_avx2: fix skip of expired entries'.

It passes on nf.git (it uses the generic/C version for insertion
duplicate check) but fails on unpatched nf-next if AVX2 is supported:

  cannot create same element twice      0s                        [FAIL]
Could create element twice in same transaction
table inet filter { # handle 8
[..]
  elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
               1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0,
               1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
               1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0 }

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-24 11:50:28 +02:00
Jiri Olsa
3d237467a4 selftests/bpf: Add kprobe multi write ctx attach test
Adding test to check we can't attach kprobe multi program
that writes to the context.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
1b881ee294 selftests/bpf: Add kprobe write ctx attach test
Adding test to check we can't attach standard kprobe program that
writes to the context.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
6a4ea0d1cb selftests/bpf: Add uprobe context ip register change test
Adding test to check we can change the application execution
through instruction pointer change through uprobe program.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
7f8a05c5d3 selftests/bpf: Add uprobe context registers changes test
Adding test to check we can change common register values through
uprobe program.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Petr Machata
f67e9ae72d selftests: bridge_fdb_local_vlan_0: Test FDB vs. NET_ADDR_SET behavior
The previous patch fixed an issue whereby no FDB entry would be created for
the bridge itself on VLAN 0 under some circumstances. This could break
forwarding. Add a test for the fix.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/137cc25396f5a4f407267af895a14bc45552ba5f.1758550408.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:10:49 -07:00
Ido Schimmel
00af023d90 selftests: fib_nexthops: Add test cases for FDB status change
Add the following test cases for both IPv4 and IPv6:

* Can change from FDB nexthop to non-FDB nexthop and vice versa.
* Can change FDB nexthop address while in a group.
* Cannot change from FDB nexthop to non-FDB nexthop and vice versa while
  in a group.

Output without "nexthop: Forbid FDB status change while nexthop is in a
group":

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

 IPv6 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
 [...]

 Tests passed:  36
 Tests failed:   4
 Tests skipped:  0

Output with "nexthop: Forbid FDB status change while nexthop is in a
group":

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

 IPv6 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
 [...]

 Tests passed:  40
 Tests failed:   0
 Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:01:05 -07:00
Ido Schimmel
c29913109c selftests: fib_nexthops: Fix creation of non-FDB nexthops
The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

 IPv6 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
 Error: Invalid nexthop id.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
 Error: Invalid nexthop id.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

 COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
 COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
 COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
 Error: Invalid nexthop id.
 TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
 [...]
 COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
 Error: Nexthop id does not exist.
 TEST: Route add with fdb nexthop                                    [ OK ]

In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.

Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.

Output after the fix:

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

 IPv6 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
 Error: FDB nexthop group can only have fdb nexthops.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
 Error: FDB nexthop group can only have fdb nexthops.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

 COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
 COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
 COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
 Error: Non FDB nexthop group cannot have fdb nexthops.
 TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
 [...]
 COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
 Error: Route cannot point to a fdb nexthop.
 TEST: Route add with fdb nexthop                                    [ OK ]
 [...]
 Tests passed:  30
 Tests failed:   0
 Tests skipped:  0

Fixes: 0534c5489c ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:01:05 -07:00
Alok Tiwari
f770645860 selftests: rtnetlink: correct error message in rtnetlink.sh fou test
The rtnetlink FOU selftest prints an incorrect string:
"FAIL: fou"s. Change it to the intended "FAIL: fou" by
removing a stray character in the end_test string of the test.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250921192111.1567498-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 16:55:05 -07:00
Martin KaFai Lau
34f033a6c9 Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/master'
Merge the xdp_pull_data stable branch into the master branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-09-23 16:23:58 -07:00
Martin KaFai Lau
55d5a5154d Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/net'
Merge the xdp_pull_data stable branch into the net branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-09-23 15:46:52 -07:00
Amery Hung
efec2e55bd selftests: drv-net: Pull data before parsing headers
It is possible for drivers to generate xdp packets with data residing
entirely in fragments. To keep parsing headers using direct packet
access, call bpf_xdp_pull_data() to pull headers into the linear data
area.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-9-ameryhung@gmail.com
2025-09-23 15:21:26 -07:00
Lance Yang
0389c305ef selftests/mm: skip soft-dirty tests when CONFIG_MEM_SOFT_DIRTY is disabled
The madv_populate and soft-dirty kselftests currently fail on systems
where CONFIG_MEM_SOFT_DIRTY is disabled.

Introduce a new helper softdirty_supported() into vm_util.c/h to ensure
tests are properly skipped when the feature is not enabled.

Link: https://lkml.kernel.org/r/20250917133137.62802-1-lance.yang@linux.dev
Fixes: 9f3265db6a ("selftests: vm: add test for Soft-Dirty PTE bit")
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-23 14:14:16 -07:00
Amery Hung
323302f54d selftests/bpf: Test bpf_xdp_pull_data
Test bpf_xdp_pull_data() with xdp packets with different layouts. The
xdp bpf program first checks if the layout is as expected. Then, it
calls bpf_xdp_pull_data(). Finally, it checks the 0xbb marker at offset
1024 using directly packet access.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-8-ameryhung@gmail.com
2025-09-23 13:35:12 -07:00
Amery Hung
fe9544ed1a bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
To test bpf_xdp_pull_data(), an xdp packet containing fragments as well
as free linear data area after xdp->data_end needs to be created.
However, bpf_prog_test_run_xdp() always fills the linear area with
data_in before creating fragments, leaving no space to pull data. This
patch will allow users to specify the linear data size through
ctx->data_end.

Currently, ctx_in->data_end must match data_size_in and will not be the
final ctx->data_end seen by xdp programs. This is because ctx->data_end
is populated according to the xdp_buff passed to test_run. The linear
data area available in an xdp_buff, max_linear_sz, is alawys filled up
before copying data_in into fragments.

This patch will allow users to specify the size of data that goes into
the linear area. When ctx_in->data_end is different from data_size_in,
only ctx_in->data_end bytes of data will be put into the linear area when
creating the xdp_buff.

While ctx_in->data_end will be allowed to be different from data_size_in,
it cannot be larger than the data_size_in as there will be no data to
copy from user space. If it is larger than the maximum linear data area
size, the layout suggested by the user will not be honored. Data beyond
max_linear_sz bytes will still be copied into fragments.

Finally, since it is possible for a NIC to produce a xdp_buff with empty
linear data area, allow it when calling bpf_test_init() from
bpf_prog_test_run_xdp() so that we can test XDP kfuncs with such
xdp_buff. This is done by moving lower-bound check to callers as most of
them already do except bpf_prog_test_run_skb(). The change also fixes a
bug that allows passing an xdp_buff with data < ETH_HLEN. This can
happen when ctx is used and metadata is at least ETH_HLEN.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-7-ameryhung@gmail.com
2025-09-23 13:35:12 -07:00
Leon Hwang
1c6686bf7f selftests/bpf: Add union argument tests using fexit programs
Add test coverage for union argument support using fexit programs:

* 8B union argument - verify that the verifier accepts it and that fexit
  programs can trace such functions.
* 16B union argument - verify that the verifier accepts it and that
  fexit programs can access the argument, which is passed using two
  registers.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250919044110.23729-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:07:47 -07:00
Puranjay Mohan
f616549124 selftests: bpf: Add tests for signed loads from arena
Add tests for loading 8, 16, and 32 bits with sign extension from arena,
also verify that exception handling is working correctly and correct
assembly is being generated by the x86 and arm64 JITs.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250923110157.18326-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:00:23 -07:00
Sean Christopherson
947ab90c91 KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
Add a check in the MSRs test to verify that KVM's reported support for
MSRs with feature bits is consistent between KVM's MSR save/restore lists
and KVM's supported CPUID.

To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
track the "second" feature to avoid false failures when running on a CPU
with only one of IBT or SHSTK.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-51-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 10:03:02 -07:00
Sean Christopherson
3469fd203b KVM: selftests: Add coverage for KVM-defined registers in MSRs test
Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
test.  While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
to any particular internal implementation, i.e. to not commit in uAPI to
handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
purposes is a-ok and is a naturally fit given the semantics of SSP.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-50-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:59:44 -07:00
Sean Christopherson
80c2b6d8e7 KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed
via ONE_REG and through the dedicated MSR ioctls.  For simplicity, run
the test twice, e.g. instead of trying to get MSR values into the exact
right state when switching write methods.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-49-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:52:18 -07:00
Sean Christopherson
a8b9cca99c KVM: selftests: Extend MSRs test to validate vCPUs without supported features
Add a third vCPUs to the MSRs test that runs with all features disabled in
the vCPU's CPUID model, to verify that KVM does the right thing with
respect to emulating accesses to MSRs that shouldn't exist.  Use the same
VM to verify that KVM is honoring the vCPU model, e.g. isn't looking at
per-VM state when emulating MSR accesses.

Link: https://lore.kernel.org/r/20250919223258.1604852-48-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:52:17 -07:00
Sean Christopherson
27c4135306 KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to
handled due to the MSRs existing if IBT *or* SHSTK is supported.  To deal
with Intel's wonderful decision to bundle IBT and SHSTK under CET, track
the second feature, but skip only RDMSR #GP tests to avoid false failures
when running on a CPU with only one of IBT or SHSTK (the WRMSR #GP tests
are still valid since the enable bits are per-feature).

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-47-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:51:59 -07:00
Sean Christopherson
9c38ddb3df KVM: selftests: Add an MSR test to exercise guest/host and read/write
Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, and expect success/failure based on whether or not the
vCPU supports the MSR according to supported CPUID.

Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide addition testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all which are extremely difficult to validate in
KUT.

If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
varies wildly.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-46-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:51:56 -07:00
Sean Christopherson
1f2bbbbbda KVM: x86: Merge 'selftests' into 'cet' to pick up ex_str()
Merge the queue of KVM selftests changes for 6.18 to pick up the ex_str()
helper so that it can be used to pretty print expected versus actual
exceptions in a new MSR selftest.  CET virtualization will add support for
several MSRs with non-trivial semantics, along with new uAPI for accessing
the guest's Shadow Stack Pointer (SSP) from userspace.
2025-09-23 09:00:18 -07:00
Sean Christopherson
df1f294013 KVM: selftests: Add ex_str() to print human friendly name of exception vectors
Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.

Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-45-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:39:02 -07:00
Sukrut Heroorkar
ff86b48d4c selftests/kvm: remove stale TODO in xapic_state_test
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.

Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Link: https://lore.kernel.org/r/20250908210547.12748-1-hsukrut3@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:39:01 -07:00
dongsheng
c435978e4f KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
Add a PMU errata framework and use it to relax precise event counts on
Atom platforms that overcount "Instruction Retired" and "Branch Instruction
Retired" events, as the overcount issues on VM-Exit/VM-Entry are impossible
to prevent from userspace, e.g. the test can't prevent host IRQs.

Setup errata during early initialization and automatically sync the mask
to VMs so that tests can check for errata without having to manually
manage host=>guest variables.

For Intel Atom CPUs, the PMU events "Instruction Retired" or
"Branch Instruction Retired" may be overcounted for some certain
instructions, like FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD
and complex SGX/SMX/CSTATE instructions/flows.

The detailed information can be found in the errata (section SRF7):
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/

For the Atom platforms before Sierra Forest (including Sierra Forest),
Both 2 events "Instruction Retired" and "Branch Instruction Retired" would
be overcounted on these certain instructions, but for Clearwater Forest
only "Instruction Retired" event is overcounted on these instructions.

Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Dapeng Mi
2922b59588 KVM: selftests: Validate more arch-events in pmu_counters_test
Add support for 5 new architectural events (4 topdown level 1 metrics
events and LBR inserts event) that will first show up in Intel's
Clearwater Forest CPUs.  Detailed info about the new events can be found
in SDM section 21.2.7 "Pre-defined Architectural  Performance Events".

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: drop "unavailable_mask" changes]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Sean Christopherson
1fcd3053aa KVM: selftests: Reduce number of "unavailable PMU events" combos tested
Reduce the number of combinations of unavailable PMU events masks that are
testing by the PMU counters test.  In reality, testing every possible
combination isn't all that interesting, and certainly not worth the tens
of seconds (or worse, minutes) of runtime.  Fully testing the N^2 space
will be especially problematic in the near future, as 5! new arch events
are on their way.

Use alternating bit patterns (and 0 and -1u) in the hopes that _if_ there
is ever a KVM bug, it's not something horribly convoluted that shows up
only with a super specific pattern/value.

Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Sean Christopherson
571fc2833e KVM: selftests: Track unavailable_mask for PMU events as 32-bit value
Track the mask of "unavailable" PMU events as a 32-bit value.  While bits
31:9 are currently reserved, silently truncating those bits is unnecessary
and asking for missed coverage.  To avoid running afoul of the sanity check
in vcpu_set_cpuid_property(), explicitly adjust the mask based on the
non-reserved bits as reported by KVM's supported CPUID.

Opportunistically update the "all ones" testcase to pass -1u instead of
0xff.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Dapeng Mi
210b09fa42 KVM: selftests: Add timing_info bit support in vmx_pmu_caps_test
A new bit PERF_CAPABILITIES[17] called "PEBS_TIMING_INFO" bit is added
to indicated if PEBS supports to record timing information in a new
"Retried Latency" field.

Since KVM requires user can only set host consistent PEBS capabilities,
otherwise the PERF_CAPABILITIES setting would fail, add pebs_timing_info
into the "immutable_caps" to block host inconsistent PEBS configuration
and cause errors.

Opportunistically drop the anythread_deprecated bit.  It isn't and likely
never was a PERF_CAPABILITIES flag, the test's definition snuck in when
the union was copy+pasted from the kernel's definition.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: call out anythread_deprecated change]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 08:38:59 -07:00
Mykyta Yatsenko
c6ae18e0af selftests/bpf: add bpf task work stress tests
Add stress tests for BPF task-work scheduling kfuncs. The tests spawn
multiple threads that concurrently schedule task_work callbacks against
the same and different map values to exercise the kfuncs under high
contention.
Verify callbacks are reliably enqueued and executed with no drops.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20250923112404.668720-10-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 07:34:39 -07:00
Mykyta Yatsenko
39fd74dfd5 selftests/bpf: BPF task work scheduling tests
Introducing selftests that check BPF task work scheduling mechanism.
Validate that verifier does not accepts incorrect calls to
bpf_task_work_schedule kfunc.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250923112404.668720-9-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 07:34:39 -07:00
Jakub Sitnicki
8a8241cdaa selftests/net: Test tcp port reuse after unbinding a socket
Exercise the scenario described in detail in the cover letter:

  1) socket A: connect() from ephemeral port X
  2) socket B: explicitly bind() to port X
  3) check that port X is now excluded from ephemeral ports
  4) close socket B to release the port bind
  5) socket C: connect() from ephemeral port X

As well as a corner case to test that the connect-bind flag is cleared:

  1) connect() from ephemeral port X
  2) disconnect the socket with connect(AF_UNSPEC)
  3) bind() it explicitly to port X
  4) check that port X is now excluded from ephemeral ports

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-2-57168b661b47@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 10:12:15 +02:00
Lorenzo Stoakes
f7a741c53b mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()
In commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
nested file systems") we introduced the ability for stacked drivers and
file systems to correctly invoke the f_op->mmap_prepare() handler from an
f_op->mmap() handler via a compatibility layer implemented in
compat_vma_mmap_prepare().

This populates vm_area_desc fields according to those found in the (not
yet fully initialised) VMA passed to f_op->mmap().

However this function implicitly assumes that the struct file which we are
operating upon is equal to vma->vm_file.  This is not a safe assumption in
all cases.

The only really sane situation in which this matters would be something
like e.g.  i915_gem_dmabuf_mmap() which invokes vfs_mmap() against
obj->base.filp:

	ret = vfs_mmap(obj->base.filp, vma);
	if (ret)
		return ret;

And then sets the VMA's file to this, should the mmap operation succeed:

	vma_set_file(vma, obj->base.filp);

That is - it is the file that is intended to back the VMA mapping.

This is not an issue currently, as so far we have only implemented
f_op->mmap_prepare() handlers for some file systems and internal mm uses,
and the only stacked f_op->mmap() operations that can be performed upon
these are those in backing_file_mmap() and coda_file_mmap(), both of which
use vma->vm_file.

However, moving forward, as we convert drivers to using
f_op->mmap_prepare(), this will become a problem.

Resolve this issue by explicitly setting desc->file to the provided file
parameter and update callers accordingly.

Callers are expected to read desc->file and update desc->vm_file - the
former will be the file provided by the caller (if stacked, this may
differ from vma->vm_file).

If the caller needs to differentiate between the two they therefore now
can.

While we are here, also provide a variant of compat_vma_mmap_prepare()
that operates against a pointer to any file_operations struct and does not
assume that the file_operations struct we are interested in is file->f_op.

This function is __compat_vma_mmap_prepare() and we invoke it from
compat_vma_mmap_prepare() so that we share code between the two functions.

This is important, because some drivers provide hooks in a separate
struct, for instance struct drm_device provides an fops field for this
purpose.

Also update the VMA selftests accordingly.

Link: https://lkml.kernel.org/r/dd0c72df8a33e8ffaa243eeb9b01010b670610e9.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-22 20:17:11 -07:00
Lorenzo Stoakes
af6703838e mm: specify separate file and vm_file params in vm_area_desc
Patch series "mm: do not assume file == vma->vm_file in
compat_vma_mmap_prepare()", v2.

As part of the efforts to eliminate the problematic f_op->mmap callback, a
new callback - f_op->mmap_prepare was provided.

While we are converting these callbacks, we must deal with 'stacked'
filesystems and drivers - those which in their own f_op->mmap callback
invoke an inner f_op->mmap callback.

To accomodate for this, a compatibility layer is provided that, via
vfs_mmap(), detects if f_op->mmap_prepare is provided and if so, generates
a vm_area_desc containing the VMA's metadata and invokes the call.

So far, we have provided desc->file equal to vma->vm_file.  However this
is not necessarily valid, especially in the case of stacked drivers which
wish to assign a new file after the inner hook is invoked.

To account for this, we adjust vm_area_desc to have both file and vm_file
fields.  The .vm_file field is strictly set to vma->vm_file (or in the
case of a new mapping, what will become vma->vm_file).

However, .file is set to whichever file vfs_mmap() is invoked with when
using the compatibilty layer.

Therefore, if the VMA's file needs to be updated in .mmap_prepare,
desc->vm_file should be assigned, whilst desc->file should be read.

No current f_op->mmap_prepare users assign desc->file so this is safe to
do.

This makes the .mmap_prepare callback in the context of a stacked
filesystem or driver completely consistent with the existing .mmap
implementations.

While we're here, we do a few small cleanups, and ensure that we const-ify
things correctly in the vm_area_desc struct to avoid hooks accidentally
trying to assign fields they should not.


This patch (of 2):

Stacked filesystems and drivers may invoke mmap hooks with a struct file
pointer that differs from the overlying file.  We will make this
functionality possible in a subsequent patch.

In order to prepare for this, let's update vm_area_struct to separately
provide desc->file and desc->vm_file parameters.

The desc->file parameter is the file that the hook is expected to operate
upon, and is not assignable (though the hok may wish to e.g.  update the
file's accessed time for instance).

The desc->vm_file defaults to what will become vma->vm_file and is what
the hook must reassign should it wish to change the VMA"s vma->vm_file.

For now we keep desc->file, vm_file the same to remain consistent.

No f_op->mmap_prepare() callback sets a new vma->vm_file currently, so
this is safe to change.

While we're here, make the mm_struct desc->mm pointers at immutable as
well as the desc->mm field itself.

As part of this change, also update the single hook which this would
otherwise break - mlock_future_ok(), invoked by secretmem_mmap_prepare()).

We additionally update set_vma_from_desc() to compare fields in a more
logical fashion, checking the (possibly) user-modified fields as the first
operand against the existing value as the second one.

Additionally, update VMA tests to accommodate changes.

Link: https://lkml.kernel.org/r/cover.1756920635.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3fa15a861bb7419f033d22970598aa61850ea267.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-22 20:17:11 -07:00
KP Singh
b720903e2b selftests/bpf: Enable signature verification for some lskel tests
The test harness uses the verify_sig_setup.sh to generate the required
key material for program signing.

Generate key material for signing LSKEL some lskel programs and use
xxd to convert the verification certificate into a C header file.

Finally, update the main test runner to load this
certificate into the session keyring via the add_key() syscall before
executing any tests. Use the session keyring in the tests with signed
programs.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250921160120.9711-6-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-22 19:17:55 -07:00
Matthieu Baerts (NGI0)
e6c3552945 selftests: mptcp: pm: get server-side flag
server-side info linked to the MPTCP connect/established events can now
come from the flags, in addition to the dedicated attribute.

The attribute is now deprecated -- in favour of the new flag, and will
be removed later on.

Print this info only once.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-4-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:25 -07:00
Matthieu Baerts (NGI0)
c9809f03c1 mptcp: pm: netlink: only add server-side attr when true
This attribute is a boolean. No need to add it to set it to 'false'.

Indeed, the default value when this attribute is not set is naturally
'false'. A few bytes can then be saved by not adding this attribute if
the connection is not on the server side.

This prepares the future deprecation of its attribute, in favour of a
new flag.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:24 -07:00
David Yang
50d51cef55 selftests: forwarding: Reorder (ar)ping arguments to obey POSIX getopt
Quoted from musl wiki:

  GNU getopt permutes argv to pull options to the front, ahead of
  non-option arguments. musl and the POSIX standard getopt stop
  processing options at the first non-option argument with no
  permutation.

Thus these scripts stop working on musl since non-option arguments for
tools using getopt() (in this case, (ar)ping) do not always come last.
Fix it by reordering arguments.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250919053538.1106753-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:37:20 -07:00
Linus Torvalds
d4c7fccfa7 iommufd 6.17 second rc pull
Two user triggerable UAFs:
 
 - Possible race UAF setting up mmaps
 
 - Syzkaller found UAF when erroring an file descriptor creation ioctl due
   to the fput() work queue.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaNFXCgAKCRCFwuHvBreF
 YXoFAP9xzBDOVp64CJBY8cD/URaQc8t55b8uJCYgxbDesQa4iAEAxTX9gdZ7Lr6Z
 THwdY00LI040gBHUxIZOaXNeOhuPIg8=
 =DQIy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:
 "Fix two user triggerable use-after-free issues:

   - Possible race UAF setting up mmaps

   - Syzkaller found UAF when erroring an file descriptor creation ioctl
     due to the fput() work queue"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Update the fail_nth limit
  iommufd: WARN if an object is aborted with an elevated refcount
  iommufd: Fix race during abort for file descriptors
  iommufd: Fix refcounting race during mmap
2025-09-22 11:16:14 -07:00
Sean Christopherson
e8f85d7884 KVM: x86: Don't treat ENTER and LEAVE as branches, because they aren't
Remove the IsBranch flag from ENTER and LEAVE in KVM's emulator, as ENTER
and LEAVE are stack operations, not branches.  Add forced emulation of
said instructions to the PMU counters test to prove that KVM diverges from
hardware, and to guard against regressions.

Opportunistically add a missing "1 MOV" to the selftest comment regarding
the number of instructions per loop, which commit 7803339fa9 ("KVM:
selftests: Use data load to trigger LLC references/misses in Intel PMU")
forgot to add.

Fixes: 018d70ffcf ("KVM: x86: Update vPMCs when retiring branch instructions")
Cc: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919004639.1360453-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-22 07:14:05 -07:00
Muhammad Usama Anjum
e75f15fb69 selftests/mm: protection_keys: fix dead code
The while loop doesn't execute and following warning gets generated:

protection_keys.c:561:15: warning: code will never be executed
[-Wunreachable-code]
                int rpkey = alloc_random_pkey();

Let's enable the while loop such that it gets executed nr_iterations
times. Simplify the code a bit as well.

Link: https://lkml.kernel.org/r/20250912123025.1271051-3-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:34 -07:00
Muhammad Usama Anjum
3d5022a0f8 selftests/mm: add -Wunreachable-code and fix warnings
Patch series "selftests/mm: Add -Wunreachable-code and fix warnings".

Add -Wunreachable-code to selftests and remove dead code from generated
warnings.


This patch (of 2):

Enable -Wunreachable-code flag to catch dead code and fix them.

1. Remove the dead code and write a comment instead:
hmm-tests.c:2033:3: warning: code will never be executed
[-Wunreachable-code]
                perror("Should not reach this\n");
                ^~~~~~

2. ksft_exit_fail_msg() calls exit(). So cleanup isn't done. Replace it
   with ksft_print_msg().
split_huge_page_test.c:301:3: warning: code will never be executed
[-Wunreachable-code]
                goto cleanup;
                ^~~~~~~~~~~~

3. Remove duplicate inline.
pkey_sighandler_tests.c:44:15: warning: duplicate 'inline' declaration
specifier [-Wduplicate-decl-specifier]
static inline __always_inline

Link: https://lkml.kernel.org/r/20250912123025.1271051-1-usama.anjum@collabora.com
Link: https://lkml.kernel.org/r/20250912123025.1271051-2-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:34 -07:00
Muhammad Usama Anjum
5ea8ab7f93 selftests/mm: centralize the __always_unused macro
This macro gets used in different tests.  Add it to kselftest.h which is
central location and tests use this header.  Then use this new macro.

Link: https://lkml.kernel.org/r/20250912125102.1309796-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Antonio Quartulli <antonio@openvpn.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Sabrina Dubroca" <sd@queasysnail.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:33 -07:00
David Hildenbrand
a5883fa942 selftests/mm: gup_tests: option to GUP all pages in a single call
We recently missed detecting an issue during early testing because the
default (!all) tests would not trigger it and even when running "all"
tests it only would happen sometimes because of races.

So let's allow for an easy way to specify "GUP all pages in a single
call", extend the test matrix and extend our default (!all) tests.

By GUP'ing all pages in a single call, with the default size of 128MiB
we'll cover multiple leaf page tables / PMDs on architectures with sane
THP sizes.

Link: https://lkml.kernel.org/r/20250910093051.1693097-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:33 -07:00
Zach O'Keefe
9b375adb39 selftests/mm: remove PROT_EXEC req from file-collapse tests
As of v6.8 commit 7fbb5e1882 ("mm: remove VM_EXEC requirement for THP
eligibility") thp collapse no longer requires file-backed mappings be
created with PROT_EXEC.

Remove the overly-strict dependency from thp collapse tests so we test the
least-strict requirement for success.

Link: https://lkml.kernel.org/r/20250909190534.512801-1-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:31 -07:00
Chunyu Hu
c56325259a selftests/mm: fix va_high_addr_switch.sh failure on x86_64
The test will fail as below on x86_64 with cpu la57 support (will skip if
no la57 support).  Note, the test requries nr_hugepages to be set first.

  # running bash ./va_high_addr_switch.sh
  # -------------------------------------
  # mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
  # mmap(addr_switch_hint - pagesize, (2 * pagesize)): 0x7f55b60f9000 - OK
  # mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
  # mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
  # mmap(NULL): 0x7f55b60f9000 - OK
  # mmap(low_addr): 0x40000000 - OK
  # mmap(high_addr): 0x1000000000000 - OK
  # mmap(high_addr) again: 0xffff55b6136000 - OK
  # mmap(high_addr, MAP_FIXED): 0x1000000000000 - OK
  # mmap(-1): 0xffff55b6134000 - OK
  # mmap(-1) again: 0xffff55b6132000 - OK
  # mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
  # mmap(addr_switch_hint - pagesize, 2 * pagesize): 0x7f55b60f9000 - OK
  # mmap(addr_switch_hint - pagesize/2 , 2 * pagesize): 0x7f55b60f7000 - OK
  # mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
  # mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
  # mmap(NULL, MAP_HUGETLB): 0x7f55b5c00000 - OK
  # mmap(low_addr, MAP_HUGETLB): 0x40000000 - OK
  # mmap(high_addr, MAP_HUGETLB): 0x1000000000000 - OK
  # mmap(high_addr, MAP_HUGETLB) again: 0xffff55b5e00000 - OK
  # mmap(high_addr, MAP_FIXED | MAP_HUGETLB): 0x1000000000000 - OK
  # mmap(-1, MAP_HUGETLB): 0x7f55b5c00000 - OK
  # mmap(-1, MAP_HUGETLB) again: 0x7f55b5a00000 - OK
  # mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB): 0x800000000000 - FAILED
  # mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0x800000000000 - OK
  # [FAIL]

addr_switch_hint is defined as DFEFAULT_MAP_WINDOW in the failed test (for
x86_64, DFEFAULT_MAP_WINDOW is defined as (1UL<<47) - pagesize) in 64 bit.

Before commit cc92882ee2 ("mm: drop hugetlb_get_unmapped_area{_*}
functions"), for x86_64 hugetlb_get_unmapped_area() is handled in arch
code arch/x86/mm/hugetlbpage.c and addr is checked with
map_address_hint_valid() after align with 'addr &= huge_page_mask(h)'
which is a round down way, and it will fail the check because the addr is
within the DEFAULT_MAP_WINDOW but (addr + len) is above the
DFEFAULT_MAP_WINDOW.  So it wil go through the
hugetlb_get_unmmaped_area_top_down() to find an area within the
DFEFAULT_MAP_WINDOW.

After commit cc92882ee2 ("mm: drop hugetlb_get_unmapped_area{_*}
functions").  The addr hint for hugetlb_get_unmmaped_area() will be
rounded up and aligned to hugepage size with ALIGN() for all arches.  And
after the align, the addr will be above the default MAP_DEFAULT_WINDOW,
and the map_addresshint_valid() check will pass because both aligned addr
(addr0) and (addr + len) are above the DEFAULT_MAP_WINDOW, and the aligned
hint address (0x800000000000) is returned as an suitable gap is found
there, in arch_get_unmapped_area_topdown().

To still cover the case that addr is within the DEFAULT_MAP_WINDOW, and
addr + len is above the DFEFAULT_MAP_WINDOW, change to choose the last
hugepage aligned address within the DEFAULT_MAP_WINDOW as the hint addr,
and the addr + len (2 hugepages) will be one hugepage above the
DEFAULT_MAP_WINDOW.  An aligned address won't be affected by the page
round up or round down from kernel, so it's determistic.

Link: https://lkml.kernel.org/r/20250912013711.3002969-4-chuhu@redhat.com
Fixes: cc92882ee2 ("mm: drop hugetlb_get_unmapped_area{_*} functions")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:29 -07:00
Chunyu Hu
d9d957bd7b selftests/mm: alloc hugepages in va_high_addr_switch test
Alloc hugepages in the test internally, so we don't fully rely on the
run_vmtests.sh.  If run_vmtests.sh does that great, free hugepages is
enough for being used to run the test, leave it as it is, otherwise setup
the hugepages in the test.

Save the original nr_hugepages value and restore it after test finish, so
leave a stable test envronment.

Link: https://lkml.kernel.org/r/20250912013711.3002969-3-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:28 -07:00
Chunyu Hu
6e296bcf29 selftests/mm: fix hugepages cleanup too early
Patch series "Fix va_high_addr_switch.sh test failure", v3.

These three patches fix the va_high_addr_switch.sh test failure on x86_64.

Patch 1 fixes the hugepage setup issue that nr_hugepages is reset too
early in run_vmtests.sh and break the later va_high_addr_switch testing.

Patch 2 adds hugepage setup in va_high_addr_switch test, so that it can
still work if vm_runtests.sh changes the hugepage setup someday.

Patch 3 fixes the test failure caused by the hint addr align method change
in hugetlb_get_unmapped_area().


This patch (of 3):

The nr_hugepgs variable is used to keep the original nr_hugepages at the
hugepage setup step at test beginning.  After userfaultfd test, a cleaup
is executed, both /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and
/proc/sys//vm/nr_hugepages are reset to 'original' value before
userfaultfd test starts.

Issue here is the value used to restore /proc/sys/vm/nr_hugepages is
nr_hugepgs which is the initial value before the vm_runtests.sh runs, not
the value before userfaultfd test starts.  'va_high_addr_swith.sh' tests
runs after that will possibly see no hugepages available for test, and got
EINVAL when mmap(HUGETLB), making the result invalid.

And before pkey tests, nr_hugepgs is changed to be used as a temp variable
to save nr_hugepages before pkey test, and restore it after pkey tests
finish.  The original nr_hugepages value is not tracked anymore, so no way
to restore it after all tests finish.

Add a new variable orig_nr_hugepgs to save the original nr_hugepages, and
and restore it to nr_hugepages after all tests finish.  And change to use
the nr_hugepgs variable to save the /proc/sys/vm/nr_hugeages after
hugepage setup, it's also the value before userfaultfd test starts, and
the correct value to be restored after userfaultfd finishes.  The
va_high_addr_switch.sh broken will be resolved.

Link: https://lkml.kernel.org/r/20250912013711.3002969-1-chuhu@redhat.com
Link: https://lkml.kernel.org/r/20250912013711.3002969-2-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:28 -07:00
David Hildenbrand
24a3c7af3b selftests/mm: split_huge_page_test: cleanups for split_pte_mapped_thp test
There is room for improvement, so let's clean up a bit:

(1) Define "4" as a constant.

(2) SKIP if we fail to allocate all THPs (e.g., fragmented) and add
    recovery code for all other failure cases: no need to exit the test.

(3) Rename "len" to thp_area_size, and "one_page" to "thp_area".

(4) Allocate a new area "page_area" into which we will mremap the
    pages; add "page_area_size". Now we can easily merge the two
    mremap instances into a single one.

(5) Iterate THPs instead of bytes when checking for missed THPs after
    mremap.

(6) Rename "pte_mapped2" to "tmp", used to verify mremap(MAP_FIXED)
    result.

(7) Split the corruption test from the failed-split test, so we can just
    iterate bytes vs. thps naturally.

(8) Extend comments and clarify why we are using mremap in the first
    place.

Link: https://lkml.kernel.org/r/20250903070253.34556-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:20 -07:00
David Hildenbrand
0d0e03d5b8 selftests/mm: split_huge_page_test: fix occasional is_backed_by_folio() wrong results
Patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements", v2.

One fix for occasional failures I found while testing and a bunch of
cleanups that should make that test easier to digest.


This patch (of 2):

When checking for actual tail or head pages of a folio, we must make sure
that the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flag is paired with KPF_THP.

For example, if we have another large folio after our large folio in
physical memory, our "pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL)" would
trigger even though it's actually a head page of the next folio.

If is_backed_by_folio() returns a wrong result, split_pte_mapped_thp() can
fail with "Some THPs are missing during mremap".

Fix it by checking for head/tail pages of folios properly.  Add
folio_tail_flags/folio_head_flags to improve readability and use these
masks also when just testing for any compound page.

Link: https://lkml.kernel.org/r/20250903070253.34556-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250903070253.34556-2-david@redhat.com
Fixes: 169b456b0162 ("selftests/mm: reimplement is_backed_by_thp() with more precise check")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:20 -07:00
David Hildenbrand
84efbefa26 mm: remove nth_page()
Now that all users are gone, let's remove it.

Link: https://lkml.kernel.org/r/20250901150359.867252-38-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:10 -07:00
David Hildenbrand
3b864c8f55 wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel config
It's no longer user-selectable (and the default was already "y"), so let's
just drop it.

It was never really relevant to the wireguard selftests either way.

Link: https://lkml.kernel.org/r/20250901150359.867252-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:22:01 -07:00
Ujwal Kundur
4dfd4bba85 selftests/mm/uffd: refactor non-composite global vars into struct
Refactor macros and non-composite global variable definitions into a
struct that is defined at the start of a test and is passed around instead
of relying on global vars.

Link: https://lkml.kernel.org/r/20250829155600.2000-1-ujwal.kundur@gmail.com
Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:21:59 -07:00
Johannes Weiner
2ccd9fecd9 mm: remove unused zpool layer
With zswap using zsmalloc directly, there are no more in-tree users of
this code.  Remove it.

With zpool gone, zsmalloc is now always a simple dependency and no
longer something the user needs to configure. Hide CONFIG_ZSMALLOC
from the user and have zswap and zram pull it in as needed.

Link: https://lkml.kernel.org/r/20250829162212.208258-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> 
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21 14:21:59 -07:00
Uday Shankar
a3835a4410 selftests: ublk: fix behavior when fio is not installed
Some ublk selftests have strange behavior when fio is not installed.
While most tests behave correctly (run if they don't need fio, or skip
if they need fio), the following tests have different behavior:

- test_null_01, test_null_02, test_generic_01, test_generic_02, and
  test_generic_12 try to run fio without checking if it exists first,
  and fail on any failure of the fio command (including "fio command
  not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but
  doesn't fail on fio command failure. This test passes, but that pass
  is misleading as the test doesn't do anything useful without fio
  installed. So this test passes when it should skip.

Fix these issues by adding _have_program fio checks to the top of all of
these tests.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20 23:18:23 -06:00
Yonghong Song
5a427fddec selftests/bpf: Fix selftest verifier_arena_large failure
With latest llvm22, I got the following verification failure:

  ...
  ; int big_alloc2(void *ctx) @ verifier_arena_large.c:207
  0: (b4) w6 = 1                        ; R6_w=1
  ...
  ; if (err) @ verifier_arena_large.c:233
  53: (56) if w6 != 0x0 goto pc+62      ; R6=0
  54: (b7) r7 = -4                      ; R7_w=-4
  55: (18) r8 = 0x7f4000000000          ; R8_w=scalar()
  57: (bf) r9 = addr_space_cast(r8, 0, 1)       ; R8_w=scalar() R9_w=arena
  58: (b4) w6 = 5                       ; R6_w=5
  ; pg = page[i]; @ verifier_arena_large.c:238
  59: (bf) r1 = r7                      ; R1_w=-4 R7_w=-4
  60: (07) r1 += 4                      ; R1_w=0
  61: (79) r2 = *(u64 *)(r9 +0)         ; R2_w=scalar() R9_w=arena
  ; if (*pg != i) @ verifier_arena_large.c:239
  62: (bf) r3 = addr_space_cast(r2, 0, 1)       ; R2_w=scalar() R3_w=arena
  63: (71) r3 = *(u8 *)(r3 +0)          ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
  64: (5d) if r1 != r3 goto pc+51       ; R1_w=0 R3_w=0
  ; bpf_arena_free_pages(&arena, (void __arena *)pg, 2); @ verifier_arena_large.c:241
  65: (18) r1 = 0xff11000114548000      ; R1_w=map_ptr(map=arena,ks=0,vs=0)
  67: (b4) w3 = 2                       ; R3_w=2
  68: (85) call bpf_arena_free_pages#72675      ;
  69: (b7) r1 = 0                       ; R1_w=0
  ; page[i + 1] = NULL; @ verifier_arena_large.c:243
  70: (7b) *(u64 *)(r8 +8) = r1
  R8 invalid mem access 'scalar'
  processed 61 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 2
  =============
  #489/5   verifier_arena_large/big_alloc2:FAIL

The main reason is that 'r8' in insn '70' is not an arena pointer.
Further debugging at llvm side shows that llvm commit ([1]) caused
the failure. For the original code:
  page[i] = NULL;
  page[i + 1] = NULL;
the llvm transformed it to something like below at source level:
  __builtin_memset(&page[i], 0, 16)
Such transformation prevents llvm BPFCheckAndAdjustIR pass from
generating proper addr_space_cast insns ([2]).

Adding support in llvm BPFCheckAndAdjustIR pass should work, but
not sure that such a pattern exists or not in real applications.
At the same time, simply adding a memory barrier between two 'page'
assignment can fix the issue.

  [1] https://github.com/llvm/llvm-project/pull/155415
  [2] https://github.com/llvm/llvm-project/pull/84410

Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250920045805.3288551-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-20 15:49:35 -07:00
Colin Ian King
4386f71623 selftest/futex: Fix spelling mistake "boundarie" -> "boundary"
There is a spelling mistake in a test message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:23:13 +02:00
André Almeida
520db0559d selftests/futex: Remove logging.h file
Every futex selftest uses the kselftest_harness.h helper and don't need
the logging.h file. Delete it.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:56 +02:00
André Almeida
b257d91c4d selftests/futex: Drop logging.h include from futex_numa
futex_numa doesn't really use logging.h helpers, it's only need two
includes from this file. So drop it and include the two missing
includes.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:56 +02:00
André Almeida
d35ca2f642 selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h
To reduce the boilerplate code, refactor futex_numa_mpol test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:55 +02:00
André Almeida
4ba629e6c6 selftests/futex: Refactor futex_priv_hash with kselftest_harness.h
To reduce the boilerplate code, refactor futex_priv_hash test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:55 +02:00
André Almeida
a91e8e372e selftests/futex: Refactor futex_waitv with kselftest_harness.h
To reduce the boilerplate code, refactor futex_waitv test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:55 +02:00
André Almeida
f341a20f6d selftests/futex: Refactor futex_requeue with kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:55 +02:00
André Almeida
e5c04d0f3e selftests/futex: Refactor futex_wait with kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:55 +02:00
André Almeida
14d016bd72 selftests/futex: Refactor futex_wait_private_mapped_file with kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait_private_mapped_file
test to use kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:54 +02:00
André Almeida
af3c79f857 selftests/futex: Refactor futex_wait_unitialized_heap with kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait_unitialized_heap
test to use kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:54 +02:00
André Almeida
f5a1683441 selftests/futex: Refactor futex_wait_wouldblock with kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait_wouldblock test to
use kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:54 +02:00
André Almeida
0c02abf638 selftests/futex: Refactor futex_wait_timeout with kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait_timeout test to use
kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:54 +02:00
André Almeida
2ef0615685 selftests/futex: Refactor futex_requeue_pi_signal_restart with kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue_pi_signal_restart
test to use kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:54 +02:00
André Almeida
65a12ce20f selftests/futex: Refactor futex_requeue_pi_mismatched_ops with kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue_pi_mismatched_ops
test to use kselftest_harness header instead of futex's logging header.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:53 +02:00
André Almeida
d060495a37 selftests/futex: Refactor futex_requeue_pi with kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue_pi test to use
kselftest_harness header instead of futex's logging header.

Use kselftest fixture feature to make it easy to repeat the same test
with different parameters. With that, drop all repetitive test calls
from run.sh.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:53 +02:00
André Almeida
f2662ec26b selftests: kselftest: Create ksft_print_dbg_msg()
Create ksft_print_dbg_msg() so testers can enable extra debug messages
when running a test with the flag -d.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-20 18:11:53 +02:00
Marc Zyngier
46bd74ef07 Merge branch kvm-arm64/el2-feature-control into kvmarm-master/next
* kvm-arm64/el2-feature-control: (23 commits)
  : .
  : General rework of EL2 features that can be disabled to satisfy
  : the requirement of migration between heterogeneous hosts:
  :
  : - Handle effective RES0 behaviour of undefined registers, making sure
  :   that disabling a feature affects full registeres, and not just
  :   individual control bits. (20250918151402.1665315-1-maz@kernel.org)
  :
  : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace.
  :   (20250911114621.3724469-1-yangjinqian1@huawei.com)
  :
  : - Turn the NV feature management into a deny-list, and expose
  :   missing features to EL2 guests.
  :   (20250912212258.407350-1-oliver.upton@linux.dev)
  : .
  KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
  KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
  KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set
  KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
  KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
  KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
  KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
  KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
  KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
  KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace
  KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2
  KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2}
  KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits()
  KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2
  KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20 12:26:18 +01:00
Marc Zyngier
00a37271c8 KVM: arm64: selftest: Expand external_aborts test to look for TTW levels
Add a basic test corrupting a level-2 table entry to check that
the resulting abort is a SEA on a PTW at level-3.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20 11:05:14 +01:00
Takashi Iwai
b8d8265a0d ASoC: Updates for v6.18
A relatively quiet release for ASoC, we've had a lot of maintainance
 work going on and several new drivers but really the most remarkable
 thing is that we removed a driver, the WL1273 driver used in some old
 Nokia systems that have had the underlying system support removed from
 the kernel.
 
  - Morimoto-san continues his work on cleanups of the core APIs and
    enforcement of abstraction layers.
  - Lots of cleanups and conversions of DT bindings.
  - Substantial maintainance work on the Intel AVS drivers.
  - Support for Qualcomm Glymur and PM4125, Realtek RT1321, Shanghai
    FourSemi FS2104/5S, Texas Instruments PCM1754.
  - Remove support for TI WL1273.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmjNx/QACgkQJNaLcl1U
 h9A+oQf/a/hKhCdmDVl8LE/a5dTExQqpzxvLpWxUBwiYykh5B08n/adS7oALRyBK
 IfkbGfwpA4N2dGtwrluy4VATyQBTe8SUboX6iP1cxifbWG8+EDuVfpkdUl/R3fcK
 gPm41C/2Xk+GoAF4StfijPKg2PV8mUOWmTrxNm2QswGpkXxOFO4PI2GbTwsABDeU
 cv+EK7PUZHhKFUOu1ELLi1HmgI57TMK7Kb3I+ETcKNZ3ZiCaLs7Vkje5z2IUhSZZ
 +Z/EDnLKUmvYRmbkA48aFas4hpafkT7jrmGrk95mju/W0Udd9Ggm4MSF6+9DN4MF
 buaNixQZlIwVz68zapcVtrFedxKLEQ==
 =6T3w
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next

ASoC: Updates for v6.18

A relatively quiet release for ASoC, we've had a lot of maintainance
work going on and several new drivers but really the most remarkable
thing is that we removed a driver, the WL1273 driver used in some old
Nokia systems that have had the underlying system support removed from
the kernel.

 - Morimoto-san continues his work on cleanups of the core APIs and
   enforcement of abstraction layers.
 - Lots of cleanups and conversions of DT bindings.
 - Substantial maintainance work on the Intel AVS drivers.
 - Support for Qualcomm Glymur and PM4125, Realtek RT1321, Shanghai
   FourSemi FS2104/5S, Texas Instruments PCM1754.
 - Remove support for TI WL1273.
2025-09-20 08:38:17 +02:00
Uday Shankar
a755da0dd0 selftests: ublk: add test to verify that feat_map is complete
Add a test that verifies that the currently running kernel does not
report support for any features that are unrecognized by kublk. This
should catch cases where features are added without updating kublk's
feat_map accordingly, which has happened multiple times in the past (see
[1], [2]).

Note that this new test may fail if the test suite is older than the
kernel, and the newer kernel contains a newly introduced feature. I
believe this is not a use case we currently care about - we only care
about newer test suites passing on older kernels.

[1] https://lore.kernel.org/linux-block/20250606214011.2576398-1-csander@purestorage.com/t/#u
[2] https://lore.kernel.org/linux-block/2a370ab1-d85b-409d-b762-f9f3f6bdf705@nvidia.com/t/#m1c520a058448d594fd877f07804e69b28908533f

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-19 11:06:15 -06:00
Uday Shankar
742bcc1101 selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map
When UBLK_F_BUF_REG_OFF_DAEMON was added, we missed updating kublk's
feat_map, which results in the feature being reported as "unknown." Add
UBLK_F_BUF_REG_OFF_DAEMON to feat_map to fix this.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-19 11:06:15 -06:00
Uday Shankar
1f924cf781 selftests: ublk: kublk: simplify feat_map definition
Simplify the definition of feat_map by introducing a helper macro
FEAT_NAME to avoid having to type the feature name twice. As a side
effect, this changes the names in the feature list to be the full macro
name instead of the abbreviated names that were used before, but this is
a good change for clarity.

Using the full feature macro names ruins the alignment of the output, so
change the output format to put each feature's hex value before its
name, as this is easier to align nicely. The output now looks as
follows:

root# ./kublk features
ublk_drv features: 0x7fff
0x1               : UBLK_F_SUPPORT_ZERO_COPY
0x2               : UBLK_F_URING_CMD_COMP_IN_TASK
0x4               : UBLK_F_NEED_GET_DATA
0x8               : UBLK_F_USER_RECOVERY
0x10              : UBLK_F_USER_RECOVERY_REISSUE
0x20              : UBLK_F_UNPRIVILEGED_DEV
0x40              : UBLK_F_CMD_IOCTL_ENCODE
0x80              : UBLK_F_USER_COPY
0x100             : UBLK_F_ZONED
0x200             : UBLK_F_USER_RECOVERY_FAIL_IO
0x400             : UBLK_F_UPDATE_SIZE
0x800             : UBLK_F_AUTO_BUF_REG
0x1000            : UBLK_F_QUIESCE
0x2000            : UBLK_F_PER_IO_DAEMON
0x4000            : unknown

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-19 11:06:15 -06:00
Eduard Zingerman
fdcecdff90 selftests/bpf: test cases for callchain sensitive live stack tracking
- simple propagation of read/write marks;
- joining read/write marks from conditional branches;
- avoid must_write marks in when same instruction accesses different
  stack offsets on different execution paths;
- avoid must_write marks in case same instruction accesses stack
  and non-stack pointers on different execution paths;
- read/write marks propagation to outer stack frame;
- independent read marks for different callchains ending with the same
  function;
- bpf_calls_callback() dependent logic in
  liveness.c:bpf_stack_slot_alive().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-12-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:24 -07:00
Eduard Zingerman
34c513be3d selftests/bpf: __not_msg() tag for test_loader framework
This patch adds tags __not_msg(<msg>) and __not_msg_unpriv(<msg>).
Test fails if <msg> is found in verifier log.

If __msg_not() is situated between __msg() tags framework matches
__msg() tags first, and then checks that <msg> is not present in a
portion of a log between bracketing __msg() tags.
__msg_not() tags bracketed by a same __msg() group are effectively
unordered.

The idea is borrowed from LLVM's CheckFile with its CHECK-NOT syntax.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-11-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:23 -07:00
Eduard Zingerman
107e169799 bpf: disable and remove registers chain based liveness
Remove register chain based liveness tracking:
- struct bpf_reg_state->{parent,live} fields are no longer needed;
- REG_LIVE_WRITTEN marks are superseded by bpf_mark_stack_write()
  calls;
- mark_reg_read() calls are superseded by bpf_mark_stack_read();
- log.c:print_liveness() is superseded by logging in liveness.c;
- propagate_liveness() is superseded by bpf_update_live_stack();
- no need to establish register chains in is_state_visited() anymore;
- fix a bunch of tests expecting "_w" suffixes in verifier log
  messages.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-9-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:23 -07:00
Christian Brauner
d093090ea7
selftests/namespaces: verify initial namespace inode numbers
Make sure that all works correctly.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 16:22:38 +02:00
Jason Gunthorpe
43f6bee021 iommufd/selftest: Update the fail_nth limit
There are more failure conditions now so 400 iterations is not enough pass
them all, up it to 1000. The limit exists so it doesn't infinite loop.

Link: https://patch.msgid.link/r/3-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19 10:34:49 -03:00
Jinqian Yang
be8c9192ea KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
Assert that the EL2 features {HCX, TWED} of ID_AA64MMFR1_EL1 are writable
from userspace. They are only allowed to be downgraded in userspace.

Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 13:55:57 +01:00
Christian Brauner
28ef38a9a2
selftests/namespaces: add file handle selftests
Add a bunch of selftests for namespace file handles.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 14:26:17 +02:00
Christian Brauner
14f98438f0
selftests/namespaces: add identifier selftests
Add a bunch of selftests for the identifier retrieval ioctls.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 14:26:16 +02:00
Mark Brown
777fb19ed8 kselftest/arm64: Add lsfe to the hwcaps test
This feature has no traps associated with it so the SIGILL is not reliable.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 12:50:17 +01:00
KP Singh
ea2e6467ac bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD
Currently only array maps are supported, but the implementation can be
extended for other maps and objects. The hash is memoized only for
exclusive and frozen maps as their content is stable until the exclusive
program modifies the map.

This is required for BPF signing, enabling a trusted loader program to
verify a map's integrity. The loader retrieves
the map's runtime hash from the kernel and compares it against an
expected hash computed at build time.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250914215141.15144-7-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 19:11:42 -07:00
KP Singh
6c850cbca8 selftests/bpf: Add tests for exclusive maps
Check if access is denied to another program for an exclusive map

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250914215141.15144-6-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 19:11:42 -07:00
Bala-Vignesh-Reddy
f68cd7ddd0 selftests: riscv: Add README for RISC-V KSelfTest
Add a README file for RISC-V specific kernel selftests under
tools/testing/selftests/riscv/. This mirrors the existing README
for arm64, providing clear guidance on how the tests are architecture
specific and skipped on non-riscv systems. It also includes
standard make commands for building, running and installing the
tests, along with a reference to general kselftest documentation.

Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250815180724.14459-1-reddybalavignesh9979@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18 19:26:07 -06:00
Kumar Kartikeya Dwivedi
8b788d6638 selftests/bpf: Add tests for KF_RCU_PROTECTED
Add a couple of test cases to ensure RCU protection is kicked in
automatically, and the return type is as expected.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 15:36:17 -07:00
Kumar Kartikeya Dwivedi
1512231b6c bpf: Enforce RCU protection for KF_RCU_PROTECTED
Currently, KF_RCU_PROTECTED only applies to iterator APIs and that too
in a convoluted fashion: the presence of this flag on the kfunc is used
to set MEM_RCU in iterator type, and the lack of RCU protection results
in an error only later, once next() or destroy() methods are invoked on
the iterator. While there is no bug, this is certainly a bit
unintuitive, and makes the enforcement of the flag iterator specific.

In the interest of making this flag useful for other upcoming kfuncs,
e.g. scx_bpf_cpu_curr() [0][1], add enforcement for invoking the kfunc
in an RCU critical section in general.

This would also mean that iterator APIs using KF_RCU_PROTECTED will
error out earlier, instead of throwing an error for lack of RCU CS
protection when next() or destroy() methods are invoked.

In addition to this, if the kfuncs tagged KF_RCU_PROTECTED return a
pointer value, ensure that this pointer value is only usable in an RCU
critical section. There might be edge cases where the return value is
special and doesn't need to imply MEM_RCU semantics, but in general, the
assumption should hold for the majority of kfuncs, and we can revisit
things if necessary later.

  [0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
  [1]: https://lore.kernel.org/all/20250909195709.92669-1-arighi@nvidia.com

Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 15:36:17 -07:00
Dave Jiang
46037455cb Merge branch 'for-6.18/cxl-delay-dport' into cxl-for-next
Add changes to delay the allocation and setup of dports until when the
endpoint device is being probed. At this point, the CXL link is
established from endpoint to host bridge. Addresses issues seen on
some platforms when dports are probed earlier.

Link: https://lore.kernel.org/linux-cxl/20250829180928.842707-1-dave.jiang@intel.com/
2025-09-18 14:34:51 -07:00
Jakub Kicinski
f2cdc4c22b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc7).

No conflicts.

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
  9536fbe10c ("net/mlx5e: Add PSP steering in local NIC RX")
  7601a0a462 ("net/mlx5e: Add a miss level for ipsec crypto offload")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 11:26:06 -07:00
Thomas Weißschuh
2c55daf7de selftests: always install UAPI headers to the correct directory
Currently the UAPI headers are always installed into the source directory.
When building out-of-tree this doesn't work, as the include path will be
wrong and it dirties the source tree, leading to complains by kbuild.

Make sure the 'headers' target installs the UAPI headers in the correctly.

The real target directory can come from multiple places. To handle them all
extract the target directory from KHDR_INCLUDES.

Link: https://lore.kernel.org/r/20250918-kselftest-uapi-out-of-tree-v1-1-f4434f28adcd@linutronix.de
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Closes: https://lore.kernel.org/lkml/20250917153209.GA2023406@nvidia.com/
Fixes: 1a59f5d315 ("selftests: Add headers target")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-09-18 12:05:27 -06:00
Dave Jiang
87439b598a cxl/test: Setup target_map for cxl_test decoder initialization
cxl_test uses mock functions for decoder enumaration. Add initialization
of the cxld->target_map[] for cxl_test based decoders in the mock
functions.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18 09:55:23 -07:00
Dave Jiang
644685abc1 cxl/test: Adjust the mock version of devm_cxl_switch_port_decoders_setup()
With devm_cxl_switch_port_decoders_setup() being called within cxl_core
instead of by the port driver probe, adjustments are needed to deal with
circular symbol dependency when this function is being mock'd. Add the
appropriate changes to get around the circular dependency.

Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18 09:55:23 -07:00
Dave Jiang
d96eb90d9c cxl/test: Add mock version of devm_cxl_add_dport_by_dev()
devm_cxl_add_dport_by_dev() outside of cxl_test is done through PCI
hierarchy. However with cxl_test, it needs to be done through the
platform device hierarchy. Add the mock function for
devm_cxl_add_dport_by_dev().

When cxl_core calls a cxl_core exported function and that function is
mocked by cxl_test, the call chain causes a circular dependency issue. Dan
provided a workaround to avoid this issue. Apply the method to changes from
the late dport allocation changes in order to enable cxl-test.

In cxl_core they are defined with "__" added in front of the function. A
macro is used to define the original function names for when non-test
version of the kernel is built. A bit of macros and typedefs are used to
allow mocking of those functions in cxl_test.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18 09:55:23 -07:00
Dave Jiang
68d5d9734c cxl/test: Refactor decoder setup to reduce cxl_test burden
Group the decoder setup code in switch and endpoint port probe into a
single function for each to reduce the number of functions to be mocked
in cxl_test. Introduce devm_cxl_switch_port_decoders_setup() and
devm_cxl_endpoint_decoders_setup(). These two functions will be mocked
instead with some functions optimized out since the mock version does
not do anything. Remove devm_cxl_setup_hdm(),
devm_cxl_add_passthrough_decoder(), and devm_cxl_enumerate_decoders() in
cxl_test mock code. In turn, mock_cxl_add_passthrough_decoder() can be
removed since cxl_test does not setup passthrough decoders.
__wrap_cxl_hdm_decode_init() and __wrap_cxl_dvsec_rr_decode() can be
removed as well since they only return 0 when called.

[dj: drop 'struct cxl_port' forward declaration (Robert)]

Suggested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18 09:54:50 -07:00
Mark Brown
09b5febf84 kselftest/arm64: Check that unsupported regsets fail in sve-ptrace
Add a test which verifies that NT_ARM_SVE and NT_ARM_SSVE reads and writes
are rejected as expected when the relevant architecture feature is not
supported.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 12:53:38 +01:00
Mark Brown
dd68f51feb kselftest/arm64: Verify that we reject out of bounds VLs in sve-ptrace
We do not currently have a test that asserts that we reject attempts to set
a vector length smaller than SVE_VL_MIN or larger than SVE_VL_MAX, add one
since that is our current behaviour.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 12:53:38 +01:00
Jakub Kicinski
4c05c7ed88 selftests: tls: test skb copy under mem pressure and OOB
Add a test which triggers mem pressure via OOB writes.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18 12:43:54 +02:00
Kuniyuki Iwashima
1fd0362262 selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
The test reproduces the scenario explained in the previous patch.

Without the patch, the test triggers the warning and cannot see the last
retransmitted packet.

  # ./ksft_runner.sh tcp_fastopen_server_reset-after-disconnect.pkt
  TAP version 13
  1..2
  [   29.229250] ------------[ cut here ]------------
  [   29.231414] WARNING: CPU: 26 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer+0x32/0x9f0
  ...
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 1 ipv4
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 2 ipv6
  # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 16:01:52 -07:00
Hangbin Liu
dc5f94b1ec selftests: bonding: add vlan over bond testing
Add a vlan over bond testing to make sure arp/ns target works.
Also change all the configs to mudules.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17 15:13:51 -07:00
Eduard Zingerman
a24a2dda70 selftests/bpf: trigger verifier.c:maybe_exit_scc() for a speculative state
This is a test case minimized from a syzbot reproducer from [1].
The test case triggers verifier.c:maybe_exit_scc() w/o
preceding call to verifier.c:maybe_enter_scc() on a speculative
symbolic execution path.

Here is verifier log for the test case:

  Live regs before insn:
        0: .......... (b7) r0 = 100
    1   1: 0......... (7b) *(u64 *)(r10 -512) = r0
    1   2: 0......... (b5) if r0 <= 0x0 goto pc-2
        3: 0......... (95) exit
  0: R1=ctx() R10=fp0
  0: (b7) r0 = 100                      ; R0_w=100
  1: (7b) *(u64 *)(r10 -512) = r0       ; R0_w=100 R10=fp0 fp-512_w=100
  2: (b5) if r0 <= 0x0 goto pc-2
  mark_precise: ...
  2: R0_w=100
  3: (95) exit

  from 2 to 1 (speculative execution): R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
  1: R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
  1: (7b) *(u64 *)(r10 -512) = r0
  processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

- Non-speculative execution path 0-3 does not allocate any checkpoints
  (and hence does not call maybe_enter_scc()), and schedules a
  speculative jump from 2 to 1.
- Speculative execution path stops immediately because of an infinite
  loop detection and triggers verifier.c:update_branch_counts() ->
  maybe_exit_scc() calls.

[1] https://lore.kernel.org/bpf/68c85acd.050a0220.2ff435.03a4.GAE@google.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250916212251.3490455-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-17 11:19:58 -07:00
Sebastian Andrzej Siewior
ed323aeda5 selftest/futex: Compile also with libnuma < 2.0.16
After using numa_set_mempolicy_home_node() the test fails to compile on
systems with libnuma library versioned lower than 2.0.16.

In order to allow lower library version add a pkg-config related check
and exclude that part of the code. Without the proper MPOL setup it
can't be tested.

Make a total number of tests two. The first one is the first batch and
the second is the MPOL related one. The goal is to let the user know if
it has been skipped due to library limitation.

Remove test_futex_mpol(), it was unused and it is now complained by the
compiler if the part is not compiled.

Fixes: 0ecb4232fc ("selftests/futex: Set the home_node in futex_numa_mpol")
Closes: https://lore.kernel.org/oe-lkp/202507150858.bedaf012-lkp@intel.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-09-17 19:48:45 +02:00
André Almeida
2951dddef0 selftest/futex: Reintroduce "Memory out of range" numa_mpol's subtest
Commit d8e2f91999 ("selftests/futex: Fix some futex_numa_mpol
subtests") removed the "Memory out of range" subtest due to it being
dependent on the memory layout of the test process having an invalid
memory address just after the `*futex_ptr` allocated memory.

Reintroduce this test and make it deterministic, by allocation two
memory pages and marking the second one with PROT_NONE.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
2025-09-17 19:48:44 +02:00
André Almeida
c1c8634577 selftest/futex: Make the error check more precise for futex_numa_mpol
Instead of just checking if the syscall failed as expected, check as
well if the returned error code matches the expected error code.

[ bigeasy: reword the commmit message ]

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
2025-09-17 19:48:44 +02:00
Dave Jiang
02edab6cee cxl: Add a cached copy of target_map to cxl_decoder
Add a cached copy of the hardware port-id list that is available at init
before all @dport objects have been instantiated. Change is in preparation
of delayed dport instantiation.

Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-17 08:53:24 -07:00
Thomas Weißschuh
5b7bdc4402 kselftest/arm64/gcs/basic-gcs: Respect parent directory CFLAGS
basic-gcs has it's own make rule to handle the special compiler
invocation to build against nolibc. This rule does not respect the
$(CFLAGS) passed by the Makefile from the parent directory.
However these $(CFLAGS) set up the include path to include the UAPI
headers from the current kernel.
Due to this the asm/hwcap.h header is used from the toolchain instead of
the UAPI and the definition of HWCAP_GCS is not found.

Restructure the rule for basic-gcs to respect the $(CFLAGS).
Also drop those options which are already provided by $(CFLAGS).

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/lkml/CA+G9fYv77X+kKz2YT6xw7=9UrrotTbQ6fgNac7oohOg8BgGvtw@mail.gmail.com/
Fixes: a985fe6383 ("kselftest/arm64/gcs: Use nolibc's getauxval()")
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-17 16:32:02 +01:00
Paul Chaignon
180a46bc1a selftests/bpf: Test accesses to ctx padding
This patch adds tests covering the various paddings in ctx structures.
In case of sk_lookup BPF programs, the behavior is a bit different
because accesses to the padding are explicitly allowed. Other cases
result in a clear reject from the verifier.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/3dc5f025e350aeb2bb1c257b87c577518e574aeb.1758094761.git.paul.chaignon@gmail.com
2025-09-17 16:15:57 +02:00
Paul Chaignon
7c60f6e488 selftests/bpf: Move macros to bpf_misc.h
Move the sizeof_field and offsetofend macros from individual test files
to the common bpf_misc.h to avoid duplication.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/97a3f3788bd3aec309100bc073a5c77130e371fd.1758094761.git.paul.chaignon@gmail.com
2025-09-17 16:15:37 +02:00
Benjamin Tissoires
8c62074fa8 selftests/hid: hidraw: forge wrong ioctls and tests them
We also need coverage for when the malicious user is not using the
proper ioctls definitions and tries to work around the driver.

Most of the scaffholding has been generated by claude-4-sonnet and then
carefully reviewed.

Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-09-17 11:37:23 +02:00
Benjamin Tissoires
bb6c861a29 selftests/hid: hidraw: add more coverage for hidraw ioctls
Try to ensure all ioctls are having at least one test.

Most of the scaffholding has been generated by claude-4-sonnet and then
carefully reviewed.

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-09-17 11:37:23 +02:00
Benjamin Tissoires
be66a27b4f selftests/hid: update vmtest.sh for virtme-ng
This commit is a rewrite almost from scratch of vmtest.sh.

By relying on virtme-ng, we get rid of boot2container, reducing the
total bootup time (and network requirements). That means that we are
relying on the programs being installed on the host, but that shouldn't
be an issue. The generation of the kconfig is also now handled by
virtme-ng, so that's one less thing to worry.

I used tools/testing/selftests/vsock/vmtest.sh as a base and modified it
to look mostly like my previous script:
- removed the custom ssh handling
- make use of vng for compiling, which allows to bring remote
  compilation (and potentially remote compilation on a remote container)
- change the verbosity logic by having 2 levels:
  - first one shows the tests outputs
  - second level also shows the VM logs
- instead of only running the compiled kernel when it is built, if we
  are in the kernel tree, use the kernel artifacts there (and complain
  if they are not built)
- adapted the tests list to match the HID subsystem tests

Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-09-17 11:35:39 +02:00
Yi Lai
3e23a3f688 selftests/kselftest_harness: Add harness-selftest.expected to TEST_FILES
The harness-selftest.expected is not installed in INSTALL_PATH.
Attempting to execute harness-selftest.sh shows warning:

diff: ./kselftest_harness/harness-selftest.expected: No such file or
directory

Add harness-selftest.expected to TEST_FILES.

Link: https://lore.kernel.org/r/20250909082619.584470-1-yi1.lai@intel.com
Fixes: df82ffc5a3 ("selftests: harness: Add kselftest harness selftest")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-09-16 18:05:03 -06:00
Nai-Chen Cheng
d3f7457da7 selftests/Makefile: include $(INSTALL_DEP_TARGETS) in clean target to clean net/lib dependency
The selftests 'make clean' does not clean the net/lib because it only
processes $(TARGETS) and ignores $(INSTALL_DEP_TARGETS). This leaves
compiled objects in net/lib after cleaning, requiring manual cleanup.

Include $(INSTALL_DEP_TARGETS) in clean target to ensure net/lib
dependency is properly cleaned.

Signed-off-by: Nai-Chen Cheng <bleach1827@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/20250910-selftests-makefile-clean-v1-1-29e7f496cd87@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16 16:05:02 -07:00
Akhilesh Patil
e8cfc524ea selftests: watchdog: skip ping loop if WDIOF_KEEPALIVEPING not supported
Check if watchdog device supports WDIOF_KEEPALIVEPING option before
entering keep_alive() ping test loop. Fix watchdog-test silently looping
if ioctl based ping is not supported by the device. Exit from test in
such case instead of getting stuck in loop executing failing keep_alive()

watchdog_info:
 identity:              m41t93 rtc Watchdog
 firmware_version:      0
Support/Status: Set timeout (in seconds)
Support/Status: Watchdog triggers a management or other external alarm not a reboot

Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
WDIOC_KEEPALIVE not supported by this device

without this change
Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
Watchdog Ticking Away!
(Where test stuck here forver silently)

Updated change log at commit time:
Shuah Khan <skhan@linuxfoundation.org>

Link: https://lore.kernel.org/r/20250914152840.GA3047348@bhairav-test.ee.iitb.ac.in
Fixes: d89d08ffd2 ("selftests: watchdog: Fix ioctl SET* error paths to take oneshot exit path")
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-09-16 16:55:04 -06:00
Anup Patel
5c6d333a9e KVM: riscv: selftests: Add SBI FWFT to get-reg-list test
KVM RISC-V now supports SBI FWFT, so add it to the get-reg-list test.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250823155947.1354229-7-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:54:24 +05:30
Quan Zhou
dbe3d1d160 KVM: riscv: selftests: Add common supported test cases
Some common KVM test cases are supported on riscv now as following:

    access_tracking_perf_test
    dirty_log_perf_test
    memslot_modification_stress_test
    memslot_perf_test
    mmu_stress_test
    rseq_test

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/c447f18115b27562cd65863645e41a5ef89bd37b.1756710918.git.dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:58 +05:30
Dong Yang
f4103c1171 KVM: riscv: selftests: Add missing headers for new testcases
Add missing headers to fix the build for new RISC-V KVM selftests.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/bfb66541918de68cd89b83bc3430af94bdc75a85.1756710918.git.dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:55 +05:30
Quan Zhou
c92786e179 KVM: riscv: selftests: Use the existing RISCV_FENCE macro in rseq-riscv.h
To avoid redefinition issues with RISCV_FENCE, directly reference
the existing macro in `rseq-riscv.h`.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/85e5e51757c9289ca463fbc4ba6d22f9c9db791b.1756710918.git.dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:52 +05:30
Quan Zhou
b4ab605e2f KVM: riscv: selftests: Add bfloat16 extension to get-reg-list test
The KVM RISC-V allows Zfbfmin/Zvfbfmin/Zvfbfwma extensions for Guest/VM
so add them to get-reg-list test.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/40e52ff7053401a2fcb206e75f45ebc8557fc28b.1754646071.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:49 +05:30
Quan Zhou
e677fab865 KVM: riscv: selftests: Add Zicbop extension to get-reg-list test
The KVM RISC-V allows Zicbop extension for Guest/VM
so add them to get-reg-list test.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/076908690c15070f907f43d2ff81ba7e95582ec7.1754646071.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:46 +05:30
Al Viro
1b8abbb121 bpf...d_path(): constify path argument
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15 21:17:08 -04:00
Stanislav Fomichev
17a0374be9 selftests: ncdevmem: remove sleep on rx
RX devmem sometimes fails on NIPA:

https://netdev-3.bots.linux.dev/vmksft-fbnic-qemu-dbg/results/294402/7-devmem-py/

Both RSS and flow steering are properly installed, but the wait_port_listen
fails. Try to remove sleep(1) to see if the cause of the failure is
spending too much time during RX setup. I don't see a good reason to
have sleep in the first place. If there needs to be a delay between
installing the rules and receiving the traffic, let's add it to the
callers (devmem.py) instead.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250912170611.676110-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:14:48 -07:00
Geliang Tang
e3241506a4 selftests: mptcp: close server IPC descriptors
The client-side function connect_one_server() properly closes its IPC
descriptor after use, but the server-side code in both mptcp_sockopt.c
and mptcp_inq.c was missing corresponding close() calls for their IPC
descriptors, leaving file descriptors open unnecessarily.

This change ensures proper cleanup by:
1. Adding missing close(pipefds[0]/unixfds[0]) in server processes
2. Adding close(pipefds[1]/unixfds[1]) after server() function calls

This ensures both ends of the IPC pipe are properly closed in their
respective processes, preventing file descriptor leaks.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-next-mptcp-minor-fixes-6-18-v1-2-99d179b483ad@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:14:23 -07:00
Geliang Tang
dab86ee688 selftests: mptcp: close server file descriptors
The server file descriptor ('fd') is opened in server() but never closed.
While accepted connections are properly closed in process_one_client(),
the main listening socket remains open, causing a resource leak.

This patch ensures the server fd is properly closed after processing
clients, bringing the sockopt and inq test cases in line with proper
resource cleanup practices.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-next-mptcp-minor-fixes-6-18-v1-1-99d179b483ad@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:14:22 -07:00
Geliang Tang
b86418bead selftests: mptcp: sockopt: fix error messages
This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:

1. Fix diff not printed: The error messages for counter mismatches had
   the actual difference ('diff') as argument, but it was missing in the
   format string. Displaying it makes the debugging easier.

2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
   used 'ret2' (sent bytes) for both the expected value and the difference
   calculation. It now correctly uses 'ret' (received bytes), which is the
   expected value for bytes_acked.

3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
   was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
   corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
   value in the condition above it.

Fixes: 5dcff89e14 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:12:05 -07:00
Matthieu Baerts (NGI0)
24733e193a selftests: mptcp: userspace pm: validate deny-join-id0 flag
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.

pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:12:05 -07:00
Matthieu Baerts (NGI0)
cf74e0aa0e selftests: mptcp: connect: print pcap prefix
To be able to find which capture files have been produced after several
runs.

This prefix was not printed anywhere before.

While at it, always use the same prefix by taking info from ns1, instead
of "$connector_ns", which is sometimes ns1, sometimes ns2 in the
subtests.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-5-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:10:37 -07:00
Matthieu Baerts (NGI0)
a17c5aa3a3 selftests: mptcp: print trailing bytes with od
This is better than printing random bytes in the terminal.

Note that Jakub suggested 'hexdump', but Mat found out this tool is not
often installed by default. 'od' can do a similar job, and it is in the
POSIX specs and available in coreutils, so it should be on more systems.

While at it, display a few more bytes, just to fill in the two lines.
And no need to display the 3rd only line showing the next number of
bytes: 0000040.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-4-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:10:37 -07:00
Matthieu Baerts (NGI0)
8708c5d8b3 selftests: mptcp: avoid spurious errors on TCP disconnect
The disconnect test-case, with 'plain' TCP sockets generates spurious
errors, e.g.

  07 ns1 TCP   -> ns1 (dead:beef:1::1:10006) MPTCP
  read: Connection reset by peer
  read: Connection reset by peer
  (duration   155ms) [FAIL] client exit code 3, server 3

  netns ns1-FloSdv (listener) socket stat for 10006:
  TcpActiveOpens                  2                  0.0
  TcpPassiveOpens                 2                  0.0
  TcpEstabResets                  2                  0.0
  TcpInSegs                       274                0.0
  TcpOutSegs                      276                0.0
  TcpOutRsts                      3                  0.0
  TcpExtPruneCalled               2                  0.0
  TcpExtRcvPruned                 1                  0.0
  TcpExtTCPPureAcks               104                0.0
  TcpExtTCPRcvCollapsed           2                  0.0
  TcpExtTCPBacklogCoalesce        42                 0.0
  TcpExtTCPRcvCoalesce            43                 0.0
  TcpExtTCPChallengeACK           1                  0.0
  TcpExtTCPFromZeroWindowAdv      42                 0.0
  TcpExtTCPToZeroWindowAdv        41                 0.0
  TcpExtTCPWantZeroWindowAdv      13                 0.0
  TcpExtTCPOrigDataSent           164                0.0
  TcpExtTCPDelivered              165                0.0
  TcpExtTCPRcvQDrop               1                  0.0

In the failing scenarios (TCP -> MPTCP), the involved sockets are
actually plain TCP ones, as fallbacks for passive sockets at 2WHS time
cause the MPTCP listeners to actually create 'plain' TCP sockets.

Similar to commit 218cc16632 ("selftests: mptcp: avoid spurious errors
on disconnect"), the root cause is in the user-space bits: the test
program tries to disconnect as soon as all the pending data has been
spooled, generating an RST. If such option reaches the peer before the
connection has reached the closed status, the TCP socket will report an
error to the user-space, as per protocol specification, causing the
above failure. Note that it looks like this issue got more visible since
the "tcp: receiver changes" series from commit 06baf9bfa6 ("Merge
branch 'tcp-receiver-changes'").

Address the issue by explicitly waiting for the TCP sockets (-t) to
reach a closed status before performing the disconnect. More precisely,
the test program now waits for plain TCP sockets or TCP subflows in
addition to the MPTCP sockets that were already monitored.

While at it, use 'ss' with '-n' to avoid resolving service names, which
is not needed here.

Fixes: 218cc16632 ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:10:37 -07:00
Matthieu Baerts (NGI0)
14e22b43df selftests: mptcp: connect: catch IO errors on listen side
IO errors were correctly printed to stderr, and propagated up to the
main loop for the server side, but the returned value was ignored. As a
consequence, the program for the listener side was no longer exiting
with an error code in case of IO issues.

Because of that, some issues might not have been seen. But very likely,
most issues either had an effect on the client side, or the file
transfer was not the expected one, e.g. the connection got reset before
the end. Still, it is better to fix this.

The main consequence of this issue is the error that was reported by the
selftests: the received and sent files were different, and the MIB
counters were not printed. Also, when such errors happened during the
'disconnect' tests, the program tried to continue until the timeout.

Now when an IO error is detected, the program exits directly with an
error.

Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 18:10:36 -07:00
Hangbin Liu
71379e1c95 selftests: bonding: add fail_over_mac testing
Add a test to check each value of bond fail_over_mac option.

Also fix a minor garp_test print issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 17:48:42 -07:00
Victor Nogueira
b7df2e7eae selftests/tc-testing: Adapt tc police action tests for Gb rounding changes
For the tc police action, iproute2 rounds up mtu and burst sizes to a
higher order representation. For example, if the user specifies the default
mtu for a police action instance (4294967295 bytes), iproute2 will output
it as 4096Mb when this action instance is dumped. After Jay's changes [1],
iproute2 will round up to Gb, so 4096Mb becomes 4Gb. With that in mind,
fix police's tc test output so that it works both with the current
iproute2 version and Jay's.

[1] https://lore.kernel.org/netdev/20250907014216.2691844-1-jay.vosburgh@canonical.com/

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Cong Wang <cwang@multikernel.io>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://patch.msgid.link/20250912154616.67489-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15 16:13:13 -07:00
Alan Maguire
3ae4c52708 selftests/bpf: More open-coded gettid syscall cleanup
Commit 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall
invocations") addressed the issue that older libc may not have a gettid()
function call wrapper for the associated syscall.

A few more instances have crept into tests, use sys_gettid() instead, and
poison raw gettid() usage to avoid future issues.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250911163056.543071-1-alan.maguire@oracle.com
2025-09-15 20:39:24 +02:00
Kumar Kartikeya Dwivedi
a8250d167c selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns
Make sure that we only switch the cgroup namespace and enter a new
cgroup in a child process separate from test_progs, to not mess up the
environment for subsequent tests.

To remove this cgroup, we need to wait for the child to exit, and then
rmdir its cgroup. If the read call fails, or waitpid succeeds, we know
the child exited (read call would fail when the last pipe end is closed,
otherwise waitpid waits until exit(2) is called). We then invoke a newly
introduced remove_cgroup_pid() helper, that identifies cgroup path using
the passed in pid of the now dead child, instead of using the current
process pid (getpid()).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250915032618.1551762-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15 10:53:15 -07:00
Saket Kumar Bhaskar
a9d4e9f0e8 selftests/bpf: Fix arena_spin_lock selftest failure
For systems having CONFIG_NR_CPUS set to > 1024 in kernel config
the selftest fails as arena_spin_lock_irqsave() returns EOPNOTSUPP.
(eg - incase of powerpc default value for CONFIG_NR_CPUS is 8192)

The selftest is skipped incase bpf program returns EOPNOTSUPP,
with a descriptive message logged.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250913091337.1841916-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15 10:49:18 -07:00
Leon Hwang
f7528e4412 selftests/bpf: Skip timer_interrupt case when bpf_timer is not supported
Like commit fbdd61c94b ("selftests/bpf: Skip timer cases when bpf_timer is not supported"),
'timer_interrupt' test case should be skipped if verifier rejects
bpf_timer with returning -EOPNOTSUPP.

cd tools/testing/selftests/bpf
./test_progs -t timer
461     timer_interrupt:SKIP
Summary: 6/0 PASSED, 7 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250915121657.28084-1-leon.hwang@linux.dev
2025-09-15 10:22:58 -07:00