- Add Panther Lake, Wildcat Lake and Nova Lake processor IDs to the
list of supported processors in the intel_tcc_cooling thermal
driver (Srinivas Pandruvada)
- Drop unnecessary explicit driver data clearing on removal from the
intel_pch_thermal driver (Kaushlendra Kumar)
- Add support for "slow" workload type hints to the int340x
processor_thermal driver and enable it on the Panther Lake
platform (Srinivas Pandruvada)
- Use sysfs_emit{_at}() in sysfs show functions in Intel thermal
drivers (Thorsten Blum)
- Update the x86_pkg_temp_thermal driver to handle THERMAL_TEMP_INVALID
that can be passed to it via sysfs as expected (Rafael Wysocki)
- Drop a redundant local variable from the intel_tcc_cooling thermal
driver and fix a kerneldoc comment typo in the TCC library (Sumeet
Pawnikar)
- Fix CFLAGS and LDFLAGS in the pkg-config libthermal template (Romain
Gantois)
- Support multiple temp to raw conversion functions in the Mediatek
LVTS thermal driver and add MT8196 and MT6991 support to it (Laura
Nao)
- Add Mediatek LVTS driver support for MT7987 (Frank Wunderlich)
- Use the existing HZ_PER_MHZ macro on STM32 (Andy Shevchenko)
- Use the existing clamp() macro on BCM2835 (Thorsten Blum)
- Make the reset line optional in order to support new Renesas SoCs
where it is not available and add support for RZ/T2H and RZ/N2H
to the rzg3e thermal driver (Cosmin Tanislav)
- Document RZ/V2N TSU in the r9a09g047-tsu DT bindings (Ovidiu
Panait)
- Fix all kernel-doc warnings in the internal thermal core header
file (Randy Dunlap)
- Fix a device node reference leak in thermal_of_cm_lookup() (Felix Gu)
- Replace some old-style library function calls with ones that are
currently recommended in several places in the thermal core and
debugfs code (Sumeet Pawnikar, Thorsten Blum)
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmmDnCoSHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1lyoIAJ5gQviWP4gmifh/osuBbc6EdN5qBnwL
d6oj6oKoTKYC8TCzgIVNf4sOU9BziJgiuB5WSUDqFG6MPy+C/kSztlZ3yVQDpJzW
XgGdgLMK/Sdza6IE8+AfHN3WeYTA4BD7Otda1L965FAY8PN1W4ZzzZyydWHd27Gt
4D9ruuCFZQ52EatMtbtpAX+loapIIFD3jUZ6fxBK1ziAN3en2F57xC4GeJvbS9fz
zpBJld+p19CdfTd/NCHMdqVTc01fL0d6cwnpzdH+aeqmam61SYAzLQrk0Wu3wlfr
m8jqx+F7qL9/X/CUSOoy50K7pBQimUAh80L7YPBV+/gwpW3IO4Ll3KU=
=AqXf
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add support for "slow" (long-term trend) workload type hints to
the Intel int340x thermal driver and selftests (and enable it for
Panther Lake), add support for MT8196 along with DT bindings and for
MT7987 to the Mediatek LVTS thermal driver, add support for RZ/T2H and
RZ/N2H along with DT bindings to the Renesas rzg3e thermal driver, add
support for the Panther Lake, Wildcat Lake and Nova Lake processors to
the intel_tcc_cooling driver, fix bugs, make some cosmetic changes
including code cleanups and library function substitutions, and update
documentation.
Specifics:
- Add Panther Lake, Wildcat Lake and Nova Lake processor IDs to the
list of supported processors in the intel_tcc_cooling thermal
driver (Srinivas Pandruvada)
- Drop unnecessary explicit driver data clearing on removal from the
intel_pch_thermal driver (Kaushlendra Kumar)
- Add support for "slow" workload type hints to the int340x
processor_thermal driver and enable it on the Panther Lake platform
(Srinivas Pandruvada)
- Use sysfs_emit{_at}() in sysfs show functions in Intel thermal
drivers (Thorsten Blum)
- Update the x86_pkg_temp_thermal driver to handle
THERMAL_TEMP_INVALID that can be passed to it via sysfs as expected
(Rafael Wysocki)
- Drop a redundant local variable from the intel_tcc_cooling thermal
driver and fix a kerneldoc comment typo in the TCC library (Sumeet
Pawnikar)
- Fix CFLAGS and LDFLAGS in the pkg-config libthermal template
(Romain Gantois)
- Support multiple temp to raw conversion functions in the Mediatek
LVTS thermal driver and add MT8196 and MT6991 support to it (Laura
Nao)
- Add Mediatek LVTS driver support for MT7987 (Frank Wunderlich)
- Use the existing HZ_PER_MHZ macro on STM32 (Andy Shevchenko)
- Use the existing clamp() macro on BCM2835 (Thorsten Blum)
- Make the reset line optional in order to support new Renesas SoCs
where it is not available and add support for RZ/T2H and RZ/N2H to
the rzg3e thermal driver (Cosmin Tanislav)
- Document RZ/V2N TSU in the r9a09g047-tsu DT bindings (Ovidiu
Panait)
- Fix all kernel-doc warnings in the internal thermal core header
file (Randy Dunlap)
- Fix a device node reference leak in thermal_of_cm_lookup() (Felix
Gu)
- Replace some old-style library function calls with ones that are
currently recommended in several places in the thermal core and
debugfs code (Sumeet Pawnikar, Thorsten Blum)
* tag 'thermal-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
drivers: thermal: intel: tcc_cooling: Drop redundant local variable
thermal/of: Fix reference leak in thermal_of_cm_lookup()
thermal: core: thermal_core.h: fix all kernel-doc warnings
thermal: intel: x86_pkg_temp_thermal: Handle invalid temperature
thermal: renesas: rzg3e: add support for RZ/T2H and RZ/N2H
dt-bindings: thermal: r9a09g047-tsu: document RZ/T2H and RZ/N2H
thermal: renesas: rzg3e: make calibration value retrieval per-chip
thermal: renesas: rzg3e: make min and max temperature per-chip
thermal: renesas: rzg3e: make reset optional
dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2N TSU
thermal/drivers/broadcom: Use clamp to simplify bcm2835_thermal_temp2adc
thermal/drivers/stm32: Use predefined HZ_PER_MHZ instead of a custom one
thermal/drivers/mediatek/lvts_thermal: Add mt7987 support
dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT7987
dt-bindings: nvmem: mediatek: efuse: Add support for MT8196
thermal/drivers/mediatek/lvts_thermal: Add MT8196 support
thermal/drivers/mediatek/lvts: Support MSR offset for 16-bit calibration data
thermal/drivers/mediatek/lvts: Add support for ATP mode
thermal/drivers/mediatek/lvts: Add lvts_temp_to_raw variant
thermal/drivers/mediatek/lvts: Add platform ops to support alternative conversion logic
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGLwcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv+TD/48S2HTnMhmW6AtFYWErQ+sEKXpHrxbYe7S
+qR8/g/T+QSfhfqPwZEuagndFKtIP3LJfaXGSP1Lk1RfP9NLQy91v33Ibe4DjHkp
etWSfnMHA9MUAoWKmg8EvncB2G+ZQFiYCpjazj5tKHD9S2+psGMuL8kq6qzMJE83
uhpb8WutUl4aSIXbMSfyGlwBhI1MjjRbbWlIBmg4yC8BWt1sH8Qn2L2GNVylEIcX
U8At3KLgPGn0axSg4yGMAwTqtGhL/jwdDyeczbmRlXuAr4iVL9UX/yADCYkazt6U
ttQ2/H+cxCwfES84COx9EteAatlbZxo6wjGvZ3xOMiMJVTjYe1x6Gkcckq+LrZX6
tjofi2KK78qkrMXk1mZMkZjpyUWgRtCswhDllbQyqFs0SwzQtno2//Rk8HU9dhbt
pkpryDbGFki9X3upcNyEYp5TYflpW6YhAzShYgmE6KXim2fV8SeFLviy0erKOAl+
fwjTE6KQ5QoQv0s3WxkWa4lREm34O6IHrCUmbiPm5CruJnQDhqAN2QZIDgYC4WAf
0gu9cR/O4Vxu7TQXrumPs5q+gCyDU0u0B8C3mG2s+rIo+PI5cVZKs2OIZ8HiPo0F
x73kR/pX3DMe35ZQkQX22ymMuowV+aQouDLY9DTwakP5acdcg7h7GZKABk6VLB06
gUIsnxURiQ==
=jNzW
-----END PGP SIGNATURE-----
Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Support for batch request processing for ublk, improving the
efficiency of the kernel/ublk server communication. This can yield
nice 7-12% performance improvements
- Support for integrity data for ublk
- Various other ublk improvements and additions, including a ton of
selftests additions and updated
- Move the handling of blk-crypto software fallback from below the
block layer to above it. This reduces the complexity of dealing with
bio splitting
- Series fixing a number of potential deadlocks in blk-mq related to
the queue usage counter and writeback throttling and rq-qos debugfs
handling
- Add an async_depth queue attribute, to resolve a performance
regression that's been around for a qhilw related to the scheduler
depth handling
- Only use task_work for IOPOLL completions on NVMe, if it is necessary
to do so. An earlier fix for an issue resulted in all these
completions being punted to task_work, to guarantee that completions
were only run for a given io_uring ring when it was local to that
ring. With the new changes, we can detect if it's necessary to use
task_work or not, and avoid it if possible.
- rnbd fixes:
- Fix refcount underflow in device unmap path
- Handle PREFLUSH and NOUNMAP flags properly in protocol
- Fix server-side bi_size for special IOs
- Zero response buffer before use
- Fix trace format for flags
- Add .release to rnbd_dev_ktype
- MD pull requests via Yu Kuai
- Fix raid5_run() to return error when log_init() fails
- Fix IO hang with degraded array with llbitmap
- Fix percpu_ref not resurrected on suspend timeout in llbitmap
- Fix GPF in write_page caused by resize race
- Fix NULL pointer dereference in process_metadata_update
- Fix hang when stopping arrays with metadata through dm-raid
- Fix any_working flag handling in raid10_sync_request
- Refactor sync/recovery code path, improve error handling for
badblocks, and remove unused recovery_disabled field
- Consolidate mddev boolean fields into mddev_flags
- Use mempool to allocate stripe_request_ctx and make sure
max_sectors is not less than io_opt in raid5
- Fix return value of mddev_trylock
- Fix memory leak in raid1_run()
- Add Li Nan as mdraid reviewer
- Move phys_vec definitions to the kernel types, mostly in preparation
for some VFIO and RDMA changes
- Improve the speed for secure erase for some devices
- Various little rust updates
- Various other minor fixes, improvements, and cleanups
* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
blk-mq: ABI/sysfs-block: fix docs build warnings
selftests: ublk: organize test directories by test ID
block: decouple secure erase size limit from discard size limit
block: remove redundant kill_bdev() call in set_blocksize()
blk-mq: add documentation for new queue attribute async_dpeth
block, bfq: convert to use request_queue->async_depth
mq-deadline: covert to use request_queue->async_depth
kyber: covert to use request_queue->async_depth
blk-mq: add a new queue sysfs attribute async_depth
blk-mq: factor out a helper blk_mq_limit_depth()
blk-mq-sched: unify elevators checking for async requests
block: convert nr_requests to unsigned int
block: don't use strcpy to copy blockdev name
blk-mq-debugfs: warn about possible deadlock
blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
...
Please consider pulling these changes from the signed vfs-7.0-rc1.namespace tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
ovzgAP9BpqMQhMy2VCurru8/T5VAd6eJdgXzEfXqMksL5BNm8gEAsLx666KJNKgm
Sh/yVA2KBjf51gvcLZ4gHOISaMU8bAI=
=RGLf
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- statmount: accept fd as a parameter
Extend struct mnt_id_req with a file descriptor field and a new
STATMOUNT_BY_FD flag. When set, statmount() returns mount information
for the mount the fd resides on — including detached mounts
(unmounted via umount2(MNT_DETACH)).
For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID
mask bits are cleared since neither is meaningful. The capability
check is skipped for STATMOUNT_BY_FD since holding an fd already
implies prior access to the mount and equivalent information is
available through fstatfs() and /proc/pid/mountinfo without
privilege. Includes comprehensive selftests covering both attached
and detached mount cases.
- fs: Remove internal old mount API code (1 patch)
Now that every in-tree filesystem has been converted to the new
mount API, remove all the legacy shim code in fs_context.c that
handled unconverted filesystems. This deletes ~280 lines including
legacy_init_fs_context(), the legacy_fs_context struct, and
associated wrappers. The mount(2) syscall path for userspace remains
untouched. Documentation references to the legacy callbacks are
cleaned up.
- mount: add OPEN_TREE_NAMESPACE to open_tree()
Container runtimes currently use CLONE_NEWNS to copy the caller's
entire mount namespace — only to then pivot_root() and recursively
unmount everything they just copied. With large mount tables and
thousands of parallel container launches this creates significant
contention on the namespace semaphore.
OPEN_TREE_NAMESPACE copies only the specified mount tree (like
OPEN_TREE_CLONE) but returns a mount namespace fd instead of a
detached mount fd. The new namespace contains the copied tree mounted
on top of a clone of the real rootfs.
This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a
single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER)
followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by
the new user namespace. Mount namespace file mounts are excluded from
the copy to prevent cycles. Includes ~1000 lines of selftests"
* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
mount: add OPEN_TREE_NAMESPACE
fs: Remove internal old mount API code
selftests: statmount: tests for STATMOUNT_BY_FD
statmount: accept fd as a parameter
statmount: permission check should return EPERM
Add a test to v_ptrace test suite to verify that ptrace accepts the
valid input combinations of vector csr registers. Use kselftest
fixture variants to create multiple inputs for the test.
The test simulates a debug scenario with three breakpoints:
0. init: let the tracee set up its initial vector configuration
1. 1st bp: modify the tracee's vector csr registers from the debugger
- resume the tracee to execute a block without vector instructions
2. 2nd bp: read back the tracees's vector csr registers from the debugger
- compare with values set by the debugger
- resume the tracee to execute a block with vector instructions
3. 3rd bp: read back the tracess's vector csr registers again
- compare with values set by the debugger
The last check helps to confirm that ptrace validation check for vector
csr registers input values works properly and maintains an accurate view
of the tracee's vector context in debugger.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-10-geomatsi@gmail.com
[pjw@kernel.org: cleaned up a checkpatch issue]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add a test to v_ptrace test suite to verify that ptrace rejects the
invalid input combinations of vector csr registers. Use kselftest
fixture variants to create multiple invalid inputs for the test.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-9-geomatsi@gmail.com
[pjw@kernel.org: cleaned up some checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add a test to v_ptrace test suite to verify that vector csr registers
are clobbered on syscalls.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-8-geomatsi@gmail.com
[pjw@kernel.org: cleaned up a checkpatch issue]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add a test case that attaches to a traced process immediately after its
first executed vector instructions to verify the initial vector context.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-7-geomatsi@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add a test case to check ptrace behavior in the case when vector
extension is supported by the system, but vector context is not
yet enabled for the traced process.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-6-geomatsi@gmail.com
[pjw@kernel.org: dropped duplicate sys/wait.h include]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmGsbEACgkQOlYIJqCj
N/18Iw//U9ZiNSW8k9CGRnXN/hmc8h21cNlTdGliqY3lkf0y7feCb1sEdkCFv6/U
KXlOhGUD8PiVlcJWm3ZWWMq/bJ5Ahcvyvre8RelRMQ5SRw07IojYSI1IkNHpSUBX
brEd8DBG24oaw2El+rkl6mN9fneNUAq4pZtU9QDA/ehKDxpdsym2OAUStAVjXy0R
YtIhsz0k1qX+EN/UIrvBTS6bCG3Ihd6btHgCehqGAOnY2rk5gNR0zChdKV3mdk2t
hsbpKp8rtZppZ9Ltru/ly4TYzaKT/dl9gWt7h1y78fN7XD5orenAe8MOkav3WoPI
zdDkDMzvwjv0p+bGPJKszxJrb4SBagtadvFMmKR+WZ0aYhysdAhxlpt64krqFrSV
wjfNfPQ1Z2qHb9PV4TfuBr4g+OyYZfnBcEvyJswrVHOBTfCoMn4hx4tF0bbSZdLd
nmOVqcXiPPpnOza2EXtYc97PSiHwl/CVlhXguYRPg/FQFnJKHHYoL9aRH4YpyZiK
o/7Bsqe20ouuMoRdVIt+zp8FvhOsuiHV122e6d55+bvNhUGBC4sXNDEKQlmQps4K
yvBUIGWLSx3Por/Iey7Rp+7hCXACf9KXaD1ogG2ZxL7xDE0smj9Jzu2NIzFJWUQ6
uubKwsZBJJDhYAZuDLUFmzoGydntb/Wi/FxetPp7Fzi7D4dnSUI=
=RH/c
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.20
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
RCU Tasks Trace:
Re-implement RCU tasks trace in term of SRCU-fast, not only more than 500 lines
of code are saved because of the reimplementation, a new set of API,
rcu_read_{,un}lock_tasks_trace(), becomes possible as well. Compared to the
previous rcu_read_{,un}lock_trace(), the new API avoid the task_struct accesses
thanks to the SRCU-fast semantics. As a result, the old
rcu_read{,un}lock_trace() API is now deprecated.
RCU Torture Test:
- Multiple improvements on kvm-series.sh (parallel run and progress showing
metrics)
- Add context checks to rcu_torture_timer().
- Make config2csv.sh properly handle comments in .boot files.
- Include commit discription in testid.txt.
Miscellaneous RCU changes:
- Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early.
- Use suitable gfp_flags for the init_srcu_struct_nodes().
- Fix rcu_read_unlock() deadloop due to softirq.
- Correctly compute probability to invoke ->exp_current() in rcutorture.
- Make expedited RCU CPU stall warnings detect stall-end races.
RCU nocb:
- Remove unnecessary WakeOvfIsDeferred wake path and callback overload
handling.
- Extract nocb_defer_wakeup_cancel() helper.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCAAvFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmmARZERHGJvcXVuQGtl
cm5lbC5vcmcACgkQSXnow7UH+rh8SAf+PDIBWAkdbGgs32EfgpFY42RB4CWygH47
YRup/M3+nU0JcBzNnona1srpHBXRySBJQbvRbsOdlM45VoNQ2wPjig/3vFVRUKYx
uqj9Tze00DS74IIGESoTGp0amZde9SS9JakNRoEfTr+Zpj8N6LFERQw0ywUwjR5b
RR6bz7q05TAl3u2BYUAgNdnf3VWWTmj4WYwArlQ+qRFAyGN+TVj8Ezra6+K5TJ7u
SQYrf7WmRGOhHbVVolvVEOVdACccI8dFl3ebJVE2Ky0gp1o3BLPkcDLJ6gBdTCoE
rRrbnkeqs5V7tOkPFDBeUhLPrm1QxrdEDxUQFWjSApbv161sx7AOZA==
=O9QQ
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Boqun Feng:
- RCU Tasks Trace:
Re-implement RCU tasks trace in term of SRCU-fast, not only more than
500 lines of code are saved because of the reimplementation, a new
set of API, rcu_read_{,un}lock_tasks_trace(), becomes possible as
well. Compared to the previous rcu_read_{,un}lock_trace(), the new
API avoid the task_struct accesses thanks to the SRCU-fast semantics.
As a result, the old rcu_read{,un}lock_trace() API is now deprecated.
- RCU Torture Test:
- Multiple improvements on kvm-series.sh (parallel run and
progress showing metrics)
- Add context checks to rcu_torture_timer()
- Make config2csv.sh properly handle comments in .boot files
- Include commit discription in testid.txt
- Miscellaneous RCU changes:
- Reduce synchronize_rcu() latency by reporting GP kthread's
CPU QS early
- Use suitable gfp_flags for the init_srcu_struct_nodes()
- Fix rcu_read_unlock() deadloop due to softirq
- Correctly compute probability to invoke ->exp_current()
in rcutorture
- Make expedited RCU CPU stall warnings detect stall-end races
- RCU nocb:
- Remove unnecessary WakeOvfIsDeferred wake path and callback
overload handling
- Extract nocb_defer_wakeup_cancel() helper
* tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (25 commits)
rcu/nocb: Extract nocb_defer_wakeup_cancel() helper
rcu/nocb: Remove dead callback overload handling
rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path
rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early
srcu: Use suitable gfp_flags for the init_srcu_struct_nodes()
rcu: Fix rcu_read_unlock() deadloop due to softirq
rcutorture: Correctly compute probability to invoke ->exp_current()
rcu: Make expedited RCU CPU stall warnings detect stall-end races
rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
rcutorture: Prevent concurrent kvm.sh runs on same source tree
torture: Include commit discription in testid.txt
torture: Make config2csv.sh properly handle comments in .boot files
torture: Make kvm-series.sh give run numbers and totals
torture: Make kvm-series.sh give build numbers and totals
torture: Parallelize kvm-series.sh guest-OS execution
rcutorture: Add context checks to rcu_torture_timer()
rcutorture: Test rcu_tasks_trace_expedite_current()
srcu: Create an rcu_tasks_trace_expedite_current() function
checkpatch: Deprecate rcu_read_{,un}lock_trace()
rcu: Update Requirements.rst for RCU Tasks Trace
...
resctrl test:
- fixes a devision by zero error on Hygon
- fixes non-contiguous CBM check for Hygon
- defines CPU vendor IDs as bits to match usage
- adds CPU vendor detection for Hygon
- coredeump test: changes to use __builtin_trap() instead of a null pointer
- anon_inode: replaces null pointers with empty arrays
- kublk: includes message in _Static_assert for C11 compatibility
- run_kselftest.sh: adds `--skip` argument option
- pidfd: fixes typo in comment
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAml/QHkACgkQCwJExA0N
QxxB4A/+IQNvqlgD2vT/oZonMUZ6hLNueJFi4DMONUAiLZkloUwf8Diq2ZW1SOTF
RkmkNDcSrW9KG++q5wwSThExlYbmfahhfT2Q7F3KeDZuL85KiK2uFJzPjnpSW43p
iKUIiF5QSPk1UnBV/WsfzLP87dGuc8LD2kD0J2GaMCeQc4miq9WxCFQr6JjeAOJL
lTCq7XcskDHUEVhFSgMkEvBoL0FqRbd14SxE3TeaATzjDOK0nvz4OjqaE4chr2tW
c+OIe6djqv+nc91Gm4WPjlP6SPLbb6QCfJP5ETV4IHvjvcQheTHyyqlnsyQMTEXT
yGyzKrroCSUAjPtpzqKbu+3cF3mji4Cu8MNRzWT/R2MOJO+hkReOmoluQWT84Oxy
5OG6SD3fLnu/Kj8sJP3l7SMAkurjkBGPJyR4ZA+nRw101IjqeMbdZK9/CXm051aG
rOtKZTVX0FSp/DbARjfIoSqhcMiL1wef8Yfnt11frDwdIf7O1iG3kPNYfddMwy5Y
on5KgcWhnw9+zYF2i0dJdKKwWucNPrSs5NxmNcanCtfGqTB5TEIsdDTX+2TxT+V8
8HOJhuk0g8xA+EO310xFpauqOX1/RM95la7NLghLukauiNfbep60ohn1nMCkNWGJ
3P3WFKw/kue0r7tmfzwrKcjWHJ55jai7dvyJ/VtoMd5NgFfElJM=
=1cku
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-next-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"resctrl test:
- fix division by zero error on Hygon
- fix non-contiguous CBM check for Hygon
- define CPU vendor IDs as bits to match usage
- add CPU vendor detection for Hygon
misc:
- coredeump test: use __builtin_trap() instead of a null pointer
- anon_inode: replace null pointers with empty arrays
- kublk: include message in _Static_assert for C11 compatibility
- run_kselftest.sh: add `--skip` argument option
- pidfd: fix typo in comment"
* tag 'linux_kselftest-next-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/pidfd: fix typo in comment
selftests/run_kselftest.sh: Add `--skip` argument option
selftests/resctrl: Fix non-contiguous CBM check for Hygon
selftests/resctrl: Add CPU vendor detection for Hygon
selftests/resctrl: Define CPU vendor IDs as bits to match usage
selftests/resctrl: Fix a division by zero error on Hygon
kselftest/kublk: include message in _Static_assert for C11 compatibility
kselftest/anon_inode: replace null pointers with empty arrays
kselftest/coredump: use __builtin_trap() instead of null pointer
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmGr7AACgkQOlYIJqCj
N/362Q//X5VCuR/LYGf7H8MjkOAYfV7u0o3rk2Dvuv1VCXyO0UIMzlCDH7D7j4mV
LE5bhiXEW9ey6xomSs3OVQNvSQqR8zsrwKmyLgNmSJ5F/UsjSgqp+189WCMC3KyT
dOyZZgn+c2FZcOqRE1piUzpvZSFGgnqeIGZLnQ0RlYdqQH63ImkhA00oiPAkgjBi
xnGPxBtQ+rGLHW/NEioIVmCsoi66gLsAOZNwDyRESWslDt6QLD+gQawwyYEV+xg1
XrqXG6y9SK266yeXVHCrNhp2LCc6iJDaZHDiLU6G/FqokWk7nuChR2T1dRpnd2nS
apH5LrJ/IJGeT5ouKZZkundU/xu9E0sYoK2tQ8M1qVrg0FBmsDDa1WLSraVM7wue
QKbgBjp/L9x7vvZA/2CY2IiauKqqllFdlGHsK62kygof8MJx2gnEynYDehSqxIaE
bdhRgsJ7N2cmnOCM1pQWFh3pVcBZY0cMRRtEpUwXXQT+pgkK0xUA2PhjD0gnofwY
ViC2BBdAlivCPBMsc+AEXPNgdDcq7is6oBLZ+DYewI8zKDX6ID2l3/qOc7/OdAEd
RGQQ4wOzCFtk0nkjxQrygot7IcVaeO5aQQFMG4oCJRhaHAjzcdGVVpQT45hs2r7N
OaigdAmdsOk4ZfYJtrGDl8krb2VzA2W7d8V+Tjw8Fhw1i9h6ri4=
=Fzhf
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.20
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
kunit:
- adds __rust_helper to helpers
- fixes up const mis-match in many assert functions
- fixes up const mismatch in test_list_sort
- protects KUNIT_BINARY_STR_ASSERTION against ERR_PTR values
- respects KBUILD_OUTPUT env variable by default
- adds bash completion
kunit tool:
- adds test for nested test result reporting
- fixes to not overwrite test status based on subtest counts
- adds 32-bit big endian ARM configuration to qemu_configs
- renames test_data_path() to _test_data_path()
- fixes relying on implicit working directory change
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAml8f08ACgkQCwJExA0N
Qxws1w/9GCnGCGbcdY+uw4ExyLGWZM/Gfecjtf03yspVl9/ohrkkR5K6wkfXdKMH
Cb1J9dI7z+7GAwtDg2C7AQNa/mtiUBu7fSfrCTohExeGSL4okkJEdk7IcYouPtFG
TCYBnIXiz7bp1HDk7zn1D8w2brstp5VKnNTgvfEWaoH60WypEavukJqmy6knrJ0z
Ot0mNhPi5I/wJUAjSLvg85M4KKL1t0Z6Kj4JWytaPLOhz7UG94JE2XvO6vNmB1Qt
3Scym/8RpR6k7zdp6YriffFjCCPTZMvLZEDQggCsEHiqGJjCAWOp8yBQdhzYEt5g
SbFvZOL3Jvl1jXMfPdOVYFXu08/2hHz7OHqkkrHfa4lgirQxTwAuF+sNkWJ63pci
ppAUtsvs56GY0/+XUWviRJXH+9ajUjpkB6RmeMz/Vlj61pWmKqjqYXinMfFTC1ae
HphrQFNIGVJVvZtF2V6wbYpCNKb9pPy8/Pje4EWofVbIJWgIHCh6PsO+ebhvfyPr
4wuGO75Vg+Jv3kGZycITqxGgn3YdC05RAVQGbLso8yXsVwoZnvGDC+RN+j3pA2WJ
qFkREu/pn9AInHOEr24vHsSL6Dn8le6AWbi0vjNbG23r04fs7WDxeKZWgAFRIO1g
zThDQGDAO4A1E0jW6bXmFA7GD4a78tbyFUflyXTjR9JDeXDWXRs=
=ofp6
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
"kunit:
- add __rust_helper to helpers
- fix up const mismatch in many assert functions
- fix up const mismatch in test_list_sort
- protect KUNIT_BINARY_STR_ASSERTION against ERR_PTR values
- respect KBUILD_OUTPUT env variable by default
- add bash completion
kunit tool:
- add test for nested test result reporting
- do not overwrite test status based on subtest counts
- add 32-bit big endian ARM configuration to qemu_configs
- rename test_data_path() to _test_data_path()
- do not rely on implicit working directory change"
* tag 'linux_kselftest-kunit-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: add bash completion
kunit: tool: test: Don't rely on implicit working directory change
kunit: tool: test: Rename test_data_path() to _test_data_path()
kunit: qemu_configs: Add 32-bit big endian ARM configuration
kunit: tool: Don't overwrite test status based on subtest counts
kunit: tool: Add test for nested test result reporting
kunit: respect KBUILD_OUTPUT env variable by default
kunit: Protect KUNIT_BINARY_STR_ASSERTION against ERR_PTR values
test_list_sort: fix up const mismatch
kunit: fix up const mis-match in many assert functions
rust: kunit: add __rust_helper to helpers
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmmGBEkACgkQI9DQutE9
ekPxQQ//VOzle+RVmgzVSJzpNcoW576QGI7+pZLEMIywXTx6rH+uz2FCaZvvgV7M
LrJ+1Qps9ea5Yti9OplNJmQwy1yAHIurZnpnAoMR+EJ5PUeq8p1EAypySpHtmT/d
KngZsbCvSMydNdfJFwGaz3NFSYj05FlTmWNN+Ndq0JFqyMJQMgY2qKDVmg3pWKcv
TLKTNRo9fJFUVhhBIyIoMl2hE36M6Ac3Qd4dUb5J+Fn834QDXgOzVzUjBtkmbSHD
kJ4gbSs2Ic6QsYWtt70RlyRdreBYegA4C3z1cZV6DDQYxp5Jz2oqXYYC31Ro520A
swuI5y9HMct4mOxqPUqf1lhbvsmkjuZ5Iog6P7W+mOtYHXZIzY8F61sv9YAis9/5
XNOHkg9Cn/n8C2RRQ8vnq0FEI1g7se1UGbe/1NkD4xeR/bzhE/AZSoOrRE7G/XJx
qbF9FkPzd4OXYB2Pdm37G1BWsfN4M1bY1rOmmCyMKym793+b/jM7xdoZY1QfbabP
uKiavuK8RYgqxrEilhP0asvafKjpZaJbn2R3jwHZgQDWe7WH5FhXwX2UcUpQsTan
XZd+/cWaYXjLsKJbiAzy3UArgnzSrHPSpwIOkYq8Lf8EvPgS2g3LLJYbw250Cf1G
74stwoK4PgZ3e6k0nkMk43x1swKb13Gp0vCZjVdnIec9EQgOHfI=
=X8iC
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.0
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
1. Add more CPUCFG mask bits.
2. Improve feature detection.
3. Add FPU/LBT delay load support.
4. Set default return value in KVM IO bus ops.
5. Add paravirt preempt feature support.
6. Add KVM steal time test case for tools/selftests.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmmFcN4WHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImeqcAD/9OZOg0J14+UXZ2qF0cGvSWKSCD
I6TRjy2OlVbcUCt7N/M7dppOuaDfv1ilIexulvubglUIvRMJXNvOAjqTU7I4+MOF
3jjUTklnF9gtMjmWjatWwjo8KHim93zc99FDgy7rRNZRAhosO3BFWJ+b5hEk5RMY
jOCGXiAMob3+w26KKDC/FK6xSpVt+rcCRNymc9T8/kLYY2fv+cWbXwmk0U4ry6yG
xGhvzIcsNnjH15rNB9zbleNrw28uxEJ3V/M/F8C5SbF0V71B2XWyRUi5X75ExjzT
gKzYEwoPhcCBLRd/SMk7RCMk/aGS6sFLGbDLShuG9MRtmJAGk4b92wfIXVVRBiAt
TzO0xcQdQvVFZnaKHe/r7x7+roA+790oZbJlpVJVpVgV5obiKM9OCLNtCnWD/n5B
FDV2Xjyfdmk6Br+MSpb7iq+3AKUDAVDEpRLEZkt5nCeVX1IX0y1KdtWb2MxVVULm
VXncgVLiG4RaQRNk1Gzqhgxml/BfN8im2ytK6I7qnUTAmm/GuqRq5fKLJH0hgASr
/kHsPcTam6JSKk/YJO2TXw21O1mZE/RwtTRW4bplq5d/X17cqUpqBmf3TelN1uvI
alx6YkF8lBJ6nd7YHVLypvsEMyPJNNihyjC264E5IcaeRCMjo52Lq/6rkManHZCW
z+qO3gLbJB3TbkLCXA==
=CUZl
-----END PGP SIGNATURE-----
Merge tag 'loongarch-kvm-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.20
1. Add more CPUCFG mask bits.
2. Improve feature detection.
3. Add FPU/LBT delay load support.
4. Set default return value in KVM IO bus ops.
5. Add paravirt preempt feature support.
6. Add KVM steal time test case for tools/selftests.
This warning can be seen with GCC 15.2:
mptcp_connect.c: In function ‘main_loop’:
mptcp_connect.c:1422:37: warning: ‘peer’ may be used uninitialized [-Wmaybe-uninitialized]
1422 | if (connect(fd, peer->ai_addr, peer->ai_addrlen))
| ~~~~^~~~~~~~~
mptcp_connect.c:1377:26: note: ‘peer’ was declared here
1377 | struct addrinfo *peer;
| ^~~~
This variable is set in sock_connect_mptcp() in some conditions. If not,
this helper returns an error, and the program stops. So this is a false
positive, but better removing it by initialising peer to NULL.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-4-c2720ce75c34@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bpf_local_storage_free() already does not rely on local_storage->smap
since switching to kmalloc_nolock(). As local_storage->smap is removed,
fix the outdated test by dropping the local_storage->smap check. Keep
the second map in task local storage map test to test that multiple
elements can be added to the storage similar to sk storage test.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-18-ameryhung@gmail.com
bpf_cgrp_storage_busy has been removed. Use bpf_bprintf_nest_level
instead. This percpu variable is also in the bpf subsystem so that
if it is removed in the future, BPF-CI will catch this type of CI-
breaking change.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-17-ameryhung@gmail.com
Remove a test in test_maps that checks if the updating of the percpu
counter in task local storage map is preemption safe as the percpu
counter is now removed.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-16-ameryhung@gmail.com
Adjust the error code we are checking against as
bpf_task_storage_delete() now returns -EDEADLK or -ETIMEDOUT when
deadlock happens.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-15-ameryhung@gmail.com
Update the expected result of the selftest as recursion of task local
storage syscall and helpers have been relaxed. Now that the percpu
counter is removed, task local storage helpers, bpf_task_storage_get()
and bpf_task_storage_delete() can now run on the same CPU at the same
time unless they cause deadlock.
Note that since there is no percpu counter preventing recursion in
task local storage helpers, bpf_trampoline now catches the recursion
of on_update as reported by recursion_misses.
on_enter: tp_btf/sys_enter
on_update: fentry/bpf_local_storage_update
Old behavior New behavior
____________ ____________
on_enter on_enter
bpf_task_storage_get(&map_a) bpf_task_storage_get(&map_a)
bpf_task_storage_trylock succeed bpf_local_storage_update(&map_a)
bpf_local_storage_update(&map_a)
on_update on_update
bpf_task_storage_get(&map_a) bpf_task_storage_get(&map_a)
bpf_task_storage_trylock fail on_update::misses++ (1)
return NULL create and return map_a::ptr
map_a::ptr += 1 (1)
bpf_task_storage_delete(&map_a)
return 0
bpf_task_storage_get(&map_b) bpf_task_storage_get(&map_b)
bpf_task_storage_trylock fail on_update::misses++ (2)
return NULL create and return map_b::ptr
map_b::ptr += 1 (1)
create and return map_a::ptr create and return map_a::ptr
map_a::ptr = 200 map_a::ptr = 200
bpf_task_storage_get(&map_b) bpf_task_storage_get(&map_b)
bpf_task_storage_trylock succeed lockless lookup succeed
bpf_local_storage_update(&map_b) return map_b::ptr
on_update
bpf_task_storage_get(&map_a)
bpf_task_storage_trylock fail
lockless lookup succeed
return map_a::ptr
map_a::ptr += 1 (201)
bpf_task_storage_delete(&map_a)
bpf_task_storage_trylock fail
return -EBUSY
nr_del_errs++ (1)
bpf_task_storage_get(&map_b)
bpf_task_storage_trylock fail
return NULL
create and return ptr
map_b::ptr = 100
Expected result:
map_a::ptr = 201 map_a::ptr = 200
map_b::ptr = 100 map_b::ptr = 1
nr_del_err = 1 nr_del_err = 0
on_update::recursion_misses = 0 on_update::recursion_misses = 2
On_enter::recursion_misses = 0 on_enter::recursion_misses = 0
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-14-ameryhung@gmail.com
Check sk_omem_alloc when the caller of bpf_local_storage_destroy()
returns. bpf_local_storage_destroy() now returns the memory to uncharge
to the caller instead of directly uncharge. Therefore, in the
sk_storage_omem_uncharge, check sk_omem_alloc when bpf_sk_storage_free()
returns instead of bpf_local_storage_destroy().
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-13-ameryhung@gmail.com
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to
an odd number.
TOO_MANY_FRAGS test case contains an invalid packet consisting of
(xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD
flag set. This packet is expected to be dropped. After that, there is a
valid linear packet, which is expected to be received back.
Once (xdp_zc_max_segs) is an odd number, the last packet cannot be
received, if packet forwarding between Rx and Tx interfaces relies on
the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed,
if all traffic is looped.
Turns out, sending function processes multiple invalid frags as if they
were in 2-frag packets. So once the invalid mbuf packet contains an odd
number of those, the valid packet after gets paired with the previous
invalid descriptor, and hence does not get an ethernet header generated, so
it is either dropped or malformed.
Make invalid packets in verbatim mode always have only a single frag. For
such packets, number of frags is otherwise meaningless, as descriptor flags
are pre-configured in verbatim mode and packet data is not generated for
invalid descriptors.
Fixes: 697604492b ("selftests/xsk: add invalid descriptor test for multi-buffer")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Referenced commit reduced the scope of the variable pkt, so now it has to
be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments
some counters. When the packet is interrupted by the batch ending, pkt
stream therefore proceeds to the next packet, while xsk ring still contains
the previous one, this results in a pkt_nb mismatch.
Decrement the affected counters when packet is interrupted.
Fixes: 8913e653e9 ("selftests/xsk: Iterate over all the sockets in the receive pkts function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Exercise various scenarios where Landlock domains are enforced across
all of a processes' threads.
Test coverage for security/landlock is 91.6% of 2130 lines according to
LLVM 21.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-3-gnoack@google.com
[mic: Fix subject, use EXPECT_EQ(close()), make helpers static, add test
coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The KVM RISC-V allows Zalasr extensions for Guest/VM so add this
extension to get-reg-list test.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251020042904.32096-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Current vm modes cannot represent riscv guest modes precisely, here add
all 9 combinations of P(56,40,41) x V(57,48,39). Also the default vm
mode is detected on runtime instead of hardcoded one, which might not be
supported on specific machine.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251105151442.28767-1-wu.fei9@sanechips.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM RISC-V allows Zilsd and Zclsd extensions for Guest/VM so add
this extension to get-reg-list test.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250826162939.1494021-6-pincheng.plct@isrc.iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The script now requires IPV6 tunnel support, enable this.
This should have caught by CI, but as the config option is missing,
the tunnel interface isn't added. This results in an error cascade
that ends with "route change default" failure.
That in turn means the "ipv6 tunnel" test re-uses the previous
test setup so the "ip6ip6" test passes and script returns 0.
Make sure to catch such bugs, set ret=1 if device cannot be added
and delete the old default route before installing the new one.
After this change, IPV6_TUNNEL=n kernel builds fail with the expected
FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel
... while builds with IPV6_TUNNEL=m pass as before.
Fixes: 5e51803521 ("selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Without the preceding patch, this fails with:
FAIL: test_udp_gro_ct: Expected udp conntrack entry
FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 10 and 0
Signed-off-by: Florian Westphal <fw@strlen.de>
LoongArch KVM supports steal time accounting now, here add steal time
test case on LoongArch.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
* kvm-arm64/resx:
: .
: Add infrastructure to deal with the full gamut of RESx bits
: for NV. As a result, it is now possible to have the expected
: semantics for some bits such as SCTLR_EL2.SPAN.
: .
KVM: arm64: Add debugfs file dumping computed RESx values
KVM: arm64: Add sanitisation to SCTLR_EL2
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
KVM: arm64: Remove all traces of FEAT_TME
KVM: arm64: Simplify handling of full register invalid constraint
KVM: arm64: Get rid of FIXED_VALUE altogether
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
KVM: arm64: Move RESx into individual register descriptors
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
KVM: arm64: Simplify FIXED_VALUE handling
KVM: arm64: Convert HCR_EL2.RW to AS_RES1
KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
KVM: arm64: Allow RES1 bits to be inferred from configuration
KVM: arm64: Inherit RESx bits from FGT register descriptors
KVM: arm64: Extend unified RESx handling to runtime sanitisation
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
KVM: arm64: Introduce standalone FGU computing primitive
KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Signed-off-by: Marc Zyngier <maz@kernel.org>
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.
Clean-up time.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Set UBLK_TEST_DIR to ${TMPDIR:-./ublktest-dir}/${TID}.XXXXXX to create
per-test subdirectories organized by test ID. This makes it easier to
identify and debug specific test runs.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When executing the last MPTCP selftests on older kernels, this output is
printed:
# 001 no JOIN
# join Rx [SKIP]
# join Tx [SKIP]
# fallback [SKIP]
In fact, behind each line, a few counters are checked, and likely not
all of them have been skipped because the they are not available on
these kernels. Instead, "new" and unsupported counters for these groups
are now ignored, and [ OK ] will be printed instead of [SKIP].
Note that on the MPTCP CI, when validating the dev versions, any
unsupported counter will cause the tests to fail. So this is safe not to
print 'SKIP' for these group checks.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-15-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To the TFO, only the file descriptor is needed, the family is not.
Also, the error can be handled the same way when 'sendto()' or
'connect()' are used. Only the printed error message is different.
This avoids a bit of confusions.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-14-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A few loops were declaring 'i', but this variable was not used.
To avoid confusions, use '_' instead: it is more explicit to mark that
this variable is not needed.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-13-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nstat outputs are already printed when calling 'fail_test', no need to
do it again.
While at it, no need to use the dump_stats variable, print the extra
stats directly. And use 'ip -n $ns' instead of 'ip netns exec $ns',
shorter and clearer.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-12-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of waiting for a random amount of time (1 second), wait for an
event to be received on the other side.
To do that, when an address is announced (userspace_pm_add_addr), the
ANNOUNCED is expected. When a new subflow is created
(userspace_pm_add_sf), the SUB_ESTABLISHED event is expected.
With this, the tests can finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-11-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It looks like most of the time, this helper was simply waiting a bit
more than one second: the previous MPJoin counter was often already at
the expected value. So at the end, it was just checking 10 times for
the MPJoin counter to change, but it was not happening. For the tests,
that was time, it was just waiting longer for nothing.
Instead, use 'wait_mpj' with the expected counter: in the tests, the MPJ
counter can easily be predicted. While at it, stop passing the netns as
argument: here the received MPJoin ACK is checked, which happens on the
server side. If later on, this needs to be checked on the client side,
the helper can be adapted for this case, but better avoid confusions now
if it is not needed.
While at it, stop using 'i' for the variable if it is not used.
With this, the tests can finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-10-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'wait_mpj' was used just after having created a background connection,
but before creating new subflows. So no MPJ were sent. The intention was
to wait for the connection to be established, which was the same as
doing a simple sleep with a "random" value.
Instead, wait for an "established" event. With this, the tests can
finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-9-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This file is the only one from this directory not to have all these
header inclusions sorted by type and alphabetical order.
Adapt them, to ease the reading, prevent conflicts during potential
future backport modifying these lines, and also to avoid having UAPI
header inclusions before libc ones, see [1].
Link: https://lore.kernel.org/20260120-uapi-sockaddr-v2-1-63c319111cf6@linutronix.de
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-8-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests for linked register tracking with negative offsets, BPF_SUB,
and alu32. These test for all edge cases like overflows, etc.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously, the verifier only tracked positive constant deltas between
linked registers using BPF_ADD. This limitation meant patterns like:
r1 = r0;
r1 += -4;
if r1 s>= 0 goto l0_%=; // r1 >= 0 implies r0 >= 4
// verifier couldn't propagate bounds back to r0
if r0 != 0 goto l0_%=;
r0 /= 0; // Verifier thinks this is reachable
l0_%=:
Similar limitation exists for 32-bit registers.
With this change, the verifier can now track negative deltas in reg->off
enabling bound propagation for the above pattern.
For alu32, we make sure the destination register has the upper 32 bits
as 0s before creating the link. BPF_ADD_CONST is split into
BPF_ADD_CONST64 and BPF_ADD_CONST32, the latter is used in case of alu32
and sync_linked_regs uses this to zext the result if known_reg has this
flag.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now BPF_END has bitwise tracking support. This patch adds selftests to
cover various cases of BPF_END (`bswap(16|32|64)`, `be(16|32|64)`,
`le(16|32|64)`) with bitwise propagation.
This patch is based on existing `verifier_bswap.c`, and add several
types of new tests:
1. Unconditional byte swap operations:
- bswap16/bswap32/bswap64 with unknown bytes
2. Endian conversion operations (architecture-aware):
- be16/be32/be64: convert to big-endian
* on little-endian: do swap
* on big-endian: truncation (16/32-bit) or no-op (64-bit)
- le16/le32/le64: convert to little-endian
* on big-endian: do swap
* on little-endian: truncation (16/32-bit) or no-op (64-bit)
Each test simulates realistic networking scenarios where a value is
masked with unknown bits (e.g., var_off=(0x0; 0x3f00), range=[0,0x3f00]),
then byte-swapped, and the verifier must prove the result stays within
expected bounds.
Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely track bitwise through byte swap
operations, it can prune the trap path (invalid memory access) that
should be unreachable, allowing the program to pass verification.
If bitwise tracking is incorrect, the verifier cannot prove the trap
is unreachable, causing verification failure.
The tests use preprocessor conditionals (#ifdef __BYTE_ORDER__) to
verify correct behavior on both little-endian and big-endian
architectures, and require Clang 18+ for bswap instruction support.
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204111503.77871-3-ziye@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix a bug where AVIC is incorrectly inhibited when running with x2AVIC
disabled via module param (or on a system without x2AVIC).
- Fix a dangling device posted IRQs bug by explicitly checking if the irqfd is
still active (on the list) when handling an eventfd signal, instead of
zeroing the irqfd's routing information when the irqfd is deassigned.
Zeroing the irqfd's routing info causes arm64 and x86's to not disable
posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks for an MSI),
incorrectly leaving the IRQ in posted mode (and leading to use-after-free
and memory leaks on AMD in particular).
This is both the most pressing and scariest, but it's been in -next for
a while.
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends, which
leads to unexpected page faults in guest code due e.g. __memset_chk@plt
not being resolved.
- Explicitly configure the support XSS from within {svm,vmx}_set_cpu_caps() to
fix a bug where VMX will compute the reference VMCS configuration with SHSTK
and IBT enabled, but then compute each CPUs local config with SHSTK and IBT
disabled if not all CET xfeatures are enabled, e.g. if the kernel is built
with X86_KERNEL_IBT=n. The mismatch in features results in differing nVMX
setting, and ultimately causes kvm-intel.ko to refuse to load with nested=1.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmDgqcUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOnPggArLZiVQjG3crGcV9rePwo8Wzwilj8
RgMNPARW6OlKRVSNJTvxvUdQ3G9plrnq0EuvezF27YnNJoptu6XsyaIlGX7wQaEq
Krw5agrbx/8FzH6kvXuE5jD8PwNVKwstML+uN1T+NSBncE88IiZxH0Fp0E2LnBxP
VCeR4NTQeLdXk15O2c3lQ5s9SN0vHeUNKy53myjU1AodwjsQ40BQjwd0z7d8NbYu
1gyH7wMmRvJG39KMVSaQgVwG7WA6HvrKKnydLWG5tVdUA0U7wiUHrFwIprXK/TNG
hSuPpI/pmCI1kdRRRgip+Z7yIWmZt5MnaRDADnZ3nA8+bznl5Uo6kvLROQ==
=CYK3
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- Fix a bug where AVIC is incorrectly inhibited when running with
x2AVIC disabled via module param (or on a system without x2AVIC)
- Fix a dangling device posted IRQs bug by explicitly checking if the
irqfd is still active (on the list) when handling an eventfd signal,
instead of zeroing the irqfd's routing information when the irqfd is
deassigned.
Zeroing the irqfd's routing info causes arm64 and x86's to not
disable posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks
for an MSI), incorrectly leaving the IRQ in posted mode (and leading
to use-after-free and memory leaks on AMD in particular).
This is both the most pressing and scariest, but it's been in -next
for a while.
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends,
which leads to unexpected page faults in guest code due e.g.
__memset_chk@plt not being resolved.
- Explicitly configure the supported XSS capabilities from within
{svm,vmx}_set_cpu_caps() to fix a bug where VMX will compute the
reference VMCS configuration with SHSTK and IBT enabled, but then
compute each CPUs local config with SHSTK and IBT disabled if not all
CET xfeatures are enabled, e.g. if the kernel is built with
X86_KERNEL_IBT=n.
The mismatch in features results in differing nVMX setting, and
ultimately causes kvm-intel.ko to refuse to load with nested=1.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly configure supported XSS from {svm,vmx}_set_cpu_caps()
KVM: selftests: Add -U_FORTIFY_SOURCE to avoid some unpredictable test failures
KVM: x86: Assert that non-MSI doesn't have bypass vCPU when deleting producer
KVM: Don't clobber irqfd routing type when deassigning irqfd
KVM: SVM: Check vCPU ID against max x2AVIC ID if and only if x2AVIC is enabled
Merge updates of Intel thermal drivers for 6.20/7.0:
- Add Panther Lake, Wildcat Lake and Nova Lake processor IDs to the
list of supported processors in the intel_tcc_cooling thermal
driver (Srinivas Pandruvada)
- Drop unnecessary explicit driver data clearing on removal from the
intel_pch_thermal driver (Kaushlendra Kumar)
- Add support for "slow" workload type hints to the int340x
processor_thermal driver and enable it on the Panther Lake
platform (Srinivas Pandruvada)
- Use sysfs_emit{_at}() in sysfs show functions in Intel thermal
drivers (Thorsten Blum)
- Update the x86_pkg_temp_thermal driver to handle THERMAL_TEMP_INVALID
that can be passed to it via sysfs as expected (Rafael Wysocki)
- Drop a redundant local variable from the intel_tcc_cooling thermal
driver and fix a kerneldoc comment typo in the TCC library (Sumeet
Pawnikar)
* thermal-intel:
drivers: thermal: intel: tcc_cooling: Drop redundant local variable
thermal: intel: x86_pkg_temp_thermal: Handle invalid temperature
thermal: intel: Use sysfs_emit() in a sysfs show function
thermal: intel: fix typo "nagative" in comment for cpu argument
thermal: intel: int340x: Use sysfs_emit{_at}() in sysfs show functions
thermal: intel: selftests: workload_hint: Support slow workload hints
thermal: int340x: processor_thermal: Enable slow workload type hints
thermal: intel: intel_pch_thermal: Drop explicit driver data clearing
thermal: intel: intel_tcc_cooling: Add CPU models in the support list
Add support for normalized CXL address translation through ACPI PRM method
to support AMD Zen5 platforms. Including a conventions doc that explains
how the translation is implemented and for future implementations that
need such setup to comply with the current implementation method.
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
- Fix a bug where AVIC is incorrectly inhibited when running with x2AVIC
disabled via module param (or on a system without x2AVIC).
- Fix a dangling device posted IRQs bug by explicitly checking if the irqfd is
still active (on the list) when handling an eventfd signal, instead of
zeroing the irqfd's routing information when the irqfd is deassigned.
Zeroing the irqfd's routing info causes arm64 and x86's to not disable
posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks for an MSI),
incorrectly leaving the IRQ in posted mode (and leading to use-after-free
and memory leaks on AMD in particular).
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends, which
leads to unexpected page faults in guest code due e.g. __memset_chk@plt
not being resolved.
- Explicitly configure the support XSS from within {svm,vmx}_set_cpu_caps() to
fix a bug where VMX will compute the reference VMCS configuration with SHSTK
and IBT enabled, but then compute each CPUs local config with SHSTK and IBT
disabled if not all CET xfeatures are enabled, e.g. if the kernel is built
with X86_KERNEL_IBT=n. The mismatch in features results in differing nVMX
setting, and ultimately causes kvm-intel.ko to refuse to load with nested=1.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAml9aGcACgkQOlYIJqCj
N/2BCw/9E4XyIIpbMJRk1xNBt+dBslBSggzmb6utXhucn0NaO1pKg9fEoqg6bfic
OP4Z+UUV8v4lPswVTob/Iq7L5VqvSPq0wa1//YuzxIce4pl5om7zO7RDFnDS47+w
8367jcmKDzGBKM9KfQN+J5xBYylh82fmd3FtI0212yS6HMOtf9DDioyUz0FXri+s
k1FWssxV2F9k9qp3qfAwqhj/fCSfVBeEaVKoJY6ep/kQPaLMemuFyrPVXxuWxw1Y
4udk0idjzrsDuln6fNvgV281Z2o6ScDMhvouRJZOo8d3njwaoJIgRBbGobTcuRGl
ULHZlilyg14+a8/25TN02SGbDaEZalHXced7F8Y5XLtCsea4ZxzjB4DVwPaKG6Kb
/465mH/lepdTxmelPFvhhA2th+Ku4o1hfCesYFRckO5jhOU8+EaXT3SW46+NLIAY
uXtx0BcVPmBqNPn9Zt82N+9WB/MEVLenTuvhkHW14ON3isVPDrUUPmH1kpoyZ/Uq
6DWmnunTmeWgS2wk3zuEsXWpxK6ko+wK2E8sUIDQiYIGSPTdIjlZMfHNHMicYrGL
b2vQPw/7Jb1NXy0Ek3CYcZes/s2crYDqWcRsg1jFP2INFXytOlzFrBS5bER4ZZVH
JB3UNItHKUcFdOk96wXxVxKXyjjeyBrWWH8Gu+T0ERPNl7Tn54c=
=O9hx
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.19-rc8' of https://github.com/kvm-x86/linux into HEAD
Final KVM fixes for 6.19:
- Fix a bug where AVIC is incorrectly inhibited when running with x2AVIC
disabled via module param (or on a system without x2AVIC).
- Fix a dangling device posted IRQs bug by explicitly checking if the irqfd is
still active (on the list) when handling an eventfd signal, instead of
zeroing the irqfd's routing information when the irqfd is deassigned.
Zeroing the irqfd's routing info causes arm64 and x86's to not disable
posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks for an MSI),
incorrectly leaving the IRQ in posted mode (and leading to use-after-free
and memory leaks on AMD in particular).
This is both the most pressing and scariest, but it's been in -next for
a while.
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends, which
leads to unexpected page faults in guest code due e.g. __memset_chk@plt
not being resolved.
- Explicitly configure the support XSS from within {svm,vmx}_set_cpu_caps() to
fix a bug where VMX will compute the reference VMCS configuration with SHSTK
and IBT enabled, but then compute each CPUs local config with SHSTK and IBT
disabled if not all CET xfeatures are enabled, e.g. if the kernel is built
with X86_KERNEL_IBT=n. The mismatch in features results in differing nVMX
setting, and ultimately causes kvm-intel.ko to refuse to load with nested=1.
Add AMD Zen5 support for address translation.
Zen5 systems may be configured to use 'Normalized addresses'. Then,
host physical addresses (HPA) are different from their system physical
addresses (SPA). The endpoint has its own physical address space and
an incoming HPA is already converted to the device's physical address
(DPA). Thus it has interleaving disabled and CXL endpoints are
programmed passthrough (DPA == HPA).
Host Physical Addresses (HPAs) need to be translated from the endpoint
to its CXL host bridge, esp. to identify the endpoint's root decoder
and region's address range. ACPI Platform Runtime Mechanism (PRM)
provides a handler to translate the DPA to its SPA. This is documented
in:
AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide, Publication # 58088
https://www.amd.com/en/search/documentation/hub.html
With Normalized Addressing this PRM handler must be used to translate
an HPA of an endpoint to its SPA.
Do the following to implement AMD Zen5 address translation:
Introduce a new file core/atl.c to handle ACPI PRM specific address
translation code. Naming is loosely related to the kernel's AMD
Address Translation Library (CONFIG_AMD_ATL) but implementation does
not depend on it, nor it is vendor specific. Use Kbuild and Kconfig
options respectively to enable the code depending on architecture and
platform options.
AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
System Physical Address). Firmware enables the PRM handler if the
platform has address translation implemented. Check firmware and
kernel support of ACPI PRM using the specific GUID. On success enable
address translation by setting up the earlier introduced root port
callback, see function cxl_prm_setup_translation(). Setup is done in
cxl_setup_prm_address_translation(), it is the only function that
needs to be exported. For low level PRM firmware calls, use the ACPI
framework.
Identify the region's interleaving ways by inspecting the address
ranges. Also determine the interleaving granularity using the address
translation callback. Note that the position of the chunk from one
interleaving block to the next may vary and thus cannot be considered
constant. Address offsets larger than the interleaving block size
cannot be used to calculate the granularity. Thus, probe the
granularity using address translation for various HPAs in the same
interleaving block.
[ dj: Add atl.o build to cxl_test ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://patch.msgid.link/20260114164837.1076338-11-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
This test allows to test the various storage key handling functions.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
When using the TEST_HARNESS_MAIN macro definition to declare the main
function, it is required to use the EXPECT*() and ASSERT*() macros in
conjunction and not ksft_test_result_*(). Otherwise, even if a test item
fails, the test will still return a success result because
ksft_test_result_*() does not affect the test harness state.
Convert the code to use EXPECT/ASSERT() variants, which ensures that the
overall test result is fail if one of the EXPECT()s fails.
[ tglx: Massaged change log to explain _why_ ksft_test_result*() is the wrong
choice ]
Fixes: f341a20f6d ("selftests/futex: Refactor futex_requeue with kselftest_harness.h")
Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/tencent_51851B741CC4B5EC9C22AFF70BA82BB60805@qq.com
We had a few patches in this area and no explicit coverage so far.
The test case covers the scenario addressed by the previous fix;
reusing the existing udpgro_fwd.sh script to leverage part of the
of the virtual network setup, even if such script is possibly not
a perfect fit.
Note that the mentioned script already contains several shellcheck
violation; this patch does not fix the existing code, just avoids
adding more issues in the new one.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/768ca132af81e83856e34d3105b86c37e566a7ad.1770032084.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now bpf_timer can be used in tracepoints, so these tests are no longer
relevant.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-9-alexei.starovoitov@gmail.com
Add stress tests for BPF timers that run in NMI context using perf_event
programs attached to PERF_COUNT_HW_CPU_CYCLES.
The tests cover three scenarios:
- nmi_race: Tests concurrent timer start and async cancel operations
- nmi_update: Tests updating a map element (effectively deleting and
inserting new for array map) from within a timer callback
- nmi_cancel: Tests timer self-cancellation attempt.
A common test_common() helper is used to share timer setup logic across
all test modes.
The tests spawn multiple threads in a child process to generate
perf events, which trigger the BPF programs in NMI context. Hit counters
verify that the NMI code paths were actually exercised.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-8-alexei.starovoitov@gmail.com
Refactor timer selftests, extracting stress test into a separate test.
This makes it easier to debug test failures and allows to extend.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-5-alexei.starovoitov@gmail.com
Add a selftest to ensure BPF stream functions can now be called
while holding a lock.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260203180424.14057-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that two registers with their id=0 (unlinked) in the cached state
can be mapped to a single id (linked) in the current state.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Scalar register IDs are used by the verifier to track relationships
between registers and enable bounds propagation across those
relationships. Once an ID becomes singular (i.e. only a single
register/stack slot carries it), it can no longer contribute to bounds
propagation and effectively becomes stale. The previous commit makes the
verifier clear such ids before caching the state.
When comparing the current and cached states for pruning, these stale
IDs can cause technically equivalent states to be considered different
and thus prevent pruning.
For example, in the selftest added in the next commit, two registers -
r6 and r7 are not linked to any other registers and get cached with
id=0, in the current state, they are both linked to each other with
id=A. Before this commit, check_scalar_ids would give temporary ids to
r6 and r7 (say tid1 and tid2) and then check_ids() would map tid1->A,
and when it would see tid2->A, it would not consider these state
equivalent.
Relax scalar ID equivalence by treating rold->id == 0 as "independent":
if the old state did not rely on any ID relationships for a register,
then any ID/linking present in the current state only adds constraints
and is always safe to accept for pruning. Implement this by returning
true immediately in check_scalar_ids() when old_id == 0.
Maintain correctness for the opposite direction (old_id != 0 && cur_id
== 0) by still allocating a temporary ID for cur_id == 0. This avoids
incorrectly allowing multiple independent current registers (id==0) to
satisfy a single linked old ID during mapping.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linux Accurate ECN test sets using ACE counters and AccECN options to
cover several scenarios: Connection teardown, different ACK conditions,
counter wrapping, SACK space grabbing, fallback schemes, negotiation
retransmission/reorder/loss, AccECN option drop/loss, different
handshake reflectors, data with marking, and different sysctl values.
The packetdrill used is commit cbe405666c9c8698ac1e72f5e8ffc551216dfa56
of repo: https://github.com/minuscat/packetdrill/tree/upstream_accecn.
And corresponding patches are sent to google/packetdrill email list.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.
+===================+==========+===============+===========+
| Packet id | CWR flag | Payload | Flushing? |
+===================+==========+===============+===========+
| 0 | 0 | PAYLOAD_LEN | 0 |
| ... | 0 | PAYLOAD_LEN | 1 |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 | 1 | payload_len | 0 |
| NUM_PACKETS/2 | 1 | payload_len | 1 |
+-------------------+----------+---------------+-----------+
| ... | 0 | PAYLOAD_LEN | 0 |
| NUM_PACKETS | 0 | PAYLOAD_LEN | 1 |
+===================+==========+===============+===========+
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a new kselftest to verify that the total_bw value in
/sys/kernel/debug/sched/debug remains consistent across all CPUs
under different sched_ext BPF program states:
1. Before a BPF scheduler is loaded
2. While a BPF scheduler is loaded and active
3. After a BPF scheduler is unloaded
The test runs CPU stress threads to ensure DL server bandwidth
values stabilize before checking consistency. This helps catch
potential issues with DL server bandwidth accounting during
sched_ext transitions.
Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-8-arighi@nvidia.com
Add a selftest to validate the correct behavior of the deadline server
for the ext_sched_class.
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-7-arighi@nvidia.com
The "splice" alternate mode for mptcp_connect.sh/.c is available now,
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.
Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Usage:
./mptcp_connect.sh -m splice
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test which checks that the RSS table is at least 4x the max
queue count supported by the device. The original RSS spec from
Microsoft stated that the RSS indirection table should be 2 to 8
times the CPU count, presumably assuming queue per CPU. If the
CPU count is not a power of two, however, a power-of-2 table
2x larger than queue count results in a 33% traffic imbalance.
Validate that the indirection table is at least 4x the queue
count. This lowers the imbalance to 16% which empirically
appears to be more acceptable to memcache-like workloads.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260131225454.1225151-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The init_enable_count test is flaky. The test forks 1024 children before
attaching the scheduler to verify that existing tasks get ops.init_task()
called. The children were using sleep(1) before exiting.
7900aa699c ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free()
to finish_task_switch()") changed when tasks are removed from scx_tasks -
previously when the task_struct was freed, now immediately in
finish_task_switch() when the task dies.
Before the commit, pre-forked children would linger on scx_tasks until freed
regardless of when they exited, so the scheduler would always see them during
iteration. The sleep(1) was unnecessary. After the commit, children are
removed as soon as they die. The sleep(1) masks the problem in most cases but
the test becomes flaky depending on timing.
Fix by synchronizing properly using a pipe. All children block on read() and
the parent signals them to exit by closing the write end after attaching the
scheduler. The children are auto-reaped so there's no need to wait on them.
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixup and refactor downstream port enumeration to prepare for CXL port
protocol error handling. Main motivation is to move endpoint
component register mapping to a port object.
cxl/port: Unify endpoint and switch port lookup
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
cxl/port: Move dport probe operations to a driver event
cxl/port: Move decoder setup before dport creation
cxl/port: Cleanup dport removal with a devres group
cxl/port: Reduce number of @dport variables in cxl_port_add_dport()
cxl/port: Cleanup handling of the nr_dports 0 -> 1 transition
Towards the end goal of making all CXL RAS capability handling uniform
across host bridge ports, upstream switch ports, and endpoint ports, move
dport RAS setup. Move it to cxl_switch_port_probe() context for switch / VH
dports (via cxl_port_add_dport()) and cxl_endpoint_port_probe() context for
an RCH dport. Rename the RAS setup helper to devm_cxl_dport_ras_setup() for
symmetry with devm_cxl_switch_port_decoders_setup().
Only the RCH version needs to be exported and the cxl_test mocking can be
deleted with a dev_is_pci() check on the dport_dev.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260131000403.2135324-7-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
In preparation for adding more register setup to the cxl_port_add_dport()
path (for RAS register mapping), move the dport creation event to a driver
callback. This achieves two goals, it puts driver operations logically
where they belong, in a driver, and it obviates the gymnastics of
DECLARE_TESTABLE() which just makes a mess of grepping for CXL symbols.
In other words, a driver callback is less of an ongoing maintenance burden
than this DECLARE_TESTABLE arrangement that does not scale and diminishes
the grep-ability of the codebase.
cxl_port_add_dport() moves mostly unmodified from drivers/cxl/core/port.c.
The only deliberate change is that it now assumes that the device_lock is
held on entry and the driver is attached (just like cxl_port_probe()).
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Tested-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260131000403.2135324-6-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
At present the mm selftests are integrated into the kselftest harness by
having it run run_vmtest.sh and letting it pick it's default set of tests
to invoke, rather than by telling the kselftest framework about each test
program individually as is more standard. This has some unfortunate
interactions with the kselftest harness:
- If any of the tests hangs the harness will kill the entire mm
selftests run rather than just the individual test, meaning no
further tests get run.
- The timeout applied by the harness is applied to the whole run rather
than an individual test which frequently leads to the suite not being
completed in production testing.
Deploy a crude but effective mitigation for these issues by telling the
kselftest framework to run each of the test categories that run_vmtests.sh
has separately. Since kselftest really wants to run test programs this is
done by providing a trivial wrapper script for each categorty that invokes
run_vmtest.sh, this is not a thing of great elegence but it is clear and
simple. Since run_vmtests.sh is doing runtime support detection, scenario
enumeration and setup for many of the tests we can't consistently tell the
framework about the individual test programs.
This has the side effect of reordering the tests, hopefully the testing
is not overly sensitive to this.
Link: https://lkml.kernel.org/r/20260123-selftests-mm-run-suites-separately-v2-1-3e934edacbfa@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the test fails, it shows whole sampled working set size measurements.
The purpose is showing the distribution of the measured values, to let
the tester know if it was just intermittent failure. Multiple same values
on the output are therefore unnecessary. It was not a big deal since the
test was failing only once in the past. But the test can now fail
multiple times with increased working set size, until it passes or the
working set size reaches a limit. Hence the noisy output can be quite
long and annoying. Print only the deduplicated distribution information.
Link: https://lkml.kernel.org/r/20260117020731.226785-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON selftest for working set size estimation collects DAMON's working
set size measurements of the running artificial memory access generator
program until the program is finished. Depending on how quickly the
program finishes, and how quickly DAMON starts, the number of collected
working set size measurements may vary, and make the test results
unreliable. Ensure it collects 40 measurements by using the repeat mode
of the artificial memory access generator program, and finish the
measurements only after the desired number of collections are made.
Link: https://lkml.kernel.org/r/20260117020731.226785-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
'access_memory' is an artificial memory access generator program that is
used for a few DAMON selftests. It accesses a given number of regions one
by one only once, and exits. Depending on systems, the test workload may
exit faster than expected, making the tests unreliable. For reliable
control of the artificial memory access pattern, add a mode to make it
repeat running.
Link: https://lkml.kernel.org/r/20260117020731.226785-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON reads and writes Accessed bits of page tables without manual TLB
flush for two reasons. First, it minimizes the overhead. Second, real
systems that need DAMON are expected to be memory intensive enough to
cause periodic TLB flushes. For test setups that use small test
workloads, however, the system's TLB could be big enough to cover whole or
most accesses of the test workload. In this case, no page table walk
happens and DAMON cannot show any access from the test workload.
The test workload for DAMON's working set size estimation selftest is such
a case. It accesses only 10 MiB working set, and it turned out there are
test setups that have TLBs large enough to cover the 10 MiB data accesses.
As a result, the test fails depending on the test machine.
Make it more reliable by trying larger working sets up to 160 MiB when it
fails.
Link: https://lkml.kernel.org/r/20260117020731.226785-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/damon: improve leak detection and wss estimation
reliability".
Two DAMON selftets, namely 'sysfs_memcg_leak' and
'sysfs_update_schemes_tried_regions_wss_estimation' frequently show
intermittent failures due to their unreliable leak detection and working
set size estimation. Make those more reliable.
This patch (of 5):
sysfs_memcg_path_leak.sh determines if the memory leak has happened by
seeing if Slab size on /proc/meminfo increases more than expected after an
action. Depending on the system and background workloads, the reasonable
expectation varies. For the reason, the test frequently shows
intermittent failures. Use kmemleak, which is much more reliable and
correct, instead.
Link: https://lkml.kernel.org/r/20260117020731.226785-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260117020731.226785-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This self test is asserting internal implementation details and is highly
vulnerable to internal kernel changes as a result.
It is currently failing locally from at least v6.17, and it seems that it
may have been failing for longer in many configurations/hardware as it
skips if e.g. CONFIG_ANON_VMA_NAME is not specified.
With these skips and the fact that run_vmtests.sh won't run the tests in
certain configurations it is likely we have simply missed this test being
broken in CI for a long while.
I have tried multiple versions of these tests and am unable to find a
working bisect as previous versions of the test fail also.
The tests are essentially mmap()'ing a series of mappings with no hint and
asserting what the get_unmapped_area*() functions will come up with, with
seemingly few checks for what other mappings may already be in place.
It then appears to be mmap()'ing with a hint, and making a series of
similar assertions about the internal implementation details of the
hinting logic.
Commit 0ef3783d75 ("selftests/mm: add support to test 4PB VA on PPC64"),
commit 3bd6137220 ("selftests/mm: virtual_address_range: avoid reading
from VM_IO mappings"), and especially commit a005145b9c ("selftests/mm:
virtual_address_range: mmap() without PROT_WRITE") are good examples of
the whack-a-mole nature of maintaining this test.
The last commit there being particularly pertinent as it was accounting
for an internal implementation detail change that really should have no
bearing on self-tests, that is commit e93d2521b2 ("x86/vdso: Split
virtual clock pages into dedicated mapping").
The purpose of the mm self-tests are to assert attributes about the API
exposed to users, and to ensure that expectations are met.
This test is emphatically not doing this, rather making a series of
assumptions about internal implementation details and asserting them.
It therefore, sadly, seems that the best course is to remove this test
altogether.
Link: https://lkml.kernel.org/r/20260116132053.857887-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pfnmap currently checks the target file in FIXTURE_SETUP(pfnmap), meaning
once for every test, and skips the test if any check fails.
The target file is the same for every test so this is a little overkill.
More importantly, this approach means that the whole suite will report
PASS even if all the tests are skipped because kernel configuration (e.g.
CONFIG_STRICT_DEVMEM=y) prevented /dev/mem from being mapped, for
instance.
Let's ensure that KSFT_SKIP is returned as exit code if any check fails by
performing the checks in pfnmap_init(), run once. That function also
takes care of finding the offset of the pages to be mapped and saves it in
a global. The file is now opened only once and the fd saved in a global,
but it is still mapped/unmapped for every test, as some of them modify the
mapping.
Link: https://lkml.kernel.org/r/20260122170224.4056513-10-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make sure pagemap_ioctl exits with an appropriate value:
* If the tests are run, call ksft_finished() to report the right
status instead of reporting PASS unconditionally.
* Report SKIP if userfaultfd isn't available (in line with other
tests)
* Report FAIL if we failed to open /proc/self/pagemap, as this file
has been added a long time ago and doesn't depend on any CONFIG
option (returning -EINVAL from main() is meaningless)
Link: https://lkml.kernel.org/r/20260122170224.4056513-9-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
One of the pagemap_ioctl tests attempts to fault in pages by memcpy()'ing
them to an unused buffer. This probably worked originally, but since
commit 46036188ea ("selftests/mm: build with -O2") the compiler is free
to optimise away that unused buffer and the memcpy() with it. As a result
there might not be any resident page in the mapping and the test may fail.
We don't need to copy all that memory anyway. Just fault in every page.
While at it also make sure to compute the number of pages once using
simple integer arithmetic instead of ceilf() and implicit conversions.
Link: https://lkml.kernel.org/r/20260122170224.4056513-8-kevin.brodsky@arm.com
Fixes: 46036188ea ("selftests/mm: build with -O2")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
FORCE_READ(*addr) ensures that the compiler will emit a load from addr.
Several tests need to trigger such a load for a range of pages, ensuring
that every page is faulted in, if it wasn't already.
Introduce a new helper force_read_pages() that does exactly that and
replace existing loops with a call to it.
The step size (regular/huge page size) is preserved for all loops, except
in split_huge_page_test. Reading every byte is unnecessary; we now read
every huge page, matching the following call to check_huge_file().
Link: https://lkml.kernel.org/r/20260122170224.4056513-7-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Many cow tests rely on FORCE_READ() to populate pages. Introduce a helper
to make sure that the pages are actually populated, and fail otherwise.
Link: https://lkml.kernel.org/r/20260122170224.4056513-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 5bbc2b785e ("selftests/mm: fix FORCE_READ to read input value
correctly") modified FORCE_READ() to take a value instead of a pointer.
It also changed most of the call sites accordingly, but missed many of
them in cow.c. In those cases, we ended up with the pointer itself being
read, not the memory it points to.
No failure occurred as a result, so it looks like the tests work just fine
without faulting in. However, the huge_zeropage tests explicitly check
that pages are populated, so those became skipped.
Convert all the remaining FORCE_READ() to fault in the mapped page, as was
originally intended. This allows the huge_zeropage tests to run again (3
tests in total).
Link: https://lkml.kernel.org/r/20260122170224.4056513-5-kevin.brodsky@arm.com
Fixes: 5bbc2b785e ("selftests/mm: fix FORCE_READ to read input value correctly")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
check_config.sh checks that liburing is available by running the compiler
provided as its first argument. This makes two assumptions:
1. CC consists of only one word
2. No extra flag is required
Unfortunately, there are many situations where these assumptions don't
hold. For instance:
- When using Clang, CC consists of multiple words
- When cross-compiling, extra flags may be required to allow the
compiler to find headers
Remove these assumptions by passing down CC and CFLAGS as-is from the
Makefile, so that the same command line is used as when actually building
the tests.
Link: https://lkml.kernel.org/r/20260122170224.4056513-4-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 96ed62ea02 ("mm: page_frag: fix a compile error when kernel is
not compiled") introduced a check to avoid attempting to build the
page_frag module if <linux/page_frag_cache.h> is missing.
Unfortunately this check only works if KDIR points to /lib/modules/... or
an in-tree kernel build. It always fails if KDIR points to an out-of-tree
build (i.e. when the kernel was built with O=... make) because only
generated headers are present under $KDIR/include/ in that case.
A recent commit switched KDIR to default to the kernel's build directory,
so that check is no longer justified.
Link: https://lkml.kernel.org/r/20260122170224.4056513-3-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: wang lian <lianux.mm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Various mm kselftests improvements/fixes", v3.
Various improvements/fixes for the mm kselftests:
- Patch 1-3 extend support for more build configurations: out-of-tree
$KDIR, cross-compilation, etc.
- Patch 4-7 fix issues related to faulting in pages, introducing a new
helper for that purpose.
- Patch 8 fixes the value returned by pagemap_ioctl (PASS was always
returned, which explains why the issue fixed in patch 6 went
unnoticed).
- Patch 9 improves the exit code of pfnmap.
Net results:
- 1 test no longer fails (patch 7)
- 3 tests are no longer skipped (patch 4)
- More accurate return values for whole suites (patch 8, 9)
- Extra tests are more likely to be built (patch 1-3)
This patch (of 9):
KDIR currently defaults to the running kernel's modules directory when
building the page_frag module. The underlying assumption is that most
users build the kselftests in order to run them against the system they're
built on.
This assumption seems questionable, and there is no guarantee that the
module can actually be built against the running kernel.
Switch the default value of KDIR to the kernel's build directory, i.e.
$(O) if O= or KBUILD_OUTPUT= is used, and the source directory otherwise.
This seems like the least surprising option: the test module is built
against the kernel that has been previously built.
Note: we can't use $(top_srcdir) in mm/Makefile because it is only defined
once lib.mk is included.
Link: https://lkml.kernel.org/r/20260122170224.4056513-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20260122170224.4056513-2-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Usama Anjum <Usama.Anjum@arm.com>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove test_generic_01.sh since block layer may reorder I/O, making
the test prone to false positives. Apply the improvements to
test_generic_02.sh instead, which supposes for covering ublk dispatch
io order.
Rework test_generic_02 to verify that ublk dispatch doesn't reorder I/O
by comparing request start order with completion order using bpftrace.
The bpftrace script now:
- Tracks each request's start sequence number in a map keyed by sector
- On completion, verifies the request's start order matches expected
completion order
- Reports any out-of-order completions detected
The test script:
- Wait bpftrace BEGIN code block is run
- Pins fio to CPU 0 for deterministic behavior
- Uses block_io_start and block_rq_complete tracepoints
- Checks bpftrace output for reordering errors
Reported-and-tested-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When running tests in parallel with high JOBS count (e.g., JOBS=64),
the existing timeouts can be insufficient due to system load:
- Increase state wait loops from 20/50 to 100 iterations in
_recover_ublk_dev(), __ublk_quiesce_dev(), and __ublk_kill_daemon()
to handle slower state transitions under heavy load
- Add --timeout=20 to udevadm settle calls to prevent indefinite
hangs when udev event queue is overwhelmed by rapid device
creation/deletion
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add _ublk_sleep() helper function that uses different sleep times
depending on whether tests run in parallel or sequential mode.
Usage: _ublk_sleep <normal_secs> <parallel_secs>
Export JOBS variable from Makefile so test scripts can detect parallel
execution, and use _ublk_sleep in test_part_02.sh to handle the
partition scan delay (1s normal, 5s parallel).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add convenient Makefile targets for running specific test groups:
- run_generic, run_batch, run_null, run_loop, run_stripe, run_stress, etc.
- run_all for running all tests
Test groups are auto-detected from TEST_PROGS using pattern matching
(test_<group>_<num>.sh -> group), and targets are generated dynamically
using define/eval templates.
Supports parallel execution via JOBS variable:
- JOBS=1 (default): sequential with kselftest TAP output
- JOBS>1: parallel execution with xargs -P
Usage examples:
make run_null # Sequential execution
make run_stress JOBS=4 # Parallel with 4 jobs
make run_all JOBS=8 # Run all tests with 8 parallel jobs
With JOBS=8, running time of `make run_all` is reduced to 2m2s from 6m5s
in my test VM.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Track device IDs in UBLK_DEVS array when created. Update
_cleanup_test() to only delete devices created by this test
instead of using 'del -a' which removes all devices.
This prepares for running tests concurrently where each test
should only clean up its own devices.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add _ublk_del_dev() to delete a specific ublk device by ID and
use it in all test scripts instead of calling UBLK_PROG directly.
Also remove unused _remove_ublk_devices() function.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Encapsulate each test case in its own function for better organization
and maintainability:
- _setup_device(): device and backfile initialization
- _test_fill_and_verify(): initial data population
- _test_corrupted_reftag(): reftag corruption detection test
- _test_corrupted_data(): data corruption detection test
- _test_bad_apptag(): apptag mismatch detection test
Also fix temp file creation to use ${UBLK_TEST_DIR}/fio_err_XXXXX instead of
creating in current directory.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove intermediate TDIR variable and set UBLK_TEST_DIR directly
in _prep_test(). Remove default initialization since the directory
is created dynamically when tests run.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The added fsession does not prevent running on those architectures, that
haven't added fsession support.
For example, try to run fsession tests on arm64:
test_fsession_basic:PASS:fsession_test__open_and_load 0 nsec
test_fsession_basic:PASS:fsession_attach 0 nsec
check_result:FAIL:test_run_opts err unexpected error: -14 (errno 14)
In order to prevent such errors, add bpf_jit_supports_fsession() to guard
those architectures.
Fixes: 2d419c4465 ("bpf: add fsession support")
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260131144950.16294-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Create and use a temporary directory for the files created during
test runs. If TMPDIR environment variable is set use it as a base
for the temporary directory path.
TMPDIR=/mnt/scratch make run_tests
and
TMPDIR=/mnt/scratch ./test_generic_01.sh
will place test directory under /mnt/scratch
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Log test start and end time in dmesg, so generated log messages
during the test run can be linked to specific test from the test
suite.
(switch to `date +%F %T`)
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The null target doesn't handle IO, so disable partition scan to avoid IO
failures caused by integrity verification during the kernel's partition
table read.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Encapsulate each test case in its own function that creates the
device, runs checks, and deletes only that device. This avoids
calling _cleanup_test multiple times.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This test exercises partition scanning behavior, so move it to
the test_part_* group for consistency.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add test_part_01.sh to test the UBLK_F_NO_AUTO_PART_SCAN feature
flag which allows suppressing automatic partition scanning during
device startup while still allowing manual partition probing.
The test verifies:
- Normal behavior: partitions are auto-detected without the flag
- With flag: partitions are not auto-detected during START_DEV
- Manual scan: blockdev --rereadpt works with the flag
Also update kublk tool to support --no_auto_part_scan option and
recognize the feature flag.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add automatic TID derivation in test_common.sh based on the script
filename. The TID is extracted by stripping the "test_" prefix and
".sh" suffix from the script name (e.g., test_loop_01.sh -> loop_01).
This removes the need for each test script to manually define TID,
reducing boilerplate and preventing potential mismatches between
the script name and TID. Scripts can still override TID after
sourcing test_common.sh if needed.
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Adding support to call bpf_get_stackid helper from trigger programs,
so far added for kprobe multi.
Adding the --stacktrace/-g option to enable it.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-7-jolsa@kernel.org
Adding test that attaches fentry/fexitand verifies the
ORC stacktrace matches expected functions.
The test is only for ORC unwinder to keep it simple.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-6-jolsa@kernel.org
Adding test that attaches kprobe/kretprobe and verifies the
ORC stacktrace matches expected functions.
The test is only for ORC unwinder to keep it simple.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-5-jolsa@kernel.org
We now include the attached function in the stack trace,
fixing the test accordingly.
Fixes: c9e208fa93 ("selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-4-jolsa@kernel.org
Recent x86 kernels export __preempt_count as a ksym, while some old kernels
between v6.1 and v6.14 expose the preemption counter via
pcpu_hot.preempt_count. The existing selftest helper unconditionally
dereferenced __preempt_count, which breaks BPF program loading on such old
kernels.
Make the x86 preemption count lookup version-agnostic by:
- Marking __preempt_count and pcpu_hot as weak ksyms.
- Introducing a BTF-described pcpu_hot___local layout with
preserve_access_index.
- Selecting the appropriate access path at runtime using ksym availability
and bpf_ksym_exists() and bpf_core_field_exists().
This allows a single BPF binary to run correctly across kernel versions
(e.g., v6.18 vs. v6.13) without relying on compile-time version checks.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260130021843.154885-1-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding test that makes sure we can't mix sleepable and non-sleepable
bpf programs in the BPF_MAP_TYPE_PROG_ARRAY map and that we can do
tail call in the sleepable program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260130081208.1130204-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_rss_flow_label_6only test case fails on devices that do not
support IPv6 flow label hashing. Make it skip neatly, consistent with
the behavior of the test_rss_flow_label case.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260128090217.663366-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch moves netconsole selftests from drivers/net to its own target
in drivers/net/netconsole.
This change helps saving some resources from CI since tests in
drivers/net automatically run against real hardware which are not used
by netconsole tests as they rely solely on netdevsim.
lib_netcons.sh is kept under drivers/net/lib since it is also used by
bonding selftests. Finally, drivers/net config remains unchanged as
netpoll_basic.py requires netconsole (and does leverage real HW testing).
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Link: https://patch.msgid.link/20260127-netcons-selftest-target-v2-1-f509ab65b3bc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add BAR_SUBRANGE_TEST to the pci_endpoint kselftest suite.
The test uses the PCITEST_BAR_SUBRANGE ioctl and will skip when the
chosen BAR is disabled (-ENODATA), when the endpoint/controller does not
support subrange mapping (-EOPNOTSUPP), or when the BAR is reserved for
the test register space (-EBUSY).
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260124145012.2794108-9-den@valinux.co.jp
There are no known regressions currently under investigation.
Current release - fix to a fix:
- can: gs_usb_receive_bulk_callback(): fix error message
Current release - regressions:
- eth: gve: fix probe failure if clock read fails
Previous releases - regressions:
- ipv6: use the right ifindex when replying to icmpv6 from localhost
- mptcp: fix race in mptcp_pm_nl_flush_addrs_doit()
- bluetooth: fix null-ptr-deref in hci_uart_write_work
- eth: sfc: fix deadlock in RSS config read
- eth: ice: ifix NULL pointer dereference in ice_vsi_set_napi_queues
- eth: mlx5: fix memory leak in esw_acl_ingress_lgcy_setup()
Previous releases - always broken:
- core: fix segmentation of forwarding fraglist GRO
- wifi: mac80211: correctly decode TTLM with default link map
- mptcp: avoid dup SUB_CLOSED events after disconnect
- nfc: fix memleak in nfc_llcp_send_ui_frame().
- eth: bonding: fix use-after-free due to enslave fail
- eth: mlx5e:
- TC, delete flows only for existing peers
- fix inverted cap check in tx flow table root disconnect
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAml7ghkSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk+qUP/R3xFIu6f0/aexjEaj5n/GxIEqFp3etE
zdFFRHniApSn88qmwIYn3PsviGzF5zqfA8gpXuABJpcck6uWpZs+6epXTAVWEueA
frf8TbDY/cuXpA9tq84tuRifHYo96uJp71iC3Hn2Ph+VAQ2u+FFe8hW04lHB0Bij
3VbRGPa+Z1xjjzzgvPERMIzBAt9Zomkh5DX+VGNwQgbTZwmnhxhJ6+mCM8/0cyrG
lVMatK8gxllAhsQqKSBENufzJCzXEPZ+sHuDonxQ62mQ0Sng7VDcaiezsqx4ufhO
f8idnDV+S74Vwi5gHKMPfOD6KrwPkrtwyREltDI8x+IW90TzySd5Jtd0EO60y/14
tnbqEJlGjDYyheXdgTFFgVWNCWCrdxeZKofpkJB/3COx+BC2QQd//Z7OI1LL1mFt
lmFu+vGSSnz1GMPpqtZ1TiS/BUwitRboeXZvzFM7EkPK+zSp8ddIgRMUoKxGDdBO
Hg1pKyYFlH9zg/HuGT0k1NyZ1MhhWXpLeAYJM5R+ep/s3qFrcTLe3OQweR02gzt6
4gOFolP6ei4/EkYRnOLdBRjQdUi+S3Kh9EGuV0kjgCoO0qNBg2sk/U7fhz/nqvvj
CBhw9hdbXEWC0Zg937R3TMZya2/BDPokS9XWobuE/BEfvga6pWintDgIN6eXE21m
P6p86ugToMrU
=m4lK
-----END PGP SIGNATURE-----
Merge tag 'net-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, CAN and wireless.
There are no known regressions currently under investigation.
Current release - fix to a fix:
- can: gs_usb_receive_bulk_callback(): fix error message
Current release - regressions:
- eth: gve: fix probe failure if clock read fails
Previous releases - regressions:
- ipv6: use the right ifindex when replying to icmpv6 from localhost
- mptcp: fix race in mptcp_pm_nl_flush_addrs_doit()
- bluetooth: fix null-ptr-deref in hci_uart_write_work
- eth:
- sfc: fix deadlock in RSS config read
- ice: ifix NULL pointer dereference in ice_vsi_set_napi_queues
- mlx5: fix memory leak in esw_acl_ingress_lgcy_setup()
Previous releases - always broken:
- core: fix segmentation of forwarding fraglist GRO
- wifi: mac80211: correctly decode TTLM with default link map
- mptcp: avoid dup SUB_CLOSED events after disconnect
- nfc: fix memleak in nfc_llcp_send_ui_frame().
- eth:
- bonding: fix use-after-free due to enslave fail
- mlx5e:
- TC, delete flows only for existing peers
- fix inverted cap check in tx flow table root disconnect"
* tag 'net-6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
net: fix segmentation of forwarding fraglist GRO
wifi: mac80211: correctly decode TTLM with default link map
selftests: mptcp: join: fix local endp not being tracked
selftests: mptcp: check subflow errors in close events
mptcp: only reset subflow errors when propagated
selftests: mptcp: check no dup close events after error
mptcp: avoid dup SUB_CLOSED events after disconnect
net/mlx5e: Skip ESN replay window setup for IPsec crypto offload
net/mlx5: Fix vhca_id access call trace use before alloc
net/mlx5: fs, Fix inverted cap check in tx flow table root disconnect
net: phy: micrel: fix clk warning when removing the driver
net/mlx5e: don't assume psp tx skbs are ipv6 csum handling
net: bridge: fix static key check
nfc: nci: Fix race between rfkill and nci_unregister_device().
gve: fix probe failure if clock read fails
net/mlx5e: Account for netdev stats in ndo_get_stats64
net/mlx5e: TC, delete flows only for existing peers
net/mlx5: Fix Unbinding uplink-netdev in switchdev mode
ice: stop counting UDP csum mismatch as rx_errors
ice: Fix NULL pointer dereference in ice_vsi_set_napi_queues
...
Add a kselftest for RISC-V control flow integrity implementation for
user mode. There is not a lot going on in the kernel to enable landing
pad for user mode. CFI selftests are intended to be compiled with a
zicfilp and zicfiss enabled compiler. This kselftest simply checks if
landing pads and shadow stacks for the process are enabled or not and
executes ptrace selftests on CFI. The selftest then registers a
SIGSEGV signal handler. Any control flow violations are reported as
SIGSEGV with si_code = SEGV_CPERR. The test will fail on receiving
any SEGV_CPERR. The shadow stack part has more changes in the kernel,
and thus there are separate tests for that.
- Exercise 'map_shadow_stack' syscall
- 'fork' test to make sure COW works for shadow stack pages
- gup tests
Kernel uses FOLL_FORCE when access happens to memory via
/proc/<pid>/mem. Not breaking that for shadow stack.
- signal test. Make sure signal delivery results in token creation on
shadow stack and consumes (and verifies) token on sigreturn
- shadow stack protection test. attempts to write using regular store
instruction on shadow stack memory must result in access faults
- ptrace test: adds landing pad violation, clears ELP and continues
In case the toolchain doesn't support the CFI extension, the CFI
kselftest won't be built.
Test output
===========
"""
TAP version 13
1..5
This is to ensure shadow stack is indeed enabled and working
This is to ensure shadow stack is indeed enabled and working
ok 1 shstk fork test
ok 2 map shadow stack syscall
ok 3 shadow stack gup tests
ok 4 shadow stack signal tests
ok 5 memory protections of shadow stack memory
"""
Suggested-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-28-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned up patch description, code comments]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
We've run out of bits to describe RISC-V ISA extensions in our initial
hwprobe key, RISCV_HWPROBE_KEY_IMA_EXT_0. So, let's add
RISCV_HWPROBE_KEY_IMA_EXT_1, along with the framework to set the
appropriate hwprobe tuple, and add testing for it.
Based on a suggestion from Andrew Jones <andrew.jones@oss.qualcomm.com>,
also fix the documentation for RISCV_HWPROBE_KEY_IMA_EXT_0.
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Similar to IPIP, introduce specific selftest for IP6IP6 flowtable SW
acceleration in nft_flowtable.sh
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
When running this mptcp_join.sh selftest on older kernel versions not
supporting local endpoints tracking, this test fails because 3 MP_JOIN
ACKs have been received, while only 2 were expected.
It is not clear why only 2 MP_JOIN ACKs were expected on old kernel
versions, while 3 MP_JOIN SYN and SYN+ACK were expected. When testing on
the v5.15.197 kernel, 3 MP_JOIN ACKs are seen, which is also what is
expected in the selftests included in this kernel version, see commit
f4480eaad489 ("selftests: mptcp: add missing join check").
Switch the expected MP_JOIN ACKs to 3. While at it, move this
chk_join_nr helper out of the special condition for older kernel
versions as it is now the same as with more recent ones. Also, invert
the condition to be more logical: what's expected on newer kernel
versions having such helper first.
Fixes: d4c81bbb86 ("selftests: mptcp: join: support local endpoint being tracked or not")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-5-7f71e1bc4feb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: subflow closed events should contain
an error field when a subflow got closed with an error, e.g. reset or
timeout.
For this test, the chk_evt_nr helper has been extended to check
attributes in the matched events.
In this test, the 2 subflow closed events should have an error.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-4-7f71e1bc4feb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: subflow closed events are re-sent
with less info when the initial subflow is disconnected after an error
and each time a subflow is closed after that.
In this new test, the userspace PM is involved because that's how it was
discovered, but it is not specific to it. The initial subflow is
terminated with a RESET, and that will cause the subflow disconnect.
Then, a new subflow is initiated, but also got rejected, which cause a
second subflow closed event, but not a third one.
While at it, in case of failure to get the expected amount of events,
the events are printed.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: d82809b6c5 ("mptcp: avoid duplicated SUB_CLOSED events")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-2-7f71e1bc4feb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a bug in assoc_sk_only_mismatch() and
assoc_sk_only_mismatch_tx() that creates a race condition which
triggers test flakes in later test cases e.g. data_send_bad_key().
The problem is that the client uses the "conn clr" rpc to setup a data
connection with psp_responder, but never uses a matching "data close"
rpc. This creates a race condition where if the client can queue
another data sock request, like in data_send_bad_key(), before the
server can accept the old connection from the backlog we end up in a
situation where we have two connections in the backlog: one for the
closed connection we have received a FIN for, and one for the new PSP
connection which is expecting to do key exchange.
From there the server pops the closed connection from the backlog, but
the data_send_bad_key() test case in psp.py hangs waiting to perform
key exchange.
The fix is to properly use _conn_close, which fill force the server to
remove the closed connection from the backlog before sending the RPC
ack to the client.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260127-psp-flaky-test-v1-1-13403e390af3@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test tcp_tx_timestamp() behavior after ("tcp: tcp_tx_timestamp()
must look at the rtx queue").
Without the fix, this new test fails like this:
tcp_timestamping_tcp_tx_timestamp_bug.pkt:55: runtime error in recvmsg call: Expected result 0 but got -1 with errno 11 (Resource temporarily unavailable)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20260127123828.4098577-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with
xxd (part of vim-common package).
As xxd may not be available on some systems building selftests, it makes
sense to substitute it with more common utils: hexdump, wc, sed to
generate equivalent C array output.
Tested by generating header with both xxd and hexdump and comparing
them.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20260128190552.242335-1-mykyta.yatsenko5@gmail.com
When fp-pidbench was originally written SVE hardware was not widely
available so it was useful to run it in emulation and the default number
of loops was set very low, running for less than a second on actual
hardware. Now that SVE hardware is reasonably available it is very much
less interesting to use emulation, bump the default number of loops up to
even out a bit of the noise on real systems. On the machine I have to hand
this now takes about 15s which is still a toy microbenchmark but perhaps a
bit more useful.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Some applications use SVE intermittently, one common case being where SVE
is used during statup (eg, by ld.so) but then rarely if ever during the
main application runtime. Add a repeat of the no SVE loop after we've done
the SVE loops to fp-pidbench to capture results for that.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
In sync with the main kernel headers, include a stub version of
compiler-context-analysis.h in tools/include/linux/compiler_types.h and
remove the sparse context tracking definitions.
Since tools/ headers are generally self-contained, provide a standalone
tools/include/linux/compiler-context-analysis.h with no-op stubs for now. Also
clean up redundant stubs in tools/testing/shared/linux/kernel.h that are now
redundant.
This fixes build errors in tools/testing/radix-tree/ where headers from
include/linux/ (like cleanup.h) are used directly and expect these
macros to be defined:
| cc -I../shared -I. -I../../include -I../../arch/x86/include -I../../../lib -g -Og -Wall -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined -c -o radix-tree.o radix-tree.c
| In file included from ../shared/linux/cleanup.h:2,
| from ../shared/linux/../../../../include/linux/idr.h:18,
| from ../shared/linux/idr.h:5,
| from radix-tree.c:18:
| ../shared/linux/../../../../include/linux/idr.h: In function ‘class_idr_alloc_destructor’:
| ../shared/linux/../../../../include/linux/cleanup.h:283:9: error: expected declaration specifiers before ‘__no_context_analysis’
| 283 | __no_context_analysis \
| | ^~~~~~~~~~~~~~~~~~~~~
Closes: https://lore.kernel.org/oe-lkp/202601261546.d7ae2447-lkp@intel.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://patch.msgid.link/20260127111428.3747328-1-elver@google.com
Some PTP hardware clock (PHC) devices may return -EOPNOTSUPP for
operations like settime, adjtime, or adjfreq. This commonly occurs
with timestamp-only PHC implementations that don't support full clock
control.
For background, syzbot previously exposed a crash risk when PTP clock
drivers lacked required callbacks[1]. Subsequent work[2] made callback
presence a registration requirement. As a result, some drivers (like
iwlwifi MVM/MLD[3]) now provide stub callbacks that return -EOPNOTSUPP
for unsupported operations.
When phc_ctl encounters such devices, the "Operation not supported"
error should be treated as a skip (device limitation) rather than a
test failure. This patch:
- Adds [SKIP] output handling in log_test()
- Detects "Operation not supported" from phc_ctl and returns ksft_skip
- Returns ksft_skip if all tests are skipped, preventing false-positive
results when testing timestamp-only PHC implementations
Link: https://lore.kernel.org/netdev/20251028043216.1971292-1-junjie.cao@intel.com/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dfb073d32cac [2]
Link: https://lore.kernel.org/netdev/20251204123204.9316-1-ziyao@disroot.org/ [3]
Signed-off-by: Junjie Cao <junjie.cao@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260126061532.12532-2-junjie.cao@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kselftest framework defines KSFT_SKIP=4 as the standard exit code
for skipped tests. However, phc.sh currently uses a mix of 'exit 0' and
'exit 1' to indicate skip conditions, which can confuse test harnesses
and CI systems.
This patch introduces ksft_skip=4 variable and unifies all skip exit
paths to use 'exit $ksft_skip', consistent with other selftests like
net/lib.sh and net/fib_nexthops.sh.
Signed-off-by: Junjie Cao <junjie.cao@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260126061532.12532-1-junjie.cao@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit adds two new test functions: one to reproduce the bug reported
by syzkaller [1], and another to cover the calculation of copied_seq.
The tests primarily involve installing and uninstalling sockmap on
sockets, then reading data to verify proper functionality.
Additionally, extend the do_test_sockmap_skb_verdict_fionread() function
to support UDP FIONREAD testing.
[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20260124113314.113584-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend some of the existing CSS iterator selftests such that they
cover the newly introduced BPF_CGROUP_ITER_CHILDREN iterator control
option.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20260127085112.3608687-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests that validate vsock sockets are resilient to deleting
namespaces. The vsock sockets should still function normally.
The function check_ns_delete_doesnt_break_connection() is added to
re-use the step-by-step logic of 1) setup connections, 2) delete ns,
3) check that the connections are still ok.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-12-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests to validate namespace correctness using vsock_test and socat.
The vsock_test tool is used to validate expected success tests, but
socat is used for expected failure tests. socat is used to ensure that
connections are rejected outright instead of failing due to some other
socket behavior (as tested in vsock_test). Additionally, socat is
already required for tunneling TCP traffic from vsock_test. Using only
one of the vsock_test tests like 'test_stream_client_close_client' would
have yielded a similar result, but doing so wouldn't remove the socat
dependency.
Additionally, check for the dependency socat. socat needs special
handling beyond just checking if it is on the path because it must be
compiled with support for both vsock and unix. The function
check_socat() checks that this support exists.
Add more padding to test name printf strings because the tests added in
this patch would otherwise overflow.
Add vm_dmesg_* helpers to encapsulate checking dmesg
for oops and warnings.
Add ability to pass extra args to host-side vsock_test so that tests
that cause false positives may be skipped with arg --skip.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-11-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests to verify CID collision rules across different vsock namespace
modes.
1. Two VMs with the same CID cannot start in different global namespaces
(ns_global_same_cid_fails)
2. Two VMs with the same CID can start in different local namespaces
(ns_local_same_cid_ok)
3. VMs with the same CID can coexist when one is in a global namespace
and another is in a local namespace (ns_global_local_same_cid_ok and
ns_local_global_same_cid_ok)
The tests ns_global_local_same_cid_ok and ns_local_global_same_cid_ok
make sure that ordering does not matter.
The tests use a shared helper function namespaces_can_boot_same_cid()
that attempts to start two VMs with identical CIDs in the specified
namespaces and verifies whether VM initialization failed or succeeded.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-10-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests for the /proc/sys/net/vsock/{ns_mode,child_ns_mode}
interfaces. Namely, that they accept/report "global" and "local" strings
and enforce their access policies.
Start a convention of commenting the test name over the test
description. Add test name comments over test descriptions that existed
before this convention.
Add a check_netns() function that checks if the test requires namespaces
and if the current kernel supports namespaces. Skip tests that require
namespaces if the system does not have namespace support.
This patch is the first to add tests that do *not* re-use the same
shared VM. For that reason, it adds a run_ns_tests() function to run
these tests and filter out the shared VM tests.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-9-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Replace /proc/net parsing with ss(8) for detecting listening sockets in
wait_for_listener() functions and add support for TCP, VSOCK, and Unix
socket protocols.
The previous implementation parsed /proc/net/tcp using awk to detect
listening sockets, but this approach could not support vsock because
vsock does not export socket information to /proc/net/.
Instead, use ss so that we can detect listeners on tcp, vsock, and unix.
The protocol parameter is now required for all wait_for_listener family
functions (wait_for_listener, vm_wait_for_listener,
host_wait_for_listener) to explicitly specify which socket type to wait
for.
ss is added to the dependency check in check_deps().
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-8-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
These functions are reused by the VM tests to collect and compare dmesg
warnings and oops counts. The future VM-specific tests use them heavily.
This patches relies on vm_ssh() already supporting namespaces.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-7-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add namespace support to vm management, ssh helpers, and vsock_test
wrapper functions. This enables running VMs and test helpers in specific
namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, though the caller
must now pass "init_ns" explicitly.
No functional changes for existing tests. All have been updated to pass
"init_ns" explicitly.
Affected functions (such as vm_start() and vm_ssh()) now wrap their
commands with 'ip netns exec' when executing commands in non-init
namespaces.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-6-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add functions for initializing namespaces with the different vsock NS
modes. Callers can use add_namespaces() and del_namespaces() to create
namespaces global0, global1, local0, and local1.
The add_namespaces() function initializes global0, local0, etc... with
their respective vsock NS mode by toggling child_ns_mode before creating
the namespace.
Remove namespaces upon exiting the program in cleanup(). This is
unlikely to be needed for a healthy run, but it is useful for tests that
are manually killed mid-test.
This patch is in preparation for later namespace tests.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-5-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Increase the timeout from 300s to 1200s. On a modern bare metal server
my last run showed the new set of tests taking ~400s. Multiply by an
(arbitrary) factor of three to account for slower/nested runners.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-4-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Specify which operation is being performed to anon_vma_clone(), which
allows us to do checks specific to each operation type, as well as to
separate out and make clear that the anon_vma reuse logic is absolutely
specific to fork only.
This opens the door to further refactorings and refinements later as we
have more information to work with.
Link: https://lkml.kernel.org/r/cf7da7a2d973cdc72a1b80dd9a73260519e8fa9f.1768746221.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This function is confusing, we already have the concept of anon_vma merge
to adjacent VMA's anon_vma's to increase probability of anon_vma
compatibility and therefore VMA merge (see is_mergeable_anon_vma() etc.),
as well as anon_vma reuse, along side the usual VMA merge logic.
We can remove the anon_vma check as it is redundant - a merge would not
have been permitted with removal if the anon_vma's were not the same (and
in the case of an unfaulted/faulted merge, we would have already set the
unfaulted VMA's anon_vma to vp->remove->anon_vma in dup_anon_vma()).
Avoid overloading this term when we're very simply unlinking anon_vma
state from a removed VMA upon merge.
Link: https://lkml.kernel.org/r/56bbe45e309f7af197b1c4f94a9a0c8931ff2d29.1768746221.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The __exit__ method receives ex_type as the exception class when an
exception occurs. The previous code used implicit boolean evaluation:
terminate = self.terminate or (self._exit_wait and ex_type)
^^^^^^^^^^^
In Python, the and operator can be used with non-boolean values, but it
does not always return a boolean result.
This is probably not what we want, because 'self._exit_wait and ex_type'
could return the actual ex_type value (the exception class) rather than
a boolean True when an exception occurs.
Use explicit `ex_type is not None` check to properly evaluate whether
an exception occurred, returning a boolean result.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260125105524.773993-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test_bpftool_map.sh script tests that maps read/write accesses
are being properly allowed/refused by the kernel depending on a specific
fmod_ret program being attached on security_bpf_map function.
Rewrite this test to integrate it in the test_progs. The
new test spawns a few subtests:
#36/1 bpftool_maps_access/unprotected_unpinned:OK
#36/2 bpftool_maps_access/unprotected_pinned:OK
#36/3 bpftool_maps_access/protected_unpinned:OK
#36/4 bpftool_maps_access/protected_pinned:OK
#36/5 bpftool_maps_access/nested_maps:OK
#36/6 bpftool_maps_access/btf_list:OK
#36 bpftool_maps_access:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-3-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_bpftool_metadata.sh script validates that bpftool properly
returns in its ouptput any metadata generated by bpf programs through
some .rodata sections.
Port this test to the test_progs framework so that it can be executed
automatically in CI. The new test, similarly to the former script,
checks that valid data appears both for textual output and json output,
as well as for both data not used at all and used data. For the json
check part, the expected json string is hardcoded to avoid bringing a
new external dependency (eg: a json deserializer) for test_progs.
As the test is now converted into test_progs, remove the former script.
The newly converted test brings two new subtests:
#37/1 bpftool_metadata/metadata_unused:OK
#37/2 bpftool_metadata/metadata_used:OK
#37 bpftool_metadata:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-2-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In order to integrate some bpftool tests into test_progs, define a few
specific helpers that allow to execute bpftool commands, while possibly
retrieving the command output. Those helpers most notably set the
path to the bpftool binary under test. This version checks different
possible paths relative to the directories where the different
test_progs runners are executed, as we want to make sure not to
accidentally use a bootstrap version of the binary.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-1-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
CI occasionally reports failures in the
percpu_alloc/cpu_flag_lru_percpu_hash selftest, for example:
First test_progs failure (test_progs_no_alu32-x86_64-llvm-21):
#264/15 percpu_alloc/cpu_flag_lru_percpu_hash
...
test_percpu_map_op_cpu_flag:FAIL:bpf_map_lookup_batch value on specified cpu unexpected bpf_map_lookup_batch value on specified cpu: actual 0 != expected 3735929054
The unexpected value indicates that an element was removed from the map.
However, the test never calls delete_elem(), so the only possible cause
is LRU eviction.
This can happen when the current task migrates to another CPU: an
update_elem() triggers eviction because there is no available LRU node
on local freelist and global freelist.
Harden the test against this behavior by provisioning sufficient spare
elements. Set max_entries to 'nr_cpus * 2' and restrict the test to using
the first nr_cpus entries, ensuring that updates do not spuriously trigger
LRU eviction.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260119133417.19739-1-leon.hwang@linux.dev
The binary generated by check_hugetlb_options is missing in .gitignore
under the directory. Add it into the file so it won't be logged into
version control.
Signed-off-by: I-Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Test ipv6 pinging to local configured address and linklocal address from
localhost with -I ::1.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260121194409.6749-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new utility metadata_size was added in
commit 261b67f4e3 ("selftests: ublk: add utility to get block device metadata size")
but it was not added to .gitignore. Fix that by adding it there.
While at it sort all entries alphabetically and add a SPDX license header.
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 261b67f4e3 ("selftests: ublk: add utility to get block device metadata size")
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new selftest suite `exe_ctx` to verify the accuracy of the
bpf_in_task(), bpf_in_hardirq(), and bpf_in_serving_softirq() helpers
introduced in bpf_experimental.h.
Testing these execution contexts deterministically requires crossing
context boundaries within a single CPU. To achieve this, the test
implements a "Trigger-Observer" pattern using bpf_testmod:
1. Trigger: A BPF syscall program calls a new bpf_testmod kfunc
bpf_kfunc_trigger_ctx_check().
2. Task to HardIRQ: The kfunc uses irq_work_queue() to trigger a
self-IPI on the local CPU.
3. HardIRQ to SoftIRQ: The irq_work handler calls a dummy function
(observed by BPF fentry) and then schedules a tasklet to
transition into SoftIRQ context.
The user-space runner ensures determinism by pinning itself to CPU 0
before execution, forcing the entire interrupt chain to remain on a
single core. Dummy noinline functions with compiler barriers are
added to bpf_testmod.c to serve as stable attachment points for
fentry programs. A retry loop is used in user-space to wait for the
asynchronous SoftIRQ to complete.
Note that testing on s390x is avoided because supporting those helpers
purely in BPF on s390x is not possible at this point.
Reviewed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260125115413.117502-3-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_in_nmi(), bpf_in_hardirq(), bpf_in_serving_softirq(), and
bpf_in_task() inline helpers in bpf_experimental.h. These allow BPF
programs to query the current execution context with higher granularity
than the existing bpf_in_interrupt() helper.
While BPF programs can often infer their context from attachment points,
subsystems like sched_ext may call the same BPF logic from multiple
contexts (e.g., task-to-task wake-ups vs. interrupt-to-task wake-ups).
These helpers provide a reliable way for logic to branch based on
the current CPU execution state.
Implementing these as BPF-native inline helpers wrapping
get_preempt_count() allows the compiler and JIT to inline the logic. The
implementation accounts for differences in preempt_count layout between
standard and PREEMPT_RT kernels.
Reviewed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260125115413.117502-2-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test session cookie for fsession. Multiple fsession BPF progs is attached
to bpf_fentry_test1() and session cookie is read and write in the
testcase.
bpf_get_func_ip() will influence the layout of the session cookies, so we
test the cookie in two case: with and without bpf_get_func_ip().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-13-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add testcases for BPF_TRACE_FSESSION. The function arguments and return
value are tested both in the entry and exit. And the kfunc
bpf_session_is_ret() is also tested.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-11-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the function argument of "void *ctx" to bpf_session_cookie() and
bpf_session_is_return(), which is a preparation of the next patch.
The two kfunc is seldom used now, so it will not introduce much effect
to change their function prototype.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20260124062008.8657-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The fsession is something that similar to kprobe session. It allow to
attach a single BPF program to both the entry and the exit of the target
functions.
Introduce the struct bpf_fsession_link, which allows to add the link to
both the fentry and fexit progs_hlist of the trampoline.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If the argument 'pull_len' of run_test() is 'PULL_MAX' or
'PULL_MAX | PULL_PLUS_ONE', the eventual pull_len size
will close to the page size. On arm64 systems with 64K pages,
the pull_len size will be close to 64K. But the existing buffer
will be close to 9000 which is not enough to pull.
For those failed run_tests(), make buff size to
pg_sz + (pg_sz / 2)
This way, there will be enough buffer space to pull
regardless of page size.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055128.495265-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On arm64 systems with 64K pages, the selftest task_local_data has the following
failures:
...
test_task_local_data_basic:PASS:tld_create_key 0 nsec
test_task_local_data_basic:FAIL:tld_create_key unexpected tld_create_key: actual 0 != expected -28
...
test_task_local_data_basic_thread:PASS:run task_main 0 nsec
test_task_local_data_basic_thread:FAIL:task_main retval unexpected error: 2 (errno 0)
test_task_local_data_basic_thread:FAIL:tld_get_data value0 unexpected tld_get_data value0: actual 0 != expected 6268
...
#447/1 task_local_data/task_local_data_basic:FAIL
...
#447/2 task_local_data/task_local_data_race:FAIL
#447 task_local_data:FAIL
When TLD_DYN_DATA_SIZE is 64K page size, for
struct tld_meta_u {
_Atomic __u8 cnt;
__u16 size;
struct tld_metadata metadata[];
};
field 'cnt' would overflow. For example, for 4K page, 'cnt' will
be 4096/64 = 64. But for 64K page, 'cnt' will be 65536/64 = 1024
and 'cnt' is not enough for 1024. To accommodate 64K page,
'_Atomic __u8 cnt' becomes '_Atomic __u16 cnt'. A few other places
are adjusted accordingly.
In test_task_local_data.c, the value for TLD_DYN_DATA_SIZE is changed
from 4096 to (getpagesize() - 8) since the maximum buffer size for
TLD_DYN_DATA_SIZE is (getpagesize() - 8).
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055122.494352-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Add $(DISABLE_KSTACK_ERASE) to vdso compile flags to fix compile errors
with old gcc versions
- Fix path to s390 chacha implementation in vdso selftests, after vdso64
has been renamed to vdso
- Fix off-by-one bug in APQN limit calculation
- Discard .modinfo section from decompressor image to fix SecureBoot
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmlze6cACgkQIg7DeRsp
bsI6bxAApt9LWUqAQL8GtdWdVm+o+lgEeBOVqv0Pil7i9HSfwuvUxppIVlI4hDku
vFavggZFNFN6nSSM5bBiz5SjRiUvlEqO2ZynBubgakfutrXMDw89qGph4jTiaOTl
ySIIOkf0+/In9DWWGW8qibuu+9Ff2lWcdIie1bS/E/JwmqaS4uUsH2J4JuhX8N2b
TuFWnKKUqvnLaXoRMjIigC/aMJW+rCTH/T/B29H8Fdu4R8e1XbsjFSP0IlhSLgZy
ks+2jVjoJDezfJYfwwBT+vhn2Z7Z5h2KYDei713vDI3WgiaczDwCRw6aVvF1wtQK
D1YPA6GO0QLGkjgrnMTDaORzdp/dqRc9Y6yk4d79/mVkN4NUpPyVFeMJAdhX92Ym
ccSWFbsA5DmzRR6Dg77Z737wapO5DbTo6hCfF2su2Otlulz4EEdgyKT9cc0jKnmk
ol+bWMxaJd1R0IaSfOvCceFeL3mHvNdFcOk1Zew+SA7ybmcAeTaVOw1l4E7T38sL
hjbUFDTzZJ/mAexCwq2qn5h/epF3yvrWquiF616JdZCR2d0iZS053at+hxW0gUbV
kmf80XjMHnLSazDMc32YU8/XRHMgi0Qrkn61ELSiGpbaTAeiPw5O8o2n/j4pFC6K
kNnwrjZpHWscs+Jb2GJJ/eJDH4WrSbVkcU18hXD6UNMMCMahkBM=
=ugtG
-----END PGP SIGNATURE-----
Merge tag 's390-6.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Add $(DISABLE_KSTACK_ERASE) to vdso compile flags to fix compile
errors with old gcc versions
- Fix path to s390 chacha implementation in vdso selftests, after
vdso64 has been renamed to vdso
- Fix off-by-one bug in APQN limit calculation
- Discard .modinfo section from decompressor image to fix SecureBoot
* tag 's390-6.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer
s390/ap: Fix wrong APQN fill calculation
selftests: vDSO: getrandom: Fix path to s390 chacha implementation
s390/vdso: Disable kstack erase
Merge in patches to support several patch series such as Soft Reserve
handling, type2 accelerator enabling, and LSA 2.1 labeling support.
Mainly addition of cxl_memdev_attach() to allow the memdev probe
to make a decision of proceed/fail depending success of CXL topology
enumeration.
dax/hmem, e820, resource: Defer Soft Reserved insertion until hmem is ready
cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
cxl/mem: Drop @host argument to devm_cxl_add_memdev()
cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
cxl/port: Arrange for always synchronous endpoint attach
cxl/mem: Arrange for always-synchronous memdev attach
cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmly5rUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv30EACNx0n1MlRvrIHvqIH+gq2NmpweLhJphars
3e4lyDD5bBUuQ1V2iRc5k7MiKPEtQJpF83m1WVIvycNjjx2qWLBsZyCc9PCNoZIn
nBe1/qCuKOs8kmrNkTXLKjr1uFVp+0Nm5jWSjeIgNI1zd7IDe6fpkjyH6JrnKsw8
DD7aCYE4jHGJH8q9Ks3rOu3M2syiFwkqmOD12Xgqiz29fP4qJJDvC/y4yOMDWbZY
G7ZKFHZK5QS8tmjRUhgJacVpJlN3NCL6lZ16f0fRTJoIlWzOAaBafeJTAjiS3S01
TS8sadfTLrVZocFYSKZjixVgUocHqE5QeKBrDe09N0A5XsWz/YJkcNisNw0oRcvb
BKCrO1TCHiPzliPCobPcaFZII4z2HeiIfVGSTaJAzSM4ZCt/EH9BFTKrgohGRZi4
v+bPNdJaCLaFOipgsUeA63NZpt9xN+PsfmzFGC1PvF50gQMM+JeFpqjNh37XjCIt
6QtK9rzLzC9GvdFXV3o1CRbIBAli3UEgjyLeThhsCu2HPnab/yrrf9AwF5udZQlJ
wK7S8kB3zl25nyM3jwKzzaVLqi+aUWr1Jn+D+DA2BwwPg3m8h7yIrysJryF5Leon
5VAYHCOg+MhPuvhNFjqkgahjreJqV/3Qou+sbGcLur4PXdKXeW8SY5wrDlplumFb
+BZSUZ06Kw==
=RK+t
-----END PGP SIGNATURE-----
Merge tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- A set of selftest fixes for ublk
- Fix for a pid mismatch in ublk, comparing PIDs in different
namespaces if run inside a namespace
- Fix for a regression added in this release with polling, where the
nvme tcp connect code would spin forever
- Zoned device error path fix
- Tweak the blkzoned uapi additions from this kernel release, making
them more easily discoverable
- Fix for a regression in bcache with bio endio handling added in this
release
* tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
bcache: use bio cloning for detached device requests
blk-mq: use BLK_POLL_ONESHOT for synchronous poll completion
selftests/ublk: fix garbage output in foreground mode
selftests/ublk: fix error handling for starting device
selftests/ublk: fix IO thread idle check
block: make the new blkzoned UAPI constants discoverable
ublk: fix ublksrv pid handling for pid namespaces
block: Fix an error path in disk_update_zone_resources()
In preparation for testing GSO over UDP tunnels, enhance the test
infrastructure to support a more complex data path involving a TUN
device and a GENEVE udp tunnel.
This patch introduces a dedicated setup/teardown topology that creates
both a GENEVE tunnel interface and a TUN interface. The TUN device acts
as the VTEP (Virtual Tunnel Endpoint), allowing it to send and receive
virtio-net packets. This setup effectively tests the kernel's data path
for encapsulated traffic.
Note that after adding a new address to the UDP tunnel, we need to wait
a bit until the associated route is available.
Additionally, a new data structure is defined to manage test parameters.
This structure is designed to be extensible, allowing different test
data and configurations to be easily added in subsequent patches.
Signed-off-by: Xu Du <xudu@redhat.com>
Link: https://patch.msgid.link/b5787b8c269f43ce11e1756f1691cc7fd9a1e901.1768979440.git.xudu@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce rtnetlink manipulation and packet construction helpers that
will simplify the later creation of more related test cases. This avoids
duplicating logic across different test cases.
This new header will contain:
- YNL-based netlink management utilities.
- Helpers for ip link, ip address, ip neighbor and ip route operations.
- Packet construction and manipulation helpers.
Signed-off-by: Xu Du <xudu@redhat.com>
Link: https://patch.msgid.link/91f905715c69c75f7bf72d43388921fde6c34989.1768979440.git.xudu@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a simple, netns-based topology with double, nested UDP tunnels and
perform TSO transfers on top.
Explicitly enable GSO and/or GRO and check the skb layout consistency with
different configuration allowing (or not) GSO frames to be delivered on
the other end.
The trickest part is account in a robust way the aggregated/unaggregated
packets with double encapsulation: use a classic bpf filter for it.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/61f2c98ba0f73057c2d6f6cb62eb807abd90bf6b.1769011015.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A collection of a few more small fixes for HD- and USB-audio,
including a regression fix for the OOB fix that was included in
the previous PR.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmlyUsMOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8Jsg//bdurKx6pYKtDDPRvcOs+1D5pE6a804M8vEiM
YyfzseQ3KpJ+MsMXEXD4zZ4AJ0GgZoeCkjsvHxKbXu4gBv+AiHfuke0nmA/5hwas
Nge6ItXd8/JNVsltpZblp+SodPLGWHsxdnIxgUSXVqlGeyg8OnbJh3VDcMtBq3gA
0GnpkcQ6GRMo1O9owa4p1Mfx6OwXzW+Frtukxo0p0vMqcj7A1D8r3hR1dvsSfygT
vFXkVyy7Zoe7HYy4DfbHVEwaeSU2S9+CDoKAjzLaf5dGG8i1VeuDXa4zDsnwFkfK
X1YKScJ+gb6e40W39jrAWqLnksmal8m5KWaNqSJrJJBrLmaWfpP0vr84EhNtDzie
mpXnIGeylkMe8HM8+2TGE3rAz2a2SuVzs1ICigXwf4UxJ6X/gtdoTnbZ8oB+3UJQ
LHsuYFPmKkXNuCyA6fWD2bFwu9EG1anGkNHmjzHH60BIhGox/Jsp9/Aw2HHr/Ryv
re+0vCPURYhiZUjc/UdI6BR41fA0Uu+fXNu0urwCdjefTMB0QXxG+dRRigr3uD4x
0ezSxO3C8RfpPxqilUGree4xbZVwqINI8E9LoIUm8+CyRUtmAEDcjcOIwaz0dvOB
YF7jvpTuMExb0fLImJz3xwWXjHqKprrXjvA4WJKV3FrzMB3H/7tf2Ig0HAPE9Tpa
rF2YSUA=
=MH0R
-----END PGP SIGNATURE-----
Merge tag 'sound-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of a few more small fixes for HD- and USB-audio,
including a regression fix for the OOB fix that was included
in the previous pull request"
* tag 'sound-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: ALC269 fixup for Lenovo Yoga Book 9i 13IRU8 audio
ALSA: hda/realtek: Add quirk for Samsung 730QED to fix headphone
ALSA: usb-audio: Use the right limit for PCM OOB check
ALSA: usb-audio: Fix use-after-free in snd_usb_mixer_free()
ALSA: hda/realtek: Fix headset mic for TongFang X6AR55xU
ALSA: ctxfi: Fix potential OOB access in audio mixer handling
selftests: ALSA: Remove unused variable in utimer-test
ALSA: usb-audio: Add delay quirk for MOONDROP Moonriver2 Ti
ALSA: scarlett2: Fix buffer overflow in config retrieval
ALSA: usb: Increase volume range that triggers a warning
Some distributions (such as Ubuntu) configure GCC so that
_FORTIFY_SOURCE is automatically enabled at -O1 or above. This results
in some fortified version of definitions of standard library functions
are included. While linker resolves the symbols, the fortified versions
might override the definitions in lib/string_override.c and reference to
those PLT entries in GLIBC. This is not a problem for the code in host,
but it is a disaster for the guest code. E.g., if build and run
x86/nested_emulation_test on Ubuntu 24.04 will encounter a L1 #PF due to
memset() reference to __memset_chk@plt.
The option -fno-builtin-memset is not helpful here, because those
fortified versions are not built-in but some definitions which are
included by header, they are for different intentions.
In order to eliminate the unpredictable behaviors may vary depending on
the linker and platform, add the "-U_FORTIFY_SOURCE" into CFLAGS to
prevent from introducing the fortified definitions.
Signed-off-by: Zhiquan Li <zhiquan_li@163.com>
Link: https://patch.msgid.link/20260122053551.548229-1-zhiquan_li@163.com
Fixes: 6b6f71484b ("KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use")
Cc: stable@vger.kernel.org
[sean: tag for stable]
Signed-off-by: Sean Christopherson <seanjc@google.com>
* kvm-arm64/feat_idst:
: .
: Add support for FEAT_IDST, allowing ID registers that are not implemented
: to be reported as a normal trap rather than as an UNDEF exception.
: .
KVM: arm64: selftests: Add a test for FEAT_IDST
KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
KVM: arm64: Add a generic synchronous exception injection primitive
KVM: arm64: Add trap routing for GMID_EL1
arm64: Repaint ID_AA64MMFR2_EL1.IDS description
Signed-off-by: Marc Zyngier <maz@kernel.org>
Enable flexible thread-to-queue mapping in batch I/O mode to support
arbitrary combinations of threads and queues, improving resource
utilization and scalability.
Key improvements:
- Support N:M thread-to-queue mapping (previously limited to 1:1)
- Dynamic buffer allocation based on actual queue assignment per thread
- Thread-safe queue preparation with spinlock protection
- Intelligent buffer index calculation for multi-queue scenarios
- Enhanced validation for thread/queue combination constraints
Implementation details:
- Add q_thread_map matrix to track queue-to-thread assignments
- Dynamic allocation of commit and fetch buffers per thread
- Round-robin queue assignment algorithm for load balancing
- Per-queue spinlock to prevent race conditions during prep
- Updated buffer index calculation using queue position within thread
This enables efficient configurations like:
- Any other N:M combinations for optimal resource matching
Testing:
- Added test_batch_02.sh: 4 threads vs 1 queue
- Added test_batch_03.sh: 1 thread vs 4 queues
- Validates correctness across different mapping scenarios
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add --batch/-b for enabling F_BATCH_IO.
Add batch_01 for covering its basic function.
Add stress_08 and stress_09 for covering stress test.
Add recovery test for F_BATCH_IO in generic_04 and generic_05.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
More tests need to be covered in existing generic tests, and default
45sec isn't enough, and timeout is often triggered, increase timeout
by adding setting file.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add support for UBLK_U_IO_FETCH_IO_CMDS to enable efficient batch
fetching of I/O commands using multishot io_uring operations.
Key improvements:
- Implement multishot UBLK_U_IO_FETCH_IO_CMDS for continuous command fetching
- Add fetch buffer management with page-aligned, mlocked buffers
- Process fetched I/O command tags from kernel-provided buffers
- Integrate fetch operations with existing batch I/O infrastructure
- Significantly reduce uring_cmd issuing overhead through batching
The implementation uses two fetch buffers per thread with automatic
requeuing to maintain continuous I/O command flow. Each fetch operation
retrieves multiple command tags in a single syscall, dramatically
improving performance compared to individual command fetching.
Technical details:
- Fetch buffers are page-aligned and mlocked for optimal performance
- Uses IORING_URING_CMD_MULTISHOT for continuous operation
- Automatic buffer management and requeuing on completion
- Enhanced CQE handling for fetch command completions
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement UBLK_U_IO_COMMIT_IO_CMDS to enable efficient batched
completion of I/O operations in the batch I/O framework.
This completes the batch I/O infrastructure by adding the commit
phase that notifies the kernel about completed I/O operations:
Key features:
- Batch multiple I/O completions into single UBLK_U_IO_COMMIT_IO_CMDS
- Dynamic commit buffer allocation and management per thread
- Automatic commit buffer preparation before processing events
- Commit buffer submission after processing completed I/Os
- Integration with existing completion workflows
Implementation details:
- ublk_batch_prep_commit() allocates and initializes commit buffers
- ublk_batch_complete_io() adds completed I/Os to current batch
- ublk_batch_commit_io_cmds() submits batched completions to kernel
- Modified ublk_process_io() to handle batch commit lifecycle
- Enhanced ublk_complete_io() to route to batch or legacy completion
The commit buffer stores completion information (tag, result, buffer
details) for multiple I/Os, then submits them all at once, significantly
reducing syscall overhead compared to individual I/O completions.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement support for UBLK_U_IO_PREP_IO_CMDS in the batch I/O framework:
- Add batch command initialization and setup functions
- Implement prep command queueing with proper buffer management
- Add command completion handling for prep and commit commands
- Integrate batch I/O setup into thread initialization
- Update CQE handling to support batch commands
The implementation uses the previously established buffer management
infrastructure to queue UBLK_U_IO_PREP_IO_CMDS commands. Commands are
prepared in the first thread context and use commit buffers for
efficient command batching.
Key changes:
- ublk_batch_queue_prep_io_cmds() prepares I/O command batches
- ublk_batch_compl_cmd() handles batch command completions
- Modified thread setup to use batch operations when enabled
- Enhanced buffer index calculation for batch mode
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add the foundational infrastructure for UBLK_F_BATCH_IO buffer
management including:
- Allocator utility functions for small sized per-thread allocation
- Batch buffer allocation and deallocation functions
- Buffer index management for commit buffers
- Thread state management for batch I/O mode
- Buffer size calculation based on device features
This prepares the groundwork for handling batch I/O commands by
establishing the buffer management layer needed for UBLK_U_IO_PREP_IO_CMDS
and UBLK_U_IO_COMMIT_IO_CMDS operations.
The allocator uses CPU sets for efficient per-thread buffer tracking,
and commit buffers are pre-allocated with 2 buffers per thread to handle
overlapping command operations.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since UBLK_F_PER_IO_DAEMON is added, io buffer index may depend on current
thread because the common way is to use per-pthread io_ring_ctx for issuing
ublk uring_cmd.
Add one helper for returning io buffer index, so we can hide the buffer
index implementation details for target code.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace assert() with ublk_assert() since it is often triggered in daemon,
and we may get nothing shown in terminal.
Add ublk_assert(), so we can log something to syslog when assert() is
triggered.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The build_user_data() function packs multiple fields into a __u64
value using bit shifts. Without explicit __u64 casts before shifting,
the shift operations are performed on 32-bit unsigned integers before
being promoted to 64-bit, causing data loss.
Specifically, when tgt_data >= 256, the expression (tgt_data << 24)
shifts on a 32-bit value, truncating the upper 8 bits before promotion
to __u64. Since tgt_data can be up to 16 bits (assertion allows up to
65535), values >= 256 would have their high byte lost.
Add explicit __u64 casts to both op and tgt_data before shifting to
ensure the shift operations happen in 64-bit space, preserving all
bits of the input values.
user_data_to_tgt_data() is only used by stripe.c, in which the max
supported member disks are 4, so won't trigger this issue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
RFC 4884 extended certain ICMP messages with a length attribute that
encodes the length of the "original datagram" field. This is needed so
that new information could be appended to these messages without
applications thinking that it is part of the "original datagram" field.
In version 5.9, the kernel was extended with two new socket options
(SOL_IP/IP_RECVERR_4884 and SOL_IPV6/IPV6_RECVERR_RFC4884) that allow
user space to retrieve this length which is basically the offset to the
ICMP Extension Structure at the end of the ICMP message. This is
required by user space applications that need to parse the information
contained in the ICMP Extension Structure. For example, the RFC 5837
extension for tracepath.
Add a selftest that verifies correct handling of the RFC 4884 length
field for both IPv4 and IPv6, with and without extension structures,
and validates that malformed extensions are correctly reported as invalid.
For each address family, the test creates:
- a raw socket used to send locally crafted ICMP error packets to the
loopback address, and
- a datagram socket used to receive the encapsulated original datagram
and associated error metadata from the kernel error queue.
ICMP packets are constructed entirely in user space rather than relying
on kernel-generated errors. This allows the test to exercise invalid
scenarios (such as corrupted checksums and incorrect length fields) and
verify that the SO_EE_RFC4884_FLAG_INVALID flag is set as expected.
Output Example:
$ ./icmp_rfc4884
Starting 18 tests from 18 test cases.
RUN rfc4884.ipv4_ext_small_payload.rfc4884 ...
OK rfc4884.ipv4_ext_small_payload.rfc4884
ok 1 rfc4884.ipv4_ext_small_payload.rfc4884
RUN rfc4884.ipv4_ext.rfc4884 ...
OK rfc4884.ipv4_ext.rfc4884
ok 2 rfc4884.ipv4_ext.rfc4884
RUN rfc4884.ipv4_ext_large_payload.rfc4884 ...
OK rfc4884.ipv4_ext_large_payload.rfc4884
ok 3 rfc4884.ipv4_ext_large_payload.rfc4884
RUN rfc4884.ipv4_no_ext_small_payload.rfc4884 ...
OK rfc4884.ipv4_no_ext_small_payload.rfc4884
ok 4 rfc4884.ipv4_no_ext_small_payload.rfc4884
RUN rfc4884.ipv4_no_ext_min_payload.rfc4884 ...
OK rfc4884.ipv4_no_ext_min_payload.rfc4884
ok 5 rfc4884.ipv4_no_ext_min_payload.rfc4884
RUN rfc4884.ipv4_no_ext_large_payload.rfc4884 ...
OK rfc4884.ipv4_no_ext_large_payload.rfc4884
ok 6 rfc4884.ipv4_no_ext_large_payload.rfc4884
RUN rfc4884.ipv4_invalid_ext_checksum.rfc4884 ...
OK rfc4884.ipv4_invalid_ext_checksum.rfc4884
ok 7 rfc4884.ipv4_invalid_ext_checksum.rfc4884
RUN rfc4884.ipv4_invalid_ext_length_small.rfc4884 ...
OK rfc4884.ipv4_invalid_ext_length_small.rfc4884
ok 8 rfc4884.ipv4_invalid_ext_length_small.rfc4884
RUN rfc4884.ipv4_invalid_ext_length_large.rfc4884 ...
OK rfc4884.ipv4_invalid_ext_length_large.rfc4884
ok 9 rfc4884.ipv4_invalid_ext_length_large.rfc4884
RUN rfc4884.ipv6_ext_small_payload.rfc4884 ...
OK rfc4884.ipv6_ext_small_payload.rfc4884
ok 10 rfc4884.ipv6_ext_small_payload.rfc4884
RUN rfc4884.ipv6_ext.rfc4884 ...
OK rfc4884.ipv6_ext.rfc4884
ok 11 rfc4884.ipv6_ext.rfc4884
RUN rfc4884.ipv6_ext_large_payload.rfc4884 ...
OK rfc4884.ipv6_ext_large_payload.rfc4884
ok 12 rfc4884.ipv6_ext_large_payload.rfc4884
RUN rfc4884.ipv6_no_ext_small_payload.rfc4884 ...
OK rfc4884.ipv6_no_ext_small_payload.rfc4884
ok 13 rfc4884.ipv6_no_ext_small_payload.rfc4884
RUN rfc4884.ipv6_no_ext_min_payload.rfc4884 ...
OK rfc4884.ipv6_no_ext_min_payload.rfc4884
ok 14 rfc4884.ipv6_no_ext_min_payload.rfc4884
RUN rfc4884.ipv6_no_ext_large_payload.rfc4884 ...
OK rfc4884.ipv6_no_ext_large_payload.rfc4884
ok 15 rfc4884.ipv6_no_ext_large_payload.rfc4884
RUN rfc4884.ipv6_invalid_ext_checksum.rfc4884 ...
OK rfc4884.ipv6_invalid_ext_checksum.rfc4884
ok 16 rfc4884.ipv6_invalid_ext_checksum.rfc4884
RUN rfc4884.ipv6_invalid_ext_length_small.rfc4884 ...
OK rfc4884.ipv6_invalid_ext_length_small.rfc4884
ok 17 rfc4884.ipv6_invalid_ext_length_small.rfc4884
RUN rfc4884.ipv6_invalid_ext_length_large.rfc4884 ...
OK rfc4884.ipv6_invalid_ext_length_large.rfc4884
ok 18 rfc4884.ipv6_invalid_ext_length_large.rfc4884
PASSED: 18 / 18 tests passed.
Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260121114644.2863640-1-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Restricted CXL Host (RCH) protocol error handling uses a procedure distinct
from the CXL Virtual Hierarchy (VH) handling. This is because of the
differences in the RCH and VH topologies. Improve the maintainability and
add ability to enable/disable RCH handling.
Move and combine the RCH handling code into a single block conditionally
compiled with the CONFIG_CXL_RCH_RAS kernel config.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260114182055.46029-9-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Create new config CONFIG_CXL_RAS and put all CXL RAS items behind the
config. The config will depend on CPER and PCIE AER to build. Move the
related VH RAS code from core/pci.c to core/ras.c.
Restricted CXL host (RCH) RAS functions will be moved in a future patch.
Cc: Robert Richter <rrichter@amd.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260114182055.46029-8-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Pretty big PR, but hard to make up any cohesive story that would
explain it, a random collection of fixes. The two reverts of bad
patches from this release here feel like stuff that'd normally
show up by rc5 or rc6. Perhaps obvious thing to say, given the MW
timing.
That said, no active investigations / regressions. Let's see what
the next week brings.
Current release - fix to a fix:
- can: alloc_candev_mqs(): add missing default CAN capabilities
Current release - regressions:
- usbnet: fix crash due to missing BQL accounting after resume
- Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...
Previous releases - regressions:
- Revert "nfc/nci: Add the inconsistency check between the input ...
Previous releases - always broken:
- number of driver fixes for incorrect use of seqlocks on stats
- rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
when MSG_PEEK was set
- ipvlan: make the addrs_lock be per port avoid races in the port
hash table
- sched: enforce that teql can only be used as root qdisc
- virtio: coalesce only linear skb
- wifi: ath12k: fix dead lock while flushing management frames
- eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmlyVNQACgkQMUZtbf5S
IrvoMQ/+N+KikpNLtHFpkqKAsTnNC6gR9yG+NYvUck1Oo+CQQpc/wJ78pS2cyPpM
siMa7lLzRuCKPYni3RVpG9nqG/jv9HKvFXpgpSb71Evs6syO85kI8Dy4z8G4mClk
A4QFH6/8V4uZpH/ewy9DTIjxdWPEJA6CuXXnmolTEBhrNkgskv0Sz3jFVTxDjFzN
xTMk6H2PhgANUsYFree9PQ+gR+QAHKHPvjobWCsepYRHlT4YVi8PrmqoYBvycLwa
Q//xy6PUOSXZM4sa6YSUMZk2Nou/84BvqouQ05ljaLSZDl4Ecfxl5MDkh/kzZHjR
wPbMhiZUrJngwwyXQWcSnir1M17IPKjC/FAEDFUFVWDT/wYYHODt9npCRpdglsa5
SGoy0yDgUN5Lq9Q5dPL0PMLHVLkuDdlzmyva4uso/dQYovXgTi/5Kx2bPc6LLqMJ
CP5QxjN5OsOHKnGdI/UO28OU5Yn2KRCmJiW9nW3XdxR316lkmo6BWHAxWG9XhbJE
aCBXu3EoqNSDVuWaZFVQ8g2uTN3uOhnuJqBllhahtENbdqjsat0cxcjYTuUDvuMm
B1W0My5qq7O6TRjlFl6JfC3k2gqbZ+HlLg3LfmC74bmddOL52up+9HKyeRoWiIaA
9BQ3DKpjD3VFEbsbBVPfoZ4Ch4jrB+G+ck8f7g/l9BCNM+Ji1Ds=
=Ehzr
-----END PGP SIGNATURE-----
Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and wireless.
Pretty big, but hard to make up any cohesive story that would explain
it, a random collection of fixes. The two reverts of bad patches from
this release here feel like stuff that'd normally show up by rc5 or
rc6. Perhaps obvious thing to say, given the holiday timing.
That said, no active investigations / regressions. Let's see what the
next week brings.
Current release - fix to a fix:
- can: alloc_candev_mqs(): add missing default CAN capabilities
Current release - regressions:
- usbnet: fix crash due to missing BQL accounting after resume
- Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...
Previous releases - regressions:
- Revert "nfc/nci: Add the inconsistency check between the input ...
Previous releases - always broken:
- number of driver fixes for incorrect use of seqlocks on stats
- rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue
when MSG_PEEK was set
- ipvlan: make the addrs_lock be per port avoid races in the port
hash table
- sched: enforce that teql can only be used as root qdisc
- virtio: coalesce only linear skb
- wifi: ath12k: fix dead lock while flushing management frames
- eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"
* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
Octeontx2-af: Add proper checks for fwdata
dpll: Prevent duplicate registrations
net/sched: act_ife: avoid possible NULL deref
hinic3: Fix netif_queue_set_napi queue_index input parameter error
vsock/test: add stream TX credit bounds test
vsock/virtio: cap TX credit to local buffer size
vsock/test: fix seqpacket message bounds test
vsock/virtio: fix potential underflow in virtio_transport_get_credit()
net: fec: account for VLAN header in frame length calculations
net: openvswitch: fix data race in ovs_vport_get_upcall_stats
octeontx2-af: Fix error handling
net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
rxrpc: Fix data-race warning and potential load/store tearing
net: dsa: fix off-by-one in maximum bridge ID determination
net: bcmasp: Fix network filter wake for asp-3.0
bonding: provide a net pointer to __skb_flow_dissect()
selftests: net: amt: wait longer for connection before sending packets
be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
netrom: fix double-free in nr_route_frame()
...
Add a regression test for the TX credit bounds fix. The test verifies
that a sender with a small local buffer size cannot queue excessive
data even when the peer advertises a large receive buffer.
The client:
- Sets a small buffer size (64 KiB)
- Connects to server (which advertises 2 MiB buffer)
- Sends in non-blocking mode until EAGAIN
- Verifies total queued data is bounded
This guards against the original vulnerability where a remote peer
could cause unbounded kernel memory allocation by advertising a large
buffer and reading slowly.
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use sock_buf_size to check the bytes sent + small fixes]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-5-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test requires the sender (client) to send all messages before waking
up the receiver (server).
Since virtio-vsock had a bug and did not respect the size of the TX
buffer, this test worked, but now that we are going to fix the bug, the
test hangs because the sender would fill the TX buffer before waking up
the receiver.
Set the buffer size in the sender (client) as well, as we already do for
the receiver (server).
Fixes: 5c338112e4 ("test/vsock: rework message bounds test")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260121093628.9941-3-sgarzare@redhat.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add tests for FEAT_LS64. Issue related instructions if feature
presents, no SIGILL should be received. When such instructions
operate on Device memory or non-cacheable memory, we may received
a SIGBUS during the test (w/o FEAT_LS64WB). Just ignore it since
we only tested whether the instruction itself can be issued as
expected on platforms declaring the support of such features.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
The my_syscall*() macros are internal implementation details of nolibc.
Nolibc also provides the regular syscall(2), which is also a macro
and directly expands to the correct my_syscall().
Use syscall() instead.
As a side-effect this fixes some return value checks, as my_syscall()
returns the raw value as set by the kernel and does not set errno.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Provide an initial test case to evaluate the functionality. This needs to be
extended to cover the ABI violations and expose the race condition between
observing granted and arriving in rseq_slice_yield().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155709.320325431@linutronix.de
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.
However, tests sometimes fail because packets are sent before the
connection is fully established.
Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.
Fixes: c08e8baea7 ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20260120133930.863845-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce a new netconsole selftest to validate that netconsole is able
to resume a deactivated target when the low level interface comes back.
The test setups the network using netdevsim, creates a netconsole target
and then remove/add netdevsim in order to bring the same interfaces
back. Afterwards, the test validates that the target works as expected.
Targets are created via cmdline parameters to the module to ensure that
we are able to resume targets that were bound by mac and interface name.
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260118-netcons-retrigger-v11-7-4de36aebcf48@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When wq__attach() fails, serial_test_wq() returns early without calling
wq__destroy(), leaking the skeleton resources allocated by
wq__open_and_load(). This causes ASAN leak reports in selftests runs.
Fix this by jumping to a common clean_up label that calls wq__destroy()
on all exit paths after successful open_and_load.
Note that the early return after wq__open_and_load() failure is correct
and doesn't need fixing, since that function returns NULL on failure
(after internally cleaning up any partial allocations).
Fixes: 8290dba519 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
Fix the typo "untill" → "until" in a comment in pidfd_info_test.c.
This typo is already listed in scripts/spelling.txt by commit
66b47b4a9d ("checkpatch: look for common misspellings").
Link: https://lore.kernel.org/r/20260121094147.4187337-1-chenziyu@uniontech.com
Suggested-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Ziyu Chen <chenziyu@uniontech.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Test bpf_get_func_arg() and bpf_get_func_arg_cnt() for tp_btf. The code
is most copied from test1 and test2.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260121044348.113201-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
when running kublk in foreground mode. Without this, _evtfd is
zero-initialized to 0 (stdin), and ublk_send_dev_event() writes
binary data to stdin which appears as garbage on the terminal.
Also fix debug message format string.
Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix error handling in ublk_start_daemon() when start_dev fails:
1. Call ublk_ctrl_stop_dev() to cancel inflight uring_cmd before
cleanup. Without this, the device deletion may hang waiting for
I/O completion that will never happen.
2. Add fail_start label so that pthread_join() is called on the
error path. This ensures proper thread cleanup when startup fails.
Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Include cmd_inflight in ublk_thread_is_done() check. Without this,
the thread may exit before all FETCH commands are completed, which
may cause device deletion to hang.
Fixes: 6aecda00b7 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add the testcase for the jited inline of bpf_get_current_task().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260120070555.233486-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The softlockup_panic sysctl is currently a binary option: panic
immediately or never panic on soft lockups.
Panicking on any soft lockup, regardless of duration, can be overly
aggressive for brief stalls that may be caused by legitimate operations.
Conversely, never panicking may allow severe system hangs to persist
undetected.
Extend softlockup_panic to accept an integer threshold, allowing the
kernel to panic only when the normalized lockup duration exceeds N
watchdog threshold periods. This provides finer-grained control to
distinguish between transient delays and persistent system failures.
The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic when duration >= 1 * threshold (20s default, original behavior)
- N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)
The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility while allowing systems to tolerate brief lockups
while still catching severe, persistent hangs.
[lirongqing@baidu.com: v2]
Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com
Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The test supports arm64 as well so the comment is incorrect. And there's
a check for arm64 in va_high_addr_switch.c.
Link: https://lkml.kernel.org/r/20251221040025.3159990-5-chuhu@redhat.com
Fixes: 983e760bcd ("selftest/mm: va_high_addr_switch: add ppc64 support check")
Fixes: f556acc2fa ("selftests/mm: skip test for non-LPA2 and non-LVA systems")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the first test failed, and the hugetlb test passed, the result would
be pass, but we expect a fail. Fix this issue by returning fail if either
is not KSFT_PASS.
Link: https://lkml.kernel.org/r/20251221040025.3159990-4-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arm64 and x86_64 has the same nr_hugepages requriement for running the
va_high_addr_switch test. Since commit d9d957bd7b ("selftests/mm: alloc
hugepages in va_high_addr_switch test"), the setup can be done in
va_high_addr_switch.sh. So remove the duplicated setup.
Link: https://lkml.kernel.org/r/20251221040025.3159990-3-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The va_high_addr_switch test requires 6 hugepages, not 5. If running the
test directly by: ./va_high_addr_switch.sh, the test will hit a mmap 'FAIL'
caused by not enough hugepages:
mmap(addr_switch_hint - hugepagesize, 2*hugepagesize, MAP_HUGETLB): 0x7f330f800000 - OK
mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
The failure can't be hit if run the tests by running 'run_vmtests.sh -t
hugevm' because the nr_hugepages is set to 128 at the beginning of
run_vmtests.sh and va_high_addr_switch.sh skip the setup of nr_hugepages
because already enough.
Link: https://lkml.kernel.org/r/20251221040025.3159990-2-chuhu@redhat.com
Fixes: d9d957bd7b ("selftests/mm: alloc hugepages in va_high_addr_switch test")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix va_high_addr_switch.sh test failure - again", v2.
The series address several issues exist for the va_high_addr_switch test:
1) the test return value is ignored in va_high_addr_switch.sh.
2) the va_high_addr_switch test requires 6 hugepages not 5.
3) the reurn value of the first test in va_high_addr_switch.c can be
overridden by the second test.
4) the nr_hugepages setup in run_vmtests.sh for arm64 can be done in
va_high_addr_switch.sh too.
5) update a comment for check_test_requirements.
This patch: (of 5)
The return value should be return value of va_high_addr_switch, otherwise
a test failure would be silently ignored.
Link: https://lkml.kernel.org/r/20251221040025.3159990-1-chuhu@redhat.com
Fixes: d9d957bd7b ("selftests/mm: alloc hugepages in va_high_addr_switch test")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were
unbounded and could hang forever if the expected cgroup file value never
appears (e.g. due to write_to_hugetlbfs in Error mapping).
=== Error log ===
# uname -r
6.12.0-xxx.el10.aarch64+64k
# ls /sys/kernel/mm/hugepages/hugepages-*
hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/
#./charge_reserved_hugetlb.sh -cgroup-v2
# -----------------------------------------
...
# nr hugepages = 10
# writing cgroup limit: 5368709120
# writing reseravation limit: 5368709120
...
# write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
...
Introduce a small helper, wait_for_file_value(), and use it for:
- waiting for reservation usage to drop to 0,
- waiting for reservation usage to reach a given size,
- waiting for fault usage to reach a given size.
This makes the waits consistent and adds a hard timeout (60 tries with 1s
sleep) so the test fails instead of stalling indefinitely.
Link: https://lkml.kernel.org/r/20251221122639.3168038-4-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
charge_reserved_hugetlb.sh mounts a hugetlbfs instance at /mnt/huge with a
fixed size of 256M. On systems with large base hugepages (e.g. 512MB),
this is smaller than a single hugepage, so the hugetlbfs mount ends up
with zero capacity (often visible as size=0 in mount output).
As a result, write_to_hugetlbfs fails with ENOMEM and the test can hang
waiting for progress.
=== Error log ===
# uname -r
6.12.0-xxx.el10.aarch64+64k
#./charge_reserved_hugetlb.sh -cgroup-v2
# -----------------------------------------
...
# nr hugepages = 10
# writing cgroup limit: 5368709120
# writing reseravation limit: 5368709120
...
# write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
# Waiting for hugetlb memory reservation to reach size 2684354560.
# 0
...
# mount |grep /mnt/huge
none on /mnt/huge type hugetlbfs (rw,relatime,seclabel,pagesize=512M,size=0)
# grep -i huge /proc/meminfo
...
HugePages_Total: 10
HugePages_Free: 10
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 524288 kB
Hugetlb: 5242880 kB
Drop the mount args with 'size=256M', so the filesystem capacity is sufficient
regardless of HugeTLB page size.
Link: https://lkml.kernel.org/r/20251221122639.3168038-3-liwang@redhat.com
Fixes: 29750f71a9 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: hugetlb cgroup charging: robustness fixes", v3.
This series fixes a few issues in the hugetlb cgroup charging selftests
(write_to_hugetlbfs.c + charge_reserved_hugetlb.sh) that show up on
systems with large hugepages (e.g. 512MB) and when failures cause the
test to wait indefinitely.
On an aarch64 64k page kernel with 512MB hugepages, the test consistently
fails in write_to_hugetlbfs with ENOMEM and then hangs waiting for the
expected usage values. The root cause is that charge_reserved_hugetlb.sh
mounts hugetlbfs with a fixed size=256M, which is smaller than a single
hugepage, resulting in a mount with size=0 capacity.
In addition, write_to_hugetlbfs previously parsed -s via atoi() into an
int, which can overflow and print negative sizes.
Reproducer / environment:
- Kernel: 6.12.0-xxx.el10.aarch64+64k
- Hugepagesize: 524288 kB (512MB)
- ./charge_reserved_hugetlb.sh -cgroup-v2
- Observed mount: pagesize=512M,size=0 before this series
After applying the series, the test completes successfully on the above
setup.
This patch (of 3):
write_to_hugetlbfs currently parses the -s size argument with atoi() into
an int. This silently accepts malformed input, cannot report overflow,
and can truncate large sizes.
=== Error log ===
# uname -r
6.12.0-xxx.el10.aarch64+64k
# ls /sys/kernel/mm/hugepages/hugepages-*
hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/
#./charge_reserved_hugetlb.sh -cgroup-v2
# -----------------------------------------
...
# nr hugepages = 10
# writing cgroup limit: 5368709120
# writing reseravation limit: 5368709120
...
# Writing to this path: /mnt/huge/test
# Writing this size: -1610612736 <--------
Switch the size variable to size_t and parse -s with sscanf("%zu", ...).
Also print the size using %zu.
This avoids incorrect behavior with large -s values and makes the utility
more robust.
Link: https://lkml.kernel.org/r/20251221122639.3168038-1-liwang@redhat.com
Link: https://lkml.kernel.org/r/20251221122639.3168038-2-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If PAGE_SIZE is larger than 4k and if you have a system with a large
number of CPUs, this test can require a very large amount of memory
leading to oom-killer firing. Given the type of allocation, the kernel
won't have anything to kill, causing the system to stall.
Add a parameter to the test_vmalloc driver to represent the number of
times a percpu object will be allocated. Calculate this in
test_vmalloc.sh to be 90% of available memory or the current default of
35000, whichever is smaller.
Link: https://lkml.kernel.org/r/20251201181848.1216197-1-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit ca9d74eb5f ("uapi: add INT_MAX and INT_MIN constants")
recently removed some includes of limits.h in uAPI headers.
ncdevmem.c was depending on them:
ncdevmem.c: In function ‘ethtool_add_flow’:
ncdevmem.c:369:60: error: ‘INT_MAX’ undeclared (first use in this function)
369 | if (endptr == id_start || flow_id < 0 || flow_id > INT_MAX)
| ^~~~~~~
ncdevmem.c:77:1: note: ‘INT_MAX’ is defined in header ‘<limits.h>’; did you forget to ‘#include <limits.h>’?
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20260120180319.1673271-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend HW timestamp tests to check that ioctl interface is not broken
and configuration setups and requests are equal to netlink interface.
Some linter warnings are disabled because of ctypes classes.
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260116062121.1230184-2-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov says:
====================
Add support for providers with large rx buffer
Many modern NICs support configurable receive buffer lengths, and zcrx and
memory providers can use buffers larger than 4K to improve performance.
When paired with hw-gro larger rx buffer sizes can drastically reduce
the number of buffers traversing the stack and save a lot of processing
time. It also allows to give to users larger contiguous chunks of data.
Single stream benchmarks showed up to ~30% CPU util improvement.
E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC:
packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU %usr %nice %sys %iowait %irq %soft %idle
0 1.53 0.00 27.78 2.72 1.31 66.45 0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU %usr %nice %sys %iowait %irq %soft %idle
0 0.69 0.00 8.26 31.65 1.83 57.00 0.57
This series adds net infrastructure for memory providers configuring
the size and implements it for bnxt. It's an opt-in feature for drivers,
they should advertise support for the parameter in the qops and must check
if the hardware supports the given size. It's limited to memory providers
as it drastically simplifies implementation. It doesn't affect the fast
path zcrx uAPI, and the user exposed parameter is defined in zcrx terms,
which allows it to be flexible and adjusted in the future.
A liburing example can be found at [2]
full branch:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v8
Liburing example:
[2] https://github.com/isilence/liburing.git zcrx/rx-buf-len
* tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux:
io_uring/zcrx: document area chunking parameter
selftests: iou-zcrx: test large chunk sizes
eth: bnxt: support qcfg provided rx page size
eth: bnxt: adjust the fill level of agg queues with larger buffers
eth: bnxt: store rx buffer size per queue
net: pass queue rx page size from memory provider
net: add bare bone queue configs
net: reduce indent of struct netdev_queue_mgmt_ops members
net: memzero mp params when closing a queue
====================
Link: https://patch.msgid.link/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 77b9c4a438, reversing
changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497:
931420a2fc ("selftests/net: Add netkit container tests")
ab771c938d ("selftests/net: Make NetDrvContEnv support queue leasing")
6be87fbb27 ("selftests/net: Add env for container based tests")
61d99ce3df ("selftests/net: Add bpf skb forwarding program")
920da36341 ("netkit: Add xsk support for af_xdp applications")
eef51113f8 ("netkit: Add netkit notifier to check for unregistering devices")
b5ef109d22 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
b5c3fa4a0b ("netkit: Add single device mode for netkit")
0073d2fd67 ("xsk: Proxy pool management for leased queues")
1ecea95dd3 ("xsk: Extend xsk_rcv_check validation")
804bf334d0 ("net: Proxy netdev_queue_get_dma_dev for leased queues")
0caa9a8dde ("net: Proxy net_mp_{open,close}_rxq for leased queues")
ff8889ff91 ("net, ethtool: Disallow leased real rxqs to be resized")
9e2103f361 ("net: Add lease info to queue-get response")
31127dedde ("net: Implement netdev_nl_queue_create_doit")
a5546e18f7 ("net: Add queue-create operation")
The series will conflict with io_uring work, and the code needs more
polish.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the verifier test for default trusted pointer semantics, which
previously relied on BPF kfunc bpf_get_root_mem_cgroup(), with a new
test utilizing dedicated BPF kfuncs defined within the bpf_testmod.
bpf_get_root_mem_cgroup() was modified such that it again relies on
KF_ACQUIRE semantics, therefore no longer making it a suitable
candidate to test BPF verifier default trusted pointer semantics
against.
Link: https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20260120091630.3420452-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now BPF_DIV has range tracking support via interval analysis. This patch
adds selftests to cover various cases of BPF_DIV and BPF_MOD operations
when the divisor is a constant, also covering both signed and unsigned variants.
This patch includes several types of tests in 32-bit and 64-bit variants:
1. For UDIV
- positive divisor
- zero divisor
2. For SDIV
- positive divisor, positive dividend
- positive divisor, negative dividend
- positive divisor, mixed sign dividend
- negative divisor, positive dividend
- negative divisor, negative dividend
- negative divisor, mixed sign dividend
- zero divisor
- overflow (SIGNED_MIN/-1), normal dividend
- overflow (SIGNED_MIN/-1), constant dividend
3. For UMOD
- positive divisor
- positive divisor, small dividend
- zero divisor
4. For SMOD
- positive divisor, positive dividend
- positive divisor, negative dividend
- positive divisor, mixed sign dividend
- positive divisor, mixed sign dividend, small dividend
- negative divisor, positive dividend
- negative divisor, negative dividend
- negative divisor, mixed sign dividend
- negative divisor, mixed sign dividend, small dividend
- zero divisor
- overflow (SIGNED_MIN/-1), normal dividend
- overflow (SIGNED_MIN/-1), constant dividend
Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely analyze the result of BPF_DIV/BPF_MOD
instruction, it can prune the path that leads to an error (here we use
invalid memory access as the error case), allowing the program to pass
verification.
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Link: https://lore.kernel.org/r/20260119085458.182221-3-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch implements range tracking (interval analysis) for BPF_DIV and
BPF_MOD operations when the divisor is a constant, covering both signed
and unsigned variants.
While LLVM typically optimizes integer division and modulo by constants
into multiplication and shift sequences, this optimization is less
effective for the BPF target when dealing with 64-bit arithmetic.
Currently, the verifier does not track bounds for scalar division or
modulo, treating the result as "unbounded". This leads to false positive
rejections for safe code patterns.
For example, the following code (compiled with -O2):
```c
int test(struct pt_regs *ctx) {
char buffer[6] = {1};
__u64 x = bpf_ktime_get_ns();
__u64 res = x % sizeof(buffer);
char value = buffer[res];
bpf_printk("res = %llu, val = %d", res, value);
return 0;
}
```
Generates a raw `BPF_MOD64` instruction:
```asm
; __u64 res = x % sizeof(buffer);
1: 97 00 00 00 06 00 00 00 r0 %= 0x6
; char value = buffer[res];
2: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
4: 0f 01 00 00 00 00 00 00 r1 += r0
5: 91 14 00 00 00 00 00 00 r4 = *(s8 *)(r1 + 0x0)
```
Without this patch, the verifier fails with "math between map_value
pointer and register with unbounded min value is not allowed" because
it cannot deduce that `r0` is within [0, 5].
According to the BPF instruction set[1], the instruction's offset field
(`insn->off`) is used to distinguish between signed (`off == 1`) and
unsigned division (`off == 0`). Moreover, we also follow the BPF division
and modulo runtime behavior (semantics) to handle special cases, such as
division by zero and signed division overflow.
- UDIV: dst = (src != 0) ? (dst / src) : 0
- SDIV: dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst / src))
- UMOD: dst = (src != 0) ? (dst % src) : dst
- SMOD: dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0: (dst s% src))
Here is the overview of the changes made in this patch (See the code comments
for more details and examples):
1. For BPF_DIV: Firstly check whether the divisor is zero. If so, set the
destination register to zero (matching runtime behavior).
For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)div` functions.
- General cases: compute the new range by dividing max_dividend and
min_dividend by the constant divisor.
- Overflow case (SIGNED_MIN / -1) in signed division: mark the result
as unbounded if the dividend is not a single number.
2. For BPF_MOD: Firstly check whether the divisor is zero. If so, leave the
destination register unchanged (matching runtime behavior).
For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)mod` functions.
- General case: For signed modulo, the result's sign matches the
dividend's sign. And the result's absolute value is strictly bounded
by `min(abs(dividend), abs(divisor) - 1)`.
- Special care is taken when the divisor is SIGNED_MIN. By casting
to unsigned before negation and subtracting 1, we avoid signed
overflow and correctly calculate the maximum possible magnitude
(`res_max_abs` in the code).
- "Small dividend" case: If the dividend is already within the possible
result range (e.g., [-2, 5] % 10), the operation is an identity
function, and the destination register remains unchanged.
3. In `scalar(32)?_min_max_(u|s)(div|mod)` functions: After updating current
range, reset other ranges and tnum to unbounded/unknown.
e.g., in `scalar_min_max_sdiv`, signed 64-bit range is updated. Then reset
unsigned 64-bit range and 32-bit range to unbounded, and tnum to unknown.
Exception: in BPF_MOD's "small dividend" case, since the result remains
unchanged, we do not reset other ranges/tnum.
4. Also updated existing selftests based on the expected BPF_DIV and
BPF_MOD behavior.
[1] https://www.kernel.org/doc/Documentation/bpf/standardization/instruction-set.rst
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20260119085458.182221-2-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test kfunc named bpf_kfunc_multi_st_ops_test_1_impl() is a user of
__prog suffix. Subsequent patch removes __prog support in favor of
KF_IMPLICIT_ARGS, so migrate this kfunc to use implicit argument.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement bpf_stream_vprintk with an implicit bpf_prog_aux argument,
and remote bpf_stream_vprintk_impl from the kernel.
Update the selftests to use the new API with implicit argument.
bpf_stream_vprintk macro is changed to use the new bpf_stream_vprintk
kfunc, and the extern definition of bpf_stream_vprintk_impl is
replaced accordingly.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-11-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement bpf_task_work_schedule_* with an implicit bpf_prog_aux
argument, and remove corresponding _impl funcs from the kernel.
Update special kfunc checks in the verifier accordingly.
Update the selftests to use the new API with implicit argument.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove extern declaration of bpf_wq_set_callback_impl() from
hid_bpf_helpers.h and replace bpf_wq_set_callback macro with a
corresponding new declaration.
Tested with:
# append tools/testing/selftests/hid/config and build the kernel
$ make -C tools/testing/selftests/hid
# in built kernel
$ ./tools/testing/selftests/hid/hid_bpf -t test_multiply_events_wq
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN hid_bpf.test_multiply_events_wq ...
[ 2.575520] hid-generic 0003:0001:0A36.0001: hidraw0: USB HID v0.00 Device [test-uhid-device-138] on 138
# OK hid_bpf.test_multiply_events_wq
ok 1 hid_bpf.test_multiply_events_wq
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
PASS
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement bpf_wq_set_callback() with an implicit bpf_prog_aux
argument, and remove bpf_wq_set_callback_impl().
Update special kfunc checks in the verifier accordingly.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-8-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add trivial end-to-end tests to validate that KF_IMPLICIT_ARGS flag is
properly handled by both resolve_btfids and the verifier.
Declare kfuncs in bpf_testmod. Check that bpf_prog_aux pointer is set
in the kfunc implementation. Verify that calls with implicit args and
a legacy case all work.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a multi-producer benchmark for perfbuf to complement the existing
ringbuf multi-producer test. Unlike ringbuf which uses a shared buffer
and experiences contention, perfbuf uses per-CPU buffers so the test
measures scaling behavior rather than contention.
This allows developers to compare perfbuf vs ringbuf performance under
multi-producer workloads when choosing between the two for their systems.
Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260120090716.82927-1-gyutae.opensource@navercorp.com
Currently, kunit.py has many subcommands and options, making it difficult
to remember them without checking the help message.
Add --list-cmds and --list-opts to kunit.py to get available commands and
options, use those outputs in kunit-completion.sh to show completion.
This implementation is similar to perf and tools/perf/perf-completion.sh.
Example output:
$ source tools/testing/kunit/kunit-completion.sh
$ ./tools/testing/kunit/kunit.py [TAB][TAB]
build config exec parse run
$ ./tools/testing/kunit/kunit.py run --k[TAB][TAB]
--kconfig_add --kernel_args --kunitconfig
Link: https://lore.kernel.org/r/20260117-kunit-completion-v2-1-cabd127d0801@gmail.com
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Blamed commit implemented logic to discover available vsock transports by
grepping /proc/kallsyms for known symbols. It incorrectly filtered entries
by type 'd'.
For some kernel configs having
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VSOCKETS_LOOPBACK=y
kallsyms reports
0000000000000000 d virtio_transport [vmw_vsock_virtio_transport]
0000000000000000 t loopback_transport
Overzealous filtering might have affected vsock test suit, resulting in
insufficient/misleading testing.
Do not filter symbols by type. It never helped much.
Fixes: 3070c05b7a ("vsock/test: Introduce get_transports()")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260116-vsock_test-kallsyms-grep-v1-1-3320bc3346f2@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add two tests using NetDrvContEnv. One basic test that sets up a netkit
pair, with one end in a netns. Use LOCAL_PREFIX_V6 and nk_forward BPF
program to ping from a remote host to the netkit in netns.
Second is a selftest for netkit queue leasing, using io_uring zero copy
test binary inside of a netns with netkit. This checks that memory
providers can be bound against virtual queues in a netkit within a
netns that are leasing from a physical netdev in the default netns.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-17-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a new parameter `lease` to NetDrvContEnv that sets up queue leasing
in the env.
The NETIF also has some ethtool parameters changed to support memory
provider tests. This is needed in NetDrvContEnv rather than individual
test cases since the cleanup to restore NETIF can't be done, until the
netns in the env is gone.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-16-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add an env NetDrvContEnv for container based selftests. This automates
the setup of a netns, netkit pair with one inside the netns, and a BPF
program that forwards skbs from the NETIF host inside the container.
Currently only netkit is used, but other virtual netdevs e.g. veth can
be used too.
Expect netkit container datapath selftests to have a publicly routable
IP prefix to assign to netkit in a container, such that packets will
land on eth0. The BPF skb forward program will then forward such packets
from the host netns to the container netns.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-15-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add nk_forward.bpf.c, a BPF program that forwards skbs matching some IPv6
prefix received on eth0 ifindex to a specified netkit ifindex. This will
be needed by netkit container tests.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-14-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a selftest that attempts to add a teql qdisc as a qfq child.
Since teql _must_ be added as a root qdisc, the kernel should reject
this.
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260114160243.913069-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Following warning is encountered when building selftests on powerpc/32.
CC csum
csum.c: In function 'recv_get_packet_csum_status':
csum.c:710:50: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
710 | error(1, 0, "cmsg: len=%lu expected=%lu",
| ~~^
| |
| long unsigned int
| %u
711 | cm->cmsg_len, CMSG_LEN(sizeof(struct tpacket_auxdata)));
| ~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
csum.c:710:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
710 | error(1, 0, "cmsg: len=%lu expected=%lu",
| ~~^
| |
| long unsigned int
| %u
cm->cmsg_len has type __kernel_size_t and CMSG() macro has the type
returned by sizeof() which is size_t.
size_t is 'unsigned int' on some platforms and 'unsigned long' on
other ones so use %zu instead of %lu.
The code in question was introduced by
commit 91a7de8560 ("selftests/net: add csum offload test").
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8b69b40826553c1dd500d9d25e45883744f3f348.1768556791.git.chleroy@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a simple ipvtap test to test handling
IP-address add/remove on ipvlan interface.
It creates a veth-interface and then creates several
network-namespace with ipvlan0 interface in it linked to veth.
Then it starts to add/remove addresses on ipvlan0 interfaces
in several threads.
At finish, it checks that there is no duplicated addresses.
Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@huawei.com>
Link: https://patch.msgid.link/20260112142417.4039566-3-skorodumov.dmitry@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drop the assertions about IOMMU mappings sizes for VFIO_TYPE1_IOMMU
modes (both the VFIO mode and the iommufd compatibility mode). These
assertions fail when CONFIG_IOMMUFD_VFIO_CONTAINER is enabled, since
iommufd compatibility mode provides different huge page behavior than
VFIO for VFIO_TYPE1_IOMMU. VFIO_TYPE1_IOMMU is an old enough interface
that it's not worth changing the behavior of VFIO and iommufd to match
nor care about the IOMMU mapping sizes.
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lore.kernel.org/kvm/20260109143830.176dc279@shazbot.org/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114211252.2581145-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Test IOMMU mapping the BAR mmaps created during vfio_pci_device_setup().
All IOMMU modes are tested: vfio_type1 variants are expected to succeed,
while non-type1 modes are expected to fail. iommufd compat mode can be
updated to expect success once kernel support lands. Native iommufd will
not support mapping vaddrs backed by MMIO (it will support dma-buf based
MMIO mapping instead).
Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-3-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Update vfio_pci_bar_map() to align BAR mmaps for efficient huge page
mappings. The manual mmap alignment can be removed once mmap(!MAP_FIXED)
on vfio device fds improves to automatically return well-aligned
addresses.
Also add MADV_HUGEPAGE, which encourages the kernel to use huge pages
(e.g. when /sys/kernel/mm/transparent_hugepage/enabled is set to "madvise").
Drop MAP_FILE from mmap(). It is an ignored compatibility flag.
Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-2-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Replace scattered string literals with MODE_* macros in iommu.h. This
provides a single source of truth for IOMMU mode name strings.
Signed-off-by: Alex Mastro <amastro@fb.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20260114-map-mmio-test-v3-1-44e036d95e64@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Add selftests that verify the keyring behaves correctly.
For simplicity this works with dm-verity as a module.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
The variable 'i' in wrong_timers_test() is declared but never used.
This was detected by Cppcheck static analysis.
tools/testing/selftests/alsa/utimer-test.c:144:9: style: Unused variable: i [unusedVariable]
Remove it to clean up the code and silence the warning.
Signed-off-by: LeeYongjun <jun85566@gmail.com>
Link: https://patch.msgid.link/20260118065510.29644-1-jun85566@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Improve the passive TFO test to report failure if the server or the
client timed out or exited with non-zero status.
Before this commit, TFO test didn't fail even if exit(EXIT_FAILURE) is
added to the first line of the run_server() and run_client() functions.
Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/214d399caec2e5de7738ced5736829915d507e4e.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The s390 vDSO source directory was recently moved,
but this reference was not updated.
Fixes: c0087d807a ("s390/vdso: Rename vdso64 to vdso")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The previous change centralizing kselftest.h include path in lib.mk caused x86
selftests to fail, as x86 Makefile overwrites CFLAGS using ":=", dropping the
include path added in lib.mk. Therefore, helpers.h could not find kselftest.h
during compilation.
Fix this by adding the tools/testing/sefltest to CFLAGS in x86 Makefile.
[ bp: Correct commit ID in Fixes: ]
Fixes: e6fbd1759c ("selftests: complete kselftest include centralization")
Closes: https://lore.kernel.org/lkml/CA+G9fYvKjQcCBMfXA-z2YuL2L+3Qd-pJjEUDX8PDdz2-EEQd=Q@mail.gmail.com/T/#m83fd330231287fc9d6c921155bee16c591db7360
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Brendan Jackman <jackmanb@google.com>
Link: https://patch.msgid.link/20251022062948.162852-1-reddybalavignesh9979@gmail.com
On my arm64 machine, I get the following failure:
...
tester_init:PASS:tester_log_buf 0 nsec
process_subtest:PASS:obj_open_mem 0 nsec
process_subtest:PASS:specs_alloc 0 nsec
serial_test_map_kptr:PASS:rcu_tasks_trace_gp__open_and_load 0 nsec
...
test_map_kptr_success:PASS:map_kptr__open_and_load 0 nsec
test_map_kptr_success:PASS:test_map_kptr_ref1 refcount 0 nsec
test_map_kptr_success:FAIL:test_map_kptr_ref1 retval unexpected error: 2 (errno 2)
test_map_kptr_success:PASS:test_map_kptr_ref2 refcount 0 nsec
test_map_kptr_success:FAIL:test_map_kptr_ref2 retval unexpected error: 1 (errno 2)
...
#201/21 map_kptr/success-map:FAIL
In serial_test_map_kptr(), before test_map_kptr_success(), one
kern_sync_rcu() is used to have some delay for freeing the map.
But in my environment, one kern_sync_rcu() seems not enough and
caused the test failure.
In bpf_map_free_in_work() in syscall.c, the queue time for
queue_work(system_dfl_wq, &map->work)
may be longer than expected. This may cause the test failure
since test_map_kptr_success() expects all previous maps having been freed.
Since it is not clear how long queue_work() time takes, a bpf prog
is added to count the reference after bpf_kfunc_call_test_acquire().
If the number of references is 2 (for initial ref and the one just
acquired), all previous maps should have been released. This will
resolve the above 'retval unexpected error' issue.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20260116052245.3692405-1-yonghong.song@linux.dev
If CONFIG_VXLAN is 'm', struct vxlanhdr will not be in vmlinux.h.
Add a ___local variant to support cases where vxlan is a module.
Fixes: 8517b1abe5 ("selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260115163457.146267-1-alan.maguire@oracle.com
- Recognize all ZONE_DEVICE users as physaddr consumers
- Fix format string for extended_linear_cache_size_show()
- Fix target list setup for multiple decoders sharing the same
downstream port
- Restore HBIW check before derefernce platform data
- Fix potential infinite loop in __cxl_dpa_reserve()
- Check for invalid addresses returned from translation functions on
error.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmlqdOUACgkQYGjFFmlT
OEp6bA/8C5uSMf4BMN+gJMerX1wTcVfmDy3nYyk80ZM4Gw2QqxrhtRzG3bfeYaXm
IioxYVMWscqasN16xuX7uje2sSDvoPTZYJ2tfnzKfAbyReknlolXvjWJtgoSuVKj
IGZDGLjuW0CcTpFW8NgNGiiSExnv6Iap7CnzRTXj72ZQkGcb8Goh4JL/CmAWrF8m
HQDbFZWQHI60rOLgxTXgWzlNKej4/tZfCTRKV1KIhqRv+MCurzNPb62H+giCwGDh
M+apYESCW3xCk2B7RbGmc6U+rcSmCNc3O31JL3Yt9TKPp4V5N0kaH8dd1196GnQX
HOJFlDjV5J/HTJjk4vondyAcIcUaSw3T7IJLtBRbXOBrs1zQ76adV96UNSwSEQC4
KaRjRf0uCsdDvwBw44VbxnWJiH6Y6jrMhKGPyCrXxFikX2EIR5I1l7+ytKDTNC/+
wdAf3+ch3qc6PfTBtooAWzji9e8JFfWpby87UHkWSz0eeKedJQnMbKtbloc6tHLc
fO/jiUbhay5QGfnvdPwL31zTGqvZB9eGUR2oBSasRJ7taWvURN2ibAzhAnTY8UTN
EQrTZ7isXPfxHwEkaGtpFNWts0iX8kJYzl/wjZjWtukltT4NmMy7NfMlkecXWQ5N
iueUMr6tHKYeDpmmFbdC+Y83LFSWv3i9TiINF1vMWuRj8MryRVU=
=pIv9
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- Recognize all ZONE_DEVICE users as physaddr consumers
- Fix format string for extended_linear_cache_size_show()
- Fix target list setup for multiple decoders sharing the same
downstream port
- Restore HBIW check before derefernce platform data
- Fix potential infinite loop in __cxl_dpa_reserve()
- Check for invalid addresses returned from translation functions on
error
* tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Check for invalid addresses returned from translation functions on errors
cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()
cxl/acpi: Restore HBIW check before dereferencing platform_data
cxl/port: Fix target list setup for multiple decoders sharing the same dport
cxl/region: fix format string for resource_size_t
x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
Currently the only way of excluding certain tests from a collection is by
passing all the other tests explicitly via `--test`. Therefore, if the user
wants to skip a single test the resulting command line might be too big,
depending on the collection. Add an option `--skip` that takes care of
that.
Link: https://lore.kernel.org/r/20260116-selftests-add_skip_opt-v1-1-ab54afaae81b@suse.com
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Before the last commit, sync_linked_regs() corrupted the register whose
bounds are being updated by copying known_reg's id to it. The ids are
the same in value but known_reg has the BPF_ADD_CONST flag which is
wrongly copied to reg.
This later causes issues when creating new links to this reg.
assign_scalar_id_before_mov() sees this BPF_ADD_CONST and gives a new id
to this register and breaks the old links. This is exposed by the added
selftest.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260115151143.1344724-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update the nested dirty log test to validate KVM's handling of READ faults
when dirty logging is enabled. Specifically, set the Dirty bit in the
guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
when handling L2 read faults. When handling read faults in the shadow MMU,
KVM opportunistically creates a writable SPTE if the mapping can be
writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
if KVM doesn't need to intercept writes in order to emulate Dirty-bit
updates.
To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
sequences to separate L1 pages, and differentiate between "marked dirty
due to a WRITE access/fault" and "marked dirty due to creating a writable
SPTE for a READ access/fault". The updated sequence exposes the bug fixed
by KVM commit 1f4e5fc83a ("KVM: x86: fix nested guest live migration
with PML") when the guest performs a READ=>WRITE sequence with dirty guest
PTEs.
Opportunistically tweak and rename the address macros, and add comments,
to make it more obvious what the test is doing. E.g. NESTED_TEST_MEM1
vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
creating aliases in both the L2 GPA and GVA address spaces, but only when
L1 is using TDP to run L2.
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260115172154.709024-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Currently, the test breaks if the SUT already has a default route
configured for IPv6. Fix by avoiding the use of the default namespace.
Fixes: 4ed591c8ab ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260113-selftests-net-fib-onlink-v2-1-89de2b931389@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Loopback transport can mangle data in rx queue when a linear skb is
followed by a small MSG_ZEROCOPY packet.
To exercise the logic, send out two packets: a weirdly sized one (to ensure
some spare tail room in the skb) and a zerocopy one that's small enough to
fit in the spare room of its predecessor. Then, wait for both to land in
the rx queue, and check the data received. Faulty packets merger manifests
itself by corrupting payload of the later packet.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260113-vsock-recv-coalescence-v2-2-552b17837cf4@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We do not actually test the bpf_override_return helper functionality
itself at the moment, only the bpf program being able to attach it.
Adding test that override prctl syscall return value on top of
kprobe and kprobe.multi.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20260112121157.854473-2-jolsa@kernel.org
- four kerneldoc fixes from Bagas Sanjaya
- four DAMON fixes from SeongJae
- four mremap VMA-related fixes from Lorenzo
- various singletons - please see the changelogs for details
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaWkP3gAKCRDdBJ7gKXxA
jnmTAQCBiCyw7Pvvak5OM8Of0qQ8m/jQmNMBPRrd5M3tAAEOlwD9F7E6RjcmyItU
k+sboDgNKTvgdRHTmRzj1t96aXJrSQQ=
=OxTh
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
- kerneldoc fixes from Bagas Sanjaya
- DAMON fixes from SeongJae
- mremap VMA-related fixes from Lorenzo
- various singletons - please see the changelogs for details
* tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (30 commits)
drivers/dax: add some missing kerneldoc comment fields for struct dev_dax
mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
mailmap: add entry for Daniel Thompson
tools/testing/selftests: fix gup_longterm for unknown fs
mm/page_alloc: prevent pcp corruption with SMP=n
iommu/sva: include mmu_notifier.h header
mm: kmsan: fix poisoning of high-order non-compound pages
tools/testing/selftests: add forked (un)/faulted VMA merge tests
mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
tools/testing/selftests: add tests for !tgt, src mremap() merges
mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
mm/zswap: fix error pointer free in zswap_cpu_comp_prepare()
mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
mm/damon/core: remove call_control in inactive contexts
powerpc/watchdog: add support for hardlockup_sys_info sysctl
mips: fix HIGHMEM initialization
mm/hugetlb: ignore hugepage kernel args if hugepages are unsupported
...
Fix minor documentation errors in `kvm_util.h` and `kvm_util.c`.
- Correct the argument description for `vcpu_args_set` in `kvm_util.h`,
which incorrectly listed `vm` instead of `vcpu`.
- Fix a typo in the comment for `kvm_selftest_arch_init` ("exeucting" ->
"executing").
- Correct the return value description for `vm_vaddr_unused_gap` in
`kvm_util.c` to match the implementation, which returns an address "at
or above" `vaddr_min`, not "at or below".
No functional change intended.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
To avoid code duplication, move page_align() to the shared `kvm_util.h`
header file. Rename it to vm_page_align(), to make it clear that the
alignment is done with respect to the guest's base page size.
No functional change intended.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 3e06cdf105 ("KVM: selftests: Add initial support for RISC-V 64-bit")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 7a6629ef74 ("kvm: selftests: add virt mem support for aarch64")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM selftests map all guest code and data into the lower virtual address
range (0x0000...) managed by TTBR0_EL1. The upper range (0xFFFF...)
managed by TTBR1_EL1 is unused and uninitialized.
If a guest accesses the upper range, the MMU attempts a translation
table walk using uninitialized registers, leading to unpredictable
behavior.
Set `TCR_EL1.EPD1` to disable translation table walks for TTBR1_EL1,
ensuring that any access to the upper range generates an immediate
Translation Fault. Additionally, set `TCR_EL1.TBI1` (Top Byte Ignore) to
ensure that tagged pointers in the upper range also deterministically
trigger a Translation Fault via EPD1.
Define `TCR_EPD1_MASK`, `TCR_EPD1_SHIFT`, and `TCR_TBI1` in
`processor.h` to support this configuration. These are based on their
definitions in `arch/arm64/include/asm/pgtable-hwdef.h`.
Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Commit 66bce7afba ("selftests/mm: fix test result reporting in
gup_longterm") introduced a small bug causing unknown filesystems to
always result in a test failure.
This is because do_test() was updated to use a common reporting path, but
this case appears to have been missed.
This is problematic for e.g. virtme-ng which uses an overlayfs file
system, causing gup_longterm to appear to fail each time due to a test
count mismatch:
# Planned tests != run tests (50 != 46)
# Totals: pass:24 fail:0 xfail:0 xpass:0 skip:22 error:0
The fix is to simply change the return into a break.
Link: https://lkml.kernel.org/r/20260106154547.214907-1-lorenzo.stoakes@oracle.com
Fixes: 66bce7afba ("selftests/mm: fix test result reporting in gup_longterm")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we correctly handle forked faulted/unfaulted merge on mremap(),
exhaustively assert that we handle this correctly.
Do this in the less duplicative way by adding a new merge_with_fork
fixture and forked/unforked variants, and abstract the forking logic as
necessary to avoid code duplication with this also.
Link: https://lkml.kernel.org/r/1daf76d89fdb9d96f38a6a0152d8f3c2e9e30ac7.1767638272.git.lorenzo.stoakes@oracle.com
Fixes: 879bca0a2c ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Test that mremap()'ing a VMA into a position such that the target VMA on
merge is unfaulted and the source faulted is correctly performed.
We cover 4 cases:
1. Previous VMA unfaulted:
copied -----|
v
|-----------|.............|
| unfaulted |(faulted VMA)|
|-----------|.............|
prev
target = prev, expand prev to cover.
2. Next VMA unfaulted:
copied -----|
v
|.............|-----------|
|(faulted VMA)| unfaulted |
|.............|-----------|
next
target = next, expand next to cover.
3. Both adjacent VMAs unfaulted:
copied -----|
v
|-----------|.............|-----------|
| unfaulted |(faulted VMA)| unfaulted |
|-----------|.............|-----------|
prev next
target = prev, expand prev to cover.
4. prev unfaulted, next faulted:
copied -----|
v
|-----------|.............|-----------|
| unfaulted |(faulted VMA)| faulted |
|-----------|.............|-----------|
prev next
target = prev, expand prev to cover. Essentially equivalent to 3, but
with additional requirement that next's anon_vma is the same as the
copied VMA's.
Each of these are performed with MREMAP_DONTUNMAP set, which will cause a
KASAN assert for UAF or an assert on zero refcount anon_vma if a bug
exists with correctly propagating anon_vma state in each scenario.
Link: https://lkml.kernel.org/r/f903af2930c7c2c6e0948c886b58d0f42d8e8ba3.1767638272.git.lorenzo.stoakes@oracle.com
Fixes: 879bca0a2c ("mm/vma: fix incorrectly disallowed anonymous VMA merges")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test which checks that the destination register of a gotox
instruction is marked as used and that the union of jump targets
is considered as live.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260114162544.83253-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cross-merge BPF and other fixes after downstream PR.
No conflicts.
Adjacent:
Auto-merging MAINTAINERS
Auto-merging Makefile
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/sched/ext.c
Auto-merging mm/memcontrol.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test for VMLOAD/VMSAVE in an L2 guest. The test verifies that L1
intercepts for VMSAVE/VMLOAD always work regardless of
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK.
Then, more interestingly, it makes sure that when L1 does not intercept
VMLOAD/VMSAVE, they work as intended in L2. When
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is enabled by L1, VMSAVE/VMLOAD from
L2 should interpret the GPA as an L2 GPA and translate it through the
NPT. When VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is disabled by L1,
VMSAVE/VMLOAD from L2 should interpret the GPA as an L1 GPA.
To test this, put two VMCBs (0 and 1) in L1's physical address space,
and have a single L2 GPA where:
- L2 VMCB GPA == L1 VMCB(0) GPA
- L2 VMCB GPA maps to L1 VMCB(1) via the NPT in L1.
This setup allows detecting how the GPA is interpreted based on which L1
VMCB is actually accessed.
In both cases, L2 sets KERNEL_GS_BASE (one of the fields handled by
VMSAVE/VMLOAD), and executes VMSAVE to write its value to the VMCB. The
test userspace code then checks that the write was made to the correct
VMCB (based on whether VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is set by L1),
and writes a new value to that VMCB. L2 then executes VMLOAD to load the
new value and makes sure it's reflected correctly in KERNERL_GS_BASE.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260110004821.3411245-4-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Instead of calling memstress_setup_ept_mappings() only in the first
iteration in the loop, move it before the loop.
The call needed to happen within the loop before commit e40e72fec0
("KVM: selftests: Stop passing VMX metadata to TDP mapping functions"),
as memstress_setup_ept_mappings() used to take in a pointer to vmx_pages
and pass it into tdp_identity_map_1g() (to get the EPT root GPA). This
is no longer the case, as tdp_identity_map_1g() gets the EPT root
through stage2 MMU.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260113171456.2097312-1-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
The cache parameter of getcpu() is useless nowadays for various reasons.
* It is never passed by userspace for either the vDSO or syscalls.
* It is never used by the kernel.
* It could not be made to work on the current vDSO architecture.
* The structure definition is not part of the UAPI headers.
* vdso_getcpu() is superseded by restartable sequences in any case.
Remove the struct and its header.
As a side-effect this gets rid of an unwanted inclusion of the linux/
header namespace from vDSO code.
[ tglx: Adapt to s390 upstream changes */
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Link: https://patch.msgid.link/20251230-getcpu_cache-v3-1-fb9c5f880ebe@linutronix.de
The ldimm64 instruction for map value supports an offset.
For insn array maps it wasn't tested before, as normally
such instructions aren't generated. However, this is still
possible to pass such instructions, so add a few tests to
check that correct offsets work properly and incorrect
offsets are rejected.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260111153047.8388-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier was recently updated to treat pointers to struct types
returned from BPF kfuncs as implicitly trusted by default. Add a new
test case to exercise this new implicit trust semantic.
The KF_ACQUIRE flag was dropped from the bpf_get_root_mem_cgroup()
kfunc because it returns a global pointer to root_mem_cgroup without
performing any explicit reference counting. This makes it an ideal
candidate to verify the new implicit trusted pointer semantics.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-3-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Teach the BPF verifier to treat pointers to struct types returned from
BPF kfuncs as implicitly trusted (PTR_TO_BTF_ID | PTR_TRUSTED) by
default. Returning untrusted pointers to struct types from BPF kfuncs
should be considered an exception only, and certainly not the norm.
Update existing selftests to reflect the change in register type
printing (e.g. `ptr_` becoming `trusted_ptr_` in verifier error
messages).
Link: https://lore.kernel.org/bpf/aV4nbCaMfIoM0awM@google.com/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The toeplitz.py test passed the hex mask without "0x" prefix (e.g.,
"300" for CPUs 8,9). The toeplitz.c strtoul() call wrongly parsed this
as decimal 300 (0x12c) instead of hex 0x300.
Pass the prefixed mask to toeplitz.c, and the unprefixed one to sysfs.
Fixes: 9cf9aa77a1 ("selftests: drv-net: hw: convert the Toeplitz test to Python")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112173715.384843-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test cases that verify that when the "onlink" keyword is specified,
both address families (with and without VRF) accept routes with a
gateway address that is reachable via a different interface than the one
specified.
Output without "ipv6: Allow for nexthop device mismatch with "onlink"":
# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch [ OK ]
TEST: nexthop device mismatch [ OK ]
TEST: nexthop device mismatch [FAIL]
TEST: nexthop device mismatch [FAIL]
Output with "ipv6: Allow for nexthop device mismatch with "onlink"":
# ./fib-onlink-tests.sh | grep mismatch
TEST: nexthop device mismatch [ OK ]
TEST: nexthop device mismatch [ OK ]
TEST: nexthop device mismatch [ OK ]
TEST: nexthop device mismatch [ OK ]
That is, the IPv4 tests were always passing, but the IPv6 ones only pass
after the specified patch.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A multicast gateway address should be rejected when "onlink" is
specified, but it is only tested as part of the IPv6 tests. Add an
equivalent IPv4 test.
# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.12/32 via 233.252.0.1 dev veth1 onlink
Error: Nexthop has invalid gateway.
TEST: Invalid gw - multicast address [ OK ]
[...]
COMMAND: ip ro add table 1101 169.254.102.12/32 via 233.252.0.1 dev veth5 onlink
Error: Nexthop has invalid gateway.
TEST: Invalid gw - multicast address, VRF [ OK ]
[...]
Tests passed: 37
Tests failed: 0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The command in the test fails as expected because IPv6 forbids a nexthop
device mismatch:
# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip -6 ro add table 1101 2001:db8:102::103/128 via 2001:db8:701::64 dev veth5 onlink
Error: Nexthop has invalid gateway or device mismatch.
TEST: Gateway resolves to wrong nexthop device - VRF [ OK ]
[...]
Where:
# ip route get 2001:db8:701::64 vrf lisa
2001:db8:701::64 dev veth7 table 1101 proto kernel src 2001:db8:701::1 metric 256 pref medium
This is in contrast to IPv4 where a nexthop device mismatch is allowed
when "onlink" is specified:
# ip route get 169.254.7.2 vrf lisa
169.254.7.2 dev veth7 table 1101 src 169.254.7.1 uid 0
# ip ro add table 1101 169.254.102.103/32 via 169.254.7.2 dev veth5 onlink
# echo $?
0
Remove these tests in preparation for aligning IPv6 with IPv4 and
allowing nexthop device mismatch when "onlink" is specified.
A subsequent patch will add tests that verify that both address families
allow a nexthop device mismatch with "onlink".
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to the test description, these tests fail because of a wrong
nexthop device:
# ./fib-onlink-tests.sh -v
[...]
COMMAND: ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth1 onlink
Error: Nexthop has invalid gateway.
TEST: Gateway resolves to wrong nexthop device [ OK ]
COMMAND: ip ro add table 1101 169.254.102.103/32 via 169.254.7.1 dev veth5 onlink
Error: Nexthop has invalid gateway.
TEST: Gateway resolves to wrong nexthop device - VRF [ OK ]
[...]
But this is incorrect. They fail because the gateway addresses are local
addresses:
# ip -4 address show
[...]
28: veth3@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link-netns peer_ns-Urqh3o
inet 169.254.3.1/24 scope global veth3
[...]
32: veth7@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lisa state UP group default qlen 1000 link-netns peer_ns-Urqh3o
inet 169.254.7.1/24 scope global veth7
Therefore, using a local address that matches the nexthop device fails
as well:
# ip ro add table 254 169.254.101.102/32 via 169.254.3.1 dev veth3 onlink
Error: Nexthop has invalid gateway.
Using a gateway address with a "wrong" nexthop device is actually valid
and allowed:
# ip route get 169.254.1.2
169.254.1.2 dev veth1 src 169.254.1.1 uid 0
# ip ro add table 254 169.254.101.102/32 via 169.254.1.2 dev veth3 onlink
# echo $?
0
Remove these tests given that their output is confusing and that the
scenario that they are testing is already covered by other tests.
A subsequent patch will add tests for the nexthop device mismatch
scenario.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260111120813.159799-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GRO test groups the cases into categories, e.g. "tcp" case
checks coalescing in presence of:
- packets with bad csum,
- sequence number mismatch,
- timestamp option value mismatch,
- different TCP options.
Since we now have TAP support grouping the cases like that
lowers our reporting granularity. This matters even more for
NICs performing HW GRO and LRO since it appears that most
implementation have _some_ bugs. Flagging the whole group
of tests as failed prevents us from catching regressions
in the things that work today.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Run the test against HW GRO and LRO. NICs I have pass the base cases.
Interestingly all are happy to build GROs larger than 64k.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We'll need to do a lot more feature handling to test HW-GRO and LRO.
Clean up the feature handling for SW GRO a bit to let the next commit
focus on the new test cases, only.
Make sure HW GRO-like features are not enabled for the SW tests.
Be more careful about changing features as "nothing changed"
situations may result in non-zero error code from ethtool.
Don't disable TSO on the local interface (receiver) when running over
netdevsim, we just want GSO to break up the segments on the sender.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260113000740.255360-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that cmd() can be printed directly remove the old formatting.
Before:
# fragmented ip6 doesn't coalesce:
# Expected {200 100 100 }, Total 3 packets
# Received {200 100 }, Total 2 packets.
# /root/ksft-net-drv/drivers/net/gro: incorrect number of packets
Now:
# CMD: drivers/net/gro --ipv6 --dmac 9e:[...]
# EXIT: 1
# STDOUT: fragmented ip6 doesn't coalesce:
# STDERR: Expected {200 100 100 }, Total 3 packets
# Received {200 100 }, Total 2 packets.
# /root/ksft-net-drv/drivers/net/gro: incorrect number of packets
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Teach cmd() how to print itself, to make debug prints easier.
Example output (leading # due to ksft_pr()):
# CMD: /root/ksft-net-drv/drivers/net/gro
# EXIT: 1
# STDOUT: ipv6 with ext header does coalesce:
# STDERR: Expected {200 }, Total 1 packets
# Received {100 [!=200]100 [!=0]}, Total 2 packets.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make printing multi-line logs easier by automatically prefixing
each line in ksft_pr(). Make use of this when formatting exceptions.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260113000740.255360-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix KVM's long-standing buggy handling of SVM's exit_code as a 32-bit
value. Per the APM and Xen commit d1bd157fbc ("Big merge the HVM
full-virtualisation abstractions.") (which is arguably more trustworthy
than KVM), offset 0x70 is a single 64-bit value:
070h 63:0 EXITCODE
Track exit_code as a single u64 to prevent reintroducing bugs where KVM
neglects to correctly set bits 63:32.
Fixes: 6aa8b732ca ("[PATCH] kvm: userspace interface")
Cc: Jim Mattson <jmattson@google.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230211347.4099600-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a test to verify KVM correctly handles a variety of edge cases related
to APICv updates, and in particular updates that are triggered while L2 is
actively running.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20260109034532.1012993-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
This patch introduces test cases for the btf__permute function to ensure
it works correctly with both base BTF and split BTF scenarios.
The test suite includes:
- test_permute_base: Validates permutation on base BTF
- test_permute_split: Tests permutation on split BTF
Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20260109130003.3313716-3-dolinux.peng@gmail.com
This test occasionally fails due to exceeding timing bounds, as
run in continuous testing on netdev.bots:
https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
A common pattern is a single elevated delay between USR and SND.
# 8.36 [+0.00] test SND
# 8.36 [+0.00] USR: 1767864384 s 240994 us (seq=0, len=0)
# 8.44 [+0.08] ERROR: 18461 us expected between 10000 and 18000
# 8.44 [+0.00] SND: 1767864384 s 259455 us (seq=42, len=10) (USR +18460 us)
# 8.52 [+0.07] SND: 1767864384 s 339523 us (seq=42, len=10) (USR +10005 us)
# 8.52 [+0.00] USR: 1767864384 s 409580 us (seq=0, len=0)
# 8.60 [+0.08] SND: 1767864384 s 419586 us (seq=42, len=10) (USR +10005 us)
# 8.60 [+0.00] USR: 1767864384 s 489645 us (seq=0, len=0)
# 8.68 [+0.08] SND: 1767864384 s 499651 us (seq=42, len=10) (USR +10005 us)
# 8.68 [+0.00] USR-SND: count=4, avg=12119 us, min=10005 us, max=18460 us
(Note that other delays are nowhere near the large 8ms tolerance.)
One hypothesis is that the task is descheduled between taking the USR
timestamp and sending the packet. Possibly in printing.
Delay taking the timestamp closer to sendmsg, and delay printing until
after sendmsg.
With this change, failure rate is significantly lower in current runs.
Link: https://lore.kernel.org/netdev/20260107110521.1aab55e9@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260112163355.3510150-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Avoid freeing stack-allocated node in kvm_async_pf_queue_task
- Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmlh7zYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM6bQgArr0BI1b34xyPmsGk3J0+mUBSeYwp
0aVkegq4BLTDNDY7/yz7jXf9wL0x8uLps/+PJjR3QKpjAAMCwwWi5iQrwoVLsOjO
N9S15Np4Be1fKvpHoB+8z7/uGFhHr9ltpO5wsDOdR2/lpEnZlTZZtD2Wl093DsKV
xoCbddcfgRmx78NV15GnXSTFeTPodHSLQL+96oVN4XZRZVLefYsISnZa66VA5B/2
pl9SndD/zMMo8j6YrrImtuQL6Km+DlqNqNddAXgvQ6JIJ3rO/WgLyxJPaV8trC4Q
n6x8hQcFCRqM+Oi9G0+b4LgfQUPg/0hxPSIZi8yeZAZpJhMb7iEetLXc6A==
=Kew3
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm fixes from Paolo Bonzini:
- Avoid freeing stack-allocated node in kvm_async_pf_queue_task
- Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1
selftests: kvm: try getting XFD and XSAVE state out of sync
selftests: kvm: replace numbered sync points with actions
x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1
x86/kvm: Avoid freeing stack-allocated node in kvm_async_pf_queue_task
With 64K page on arm64, verifier_arena_globals1 failed like below:
...
libbpf: map 'arena': failed to create: -E2BIG
...
#509/1 verifier_arena_globals1/check_reserve1:FAIL
...
For 64K page, if the number of arena pages is (1UL << 20), the total
memory will exceed 4G and this will cause map creation failure.
Adjusting ARENA_PAGES based on the actual page size fixed the problem.
Cc: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260113061033.3798549-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current selftest sk_bypass_prot_mem only supports 4K page.
When running with 64K page on arm64, the following failure happens:
...
check_bypass:FAIL:no bypass unexpected no bypass: actual 3 <= expected 32
...
#385/1 sk_bypass_prot_mem/TCP :FAIL
...
check_bypass:FAIL:no bypass unexpected no bypass: actual 4 <= expected 32
...
#385/2 sk_bypass_prot_mem/UDP :FAIL
...
Adding support to 64K page as well fixed the failure.
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061028.3798326-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On arm64 with 64K page , I observed the following test failure:
...
subtest_dmabuf_iter_check_lots_of_buffers:FAIL:total_bytes_read unexpected total_bytes_read:
actual 4696 <= expected 65536
#97/3 dmabuf_iter/lots_of_buffers:FAIL
With 4K page on x86, the total_bytes_read is 4593.
With 64K page on arm64, the total_byte_read is 4696.
In progs/dmabuf_iter.c, for each iteration, the output is
BPF_SEQ_PRINTF(seq, "%lu\n%llu\n%s\n%s\n", inode, size, name, exporter);
The only difference between 4K and 64K page is 'size' in
the above BPF_SEQ_PRINTF. The 4K page will output '4096' and
the 64K page will output '65536'. So the total_bytes_read with 64K page
is slighter greater than 4K page.
Adjusting the total_bytes_read from 65536 to 4096 fixed the issue.
Cc: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061023.3798085-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Translation functions may return an invalid address in case of errors.
If the address is not checked the further use of the invalid value
will cause an address corruption.
Consistently check for a valid address returned by translation
functions. Use RESOURCE_SIZE_MAX to indicate an invalid address for
type resource_size_t. Depending on the type either RESOURCE_SIZE_MAX
or ULLONG_MAX is used to indicate an address error.
Propagating an invalid address from a failed translation may cause
userspace to think it has received a valid SPA, when in fact it is
wrong. The CXL userspace API, using trace events, expects ULLONG_MAX
to indicate a translation failure. If ULLONG_MAX is not returned
immediately, subsequent calculations can transform that bad address
into a different value (!ULLONG_MAX), and an invalid SPA may be
returned to userspace. This can lead to incorrect diagnostics and
erroneous corrective actions.
[ dj: Added user impact statement from Alison. ]
[ dj: Fixed checkpatch tab alignment issue. ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Fixes: c3dd67681c ("cxl/region: Add inject and clear poison by region offset")
Fixes: b78b9e7b79 ("cxl/region: Refactor address translation funcs for testing")
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260107120544.410993-1-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
SYS_clock_getres might have been redirected by libc to some other system
call than the actual clock_getres. For testing it is required to use
exactly this system call.
Use the system call number exported by the UAPI headers which is always
correct.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20251223-vdso-compat-time32-v1-3-97ea7a06a543@linutronix.de
Test 684b: Create CAKE_MQ with default setting (4 queues)
Test 7ee8: Create CAKE_MQ with bandwidth limit (4 queues)
Test 1f87: Create CAKE_MQ with rtt time (4 queues)
Test e9cf: Create CAKE_MQ with besteffort flag (4 queues)
Test 7c05: Create CAKE_MQ with diffserv8 flag (4 queues)
Test 5a77: Create CAKE_MQ with diffserv4 flag (4 queues)
Test 8f7a: Create CAKE_MQ with flowblind flag (4 queues)
Test 7ef7: Create CAKE_MQ with dsthost and nat flag (4 queues)
Test 2e4d: Create CAKE_MQ with wash flag (4 queues)
Test b3e6: Create CAKE_MQ with flowblind and no-split-gso flag (4 queues)
Test 62cd: Create CAKE_MQ with dual-srchost and ack-filter flag (4 queues)
Test 0df3: Create CAKE_MQ with dual-dsthost and ack-filter-aggressive flag (4 queues)
Test 9a75: Create CAKE_MQ with memlimit and ptm flag (4 queues)
Test cdef: Create CAKE_MQ with fwmark and atm flag (4 queues)
Test 93dd: Create CAKE_MQ with overhead 0 and mpu (4 queues)
Test 1475: Create CAKE_MQ with conservative and ingress flag (4 queues)
Test 7bf1: Delete CAKE_MQ with conservative and ingress flag (4 queues)
Test ee55: Replace CAKE_MQ with mpu (4 queues)
Test 6df9: Change CAKE_MQ with mpu (4 queues)
Test 67e2: Show CAKE_MQ class (4 queues)
Test 2de4: Change bandwidth of CAKE_MQ (4 queues)
Test 5f62: Fail to create CAKE_MQ with autorate-ingress flag (4 queues)
Test 038e: Fail to change setting of sub-qdisc under CAKE_MQ
Test 7bdc: Fail to replace sub-qdisc under CAKE_MQ
Test 18e0: Fail to install CAKE_MQ on single queue device
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-6-8d613fece5d8@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The "struct alg" object contains a union of 3 xfrm structures:
union {
struct xfrm_algo;
struct xfrm_algo_aead;
struct xfrm_algo_auth;
}
All of them end with a flexible array member used to store key material,
but the flexible array appears at *different offsets* in each struct.
bcz of this, union itself is of variable-sized & Placing it above
char buf[...] triggers:
ipsec.c:835:5: warning: field 'u' with variable sized type 'union
(unnamed union at ipsec.c:831:3)' not at the end of a struct or class
is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
835 | } u;
| ^
one fix is to use "TRAILING_OVERLAP()" which works with one flexible
array member only.
But In "struct alg" flexible array member exists in all union members,
but not at the same offset, so TRAILING_OVERLAP cannot be applied.
so the fix is to explicitly overlay the key buffer at the correct offset
for the largest union member (xfrm_algo_auth). This ensures that the
flexible-array region and the fixed buffer line up.
No functional change.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20260109152201.15668-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With CONFIG_CFI enabled, the kernel strictly enforces that indirect
function calls use a function pointer type that matches the target
function. As bpf_testmod_ctx_release() signature differs from the
btf_dtor_kfunc_t pointer type used for the destructor calls in
bpf_obj_free_fields(), add a stub function with the correct type to
fix the type mismatch.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260110082548.113748-9-samitolvanen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add 'stop' subcommand to kublk utility that uses the new
UBLK_CMD_TRY_STOP_DEV command when --safe option is specified.
This allows stopping a device only if it has no active openers,
returning -EBUSY otherwise.
Also add test_generic_16.sh to test the new functionality.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As stated in commit 1c09b195d3 ("cpuset: fix a regression in validating
config change"), it is not allowed to clear masks of a cpuset if
there're tasks in it. This is specific to v1 since empty "cpuset.cpus"
or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs
or memory nodes from its parent. So it is OK to have empty cpus or mems
even if there are tasks in the cpuset.
Move this empty cpus/mems check in validate_change() to
cpuset1_validate_change() to allow more flexibility in setting
cpus or mems in v2. cpuset_is_populated() needs to be moved into
cpuset-internal.h as it is needed by the empty cpus/mems checking code.
Also add a test case to test_cpuset_prs.sh to verify that.
Reported-by: Chen Ridong <chenridong@huaweicloud.com>
Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@huaweicloud.com/
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
the sibling's partition state becomes invalid. This is overly harsh and
is probably not necessary.
The cpuset.cpus.exclusive control file, if set, will override the
cpuset.cpus of the same cpuset when creating a cpuset partition.
So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
a partition. However, it cannot override a conflicting cpuset.cpus file
in a sibling cpuset and the partition creation process will fail. This
is inconsistent. That will also make using cpuset.cpus.exclusive less
valuable as a tool to set up cpuset partitions as the users have to
check if such a cpuset.cpus conflict exists or not.
Fix these problems by making sure that once a cpuset.cpus.exclusive
is set without failure, it will always be allowed to form a valid
partition as long as at least one CPU can be granted from its parent
irrespective of the state of the siblings' cpuset.cpus values. Of
course, setting cpuset.cpus.exclusive will fail if it conflicts with
the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value
of a sibling.
Partition can still be created by setting only cpuset.cpus without
setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
be removed from its cpuset.cpus.exclusive.effective as long as there
is still one or more CPUs left and can be granted from its parent. This
CPU stripping is currently done in rm_siblings_excl_cpus().
The new code will now try its best to enable the creation of new
partitions with only cpuset.cpus set without invalidating existing ones.
However it is not guaranteed that all the CPUs requested in cpuset.cpus
will be used in the new partition even when all these CPUs can be
granted from the parent.
This is similar to the fact that cpuset.cpus.effective may not be
able to include all the CPUs requested in cpuset.cpus. In this case,
the parent may not able to grant all the exclusive CPUs requested in
cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
already been granted to other partitions earlier.
With the creation of multiple sibling partitions by setting
only cpuset.cpus, this does have the side effect that their exact
cpuset.cpus.exclusive.effective settings will depend on the order of
partition creation if there are conflicts. Due to the exclusive nature
of the CPUs in a partition, it is not easy to make it fair other than
the old behavior of invalidating all the conflicting partitions.
For example,
# echo "0-2" > A1/cpuset.cpus
# echo "root" > A1/cpuset.cpus.partition
# cat A1/cpuset.cpus.partition
root
# cat A1/cpuset.cpus.exclusive.effective
0-2
# echo "2-4" > B1/cpuset.cpus
# echo "root" > B1/cpuset.cpus.partition
# cat B1/cpuset.cpus.partition
root
# cat B1/cpuset.cpus.exclusive.effective
3-4
# cat B1/cpuset.cpus.effective
3-4
For users who want to be sure that they can get most of the CPUs they
want, cpuset.cpus.exclusive should be used instead if they can set
it successfully without failure. Setting cpuset.cpus.exclusive will
guarantee that sibling conflicts from then onward is no longer possible.
To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.
Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated the clarify what exclusive CPUs will be used when a partition
is created.
Reported-by: Sun Shaojie <sunshaojie@kylinos.cn>
Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add test case loop_08 to verify the ublk integrity data flow. It uses
the kublk loop target to create a ublk device with integrity on top of
backing data and integrity files. It then writes to the whole device
with fio configured to generate integrity data. Then it reads back the
whole device with fio configured to verify the integrity data.
It also verifies that injected guard, reftag, and apptag corruptions are
correctly detected.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add test case null_04 to exercise all the different integrity params. It
creates 4 different ublk devices with different combinations of
integrity arguments and verifies their integrity limits via sysfs and
the metadata_size utility.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To perform and end-to-end test of integrity information through a ublk
device, we need to actually store it somewhere and retrieve it. Add this
support to kublk's loop target. It uses a second backing file for the
integrity data corresponding to the data stored in the first file.
The integrity file is initialized with byte 0xFF, which ensures the app
and reference tags are set to the "escape" pattern to disable the
bio-integrity-auto guard and reftag checks until the blocks are written.
The integrity file is opened without O_DIRECT since it will be accessed
at sub-block granularity. Each incoming read/write results in a pair of
reads/writes, one to the data file, and one to the integrity file. If
either backing I/O fails, the error is propagated to the ublk request.
If both backing I/Os read/write some bytes, the ublk request is
completed with the smaller of the number of blocks accessed by each I/O.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A subsequent commit will add support for using a backing file to store
integrity data. Since integrity data is accessed in intervals of
metadata_size, which may be much smaller than a logical block on the
backing device, direct I/O cannot be used. Add an argument to
backing_file_tgt_init() to specify the number of files to open for
direct I/O. The remaining files will use buffered I/O. For now, continue
to request direct I/O for all the files.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If integrity data is enabled for kublk, allocate an integrity buffer for
each I/O. Extend ublk_user_copy() to copy the integrity data between the
ublk request and the integrity buffer if the ublksrv_io_desc indicates
that the request has integrity data.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add integrity param command line arguments to kublk. Plumb these to
struct ublk_params for the null and fault_inject targets, as they don't
need to actually read or write the integrity data. Forbid the integrity
params for loop or stripe until the integrity data copy is implemented.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some block device integrity parameters are available in sysfs, but
others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a
metadata_size utility program to print out the logical block metadata
size, PI offset, and PI size within the metadata. Example output:
$ metadata_size /dev/ublkb0
metadata_size: 64
pi_offset: 56
pi_tuple_size: 8
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add support for printing the UBLK_F_INTEGRITY feature flag in the
human-readable kublk features output.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge in fixes that went to 6.19 after for-7.0/block was branched.
Pending ublk changes depend on particularly the async scan work.
* block-6.19:
block: zero non-PI portion of auto integrity buffer
ublk: fix use-after-free in ublk_partition_scan_work
blk-mq: avoid stall during boot due to synchronize_rcu_expedited
loop: add missing bd_abort_claiming in loop_set_status
block: don't merge bios with different app_tags
blk-rq-qos: Remove unlikely() hints from QoS checks
loop: don't change loop device under exclusive opener in loop_set_status
block, bfq: update outdated comment
blk-mq: skip CPU offline notify on unmapped hctx
selftests/ublk: fix Makefile to rebuild on header changes
selftests/ublk: add test for async partition scan
ublk: scan partition in async way
block,bfq: fix aux stat accumulation destination
md: Fix forward incompatibility from configurable logical block size
md: Fix logical_block_size configuration being overwritten
md: suspend array while updating raid_disks via sysfs
md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
md: Fix static checker warning in analyze_sbs
Add a test that exercises create->write->seek->read to check that using the
stream functions (fwrite() etc) is not totally broken.
The only edge cases this is testing for are:
- Reading the file after writing but without rewinding reads nothing.
- Trying to read more items than the file contains returns the count of
fully read items.
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20260105023629.1502801-4-daniel@thingy.jp
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Hook up libc-test to the regular selftest build to make sure
nolibc-test.c stays compatible with a normal libc.
As the pattern rule from lib.mk does not handle compiling a target from
a differently named source file, add an explicit rule definition.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-3-f82101c2c505@weissschuh.net
When stdout is redirected to a file this test fails.
This happens when running through the kselftest runner since
commit d9e6269e33 ("selftests/run_kselftest.sh: exit with
error if tests fail").
For consistency with other tests that read from a file descriptor,
switch to stdin over stdout. The tests are still brittle against
a redirected stdin, but at least they are now consistently so.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-selftests-v1-1-f82101c2c505@weissschuh.net
Fixes tracing test_multiple_writes stalls when buffer_size_kb is less
than 12KB.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmlime8ACgkQCwJExA0N
Qxyung/+KwLzsf+etHDBNFkt8GwMH9+RQcLkR3wZK2X7x074jXPQ1dfd/zHj7O1S
R42VtDAc5QM6LhiyBEteYdYtGeIkbFo9jVmvjctQzYz7yAOShImeEd2oa0Jnt44J
c3Gqo5clmGbM85C0tUlgKGtc43GZfdq6YQKr7/sYqNOkVpijNhjby7XF6ewXKjFD
rem9QglvQoHrG0tTdzOkICIoOPHHCoP8YkQe8qmB0c1Ec9YgYZizIm5ME0oSaxpR
4c6zMAqMOKv/UepYe9J5MgIO2+xveyDKY1yN9hzGyJNZkIwansjCU9XKndnbsPn4
QX38EDEHrry5hZ2Lb/mmBxCHy7PpsH/Wx1LnAU3vP7gflxg/+X35zKhb+kyH5tv8
2v35NRONKhfofSXps4QiQozS5+tUpCXg4k+Dd2NPJ7DHtxJtYB4kksQgbRzChIKX
iKgKuEPM1Fka/vZtbzJDrxcCwqaneunMpsB+WIQR+HDaHiCjTf3pSY7aZzXelT+G
zCcFWLGdOBLOFUIoQuPCa+ourmVEjuYm6OSP5oLF9ObdiumBBW9MdHwj8wfqPsXw
tZf4JdwOCmk0geN8FADIPuJAf+V43cfPFFvcGLiDvRks5+mx+jSRCeG/yynHN2N4
SACsSPhCaJvdNw/tTbvmTlwT0EPDvo3bZI0XvNCfFMGYT4O592I=
=zuty
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"Fix tracing test_multiple_writes stalls when buffer_size_kb is less
than 12KB"
* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/tracing: Fix test_multiple_writes stall
I wasted a couple of hours recently after accidentally adding
a defer() from within a function which itself was called as
part of defer(). This leads to an infinite loop of defer().
Make sure this cannot happen and raise a helpful exception.
I understand that the pair of _ksft_defer_arm() calls may
not be the most Pythonic way to implement this, but it's
easy enough to understand.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260108225257.2684238-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Import utils and refer to the global defer queue that way instead
of importing the queue. This will make it possible to assign value
to the global variable. While at it capitalize the name, to comply
with the Python coding style.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260108225257.2684238-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ksft_variants to parametrise tests in iou-zcrx.py to either use
single queues or RSS contexts, reducing duplication.
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20260108234521.3619621-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PSP responder fails when zero or multiple PSP devices are detected.
There's an option to select the device id to use (-d) but it's
currently not used from the PSP self test. It's also hard to use because
the PSP test doesn't dump the PSP devices so can't choose one.
When zero devices are detected, psp_responder fails which will cause the
parent test to fail as well instead of skipping PSP tests.
Fix both of these problems. Change psp_responder to:
- not fail when no PSP devs are detected.
- get an optional -i ifindex argument instead of -d.
- select the correct PSP dev from the dump corresponding to ifindex or
- select the first PSP dev when -i is not given.
- fail when multiple devs are found and -i is not given.
- warn and continue when the requested ifindex is not found.
Also plumb the ifindex from the Python test.
With these, when there are no PSP devs found or the wrong one is chosen,
psp_responder opens the server socket, listens for control connections
normally, and leaves the skipping of the various test cases which
require a PSP device (~most, but not all of them) to the parent test.
This results in output like:
ok 1 psp.test_case # SKIP No PSP devices found
[...]
ok 12 psp.dev_get_device # SKIP No PSP devices found
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate # SKIP No PSP devices found
[...]
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Link: https://patch.msgid.link/20260109110851.2952906-2-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the last test fails, the other side still completes correctly,
which could lead to false positives.
Let's add a final barrier that ensures that the last test has finished
correctly on both sides, but also that the two sides agree on the
number of tests to be performed.
Fixes: 2f65b44e19 ("VSOCK: add full barrier between test cases")
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260108114419.52747-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rework the AMX test's #NM handling to use kvm_asm_safe() to verify an #NM
actually occurs. As is, a completely missing #NM could go unnoticed.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The host is allowed to set FPU state that includes a disabled
xstate component. Check that this does not cause bad effects.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rework the guest=>host syncs in the AMX test to use named actions instead
of arbitrary, incrementing numbers. The "stage" of the test has no real
meaning, what matters is what action the test wants the host to perform.
The incrementing numbers are somewhat helpful for triaging failures, but
fully debugging failures almost always requires a much deeper dive into
the test (and KVM).
Using named actions not only makes it easier to extend the test without
having to shift all sync point numbers, it makes the code easier to read.
[Commit message by Sean Christopherson]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recent version of tcpdump (tcpdump-4.99.6-1.fc43.x86_64) seems to have
removed the spurious space after msg type in PTP info, e.g.:
before: PTPv2, majorSdoId: 0x0, msg type : sync msg, length: 44
after: PTPv2, majorSdoId: 0x0, msg type: sync msg, length: 44
Update our patterns to match both.
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Link: https://patch.msgid.link/20260107145320.1837464-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The gro.py test (testing software GRO) is slightly flaky when
running against fbnic. We see one flake per roughly 20 runs in NIPA,
mostly in ipip.large, and always including some EAGAIN:
# Shouldn't coalesce if exceed IP max pkt size: Test succeeded
# Expected {65475 899 }, Total 2 packets
# Received {65475 899 }, Total 2 packets.
# Expected {64576 900 900 }, Total 3 packets
# Received {64576 /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable
The test sends 2 large frames (64k + change). Looks like the default
packet socket rcvbuf (~200kB) may not be large enough to hold them.
Bump the rcvbuf to 1MB.
Add a debug print showing socket statistics to make debugging this
issue easier in the future. Without the rcvbuf increase we see:
# Shouldn't coalesce if exceed IP max pkt size: Test succeeded
# Expected {65475 899 }, Total 2 packets
# Received {65475 899 }, Total 2 packets.
# Expected {64576 900 900 }, Total 3 packets
# Received {64576 Socket stats: packets=7, drops=3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260107232557.2147760-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We see the following failure a few times a week:
# RUN global.data_steal ...
# tls.c:3280:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (10000) == -1 (-1)
# data_steal: Test failed
# FAIL global.data_steal
not ok 8 global.data_steal
The 10000 bytes read suggests that the child process did a recv()
of half of the data using the TLS ULP and we're now getting the
remaining half. The intent of the test is to get the child to
enter _TCP_ recvmsg handler, so it needs to enter the syscall before
parent installed the TLS recvmsg with setsockopt(SOL_TLS).
Instead of the 10msec sleep send 1 byte of data and wait for the
child to consume it.
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260106200205.1593915-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The resctrl selftest currently fails on Hygon CPUs that always supports
non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks
vendor detection for Hygon CPUs, preventing proper identification of
their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to
arch_supports_noncont_cat().
Link: https://lore.kernel.org/r/20251217030456.3834956-5-shenxiaochen@open-hieco.net
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The resctrl selftest currently fails on Hygon CPUs that support Platform
QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include
Hygon's vendor ID.
Link: https://lore.kernel.org/r/20251217030456.3834956-4-shenxiaochen@open-hieco.net
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The CPU vendor IDs are required to be unique bits because they're used
for vendor_specific bitmask in the struct resctrl_test.
Consider for example their usage in test_vendor_specific_check():
return get_vendor() & test->vendor_specific
However, the definitions of CPU vendor IDs in file resctrl.h is quite
subtle as a bitmask value:
#define ARCH_INTEL 1
#define ARCH_AMD 2
A clearer and more maintainable approach is to define these CPU vendor
IDs using BIT(). This ensures each vendor corresponds to a distinct bit
and makes it obvious when adding new vendor IDs.
Accordingly, update the return types of detect_vendor() and get_vendor()
from 'int' to 'unsigned int' to align with their usage as bitmask values
and to prevent potentially risky type conversions.
Furthermore, introduce a bool flag 'initialized' to simplify the
get_vendor() -> detect_vendor() logic. This ensures the vendor ID is
detected only once and resolves the ambiguity of using the same variable
'vendor' both as a value and as a state.
Link: https://lore.kernel.org/r/20251217030456.3834956-3-shenxiaochen@open-hieco.net
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Suggested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Change to adjust effective L3 cache size with SNC enabled change
introduced the snc_nodes_per_l3_cache() function to detect the Intel
Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs
sharing LLC with CPU0. The function was designed to return:
(1) >1: SNC mode is enabled.
(2) 1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually
less than #CPUs in node0. This results in snc_nodes_per_l3_cache()
returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size():
*cache_size /= snc_nodes_per_l3_cache();
Causing the resctrl selftest to fail with:
"Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC
mode is not supported on the platform.
Updated commit log to fix commit has issues:
Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20251217030456.3834956-2-shenxiaochen@open-hieco.net
Fixes: a1cd99e700 ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When /sys/kernel/tracing/buffer_size_kb is less than 12KB,
the test_multiple_writes test will stall and wait for more
input due to insufficient buffer space.
Check current buffer_size_kb value before the test. If it is
less than 12KB, it temporarily increase the buffer to 12KB,
and restore the original value after the tests are completed.
Link: https://lore.kernel.org/r/20260109033620.25727-1-fushuai.wang@linux.dev
Fixes: 37f4660138 ("selftests/tracing: Add basic test for trace_marker_raw file")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a missing close(srv_fd) call, and use EXPECT_EQ() to check the
result.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Fixes: f83d51a5bd ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Link: https://lore.kernel.org/r/20260101134102.25938-2-gnoack3000@gmail.com
[mic: Use EXPECT_EQ() and update commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
memblock_free_pages() currently takes both a struct page * and the
corresponding PFN. The page pointer is always derived from the PFN at
call sites (pfn_to_page(pfn)), making the parameter redundant and also
allowing accidental mismatches between the two arguments.
Simplify the interface by removing the struct page * argument and
deriving the page locally from the PFN, after the deferred struct page
initialization check. This keeps the behavior unchanged while making
the helper harder to misuse.
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Link: https://patch.msgid.link/tencent_F741CE6ECC49EE099736685E60C0DBD4A209@qq.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Add test cases for the validation checks in svm_set_nested_state(), and
allow the test to run with SVM as well as VMX. The SVM test also makes
sure that KVM_SET_NESTED_STATE accepts GIF being set or cleared if
EFER.SVME is cleared, verifying a recently fixed bug where GIF was
incorrectly expected to always be set when EFER.SVME is cleared.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251121204803.991707-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
The assert messages do not add much value, so use TEST_ASSERT_EQ(),
which also nicely displays the addresses in hex. While at it, also
assert the values of state->flags.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251121204803.991707-4-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the
"page table entry" makes it unnecessarily difficult to quickly understand
what callers are doing.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add L1 SVM code and generalize the setup code to work for both VMX and
SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1
code, doing some renaming (e.g. EPT -> TDP), and having setup code for
both SVM and VMX in test_dirty_log().
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
According to the APM, NPT walks are treated as user accesses. In
preparation for supporting NPT mappings, set the 'user' bit on NPTs by
adding a mask of bits to always be set on PTEs in kvm_mmu.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Implement nCR3 and NPT initialization functions, similar to the EPT
equivalents, and create common TDP helpers for enablement checking and
initialization. Enable NPT for nested guests by default if the TDP MMU
was initialized, similar to VMX.
Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
and S bits related to confidential VMs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-17-seanjc@google.com
[sean: apply Yosry's fixup for ncr3_gpa]
Signed-off-by: Sean Christopherson <seanjc@google.com>
In preparation for generalizing the nested dirty logging test, checking
if either EPT or NPT is enabled will be needed. To avoid needing to gate
the kvm_cpu_has_ept() call by the CPU type, make sure the function
returns false if VMX is not available instead of trying to read VMX-only
MSRs.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that the functions are no longer VMX-specific, move them to
processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rework tdp_map() and friends to use __virt_pg_map() and drop the custom
EPT code in __tdp_pg_map() and tdp_create_pte(). The EPT code and
__virt_pg_map() are practically identical, the main differences are:
- EPT uses the EPT struct overlay instead of the PTE masks.
- EPT always assumes 4-level EPTs.
To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and
X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass
in the EPT PTE masks along with the root page level (which is currently
hardcoded to '4').
Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's
no reason to multiplex bits in the selftests, e.g. selftests aren't trying
to shadow guest PTEs and thus don't care about funnelling protections into
a common permissions check.
Another benefit of reusing the code is having separate handling for
upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
large bit on a 4K PTE in the EPTs.
For all intents and purposes, no functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a stage-2 MMU instance so that architectures that support nested
virtualization (more specifically, nested stage-2 page tables) can create
and track stage-2 page tables for running L2 guests. Plumb the structure
into common code to avoid cyclical dependencies, and to provide some line
of sight to having common APIs for creating stage-2 mappings.
As a bonus, putting the member in common code justifies using stage2_mmu
instead of tdp_mmu for x86.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The root GPA is now retrieved from the nested MMU, stop passing VMX
metadata. This is in preparation for making these functions work for
NPTs as well.
Opportunistically drop tdp_pg_map() since it's unused.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
prepare_eptp() currently allocates new EPTs for each vCPU. memstress has
its own hack to share the EPTs between vCPUs. Currently, there is no
reason to have separate EPTs for each vCPU, and the complexity is
significant. The only reason it doesn't matter now is because memstress
is the only user with multiple vCPUs.
Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use
it everywhere to replace prepare_eptp(). Drop 'eptp' and 'eptp_hva' from
'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built
from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need
to be passed in along with vmx_pages. Dynamically allocate the TDP MMU
structure to avoid a cyclical dependency between kvm_util_arch.h and
kvm_util.h.
Remove the workaround in memstress to copy the EPT root between vCPUs
since that's now the default behavior.
Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid
recreating the same mess that KVM has with respect to "nested" MMUs, e.g.
does nested refer to the stage-2 page tables created by L1, or the stage-1
page tables created by L2?
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping
functions. Introduce helpers to read/write different PTE bits given a
kvm_mmu.
Drop the 'global' bit definition as it's currently unused, but leave the
'user' bit as it will be used in coming changes. Opportunisitcally
rename 'large' to 'huge' as it's more consistent with the kernel naming.
Leave PHYSICAL_PAGE_MASK alone, it's fixed in all page table formats and
a lot of other macros depend on it. It's tempting to move all the other
macros to be per-struct instead, but it would be too much noise for
little benefit.
Keep c_bit and s_bit in vm->arch as they used before the MMU is
initialized, through __vmcreate() -> vm_userspace_mem_region_add() ->
vm_mem_add() -> vm_arch_has_protected_memory().
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rename accessors to is_<adjective>_pte()]
Link: https://patch.msgid.link/20251230230150.4150236-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add an arch structure+field in "struct kvm_mmu" so that architectures can
track arch-specific information for a given MMU.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In preparation for generalizing the x86 virt mapping APIs to work with
TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper
functions instead of operating on vm->mmu directly.
Opportunistically swap the order of the check in virt_get_pte() to first
assert that the parent is the PGD, and then check that the PTE is present,
as it makes more sense to check if the parent PTE is the PGD/root (i.e.
not a PTE) before checking that the PTE is PRESENT.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rebase on common kvm_mmu structure, rewrite changelog]
Link: https://patch.msgid.link/20251230230150.4150236-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1
MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for
both stage-1 and stage-2 MMUs, without creating the potential for subtle
bugs, e.g. due to consuming on vm->pgtable_levels when operating a stage-2
MMU.
Encapsulate the existing de facto MMU in "struct kvm_vm", e.g instead of
burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in
____vm_create(), and in the hopes that other architectures can utilize the
formalized MMU structure if/when they too support stage-2 page tables.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
without bleeding the EPT hack into the common APIs.
While commit 0944442045 ("selftests: kvm: add test for dirty logging
inside nested guests") is _very_ light on details, the most likely
explanation is that vmx_dirty_log_test was attempting to avoid taking an
EPT Violation on the first _write_ from L2.
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
WRITE_ONCE(*a, 1); <===
GUEST_SYNC(true);
...
}
When handling read faults in the shadow MMU, KVM opportunistically creates
a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
writes in order to emulate Dirty-bit updates. By setting A/D bits in the
test's EPT entries, the above READ+WRITE will fault only on the read, and
in theory expose the bug fixed by KVM commit 1f4e5fc83a ("KVM: x86: fix
nested guest live migration with PML"). If the Dirty bit is NOT set, the
test will get a false pass due; though again, in theory.
However, the test is flawed (and always was, at least in the versions
posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without
a check on the dirty bitmap after the READ_ONCE(), the check after the
first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
been updated by the read fault, not by PML.
Furthermore, the subsequent behavior in the test's l2_guest_code()
effectively hides the flawed test behavior, as the straight writes to a
new L2 GPA fault also trigger the KVM bug, and so the test will still
detect the failure due to lack of isolation between the two testcases
(Read=>Write vs. Write=>Write).
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
properly fix the test once the TDP code uses the common MMU APIs, at which
point it will be trivially easy for the test to retrieve the EPT PTE and
set the Dirty bit as needed.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rewrite changelog to explain the situation]
Link: https://patch.msgid.link/20251230230150.4150236-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace the struct overlay with explicit bitmasks, which is clearer and
less error-prone. See commit f18b4aebe1 ("kvm: selftests: do not use
bitfields larger than 32-bits for PTEs") for an example of why bitfields
are not preferable.
Remove the unused PAGE_SHIFT_4K definition while at it.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the functions from nested_* to tdp_* to make their purpose
clearer.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
On x86, KVM selftests use memslot 0 for all the default regions used by
the test infrastructure. This is an implementation detail.
nested_map_memslot() is currently used to map the default regions by
explicitly passing slot 0, which leaks the library implementation into
the caller.
Rename the function to a very verbose
nested_identity_map_default_memslots() to reflect what it actually does.
Add an assertion that only memslot 0 is being used so that the
implementation does not change from under us.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The function is only used in processor.c, drop the declaration in
processor.h and make it static.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The function get_desc64_base() performs a series of bitwise left shifts on
fields of various sizes. More specifically, when performing '<< 24' on
'desc->base2' (which is a u8), 'base2' is promoted to a signed integer
before shifting.
In a scenario where base2 >= 0x80, the shift places a 1 into bit 31,
causing the 32-bit intermediate value to become negative. When this
result is cast to uint64_t or ORed into the return value, sign extension
occurs, corrupting the upper 32 bits of the address (base3).
Example:
Given:
base0 = 0x5000
base1 = 0xd6
base2 = 0xf8
base3 = 0xfffffe7c
Expected return: 0xfffffe7cf8d65000
Actual return: 0xfffffffff8d65000
Fix this by explicitly casting the fields to 'uint64_t' before shifting
to prevent sign extension.
Signed-off-by: MJ Pooladkhay <mj@pooladkhay.com>
Link: https://patch.msgid.link/20251222174207.107331-1-mj@pooladkhay.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a few extra TPR / CR8 tests to x86's xapic_state_test to see if:
* TPR is 0 on reset,
* TPR, PPR and CR8 are equal inside the guest,
* TPR and CR8 read equal by the host after a VMExit
* TPR borderline values set by the host correctly mask interrupts in the
guest.
These hopefully will catch the most obvious cases of improper TPR sync or
interrupt masking.
Do these tests both in x2APIC and xAPIC modes.
The x2APIC mode uses SELF_IPI register to trigger interrupts to give it a
bit of exercise too.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
[sean: put code in separate test]
Link: https://patch.msgid.link/20251205224937.428122-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Current release - fix to a fix:
- net: do not write to msg_get_inq in callee
- arp: do not assume dev_hard_header() does not change skb->head
Current release - regressions:
- wifi: mac80211: don't iterate not running interfaces
- eth: mlx5: fix NULL pointer dereference in ioctl module EEPROM
Current release - new code bugs:
- eth: bnge: add AUXILIARY_BUS to Kconfig dependencies
Previous releases - regressions:
- eth: mlx5: dealloc forgotten PSP RX modify header
Previous releases - always broken:
- ping: fix ICMP out SNMP stats double-counting with ICMP sockets
- bonding: preserve NETIF_F_ALL_FOR_ALL across TSO updates
- bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress
- eth: bnxt: fix potential data corruption with HW GRO/LRO
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmlf6N8ACgkQMUZtbf5S
Irt9lxAAote4zPojTEJvE1DkABVsx31gPm6DF8zEae39XsFKtOMfb876s5bchOUA
Rkd27+k28l/U5HFrrhUhqDlaOikjXx6baT12qTPsuxGrOL6Um23EfIgVuNFzxMwE
lcBgkSNe1tfT8WBirWEVWLc5xme+vvll7ViX7kkQJ7fEdk1mvTPZ5roIq1+pE1U0
V5Gu4l9QBcWC/IymAO8Z2UE08terMcYt1G4H6mSIoooeMM1QElbPwVEiRWAzJ/NP
9cTjvnHJDAdRnA4bMa76CGWxg4wgPhgj3+ydlouWjgJADL6hlMj4sIZxaXgjDuoE
XyCEuk6Y/rUTSmX1yn7rha9FQwJOyMu9XlEjnNSvH0LRdnSa7xO7NzeXtrWv7HSg
kQOMTnMVgVlabOuMbR6xNqY6UyulQgK/2E56RgOO4Iw6U7crZsbyZx3OFkIKhq8g
ZWaRBQNdYBBftjJA7FwQSyj/K75sLfbYAS5YizguNyFPBCBhSBJdgFWoGb+XhT0/
k0KwsX/NWN0apHmZNiD4mT/UdX2PhJRdiTWPczNyEzqJcxh1P2HMWHVZOsIyZQHT
EK3w6LLqp1eshEERPqFsqCFYX4LUuifQbPPF0kBkL1hTMa2NuXnkIsOzbcWal88o
qdej9TY9VC5ycabgDI4/9ZNhnAzho1Nk/e6YuHsBcu2pIVDKgNI=
=vR9r
-----END PGP SIGNATURE-----
Merge tag 'net-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and wireless.
Current release - fix to a fix:
- net: do not write to msg_get_inq in callee
- arp: do not assume dev_hard_header() does not change skb->head
Current release - regressions:
- wifi: mac80211: don't iterate not running interfaces
- eth: mlx5: fix NULL pointer dereference in ioctl module EEPROM
Current release - new code bugs:
- eth: bnge: add AUXILIARY_BUS to Kconfig dependencies
Previous releases - regressions:
- eth: mlx5: dealloc forgotten PSP RX modify header
Previous releases - always broken:
- ping: fix ICMP out SNMP stats double-counting with ICMP sockets
- bonding: preserve NETIF_F_ALL_FOR_ALL across TSO updates
- bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress
- eth: bnxt: fix potential data corruption with HW GRO/LRO"
* tag 'net-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
arp: do not assume dev_hard_header() does not change skb->head
net: enetc: fix build warning when PAGE_SIZE is greater than 128K
atm: Fix dma_free_coherent() size
tools: ynl: don't install tests
net: do not write to msg_get_inq in callee
bnxt_en: Fix NULL pointer crash in bnxt_ptp_enable during error cleanup
net: usb: pegasus: fix memory leak in update_eth_regs_async()
net: 3com: 3c59x: fix possible null dereference in vortex_probe1()
net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset
wifi: mac80211: collect station statistics earlier when disconnect
wifi: mac80211: restore non-chanctx injection behaviour
wifi: mac80211_hwsim: disable BHs for hwsim_radio_lock
wifi: mac80211: don't iterate not running interfaces
wifi: mac80211_hwsim: fix typo in frequency notification
wifi: avoid kernel-infoleak from struct iw_point
net: airoha: Fix schedule while atomic in airoha_ppe_deinit()
selftests: netdevsim: add carrier state consistency test
net: netdevsim: fix inconsistent carrier state after link/unlink
selftests: drv-net: Bring back tool() to driver __init__s
net/sched: act_api: avoid dereferencing ERR_PTR in tcf_idrinfo_destroy
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmlfxBEACgkQpmLzj2vt
YEl21w//SDcIehN5GWJn6pMKoS8mYj+S17vo9BqQ9S68dQbhTww8D1jZ9k50+xbU
7YLzwiMpDEex8omY9zs5EYiEBf6RCMSb/y5dGt8eyqLj+cHjqmmUK+4yjq/mKTjz
SWh7A9A/WqFVbiQz43wOqu/kCBg10cNxKuOXmZm9LknJSb5PJRd9bd9/HGKWFDg2
VstoZC/n/+mQoMsvDvMtbOJw/iLSGr/wDkhFkIzSbHdCnRKRmCwBOrjGonwzPOet
8c0kNSZt9kpzhZCmOXTDWhGDr+fpaxWtkpX+WkDZ7Guxjb/4wRkLEMDqs77WRwPx
cI1ko1ug3oPv0e0dqG1+2VQ1nJ5h6Q83gp/l+R9H4P6aT874IAcmpGoXa04u82ob
vlkOzfQjBb8qe/O4+KG9lj6Wclr3m3nvZ07vfjQRCnizwnAo5z+NjZqWJPJcuoDf
7KInyQI2+2KHWqLmGtAXbXfK6xb7Kcxx9ysKcN32kWSubf6FBg0GYclhlAJyqRiC
/Z1yPNFWroJYChNN3p4IIh4hr2SCMj/Qzil456/JeXjt/3Gqbg+DLmgY8IgUng3g
M6OTMvXR1UP9i73mb0Z22oN0x4s4Ijysjnj4X2AhL+K+lCrZBJ84j4lt+zY/N39C
IC5pqbvPKZAvZ5iSnz07g9QQ0CbDCMDzst96jigiMN/aTtx6mm8=
=s/hE
-----END PGP SIGNATURE-----
Merge tag 'hid-for-linus-2026010801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- build fix for HID-BPF (Benjamin Tissoires)
- fix for potential buffer overflow in i2c-hid (Kwok Kin Ming)
- a couple of selftests/hid fixes (Peter Hutterer)
- fix for handling pressure pads in hid-multitouch (Peter Hutterer)
- fix for potential NULL pointer dereference in intel-thc-hid (Even Xu)
- fix for interrupt delay control in intel-thc-hid (Even Xu)
- fix finger release detection on some VTL-class touchpads (DaytonCL)
- fix for correct enumeration on intel-ish-hid systems with no sensors
(Zhang Lixu)
- assorted device ID additions and device-specific quirks
* tag 'hid-for-linus-2026010801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (21 commits)
HID: logitech: add HID++ support for Logitech MX Anywhere 3S
HID: Elecom: Add support for ELECOM M-XT3DRBK (018C)
HID: quirks: work around VID/PID conflict for appledisplay
HID: Apply quirk HID_QUIRK_ALWAYS_POLL to Edifier QR30 (2d99:a101)
HID: i2c-hid: fix potential buffer overflow in i2c_hid_get_report()
selftests/hid: add a test for the Digitizer/Button Type pressurepad
selftests/hid: use a enum class for the different button types
selftests/hid: require hidtools 0.12
HID: multitouch: set INPUT_PROP_PRESSUREPAD based on Digitizer/Button Type
HID: quirks: Add another Chicony HP 5MP Cameras to hid_ignore_list
HID: Intel-thc-hid: Intel-thc: Add safety check for reading DMA buffer
hid: intel-thc-hid: Select SGL_ALLOC
selftests/hid: fix bpf compilations due to -fms-extensions
HID: bpf: fix bpf compilation with -fms-extensions
HID: Intel-thc-hid: Intel-thc: Fix wrong register reading
HID: multitouch: add MT_QUIRK_STICKY_FINGERS to MT_CLS_VTL
HID: intel-ish-hid: Reset enum_devices_done before enumeration
HID: intel-ish-hid: Update ishtp bus match to support device ID table
HID: Intel-thc-hid: Intel-thc: fix dma_unmap_sg() nents value
HID: playstation: Center initial joystick axes to prevent spurious events
...
Add option to enable slow workload type hints. User can specify
"slow" as the command line argument to enable slow workload type hints.
There are two slow workload type hints: "power" and "performance".
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251218222559.4110027-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If no kunitconfig_paths are passed to LinuxSourceTree() it falls back to
DEFAULT_KUNITCONFIG_PATH. This resolution only works when the current
working directory is the root of the source tree. This works by chance
when running the full testsuite through the default unittest runner, as
some tests will change the current working directory as a side-effect of
'kunit.main()'. When running a single testcase or using pytest, which
resets the working directory for each test, this assumption breaks.
Explicitly specify an empty kunitconfig for the affected tests.
Link: https://lore.kernel.org/r/20260107015936.2316047-2-davidgow@google.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Running the KUnit testsuite through pytest fails, as the function
test_data_path() is recognized as a test function. Its execution fails
as pytest tries to resolve the 'path' argument as a fixture which does
not exist.
Rename the function, so the helper function is not incorrectly
recognized as a test function.
Link: https://lore.kernel.org/r/20260107015936.2316047-1-davidgow@google.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We have to resort to a bit of a hack: python-libevdev gets the
properties from libevdev at module init time. If libevdev hasn't been
rebuilt with the new property it won't be automatically populated. So we
hack around this by constructing the property manually.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Instead of multiple spellings of a string-provided argument, let's make
this a tad more type-safe and use an enum here.
And while we do this fix the two wrong devices:
- elan_04f3_313a (HP ZBook Fury 15) is discrete button pad
- dell_044e_1220 (Dell Precision 7740) is a discrete button pad
Equivalent hid-tools commit
8300a55bf4
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Not all our tests really require it but since it's likely pip-installed
anyway it's trivial to require the new version, just in case we want to
start cleaning up other bits.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Similar to commit 835a507535 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f4 ("bpftool: Fix build warnings
due to MS extensions")
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
..
struct freelist_counters;
};
Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
GCC insists in placing attributes before the declarators in function
declarations. Now that GCC supports btf_decl_tag and therefore __tag1
and __tag2 expand to actual attributes, the compiler is complaining
about it for
static __noinline int foo(int x __tag1 __tag2) __tag1 __tag2
progs/test_btf_decl_tag.c:36:1: error: attributes should be specified \
before the declarator in a function definition
This patch simply places the tags before the declarator.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-3-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
GCC 16 has changed the semantics of -Wunused-but-set-variable, as well
as introducing new options -Wunused-but-set-variable={0,1,2,3} to
adjust the level of support.
One of the changes is that GCC now treats 'sum += 1' and 'sum++' as
non-usage, whereas clang (and GCC < 16) considers the first as usage
and the second as non-usage, which is sort of inconsistent.
The GCC 16 -Wunused-but-set-variable=2 option implements the previous
semantics of -Wunused-but-set-variable, but since it is a new option,
it cannot be used unconditionally for forward-compatibility, just for
backwards-compatibility.
So this patch adds pragmas to the two self-tests impacted by this,
progs/free_timer.c and progs/rcu_read_lock.c, to make gcc to ignore
-Wunused-but-set-variable warnings when compiling them with GCC > 15.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44677#c25 for details
on why this regression got introduced in GCC upstream.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-2-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test coverage for the new BPF_F_CPU and BPF_F_ALL_CPUS flags support
in percpu maps. The following APIs are exercised:
* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem()
* bpf_map__update_elem()
* bpf_map_lookup_elem_flags()
* bpf_map__lookup_elem()
For lru_percpu_hash map, set max_entries to
'libbpf_num_possible_cpus() + 1' and only use the first
'libbpf_num_possible_cpus()' entries. This ensures a spare entry is always
available in the LRU free list, avoiding eviction.
When updating an existing key in lru_percpu_hash map:
1. l_new = prealloc_lru_pop(); /* Borrow from free list */
2. l_old = lookup_elem_raw(); /* Found, key exists */
3. pcpu_copy_value(); /* In-place update */
4. bpf_lru_push_free(); /* Return l_new to free list */
Also add negative tests to verify that non-percpu array and hash maps
reject the BPF_F_CPU and BPF_F_ALL_CPUS flags.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260107022022.12843-8-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit adds a test case for netdevsim carrier state consistency.
Specifically, the added test verifies the carrier state during the
following operations:
1. Unlink two netdevsims
2. ifdown one netdevsim, then ifup again
3. Link the netdevsims again
4. ifdown one netdevsim, then ifup again
These steps verifies that the carrier is UP iff two netdevsims are
linked and ifuped.
Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/481e2729e53b6074ebfc0ad85764d8feb244de8c.1767624906.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pp_alloc_fail.py test (which doesn't run in NIPA CI?) uses tool, add
back the import.
Resolves:
ImportError: cannot import name 'tool' from 'lib.py'
Fixes: 68a052239f ("selftests: drv-net: update remaining Python init files")
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260105163319.47619-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce minimal tests. These can serve as simple illustrative
examples, and as templates when writing new tests.
When adding new cases, it can be easier to extend an existing base
test rather than start from scratch. The existing tests all focus on
real, often non-trivial, features. It is not obvious which to take as
starting point, and arguably none really qualify.
Add two tests
- the client test performs the active open and initial close
- the server test implements the passive open and final close
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260105172529.3514786-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test currently SKIPs if the symmetric RSS xfrm is not enabled
by default. This leads to spurious SKIPs in the Intel CI reporting
results to NIPA.
Testing on CX7:
# ./drivers/net/hw/rss_input_xfrm.py
TAP version 13
1..2
ok 1 rss_input_xfrm.test_rss_input_xfrm_ipv4 # SKIP Test requires IPv4 connectivity
# Sym input xfrm already enabled: {'sym-or-xor'}
ok 2 rss_input_xfrm.test_rss_input_xfrm_ipv6
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0
# ethtool -X eth0 xfrm none
# ./drivers/net/hw/rss_input_xfrm.py
TAP version 13
1..2
ok 1 rss_input_xfrm.test_rss_input_xfrm_ipv4 # SKIP Test requires IPv4 connectivity
# Sym input xfrm configured: {'sym-or-xor'}
ok 2 rss_input_xfrm.test_rss_input_xfrm_ipv6
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0
Link: https://patch.msgid.link/20260104184600.795280-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPv6 addresses with the same scope are returned in reverse insertion
order, unlike IPv4. For example, when adding a -> b -> c, the list is
reported as c -> b -> a, while IPv4 preserves the original order.
This behavior causes:
a. When using `ip -6 a save` and `ip -6 a restore`, addresses are restored
in the opposite order from which they were saved. See example below
showing addresses added as 1::1, 1::2, 1::3 but displayed and saved
in reverse order.
# ip -6 a a 1::1 dev x
# ip -6 a a 1::2 dev x
# ip -6 a a 1::3 dev x
# ip -6 a s dev x
2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
inet6 1::3/128 scope global tentative
valid_lft forever preferred_lft forever
inet6 1::2/128 scope global tentative
valid_lft forever preferred_lft forever
inet6 1::1/128 scope global tentative
valid_lft forever preferred_lft forever
# ip -6 a save > dump
# ip -6 a d 1::1 dev x
# ip -6 a d 1::2 dev x
# ip -6 a d 1::3 dev x
# ip a d ::1 dev lo
# ip a restore < dump
# ip -6 a s dev x
2: x: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
inet6 1::1/128 scope global tentative
valid_lft forever preferred_lft forever
inet6 1::2/128 scope global tentative
valid_lft forever preferred_lft forever
inet6 1::3/128 scope global tentative
valid_lft forever preferred_lft forever
# ip a showdump < dump
if1:
inet6 ::1/128 scope host proto kernel_lo
valid_lft forever preferred_lft forever
if2:
inet6 1::3/128 scope global tentative
valid_lft forever preferred_lft forever
if2:
inet6 1::2/128 scope global tentative
valid_lft forever preferred_lft forever
if2:
inet6 1::1/128 scope global tentative
valid_lft forever preferred_lft forever
b. Addresses in pasta to appear in reversed order compared to host
addresses.
The ipv6 addresses were added in reverse order by commit e55ffac601
("[IPV6]: order addresses by scope"), then it was changed by commit
502a2ffd73 ("ipv6: convert idev_list to list macros"), and restored by
commit b54c9b98bb ("ipv6: Preserve pervious behavior in
ipv6_link_dev_addr()."). However, this reverse ordering within the same
scope causes inconsistency with IPv4 and the issues described above.
This patch aligns IPv6 address ordering with IPv4 for consistency
by changing the comparison from >= to > when inserting addresses
into the address list. Also updates the ioam6 selftest to reflect
the new address ordering behavior. Combine these two changes into
one patch for bisectability.
Link: https://bugs.passt.top/show_bug.cgi?id=175
Suggested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Acked-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260104032357.38555-1-yuhuang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the selftest to check that the metadata size check takes the
xdp_frame size into account in bpf_prog_test_run. The original
check (for meta size 256) was broken because the data frame supplied was
smaller than this, triggering a different EINVAL return. So supply a
larger data frame for this test to make sure we actually exercise the
check we think we are.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since LLVM commit 39e30508a7f6 ("[Driver][Sparc] Default to -mcpu=v9 for
32-bit Linux/sparc64 (#109278)"), clang defaults to -mcpu=v9 for 32-bit
SPARC builds. -mcpu=v9 generates instructions which are not recognized
by qemu-sparc and qemu-system-sparc.
Explicitly enforce -mcpu=v8 to generate compatible code.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260106-nolibc-sparc32-fix-v2-1-7c5cd6b175c2@weissschuh.net
This logic was added in commit 850fad7de8 ("selftests/nolibc: allow
test -include /path/to/nolibc.h") to allow the testing of -include
/path/to/nolibc.h. As it requires as special variable to activate, this
code is nearly never used. Furthermore it complicates the logic a bit.
Since commit a6a054c8ad ("tools/nolibc: add target to check header
usability") and commit 443c6467fc ("selftests/nolibc: always run
nolibc header check") the usability of -include /path/to/nolibc.h is
always checked anyways, making NOLIBC_SYSROOT=0 pointless.
Drop the special logic.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260104-nolibc-nolibc_sysroot-v1-1-98025ad99add@weissschuh.net
Keeping 'struct timespec' and 'struct __kernel_timespec' compatible
allows the source code to stay simple.
Validate that the types stay compatible.
The test is specific to nolibc and does not compile on other libcs, so
skip it there.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251220-nolibc-uapi-types-v3-10-c662992f75d7@weissschuh.net
Compiler reports potential uses of uninitialized variables in
mptcp_connect.c when xerror() is called from failure paths.
mptcp_connect.c:1262:11: warning: variable 'raw_addr' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
xerror() terminates execution by calling exit(), but it is not visible
to the compiler & assumes control flow may continue past the call.
Annotate xerror() with __noreturn so the compiler can correctly reason
about control flow and avoid false-positive uninitialized variable
warnings.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20260101172840.90186-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a basic config to run kunit tests on 32-bit big endian ARM.
Link: https://lore.kernel.org/r/20260102-kunit-armeb-v1-1-e8e5475d735c@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If a subtest itself reports success, but the outer testcase fails,
the whole testcase should be reported as a failure. However the status
is recalculated based on the test counts, overwriting the outer test
result. Synthesize a failed test in this case to make sure the failure
is not swallowed.
Link: https://lore.kernel.org/r/20251230-kunit-nested-failure-v1-2-98cfbeb87823@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently there is a lack of tests validating the result reporting from
nested tests. Add one, it will also be used to validate upcoming changes
to the nested test parsing.
Link: https://lore.kernel.org/r/20251230-kunit-nested-failure-v1-1-98cfbeb87823@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, kunit.py ignores the KBUILD_OUTPUT env variable and always
defaults to .kunit in the working directory. This behavior is inconsistent
with standard Kbuild behavior, where KBUILD_OUTPUT defines the build
artifact location.
This patch modifies kunit.py to respect KBUILD_OUTPUT if set. A .kunit
subdirectory is created inside KBUILD_OUTPUT to avoid polluting the build
directory.
Link: https://lore.kernel.org/r/20260106-kunit-kbuild_output-v2-1-582281797343@gmail.com
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The top level kselftest Makefile supports an option FORCE_TARGETS which
causes any failures during the build to be propagated to the exit status
of the top level make, useful during build testing. Currently the recursion
done by the arm64 selftests ignores this option, meaning arm64 failures are
not reported via this mechanism. Add the logic to implement FORCE_TARGETS
so that it works for arm64.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
Unlike the cxl_pci class driver that opportunistically enables memory
expansion with no other dependent functionality, CXL accelerator drivers
have distinct PCIe-only and CXL-enhanced operation states. If CXL is
available some additional coherent memory/cache operations can be enabled,
otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
This constitutes a new mode of operation where the caller of
devm_cxl_add_memdev() wants to make a "go/no-go" decision about running
in CXL accelerated mode or falling back to PCIe-only operation. Part of
that decision making process likely also includes additional
CXL-acceleration-specific resource setup. Encapsulate both of those
requirements into 'struct cxl_memdev_attach' that provides a ->probe()
callback. The probe callback runs in cxl_mem_probe() context, after the
port topology is successfully attached for the given memdev. It supports
a contract where, upon successful return from devm_cxl_add_memdev(),
everything needed for CXL accelerated operation has been enabled.
Additionally the presence of @cxlmd->attach indicates that the accelerator
driver be detached when CXL operation ends. This conceptually makes a CXL
link loss event mirror a PCIe link loss event which results in triggering
the ->remove() callback of affected devices+drivers. A driver can re-attach
to recover back to PCIe-only operation. Live recovery, i.e. without a
->remove()/->probe() cycle, is left as a future consideration.
[ dj: Repalce with updated commit log from Dan ]
Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251216005616.3090129-7-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
In all cases the device that created the 'struct cxl_dev_state' instance is
also the device to host the devm cleanup of devm_cxl_add_memdev(). This
simplifies the function prototype, and limits a degree of freedom of the
API.
Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Link: https://patch.msgid.link/20251216005616.3090129-6-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
* rcu-torture.20260104a:
rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
rcutorture: Prevent concurrent kvm.sh runs on same source tree
torture: Include commit discription in testid.txt
torture: Make config2csv.sh properly handle comments in .boot files
torture: Make kvm-series.sh give run numbers and totals
torture: Make kvm-series.sh give build numbers and totals
torture: Parallelize kvm-series.sh guest-OS execution
rcutorture: Add context checks to rcu_torture_timer()
When kvm.sh is killed, its child processes (make, gcc, qemu, etc.) may
continue running. This prevents new kvm.sh instances from starting even
though the parent is gone.
Add a --kill-previous option that uses fuser(1) to terminate all
processes holding the flock file before attempting to acquire it. This
provides a clean way to recover from stale/zombie kvm.sh runs which
sometimes may have lots of qemu and compiler processes still disturbing.
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Add flock-based locking to kvm.sh to prevent multiple instances from
running concurrently on the same source tree. This prevents build
failures caused by one instance's "make clean" deleting generated files
while another instance is building causing build failures.
The lock file is placed in the rcutorture directory and added to
.gitignore.
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Add ptrace support, as it will be useful in UML.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
[Thomas: drop va_args usage and linux/uio.h inclusion]
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
-- Fix for build failures in tests that use an empty FIXTURE() seen in
Android's build environment, which uses -D_FORTIFY_SOURCE=3), a build
failure occurs in tests that use an empty FIXTURE().
-- Fix func_traceonoff_triggers.tc sometimes failures on Kunpeng-920 board
resulting from including transient trace file name in checksum compare.
-- Fix to remove available_events requirement from toplevel-enable for
instance as it isn't a valid requirement for this test.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmlX++YACgkQCwJExA0N
QxyWOQ/+M/BWTZxNKpTVfdy3ouQC96u540/Aeu8F0muP8dL2kEPVbcv9kPcEdMR6
uWoE5IRK1DYj1mKHU81VsW338IEbWWLQVDiPc+6fIoW2WoLAKbxj7IpOxopgk1tA
gtFkXGwxkKTioNLtdCVmsMAcb+DRuroIpKNngIs0vn/yrZyR0ovuw8YuAAzBdXFM
KaKMcDhEadbeRs9yLa3UTDHYS7y+7a+1ZvoUr5gM8L9rvNIGjnUpacXVNpdoscBw
zQnd9Y0dWEKvjsCW9HGJVlAhNHm5agyL2omF5gjQBd7GQ9c+8udKvRUZ4HSpLHMd
MGT5aQsvw4c+iDlkaI0oFitPN1HGkR5rDzrwrFnOEYZ+aZqs+5mCNpyS0vB4De77
uLh1/AoO0dZ+tINQoGT7T4nLz5YYTBZBdTuuTNU292nDtvzegD+N82J8K/qt+Rcp
+dYJQ5QUsJvhQXjgiO7EGXRt+p3Z+b4T9vyQbs0+jb0nXlLTfIZbpSAoKNFkVpIy
l0G4f9zQf7DhEWghPh2lfwMZVH9FyBlEe9JkQfVQ1765Bd4mt3CM48o2so6op8cU
N1SXSYKhwqXXH5HZBGIwDKsd+d3wB0JqsaGIt8NG0os/jpKzXtCVRZ+W9RZwukhx
egk/kt51p3Gg+4wzFn5l0Dg0q62zfbWNecmdnBkTQMFceKUc5Ls=
=UYYR
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- Fix for build failures in tests that use an empty FIXTURE() seen in
Android's build environment, which uses -D_FORTIFY_SOURCE=3, a build
failure occurs in tests that use an empty FIXTURE()
- Fix func_traceonoff_triggers.tc sometimes failures on Kunpeng-920
board resulting from including transient trace file name in checksum
compare
- Fix to remove available_events requirement from toplevel-enable for
instance as it isn't a valid requirement for this test
* tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest/harness: Use helper to avoid zero-size memset warning
selftests/ftrace: Test toplevel-enable for instance
selftests/ftrace: traceonoff_triggers: strip off names
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmlX7MMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvuYEACG0VFYmcqmB4JZygecJB3xaxhbVIrCbjFv
Vmc0XNTkcCpjYAv1jpkS5F3nkJhzZlFNn9xOaP/O8E+6tSctFIre7qjMRpxZM3yl
GA+MqPI+zNbpYMgsoAH/XTASTVfaTEPOlaoAPQeo8Ey3JRw3Ko1IDNU7zIYK94Xl
rSAeT65W7vJ+HBjctBoCZYMsE2x0Sn0yrVctkL1mMusQwIg6oMhJ1w1p36P17Mc1
YgLWQYtfK+eogdTM0Jh9RvDtVJL3WT1I2Ii3KBdCgryY7iSxFXvM0pm1lrOBH+kI
4bKHTylBnjfmxv7dlz3jHwRmahwdXDk7rpq1EMPygDSj835h3SgAFz3rm9nCUjNI
xWyEZeN6z4ykdOlqJ6ghTnZTroRdM/12HbSV46n69tczxepG3Mn1i3gBd4UQhn5T
z6aqa7akIsynlzOnLgrwQjxgVhtfAHptrgAg7g7Kz9hq9xTAEPc2f9Nq7glmLP6f
wPMoy2lla69vk4Tlzh8TZpTHRPLYLHTtL5OQPM6dnyQ6MzWm2/PHJ/MNfV7/o+VR
W61BYXUz6d2q81c/I16VWVQvJ0nUa3v7hUGCLUeimQUg+ulyIlMX4wrOI7iYTFTy
V/4c3DHKEh9y/ptmCgv0jDZdwSoUYvXkn0vFe0fcF3q/T7xea4dok8mcXLcKhMuc
xPFtx92dhQ==
=4NB3
-----END PGP SIGNATURE-----
Merge tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Scan partition tables asynchronously for ublk, similarly to how nvme
does it. This avoids potential deadlocks, which is why nvme does it
that way too. Includes a set of selftests as well.
- MD pull request via Yu:
- Fix null-pointer dereference in raid5 sysfs group_thread_cnt
store (Tuo Li)
- Fix possible mempool corruption during raid1 raid_disks update
via sysfs (FengWei Shih)
- Fix logical_block_size configuration being overwritten during
super_1_validate() (Li Nan)
- Fix forward incompatibility with configurable logical block size:
arrays assembled on new kernels could not be assembled on older
kernels (v6.18 and before) due to non-zero reserved pad rejection
(Li Nan)
- Fix static checker warning about iterator not incremented (Li Nan)
- Skip CPU offlining notifications on unmapped hardware queues
- bfq-iosched block stats fix
- Fix outdated comment in bfq-iosched
* tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block, bfq: update outdated comment
blk-mq: skip CPU offline notify on unmapped hctx
selftests/ublk: fix Makefile to rebuild on header changes
selftests/ublk: add test for async partition scan
ublk: scan partition in async way
block,bfq: fix aux stat accumulation destination
md: Fix forward incompatibility from configurable logical block size
md: Fix logical_block_size configuration being overwritten
md: suspend array while updating raid_disks via sysfs
md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
md: Fix static checker warning in analyze_sbs
With trusted args now being the default, passing NULL to kfunc
parameters that are pointers causes verifier rejection rather than a
runtime error. The test_bpf_nf test was failing because it attempted to
pass NULL to bpf_xdp_ct_lookup() to verify runtime error handling.
Since the NULL check now happens at verification time, remove the
runtime test case that passed NULL to the bpf_tuple parameter and
instead add verification-time tests to ensure the verifier correctly
rejects programs that pass NULL to trusted arguments.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-11-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The cgroup_hierarchical_stats selftests uses an fentry program attached
to cgroup_attach_task and then passes the received &dst_cgrp->self to
the css_rstat_updated() kfunc. The verifier now assumes that all kfuncs
only takes trusted pointer arguments, and pointers received by fentry
are not marked trustes by default.
Use a tp_btf program in place for fentry for this test, pointers
received by tp_btf programs are marked trusted by the verifier.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-10-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As verifier now assumes that all kfuncs only takes trusted pointer
arguments, passing 0 (NULL) to a kfunc that doesn't mark the argument as
__nullable or __opt will be rejected with a failure message of: Possibly
NULL pointer passed to trusted arg<n>
Pass a non-null value to the kfunc to test the expected failure mode.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-9-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The rbtree_api_use_unchecked_remove_retval() selftest passes a pointer
received from bpf_rbtree_remove() to bpf_rbtree_add() without checking
for NULL, this was earlier caught by __check_ptr_off_reg() in the
verifier. Now the verifier assumes every kfunc only takes trusted pointer
arguments, so it catches this NULL pointer earlier in the path and
provides a more accurate failure message.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-8-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With trusted args now being the default, the NULL pointer check runs
before type-specific validation. Update test3 to expect the new error
message "Possibly NULL pointer passed to trusted arg0" instead of the
old dynptr-specific error message.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
without 'netfilter: nft_set_pipapo: fix range overlap detection':
reject overlapping range on add 0s [FAIL]
Returned success for add { 1.2.3.4 . 1.2.4.1-1.2.4.2 } given set:
table inet filter {
[..]
elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
1.2.3.0-1.2.3.4 . 1.2.4.2 counter packets 0 bytes 0 }
}
The element collides with existing ones and was not added, but kernel
returned success to userspace.
Signed-off-by: Florian Westphal <fw@strlen.de>
Currently, the testid.txt file in the top-level directory of the
rcutorture results contains the output of "git rev-parse HEAD", which
just gives the full SHA-1 of the current commit. This is followed by
the output of "git status", which is further followed by the output of
"git diff". This works, but is less than helpful to human readers
scanning a list of commits.
This commit therefore instead uses "git show --oneline --no-patch HEAD",
which provides a short SHA-1, but also the names of any branches and
the commit's title.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
As in strip the "#" and everything after it and *then* tokenize.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
The kvm-series.sh script can easily be convinced to run on the order of
1,000 guest OSes, so some sort of progress indicator would be helpful.
This commit therefore updates the "Starting" output lines to read as in
the following example, adding the ("3 of 4"):
Starting TREE02/1.7e0ad1b49057 using 8 CPUs (4 of 4) Sat Nov 8 10:51:06 PM PST 2025
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
The kvm-series.sh script can easily be convinced to do on the order
of 1,000 builds, so some sort of progress indicator would be helpful.
This commit therefore updates the "Starting" output lines to read
as in the following example, adding the ("2 of 4"):
Starting TREE01/1.7e0ad1b49057 (2 of 4) at Sat Nov 8 10:08:21 PM PST 2025
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Currently, kvm-series.sh builds and runs serially, which makes for
long execution times. This commit changes its logic to build all of
the needed kernels serially, but then run the corresponding guest OSes
concurrently in batches using the entire machine. On large systems,
this results in order-of-magnitude speedups of the guest-OS execution
portion of the runtime.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Now that RCU Tasks Trace has been re-implemented in terms of SRCU-fast,
the ->trc_ipi_to_cpu, ->trc_blkd_cpu, ->trc_blkd_node, ->trc_holdout_list,
and ->trc_reader_special task_struct fields are no longer used.
In addition, the rcu_tasks_trace_qs(), rcu_tasks_trace_qs_blkd(),
exit_tasks_rcu_finish_trace(), and rcu_spawn_tasks_trace_kthread(),
show_rcu_tasks_trace_gp_kthread(), rcu_tasks_trace_get_gp_data(),
rcu_tasks_trace_torture_stats_print(), and get_rcu_tasks_trace_gp_kthread()
functions and all the other functions that they invoke are no longer used.
Also, the TRC_NEED_QS and TRC_NEED_QS_CHECKED CPP macros are no longer used.
Neither are the rcu_tasks_trace_lazy_ms and rcu_task_ipi_delay rcupdate
module parameters and the TASKS_TRACE_RCU_READ_MB Kconfig option.
This commit therefore removes all of them.
[ paulmck: Apply Alexei Starovoitov feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
The order of the variables in the printf() doesn't match the text and
therefore veristat prints something like this:
Done. Processed 24 files, 0 programs. Skipped 62 files, 0 programs.
When it should print:
Done. Processed 24 files, 62 programs. Skipped 0 files, 0 programs.
Fix the order of variables in the printf() call.
Fixes: 518fee8bfa ("selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objects")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251231221052.759396-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When building kselftests with a toolchain that enables source
fortification (e.g., Android's build environment, which uses
-D_FORTIFY_SOURCE=3), a build failure occurs in tests that use an
empty FIXTURE().
The root cause is that an empty fixture struct results in
`sizeof(self_private)` evaluating to 0. The compiler's fortification
checks then detect the `memset()` call with a compile-time constant size
of 0, issuing a `-Wuser-defined-warnings` which is promoted to an error
by `-Werror`.
An initial attempt to guard the call with `if (sizeof(self_private) > 0)`
was insufficient. The compiler's static analysis is aggressive enough
to flag the `memset(..., 0)` pattern before evaluating the conditional,
thus still triggering the error.
To resolve this robustly, this change introduces a `static inline`
helper function, `__kselftest_memset_safe()`. This function wraps the
size check and the `memset()` call. By replacing the direct `memset()`
in the `__TEST_F_IMPL` macro with a call to this helper, we create an
abstraction boundary. This prevents the compiler's static analyzer from
"seeing" the problematic pattern at the macro expansion site, resolving
the build failure.
Build Context:
Compiler: Android (14488419, +pgo, +bolt, +lto, +mlgo, based on r584948) clang version 22.0.0 (https://android.googlesource.com/toolchain/llvm-project 2d65e4108033380e6fe8e08b1f1826cd2bfb0c99)
Relevant Options: -O2 -Wall -Werror -D_FORTIFY_SOURCE=3 -target i686-linux-android10000
Test: m kselftest_futex_futex_requeue_pi
Removed Gerrit Change-Id
Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20251224084120.249417-1-wakel@google.com
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
'available_events' is actually not required by
'test.d/event/toplevel-enable.tc' and its Existence has been tested in
'test.d/00basic/basic4.tc'.
So the require of 'available_events' can be dropped and then we can add
'instance' flag to test 'test.d/event/toplevel-enable.tc' for instance.
Test result show as below:
# ./ftracetest test.d/event/toplevel-enable.tc
=== Ftrace unit tests ===
[1] event tracing - enable/disable with top level files [PASS]
[2] (instance) event tracing - enable/disable with top level files [PASS]
# of passed: 2
# of failed: 0
# of unresolved: 0
# of untested: 0
# of unsupported: 0
# of xfailed: 0
# of undefined(test bug): 0
Link: https://lore.kernel.org/r/20230509203659.1173917-1-zhengyejian1@huawei.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
- Restrict ROM access to dword to resolve a regression introduced
with qword access seen on some Intel NICs. Update VGA region
access to the same given lack of precedent for 64-bit users.
(Kevin Tian)
- Fix missing .get_region_info_caps callback in the xe-vfio-pci
variant driver due to integration through the DRM tree.
(Michal Wajdeczko)
- Add aligned 64-bit access macros to tools/include/linux/types.h,
allowing removal of uapi/linux/type.h includes from various
vfio selftest, resolving redefinition warnings for integration
with KVM selftests. (David Matlack)
- Fix error path memory leak in pds-vfio-pci variant driver.
(Zilin Guan)
- Fix error path use-after-free in xe-vfio-pci variant driver.
(Alper Ak)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmlUP2oRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyKYsw//fEJ6vfn0TOP/ahfY9tRpw6suZJuEo5wx
SJj57kMF85/V64iaewRq1G1hbrEAEcOgDtpcR3y57lHAS8METKxyjxL5YZdqvgX4
kRvDwRJRcFz/kvmO0PZx6rn21ZxLv2d9RXahDwaqaQfw2pR2ZOtr/zaawMr6LPmw
Z1dl0UhQnHIhw4kG1QKUhdCozhAgSV3/pmGV2bOjgXRS0rVUZ3UQZ0RprLe6uIEl
hSWLmeWUtyrt30gVzoKPTWWuRvuIw2lnAH2PGhNtha70Djyx1EAUs7iqUA8XBsQh
7JG/T1yibh9CzE5OzI+JBmix5s8zxd4q0RHNa9T31EMHSdzCJhXQfLiZVeJVv9O6
EHsFVWHzE4CXgSMEpD+QjfCrEwBcF4n6W6N68BFAuAVN51+0DoFinFF0PaqpTivj
U/Yh1erkfFhy8IlO33Q2dAOxBfy1aIkszKS2Xkc1pwX3vReMlHiWDmyM5ciB0VKJ
GwslXQwGljSNuxE81e7EFI6g18FGGLGt5EkkYPhSS/hYAZQy0RpqPdRopcP85OiO
HtyfZZZ/Ph0y13f7c7rH5awGm9NOc9W+2xNKxuKNkjwvEmjO12lRmrzGkKPu/OTi
YluIWSFG7gBQfb8eFgcfJoM7vcrRCJ+JeSp5fa7u8QLUsTk+var+/WCxKs9BpX31
hDb8rHj0Bwo=
=a1By
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Restrict ROM access to dword to resolve a regression introduced with
qword access seen on some Intel NICs. Update VGA region access to the
same given lack of precedent for 64-bit users (Kevin Tian)
- Fix missing .get_region_info_caps callback in the xe-vfio-pci variant
driver due to integration through the DRM tree (Michal Wajdeczko)
- Add aligned 64-bit access macros to tools/include/linux/types.h,
allowing removal of uapi/linux/type.h includes from various vfio
selftest, resolving redefinition warnings for integration with KVM
selftests (David Matlack)
- Fix error path memory leak in pds-vfio-pci variant driver (Zilin Guan)
- Fix error path use-after-free in xe-vfio-pci variant driver (Alper Ak)
* tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio:
vfio/xe: Fix use-after-free in xe_vfio_pci_alloc_file()
vfio/pds: Fix memory leak in pds_vfio_dirty_enable()
vfio: selftests: Drop <uapi/linux/types.h> includes
tools include: Add definitions for __aligned_{l,b}e64
vfio/xe: Add default handler for .get_region_info_caps
vfio/pci: Disable qword access to the VGA region
vfio/pci: Disable qword access to the PCI ROM bar
Add descriptive message in the _Static_assert to comply with the C11
standard requirement to prevent compiler from throwing out error. The
compiler throws an error when _Static_assert is used without a message as
that is a C23 extension.
[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture
[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/ublk$ make LLVM=1 W=1
CC kublk
In file included from kublk.c:6:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
In file included from null.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
In file included from file_backed.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
In file included from common.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
In file included from stripe.c:3:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
In file included from fault_inject.c:11:
./kublk.h:220:43: error: '_Static_assert' with no message is a C23 extension [-Werror,-Wc23-extensions]
220 | _Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
| ^
| , ""
1 error generated.
make: *** [../lib.mk:225: ~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/ublk/kublk] Error 1
Link: https://lore.kernel.org/r/20251215085022.7642-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Make use of empty (NULL-terminated) array instead of NULL pointer to
avoid compiler errors while maintaining the behavior of the function
intact
[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture
[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/filesystems$ make LLVM=1 W=1
CC devpts_pts
CC file_stressor
CC anon_inode_test
anon_inode_test.c:45:37: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
45 | ASSERT_LT(execveat(fd_context, "", NULL, NULL, AT_EMPTY_PATH), 0);
| ^~~~
/usr/lib/llvm-18/lib/clang/18/include/__stddef_null.h:26:14: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^~~~~~~~~~
../Desktop/kernel-dev/linux-v1/tools/testing/selftests/../../../tools/testing/selftests/kselftest_harness.h:535:11: note: expanded from macro 'ASSERT_LT'
535 | __EXPECT(expected, #expected, seen, #seen, <, 1)
| ^~~~~~~~
../Desktop/kernel-dev/linux-v1/tools/testing/selftests/../../../tools/testing/selftests/kselftest_harness.h:758:33: note: expanded from macro '__EXPECT'
758 | __typeof__(_expected) __exp = (_expected); \
| ^~~~~~~~~
1 warning generated.
Link: https://lore.kernel.org/r/20251215084900.7590-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use __builtin_trap() to truly crash the program instead of dereferencing
null pointer which may be optimized by the compiler preventing the crash
from occurring
[] Testing:
The diff between before and after of running the kselftest test of the
module shows no regression on system with x86 architecture
[] Error log:
~/Desktop/kernel-dev/linux-v1/tools/testing/selftests/coredump$ make LLVM=1 W=1
CC stackdump_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
59 | i = *(int *)NULL;
| ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.
CC coredump_socket_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
59 | i = *(int *)NULL;
| ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.
CC coredump_socket_protocol_test
coredump_test_helpers.c:59:6: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
59 | i = *(int *)NULL;
| ^~~~~~~~~~~~
coredump_test_helpers.c:59:6: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
1 warning generated.
Link: https://lore.kernel.org/r/20251215084737.7504-1-clintbgeorge@gmail.com
Signed-off-by: Clint George <clintbgeorge@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Recent changes in BTF generation [1] rely on ${OBJCOPY} command to
update .BTF_ids section data in target ELF files.
This exposed a bug in llvm-objcopy --update-section code path, that
may lead to corruption of a target ELF file. Specifically, because of
the bug st_shndx of some symbols may be (incorrectly) set to 0xffff
(SHN_XINDEX) [2][3].
While there is a pending fix for LLVM, it'll take some time before it
lands (likely in 22.x). And the kernel build must keep working with
older LLVM toolchains in the foreseeable future.
Using GNU objcopy for .BTF_ids update would work, but it would require
changes to LLVM-based build process, likely breaking existing build
environments as discussed in [2].
To work around llvm-objcopy bug, implement --patch_btfids code path in
resolve_btfids as a drop-in replacement for:
${OBJCOPY} --update-section .BTF_ids=${btf_ids} ${elf}
Which works specifically for .BTF_ids section:
${RESOLVE_BTFIDS} --patch_btfids ${btf_ids} ${elf}
This feature in resolve_btfids can be removed at some point in the
future, when llvm-objcopy with a relevant bugfix becomes common.
[1] https://lore.kernel.org/bpf/20251219181321.1283664-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/bpf/20251224005752.201911-1-ihor.solodrai@linux.dev/
[3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251231012558.1699758-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test case first initializes 9 stack slots as STACK_MISC,
then conditionally updates each of them to SCALAR spill inside an
iterator based loop. This leads to 2**9 combinations of MISC/SPILL
marks for these slots at the iterator next call.
The loop converges only if the verifier treats such states as
equivalent, otherwise visited states are evicted from the states cache
too quickly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251230-loop-stack-misc-pruning-v1-2-585cfd6cec51@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test for state graph backedges accumulation for SCCs formed by
bpf_loop(). Equivalent to the following C program:
int main(void) {
1: fp[-8] = bpf_get_prandom_u32();
2: fp[-16] = -32; // used in a memory access below
3: bpf_loop(7, loop_cb4, fp, 0);
4: return 0;
}
int loop_cb4(int i, void *ctx) {
5: if (unlikely(ctx[-8] > bpf_get_prandom_u32()))
6: *(u64 *)(fp + ctx[-16]) = 42; // aligned access expected
7: if (unlikely(fp[-8] > bpf_get_prandom_u32()))
8: ctx[-16] = -31; // makes said access unaligned
9: return 0;
}
If state graph backedges are not accumulated properly at the SCC
formed by loop_cb4() call from bpf_loop(), the state {ctx[-16]=-32}
injected at instruction 9 on verification path 1,2,3,5,7,9,4 would be
considered fully verified and would lack precision mark for ctx[-16].
This would lead to early pruning of verification path 1,2,3,5,7,8,9 in
state {ctx[-16]=-31}, which in turn leads to the incorrect assumption
that the above program is safe.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251229-scc-for-callbacks-v1-2-ceadfe679900@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The big_alloc3() test tries to allocate 2051 pages at once in
non-sleepable context and this can fail sporadically on resource
contrained systems, so skip this test in case of such failures.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251230195134.599463-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
for the iwlwifi issue you reported.
Current release - regressions:
- core: avoid prefetching NULL pointers
- wifi:
- iwlwifi: implement settime64 as stub for MVM/MLD PTP
- mac80211: fix list iteration in ieee80211_add_virtual_monitor()
- handshake: fix null-ptr-deref in handshake_complete()
- eth: mana: fix use-after-free in reset service rescan path
Previous releases - regressions:
- openvswitch: avoid needlessly taking the RTNL on vport destroy
- dsa: properly keep track of conduit reference
- ipv4:
- fix reference count leak when using error routes with nexthop objects
- fib: restore ECMP balance from loopback
- mptcp: ensure context reset on disconnect()
- bluetooth: fix potential UaF in btusb
- nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
- eth: gve: defer interrupt enabling until NAPI registration
- eth: i40e: fix scheduling in set_rx_mode
- eth: macb: relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
- eth: rtl8150: fix memory leak on usb_submit_urb() failure
- wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()
Previous releases - always broken:
- ip6_gre: make ip6gre_header() robust
- ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
- af_unix: don't post cmsg for SO_INQ unless explicitly asked for
- phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration
- wifi: mac80211: discard beacon frames to non-broadcast address
- eth: iavf: fix off-by-one issues in iavf_config_rss_reg()
- eth: stmmac: fix the crash issue for zero copy XDP_TX action
- eth: team: fix check for port enabled in team_queue_override_port_prio_changed()
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlT43wSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkHhMP/jF3B8c3djMgpYpEwPRgqJlzdpBGQvcO
UsN/8fYI/XowIcU6T/yC/KM5cABCWyfnj6yZe743wPrlj8DnWK7+Fezrfwx7l8e0
0LH9kVwOIaQXg/QthtXaDHNB/9OanDtgcpitI209gENRjF81bYWCehImil6jbVnn
DUnVmfZIQ6k3dFsAPC4W7uJdA2FORtQzEZ1dZ13Ivx9jmbazK81ptUbIMAAnyfIZ
rUhv+UqaDIlflYwuay58ZPdu8no4nQlJMPiPybXiizfTVStEne9SQKOacP8j7XL0
GSjEyDO8lJXCPVSVnGEyybBH50M0myGUSH73+56o2QRRLtrHLDfieOL/N8AarNDh
7U2g9pq0+IFPuJsm9SFR14dIpUAvpKohc57ZvsmworC8NuENzl6H3b6/U4n/1oNE
JCbcitl91GkyF0Bvyac5a9wfk8SsYEJEGLPrtNX8UwH0UJh8spkfoQq5oHkC3juQ
77n//eOFSz8oPDlV7ayNv+W3CEzOW09mSYFu8bdjBKC5HeyBJsm3HnJdaAhSqNEH
6duRvcMlUJQ0JPILJoS1Zoy166uIu8hs2mZtygcAzyacia8yckL+Oq2UCYMi2oKP
psOIzfd6G/+f203w37jdSY0OVlQSHvmCFSeaY6FHs2LEApDrNWK1cMLYO+lMU8EE
q0j2SmDNMHhM
=P1hX
-----END PGP SIGNATURE-----
Merge tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth and WiFi. Notably this includes the fix
for the iwlwifi issue you reported.
Current release - regressions:
- core: avoid prefetching NULL pointers
- wifi:
- iwlwifi: implement settime64 as stub for MVM/MLD PTP
- mac80211: fix list iteration in ieee80211_add_virtual_monitor()
- handshake: fix null-ptr-deref in handshake_complete()
- eth: mana: fix use-after-free in reset service rescan path
Previous releases - regressions:
- openvswitch: avoid needlessly taking the RTNL on vport destroy
- dsa: properly keep track of conduit reference
- ipv4:
- fix error route reference count leak with nexthop objects
- fib: restore ECMP balance from loopback
- mptcp: ensure context reset on disconnect()
- bluetooth: fix potential UaF in btusb
- nfc: fix deadlock between nfc_unregister_device and
rfkill_fop_write
- eth:
- gve: defer interrupt enabling until NAPI registration
- i40e: fix scheduling in set_rx_mode
- macb: relocate mog_init_rings() callback from macb_mac_link_up()
to macb_open()
- rtl8150: fix memory leak on usb_submit_urb() failure
- wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()
Previous releases - always broken:
- ip6_gre: make ip6gre_header() robust
- ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
- af_unix: don't post cmsg for SO_INQ unless explicitly asked for
- phy: mediatek: fix nvmem cell reference leak in
mt798x_phy_calibration
- wifi: mac80211: discard beacon frames to non-broadcast address
- eth:
- iavf: fix off-by-one issues in iavf_config_rss_reg()
- stmmac: fix the crash issue for zero copy XDP_TX action
- team: fix check for port enabled when priority changes"
* tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
net: rose: fix invalid array index in rose_kill_by_device()
net: enetc: do not print error log if addr is 0
net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
selftests: fib_test: Add test case for ipv4 multi nexthops
net: fib: restore ECMP balance from loopback
selftests: fib_nexthops: Add test cases for error routes deletion
ipv4: Fix reference count leak when using error routes with nexthop objects
net: usb: sr9700: fix incorrect command used to write single register
ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr()
usbnet: avoid a possible crash in dql_completed()
gve: defer interrupt enabling until NAPI registration
net: stmmac: fix the crash issue for zero copy XDP_TX action
octeontx2-pf: fix "UBSAN: shift-out-of-bounds error"
af_unix: don't post cmsg for SO_INQ unless explicitly asked for
net: mana: Fix use-after-free in reset service rescan path
net: avoid prefetching NULL pointers
net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct
net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
net: usb: asix: validate PHY address before use
...
The test checks that with multi nexthops route the preferred route is the
one which matches source ip. In case when source ip is on dummy
interface, it checks that the routes are balanced.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251221192639.3911901-2-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There's a three patch series from Jiayuan Chen which fixes some issues
with KASAN and vmalloc. Apart from that it's the usual shower of
singletons - please see the respective changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaVIWxwAKCRDdBJ7gKXxA
jt6HAP49s/mEIIbuZbqnX8hxDrvYdYffs+RsSLPEZoR+yKG/7gD/VqSZhMoPw53b
rMZ56djXNjWxsOAfiVbZit3SFuQfnQ4=
=5rGD
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-12-28-21-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"27 hotfixes. 12 are cc:stable, 18 are MM.
There's a patch series from Jiayuan Chen which fixes some
issues with KASAN and vmalloc. Apart from that it's the usual
shower of singletons - please see the respective changelogs
for details"
* tag 'mm-hotfixes-stable-2025-12-28-21-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (27 commits)
mm/ksm: fix pte_unmap_unlock of wrong address in break_ksm_pmd_entry
mm/page_owner: fix memory leak in page_owner_stack_fops->release()
mm/memremap: fix spurious large folio warning for FS-DAX
MAINTAINERS: notify the "Device Memory" community of memory hotplug changes
sparse: update MAINTAINERS info
mm/page_alloc: report 1 as zone_batchsize for !CONFIG_MMU
mm: consider non-anon swap cache folios in folio_expected_ref_count()
rust: maple_tree: rcu_read_lock() in destructor to silence lockdep
mm: memcg: fix unit conversion for K() macro in OOM log
mm: fixup pfnmap memory failure handling to use pgoff
tools/mm/page_owner_sort: fix timestamp comparison for stable sorting
selftests/mm: fix thread state check in uffd-unit-tests
kernel/kexec: fix IMA when allocation happens in CMA area
kernel/kexec: change the prototype of kimage_map_segment()
MAINTAINERS: add ABI headers to KHO and LIVE UPDATE
.mailmap: remove one of the entries for WangYuli
mm/damon/vaddr: fix missing pte_unmap_unlock in damos_va_migrate_pmd_entry()
MAINTAINERS: update one straggling entry for Bartosz Golaszewski
mm/page_alloc: change all pageblocks migrate type on coalescing
mm: leafops.h: correct kernel-doc function param. names
...
ptrace_test.c currently contains a duplicated version of the
scoped_domains fixture variants. This patch removes that and make it use
the shared scoped_base_variants.h instead, like in
scoped_abstract_unix_test and scoped_signal_test.
This required renaming the hierarchy fixture to scoped_domains, but the
test is otherwise the same.
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/48148f0134f95f819a25277486a875a6fd88ecf9.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Add missing semicolon after EXPECT_EQ(0, close(stream_server_child)) in
the scoped_vs_unscoped test. I suspect currently it's just not executing
the close statement after the line, but this causes no observable
difference.
Fixes: fefcf0f7cf ("selftests/landlock: Test abstract UNIX socket scoping")
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/d9e968e4cd4ecc9bf487593d7b7220bffbb3b5f5.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Add header dependencies to kublk build rule so that changes to
kublk.h, ublk_dep.h, or utils.h trigger a rebuild.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add test_generic_15.sh to verify that async partition scan prevents
IO hang when reading partition tables.
The test creates ublk devices with fault_inject target and very large
delay (60s) to simulate blocked partition table reads, then kills the
daemon to verify proper state transitions without hanging:
1. Without recovery support:
- Create device with fault_inject and 60s delay
- Kill daemon while partition scan may be blocked
- Verify device transitions to DEAD state
2. With recovery support (-r 1):
- Create device with fault_inject, 60s delay, and recovery
- Kill daemon while partition scan may be blocked
- Verify device transitions to QUIESCED state
Before the async partition scan fix, killing the daemon during
partition scan would cause deadlock as partition scan held ub->mutex
while waiting for IO. With the async fix, partition scan happens in
a work function and flush_work() ensures proper synchronization.
Add _add_ublk_dev_no_settle() helper function to skip udevadm settle,
which would otherwise hang waiting for partition scan events to
complete when partition table read is delayed.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The size of Unix pathname addresses is computed in selftests using
offsetof(struct sockaddr_un, sun_path) + strlen(xxx). It should have
been that +1, which makes addresses passed to the libc and kernel
non-NULL-terminated. unix_mkname_bsd() fixes that in Linux so there is
no harm, but just using sizeof(the address struct) should improve
readability.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251202215141.689986-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Remove bind() call on a client socket that doesn't make sense.
Since strlen(cli_un.sun_path) returns a random value depending on stack
garbage, that many uninitialized bytes are read from the stack as an
unix socket address. This creates random test failures due to the bind
address being invalid or already in use if the same stack value comes up
twice.
Fixes: f83d51a5bd ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251201003631.190817-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
connect_variant(unspec_any0) is called twice. Both calls end
up in connect_variant_addrlen() with an address length of
get_addrlen(minimal=false).
However, the connect() syscall and its variants (e.g.
iouring/compat) accept much shorter addresses of 4 bytes
and that behaviour was not tested.
Replace one of these calls with one using a minimal address
length (just a bare sa_family=AF_UNSPEC field with no actual
address). Also add a call using a truncated address for good
measure.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-3-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The nominal error code for bind(AF_UNSPEC) on an IPv6 socket
is -EAFNOSUPPORT, not -EINVAL. -EINVAL is only returned when
the supplied address struct is too short, which happens to be
the case in current selftests because they treat AF_UNSPEC
like IPv4 sockets do: as an alias for AF_INET (which is a
16-byte struct instead of the 24 bytes required by IPv6
sockets).
Make the union large enough for any address (by adding struct
sockaddr_storage to the union), and make AF_UNSPEC addresses
large enough for any family.
Test for -EAFNOSUPPORT instead, and add a dedicated test case
for truncated inputs with -EINVAL.
Fixes: a549d055a2 ("selftests/landlock: Add network tests")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Drop the <uapi/linux/types.h> includes now that <linux/types.h>
(tools/include/linux/types.h) has a definition for __aligned_le64, which
is needed by <linux/iommufd.h>.
Including <uapi/linux/types.h> is harmless but causes benign typedef
redefinitions. This is not a problem for VFIO selftests but becomes an
issue when the VFIO selftests library is built into KVM selftests, since
they are built with -std=gnu99 which does not allow typedef redifitions.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251219233818.1965306-3-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
As arena kfuncs can now be called from non-sleepable contexts, test this
by adding non-sleepable copies of tests in verifier_arena, this is done
by using a socket program instead of syscall.
Add a new test case in verifier_arena_large to check that the
bpf_arena_alloc_pages() works for more than 1024 pages.
1024 * sizeof(struct page *) is the upper limit of kmalloc_nolock() but
bpf_arena_alloc_pages() should still succeed because it re-uses this
array in a loop.
Augment the arena_list selftest to also run in non-sleepable context by
taking rcu_read_lock.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251222195022.431211-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the thread_state_get() function, the logic to find the thread's state
character was using `sizeof(header) - 1` to calculate the offset from the
"State:\t" string.
The `header` variable is a `const char *` pointer. `sizeof()` on a
pointer returns the size of the pointer itself, not the length of the
string literal it points to. This makes the code's behavior dependent on
the architecture's pointer size.
This bug was identified on a 32-bit ARM build (`gsi_tv_arm`) for Android,
running on an ARMv8-based device, compiled with Clang 19.0.1.
On this 32-bit architecture, `sizeof(char *)` is 4. The expression
`sizeof(header) - 1` resulted in an incorrect offset of 3, causing the
test to read the wrong character from `/proc/[tid]/status` and fail.
On 64-bit architectures, `sizeof(char *)` is 8, so the expression
coincidentally evaluates to 7, which matches the length of "State:\t".
This is why the bug likely remained hidden on 64-bit builds.
To fix this and make the code portable and correct across all
architectures, this patch replaces `sizeof(header) - 1` with
`strlen(header)`. The `strlen()` function correctly calculates the
string's length, ensuring the correct offset is always used.
Link: https://lkml.kernel.org/r/20251210091408.3781445-1-wakel@google.com
Fixes: f60b6634cd ("mm/selftests: add a test to verify mmap_changing race with -EAGAIN")
Signed-off-by: Wake Liu <wakel@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If you use an IDR with a non-zero base, and specify a range that lies
entirely below the base, 'max - base' becomes very large and
idr_get_free() can return an ID that lies outside of the requested range.
Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org
Fixes: 6ce711f275 ("idr: Make 1-based IDRs more efficient")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the selftest 'tap.c' is compiled with '-D_FORTIFY_SOURCE=3',
the strcpy() in rtattr_add_strsz() is replaced with a checked
version which causes the test to consistently fail when compiled
with toolchains for which this option is enabled by default.
TAP version 13
1..3
# Starting 3 tests from 1 test cases.
# RUN tap.test_packet_valid_udp_gso ...
*** buffer overflow detected ***: terminated
# test_packet_valid_udp_gso: Test terminated by assertion
# FAIL tap.test_packet_valid_udp_gso
not ok 1 tap.test_packet_valid_udp_gso
# RUN tap.test_packet_valid_udp_csum ...
*** buffer overflow detected ***: terminated
# test_packet_valid_udp_csum: Test terminated by assertion
# FAIL tap.test_packet_valid_udp_csum
not ok 2 tap.test_packet_valid_udp_csum
# RUN tap.test_packet_crash_tap_invalid_eth_proto ...
*** buffer overflow detected ***: terminated
# test_packet_crash_tap_invalid_eth_proto: Test terminated by assertion
# FAIL tap.test_packet_crash_tap_invalid_eth_proto
not ok 3 tap.test_packet_crash_tap_invalid_eth_proto
# FAILED: 0 / 3 tests passed.
# Totals: pass:0 fail:3 xfail:0 xpass:0 skip:0 error:0
A buffer overflow is detected by the fortified glibc __strcpy_chk()
since the __builtin_object_size() of `RTA_DATA(rta)` is incorrectly
reported as 1, even though there is ample space in its bounding
buffer `req`.
Additionally, given that IFLA_IFNAME also expects a null-terminated
string, callers of rtaddr_add_str{,sz}() could simply use the
rtaddr_add_strsz() variant. (which has been renamed to remove the
trailing `sz`) memset() has been used for this function since it
is unchecked and thus circumvents the issue discussed in the
previous paragraph.
Fixes: 2e64fe4624 ("selftests: add few test cases for tap driver")
Signed-off-by: Alice C. Munduruca <alice.munduruca@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251216170641.250494-1-alice.munduruca@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
test_case will only take on the formatted name after being
called. This does not work with the way ksft_run() currently
works. Assign the name after the test_case is created.
Fixes: 81236c74db ("selftests: drv-net: psp: add test for auto-adjusting TCP MSS")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-2-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
test_case will only take on its formatted name after it is called by
the test runner. Move the assignment to test_case.__name__ to when the
test_case is constructed, not called.
Fixes: 8f90dc6e41 ("selftests: drv-net: psp: add basic data transfer and key rotation tests")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-1-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As the previous commit allowed raw_tp programs to call kfuncs, so of the
selftests that were expected to fail will now succeed.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251222133250.1890587-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test coverage for the kfuncs that fetch memcg stats. Using some common
stats, test scenarios ensuring that the given stat increases by some
arbitrary amount. The stats selected cover the three categories represented
by the enums: node_stat_item, memcg_stat_item, vm_event_item.
Since only a subset of all stats are queried, use a static struct made up
of fields for each stat. Write to the struct with the fetched values when
the bpf program is invoked and read the fields in the user mode program for
verification.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20251223044156.208250-6-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a trivial test case asserting that the BPF verifier enforces
PTR_MAYBE_NULL semantics on the struct file pointer argument of BPF
LSM hook bpf_lsm_mmap_file().
Dereferencing the struct file pointer passed into bpf_lsm_mmap_file()
without explicitly performing a NULL check first should not be
permitted by the BPF verifier as it can lead to NULL pointer
dereferences and a kernel crash.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251216133000.3690723-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
other series:
- Simplify checks in arch_kfence_init_pool() since force_pte_mapping()
already takes BBML2-noabort (break-before-make Level 2 with no aborts
generated) into account
- Remove unneeded SVE/SME fallback preserve/store handling in the arm64
EFI. With the recent updates, the fallback path is only taken for EFI
runtime calls from hardirq or NMI contexts. In practice, this only
happens under panic/oops/emergency_restart() and no restoring of the
user state expected. There's a corresponding lkdtm update to trigger
a BUG() or panic() from hardirq context together with a fixup not to
confuse clang/objtool about the control flow
GCS (guarded control stacks) fix: flush the GCS locking state on exec,
otherwise the new task will not be able to enable GCS (locked as
disabled).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmlGb0YACgkQa9axLQDI
XvGNTA/+MFft7b7Seb0DlanyxMfnKmLqoPbdciSCQh5I79fklWDMCqh/B1KVUxhG
cnY45GyDEO/AC46wYvqqm6frS4sYHwiDybIJwLPGcxmPaoeNacVQkWYjiM6sU68m
RjcFe5xN1+3y+d37ZPgoClU6YNDqNcxv1jLHzQly3eO/8LLOGi6S8bEX9v8SGLWR
4PJki0YSOIJQYEehmAglPwroRgJXeHcaLboF/I/ULPZjhrDMIz2BSoNKreGfzgGI
0+myd6CeGjxcAKf2JI9zmBHiCmgGGi9gMz5BXm6yo+jGoyeMaeTi5/w4r/D4GJlp
kS2RbI+LpWEFN4yKk23YVMM7fYmI9ZCTEsRw0osHV2uqMSt2WNGadE4klG8uV6oK
h0VMKY5QB0SZVc0uT+S44zkK9iLVPbn8FgCAbzMNBE9gTgf/nT6l09VsdgNrzK90
CTwwvt9S0hiRk9VMa57WAWodb8lxp0IUE0eP4ptc/Y7EL8ibgTZcm0GVxG9OLMBD
J+RXQZ1VugeUoYncEH23KYSo89kjRJ7k6tAkz1B6bK1ULnIoQsebyNFhqITCx9t8
dc/3hqsgijiyMhZchLZtvqYh/tfSFBsKTN5cQj2qQCxTlnX7Pi1UvbnSwXeGGdki
lEWTEtkqKQdE2wryYkMlvMllCLibmr7/FClMncj0pEPQjz/FJDA=
=vAzy
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Two left-over updates that could not go into -rc1 due to conflicts
with other series:
- Simplify checks in arch_kfence_init_pool() since
force_pte_mapping() already takes BBML2-noabort (break-before-make
Level 2 with no aborts generated) into account
- Remove unneeded SVE/SME fallback preserve/store handling in the
arm64 EFI. With the recent updates, the fallback path is only taken
for EFI runtime calls from hardirq or NMI contexts. In practice,
this only happens under panic/oops/emergency_restart() and no
restoring of the user state expected.
There's a corresponding lkdtm update to trigger a BUG() or panic()
from hardirq context together with a fixup not to confuse
clang/objtool about the control flow
GCS (guarded control stacks) fix: flush the GCS locking state on exec,
otherwise the new task will not be able to enable GCS (locked as
disabled)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
lkdtm/bugs: Do not confuse the clang/objtool with busy wait loop
arm64/gcs: Flush the GCS locking state on exec
arm64/efi: Remove unneeded SVE/SME fallback preserve/store handling
lkdtm/bugs: Add cases for BUG and PANIC occurring in hardirq context
arm64: mm: Simplify check in arch_kfence_init_pool()
- Add a missing "break" to fix param parsing in the rseq selftest.
- Apply runtime updates to the _current_ CPUID when userspace is setting
CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
dropping the pending update.
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
supported by KVM and leads to a use-after-free due to KVM failing to unbind
the memslot from the previously-associated guest_memfd instance.
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
as -1ull (a 64-bit value).
- Update SVI when activating APICv to fix a bug where a post-activation EOI
for an in-service IRQ would effective be lost due to SVI being stale.
- Immediately refresh APICv controls (if necessary) on a nested VM-Exit
instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
effectively ignored because KVM thinks the vCPU already has the correct
APICv settings.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmlEQXEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOscwf/ahimZsBtFTtd6yZpZ84E1O7P8g31
aZ4p9CRcx+I0aMQKOfS98R3G1w8FtMpcX11aPWvIi4Vkx/4PVlZr+ds8cfiMV7Z+
Hdd7k/uR4eC+KcL0XoGcmukBlHYi/8lZeijUcID+Iueo7MMlyX5G/VQ1RzBZwP++
VFObgtzoAmlKrR5AaFeRMBnY+fiH6wMjPSwY7fI5pr4qNHrxB06wgtX5+C1Y3Poa
7ETMrfX2XtyfG53N+1YJMGoaXqBkFaGCateMJQNOu/QQzOZhUuBCkC8muzbCTckO
Y405j7GfpyUmhf/mdLI8BlMYrn5Mwdnvo9iJ3U5fGUN/Eh4XmPIYKy7sMg==
=Fa0R
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm fixes from Paolo Bonzini:
"x86 fixes. Everyone else is already in holiday mood apparently.
- Add a missing 'break' to fix param parsing in the rseq selftest
- Apply runtime updates to the _current_ CPUID when userspace is
setting CPUID, e.g. as part of vCPU hotplug, to fix a false
positive and to avoid dropping the pending update
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as
it's not supported by KVM and leads to a use-after-free due to KVM
failing to unbind the memslot from the previously-associated
guest_memfd instance
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for
supporting flags-only changes on KVM_MEM_GUEST_MEMFD memlslots,
e.g. for dirty logging
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is
defined as -1ull (a 64-bit value)
- Update SVI when activating APICv to fix a bug where a
post-activation EOI for an in-service IRQ would effective be lost
due to SVI being stale
- Immediately refresh APICv controls (if necessary) on a nested
VM-Exit instead of deferring the update via KVM_REQ_APICV_UPDATE,
as the request is effectively ignored because KVM thinks the vCPU
already has the correct APICv settings"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit
KVM: VMX: Update SVI during runtime APICv activation
KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)
KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits
KVM: Harden and prepare for modifying existing guest_memfd memslots
KVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot
KVM: selftests: Add a CPUID testcase for KVM_SET_CPUID2 with runtime updates
KVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}
KVM: selftests: Add missing "break" in rseq_test's param parsing
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmlEvbAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphTLD/9Gx4wnirRvzo1Vz77Odhqqbipww/gjDV0G
HASzF4KczfusTei4NABfuSVA4/LS28csZOYq0ZeDmFVg+YPQcmr4DNw/eJzzi1Xh
TDmYuf8y7jAJjwWmX0fgVF2uDg3/kp3tWKfBxSFqpqHrZUkOLr8IY3y8/uJYKakm
r8K9HsS7hyPhx4xdfRQixUil5TgQxBtCgwTX+JRv/K9AMnEVHkBkJ+m4GSmjJ3n8
tr5lL8fQmIQcsSWqTvMH/m2xeLG7gr9LE33wfu2sCLvNJ4CAYFGT9aMf2F0nGbzK
Uu5XOyq1rJ8vKWJuDVfFsjOca4fyyvaKX4bqycv89Z7MVzQOXsjbs+4sHFycsfYC
P+OBmxLb2+rowNsawyzObT9wnd37xptl+tF5nXL0LL6W4PWHmUgnDuM6Vg5zpNv+
n0nO6y7I0iL9o4hgqN8kUOr5e7IjvJccJbratOA/jVXmAw1I/Gg6HDo+W7rEqsVP
0hrlE+RuJSRypbCO2pcvviahaDaCWo3gQmp9TZCBmgqgoiI0WStPAgYrG424+69Y
Hoq5HBmtN4GV3EnU0Ybuawzl2OKtQwwr9DrRpvrxdN9qm4E44CD6eTmE5fEnrTnX
Rcs5VosBAavPTbl8LFDtj2h6qPak+5VHHVZwimLNdOyCLTNf7D/kiAjmOHRBbmdt
AopIuvCipA==
=lk7H
-----END PGP SIGNATURE-----
Merge tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- ublk selftests for missing coverage
- two fixes for the block integrity code
- fix for the newly added newly added PR read keys ioctl, limiting the
memory that can be allocated
- work around for a deadlock that can occur with ublk, where partition
scanning ends up recursing back into file closure, which needs the
same mutex grabbed. Not the prettiest thing in the world, but an
acceptable work-around until we can eliminate the reliance on
disk->open_mutex for this
- fix for a race between enabling writeback throttling and new IO
submissions
- move a bit of bio flag handling code. No changes, but needed for a
patchset for a future kernel
- fix for an init time id leak failure in rnbd
- loop/zloop state check fix
* tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: validate interval_exp integrity limit
block: validate pi_offset integrity limit
block: rnbd-clt: Fix leaked ID in init_dev()
ublk: fix deadlock when reading partition table
block: add allocation size check in blkdev_pr_read_keys()
Documentation: admin-guide: blockdev: replace zone_capacity with zone_capacity_mb when creating devices
zloop: use READ_ONCE() to read lo->lo_state in queue_rq path
loop: use READ_ONCE() to read lo->lo_state without locking
block: fix race between wbt_enable_default and IO submission
selftests: ublk: add user copy test cases
selftests: ublk: add support for user copy to kublk
selftests: ublk: forbid multiple data copy modes
selftests: ublk: don't share backing files between ublk servers
selftests: ublk: use auto_zc for PER_IO_DAEMON tests in stress_04
selftests: ublk: fix fio arguments in run_io_and_recover()
selftests: ublk: remove unused ios map in seq_io.bt
selftests: ublk: correct last_rw map type in seq_io.bt
selftests: ublk: fix overflow in ublk_queue_auto_zc_fallback()
block: move around bio flagging helpers
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.
Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
* ${1}.BTF with .BTF section data
* ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
* ${1}.BTF.base with .BTF.base section data for out-of-tree modules
The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.
The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.
There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.
For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.
Before this patch it worked as follows:
* pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
llvm-objcopy [3] to it
* then everything except the .BTF section was stripped from .tmp_vmlinux1
into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
* resolve_btfids was executed later on vmlinux.unstripped [4],
updating it in-place
After this patch gen-btf.sh implements the following:
* pahole consumes .tmp_vmlinux1 and produces a *detached* file with
raw BTF data
* resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
(potentially modified) .BTF, and .BTF_ids sections data
* a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
BTF output of resolve_btfids
* .BTF_ids data gets embedded into vmlinux.unstripped in
link-vmlinux.sh by objcopy --update-section
For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.
With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).
Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.
[1] https://lore.kernel.org/bpf/131b4190-9c49-4f79-a99d-c00fac97fa44@linux.dev/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110
[3] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/btf_encoder.c?h=v1.31#n1803
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n284
[5] https://lore.kernel.org/bpf/20200819092342.259004-1-jolsa@kernel.org/
[6] https://lore.kernel.org/bpf/cover.1707223196.git.vmalik@redhat.com/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev
A selftest targeting resolve_btfids functionality relies on a resolved
.BTF_ids section to be available in the TRUNNER_BINARY. The underlying
BTF data is taken from a special BPF program (btf_data.c), and so
resolve_btfids is executed as a part of a TRUNNER_BINARY build recipe
on the final binary.
Subsequent patches in this series allow resolve_btfids to modify BTF
before resolving the symbols, which means that the test needs access
to that modified BTF [1]. Currently the test simply reads in
btf_data.bpf.o on the assumption that BTF hasn't changed.
Implement resolve_btfids call only for particular test objects (just
resolve_btfids.test.o for now). The test objects are linked into the
TRUNNER_BINARY, and so .BTF_ids section will be available there.
This will make it trivial for the resolve_btfids test to access BTF
modified by resolve_btfids.
[1] https://lore.kernel.org/bpf/CAErzpmvsgSDe-QcWH8SFFErL6y3p3zrqNri5-UHJ9iK2ChyiBw@mail.gmail.com/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-2-ihor.solodrai@linux.dev
Few minor fixes, other than the randconfig fix this is only relevant to
test code, not releases.
- Randconfig failure if CONFIG_DMA_SHARED_BUFFER is not set
- Remove gcc warning in kselftest
- Fix a refcount leak on an error path in the selftest support code
- Fix missing overflow checks in the selftest support code
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaURMcAAKCRCFwuHvBreF
YUUIAP4u6RUth7SkqxnoM8KOqsV3YaT721AFc+lH/NA4D+XX3gD/VRHwORq8RiCQ
APzvUXP/+j+FBy2b0NINfGOdr+cfLAI=
=0fWH
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
"A few minor fixes, other than the randconfig fix this is only relevant
to test code, not releases:
- Randconfig failure if CONFIG_DMA_SHARED_BUFFER is not set
- Remove gcc warning in kselftest
- Fix a refcount leak on an error path in the selftest support code
- Fix missing overflow checks in the selftest support code"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/selftest: Check for overflow in IOMMU_TEST_OP_ADD_RESERVED
iommufd/selftest: Do not leak the hwpt if IOMMU_TEST_OP_MD_CHECK_MAP fails
iommufd/selftest: Make it clearer to gcc that the access is not out of bounds
iommufd: Fix building without dmabuf
Current release - regressions:
- netfilter: nf_conncount: fix leaked ct in error paths
- sched: act_mirred: fix loop detection
- sctp: fix potential deadlock in sctp_clone_sock()
- can: fix build dependency
- eth: mlx5e: do not update BQL of old txqs during channel reconfiguration
Previous releases - regressions:
- sched: ets: always remove class from active list before deleting it
- inet: frags: flush pending skbs in fqdir_pre_exit()
- netfilter: nf_nat: remove bogus direction check
- mptcp:
- schedule rtx timer only after pushing data
- avoid deadlock on fallback while reinjecting
- can: gs_usb: fix error handling
- eth: mlx5e:
- avoid unregistering PSP twice
- fix double unregister of HCA_PORTS component
- eth: bnxt_en: fix XDP_TX path
- eth: mlxsw: fix use-after-free when updating multicast route stats
Previous releases - always broken:
- ethtool: avoid overflowing userspace buffer on stats query
- openvswitch: fix middle attribute validation in push_nsh() action
- eth: mlx5: fw_tracer, validate format string parameters
- eth: mlxsw: spectrum_router: fix neighbour use-after-free
- eth: ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()
Misc:
- Jozsef Kadlecsik retires from maintaining netfilter
- tools: ynl: fix build on systems with old kernel headers
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmlEPS0SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkHLEP/3TnVI9qLivOGsZ48bn5UUcIaIlX0wiw
i5gwpUhbtW3zcLSphO0Nh/CmProme6dFhaMOOkk48bKAdIxRpOMbCC20bfYDLyd0
ZJTUQqheKpuI23vOnhfs2TTOkcqz6cM7txUq681taQ8Nfo8Dbf0fgOO2HT5ljRD+
JXlxrdvicZDtve68sSdnsAj5M15EPQGKTPMyPkymBmCKnCMQK9SQKaeTEtaW6NIO
yM0KqDSNoulDc/6LYMhx8DTtyE7yTiTxwe2NixjdyYljXsk0KiRDirCzxnZuSgh8
oh+cFLdFDq4mwdYEKjgC5c3ifQyLEpZvwzlY5MKoobsVnT5SbigeSK53l5rEkO3V
sM84lITHfqTJvlid0AF/ixEc6iWwV7nGRBHh2FXNbfoIKt45eF77jPi9YFsq6Z95
vlCzYIbY0f2L1y3mPvZbzGQbh2Z12b5kyK8QA1j7SK+zNzxgXbf4+ZtYxHg7O3Ne
gecmIpTKXMWodaZyfsRQPjR/F6UIlMqsgl9Ci9bfUw+XwL8x+7bJxQAQz+yVjLla
ng14BItiYKBavcPZBjYlhGKqD1fzGhVZqQecrCkF0VTbMusRd9RcwytU1NG4QGDx
V5aL28ht85KtMednEWOBkrg+PeXnNyZHzLAf2Xtx3UkaGgiDC8G4IxUv3orlOLD1
sFPfZnSiGCof
=6pla
-----END PGP SIGNATURE-----
Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and CAN.
Current release - regressions:
- netfilter: nf_conncount: fix leaked ct in error paths
- sched: act_mirred: fix loop detection
- sctp: fix potential deadlock in sctp_clone_sock()
- can: fix build dependency
- eth: mlx5e: do not update BQL of old txqs during channel
reconfiguration
Previous releases - regressions:
- sched: ets: always remove class from active list before deleting it
- inet: frags: flush pending skbs in fqdir_pre_exit()
- netfilter: nf_nat: remove bogus direction check
- mptcp:
- schedule rtx timer only after pushing data
- avoid deadlock on fallback while reinjecting
- can: gs_usb: fix error handling
- eth:
- mlx5e:
- avoid unregistering PSP twice
- fix double unregister of HCA_PORTS component
- bnxt_en: fix XDP_TX path
- mlxsw: fix use-after-free when updating multicast route stats
Previous releases - always broken:
- ethtool: avoid overflowing userspace buffer on stats query
- openvswitch: fix middle attribute validation in push_nsh() action
- eth:
- mlx5: fw_tracer, validate format string parameters
- mlxsw: spectrum_router: fix neighbour use-after-free
- ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()
Misc:
- Jozsef Kadlecsik retires from maintaining netfilter
- tools: ynl: fix build on systems with old kernel headers"
* tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: hns3: add VLAN id validation before using
net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
net: hns3: using the num_tqps in the vf driver to apply for resources
net: enetc: do not transmit redirected XDP frames when the link is down
selftests/tc-testing: Test case exercising potential mirred redirect deadlock
net/sched: act_mirred: fix loop detection
sctp: Clear inet_opt in sctp_v6_copy_ip_options().
sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().
net/handshake: duplicate handshake cancellations leak socket
net/mlx5e: Don't include PSP in the hard MTU calculations
net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
net/mlx5e: Trigger neighbor resolution for unresolved destinations
net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
net/mlx5: Serialize firmware reset with devlink
net/mlx5: fw_tracer, Handle escaped percent properly
net/mlx5: fw_tracer, Validate format string parameters
net/mlx5: Drain firmware reset in shutdown callback
net/mlx5: fw reset, clear reset requested on drain_fw_reset
net: dsa: mxl-gsw1xx: manually clear RANEG bit
net: dsa: mxl-gsw1xx: fix .shutdown driver operation
...
- Add a missing "break" to fix param parsing in the rseq selftest.
- Apply runtime updates to the _current_ CPUID when userspace is setting
CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
dropping the pending update.
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
supported by KVM and leads to a use-after-free due to KVM failing to unbind
the memslot from the previously-associated guest_memfd instance.
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
as -1ull (a 64-bit value).
- Update SVI when activating APICv to fix a bug where a post-activation EOI
for an in-service IRQ would effective be lost due to SVI being stale.
- Immediately refresh APICv controls (if necessary) on a nested VM-Exit
instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
effectively ignored because KVM thinks the vCPU already has the correct
APICv settings.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmk5p18ACgkQOlYIJqCj
N/0YlBAAvnhGVmqVc3nhd313mo4YGk+Z1RxpO1sJAsGJu42Ir/QqMC9aPHy9ejcS
hfoIXzPFdVJEztuBUWRje9mvocQnXSAWjXFTaoqJE/LXVnh96Txhh4nvCJSQzyrn
5/0uk5ZD5dfPVyPtYk6G9w3q9kgYv3Q6O4UEU48ru0q6wcu5FmRshULfHVZnlyNa
ZALY4k8QzsdSzB0XWusmD0OQpjGRyUR79mqEzUybg4E/b0LAK9Nv6Fr5YgGq7g0N
AU1T1t+/hMN7x4/24RrxNBO+skPKmCi7nM3iVKqilQSO2fZzrNVUOkr/tGb+5EL6
iw4JHJQjp9LlzRVxP3QNZp8Bg+knMVRdbSzAkdomDRguMgpu/TGAe7TtkzfEyVel
VAQUVpDaThp0FK5wAdyMKvpOQqTGjl3KKtM2zs187v+eJJcjJQnIsTk4zW5mmvk4
Y6YOqbulSNAqVOmJj7oqrDxWgjD75PtXlPFEoOsJM0AuL/sHBo8bKT18cGBwVGmH
lJoNfkS45kofG4i0zIBwzQuvKjDIQU7ZdVXa2MLL1aqDlyu/66Hll0vCJLAMvHz5
eb65WQ6Br97e0BNuzVJJNyTGQ3Pr9DdSkTpPkOalwQ3VEyZwcKm3OsB/N0FsgP2V
ta7vZQ5b6Sn568A9LAgXGhcnQ7mA31VBoNCkLLowxIT4kxVKGUY=
=4iuO
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.19-rc1' of https://github.com/kvm-x86/linux into HEAD
KVM fixes for 6.19-rc1
- Add a missing "break" to fix param parsing in the rseq selftest.
- Apply runtime updates to the _current_ CPUID when userspace is setting
CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
dropping the pending update.
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
supported by KVM and leads to a use-after-free due to KVM failing to unbind
the memslot from the previously-associated guest_memfd instance.
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
flags-only changes on KVM_MEM_GUEST_MEMFD memlslots, e.g. for dirty logging.
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
as -1ull (a 64-bit value).
- Update SVI when activating APICv to fix a bug where a post-activation EOI
for an in-service IRQ would effective be lost due to SVI being stale.
- Immediately refresh APICv controls (if necessary) on a nested VM-Exit
instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
effectively ignored because KVM thinks the vCPU already has the correct
APICv settings.
Add a test case that reproduces deadlock scenario where the user has
a drr qdisc attached to root and has a mirred action that redirects to
self on egress
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251210162255.1057663-2-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmlCBmwACgkQ6rmadz2v
bToUZA//ZY0IE1x1nCixEAqGF/nGpDzVX4YQQfjrUoXQOD4ykzt35yTNXl6B1IVA
dliVSI6kUtdoThUa7xJUxMSkDsVBsEMT/zYXQEXJG1zXvJANCB9wTzsC3OCBWbXt
BRczcEkq0OHC9/l5CrILR6ocwxKGDIMIysfeOSABgfqckSEhylWy3+EWZQCk08ka
gNpXlDJUG7dYpcZD/zhuC7e5Rg1uNvN7WiTv+Biig8xZCsEtYOq+qC5C/sOnsypI
nqfECfbx48cVl49SjatdgquuHn/INESdLRCHisshkurA2Mp5PQuCmrwlXbv4JG59
v9b7lsFQlkpvEXMdo9VYe6K2gjfkOPRdWsVPu2oXA1qISRmrDqX8cKOpapUIwRhL
p3ASruMOnz0KFqVaET8+5u2SwtALeW+c+1p1aHMfVGF/qbXuyG05qBkLoGFJR+Xr
WznXUXY80Z7pjD57SpA6U3DigAkGqKCBXUwdifaOq8HQonwsnQGqkW/3NngNULGP
IC4u0JXn61VgQsM/kAw+ucc4bdKI0g4oKJR56lT48elrj6Yxrjpde4oOqzZ0IQKu
VQ0YnzWqqT2tjh4YNMOwkNPbFR4ALd329zI6TUkWib/jByEBNcfjSj9BRANd1KSx
JgSHAE6agrbl6h3nOx584YCasX3Zq+nfv1Sj4Z/5GaHKKW3q/Vw=
=wHLt
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix BPF builds due to -fms-extensions. selftests (Alexei
Starovoitov), bpftool (Quentin Monnet).
- Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
(Geert Uytterhoeven)
- Fix livepatch/BPF interaction and support reliable unwinding through
BPF stack frames (Josh Poimboeuf)
- Do not audit capability check in arm64 JIT (Ondrej Mosnacek)
- Fix truncated dmabuf BPF iterator reads (T.J. Mercier)
- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)
- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
C23 (Mikhail Gavrilov)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: add regression test for bpf_d_path()
bpf: Fix verifier assumptions of bpf_d_path's output buffer
selftests/bpf: Add test for truncated dmabuf_iter reads
bpf: Fix truncated dmabuf iterator reads
x86/unwind/orc: Support reliable unwinding through BPF stack frames
bpf: Add bpf_has_frame_pointer()
bpf, arm64: Do not audit capability check in do_jit()
libbpf: Fix -Wdiscarded-qualifiers under C23
bpftool: Fix build warnings due to MS extensions
net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
selftests/bpf: Add -fms-extensions to bpf build flags
Add tests for the new libbpf globals arena offset logic. The
tests cover the case of globals being as large as the arena
itself, and being smaller than the arena. In that case, the
data is placed at the end of the arena, and the beginning
of the arena is free.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-6-emil@etsalapatis.com
Arena globals are currently placed at the beginning of the arena
by libbpf. This is convenient, but prevents users from reserving
guard pages in the beginning of the arena to identify NULL pointer
dereferences. Adjust the load logic to place the globals at the
end of the arena instead.
Also modify bpftool to set the arena pointer in the program's BPF
skeleton to point to the globals. Users now call bpf_map__initial_value()
to find the beginning of the arena mapping and use the arena pointer
in the skeleton to determine which part of the mapping holds the
arena globals and which part is free.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-5-emil@etsalapatis.com
The verifier currently limits direct offsets into a map to 512MiB
to avoid overflow during pointer arithmetic. However, this prevents
arena maps from using direct addressing instructions to access data
at the end of > 512MiB arena maps. This is necessary when moving
arena globals to the end of the arena instead of the front.
Refactor the verifier code to remove the offset calculation during
direct value access calculations. This is possible because the only
two map types that implement .map_direct_value_addr() are arrays and
arenas, and they both do their own internal checks to ensure the
offset is within bounds.
Adjust selftests that expect the old error. These tests still fail
because the verifier identifies the access as out of bounds for the
map, so change them to expect an "invalid access to map value pointer"
error instead.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-3-emil@etsalapatis.com
The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is on offset PAGE_SIZE instead of
0.
Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-2-emil@etsalapatis.com
- Fix memory leak when destroying helper kthread workers during scheduler
disable.
- Fix bypass depth accounting on scx_enable() failure which could leave
the system permanently in bypass mode.
- Fix missing preemption handling when moving tasks to local DSQs via
scx_bpf_dsq_move().
- Misc fixes including NULL check for put_prev_task(), flushing stdout in
selftests, and removing unused code.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaUA9aQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGXWKAP9VeREr2ceRFd3WK5vFsFzCGh1L8rRu+t352MGL
qWXkzQD/apLFSo5RQZ0tE8qORwKM9hNhS8QkVJhlsv5VZRWMBQI=
=ueoE
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix memory leak when destroying helper kthread workers during
scheduler disable
- Fix bypass depth accounting on scx_enable() failure which could leave
the system permanently in bypass mode
- Fix missing preemption handling when moving tasks to local DSQs via
scx_bpf_dsq_move()
- Misc fixes including NULL check for put_prev_task(), flushing stdout
in selftests, and removing unused code
* tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Remove unused code in the do_pick_task_scx()
selftests/sched_ext: flush stdout before test to avoid log spam
sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()
sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()
sched_ext: Fix bypass depth leak on scx_enable() failure
sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
sched_ext: Fix the memleak for sch->helper objects
GCC gets a bit confused and reports:
In function '_test_cmd_get_hw_info',
inlined from 'iommufd_ioas_get_hw_info' at iommufd.c:779:3,
inlined from 'wrapper_iommufd_ioas_get_hw_info' at iommufd.c:752:1:
>> iommufd_utils.h:804:37: warning: array subscript 'struct iommu_test_hw_info[0]' is partly outside array bounds of 'struct iommu_test_hw_info_buffer_smaller[1]' [-Warray-bounds=]
804 | assert(!info->flags);
| ~~~~^~~~~~~
iommufd.c: In function 'wrapper_iommufd_ioas_get_hw_info':
iommufd.c:761:11: note: object 'buffer_smaller' of size 4
761 | } buffer_smaller;
| ^~~~~~~~~~~~~~
While it is true that "struct iommu_test_hw_info[0]" is partly out of
bounds of the input pointer, it is not true that info->flags is out of
bounds. Unclear why it warns on this.
Reuse an existing properly sized stack buffer and pass a truncated length
instead to test the same thing.
Fixes: af4fde93c3 ("iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl")
Link: https://patch.msgid.link/r/0-v1-63a2cffb09da+4486-iommufd_gcc_bounds_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512032344.kaAcKFIM-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add tests for STATMOUNT_BY_FD flag, which adds support for passing a
file descriptors to statmount(). The fd can also be on a "unmounted"
mount (mount unmounted with MNT_DETACH), we also include tests for that.
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Link: https://patch.msgid.link/20251129091455.757724-4-b.sachdev1904@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add lkdtm cases to trigger a BUG() or panic() from hardirq context. This
is useful for testing pstore behavior being invoked from such contexts.
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ublk selftests cover every data copy mode except user copy. Add
tests for user copy based on the existing test suite:
- generic_14 ("basic recover function verification (user copy)") based
on generic_04 and generic_05
- null_03 ("basic IO test with user copy") based on null_01 and null_02
- loop_06 ("write and verify over user copy") based on loop_01 and
loop_03
- loop_07 ("mkfs & mount & umount with user copy") based on loop_02 and
loop_04
- stripe_05 ("write and verify test on user copy") based on stripe_03
- stripe_06 ("mkfs & mount & umount on user copy") based on stripe_02
and stripe_04
- stress_06 ("run IO and remove device (user copy)") based on stress_01
and stress_03
- stress_07 ("run IO and kill ublk server (user copy)") based on
stress_02 and stress_04
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The ublk selftests mock ublk server kublk supports every data copy mode
except user copy. Add support for user copy to kublk, enabled via the
--user_copy (-u) command line argument. On writes, issue pread() calls
to copy the write data into the ublk_io's buffer before dispatching the
write to the target implementation. On reads, issue pwrite() calls to
copy read data from the ublk_io's buffer before committing the request.
Copy in 2 KB chunks to provide some coverage of the offseting logic.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The kublk mock ublk server allows multiple data copy mode arguments to
be passed on the command line (--zero_copy, --get_data, and --auto_zc).
The ublk device will be created with all the requested feature flags,
however kublk will only use one of the modes to interact with request
data (arbitrarily preferring auto_zc over zero_copy over get_data). To
clarify the intent of the test, don't allow multiple data copy modes to
be specified. --zero_copy and --auto_zc are allowed together for
--auto_zc_fallback, which uses both copy modes.
Don't set UBLK_F_USER_COPY for zero_copy, as it's a separate feature.
Fix the test cases in test_stress_05 passing --get_data along with
--zero_copy or --auto_zc.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
stress_04 is missing a wait between blocks of tests, meaning multiple
ublk servers will be running in parallel using the same backing files.
Add a wait after each section to ensure each backing file is in use by a
single ublk server at a time.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
stress_04 is described as "run IO and kill ublk server(zero copy)" but
the --per_io_tasks tests cases don't use zero copy. Plus, one of the
test cases is duplicated. Add --auto_zc to these test cases and
--auto_zc_fallback to one of the duplicated ones. This matches the test
cases in stress_03.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
run_io_and_recover() invokes fio with --size="${size}", but the variable
size doesn't exist. Thus, the argument expands to --size=, which causes
fio to exit immediately with an error without issuing any I/O. Pass the
value for size as the first argument to the function.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The ios map populated by seq_io.bt is never read, so remove it.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The last_rw map is initialized with a value of 0 but later assigned the
value args.sector + args.nr_sector, which has type sector_t = u64.
bpftrace complains about the type mismatch between int64 and uint64:
trace/seq_io.bt:18:3-59: ERROR: Type mismatch for @last_rw: trying to assign value of type 'uint64' when map already contains a value of type 'int64'
@last_rw[$dev, str($2)] = (args.sector + args.nr_sector);
Cast the initial value to uint64 so bpftrace will load the program.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The functions ublk_queue_use_zc(), ublk_queue_use_auto_zc(), and
ublk_queue_auto_zc_fallback() were returning int, but performing
bitwise AND on q->flags which is __u64.
When a flag bit is set in the upper 32 bits (beyond INT_MAX), the
result of the bitwise AND operation could overflow when cast to int,
leading to incorrect boolean evaluation.
For example, if UBLKS_Q_AUTO_BUF_REG_FALLBACK is 0x8000000000000000:
- (u64)flags & 0x8000000000000000 = 0x8000000000000000
- Cast to int: undefined behavior / incorrect value
- Used in if(): may evaluate incorrectly
Fix by:
1. Changing return type from int to bool for semantic correctness
2. Using !! to explicitly convert to boolean (0 or 1)
This ensures the functions return proper boolean values regardless
of which bit position the flags occupy in the 64-bit field.
Fixes: c3a6d48f86 ("selftests: ublk: remove ublk queue self-defined flags")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The sched_ext selftests runner runs each test in the same process,
with each test possibly forking multiple times. When the main runner
has not flushed its stdout, the children inherit the buffered output
for previous tests and emit it during exit. This causes log spam.
Make sure stdout/stderr is fully flushed before each test.
Cc: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Jakub reports spurious failures of the 'conntrack_reverse_clash.sh'
selftest. A bogus test makes nat core resort to port rewrite even
though there is no need for this.
When the test is made, nf_nat_used_tuple() would already have caused us
to return if no other CPU had added a colliding entry.
Moreover, nf_nat_used_tuple() would have ignored the colliding entry if
their origin tuples had been the same.
All that is left to check is if the colliding entry in the hash table
is subject to NAT, and, if its not, if our entry matches in the reverse
direction, e.g. hash table has
addr1:1234 -> addr2:80, and we want to commit
addr2:80 -> addr1:1234.
Because we already checked that neither the new nor the committed entry is
subject to NAT we only have to check origin vs. reply tuple:
for non-nat entries, the reply tuple is always the inverted original.
Just in case there are more problems extend the error reporting
in the selftest while at it and dump conntrack table/stats on error.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251206175135.4a56591b@kernel.org/
Fixes: d8f84a9bc7 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Signed-off-by: Florian Westphal <fw@strlen.de>
Add a test case for a bug fixed by Jamal [1] and for scenario where an
ets drr class is inserted into the active list twice.
- Try to delete ets drr class' qdisc while still keeping it in the
active list
- Try to add ets class to the active list twice
[1] https://lore.kernel.org/netdev/20251128151919.576920-1-jhs@mojatatu.com/
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251208190125.1868423-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmk5UkYNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2ABGKEAChqKDT9NB2rbp1uxyj26TE8KtWB90FioA/yZcV
6H9lmTp5CXYRsYvkmRDM+xihYHSJpwNptxMVQJ/QonL+LPe8TwiP+THBBwknqrmY
JY5rgcEbhAbYBnmHJ9T1soK2FHsisUoyNjrA/6rvQQpkoi9+2j/WPrDT765RdXmM
I1fiF3qvKP1rYwpibOqXRozrlPVmBDB/ffibM7rNGR+K2N8B3vZPRbhQrc6B60ao
gj6LLdTWZJ4wduPOVo03CzDqzu8hpL1HGwGPYqP1ol3TFlSy8+ByEM+92Nj4BEfm
qX+3WDqxYc0U01pjz+TiXfI8nC1o5hzZiDWZ39eBp0zvxBoAyNCjc5ZKLE++EjvI
6hdDcJCcaOipFDv5GI/UmPadYjXf33YWsPK/9mXdChagXvaXtEB/b2KwbHUO4wMT
eJEtA9NmuvRAk6tGC9L2T7hurre/Dt8Jx35uFz0nJowqNfLxvFmAKeEB6v3yQKGv
Jb+LD3cEY3a4q7o+Vq3AVUZTqQtifW5JYwIailWHWXtxbnEtADQdWFa7NZD1Nm12
U2HFPN0z697WaDByNuK33Svc4CetE5oMXl+0WwE6Qa9hNCSnAkpyZKL3bTwJqEtQ
Ij+4IXoQndgTLbr84koJYcwlTGBDu9li7ii59xh1B1K+BLxJmu/RULxlIR4x010v
ygvezw==
=LRTM
-----END PGP SIGNATURE-----
Merge tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
1) Fix refcount leaks in nf_conncount, from Fernando Fernandez Mancera.
This addresses a recent regression that came in the last -next
pull request.
2) Fix a null dereference in route error handling in IPVS, from Slavin
Liu. This is an ancient issue dating back to 5.1 days.
3) Always set ifindex in route tuple in the flowtable output path, from
Lorenzo Bianconi. This bug came in with the recent output path refactoring.
4) Prefer 'exit $ksft_xfail' over 'exit $ksft_skip' when we fail to
trigger a nat race condition to exercise the clash resolution path in
selftest infra, $ksft_skip should be reserved for missing tooling,
From myself.
* tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: prefer xfail in case race wasn't triggered
netfilter: always set route tuple out ifindex
ipvs: fix ipv4 null-ptr-deref in route error path
netfilter: nf_conncount: fix leaked ct in error paths
====================
Link: https://patch.msgid.link/20251210110754.22620-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After fixing traffic matching in the previous patch, the test does not need
to use the sleep anymore. So drop vx_wait() altogether, migrate all callers
of vx{10,20}_create_wait() to the corresponding _create(), and drop the now
unused _create_wait() helpers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/eabfe4fa12ae788cf3b8c5c876a989de81dfc3d3.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test runs an overlay traffic, forwarded over a multicast-routed VXLAN
underlay. In order to determine whether packets reach their intended
destination, it uses a TC match. For convenience, it uses a flower match,
which however does not allow matching on the encapsulated packet. So
various service traffic ends up being indistinguishable from the test
packets, and ends up confusing the test. To alleviate the problem, the test
uses sleep to allow the necessary service traffic to run and clear the
channel, before running the test traffic. This worked for a while, but
lately we have nevertheless seen flakiness of the test in the CI.
Fix the issue by using u32 to match the encapsulated packet as well. The
confusing packets seem to always be IPv6 multicast listener reports.
Realistically they could be ARP or other ICMP6 traffic as well. Therefore
look for ethertype IPv4 in the IPv4 traffic test, and for IPv6 / UDP
combination in the IPv6 traffic test.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/6438cb1613a2a667d3ff64089eb5994778f247af.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Flower is commonly used to match on packets in many bash-based selftests.
A dump of a flower filter including statistics looks something like this:
[
{
"protocol": "all",
"pref": 49152,
"kind": "flower",
"chain": 0
},
{
...
"options": {
...
"actions": [
{
...
"stats": {
"bytes": 0,
"packets": 0,
"drops": 0,
"overlimits": 0,
"requeues": 0,
"backlog": 0,
"qlen": 0
}
}
]
}
}
]
The JQ query in the helper function tc_rule_stats_get() assumes this form
and looks for the second element of the array.
However, a dump of a u32 filter looks like this:
[
{
"protocol": "all",
"pref": 49151,
"kind": "u32",
"chain": 0
},
{
"protocol": "all",
"pref": 49151,
"kind": "u32",
"chain": 0,
"options": {
"fh": "800:",
"ht_divisor": 1
}
},
{
...
"options": {
...
"actions": [
{
...
"stats": {
"bytes": 0,
"packets": 0,
"drops": 0,
"overlimits": 0,
"requeues": 0,
"backlog": 0,
"qlen": 0
}
}
]
}
},
]
There's an extra element which the JQ query ends up choosing.
Instead of hard-coding a particular index, look for the entry on which a
selector .options.actions yields anything.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/12982a44471c834511a0ee6c1e8f57e3a5307105.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub says: "We try to reserve SKIP for tests skipped because tool is
missing in env, something isn't built into the kernel etc."
use xfail, we can't force the race condition to appear at will
so its expected that the test 'fails' occasionally.
Fixes: 78a5883635 ("selftests: netfilter: add conntrack clash resolution test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251206175647.5c32f419@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Add a regression test for bpf_d_path() to cover incorrect verifier
assumptions caused by an incorrect function prototype. The test
attaches to the fallocate hook, calls bpf_d_path() and verifies that
a simple prefix comparison on the returned pathname behaves correctly
after the fix in patch 1. It ensures the verifier does not assume
the buffer remains unwritten.
Co-developed-by: Zesen Liu <ftyg@live.com>
Signed-off-by: Zesen Liu <ftyg@live.com>
Co-developed-by: Peili Gao <gplhust955@gmail.com>
Signed-off-by: Peili Gao <gplhust955@gmail.com>
Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Shuran Liu <electronlsr@gmail.com>
Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix
tfo.c: In function ‘run_server’:
tfo.c:84:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’
by evaluating the return value from read() and displaying an error message
if it reports an error.
Fixes: c65b5bb232 ("selftests: net: add passive TFO test binary")
Cc: David Wei <dw@davidwei.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-14-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix
ksft.h: In function ‘ksft_ready’:
ksft.h:27:9: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’
ksft.h: In function ‘ksft_wait’:
ksft.h:51:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’
by checking the return value of the affected functions and displaying
an error message if an error is seen.
Fixes: 2b6d490b82 ("selftests: drv-net: Factor out ksft C helpers")
Cc: Joe Damato <jdamato@fastly.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-11-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix:
gcc: error: unrecognized command-line option ‘-Wflex-array-member-not-at-end’
by making the compiler option dependent on its support.
Fixes: 1838731f10 ("selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS.")
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-7-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In 'poll_partial_rec_async' a uninitialized char variable 'token' with
is used for write/read instruction to synchronize between threads
via a pipe.
tls.c:2833:26: warning: variable 'token' is uninitialized
when passed as a const pointer argument
Initialize 'token' to '\0' to silence compiler warning.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20251205163242.14615-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Standardize on ktime_t in restart_block::time as well
(Thomas Weißschuh)
- Futex selftests:
- Add robust list testcases (André Almeida)
- Formatting fixes/cleanups (Carlos Llamas)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmk5J9QRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gimQ//VhPIVYqioY/opLXBWFAiyxWLF8RbzbsD
URC4ppVqMa0d8np4VaoaNGTCyPOvhC5m/5q14BNgNhGfbvHOrx1W5cJ122v2vroC
aNwBkapt655lHwgax4vLgM7jMPW1do9hoyqtjsUIQeKMDlyZZW22gY+ExRTPJZTm
lP6PreWR74QTTKo5pTjNCoWSaLEmuoD5rbo52SkiTYHwBLL+Tqubl92mVFFUbE+Q
9n/c/zDeRSaixKr64TvDAijGyjrH8XFhOoMyyOGCiiWyrXfjDPde0Cblt6OkBQeK
/4/9uFo2TX3vFxQ2Zhj21gqMwQ84cdPvC86GkPh3FoxRNd66gArIpC14ME7rRn0V
aDKmz+c8QhFdS9gp4+Gw2OkuzYylLl2RSoSjhz/ndrZh7OLn64cA9KjfhRscCfjD
mcFMVFcjyK8BlgjXyTltC00tLftf6pyO7bqzO4kaYYbpls687ztfZ0mUSE60gLS4
WuCnntD+Q2IADV0r5xU9ps1a37eaqiHfgtjON2wznJeyj46PbpkWxSpdTIogTRoS
5JdLv+w1LdDMHaUuEPOgYYd6L69X0HpDauHotyRlGY+SoSoCoX6DbSXTg7CSqlqY
VHeuLj2OKnELJpxLEo16pcW6qTd+BM/2jcG91H7VJxbb2sJnngT3XtgJagXE5KYR
IYjcKq6DWig=
=9KQQ
-----END PGP SIGNATURE-----
Merge tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Ingo Molnar:
- Standardize on ktime_t in restart_block::time as well (Thomas
Weißschuh)
- Futex selftests:
- Add robust list testcases (André Almeida)
- Formatting fixes/cleanups (Carlos Llamas)
* tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Store time as ktime_t in restart block
selftests/futex: Create test for robust list
selftests/futex: Skip tests if shmget unsupported
selftests/futex: Add newline to ksft_exit_fail_msg()
selftests/futex: Remove unused test_futex_mpol()
This commit adds 3 tests to verify a common compiler generated
pattern for sign extension (r1 <<= 32; r1 s>>= 32).
The tests make sure the register bounds are correctly computed both for
positive and negative register values.
Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: David Faust <david.faust@oracle.com>
Cc: Jose Marchesi <jose.marchesi@oracle.com>
Cc: Elena Zannoni <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20251202180220.11128-3-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test cases for situations where adding the following types of file
descriptors to a cpumap entry should fail:
- Non-BPF file descriptor (expect -EINVAL)
- Nonexistent file descriptor (expect -EBADF)
Also tighten the assertion for the expected error when adding a
non-BPF_XDP_CPUMAP program to a cpumap entry.
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20251208131449.73036-3-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If many dmabufs are present, reads of the dmabuf iterator can be
truncated at PAGE_SIZE or user buffer size boundaries before the fix in
"bpf: Fix truncated dmabuf iterator reads". Add a test to
confirm truncation does not occur.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This validates the previous commit: the userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoints dumps.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace instances of "__auto_type" with "auto" in:
tools/testing/selftests/bpf/prog_tests/socket_helpers.h
This file does not seem to be including <linux/compiler_types.h>
directly or indirectly, so copy the definition but guard it with
!defined(auto).
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
with the new helper `cg_read_key_long_poll()`. This change improves
the robustness of the test by polling the "nr_dying_descendants"
counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:
- With 5-second timeout: 4/20 runs passed.
- With 8-second timeout: 20/20 runs passed.
The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
under load: 5 seconds was occasionally not enough for reclaim of dying
descendants to complete, whereas 8 seconds consistently covered the observed
latencies. This value is intended as a generous upper bound for the
asynchronous reclaim and is not tied to any specific kernel constant, so it
can be adjusted in the future if reclaim behavior changes.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
test_memcg_sock() currently requires that memory.stat's "sock " counter
is exactly zero immediately after the TCP server exits. On a busy system
this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).
- memcg statistics are updated asynchronously via the rstat flushing
worker, so the "sock " value in memory.stat can stay non-zero for a
short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket
memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock "
counter and allowing it some time to drop to zero instead of checking
it only once. The timeout is set to 3 seconds to cover the periodic
rstat flush interval (FLUSH_TIME = 2*HZ by default) plus some
scheduling slack. If the counter does not become zero within the
timeout, the test still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Introduce a new helper function `cg_read_key_long_poll()` in
cgroup_util.h. This function polls the specified key in a cgroup file
until it matches the expected value or the retry limit is reached,
with configurable wait intervals between retries.
This helper is particularly useful for handling asynchronously updated
cgroup statistics (e.g., memory.stat), where immediate reads may
observe stale values, especially on busy systems. It allows tests and
other utilities to handle such cases more flexibly.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Here is the big set of tty/serial driver changes for 6.19-rc1. Nothing
major at all, just small constant churn to make the tty layer "cleaner"
as well as serial driver updates and even a new test added! Included in
here are:
- More tty/serial cleanups from Jiri
- tty tiocsti test added to hopefully ensure we don't regress in this
area again
- sc16is7xx driver updates
- imx serial driver updates
- 8250 driver updates
- new hardware device ids added
- other minor serial/tty driver cleanups and tweaks
All of these have been in linux-next for a while with no reported issues
other than a merge-conflict that you will have when you merge into your
tree (should be simple to resolve, just delete the code on both sides
of the merge).
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaTTJ+g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymiEQCgweRHAivbv2avIlfsmpFewV27WMgAnR1aE5mW
JhhopcWcdjnRJikavHGo
=lhK0
-----END PGP SIGNATURE-----
Merge tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.19-rc1. Nothing
major at all, just small constant churn to make the tty layer
"cleaner" as well as serial driver updates and even a new test added!
Included in here are:
- More tty/serial cleanups from Jiri
- tty tiocsti test added to hopefully ensure we don't regress in this
area again
- sc16is7xx driver updates
- imx serial driver updates
- 8250 driver updates
- new hardware device ids added
- other minor serial/tty driver cleanups and tweaks
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (60 commits)
serial: sh-sci: Fix deadlock during RSCI FIFO overrun error
dt-bindings: serial: rsci: Drop "uart-has-rtscts: false"
LoongArch: dts: Add uart new compatible string
serial: 8250: Add Loongson uart driver support
dt-bindings: serial: 8250: Add Loongson uart compatible
serial: 8250: add driver for KEBA UART
serial: Keep rs485 settings for devices without firmware node
serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms
serial: qcom-geni: Enable PM runtime for serial driver
serial: sprd: Return -EPROBE_DEFER when uart clock is not ready
tty: serial: samsung: Declare earlycon for Exynos850
serial: icom: Convert PCIBIOS_* return codes to errnos
serial: 8250-of: Fix style issues in 8250_of.c
serial: add support of CPCI cards
serial: mux: Fix kernel doc for mux_poll()
tty: replace use of system_unbound_wq with system_dfl_wq
serial: 8250_platform: simplify IRQF_SHARED handling
serial: 8250: make share_irqs local to 8250_platform
serial: 8250: move skip_txen_test to core
serial: drop SERIAL_8250_DEPRECATED_OPTIONS
...
- The 6 patch series "panic: sys_info: Refactor and fix a potential
issue" from Andy Shevchenko fixes a build issue and does some cleanup in
ib/sys_info.c.
- The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from
David Laight enhances the 64-bit math code on behalf of a PWM driver and
beefs up the test module for these library functions.
- The 2 patch series "scripts/gdb/symbols: make BPF debug info available
to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line
numbers available to the GDB debugger.
- The 4 patch series "Enable hung_task and lockup cases to dump system
info on demand" from Feng Tang adds a sysctl which can be used to cause
additional info dumping when the hung-task and lockup detectors fire.
- The 6 patch series "lib/base64: add generic encoder/decoder, migrate
users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/
and migrates several users away from their private implementations.
- The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric
Dumazet makes TCP a little faster.
- The 9 patch series "liveupdate: Rework KHO for in-kernel users" from
Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for
Live Update Orchestrator (LUO), and possibly for other future clients.
- The 13 patch series "kho: simplify state machine and enable dynamic
updates" from Pasha Tatashin increases the flexibility of KEXEC
Handover. Also preparation for LUO.
- The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is
a major new feature targeted at cloud environments. Quoting the [0/N]:
This series introduces the Live Update Orchestrator, a kernel subsystem
designed to facilitate live kernel updates using a kexec-based reboot.
This capability is critical for cloud environments, allowing hypervisors
to be updated with minimal downtime for running virtual machines. LUO
achieves this by preserving the state of selected resources, such as
memory, devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving memfd file
descriptors, which allows critical in-memory data, such as guest RAM or
any other large memory region, to be maintained in RAM across the kexec
reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- The 3 patch series "kexec: reorganize kexec and kdump sysfs" from
Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/
to /sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day.
- The 2 patch series "kho: fixes for vmalloc restoration" from Mike
Rapoport fixes a BUG which was being hit during KHO restoration of
vmalloc() regions.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA
jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52
n0pVpQmG5P/KyOmpLztn96ejL4heKwQ=
=JY96
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCaTMgExAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSN0kBALPG/cpioGMk0j3DagnUtV6fPvGuux9YTmbe
KpIWdsoCAQC5gO9nzHYIqBOL0CjMKjovljbN+W/AOiirJew95ocyAA==
=msQS
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This mainly fixes handling of disconnected directories and adds new
tests"
* tag 'landlock-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add disconnected leafs and branch test suites
selftests/landlock: Add tests for access through disconnected paths
landlock: Improve variable scope
landlock: Fix handling of disconnected directories
selftests/landlock: Fix makefile header list
landlock: Make docs in cred.h and domain.h visible
landlock: Minor comments improvements
* nvdimm: Prevent integer overflow in ramdax_get_config_data()
* Documentation: btt: Unwrap bit 31-30 nested table
* nvdimm: replace use of system_wq with system_percpu_wq
* tools/testing/nvdimm: Use per-DIMM device handle
* nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCaTMAfRQcaXJhLndlaW55
QGludGVsLmNvbQAKCRCebuN7TNx1MWGXAP4lGiQL2fMOpP0PnFVf6aiiNyaTzdaa
47haKGmsvM6R1AEAqIPelP1tZ+zh6au2G27SJXzJAJk3Nxg1tpxQnZshmg0=
=LhgA
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm updates from Ira Weiny:
"These are mainly bug fixes and code updates.
There is a new feature to divide up memmap= carve outs and a fix
caught in linux-next for that patch. Managing memmap memory on the fly
for multiple VM's was proving difficult and Mike provided a driver
which allows for the memory to be better manged.
Summary:
- Allow exposing RAM carveouts as NVDIMM DIMM devices
- Prevent integer overflow in ramdax_get_config_data()
- Replace use of system_wq with system_percpu_wq
- Documentation: btt: Unwrap bit 31-30 nested table
- tools/testing/nvdimm: Use per-DIMM device handle"
* tag 'libnvdimm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Prevent integer overflow in ramdax_get_config_data()
Documentation: btt: Unwrap bit 31-30 nested table
nvdimm: replace use of system_wq with system_percpu_wq
tools/testing/nvdimm: Use per-DIMM device handle
nvdimm: allow exposing RAM carveouts as NVDIMM DIMM devices
- next part of DMA mapping API refactoring to physical addresses as the primary
interface instead of page+offset parameters; this time dma_map_ops callbacks
are converted to physical addresses, what in turn results also in some
simplification of architecture specific code (Leon Romanovsky and Jason
Gunthorpe)
- clarify that dma_map_benchmark is not a kernel self-test, but standalone
tool (Qinxin Xia)
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaTLpeQAKCRCJp1EFxbsS
RKlAAQCo/gVslheoJ+h4Hk5oUjLfVQaiamJOlzxw12EVHVAs3AEA7GILIAL1GbGn
EpkgwjIuz/J/4aGhNbYO6J+C1qMAHww=
=aM6P
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping updates from Marek Szyprowski:
- More DMA mapping API refactoring to physical addresses as the primary
interface instead of page+offset parameters.
This time dma_map_ops callbacks are converted to physical addresses,
what in turn results also in some simplification of architecture
specific code (Leon Romanovsky and Jason Gunthorpe)
- Clarify that dma_map_benchmark is not a kernel self-test, but
standalone tool (Qinxin Xia)
* tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: remove unused map_page callback
xen: swiotlb: Convert mapping routine to rely on physical address
x86: Use physical address for DMA mapping
sparc: Use physical address DMA mapping
powerpc: Convert to physical address DMA mapping
parisc: Convert DMA map_page to map_phys interface
MIPS/jazzdma: Provide physical address directly
alpha: Convert mapping routine to rely on physical address
dma-mapping: remove unused mapping resource callbacks
xen: swiotlb: Switch to physical address mapping callbacks
ARM: dma-mapping: Switch to physical address mapping callbacks
ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*()
dma-mapping: convert dummy ops to physical address mapping
dma-mapping: prepare dma_map_ops to conversion to physical address
tools/dma: move dma_map_benchmark from selftests to tools/dma
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register.
- Add AVEC basic support.
- Use 64-bit register definition for EIOINTC.
- Add KVM timer test cases for tools/selftests.
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of
exits (and worse performance) on z10 and earlier processor;
ESCA was introduced by z114/z196 in 2010.
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
caching is disabled, as there can't be any relevant SPTEs to zap.
- Relocate a misplaced export.
- Fix an async #PF bug where KVM would clear the completion queue when the
guest transitioned in and out of paging mode, e.g. when handling an SMI and
then returning to paged mode via RSM.
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping
the notifier registered is ok; the kernel does not use the MSRs and the
callback will run cleanly and restore host MSRs if the CPU manages to
return to userspace before the system goes down.
- Use the checked version of {get,put}_user().
- Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
timers can result in a hard lockup in the host.
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections.
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
behind CONFIG_CPU_MITIGATIONS.
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path;
the only reason they were handled in the fast path was to paper of a bug
in the core #MC code, and that has long since been fixed.
- Add emulator support for AVX MOV instructions, to play nice with emulated
devices whose guest drivers like to access PCI BARs with large multi-byte
instructions.
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs.
- Fix the worst of KVM's lack of EFER.LMSLE emulation.
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
- Fix incorrect handling of selective CR0 writes when checking intercepts
during emulation of L2 instructions.
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
VMRUN and #VMEXIT.
- Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
interrupt if the guest patched the underlying code after the VM-Exit, e.g.
when Linux patches code with a temporary INT3.
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
userspace, and extend KVM "support" to all policy bits that don't require
any actual support from KVM.
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of the
current vCPU state, partly as worthwhile cleanup, but mostly to pave the
way for tracking per-root TLB flushes, and elide EPT flushes on pCPU
migration if the root is clean from a previous flush.
- Add a few missing nested consistency checks.
- Rip out support for doing "early" consistency checks via hardware as the
functionality hasn't been used in years and is no longer useful in general;
replace it with an off-by-default module param to WARN if hardware fails
a check that KVM does not perform.
- Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
on VM-Enter.
- Misc cleanups.
- Overhaul the TDX code to address systemic races where KVM (acting on behalf
of userspace) could inadvertantly trigger lock contention in the TDX-Module;
KVM was either working around these in weird, ugly ways, or was simply
oblivious to them (though even Yan's devilish selftests could only break
individual VMs, not the host kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU,
if creating said vCPU failed partway through.
- Fix a few sparse warnings (bad annotation, 0 != NULL).
- Use struct_size() to simplify copying TDX capabilities to userspace.
- Fix a bug where TDX would effectively corrupt user-return MSR values if the
TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easer to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
- Misc cleanups.
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
irqfd cleanup.
- Fix a goof in the dirty ring documentation.
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkvMa8UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMlFwf+Ow7zOYUuELSQ+Jn+hOYXiCNrdBDx
ZamvMU8kLPr7XX0Zog6HgcMm//qyA6k5nSfqCjfsQZrIhRA/gWJ61jz1OX/Jxq18
pJ9Vz6epnEPYiOtBwz+v8OS8MqDqVNzj2i6W1/cLPQE50c1Hhw64HWS5CSxDQiHW
A7PVfl5YU12lW1vG3uE0sNESDt4Eh/spNM17iddXdF4ZUOGublserjDGjbc17E7H
8BX3DkC2plqkJKwtjg0ae62hREkITZZc7RqsnftUkEhn0N0H9+rb6NKUyzIVh9NZ
bCtCjtrKN9zfZ0Mujnms3ugBOVqNIputu/DtPnnFKXtXWSrHrgGSNv5ewA==
=PEcw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- Support for userspace handling of synchronous external aborts
(SEAs), allowing the VMM to potentially handle the abort in a
non-fatal manner
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers
in hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU
- Allow page table destruction to reschedule, fixing long
need_resched latencies observed when destroying a large VM
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register
- Add AVEC basic support
- Use 64-bit register definition for EIOINTC
- Add KVM timer test cases for tools/selftests
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of exits
(and worse performance) on z10 and earlier processor; ESCA was
introduced by z114/z196 in 2010
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
SPTE caching is disabled, as there can't be any relevant SPTEs to
zap
- Relocate a misplaced export
- Fix an async #PF bug where KVM would clear the completion queue
when the guest transitioned in and out of paging mode, e.g. when
handling an SMI and then returning to paged mode via RSM
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
keeping the notifier registered is ok; the kernel does not use the
MSRs and the callback will run cleanly and restore host MSRs if the
CPU manages to return to userspace before the system goes down
- Use the checked version of {get,put}_user()
- Fix a long-lurking bug where KVM's lack of catch-up logic for
periodic APIC timers can result in a hard lockup in the host
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
latter behind CONFIG_CPU_MITIGATIONS
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
path; the only reason they were handled in the fast path was to
paper of a bug in the core #MC code, and that has long since been
fixed
- Add emulator support for AVX MOV instructions, to play nice with
emulated devices whose guest drivers like to access PCI BARs with
large multi-byte instructions
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs
- Fix the worst of KVM's lack of EFER.LMSLE emulation
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode
- Fix incorrect handling of selective CR0 writes when checking
intercepts during emulation of L2 instructions
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
on VMRUN and #VMEXIT
- Fix a bug where KVM corrupt the guest code stream when re-injecting
a soft interrupt if the guest patched the underlying code after the
VM-Exit, e.g. when Linux patches code with a temporary INT3
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
to userspace, and extend KVM "support" to all policy bits that
don't require any actual support from KVM
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of
the current vCPU state, partly as worthwhile cleanup, but mostly to
pave the way for tracking per-root TLB flushes, and elide EPT
flushes on pCPU migration if the root is clean from a previous
flush
- Add a few missing nested consistency checks
- Rip out support for doing "early" consistency checks via hardware
as the functionality hasn't been used in years and is no longer
useful in general; replace it with an off-by-default module param
to WARN if hardware fails a check that KVM does not perform
- Fix a currently-benign bug where KVM would drop the guest's
SPEC_CTRL[63:32] on VM-Enter
- Misc cleanups
- Overhaul the TDX code to address systemic races where KVM (acting
on behalf of userspace) could inadvertantly trigger lock contention
in the TDX-Module; KVM was either working around these in weird,
ugly ways, or was simply oblivious to them (though even Yan's
devilish selftests could only break individual VMs, not the host
kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
TDX vCPU, if creating said vCPU failed partway through
- Fix a few sparse warnings (bad annotation, 0 != NULL)
- Use struct_size() to simplify copying TDX capabilities to userspace
- Fix a bug where TDX would effectively corrupt user-return MSR
values if the TDX Module rejects VP.ENTER and thus doesn't clobber
host MSRs as expected
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU
system/VM
- Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line
- Extend a bunch of nested VMX to validate nested SVM as well
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test
to verify KVM can save/restore nested VMX state when L1 is using
5-level paging, but L2 is not
- Clean up the guest paging code in anticipation of sharing the core
logic for nested EPT and nested NPT
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety
of rough edges in guest_memfd along the way
- Define a CLASS to automatically handle get+put when grabbing a
guest_memfd from a memslot to make it harder to leak references
- Enhance KVM selftests to make it easer to develop and debug
selftests like those added for guest_memfd NUMA support, e.g. where
test and/or KVM bugs often result in hard-to-debug SIGBUS errors
- Misc cleanups
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU
workqueue for irqfd cleanup
- Fix a goof in the dirty ring documentation
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
...
- Enable parallel hotplug for RISC-V
- Optimize vector regset allocation for ptrace()
- Add a kernel selftest for the vector ptrace interface
- Enable the userspace RAID6 test to build and run using RISC-V
vectors
- Add initial support for the Zalasr RISC-V ratified ISA extension
- For the Zicbop RISC-V ratified ISA extension to userspace, expose
hardware and kernel support to userspace and add a kselftest for
Zicbop
- Convert open-coded instances of 'asm goto's that are controlled by
runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
following arm64's alternative_has_cap_{un,}likely()
- Remove an unnecessary mask in the GFP flags used in some calls to
pagetable_alloc()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmkx2zkACgkQx4+xDQu9
KkvagQ/+Iv94J0S8VWS4vvRMtvPMkEHvPADCmEOrMrVsQ+hZdasKsSFBKY4DHZt6
baGKEQoyJ0NDrIt51uNWzmR503/PGPwVYSDfrgrpS87IkcO9OWe/HiMSEAu0aCyp
dhKGYnjWUtawWzFQg1GQ2n1vLOm5cQ2u1vTptISqU1yg9XlXUN0LNzIqNB8GpQv3
Hm7qqNFyxZOyAC9/ZFGbX2/0KpKQh1xkXSEONxzRGADJ/bqHKKz3hdaOj3aVcoMM
zObvHBm+I0U6AnGSgiZm71dvO0vlijg1RsuD/wd1DlcGO9QAuaPHX+RKqmJUJFe8
d0JjOon+d6n/pBKxoPnZyfB1IHxFNb3kX3LjmKdP6NYeOmdHZ7LI3dzB+LFyPXZe
mkC948+9GExEqmHQx5ZqyCWDwKtknIQPA45ZjBi8e5YOU8nJQiapShYWQu/6ybKV
GFHRqjDIGfhpjZGVdxnUX2Iok1XcwaGKrI6/6P6WlQ/zHKGurtEEtGiI7XHDYS8m
FSGxYay4GIWUsNbNSBWcZp5QaGH0jW7qiZle3DR+8gDLe1DJzbIo6pWMiArm5v7x
fUmroR4FcF9x2X7A01IB2tEUGf/0nfuHgVfMNPNoHBGsiRs9mXzLMngJjbFXOGyF
EP61/W+K+eaJ0jjkfmnscBQEy8URLSNoACnwQLT9SQcAIC4xWTs=
=J1nv
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Enable parallel hotplug for RISC-V
- Optimize vector regset allocation for ptrace()
- Add a kernel selftest for the vector ptrace interface
- Enable the userspace RAID6 test to build and run using RISC-V vectors
- Add initial support for the Zalasr RISC-V ratified ISA extension
- For the Zicbop RISC-V ratified ISA extension to userspace, expose
hardware and kernel support to userspace and add a kselftest for
Zicbop
- Convert open-coded instances of 'asm goto's that are controlled by
runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
following arm64's alternative_has_cap_{un,}likely()
- Remove an unnecessary mask in the GFP flags used in some calls to
pagetable_alloc()
* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
selftests/riscv: Add Zicbop prefetch test
riscv: hwprobe: Expose Zicbop extension and its block size
riscv: Introduce Zalasr instructions
riscv: hwprobe: Export Zalasr extension
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: Add ISA extension parsing for Zalasr
selftests: riscv: Add test for the Vector ptrace interface
riscv: ptrace: Optimize the allocation of vector regset
raid6: test: Add support for RISC-V
raid6: riscv: Allow code to be compiled in userspace
raid6: riscv: Prevent compiler from breaking inline vector assembly code
riscv: cmpxchg: Use riscv_has_extension_likely
riscv: bitops: Use riscv_has_extension_likely
riscv: hweight: Use riscv_has_extension_likely
riscv: checksum: Use riscv_has_extension_likely
riscv: pgtable: Use riscv_has_extension_unlikely
riscv: Remove __GFP_HIGHMEM masking
RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
Make sure 1) a timer callback can also reference the associated
struct_ops, and then make sure 2) the timer callback cannot get a
dangled pointer to the struct_ops when the map is freed.
The test schedules a timer callback from a struct_ops program since
struct_ops programs do not pin the map. It is possible for the timer
callback to run after the map is freed. The timer callback calls a
kfunc that runs .test_1() of the associated struct_ops, which should
return MAP_MAGIC when the map is still alive or -1 when the map is
gone.
The first subtest added in this patch schedules the timer callback to
run immediately, while the map is still alive. The second subtest added
schedules the callback to run 500ms after syscall_prog runs and then
frees the map right after syscall_prog runs. Both subtests then wait
until the callback runs to check the return of the kfunc.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-7-ameryhung@gmail.com
Add a test to make sure implicit struct_ops association does not
break backward compatibility nor return incorrect struct_ops.
struct_ops programs should still be allowed to be reused in
different struct_ops map. The associated struct_ops map set implicitly
however will be poisoned. Trying to read it through the helper
bpf_prog_get_assoc_struct_ops() should result in a NULL pointer.
While recursion of test_1() cannot happen due to the associated
struct_ops being ambiguois, explicitly check for it to prevent stack
overflow if the test regresses.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-6-ameryhung@gmail.com
Test BPF_PROG_ASSOC_STRUCT_OPS command that associates a BPF program
with a struct_ops. The test follows the same logic in commit
ba7000f1c3 ("selftests/bpf: Test multi_st_ops and calling kfuncs from
different programs"), but instead of using map id to identify a specific
struct_ops, this test uses the new BPF command to associate a struct_ops
with a program.
The test consists of two sets of almost identical struct_ops maps and BPF
programs associated with the map. Their only difference is the unique
value returned by bpf_testmod_multi_st_ops::test_1().
The test first loads the programs and associates them with struct_ops
maps. Then, it exercises the BPF programs. They will in turn call kfunc
bpf_kfunc_multi_st_ops_test_1_prog_arg() to trigger test_1() of the
associated struct_ops map, and then check if the right unique value is
returned.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-5-ameryhung@gmail.com
- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
Uladzislau Rezki reworks the vmalloc() code to support non-blocking
allocations (GFP_ATOIC, GFP_NOWAIT).
- The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
across fork/exec.
- The 4 patch series "mm/zswap: misc cleanup of code and documentations"
from SeongJae Park does some light maintenance work on the zswap code.
- The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
/sys/kernel/debug/page_owner debug feature. It adds unique identifiers
to differentiate the various stack traces so that userspace monitoring
tools can better match stack traces over time.
- The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
Hahn makes some minor alterations to the page allocator's per-cpu-pages
feature.
- The 2 patch series "Improve UFFDIO_MOVE scalability by removing
anon_vma lock" from Lokesh Gidra addresses a scalability issue in
userfaultfd's UFFDIO_MOVE operation.
- The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
- The 2 patch series "drivers/base/node: fold node register and
unregister functions" from Donet Tom cleans up the NUMA node handling
code a little.
- The 4 patch series "mm: some optimizations for prot numa" from Kefeng
Wang provides some cleanups and small optimizations to the NUMA
allocation hinting code.
- The 5 patch series "mm/page_alloc: Batch callers of
free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
boot on large machines. These were causing (harmless) softlockup
warnings.
- The 2 patch series "optimize the logic for handling dirty file folios
during reclaim" from Baolin Wang removes some now-unnecessary work from
page reclaim.
- The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
feature.
- The 2 patch series "mm/damon: fixes for address alignment issues in
DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
and DAMON_RECLAIM with certain userspace configuration.
- The 15 patch series "expand mmap_prepare functionality, port more
users" from Lorenzo Stoakes enhances the new(ish)
file_operations.mmap_prepare() method and ports additional callsites
from the old ->mmap() over to ->mmap_prepare().
- The 8 patch series "Fix stale IOTLB entries for kernel address space"
from Lu Baolu fixes a bug (and possible security issue on non-x86) in
the IOMMU code. In some situations the IOMMU could be left hanging onto
a stale kernel pagetable entry.
- The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
from Wei Yang cleans up and optimizes the folio splitting code.
- The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
Song implements some cleanups and a minor fix in the swap discard code.
- The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
Park does as advertised.
- The 9 patch series "mm/damon: support pin-point targets removal" from
SeongJae Park permits userspace to remove a specific monitoring target
in the middle of the current targets list.
- The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
from Harry Yoo implements a couple of cleanups related to mm header file
inclusion.
- The 2 patch series "mm/swapfile.c: select swap devices of default
priority round robin" from Baoquan He improves the selection of swap
devices for NUMA machines.
- The 3 patch series "mm: Convert memory block states (MEM_*) macros to
enums" from Israel Batista changes the memory block labels from macros
to enums so they will appear in kernel debug info.
- The 3 patch series "ksm: perform a range-walk to jump over holes in
break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
unmerges an address range.
- The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
userspace unit tests.
- The 2 patch series "some cleanups for pageout()" from Baolin Wang
cleans up a couple of minor things in the page scanner's
writeback-for-eviction code.
- The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
- The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
and improves the mergeability of guarded VMAs.
- The 2 patch series "mm: perform guard region install/remove under VMA
lock" from Lorenzo Stoakes reduces mmap lock contention for callers
performing VMA guard region operations.
- The 2 patch series "vma_start_write_killable" from Matthew Wilcox
starts work in permitting applications to be killed when they are
waiting on a read_lock on the VMA lock.
- The 11 patch series "mm/damon/tests: add more tests for online
parameters commit" from SeongJae Park adds additional userspace testing
of DAMON's "commit" feature.
- The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
that.
- The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
that VMA is merged with another.
- The 16 patch series "mm: support device-private THP" from Balbir Singh
introduces support for Transparent Huge Page (THP) migration in zone
device-private memory.
- The 3 patch series "Optimize folio split in memory failure" from Zi
Yan optimizes folio split operations in the memory failure code.
- The 2 patch series "mm/huge_memory: Define split_type and consolidate
split support checks" from Wei Yang provides some more cleanups in the
folio splitting code.
- The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
handling of pagetable leaf entries by introducing the concept of
'software leaf entries', of type softleaf_t.
- The 4 patch series "reparent the THP split queue" from Muchun Song
reparents the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory resources.
- The 3 patch series "unify PMD scan results and remove redundant
cleanup" from Wei Yang does a little cleanup in the hugepage collapse
code.
- The 6 patch series "zram: introduce writeback bio batching" from
Sergey Senozhatsky improves zram writeback efficiency by introducing
batched bio writeback support.
- The 4 patch series "memcg: cleanup the memcg stats interfaces" from
Shakeel Butt cleans up our handling of the interrupt safety of some
memcg stats.
- The 4 patch series "make vmalloc gfp flags usage more apparent" from
Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
- The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
from Chunyan Zhang teches soft dirty and userfaultfd write protect
tracking to use RISC-V's Svrsw60t59b extension.
- The 5 patch series "mm: swap: small fixes and comment cleanups" from
Youngjun Park fixes a small bug and cleans up some of the swap code.
- The 4 patch series "initial work on making VMA flags a bitmap" from
Lorenzo Stoakes starts work on converting the vma struct's flags to a
bitmap, so we stop running out of them, especially on 32-bit.
- The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
from Youngjun Park addresses a possible bug in the swap discard code and
cleans things up a little.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
=IUzS
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
- Fix incorrect variable in error message in config-bisect.pl
If the old config file fails to get copied as the last good or bad
config file, then it fails the program and prints an error message.
But the variable used to print what the old config's name was incorrect.
It was $config when it should have been $output_config.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTL5PxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnWBAQDtZ+UNtbeNR2eHzbcQ1+ENi0aWGwF9
e93hKyvAWkMgWAEAklDIdstyCaSQQgq3X4ilv1kaG1eu+KSWNnyhmqnyqAM=
=KPW2
-----END PGP SIGNATURE-----
Merge tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest fix from Steven Rostedt:
- Fix incorrect variable in error message in config-bisect.pl
If the old config file fails to get copied as the last good or bad
config file, then it fails the program and prints an error message.
But the variable used to print what the old config's name was
incorrect. It was $config when it should have been $output_config.
* tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Fix uninitialized var in config-bisect.pl
- Adapt the ftracetest script to be run from a different folder
This uses the already existing OPT_TEST_DIR but extends it further to run
independent tests, then add an --rv flag to allow using the script for
testing RV (mostly) independently on ftrace.
- Add basic RV selftests in selftests/verification for more validations
Add more validations for available/enabled monitors and reactors. This
could have caught the bug introducing kernel panic solved above. Tests use
ftracetest.
- Convert react() function in reactor to use va_list directly
Use a central helper to handle the variadic arguments. Clean up macros
and mark functions as static.
- Add lockdep annotations to reactors to have lockdep complain of errors
If the reactors are called from improper context. Useful to develop new
reactors. This highlights a warning in the panic reactor that is related
to the printk subsystem and not to RV.
- Convert core RV code to use lock guards and __free helpers
This completely removes goto statements.
- Fix compilation if !CONFIG_RV_REACTORS
Fix the warning by keeping LTL monitor variable as always static.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTBoVxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWpAQDxPQAJQvBZ41l9q9Cis7PqGGezT4Nv
g6Fh/ydMOlJCsQD/R0Xd5JxPmBI8FLCwCfqHo7wYKUhP8GfL/ORPEWhU2gI=
=EEot
-----END PGP SIGNATURE-----
Merge tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull runtime verifier updates from Steven Rostedt:
- Adapt the ftracetest script to be run from a different folder
This uses the already existing OPT_TEST_DIR but extends it further to
run independent tests, then add an --rv flag to allow using the
script for testing RV (mostly) independently on ftrace.
- Add basic RV selftests in selftests/verification for more validations
Add more validations for available/enabled monitors and reactors.
This could have caught the bug introducing kernel panic solved above.
Tests use ftracetest.
- Convert react() function in reactor to use va_list directly
Use a central helper to handle the variadic arguments. Clean up
macros and mark functions as static.
- Add lockdep annotations to reactors to have lockdep complain of
errors
If the reactors are called from improper context. Useful to develop
new reactors. This highlights a warning in the panic reactor that is
related to the printk subsystem and not to RV.
- Convert core RV code to use lock guards and __free helpers
This completely removes goto statements.
- Fix compilation if !CONFIG_RV_REACTORS
Fix the warning by keeping LTL monitor variable as always static.
* tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix compilation if !CONFIG_RV_REACTORS
rv: Convert to use __free
rv: Convert to use lock guard
rv: Add explicit lockdep context for reactors
rv: Make rv_reacting_on() static
rv: Pass va_list to reactors
selftests/verification: Add initial RV tests
selftest/ftrace: Generalise ftracetest to use with RV
This pull request for TPM driver contains changes to unify TPM return
code translation between trusted_tpm2 and TPM driver itself. Other than
that the changes are either bug fixes or minor imrovements.
Change log that should explain the previous iterations:
1. "Documentation: tpm-security.rst: change title to section"
https://lore.kernel.org/all/86514a6ab364e01f163470a91cacef120e1b8b47.camel@HansenPartnership.com/
2. "drivers/char/tpm: use min() instead of min_t()"
https://lore.kernel.org/all/20251201161228.3c09d88a@pumpkin/
3. Removed spurious kfree(): https://lore.kernel.org/linux-integrity/aS+K5nO2MP7N+kxQ@ly-workstation/
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRE6pSOnaBC00OEHEIaerohdGur0gUCaTComgAKCRAaerohdGur
0t38AQDThfcJhDmgfR3zYo0C8rtNQwM06fnooqsiDjTRHYXu6QEArRKJfR9B/vpN
vIAloxIgIUxQbewBJ1DfxJ7OVO2kGwA=
=hMwZ
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-6.19-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"This contains changes to unify TPM return code translation between
trusted_tpm2 and TPM driver itself. Other than that the changes are
either bug fixes or minor imrovements"
* tag 'tpmdd-next-6.19-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Use tpm_ret_to_err() in trusted_tpm2
tpm: Use -EPERM as fallback error code in tpm_ret_to_err
tpm: Cap the number of PCR banks
tpm: Remove tpm_find_get_ops
tpm: add WQ_PERCPU to alloc_workqueue users
tpm_crb: add missing loc parameter to kerneldoc
tpm_crb: Fix a spelling mistake
selftests: tpm2: Fix ill defined assertions
- Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO. This
is the first step to broader DMABUF support in iommufd, right now it
only works with VFIO. This closes the last functional gap with classic
VFIO type 1 to safely support PCI peer to peer DMA by mapping the VFIO
device's MMIO into the IOMMU.
- Relax SMMUv3 restrictions on nesting domains to better support qemu's
sequence to have an identity mapping before the vSID is established.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaS8DrgAKCRCFwuHvBreF
YY/kAP0Q1s7LkVb83uQb8kIW3xKzEnFNTlhrSSGV5UBuYLbaDgD+J+y+4VrSkJem
85LMipmzaoZdHqtxMhQWrlYbZMr9TAM=
=BacK
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"This is a pretty consequential cycle for iommufd, though this pull is
not too big. It is based on a shared branch with VFIO that introduces
VFIO_DEVICE_FEATURE_DMA_BUF a DMABUF exporter for VFIO device's MMIO
PCI BARs. This was a large multiple series journey over the last year
and a half.
Based on that work IOMMUFD gains support for VFIO DMABUF's in its
existing IOMMU_IOAS_MAP_FILE, which closes the last major gap to
support PCI peer to peer transfers within VMs.
In Joerg's iommu tree we have the "generic page table" work which aims
to consolidate all the duplicated page table code in every iommu
driver into a single algorithm. This will be used by iommufd to
implement unique page table operations to start adding new features
and improve performance.
In here:
- Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO.
This is the first step to broader DMABUF support in iommufd, right
now it only works with VFIO. This closes the last functional gap
with classic VFIO type 1 to safely support PCI peer to peer DMA by
mapping the VFIO device's MMIO into the IOMMU.
- Relax SMMUv3 restrictions on nesting domains to better support
qemu's sequence to have an identity mapping before the vSID is
established"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases
iommufd/selftest: Add some tests for the dmabuf flow
iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
iommufd: Have iopt_map_file_pages convert the fd to a file
iommufd: Have pfn_reader process DMABUF iopt_pages
iommufd: Allow MMIO pages in a batch
iommufd: Allow a DMABUF to be revoked
iommufd: Do not map/unmap revoked DMABUFs
iommufd: Add DMABUF to iopt_pages
vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
- Move libvfio selftest artifacts in preparation of more tightly
coupled integration with KVM selftests. (David Matlack)
- Fix comment typo in mtty driver. (Chu Guangqing)
- Support for new hardware revision in the hisi_acc vfio-pci variant
driver where the migration registers can now be accessed via the PF.
When enabled for this support, the full BAR can be exposed to the
user. (Longfang Liu)
- Fix vfio cdev support for VF token passing, using the correct size
for the kernel structure, thereby actually allowing userspace to
provide a non-zero UUID token. Also set the match token callback for
the hisi_acc, fixing VF token support for this this vfio-pci variant
driver. (Raghavendra Rao Ananta)
- Introduce internal callbacks on vfio devices to simplify and
consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
data, removing various ioctl intercepts with a more structured
solution. (Jason Gunthorpe)
- Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
to be exposed through dma-buf objects with lifecycle managed through
move operations. This enables low-level interactions such as a
vfio-pci based SPDK drivers interacting directly with dma-buf capable
RDMA devices to enable peer-to-peer operations. IOMMUFD is also now
able to build upon this support to fill a long standing feature gap
versus the legacy vfio type1 IOMMU backend with an implementation of
P2P support for VM use cases that better manages the lifecycle of the
P2P mapping. (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)
- Convert eventfd triggering for error and request signals to use RCU
mechanisms in order to avoid a 3-way lockdep reported deadlock issue.
(Alex Williamson)
- Fix a 32-bit overflow introduced via dma-buf support manifesting with
large DMA buffers. (Alex Mastro)
- Convert nvgrace-gpu vfio-pci variant driver to insert mappings on
fault rather than at mmap time. This conversion serves both to make
use of huge PFNMAPs but also to both avoid corrected RAS events
during reset by now being subject to vfio-pci-core's use of
unmap_mapping_range(), and to enable a device readiness test after
reset. (Ankit Agrawal)
- Refactoring of vfio selftests to support multi-device tests and split
code to provide better separation between IOMMU and device objects.
This work also enables a new test suite addition to measure parallel
device initialization latency. (David Matlack)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkvV3IRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyIpIQ/9GwpjLH5Vdv0v2d9mkHmZIWFpG/tr3zJa
+spQqOjO0etASc67PtIJArT9pWib+s6O8OaG7iFrdNR65HCSsXSZbIGbMThPODfy
DdDj1ipAqMVwcaCZT8un2N8Sktu9YpFQMvc5IoXWWYhw88vili7bBx+OTrEFV2T0
6qQijSBdhw1TXVFHG6BGSmqmisyMepIebA6GmPWdfYu6BfoWBYMdcMjDwd1J61Q5
DDwFRzn/Dz2Tvb1jbXiiRMRuFIuegFQii+wtd30S/cRPFZhZLWzc+drimC6oOFiQ
qL19vQQsBPnLtGvch40HsET/AbY5w0pLCkYX5qacxP3sq27+N+KuotzCvbnVMN+H
e2BqOCujyoce8z1Br6BzV71Lr2yzPDcc5pXTuEuuBT+J/ptOY8hfEikOj85s5Wzj
aKsTrdDRGMrn/o11NkGSzYwFcMs9MxCX9mo98U6OkWDr0+cmPLf4LGZgpJudWg4E
POUlzPpnzJrTlX5d+OqCdKJG0a1hTlTa2udzRa5hCDANHaZWLoAssfgSEKfV9xt1
PzOMf0UIJmPJmFcw/OpMO72/5xp8O4WslJS0ulSm6vrAJDtutLApHZ7bJ44KniNd
4vte+gOjyZY8ibTDKRULhXVlCDxkEnZjRBbApgI9HJD61IElOzjqohRuRx77J09B
7c8OSLI8d1U=
=tpee
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Move libvfio selftest artifacts in preparation of more tightly
coupled integration with KVM selftests (David Matlack)
- Fix comment typo in mtty driver (Chu Guangqing)
- Support for new hardware revision in the hisi_acc vfio-pci variant
driver where the migration registers can now be accessed via the PF.
When enabled for this support, the full BAR can be exposed to the
user (Longfang Liu)
- Fix vfio cdev support for VF token passing, using the correct size
for the kernel structure, thereby actually allowing userspace to
provide a non-zero UUID token. Also set the match token callback for
the hisi_acc, fixing VF token support for this this vfio-pci variant
driver (Raghavendra Rao Ananta)
- Introduce internal callbacks on vfio devices to simplify and
consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
data, removing various ioctl intercepts with a more structured
solution (Jason Gunthorpe)
- Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
to be exposed through dma-buf objects with lifecycle managed through
move operations. This enables low-level interactions such as a
vfio-pci based SPDK drivers interacting directly with dma-buf capable
RDMA devices to enable peer-to-peer operations. IOMMUFD is also now
able to build upon this support to fill a long standing feature gap
versus the legacy vfio type1 IOMMU backend with an implementation of
P2P support for VM use cases that better manages the lifecycle of the
P2P mapping (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)
- Convert eventfd triggering for error and request signals to use RCU
mechanisms in order to avoid a 3-way lockdep reported deadlock issue
(Alex Williamson)
- Fix a 32-bit overflow introduced via dma-buf support manifesting with
large DMA buffers (Alex Mastro)
- Convert nvgrace-gpu vfio-pci variant driver to insert mappings on
fault rather than at mmap time. This conversion serves both to make
use of huge PFNMAPs but also to both avoid corrected RAS events
during reset by now being subject to vfio-pci-core's use of
unmap_mapping_range(), and to enable a device readiness test after
reset (Ankit Agrawal)
- Refactoring of vfio selftests to support multi-device tests and split
code to provide better separation between IOMMU and device objects.
This work also enables a new test suite addition to measure parallel
device initialization latency (David Matlack)
* tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio: (65 commits)
vfio: selftests: Add vfio_pci_device_init_perf_test
vfio: selftests: Eliminate INVALID_IOVA
vfio: selftests: Split libvfio.h into separate header files
vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
vfio: selftests: Rename vfio_util.h to libvfio.h
vfio: selftests: Stop passing device for IOMMU operations
vfio: selftests: Move IOVA allocator into iova_allocator.c
vfio: selftests: Move IOMMU library code into iommu.c
vfio: selftests: Rename struct vfio_dma_region to dma_region
vfio: selftests: Upgrade driver logging to dev_err()
vfio: selftests: Prefix logs with device BDF where relevant
vfio: selftests: Eliminate overly chatty logging
vfio: selftests: Support multiple devices in the same container/iommufd
vfio: selftests: Introduce struct iommu
vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
vfio: selftests: Allow passing multiple BDFs on the command line
vfio: selftests: Split run.sh into separate scripts
vfio: selftests: Move run.sh into scripts directory
vfio/nvgrace-gpu: wait for the GPU mem to be ready
vfio/nvgrace-gpu: Inform devmem unmapped after reset
...
Including:
- Introduction of the generic IO page-table framework with support for
Intel and AMD IOMMU formats from Jason. This has good potential for
unifying more IO page-table implementations and making future
enhancements more easy. But this also needed quite some fixes during
development. All known issues have been fixed, but my feeling is that
there is a higher potential than usual that more might be needed.
- Intel VT-d updates:
- Use right invalidation hint in qi_desc_iotlb().
- Reduce the scope of INTEL_IOMMU_FLOPPY_WA.
- ARM-SMMU updates:
- Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
and a new clock for the TBU.
- Fix error handling if level 1 CD table allocation fails.
- Permit more than the architectural maximum number of SMRs for funky
Qualcomm mis-implementations of SMMUv2.
- Mediatek driver:
- MT8189 iommu support.
- Move ARM IO-pgtable selftests to kunit.
- Device leak fixes for a couple of drivers.
- Random smaller fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmktjWgACgkQK/BELZcB
GuMQgA//YljqMVbMmYpFQF/9nXsyTvkpzaaVqj3CscjfnLJQ7YNIrWHLMw4xcZP7
c2zsSDMPc5LAe2nkyNPvMeFufsbDOYE1CcbqFfZhNqwKGWIVs7J1j73aqJXKfXB4
RaVv5GA1NFeeIRRKPx/ZpD9W1WiR9PiUQXBxNTAJLzbBNhcTJzo+3YuSpJ6ckOak
aG67Aox6Dwq/u0gHy8gWnuA2XL6Eit+nQbod7TfchHoRu+TUdbv8qWL+sUChj+u/
IlyBt1YL/do3bJC0G1G2E81J1KGPU/OZRfB34STQKlopEdXX17ax3b2X0bt3Hz/h
9Yk3BLDtGMBQ0aVZzAZcOLLlBlEpPiMKBVuJQj29kK9KJSYfmr2iolOK0cGyt+kv
DfQ8+nv6HRFMbwjetfmhGYf6WemPcggvX44Hm/rgR2qbN3P+Q8/klyyH8MLuQeqO
ttoQIwDd9DYKJelmWzbLgpb2vGE3O0EAFhiTcCKOk643PaudfengWYKZpJVIIqtF
nEUEpk17HlpgFkYrtmIE7CMONqUGaQYO84R3j7DXcYXYAvqQJkhR3uJejlWQeh8x
uMc9y04jpg3p5vC5c7LfkQ3at3p/jf7jzz4GuNoZP5bdVIUkwXXolir0ct3/oby3
/bXXNA1pSaRuUADm7pYBBhKYAFKC7vCSa8LVaDR3CB95aNZnvS0=
=KG7j
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
- Introduction of the generic IO page-table framework with support for
Intel and AMD IOMMU formats from Jason.
This has good potential for unifying more IO page-table
implementations and making future enhancements more easy. But this
also needed quite some fixes during development. All known issues
have been fixed, but my feeling is that there is a higher potential
than usual that more might be needed.
- Intel VT-d updates:
- Use right invalidation hint in qi_desc_iotlb()
- Reduce the scope of INTEL_IOMMU_FLOPPY_WA
- ARM-SMMU updates:
- Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
and a new clock for the TBU.
- Fix error handling if level 1 CD table allocation fails.
- Permit more than the architectural maximum number of SMRs for
funky Qualcomm mis-implementations of SMMUv2.
- Mediatek driver:
- MT8189 iommu support
- Move ARM IO-pgtable selftests to kunit
- Device leak fixes for a couple of drivers
- Random smaller fixes and improvements
* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
powerpc/pseries/svm: Make mem_encrypt.h self contained
genpt: Make GENERIC_PT invisible
iommupt: Avoid a compiler bug with sw_bit
iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
iommupt: Fix unlikely flows in increase_top()
iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
MAINTAINERS: Update my email address
iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
iommu/vt-d: Restore previous domain::aperture_end calculation
iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
iommu/tegra: fix device leak on probe_device()
iommu/sun50i: fix device leak on of_xlate()
iommu/omap: simplify probe_device() error handling
iommu/omap: fix device leaks on probe_device()
iommu/mediatek-v1: add missing larb count sanity check
iommu/mediatek-v1: fix device leaks on probe()
...
Misc:
- Remove incorrect page-allocator quirk section in documentation.
- Remove unused devm_cxl_port_enumerate_dports() function.
- Fix typo in cdat.c code comment.
- Replace use of system_wq with system_percpu_wq
- Add locked CXL decoder support for region removal.
- Return when generic target updated
- Rename region_res_match_cxl_range() to spa_maps_hpa()
- Clarify comment in spa_maps_hpa()
Enable unit testing for XOR address translation of SPA to DPA and vice versa.
- Refactor address translation funcs for testing in cxl_region.
- Make the XOR calculations available for testing.
- Add cxl_translate module for address translation testing in cxl_test.
Extended Linear Cache changes:
- Add extended linear cache size sysfs attribute.
- Adjust failure emission of extended linear cache detection in cxl_acpi.
- Added extended linear cache unit testing support in cxl_test
Preparation refactor patches for PRM translation support.
- Simplify cxl_rd_ops allocation and handling.
- Group xor arithmetric setup code in a single block.
- Remove local variable @inc in cxl_port_setup_targets()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmktwo8ACgkQYGjFFmlT
OEo+zw/+JvSlKu6H9XPB6eiMS4i7aWpo1/k8uqpejZp29jxmZFe4N+YYuvVaMlEB
c7FhMd0e+jXC+/GocsleGyvXSXUJf+mgkrcsRHFNkwaZHSFud2LIbbyrwXRh3wdN
KPViMdhrB12YuCASkZLMXf8PIQT5apDUhXoo9F/pxtlFjlpWnOjqewL7dmNNLK9i
IIfdfrGUb90CyFDCLMreJOPnsRm4TFKaG4ITAImZOlj94Gx9KSdsS0Dcfzm0tZr/
sYjSamgN3ctoOVbzEhEy3Kmy3NOSLyWD8MOUzSXggZo2dU9u1F1h7yu1RABNisCd
jWq1x8/liFS0X1MPtHbjb9VVrgRNsemUrnIhGPTOG+iDWHk33vcftUulYtw1IZ14
tepArmR/duTU9v7J7GLbshjUO8hyPhDvF0TZKV+/290GQyrV4Mkd7kZJ0zU0OrJQ
Z1TiMne1e/RMWP/RNtpyOVif3hX71T3UeB28xo+HuElWd2jlZQQ1wh0JW7gZJzxD
p4Npl1elY1j4lGsXjPrXrBjMJ6X95XTIwsYLdtaSJlKXxvByec47dVSOhJZyfVHG
5UBsZZsAawS+bf6LFbNWH09MBwTZeS0K0i44Bf9ttiafkjSWCk4uJmhmkbGNOt49
DYRYRHykIik64t+9AFzW0SjHcPTsKjfGgCBwsws+UilSIZl5HNk=
=vT5l
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (CXL) updates from Dave Jiang:
"The additions of note are adding CXL region remove support for locked
CXL decoders, adding unit testing support for XOR address translation,
and adding unit testing support for extended linear cache.
Misc:
- Remove incorrect page-allocator quirk section in documentation
- Remove unused devm_cxl_port_enumerate_dports() function
- Fix typo in cdat.c code comment
- Replace use of system_wq with system_percpu_wq
- Add locked CXL decoder support for region removal
- Return when generic target updated
- Rename region_res_match_cxl_range() to spa_maps_hpa()
- Clarify comment in spa_maps_hpa()
Enable unit testing for XOR address translation of SPA to DPA and vice versa:
- Refactor address translation funcs for testing in cxl_region
- Make the XOR calculations available for testing
- Add cxl_translate module for address translation testing in
cxl_test
Extended Linear Cache changes:
- Add extended linear cache size sysfs attribute
- Adjust failure emission of extended linear cache detection in
cxl_acpi
- Added extended linear cache unit testing support in cxl_test
Preparation refactor patches for PRM translation support:
- Simplify cxl_rd_ops allocation and handling
- Group xor arithmetric setup code in a single block
- Remove local variable @inc in cxl_port_setup_targets()"
* tag 'cxl-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (22 commits)
cxl/test: Assign overflow_err_count from log->nr_overflow
cxl/test: Remove ret_limit race condition in mock_get_event()
cxl/test: remove unused mock function for cxl_rcd_component_reg_phys()
cxl/test: Add support for acpi extended linear cache
cxl/test: Add cxl_test CFMWS support for extended linear cache
cxl/test: Standardize CXL auto region size
cxl/region: Remove local variable @inc in cxl_port_setup_targets()
cxl/acpi: Group xor arithmetric setup code in a single block
cxl: Simplify cxl_rd_ops allocation and handling
cxl: Clarify comment in spa_maps_hpa()
cxl: Rename region_res_match_cxl_range() to spa_maps_hpa()
acpi/hmat: Return when generic target is updated
cxl: Add handling of locked CXL decoder
cxl/region: Add support to indicate region has extended linear cache
cxl: Adjust extended linear cache failure emission in cxl_acpi
cxl/test: Add cxl_translate module for address translation testing
cxl/acpi: Make the XOR calculations available for testing
cxl/region: Refactor address translation funcs for testing
cxl/pci: replace use of system_wq with system_percpu_wq
cxl: fix typos in cdat.c comments
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmku8DAACgkQpmLzj2vt
YEk1oQ/+PTkkVxeXaSMuC5/xLPuvDOwOt6w9bN5RbUaqGhg15x67clZ8Ibi6nXYt
cYxXR8rWMHFGujBdiP1UXRbYAAJEHnzYHAj2AQKFhPmSpeIm39TOcN2qEtSsYQ0r
mgcNi/9SAf+M3iF/b5qmcPgnfqZ3SB8jt1AFMV6F5Mbv+9y5ZYLbCDDD6TylJOGM
IU3VRt9B1ffSyc93SZkODXRNNGR/Iv0/0Yn1bPtNNSZeZ6ZNgrRPUukg0P+PsllY
W/hAhuoISKW1oSojlSeF4x5BJR/Xt0Mym7/IdraBbgPJPCwIBxfxcatQiSFJlaMD
GRFf0V1xD9gWxY/xNkmtjJhUjh6hmL6hTTX6lUVA+ltXKnUl7mfnFuMOt1naWKU+
hRech8eFRxJ4ZfwYTpT6w8IXV1OI0MGhcL7uWmbhpPGDGRTxQOqQE5DFTBrf54GB
OMlT0soDbyYQmCzZXaQyrn3HhCKpovmgjJRTMBvY1aQ6T707V/qu8C4eGNqlPjUn
4ywxwPaJ6B4M2q/nSl1BQPakka47pTVZoLkmsJuDU2npskPOs9ZHvfUXgMs7eOri
jS3BFoBOkrNAZ6YTiFtwjV0fw3CNHbSeicXV/fd1B3tB48NA52XBjq/uKiwOkZAJ
3bcJ5K7fgJXwOkVTAQhfgCPplfI1bbdvei40gkyITP36Ye3IyxE=
=7Jr3
-----END PGP SIGNATURE-----
Merge tag 'hid-for-linus-2025120201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- Proper mapping of HID_GD_Z to ABS_DISTANCE for stylus/pen types of
devices (Ping Cheng)
- Power management/hibernation improvements in intel-ish (Zhang Lixu)
- Improved support for several Logitech devices, e.g. G Pro X
Superlight 2, new iteration of Lighspeed receiver, G13, G510 (Nathan
Rossi, Mavroudis Chatzilazaridis, Leo L Schwab, Hans de Goede)
- Support for UcLogic XP-PEN Artist 24 Pro (Joshua Goins)
- WinWing Orion2 throttle support improvement (Ivan Gorinov)
- other assorted small fixes and device ID additions
* tag 'hid-for-linus-2025120201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (37 commits)
drivers: hid: renegotiate resolution multipliers with device after reset
HID: evision: Fix Report Descriptor for Evision Wireless Receiver 320f:226f
HID: logitech-dj: Fix probe failure when used with KVM
HID: logitech-dj: Remove duplicate error logging
HID: logitech-dj: Add support for G Pro X Superlight 2 receiver
selftests/hid-tablet: add ABS_DISTANCE test for stylus/pen
HID: input: map HID_GD_Z to ABS_DISTANCE for stylus/pen
HID: bpf: fix typo in HID usage table
HID: bpf: add the Huion Kamvas 27 Pro
HID: bpf: add heuristics to the Huion Inspiroy 2S eraser button
HID: bpf: Add support for XP-Pen Deco02
HID: bpf: Add support for the XP-Pen Deco 01 V3
HID: bpf: Add support for the Waltop Batteryless Tablet
HID: bpf: Add fixup for Logitech SpaceNavigator variants
HID: bpf: support for Huion Kamvas 16 Gen 3
HID: bpf: add support for Huion Kamvas 13 (Gen 3) (model GS1333)
HID: bpf: Add support for the Inspiroy 2M
Documentation: hid-alps: Format DataByte* subsection headings
Documentation: hid-alps: Fix packet format section headings
HID: nintendo: add WQ_PERCPU to alloc_workqueue users
...
The majority of changes at this time were about ASoC with a lot of
code refactoring works. From the functionality POV, there aren't much
to see, but we have a wide range of device-specific fixes and updates.
Here are some highlights:
- Continued ASoC API clean works, spanned over many files
- Added a SoundWire SCDA generic class driver with regmap support
- Enhancements and fixes for Cirrus, Intel, Maxim and Qualcomm.
- Support for ASoC Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
TAS5815, TAS5828 and TAS5830
- Usual HD-audio and USB-audio quirks and fixups
- Support for Onkyo SE-300PCIE, TASCAM IF-FW/DM MkII
Some gpiolib changes for shared GPIOs are included along with this PR
for covering ASoC drivers changes.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmkwQ2UOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8tIRAAjCHdIlMejNTCzGRlhsRSQVD6bo1wASXcjfJ6
COH84akbnA0oT5z7H7JnzTOmfjzxLJpwC8j6IpZ/9CQazanT5IIVE41FZquXZ1JB
RhQVzuGw9Pl4MaYVdFuRqIXjiP+msY1jpbo9/QXQo8D/B41wpmVTgzkFVW2rxPMy
0aBOu4Wpu+11aBpNBy6dXDiKQ5kDqn7zOLoFGgcf5wlFIvOGZJ0Wg/i0kvCjl+ia
xYiP+/F6xKOyTY1c98iqExbKzSSy4ddGFUwrkevm6bWpu8hkXiL1O0zMWOe769x6
0wy0b5zvsbtOQOxbtK5+8gdjJw7ycgDa441hDtsaXBBROYZEV3D6+XZJCfq8Tz8F
+vLH5lfZeLg+59eqt3GOMGlwBfuhH91qzukIYG3q9EQGOkNkZ19ySJnFMLom68Ei
TCfNzh/ggSGXA9qAmfBcPoizgC/j9o+v4kbLRQteuRRWxES1FxqeN9Ba3d5JcHT3
BQpz1bhUli73477D6voPcwXLiQlM+Alv4QUKTFr2nUnWUQKwMvkZFwiv2jTqVdDf
f71Usv7xdyM7XijgmXuLg+3n0UvCwUPBB+bv3a1Bu7G4iTB1deNKU8t9k+sBJpcX
aRs5ych3MiU/zG+KRMB5FEx31KzpKu+Kk9NQ207/1HLaNhTgD3cg2wS3T3qdRUPv
Yf6wFHs=
=1JUI
-----END PGP SIGNATURE-----
Merge tag 'sound-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"The majority of changes at this time were about ASoC with a lot of
code refactoring works. From the functionality POV, there isn't much
to see, but we have a wide range of device-specific fixes and updates.
Here are some highlights:
- Continued ASoC API cleanup work, spanned over many files
- Added a SoundWire SCDA generic class driver with regmap support
- Enhancements and fixes for Cirrus, Intel, Maxim and Qualcomm.
- Support for ASoC Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
TAS5815, TAS5828 and TAS5830
- Usual HD-audio and USB-audio quirks and fixups
- Support for Onkyo SE-300PCIE, TASCAM IF-FW/DM MkII
Some gpiolib changes for shared GPIOs are included along with this PR
for covering ASoC drivers changes"
* tag 'sound-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (739 commits)
ALSA: hda/realtek: Add PCI SSIDs to HP ProBook quirks
ALSA: usb-audio: Simplify with usb_endpoint_max_periodic_payload()
ALSA: hda/realtek: fix mute/micmute LEDs don't work for more HP laptops
ALSA: rawmidi: Fix inconsistent indenting warning reported by smatch
ALSA: dice: fix buffer overflow in detect_stream_formats()
ASoC: codecs: Modify awinic amplifier dsp read and write functions
ASoC: SDCA: Fixup some more Kconfig issues
ASoC: cs35l56: Log a message if firmware is missing
ASoC: nau8325: Delete a stray tab
firmware: cs_dsp: Add test cases for client_ops == NULL
firmware: cs_dsp: Don't require client to provide a struct cs_dsp_client_ops
ASoC: fsl_micfil: Set channel range control
ASoC: fsl_micfil: Add default quality for different platforms
ASoC: intel: sof_sdw: Add codec_info for cs42l45
ASoC: sdw_utils: Add cs42l45 support functions
ASoC: intel: sof_sdw: Add ability to have auxiliary devices
ASoC: sdw_utils: Move codec_name to dai info
ASoC: sdw_utils: Add codec_conf for every DAI
ASoC: SDCA: Add terminal type into input/output widget name
ASoC: SDCA: Align mute controls to ALSA expectations
...
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
..
struct freelist_counters;
};
Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmktsoMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuiUD/92eivL+HmOh10o8trvxajB0yuyqfSjHHrL
g+xUbF4s9bgAg/v+Upx7sTY8jdrTcMjKov+G9T6uPvBMqVmeVdZckA1PSAKQaIX1
Zb7nS2LnO7F6JKbwpwVrrIaqVbcz8MfGIIMbN4yNNEOMCwdIVMp4fo7trPBknJNx
WddNSGUFlIF3NqSI8AflSS/pYnGm+McfBHXBpJAKipI3iquKKubHv+FX9kLp7Tn4
x27ZoCWOHglIBTJXU0mmXCVsLF8b5BA8DQcGtT62azb8+l0cRTkaHY0DFAv5BvhG
TqcjrKdmR0cGSNt+nEmFrujE3atBRl0G0kiHA80YgA1MTtYzdPaUVOUtM9k/rEem
gpiGMDpBypdxyJAyijPSaVJdfcg0psOlYbhIR4N2wbj/dq8268h+cWzXlF1spgVt
/7ygoaCmfMNbTy9rKThTjH+es787AVXUAXXaPHhIFsnCKUj8xQl4pT7XltmgYeWx
1/XD1NEJeLHHog5upAVlGX3H5tbvP1nIICxbZa9mDOJX1rwxxI7/s/RucPjbNXuY
AiaKPTfxtB9+Ihd2HrJ/76RVMkckcOBc4GIKoFfwuKDbcdLXQ5FcZCmVRoI1V9SV
KsH7JBgihLwR9XWKE1vp9+CBNe1Qlu3K4IjG/E7CNLeuDntIBu73ihqGP/DqV6Bq
RX1Dc0OyAQ==
=m22w
-----END PGP SIGNATURE-----
Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Fix head insertion for mq-deadline, a regression from when priority
support was added
- Series simplifying and improving the ublk user copy code
- Various ublk related cleanups
- Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the
request is punted to a thread for handling
- Merge and then later revert loop dio nowait support, as it ended up
causing excessive stack usage for when the inline issue code needs to
dip back into the full file system code
- Improve auto integrity code, making it less deadlock prone
- Speedup polled IO handling, but manually managing the hctx lookups
- Fixes for blk-throttle for SSD devices
- Small series with fixes for the S390 dasd driver
- Add support for caching zones, avoiding unnecessary report zone
queries
- MD pull requests via Yu:
- fix null-ptr-dereference regression for dm-raid0
- fix IO hang for raid5 when array is broken with IO inflight
- remove legacy 1s delay to speed up system shutdown
- change maintainer's email address
- data can be lost if array is created with different lbs devices,
fix this problem and record lbs of the array in metadata
- fix rcu protection for md_thread
- fix mddev kobject lifetime regression
- enable atomic writes for md-linear
- some cleanups
- bcache updates via Coly
- remove useless discard and cache device code
- improve usage of per-cpu workqueues
- Reorganize the IO scheduler switching code, fixing some lockdep
reports as well
- Improve the block layer P2P DMA support
- Add support to the block tracing code for zoned devices
- Segment calculation improves, and memory alignment flexibility
improvements
- Set of prep and cleanups patches for ublk batching support. The
actual batching hasn't been added yet, but helps shrink down the
workload of getting that patchset ready for 6.20
- Fix for how the ps3 block driver handles segments offsets
- Improve how block plugging handles batch tag allocations
- nbd fixes for use-after-free of the configuration on device clear/put
- Set of improvements and fixes for zloop
- Add Damien as maintainer of the block zoned device code handling
- Various other fixes and cleanups
* tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
block/rnbd: correct all kernel-doc complaints
blk-mq: use queue_hctx in blk_mq_map_queue_type
md: remove legacy 1s delay in md_notify_reboot
md/raid5: fix IO hang when array is broken with IO inflight
md: warn about updating super block failure
md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid
sbitmap: fix all kernel-doc warnings
ublk: add helper of __ublk_fetch()
ublk: pass const pointer to ublk_queue_is_zoned()
ublk: refactor auto buffer register in ublk_dispatch_req()
ublk: add `union ublk_io_buf` with improved naming
ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
kfifo: add kfifo_alloc_node() helper for NUMA awareness
blk-mq: fix potential uaf for 'queue_hw_ctx'
blk-mq: use array manage hctx map instead of xarray
ublk: prevent invalid access with DEBUG
s390/dasd: Use scnprintf() instead of sprintf()
s390/dasd: Move device name formatting into separate function
s390/dasd: Remove unnecessary debugfs_create() return checks
s390/dasd: Fix gendisk parent after copy pair swap
...
Core & protocols
----------------
- Replace busylock at the Tx queuing layer with a lockless list. Resulting
in a 300% (4x) improvement on heavy TX workloads, sending twice the
number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue. Normally we perform queue re-selection when flow comes out
of idle, but under extreme circumstances the flows may be constantly
busy. Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already did
for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for packets.
- Make MPTCP use Rx backlog processing. This lowers the lock pressure,
improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor fit
for modern container workloads, where limits are imposed using cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
aggressive rcvbuf growth and overshot when the connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC 5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API
----------
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
as defined by the OPEN Alliance's "Advanced diagnostic features
for 100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
and performs RSS.
- Add info to devlink params whether the current setting is the default
or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame duplication
for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers
--------------
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback
for reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which supports
Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as other
drivers) to avoid wasting resources if feature is unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211 debugfs
interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
=UoDR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmktzC4ACgkQ6rmadz2v
bTpA1w/+PZ45N3y6O+NQVIpBlpnHG7DEMK7Lw19On0xVLwH+XPHz6J5PEfzjyJR1
SCbsV30qkJ1YCtgRHHf+ZCuWPWm58hY8dXYwSDyjNavdQyVGOdf17aBu9pvH45NW
K20OhwQHpCHWIfDlijjPkDdiHnYf5S7Xy6ctt/3ztF0pMDHIaghGxJymG4wULcDT
iLKnT37kwO8b2ihmw/HbcZPQYMWfHRye7X009K+wCv0dnhJ6q/Ny1m+Pg4kF92e6
ON/RY26ep2dq7LpaNWa1rI1yOgFlI7uUlVojqrAuAb+xrg+64wUDBxeijvE37EN1
s/+PuEKAR6xwz1dbY2cWAI0D633saz24UdV6kCBW9HrjHKVRQ7ZSsBF9ENkS4DTK
nowx4wOe1ZHc/6YgTktZp9LEn/0YrmQtFxjqEAJiYUgD18FrBrSjmhHpBiL+HghP
sTqy41qDQGoKtg3bRu42Co9wmNeeLsnxT8NQExCmTYQ4ufpdA/VMQux9cBVX3GBq
EchJb465+AcvvCJUiKbnHLxDsHCQz1YYytz3RqyFLgGDFZnHOE0FjwPJmM8I5kkK
gvDB3ZYdO3Halm8BZfZZBnKv5uK7myuAWwqRLgMRanuZcRgmIV1oUP5EP88HdH75
fB20vZSVcfzB17SLyhiM20ivEWodJa9VCLEw9WDOmDoml+33Pks=
=kaCJ
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
The error path of copying the old config used the wrong variable in the
error message:
$ mkdir /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad
$ chmod 0 /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad good
cp /tmp/build//.config config-good.tmp ... [0 seconds] FAILED!
Use of uninitialized value $config in concatenation (.) or string at ./tools/testing/ktest/config-bisect.pl line 744.
failed to copy to config-good.tmp
When it should have shown:
failed to copy /tmp/build//.config to config-good.tmp
Cc: stable@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Fixes: 0f0db06599 ("ktest: Add standalone config-bisect.pl program")
Link: https://patch.msgid.link/20251203180924.6862bd26@gandalf.local.home
Reported-by: "John W. Krahn" <jwkrahn@shaw.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmkumVobFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTypHMQALImoj39UZiLLVMn2d4P
5droOQw+qMCfsVVY5m9Rqw3vrCsSds+7eC5Qa2GaXR480l4ZeMGyoYCr+2gDWcM3
vwTWuj8UuUOxQlMXubIFyL9r/SUtYSiqfAPK5tGtzHuNNUEHWGed9r5RrOGKZOnj
SDe6ZL5G2Kekuua1mtSZhw8qpW/nK2IuFTRPxN6O243dUGtX0wkzWr1tVRpDDzoT
UNWZ2xoE1lzb/EAviyuSgSIc5YsT50bM+fMzOUHsY/E9u8fI42N1z5B+P3ZlTvGx
Z5nJ79ZMtbF3/4o6OaUqvR9Y4zZ73TtkMoV4sjrjYg28q+n4/tqVlCsjy0Hys/dP
MlaKbLSI54+RldTWx02Yu0jcn2Artua3N0bURyEenqkhDLW9RXVfx/h29QQYPpE9
7FSlpUAQnFzFbGUGJcB02oClkxUSUcOuzVqC0n30wfdo+bVzDn19rr+Vi2bHe8TD
MsASva96n2FhRVPkm8eBxnxpaYQj9Q6G1/7B1F30Et7PLzwD9qYUvXPXceyPATnd
r3hdUiKgkPvUaDhApJRHwv/ZGnNDKujGBDc/YgBQrl3cPfG0ShGzJzWz892JjX0+
NB+O+eabHVs6tFRFNvxS2fu/ROZxrGfBzZUKoaZjZh3JEgOhlLKVlL7aRqtctFvP
Dz0QYiOqlkjho7FykCIEKXxQ
=6o9X
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
- Improve recovery from misbehaving BPF schedulers. When a scheduler puts many
tasks with varying affinity restrictions on a shared DSQ, CPUs scanning
through tasks they cannot run can overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this, and
hooks into the hardlockup detector to attempt recovery. Add scx_cpu0 example
scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for schedulers
that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in preparation for
deprecating cpu_acquire/release() callbacks in favor of generic BPF hooks.
- Prepare for hierarchical scheduler support: add scx_bpf_task_set_slice() and
scx_bpf_task_set_dsq_vtime() kfuncs, make scx_bpf_dsq_insert*() return bool,
and wrap kfunc args in structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a cgroup's
idle state changes.
- Fix migration tasks being incorrectly downgraded from stop_sched_class to
rt_sched_class across sched_ext enable/disable. Applied late as the fix is
low risk and the bug subtle but needs stable backporting.
- Various fixes and cleanups including cgroup exit ordering, SCX_KICK_WAIT
reliability, and backward compatibility improvements.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS4h1A4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGe/MAP9EZ0pLiTpmMtt6mI/11Fmi+aWfL84j1zt13cz9
W4vb4gEA9eVEH6n9xyC4nhcOk9AQwSDuCWMOzLsnhW8TbEHVTww=
=8W/B
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
- Defer task cgroup unlink until after the dying task's final context switch
so that controllers see the cgroup properly populated until the task is
truly gone.
- cpuset cleanups and simplifications. Enforce that domain isolated CPUs
stay in root or isolated partitions and fail if isolated+nohz_full would
leave no housekeeping CPU. Fix sched/deadline root domain handling during
CPU hot-unplug and race for tasks in attaching cpusets.
- Misc fixes including memory reclaim protection documentation and selftest
KTAP conformance.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS3pEQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGYbrAP9H0kVyWH5tK9VhjSZyqidic8NuvtmNOyhIRrg0
8S8K0wD/YG9xlh2JUyRmS4B23ggc59+9y5xM2/sctrho51Pvsgg=
=0MB+
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
Remove parentheses around assert statements in Python. With parentheses,
assert always evaluates to True, making the checks ineffective.
Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
SRCU
----
_ Properly handle SRCU readers within IRQ disabled sections in tiny SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture.
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write side
(using full RCU grace period ordering instead of simply full
ordering) as compared to "traditional" non-fast SRCU. But those
srcu-fast flavours are going to be optimized in two different
ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next cycle.
Only the new interfaces are added for now, along with related
torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering) without
waiting for the read side to tell which to use. Also this optimizes
the read side altogether with moving flavour debug checks under debug
config and with removing a costly RmW operation on their first call.
- Make some diagnostic functions tracing safe.
REFSCALE
-------
Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code.
MISCELLANOUS
------
- In order to prepare the layout for nohz_full work deferral to
user exit, the context tracking state must shrink the counter
of transitions to/from RCU not watching. The only possible hazard
is to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption.
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
On recent discussions, we also concluded that all those WRITE_ONCE()
and READ_ONCE() on list APIs deserve appropriate comments. Something
to be expected for the next cycle.
- Provide a script to apply several configs to several commits with torture.
- Allow torture to reuse a build directory in order to save needless
rebuild time.
- Various cleanups.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmksubgbFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEIUkVEdQjox3MJ0P/Aq+AOPWuaE70B8SxVYY
VfucjjzZwSyI30Vw4GE8skNVMOnyY2wPIrNRgmckghi9LM5luureMcks9esrdFfv
csWj3AhnBu+/Lyd4EhP8AxMH2OPIX50FUrAQaY7LAcmc4nIng1MqRm35eT776XzR
tN07+nsXjNumPcdN8JAv/VzInUf5DdMvhy+Qw0Krpvan/sUpzzpFpNUWdHsyZ2Re
UCQr//AooQLfCIIIHUYlviThqigJETZlv0NKYrD0Zaxx5MCOwwWmz5otQzacSRk2
Zgn6Bayp98s0972sOghiMRXEoSvqBSJe5eVTWcj1oNLruO3flpyREgE5Wuo8T8Bg
BHvOiYaPNC5FW+btCWOQcKcSx1mxrgsX3/9OAth+kmtPqZdJSQEeY+zqWzdN7ZlV
q5+oipsFONvWXL6+U6uZf44MG1pscrzn6QvXajjg98iaLZgY0+mkAhvtpjcfuTwZ
2CXYQKacfwftpOXgM2QQGB4QE8CKXchlma381KbTHv0rX8zfK0BnWoe/oGPz22GL
3kldmhfIC/87i8t9HHngkU6SZ+Mkvf9vC28eclnuXB1PKCRsBHePGvi3I671kbLj
hycBwnwWriULEwKYTVQ8gmdBHSANB4IpmMYflADVqLuvp+7U6W5mNjWNTcXoSwFl
NoR7AViEUNzOEoa8u6WMKoec
=XnUt
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering)
without waiting for the read side to tell which to use.
This also optimizes the read side altogether with moving flavour
debug checks under debug config and with removing a costly RmW
operation on their first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellanous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN
warnings. On recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numberic arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
Highlights:
* Preparations to the use of nolibc in UML.
* Cleanup of sparse warnings.
* Library mode without _start().
* More consistency when disabling errno.
* Unconditional installation of all architecture support files.
* Always 64-bit wide ino_t and off_t.
* Various cleanups and bug fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCaSwY2wAKCRDB+h1jyw5b
OPpKAP9wvshTAWselixtw/xR8BXBIxHUh9y/NeT827Ut4IvpUgD/UH3duISRfB++
31nIRPI5BSZpC4CgaN9QKChcmxR8rw4=
=CbTX
-----END PGP SIGNATURE-----
Merge tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- Preparations to the use of nolibc in UML:
- Cleanup of sparse warnings
- Library mode without _start()
- More consistency when disabling errno
- Unconditional installation of all architecture support files
- Always 64-bit wide ino_t and off_t
- Various cleanups and bug fixes
* tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
selftests/nolibc: error out on linker warnings
selftests/nolibc: use lld to link loongarch binaries
tools/nolibc: remove more __nolibc_enosys() fallbacks
tools/nolibc: remove now superfluous overflow check in llseek
tools/nolibc: use 64-bit off_t
tools/nolibc: prefer the llseek syscall
tools/nolibc: handle 64-bit off_t for llseek
tools/nolibc: use 64-bit ino_t
tools/nolibc: avoid using plain integer as NULL pointer
tools/nolibc: add support for fchdir()
tools/nolibc: clean up outdated comments in generic arch.h
tools/nolibc: make the "headers" target install all supported archs
tools/nolibc: add the more portable inttypes.h
tools/nolibc: provide the portable sys/select.h
tools/nolibc: add missing memchr() to string.h
tools/nolibc: fix misleading help message regarding installation path
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add option to disable runtime
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: implement %m if errno is not defined
...
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support for
NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory managemennt:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmkvMjkACgkQa9axLQDI
XvGaGg//dtT/ZAqrWa6Yniv1LOlh837C07YdxAYTTuJ+I87DnrxIqjwbW+ye+bF+
61RTkioeCUm3PH+ncO9gPVNi4ASZ1db3/Rc8Fb6rr1TYOI1sMIeBsbbVdRJgsbX6
zu9197jOBHscTAeDceB6jZBDyW8iSLINPZ7LN6lGxXsZM/Vn5zfE0heKEEio6Fsx
+AzO2vos0XcwBR9vFGXtiCDx57T+/cXUtrWfA0Cjz4nvHSgD8+ghS+Jwv+kHMt1L
zrarqbeQfj+Iixm9PVHiazv+8THo9QdNl1yGLxDmJ4LEVPewjW5jBs8+5e8e3/Gj
p5JEvmSyWvKTTbFoM5vhxC72A7yuT1QwAk2iCyFIxMbQ25PndHboKVp/569DzOkT
+6CjI88sVSP6D7bVlN6pFlzc/Fa07YagnDMnMCSfk4LBjUfE3jYb+usaFydyv/rl
jwZbJrnSF/H+uQlyoJFgOEXSoQdDsll3dv6yEsUCwbd8RqXbAe3svbguOUHSdvIj
sCViezGZQ7Rkn6D21AfF9j6e7ceaSDaf5DWMxPI3dAxFKG8TJbCBsToR59NnoSj+
bNEozbZ1mCxmwH8i43wZ6P0RkClvJnoXcvRA+TJj02fSZACO39d3XDNswfXWL41r
KiWGUJZyn2lPKtiAWVX6pSBtDJ+5rFhuoFgADLX6trkxDe9/EMQ=
=4Sb6
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory managemennt:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
- Provide a new interface for dynamic configuration and deconfiguration of
hotplug memory, allowing with and without memmap_on_memory support. This
makes the way memory hotplug is handled on s390 much more similar to
other architectures
- Remove compat support. There shouldn't be any compat user space around
anymore, therefore get rid of a lot of code which also doesn't need to be
tested anymore
- Add stackprotector support. GCC 16 will get new compiler options, which
allow to generate code required for kernel stackprotector support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This removes
a lot of duplicated code. The new driver is also extendable and allows
to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area and
the kernel image are within the same 4GB area. This eliminates the need
of weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmktZioACgkQIg7DeRsp
bsK4Rw//VzkvHyzOtGKZ8Hb4S+Sh/PFlaZQXNhj+Xt5gWoOhP1uPmmhBe6LxjYaB
J9Ns3hpONQ1dTHV7VVkds8FvM/SBcGe8m5RpefmChC/bjm5UEOV/MppKtA0aLnEH
hJmdubIrrRAXKggxlHEfRLzBsFvV/rJ9Xf16FhRxGDc4pgmgkI1NPQ41/dyCHklQ
dB3YrFVPIETywVYYVB/G3h11JgF5Z6CKtjYCdSx72Fkbj65+6JPfcPgLKMpcJuPd
UxUXtCo1FCXlP70jsz8JQI8cdieG0KDQTtnZP4P/pqjQ3wirOqvMewNa9t9xmQ2e
p6Rc1Vx5DESkq9bRWtQEaprTVVzK7DDLH3RuZwB+uLrcLGD8JvVS6/m9n9CgzBMT
BnJXG2sLZH+gdQy+DSD/fVDD7OvIk8TGrH+OFwVIKhrT/J3B2E7ZSYyZZCNIS7VG
yiuypoDGYg3ZpYjH9+qOXWB3nc0vQWrlFzb1bsQu1omJGmunLv4jtTjAKGN82C33
auBsIYAlQW20X7DV0vZa59PwqwtBqtdQQcTidwtSztzKogRXAdK8KKHtN60JM4S2
7sWFOFCQaTChAeDNw6MF5EtULb551nwH2RtJ9x3CrJj+OGK6clbQNcxIA7Oy0veR
Sl9v1lMfeKOgDrPdDy3ArQBJ8WLlF9qX9wLKbiaNyIKmkz2ymkg=
=CNrb
-----END PGP SIGNATURE-----
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
of hotplug memory, allowing with and without memmap_on_memory
support. This makes the way memory hotplug is handled on s390 much
more similar to other architectures
- Remove compat support. There shouldn't be any compat user space
around anymore, therefore get rid of a lot of code which also doesn't
need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options,
which allow to generate code required for kernel stackprotector
support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This
removes a lot of duplicated code. The new driver is also extendable
and allows to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area
and the kernel image are within the same 4GB area. This eliminates
the need of weak per cpu variables. Get rid of
ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
s390/entry: Use lay instead of aghik
s390/vdso: Get rid of -m64 flag handling
s390/vdso: Rename vdso64 to vdso
s390: Rename head64.S to head.S
s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
s390: Add stackprotector support
s390/modules: Simplify module_finalize() slightly
s390: Remove KMSG_COMPONENT macro
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
s390/ap: Restrict driver_override versus apmask and aqmask use
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
s390/ap: Support driver_override for AP queue devices
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
s390/debug: Update description of resize operation
s390/syscalls: Switch to generic system call table generation
s390/syscalls: Remove system call table pointer from thread_struct
s390/uapi: Remove 31 bit support from uapi header files
s390: Remove compat support
tools: Remove s390 compat support
...
* Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmkuIXAACgkQaDWVMHDJ
krAwoRAAqqavNrthj26XJHjR3x7FVGu11/rvYXAd1U2moN/dhM2w82HMHNFvPuQY
3iq9GDRQdc2rKL7LTkREvN4ZM/rFvkFLt6a5Yv0eCRK8KAiSJEw6Yzu/qgG7kF+0
9clujDUskjjHU0zR5v+o1RxirrLVQ+R50sMVI5uoFx6+WJRiW1BvMG4Csw4BgbvA
AqgrZpyq1dQ/GQOW4f0yxBPH0z84wgUbdllYzQzE0GeUlGWQSI4lqa8GFMOmE/Gr
7gBcKmyE0M/BycwTZW7tiMnjWgNL+Y5/RroQJ7hh6R+f5WOd+SpGvlyOihbF7GER
L3yZfeQ+EWz1aY1QMWwOSvSawIPJo8EkSn3d9/JFq5Vl9zsFh+ZoPZfZ8bEi36U0
inO93swDcyMkkfOTh4sIgxedLgHja5GFNCGPs0yblvLulWbw7yYVzzEmEjXnclzS
fmmifsJjGrUpegEnWdEjAQzXkWPd/hKiAvpzDE/3thBal5NkOzFrudITFvCVuk8w
uS2MW0U8VCskNoON0jjwnvv84p0XdHJOsgPB9WnsuMMASKC1RqKAJWXh8AXvZA+I
TfCNdSyHDTm+o1e+SMQZRbqoE/r7MmAxUQOkKnlvpJDCz58tsLzW64hRXTe7QpCt
rry9/wODswu+oaHoDgfjAmzYde2RhCjwWLzGmqmapNIYfCCVhYs=
=5bcW
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU feature updates from Dave Hansen:
"The biggest thing of note here is Linear Address Space Separation
(LASS). It represents the first time I can think of that the
upper=>kernel/lower=>user address space convention is actually
recognized by the hardware on x86. It ensures that userspace can not
even get the hardware to _start_ page walks for the kernel address
space. This, of course, is a really nice generic side channel defense.
This is really only a down payment on LASS support. There are still
some details to work out in its interaction with EFI calls and
vsyscall emulation. For now, LASS is disabled if either of those
features is compiled in (which is almost always the case).
There's also one straggler commit in here which converts an
under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
so more feature can be packed in there.
Summary:
- Enable Linear Address Space Separation (LASS)
- Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"
* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Enable LASS during CPU initialization
selftests/x86: Update the negative vsyscall tests to expect a #GP
x86/traps: Communicate a LASS violation in #GP message
x86/kexec: Disable LASS during relocate kernel
x86/alternatives: Disable LASS when patching kernel code
x86/asm: Introduce inline memcpy and memset
x86/cpu: Add an LASS dependency on SMAP
x86/cpufeatures: Enumerate the LASS feature bits
x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow the
cpuset/isolation mechanism to exclude them from handling timers of
remote idle CPUs.
- The usual small updates, cleanups and enhancements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmks7doTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaxrD/40nxx+8cEXsVbVLIkP2PQbd2Y8+7sk
YbNu/Cb7j7Bg7R8YIs4p5GHk+7Yt/hNsW77SmbAzRPUyYYG6L3bUYlBa3yQlvIuo
xRPbzGA+RJies9skIGHbQ8z6ig1zUASRJPcBYiuaVIAuQhCfLNc4Nii9cEWtjZ24
+5gfRwV+vy74ArWwRkwaGejDK1tav+gd62OkFQZC8WtjQ08ozGZ6VBJNg7nYq/gH
FYO1rH2tQ/ZyjlO/x5NF8gFcjYD8iv5PDp8oH35MPx+XTdDccf0G3QB7ug0ffVdV
b4gA6lZTAmpsu/NHb6ByN4i/kf3wf8la/i+EaAh/Ov7NW078gunvVKVA7jStcbBl
ZgG5SRHiKRvQF/WXLGVQAnilRDZwRuS0nmJlqfExa44v23l5o3768RwdRYwQlv8g
X5KSRl0jlVgVtZHgNBlZtgX9+rnQSr9sB5sVGBP2a6a1WhVXQV/2kp0wjdnU0mPw
jLCnSdsHqBlSf9V7O/na823WCnBFb7blrLBXUoSbHBnICqtVFzhE1kBXWw3S7Kqh
CiaWM+S4WfR0HRnUlWMTS8BZ82MgiDnd7nGUXWwXBbdqWmoj/9CoU6SZRjbMBkzi
EY1XvmoYf6eSzdxfydI1hFi0/bbb8K9umHQlrpW3HeN9uXnVc0/+TroVPLuaKUdi
53ClqXjzE+CpJg==
=lQKn
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow
the cpuset/isolation mechanism to exclude them from handling
timers of remote idle CPUs
- The usual small updates, cleanups and enhancements
* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Exclude isolated cpus from hierarchy
cpumask: Add initialiser to use cleanup helpers
sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
timers/migration: Use scoped_guard on available flag set/clear
timers/migration: Add mask for CPUs available in the hierarchy
timers/migration: Rename 'online' bit to 'available'
selftests/timers/nanosleep: Add tests for return of remaining time
selftests/timers: Clean up kernel version check in posix_timers
time: Fix a few typos in time[r] related code comments
time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
hrtimer: Store time as ktime_t in restart block
timers/migration: Remove dead code handling idle CPU checking for remote timers
timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
timers/migration: Fix imbalanced NUMA trees
timers/migration: Remove locking on group connection
timers/migration: Convert "while" loops to use "for"
tick/sched: Limit non-timekeeper CPUs calling jiffies update
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaS3m5RIcb3VwdG9uQGtl
cm5lbC5vcmcACgkQor51iCR83Rb4NAD8C1fGoiCErb6htQMHf1I7ua0ThdIx7OnY
Mk1EysNWu94BAI/VKEYgz+UC5uapHh+gnsoOdVTMJZedI/OPrnKa3QIA
=/Vl1
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmkpa8kACgkQrUjsVaLH
LAd3lBAAhNlBVnva6fZseKf1ICGpwclXT/Ndqhn6CKWAPuvqsZvApQzTkW6f/txI
dwhu7SfAJeH62bQRHoyH/gpd5I1cplogp/xmUcAQJrzD4W0Wf0799hdFNOm9PAJf
IWeMMXSvj4CT8s3xinoKPt1YbmNvdDq3KkK776CET5B0/mIaGi3zBWC9ThU0aMl9
mlUTvIojApqmdhe6rXpjIZWj/nSP8XrDuYVmJS1Ys4xvCRW4Qyiu4QU1OKYMcwYR
xh6fgXDYufxojMs+h59mL8HOqBO5Kf79aO4lvjesFfZiRIii0+BATf16InH3XPyn
bkX3RD4LqgkU4q9I5TtwZ+UpxFvrkigliUewLYrxWFgLzJu6kSBpACduQYDyNSgm
X33iAm+m8V2tbl0FLHWRQGw970H9z4ycmEa4eII//+AePGTeFlHK90Qy9As2uW4E
XQet0Wqh/tw+qHRpy7Bls1k5MRtyYGJwi4fbSOp/g8Kjgg/DzSsF+qN2FyNE8GNj
+w8044fNYpDqd13BsSR99K/cUtFiAOjWN+RiMsu1wM8MRXpAL1lgW01KWqcH/LaD
gKZjmevETiWMKDUdERkXj+e7xZCb2cfyheJ+vw9Ds5u8Dwp9p8cga8dGyvgcUTEX
gF+4dx+MoW6uirX+Cd/TJYluu5c19bYKhgEybVBG/5er24cnshE=
=9ob6
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.19
- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmkpR/sWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImevPcD/9foNp5fo4MYnMe7WtRnWfjrAsY
VLaNJclUr9tER7HGbRzfj//mx7JkTjCNqlD2Ii6r6N1tikU0o9OVAGVV4ROXbopJ
efQxBZc5TfOrkecrCkKVJ634+tkwuf8Uea/jK2nxkE2UYCVIGPYlS0ZSkXB1lmi/
YnYHGv7EOVAuJ64BsVOWfFQoKBD5AJtChibqTaUeZuq9Y6k087Ns3gPRS5AqjueG
FFmKYO9pIZZV7hlV5+misR+UiKk7tk8p/7MjpBKN1fJ4P2j9dshfDb+uF1Ir671N
F+ZxujYJkG+52NQuTSOq9q9EyWh7qzrlWRah/YpM3OMiRB9VpxuYvAthyN7o2NyA
ftEmYYi+Ose24/ND6aeDQDKeoTtZm7UsfO5X4rMRC5VnrbHUH6d3ZlZQDpnfoeHA
yw9eL4JI5i5DM8oFo/E8Ag38MUQ1o6btTgeQwXUTgGUZWGnNKfkdi3LTxKr2J18C
5b2Pudhts6f8cL1pfNgbzbglkNtWdi2UBr7fwNZYHKK2i8JRX2rD9cfEdjWU0qxY
Ybzqp6DL/+p38cGt29oQOv51+z/aEwOLTnnrf9wl7LBWRB/tbzuh6kIGGE6Ap9Wv
qC+I0F/nitOSjmNmmb5HHOB4LnkjwRb6cJhzWZH1zrwz/ZkTQqyZqltOGsiHRo24
z1TqIjJ0Er7CNfrb4Q==
=880E
-----END PGP SIGNATURE-----
Merge tag 'loongarch-kvm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19
1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
Add a CPUID testcase to verify that KVM allows KVM_SET_CPUID2 after (or in
conjunction with) runtime updates. This is a regression test for the bug
introduced by commit 93da6af3ae ("KVM: x86: Defer runtime updates of
dynamic CPUID bits until CPUID emulation"), where KVM would incorrectly
reject KVM_SET_CPUID due to a not handling a pending runtime update on the
current CPUID, resulting in a false mismatch between the "old" and "new"
CPUID entries.
Link: https://lore.kernel.org/all/20251128123202.68424a95@imammedo
Link: https://patch.msgid.link/20251202015049.1167490-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In commit 0297cdc12a ("KVM: selftests: Add option to rseq test to
override /dev/cpu_dma_latency"), a 'break' is missed before the option
'l' in the argument parsing loop, which leads to an unexpected core
dump in atoi_paranoid(). It tries to get the latency from non-existent
argument.
host$ ./rseq_test -u
Random seed: 0x6b8b4567
Segmentation fault (core dumped)
Add a 'break' before the option 'l' in the argument parsing loop to avoid
the unexpected core dump.
Fixes: 0297cdc12a ("KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Gavin Shan <gshan@redhat.com>
Link: https://patch.msgid.link/20251124050427.1924591-1-gshan@redhat.com
[sean: describe code change in shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add tests that trigger packet drops in cake_enqueue(): "CAKE with QFQ
Parent - CAKE enqueue with packets dropping". It forces CAKE_enqueue to
return NET_XMIT_CN after dropping the packets when it has a QFQ parent.
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20251128001415.377823-3-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This is a very large set of updates, as well as some more extensive
cleanup work from Morimto-san we've also added a generic SCDA class
driver for SoundWire devices enabling us to support many chips with
no custom code. There's also a batch of new drivers added for both
SoCs and CODECs.
- Added a SoundWire SCDA generic class driver, pulling in a little
regmap work to support it.
- A *lot* of cleaup and API improvement work from Morimoto-san.
- Lots of work on the existing Cirrus, Intel, Maxim and Qualcomm
drivers.
- Support for Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
TAS5815, TAS5828 and TAS5830.
This also pulls in some gpiolib changes supporting shared GPIOs in the
core there so we can convert some of the ASoC drivers open coding
handling of that to the core functionality.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmkt6lUACgkQJNaLcl1U
h9D7dgf+JP2+yZIeRBud7CEO4Docda2uoRssT7GAIY/Rqrpem5FI0c0pWyZISvhn
scyjkoCrQfHEoeYrtC3l5bDI7F8o5Tc91hGzhJiCi3mb8jSwi+CaNIpR0Cet3epV
B9wQgzxlxbmKCxJRUYTPQF3n1uBJWc5EBHSc5QPddTZ0vdUfSlX0FAKHsabpmaOC
TpkdJnOlH8WUokmP3kP3TpzlflmOSLehnWX4BelJe5Os5O0PQpiKh/JG3oCYHSmc
yEbzCjOaya80HHn11FShOpy+B4b6sLUMcN+CAmDiLAdNFGvvjgmjpwwZtLYAm09Z
zFhN7XuVk1vXf+Zx/jHqYKaZtvvAsQ==
=Xwls
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.19
This is a very large set of updates, as well as some more extensive
cleanup work from Morimto-san we've also added a generic SCDA class
driver for SoundWire devices enabling us to support many chips with
no custom code. There's also a batch of new drivers added for both
SoCs and CODECs.
- Added a SoundWire SCDA generic class driver, pulling in a little
regmap work to support it.
- A *lot* of cleaup and API improvement work from Morimoto-san.
- Lots of work on the existing Cirrus, Intel, Maxim and Qualcomm
drivers.
- Support for Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
TAS5815, TAS5828 and TAS5830.
This also pulls in some gpiolib changes supporting shared GPIOs in the
core there so we can convert some of the ASoC drivers open coding
handling of that to the core functionality.
Currently, tolerance is computed against the TC’s expected percentage,
making TC3 (20%) validation overly strict and TC4 (80%) overly loose.
Update BandwidthValidator to take a dict of shares and compute bounds
relative to the overall total, so that all shares are validated
consistently.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-7-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct the documented bandwidth distribution between TC3 and TC4
from 80/20 to 20/80. Update test descriptions and printed messages
to consistently reflect the intended split.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-6-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7c32f7a2d3db ("selftests: net: py: don't default to shell=True")
changed the cmd() helper to avoid spawning a shell unless explicitly
requested.
The devlink_rate_tc_bw test enables SR-IOV by writing to the
sriov_numvfs sysfs attribute using redirection. Without shell=True the
redirection is not interpreted and the VF device never appears,
causing the test to fail.
Fix by explicitly passing shell=True in the two places that update
sriov_numvfs.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-5-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GenerateTraffic was added to spin up long-running iperf3 load, mainly
to drive high PPS background traffic. It was never meant to provide
stable throughput numbers, and trying to repurpose it for measurement
does not make sense.
Introduce Iperf3Runner to allow tests to split out server/client
configuration, control start/stop, and collect JSON output for
analysis. This makes it possible to measure bandwidth directly when
validating egress shaping.
GenerateTraffic stays as the background load generator, reusing the
common iperf3 helpers under the hood.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-3-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This makes devlink_rate_tc_bw.py present in the Makefile under the same
directory.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-2-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This removes some noise that can be distracting while looking at
selftests by redirecting socat stderr to /dev/null.
Before this commit, netcons_basic would output:
Running with target mode: basic (ipv6)
2025/11/29 12:08:03 socat[259] W exiting on signal 15
2025/11/29 12:08:03 socat[271] W exiting on signal 15
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
2025/11/29 12:08:05 socat[329] W exiting on signal 15
2025/11/29 12:08:05 socat[322] W exiting on signal 15
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[380] W exiting on signal 15
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
2025/11/29 12:08:10 socat[440] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
extended : ipv4 : Test passed
After these changes, output looks like:
Running with target mode: basic (ipv6)
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
extended : ipv4 : Test passed
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251129-netcons-socat-noise-v1-1-605a0cea8fca@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
New NIPA installation had been reporting a few flaky tests.
arp_ndisc_evict_nocarrier is most flaky of them all.
I suspect that the flakiness is due to udev swapping the MAC
addresses on the interfaces. Extend the message in
arp_ndisc_evict_nocarrier to hint at this potential issue.
Having the neigh get fail right after ping is rather unusual,
unless udev changes the MAC addr causing a flush in the meantime.
Link: https://patch.msgid.link/20251127194556.2409574-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
oji0AQC5jl35xh04fJKB343InVAxtRFp8mSkJJ9Bx6x7xA7a+QEAiBMxYilUgYIW
bZMcI5LU+gNO/1y076QkVt84jTUQLww=
=WIBZ
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfd and coredump updates from Christian Brauner:
"Features:
- Expose coredump signal via pidfd
Expose the signal that caused the coredump through the pidfd
interface. The recent changes to rework coredump handling to rely
on unix sockets are in the process of being used in systemd. The
previous systemd coredump container interface requires the coredump
file descriptor and basic information including the signal number
to be sent to the container. This means the signal number needs to
be available before sending the coredump to the container.
- Add supported_mask field to pidfd
Add a new supported_mask field to struct pidfd_info that indicates
which information fields are supported by the running kernel. This
allows userspace to detect feature availability without relying on
error codes or kernel version checks.
Cleanups:
- Drop struct pidfs_exit_info and prepare to drop exit_info pointer,
simplifying the internal publication mechanism for exit and
coredump information retrievable via the pidfd ioctl
- Use guard() for task_lock in pidfs
- Reduce wait_pidfd lock scope
- Add missing PIDFD_INFO_SIZE_VER1 constant
- Add missing BUILD_BUG_ON() assert on struct pidfd_info
Fixes:
- Fix PIDFD_INFO_COREDUMP handling
Selftests:
- Split out coredump socket tests and common helpers into separate
files for better organization
- Fix userspace coredump client detection issues
- Handle edge-triggered epoll correctly
- Ignore ENOSPC errors in tests
- Add debug logging to coredump socket tests, socket protocol tests,
and test helpers
- Add tests for PIDFD_INFO_COREDUMP_SIGNAL
- Add tests for supported_mask field
- Update pidfd header for selftests"
* tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
pidfs: reduce wait_pidfd lock scope
selftests/coredump: add second PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: add first PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: ignore ENOSPC errors
selftests/coredump: add debug logging to coredump socket protocol tests
selftests/coredump: add debug logging to coredump socket tests
selftests/coredump: add debug logging to test helpers
selftests/coredump: handle edge-triggered epoll correctly
selftests/coredump: fix userspace coredump client detection
selftests/coredump: fix userspace client detection
selftests/coredump: split out coredump socket tests
selftests/coredump: split out common helpers
selftests/pidfd: add second supported_mask test
selftests/pidfd: add first supported_mask test
selftests/pidfd: update pidfd header
pidfs: expose coredump signal
pidfs: drop struct pidfs_exit_info
pidfs: prepare to drop exit_info pointer
pidfd: add a new supported_mask field
pidfs: add missing BUILD_BUG_ON() assert on struct pidfd_info
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
=TExi
-----END PGP SIGNATURE-----
Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.
Features:
- listns() system call
Add a new listns() system call that allows userspace to iterate
through namespaces in the system. This provides a programmatic
interface to discover and inspect namespaces, addressing
longstanding limitations:
Currently, there is no direct way for userspace to enumerate
namespaces. Applications must resort to scanning /proc/*/ns/ across
all processes, which is:
- Inefficient - requires iterating over all processes
- Incomplete - misses namespaces not attached to any running
process but kept alive by file descriptors, bind mounts, or
parent references
- Permission-heavy - requires access to /proc for many processes
- No ordering or ownership information
- No filtering per namespace type
The listns() system call solves these problems:
ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
size_t nr_ns_ids, unsigned int flags);
struct ns_id_req {
__u32 size;
__u32 spare;
__u64 ns_id;
struct /* listns */ {
__u32 ns_type;
__u32 spare2;
__u64 user_ns_id;
};
};
Features include:
- Pagination support for large namespace sets
- Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
- Filtering by owning user namespace
- Permission checks respecting namespace isolation
- Active Reference Counting
Introduce an active reference count that tracks namespace
visibility to userspace. A namespace is visible in the following
cases:
- The namespace is in use by a task
- The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount)
- The namespace is a hierarchical type and is the parent of child
namespaces
The active reference count does not regulate lifetime (that's still
done by the normal reference count) - it only regulates visibility
to namespace file handles and listns().
This prevents resurrection of namespaces that are pinned only for
internal kernel reasons (e.g., user namespaces held by
file->f_cred, lazy TLB references on idle CPUs, etc.) which should
not be accessible via (1)-(3).
- Unified Namespace Tree
Introduce a unified tree structure for all namespaces with:
- Fixed IDs assigned to initial namespaces
- Lookup based solely on inode number
- Maintained list of owned namespaces per user namespace
- Simplified rbtree comparison helpers
Cleanups
- Header Reorganization:
- Move namespace types into separate header (ns_common_types.h)
- Decouple nstree from ns_common header
- Move nstree types into separate header
- Switch to new ns_tree_{node,root} structures with helper functions
- Use guards for ns_tree_lock
- Initial Namespace Reference Count Optimization
- Make all reference counts on initial namespaces a nop to avoid
pointless cacheline ping-pong for namespaces that can never go
away
- Drop custom reference count initialization for initial namespaces
- Add NS_COMMON_INIT() macro and use it for all namespaces
- pid: rely on common reference count behavior
- Miscellaneous Cleanups
- Rename exit_task_namespaces() to exit_nsproxy_namespaces()
- Rename is_initial_namespace() and make argument const
- Use boolean to indicate anonymous mount namespace
- Simplify owner list iteration in nstree
- nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
- nsfs: use inode_just_drop()
- pidfs: raise DCACHE_DONTCACHE explicitly
- pidfs: simplify PIDFD_GET__NAMESPACE ioctls
- libfs: allow to specify s_d_flags
- cgroup: add cgroup namespace to tree after owner is set
- nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
Fixes:
- setns(pidfd, ...) race condition
Fix a subtle race when using pidfds with setns(). When the target
task exits after prepare_nsset() but before commit_nsset(), the
namespace's active reference count might have been dropped. If
setns() then installs the namespaces, it would bump the active
reference count from zero without taking the required reference on
the owner namespace, leading to underflow when later decremented.
The fix resurrects the ownership chain if necessary - if the caller
succeeded in grabbing passive references, the setns() should
succeed even if the target task exits or gets reaped.
- Return EFAULT on put_user() error instead of success
- Make sure references are dropped outside of RCU lock (some
namespaces like mount namespace sleep when putting the last
reference)
- Don't skip active reference count initialization for network
namespace
- Add asserts for active refcount underflow
- Add asserts for initial namespace reference counts (both passive
and active)
- ipc: enable is_ns_init_id() assertions
- Fix kernel-doc comments for internal nstree functions
- Selftests
- 15 active reference count tests
- 9 listns() functionality tests
- 7 listns() permission tests
- 12 inactive namespace resurrection tests
- 3 threaded active reference count tests
- commit_creds() active reference tests
- Pagination and stress tests
- EFAULT handling test
- nsid tests fixes"
* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
nstree: fix kernel-doc comments for internal functions
nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
selftests/namespaces: fix nsid tests
ns: drop custom reference count initialization for initial namespaces
pid: rely on common reference count behavior
ns: add asserts for initial namespace active reference counts
ns: add asserts for initial namespace reference counts
ns: make all reference counts on initial namespace a nop
ipc: enable is_ns_init_id() assertions
fs: use boolean to indicate anonymous mount namespace
ns: rename is_initial_namespace()
ns: make is_initial_namespace() argument const
nstree: use guards for ns_tree_lock
nstree: simplify owner list iteration
nstree: switch to new structures
nstree: add helper to operate on struct ns_tree_{node,root}
nstree: move nstree types into separate header
nstree: decouple from ns_common header
ns: move namespace types into separate header
...
* kvm-arm64/nv-xnx-haf: (22 commits)
: Support for FEAT_XNX and FEAT_HAF in nested
:
: Add support for a couple of MMU-related features that weren't
: implemented by KVM's software page table walk:
:
: - FEAT_XNX: Allows the hypervisor to describe execute permissions
: separately for EL0 and EL1
:
: - FEAT_HAF: Hardware update of the Access Flag, which in the context of
: nested means software walkers must also set the Access Flag.
:
: The series also adds some basic support for testing KVM's emulation of
: the AT instruction, including the implementation detail that AT sets the
: Access Flag in KVM.
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
* kvm-arm64/vgic-lr-overflow: (50 commits)
: Support for VGIC LR overflows, courtesy of Marc Zyngier
:
: Address deficiencies in KVM's GIC emulation when a vCPU has more active
: IRQs than can be represented in the VGIC list registers. Sort the AP
: list to prioritize inactive and pending IRQs, potentially spilling
: active IRQs outside of the LRs.
:
: Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
: which involves special consideration for SPIs being deactivated from a
: different vCPU than the one that acked it.
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
KVM: arm64: selftests: vgic_irq: Add timer deactivation test
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
KVM: arm64: selftests: gic_v3: Add irq group setting helper
KVM: arm64: GICv2: Always trap GICV_DIR register
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
KVM: arm64: GICv2: Handle LR overflow when EOImode==0
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
KVM: arm64: GICv3: Handle in-LR deactivation when possible
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
* kvm-arm64/sea-user:
: Userspace handling of SEAs, courtesy of Jiaqi Yan
:
: Add support for processing external aborts in userspace in situations
: where the host has failed to do so, allowing the VMM to potentially
: reinject an external abort into the VM.
Documentation: kvm: new UAPI for handling SEA
KVM: selftests: Test for KVM_EXIT_ARM_SEA
KVM: arm64: VM exit to userspace to handle SEA
Signed-off-by: Oliver Upton <oupton@kernel.org>
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
- In order to prepare the layout for nohz_full work deferral to
user exit, the context tracking state must shrink the counter
of transitions to/from RCU not watching. The only possible hazard
is to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption.
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
On recent discussions, we also concluded that all those WRITE_ONCE()
and READ_ONCE() on list APIs deserve appropriate comments. Something
to be expected for the next cycle.
- Provide a script to apply several configs to several commits with torture.
- Allow torture to reuse a build directory in order to save needless
rebuild time.
- Various cleanups.
In "uffd-stress.c" & "uffd-unit-tests.c". address of char variable having
garbage value (uninitialized) is passed to 'write' syscall triggers
warning.
uffd-stress.c:246:39: warning: variable 'c' is uninitialized when
passed as a const pointer argument here
[-Wuninitialized-const-pointer]
uffd-unit-tests.c:581:31: warning: variable 'c' is uninitialized
when passed as a const pointer argument here
[-Wuninitialized-const-pointer]
so the fix is to assign char variable to '\0' to prevent writing of
garbage value.
Link: https://lkml.kernel.org/r/20251126160830.52124-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is useful to transition to using a bitmap for VMA flags so we can avoid
running out of flags, especially for 32-bit kernels which are constrained
to 32 flags, necessitating some features to be limited to 64-bit kernels
only.
By doing so, we remove any constraint on the number of VMA flags moving
forwards no matter the platform and can decide in future to extend beyond
64 if required.
We start by declaring an opaque types, vma_flags_t (which resembles
mm_struct flags of type mm_flags_t), setting it to precisely the same size
as vm_flags_t, and place it in union with vm_flags in the VMA declaration.
We additionally update struct vm_area_desc equivalently placing the new
opaque type in union with vm_flags.
This change therefore does not impact the size of struct vm_area_struct or
struct vm_area_desc.
In order for the change to be iterative and to avoid impacting
performance, we designate VM_xxx declared bitmap flag values as those
which must exist in the first system word of the VMA flags bitmap.
We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(),
vma_flags_overwrite_word(), vma_flags_overwrite_word_once(),
vma_flags_set_word() and vma_flags_clear_word() in order to allow us to
update the existing vm_flags_*() functions to utilise these helpers.
This is a stepping stone towards converting users to the VMA flags bitmap
and behaves precisely as before.
By doing this, we can eliminate the existing private vma->__vm_flags field
in the vma->vm_flags union and replace it with the newly introduced opaque
type vma_flags, which we call flags so we refer to the new bitmap field as
vma->flags.
We update vma_flag_[test, set]_atomic() to account for the change also.
We adapt vm_flags_reset_once() to only clear those bits above the first
system word providing write-once semantics to the first system word (which
it is presumed the caller requires - and in all current use cases this is
so).
As we currently only specify that the VMA flags bitmap size is equal to
BITS_PER_LONG number of bits, this is a noop, but is defensive in
preparation for a future change that increases this.
We additionally update the VMA userland test declarations to implement the
same changes there.
Finally, we update the rust code to reference vma->vm_flags on update
rather than vma->__vm_flags which has been removed. This is safe for now,
albeit it is implicitly performing a const cast.
Once we introduce flag helpers we can improve this more.
No functional change intended.
Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The userland VMA test code relied on an internal implementation detail -
the existence of vma->__vm_flags to directly access VMA flags. There is
no need to do so when we have the vm_flags_*() helper functions available.
This is ugly, but also a subsequent commit will eliminate this field
altogether so this will shortly become broken.
This patch has us utilise the helper functions instead.
Link: https://lkml.kernel.org/r/6275c53a6bb20743edcbe92d3e130183b47d18d0.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "initial work on making VMA flags a bitmap", v3.
We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.
This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.
This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.
This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.
This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.
This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.
Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx fields but in terms of these newly introduced
VMA_xxx_BIT fields.
While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.
We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit bb6525f2f8 ("mm: add bitmap mm->flags
field"), which we establish in union with vma->vm_flags (but still set at
system word size meaning there is no functional or data type size change).
We update the vm_flags_xxx() helpers to use this new bitmap, introducing
sensible helpers to do so.
This series lays the foundation for further work to expand the use of
bitmap VMA flags and eventually eliminate these arbitrary restrictions.
This patch (of 4):
In order to lay the groundwork for VMA flags being a bitmap rather than a
system word in size, we need to be able to consistently refer to VMA flags
by bit number rather than value.
Take this opportunity to do so in an enum which we which is additionally
useful for tooling to extract metadata from.
This additionally makes it very clear which bits are being used for what
at a glance.
We use the VMA_ prefix for the bit values as it is logical to do so since
these reference VMAs. We consistently suffix with _BIT to make it clear
what the values refer to.
We declare bit values even when the flags that use them would not be
enabled by config options as this is simply clearer and clearly defines
what bit numbers are used for what, at no additional cost.
We declare a sparse-bitwise type vma_flag_t which ensures that users can't
pass around invalid VMA flags by accident and prepares for future work
towards VMA flags being a bitmap where we want to ensure bit values are
type safe.
To make life easier, we declare some macro helpers - DECLARE_VMA_BIT()
allows us to avoid duplication in the enum bit number declarations (and
maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to
assist with declaration of flags.
Unfortunately we can't declare both in the enum, as we run into issue with
logic in the kernel requiring that flags are preprocessor definitions, and
additionally we cannot have a macro which declares another macro so we
must define each flag macro directly.
Additionally, update the VMA userland testing vma_internal.h header to
include these changes.
We also have to fix the parameters to the vma_flag_*_atomic() functions
since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will
complain otherwise.
We have to update some rather silly if-deffery found in mm/task_mmu.c
which would otherwise break.
Finally, we update the rust binding helper as now it cannot auto-detect
the flags at all.
Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
test_tc_edt currently defines the target rate in both the userspace and
BPF parts. This value could be defined once in the userspace part if we
make it able to configure the BPF program before starting the test.
Add a target_rate variable in the BPF part, and make the userspace part
set it to the desired rate before attaching the shaping program.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that test_tc_edt has been integrated in test_progs, remove the
legacy shell script.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX
veth to shape the traffic to 5MBps. It then checks that the amount of
received bytes (at interface level), compared to the TX duration, indeed
matches 5Mbps.
Convert this test script to the test_progs framework:
- keep the double veth setup, isolated in two veths
- run a small tcp server, and connect client to server
- push a pre-configured amount of bytes, and measure how much time has
been needed to push those
- ensure that this rate is in a 2% error margin around the target rate
This two percent value, while being tight, is hopefully large enough to
not make the test too flaky in CI, while also turning it into a small
example of BPF-based shaping.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_tc_edt BPF program uses a custom section name, which works fine
when manually loading it with tc, but prevents it from being loaded with
libbpf.
Update the program section name to "tc" to be able to manipulate it with
a libbpf-based C test.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add stats to observe the success and failure rate of lock acquisition
attempts in various contexts.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub reported increased flakiness in bond_macvlan_ipvlan.sh on regular
kernel, while the tests consistently pass on a debug kernel. This suggests
a timing-sensitive issue.
To mitigate this, introduce a short sleep before each xvlan_over_bond
connectivity check. The delay helps ensure neighbor and route cache
have fully converged before verifying connectivity.
The sleep interval is kept minimal since check_connection() is invoked
nearly 100 times during the test.
Fixes: 246af950b9 ("selftests: bonding: add macvlan over bond testing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251114082014.750edfad@kernel.org
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251127143310.47740-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
runqslower was added in commit 9c01546d26 "tools/bpf: Add runqslower
tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf
workflows. runqslower continues to live in BCC (libbpf-tools), so there
is no need to keep building and maintaining it.
Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and
selftests accordingly.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The original implementation added a hack to check_mem_access()
to prevent programs from writing into insn arrays. To get rid
of this hack, enforce BPF_F_RDONLY_PROG on map creation.
Also fix the corresponding selftest, as the error message changes
with this patch.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new VFIO selftest for measuring the time it takes to run
vfio_pci_device_init() in parallel for one or more devices.
This test serves as manual regression test for the performance
improvement of commit e908f58b6b ("vfio/pci: Separate SR-IOV VF
dev_set"). For example, when running this test with 64 VFs under the
same PF:
Before:
$ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
...
Wall time: 6.653234463s
Min init time (per device): 0.101215344s
Max init time (per device): 6.652755941s
Avg init time (per device): 3.377609608s
After:
$ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
...
Wall time: 0.122978332s
Min init time (per device): 0.108121915s
Max init time (per device): 0.122762761s
Avg init time (per device): 0.113816748s
This test does not make any assertions about performance, since any such
assertion is likely to be flaky due to system differences and random
noise. However this test can be fed into automation to detect
regressions, and can be used by developers in the future to measure
performance optimizations.
Suggested-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-19-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Eliminate INVALID_IOVA as there are platforms where UINT64_MAX is a
valid iova.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-18-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Split out the contents of libvfio.h into separate header files, but keep
libvfio.h as the top-level include that all tests can use.
Put all new header files into a libvfio/ subdirectory to avoid future
name conflicts in include paths when libvfio is used by other selftests
like KVM.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-17-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Move the vfio_selftests_*() helpers into their own file libvfio.c. These
helpers have nothing to do with struct vfio_pci_device, so they don't
make sense in vfio_pci_device.c.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-16-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Rename vfio_util.h to libvfio.h to match the name of libvfio.mk.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-15-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Drop the struct vfio_pci_device wrappers for IOMMU map/unmap functions
and require tests to directly call iommu_map(), iommu_unmap(), etc. This
results in more concise code, and also makes it clear the map operations
are happening on a struct iommu, not necessarily on a specific device,
especially when multi-device tests are introduced.
Do the same for iova_allocator_init() as that function only needs the
struct iommu, not struct vfio_pci_device.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-14-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Move the IOVA allocator into its own file, to provide better separation
between the allocator and the struct vfio_pci_device helper code.
The allocator could go into iommu.c, but it is standalone enough that a
separate file seems cleaner. This also continues the trend of having a
.c for every major object in VFIO selftests (vfio_pci_device.c,
vfio_pci_driver.c, iommu.c, and now iova_allocator.c).
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-13-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Move all the IOMMU related library code into their own file iommu.c.
This provides a better separation between the vfio_pci_device helper
code and the iommu code.
No function change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-12-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Rename struct vfio_dma_region to dma_region. This is in preparation for
separating the VFIO PCI device library code from the IOMMU library code.
This name change also better reflects the fact that DMA mappings can be
managed by either VFIO or IOMMUFD. i.e. the "vfio_" prefix is
misleading.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-11-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Upgrade various logging in the VFIO selftests drivers from dev_info() to
dev_err(). All of these logs indicate scenarios that may be unexpected.
For example, the logging during probing indicates matching devices but
that aren't supported by the driver. And the memcpy errors can indicate
a problem if the caller was not trying to do something like exercise I/O
fault handling. Exercising I/O fault handling is certainly a valid thing
to do, but the driver can't infer the caller's expectations, so better
to just log with dev_err().
Suggested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-10-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Prefix log messages with the device's BDF where relevant. This will help
understanding VFIO selftests logs when tests are run with multiple
devices.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-9-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Eliminate overly chatty logs that are printed during almost every test.
These logs are adding more noise than value. If a test cares about this
information it can log it itself. This is especially true as the VFIO
selftests gains support for multiple devices in a single test (which
multiplies all these logs).
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-8-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Support tests that want to add multiple devices to the same
container/iommufd by decoupling struct vfio_pci_device from
struct iommu.
Multi-devices tests can now put multiple devices in the same
container/iommufd like so:
iommu = iommu_init(iommu_mode);
device1 = vfio_pci_device_init(bdf1, iommu);
device2 = vfio_pci_device_init(bdf2, iommu);
device3 = vfio_pci_device_init(bdf3, iommu);
...
vfio_pci_device_cleanup(device3);
vfio_pci_device_cleanup(device2);
vfio_pci_device_cleanup(device1);
iommu_cleanup(iommu);
To account for the new separation of vfio_pci_device and iommu, update
existing tests to initialize and cleanup a struct iommu.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-7-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Introduce struct iommu, which logically represents either a VFIO
container or an iommufd IOAS, depending on which IOMMU mode is used by
the test.
This will be used in a subsequent commit to allow devices to be added to
the same container/iommufd.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-6-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Rename struct vfio_iommu_mode to struct iommu_mode since the mode can
include iommufd. This also prepares for splitting out all the IOMMU code
into its own structs/helpers/files which are independent from the
vfio_pci_device code.
No function change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-5-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Add support for passing multiple device BDFs to a test via the command
line. This is a prerequisite for multi-device tests.
Single-device tests can continue using vfio_selftests_get_bdf(), which
will continue to return argv[argc - 1] (if it is a BDF string), or the
environment variable $VFIO_SELFTESTS_BDF otherwise.
For multi-device tests, a new helper called vfio_selftests_get_bdfs() is
introduced which will return an array of all BDFs found at the end of
argv[], as well as the number of BDFs found (passed back to the caller
via argument). The array of BDFs returned does not need to be freed by
the caller.
The environment variable VFIO_SELFTESTS_BDF continues to support only a
single BDF for the time being.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-4-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Split run.sh into separate scripts (setup.sh, run.sh, cleanup.sh) to
enable multi-device testing, and prepare for VFIO selftests
automatically detecting which devices to use for testing by storing
device metadata on the filesystem.
- setup.sh takes one or more BDFs as arguments and sets up each device.
Metadata about each device is stored on the filesystem in the
directory:
${TMPDIR:-/tmp}/vfio-selftests-devices
Within this directory is a directory for each BDF, and then files in
those directories that cleanup.sh uses to cleanup the device.
- run.sh runs a selftest by passing it the BDFs of all set up devices.
- cleanup.sh takes zero or more BDFs as arguments and cleans up each
device. If no BDFs are provided, it cleans up all devices.
This split enables multi-device testing by allowing multiple BDFs to be
set up and passed into tests:
For example:
$ tools/testing/selftests/vfio/scripts/setup.sh <BDF1> <BDF2>
$ tools/testing/selftests/vfio/scripts/setup.sh <BDF3>
$ tools/testing/selftests/vfio/scripts/run.sh echo
<BDF1> <BDF2> <BDF3>
$ tools/testing/selftests/vfio/scripts/cleanup.sh
In the future, VFIO selftests can automatically detect set up devices by
inspecting ${TMPDIR:-/tmp}/vfio-selftests-devices. This will avoid the
need for the run.sh script.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-3-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Move run.sh in a new sub-directory scripts/. This directory will be used
to house various helper scripts to be used by humans and automation for
running VFIO selftests.
Opportunistically also switch run.sh from TEST_PROGS_EXTENDED to
TEST_FILES. The former is for actual test executables that are just not
run by default. TEST_FILES is a better fit for helper scripts.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-2-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
- Fix vfio selftests to remove the expectation that the IOMMU
supports a 64-bit IOVA space. These manifest both in the original
set of tests introduced this development cycle in identity mapping
the IOVA to buffer virtual address space, as well as the more
recent boundary testing. Implement facilities for collecting the
valid IOVA ranges from the backend, implement a simple IOVA
allocator, and use the information for determining extents.
(Alex Mastro)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkWXwYRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyK96w//ep97M1B5RSA4/AIP4j+fovnlM2RVUrPy
RKxVYK+rzftO4Hq8sg1Co3aB68hItbYiw/0BDUgQBFFCEoInQ9zht89tsHhuXq55
lYq9hyYMnp60FLzpX8Fi2HKf8SKTdD+6P1hX5UWZfYExc/tCD/9FPE2QsfUTbnh/
u2aguEp0RhgC8xCcR/5EsGFzcBgioDcTBQq5UQ/5By51nUNvdmHKRHKRC7ZmQfiK
bwFpYseWHRdzdBH2KggNaDZUh0q7Y9+cwL2UndQV5IfwGhGWODztOsVhhoxAAu0n
c6gXYDg34i5i1epPrr8Yvv8trDzMFCRYJi4g8BsLPOBXe8Lr2bWj/4wA1ftdW6Ha
0ii4Gx1UO1vz0KZQBajMHv5pO06lJZshzEQgHr3sYgjaRg8AVTV6hI3IVaty2JWk
DPKPNSGKhjkWvWnx73KFiiepz5zXEL7r7ooFBimtBwFr/5bDB8hfPjpXMh+mhiZu
QW6DV+Mrqg1G1KDCwdHSvbBwH8IvfX/tQbs9eKvGt+6ICpH1GJ1cUr58vjcc6ESi
sJzKI0ceTjGZAhHJogO4xAzhI8H6kwtDHB1ZC9QLHIwbkXFN1m64VjOwwXuHk0HT
559gjqjH0yKNSABwRMV1nKzn9EFnivQn1tAoGkUGSFE9ZI/VfCNNlspPziZEz0z9
u14SILGMqWU=
=ywvT
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.18-rc6' into v6.19/vfio/next
Merge mainline vfio-selftest updates for ongoing v6.19 work.
Signed-off-by: Alex Williamson <alex@shazbot.org>
Test disconnected directories with two test suites
(layout4_disconnected_leafs and layout5_disconnected_branch) and 43
variants to cover the main corner cases.
These tests are complementary to the previous commit.
Add test_renameat() and test_exchangeat() helpers.
Test coverage for security/landlock is 92.1% of 1927 lines according to
LLVM 20.
Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
This adds tests for the edge case discussed in [1], with specific ones
for rename and link operations when the operands are through
disconnected paths, as that go through a separate code path in Landlock.
This has resulted in a warning, due to collect_domain_accesses() not
expecting to reach a different root from path->mnt:
# RUN layout1_bind.path_disconnected ...
# OK layout1_bind.path_disconnected
ok 96 layout1_bind.path_disconnected
# RUN layout1_bind.path_disconnected_rename ...
[..] ------------[ cut here ]------------
[..] WARNING: CPU: 3 PID: 385 at security/landlock/fs.c:1065 collect_domain_accesses
[..] ...
[..] RIP: 0010:collect_domain_accesses (security/landlock/fs.c:1065 (discriminator 2) security/landlock/fs.c:1031 (discriminator 2))
[..] current_check_refer_path (security/landlock/fs.c:1205)
[..] ...
[..] hook_path_rename (security/landlock/fs.c:1526)
[..] security_path_rename (security/security.c:2026 (discriminator 1))
[..] do_renameat2 (fs/namei.c:5264)
# OK layout1_bind.path_disconnected_rename
ok 97 layout1_bind.path_disconnected_rename
Move the const char definitions a bit above so that we can use the path
for s4d1 in cleanup code.
Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
* for-next/sysreg:
: arm64 sysreg updates/cleanups
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64/sysreg: Add ICH_VMCR_EL2
arm64/sysreg: Move generation of RES0/RES1/UNKN to function
arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
arm64/sysreg: Fix checks for incomplete sysreg definitions
arm64/sysreg: Replace TCR_EL1 field macros
* arm64/for-next/perf:
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
perf/arm-ni: Fix and optimise register offset calculation
perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
perf/arm-ni: Add NoC S3 support
perf/arm_cspmu: nvidia: Add pmevfiltr2 support
perf/arm_cspmu: nvidia: Add revision id matching
perf/arm_cspmu: Add pmpidr support
perf/arm_cspmu: Add callback to reset filter config
perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
* for-next/misc:
: Miscellaneous patches
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
arm64: mm: make linear mapping permission update more robust for patial range
arm64/mm: Elide TLB flush in certain pte protection transitions
arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
arm64: add unlikely hint to MTE async fault check in el0_svc_common
arm64: acpi: add newline to deferred APEI warning
arm64: entry: Clean out some indirection
arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
arm64: remove unused ARCH_PFN_OFFSET
arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
arm64: Remove assertion on CONFIG_VMAP_STACK
* for-next/kselftest:
: arm64 kselftest patches
kselftest/arm64: Align zt-test register dumps
* for-next/efi-preempt:
: arm64: Make EFI calls preemptible
arm64/efi: Call EFI runtime services without disabling preemption
arm64/efi: Move uaccess en/disable out of efi_set_pgd()
arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
arm64/fpsimd: Permit kernel mode NEON with IRQs off
arm64/fpsimd: Don't warn when EFI execution context is preemptible
efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
efi: Add missing static initializer for efi_mm::cpus_allowed_lock
* for-next/assembler-macro:
: arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
* for-next/typos:
: Random typo/spelling fixes
arm64: Fix double word in comments
arm64: Fix typos and spelling errors in comments
* for-next/sme-ptrace-disable:
: Support disabling streaming mode via ptrace on SME only systems
kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
* for-next/local-tlbi-page-reused:
: arm64, mm: avoid TLBI broadcast if page reused in write fault
arm64, tlbflush: don't TLBI broadcast if page reused in write fault
mm: add spurious fault fixing support for huge pmd
* for-next/mpam: (34 commits)
: Basic Arm MPAM driver (more to follow)
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Use long MBWU counters if supported
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Consider overflow in bandwidth counter state
arm_mpam: Track bandwidth counter state for power management
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Add helpers to allocate monitors
arm_mpam: Probe and reset the rest of the features
arm_mpam: Allow configuration to be applied and restored during cpu online
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Register and enable IRQs
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Reset MSC controls from cpuhp callbacks
arm_mpam: Merge supported features during mpam_enable() into mpam_class
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Add helpers for managing the locking around the mon_sel registers
...
* for-next/acpi:
: arm64 acpi updates
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
* for-next/documentation:
: arm64 Documentation updates
Documentation/arm64: Fix the typo of register names
With time counter test, it is to verify that time count starts from 0
and always grows up then.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This test case setup one-shot timer and execute idle instruction
immediately to indicate giving up CPU, hypervisor will emulate SW
hrtimer and wakeup vCPU when SW hrtimer is fired.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add timer test case based on common arch_timer code, timer interrupt
with one-shot and period mode is tested.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Introduce the capability to send TCP traffic over IPv6 to
nft_flowtable netfilter selftest.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Introduce specific selftest for IPIP flowtable SW acceleration in
nft_flowtable.sh
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Introduce a new kexec-based selftest, luo_kexec_multi_session, to validate
the end-to-end lifecycle of a more complex LUO scenario.
While the existing luo_kexec_simple test covers the basic end-to-end
lifecycle, it is limited to a single session with one preserved file.
This new test significantly expands coverage by verifying LUO's ability to
handle a mixed workload involving multiple sessions, some of which are
intentionally empty. This ensures that the LUO core correctly preserves
and restores the state of all session types across a reboot.
The test validates the following sequence:
Stage 1 (Pre-kexec):
- Creates two empty test sessions (multi-test-empty-1,
multi-test-empty-2).
- Creates a session with one preserved memfd (multi-test-files-1).
- Creates another session with two preserved memfds
(multi-test-files-2), each containing unique data.
- Creates a state-tracking session to manage the transition to
Stage 2.
- Executes a kexec reboot via the helper script.
Stage 2 (Post-kexec):
- Retrieves the state-tracking session to confirm it is in the
post-reboot stage.
- Retrieves all four test sessions (both the empty and non-empty
ones).
- For the non-empty sessions, restores the preserved memfds and
verifies their contents match the original data patterns.
- Finalizes all test sessions and the state session to ensure a clean
teardown and that all associated kernel resources are correctly
released.
This test provides greater confidence in the robustness of the LUO
framework by validating its behavior in a more realistic, multi-faceted
scenario.
Link: https://lkml.kernel.org/r/20251125165850.3389713-19-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce a kexec-based selftest, luo_kexec_simple, to validate the
end-to-end lifecycle of a Live Update Orchestrator session across a
reboot.
While existing tests verify the uAPI in a pre-reboot context, this test
ensures that the core functionality—preserving state via Kexec Handover
and restoring it in a new kernel—works as expected.
The test operates in two stages, managing its state across the reboot by
preserving a dedicated "state session" containing a memfd. This mechanism
dogfoods the LUO feature itself for state tracking, making the test
self-contained.
The test validates the following sequence:
Stage 1 (Pre-kexec):
- Creates a test session (test-session).
- Creates and preserves a memfd with a known data pattern into the test
session.
- Creates the state-tracking session to signal progression to Stage 2.
- Executes a kexec reboot via a helper script.
Stage 2 (Post-kexec):
- Retrieves the state-tracking session to confirm it is in the
post-reboot stage.
- Retrieves the preserved test session.
- Restores the memfd from the test session and verifies its contents
match the original data pattern written in Stage 1.
- Finalizes both the test and state sessions to ensure a clean
teardown.
The test relies on a helper script (do_kexec.sh) to perform the reboot and
a shared utility library (luo_test_utils.c) for common LUO operations,
keeping the main test logic clean and focused.
Link: https://lkml.kernel.org/r/20251125165850.3389713-18-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce a selftest suite for LUO. These tests validate the core
userspace-facing API provided by the /dev/liveupdate device and its
associated ioctls.
The suite covers fundamental device behavior, session management, and the
file preservation mechanism using memfd as a test case. This provides
regression testing for the LUO uAPI.
The following functionality is verified:
Device Access:
Basic open and close operations on /dev/liveupdate.
Enforcement of exclusive device access (verifying EBUSY on a
second open).
Session Management:
Successful creation of sessions with unique names.
Failure to create sessions with duplicate names.
File Preservation:
Preserving a single memfd and verifying its content remains
intact post-preservation.
Preserving multiple memfds within a single session, each with
unique data.
A complex scenario involving multiple sessions, each containing
a mix of empty and data-filled memfds.
Note: This test suite is limited to verifying the pre-kexec functionality
of LUO (e.g., session creation, file preservation). The post-kexec
restoration of resources is not covered, as the kselftest framework does
not currently support orchestrating a reboot and continuing execution in
the new kernel.
Link: https://lkml.kernel.org/r/20251125165850.3389713-17-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Myugnjoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "liveupdate: Rework KHO for in-kernel users", v9.
This series refactors the KHO framework to better support in-kernel users
like the upcoming LUO. The current design, which relies on a notifier
chain and debugfs for control, is too restrictive for direct programmatic
use.
The core of this rework is the removal of the notifier chain in favor of a
direct registration API. This decouples clients from the shutdown-time
finalization sequence, allowing them to manage their preserved state more
flexibly and at any time.
In support of this new model, this series also:
- Makes the debugfs interface optional.
- Introduces APIs to unpreserve memory and fixes a bug in the abort
path where client state was being incorrectly discarded. Note that
this is an interim step, as a more comprehensive fix is planned as
part of the stateless KHO work [1].
- Moves all KHO code into a new kernel/liveupdate/ directory to
consolidate live update components.
This patch (of 9):
Currently, KHO is controlled via debugfs interface, but once LUO is
introduced, it can control KHO, and the debug interface becomes optional.
Add a separate config CONFIG_KEXEC_HANDOVER_DEBUGFS that enables the
debugfs interface, and allows to inspect the tree.
Move all debugfs related code to a new file to keep the .c files clear of
ifdefs.
Link: https://lkml.kernel.org/r/20251101142325.1326536-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20251101142325.1326536-2-pasha.tatashin@soleen.com
Link: https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com [1]
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This follow-up patch completes centralization of kselftest.h and
ksefltest_harness.h includes in remaining seltests files, replacing all
relative paths with a non-relative paths using shared -I include path in
lib.mk
Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on
riscv, arm64, x86_64 and powerpc arch.
[reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h]
Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com
Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mickael Salaun <mic@digikod.net>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In test_clidr() if an empty cache level is not found then the TEST_ASSERT
will not fire. Fix this by considering all 7 possible levels when iterating
through the hierarchy. Found by inspection.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARM64_FEATURE_FIELD_BITS is set to 4 but not all ID register fields are 4
bits. See for instance ID_AA64SMFR0_EL1. The last user of this define,
ARM64_FEATURE_FIELD_BITS, is the set_id_regs selftest. Its logic assumes
the fields aren't a single bits; assert that's the case and stop using the
define. As there are no more users, ARM64_FEATURE_FIELD_BITS is removed
from the arm64 tools sysreg.h header. A separate commit removes this from
the kernel version of the header.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add some basic function interfaces such as CSR register access, local
irq enable or disable APIs.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When system returns from exception with ertn instruction, PC comes from
LOONGARCH_CSR_ERA, and CSR.CRMD comes LOONGARCH_CSR_PRMD.
Here save CSR register CSR.ERA and CSR.PRMD into stack, and then restore
them from stack. So it can be modified by exception handlers in future.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
char variable in 'so_txtime.c' & 'txtimestamp.c' were left uninitilized
when switch default case taken. which raises following warning.
txtimestamp.c:240:2: warning: variable 'tsname' is used uninitialized
whenever switch default is taken [-Wsometimes-uninitialized]
so_txtime.c:210:3: warning: variable 'reason' is used uninitialized
whenever switch default is taken [-Wsometimes-uninitialized]
initializing these variables to NULL to fix this.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20251125165302.20079-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All are singletons - please see the respective changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaSdaWwAKCRDdBJ7gKXxA
jlRmAP92O8ez4IeVq1VVhVfGVf9Xbjd8zFTykQXuGM/3yDRoTgEA+kLDFTTaJ5Wb
MGlgQAeFXqlfMZfBljhyzbV8Ubz1BAc=
=4UwS
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-11-26-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"8 hotfixes. 4 are cc:stable, 7 are against mm/.
All are singletons - please see the respective changelogs for details"
* tag 'mm-hotfixes-stable-2025-11-26-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/filemap: fix logic around SIGBUS in filemap_map_pages()
mm/huge_memory: fix NULL pointer deference when splitting folio
MAINTAINERS: add test_kho to KHO's entry
mailmap: add entry for Sam Protsenko
selftests/mm: fix division-by-zero in uffd-unit-tests
mm/mmap_lock: reset maple state on lock_vma_under_rcu() retry
mm/memfd: fix information leak in hugetlb folios
mm: swap: remove duplicate nr_swap_pages decrement in get_swap_page_of_type()
Make all headers part of make's dependencies computations.
Otherwise, updating audit.h, common.h, scoped_base_variants.h,
scoped_common.h, scoped_multiple_domain_variants.h, or wrappers.h,
re-running make and running selftests could lead to testing stale headers.
Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Fixes: fefcf0f7cf ("selftests/landlock: Test abstract UNIX socket scoping")
Fixes: 5147779d5e ("selftests/landlock: Add wrappers.h")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027011440.1838514-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Jason Gunthorpe says:
====================
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:
https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/
The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and if the underliyng alignment
requirements are met it will be placed in the iommu_domain.
Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.
The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.
Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. Eg when QEMU goes to issue FLR it should
do the map/unmap to iommufd.
Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.
The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().
Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().
I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.
The latest series for interconnect negotation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
And the discussion for design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
====================
Based on a shared branch with vfio.
* iommufd_dmabuf:
iommufd/selftest: Add some tests for the dmabuf flow
iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
iommufd: Have iopt_map_file_pages convert the fd to a file
iommufd: Have pfn_reader process DMABUF iopt_pages
iommufd: Allow MMIO pages in a batch
iommufd: Allow a DMABUF to be revoked
iommufd: Do not map/unmap revoked DMABUFs
iommufd: Add DMABUF to iopt_pages
vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
vfio/nvgrace: Support get_dmabuf_phys
vfio/pci: Add dma-buf export support for MMIO regions
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Share the core device pointer while invoking feature functions
vfio: Export vfio device get and put registration helpers
dma-buf: provide phys_vec to scatter-gather mapping routine
PCI/P2PDMA: Document DMABUF model
PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Separate the mmap() support from the core logic
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmUnIACgkQOlYIJqCj
N/0+Zg/+NOklTP01cgj49RDkkYyk43UQBf3VdwL72Yzq4KSHdZMx4Zggj/GbmtI6
tioIbhprSPvEw880KkSzJTh+BeJHUo0sZT23hG65AeJdWGp/nMo95SHRiW6uB5mX
g/R6RLsCUyu97rzuL+6MI9P46hBfXHCMvUh55M08UL3WAV3kX3wTh+9JSmW4e+Wc
d9C+0vtZ2GcODGaD++GVlkVt+cSqPHKhEq9KTrbgIFY5oQKuPpuxaB0HshfBQ79L
LPJozoA9qJ6fXUFWAt0QGyAXdYMg8Q1sG5PdnLwKvWUBXxYB2JMIbwFIhQqFr/NE
lZ9bxrtIclQaNrtmpYCufzyNcWl93YserfS3CJY0OtnOW/2Tcic1WEdlw0cUTgpJ
INKjwzu4CleevxpVsymHz7grZy/3qHcA8c0liX88GTk/o9r/wtK9Hgh6FpVo/6Gx
DEbyZEw5vNg1mEGFIXtHxHbuyVqoR8/HLsPhMaMSL1R6A4fqVNAFjjxnydSKsUOr
5gbqZp4R4iRan48iYrsgcb4gQ5bAXzedCEBzVnX7Gxp3wCW56YHqTMBJumBV2Dji
jdYUKUgV0TCP2xgGaA5HRrTicxHXsr/wS2IojgoqCouvtHhrsDS036DhL2Wm4nf+
f1I9nAaLdDxFhfB45I2CWjVj4KWkEZMUnr5vTbtWuwiZmVo3vMM=
=vqnp
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.19:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easer to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
- Misc cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmSgcACgkQOlYIJqCj
N/0MBQ/+KzI/6q6AR2m9jYCbz8APcJpmAEJ4Ma0+Q4XrJfkymAmt9P4M46bRZl5G
Aznaqq9vak1weIzlaFsOzYqEWvSN54P/EfZKkqh0kCDKXGzl1HkSCC7FeThyqZz0
+uuWUiRtWI/dyNFEpIXB/G06DwqhMIKAlk421Zt84iBI/wz2oZeAgWoFCjPWca4a
/L/ClmpzM6LnP/Hg2DyoZtfwAXIy65pb9h0IhKbvGcgSrS4sesZPiSV20KKvKVSp
4+WVLHuNbjk9vWKkmV8IZH+BXAO2J2+y2JYckbx4DvUKQcauXUJjFjp8+wZ4gMrC
SK3SuWTTc3oe7fhaZZ98KY3BO9dq68iCbxpmdkhYrEbNqcDbQA5GMhIUZ2TgqDX7
KQ58s9zyHZvsX4cnL3XY9igsdl6tvqnFqVhGyBaxUrWNJDqAs3NPJr8Td3cnrMKg
MRB2AaJN8xX6DK7JAxKQ4LC0Nb7w3hxF0t0XL2XzBw+6nsu67NjrM7EflIhG+mHJ
YWhrnNh83Yut95cym5mpUU0kFqfHNB5SxRGGokDcQ1NAXBwwE2Tq+YopR9OKRfiZ
52/hkC195D7Xaeo9raf6iKN+YfiZ0uj3TvxuxIHs/EIqmmjhsc8VSwkfFW5OIeYf
Tulmh/ohkq1675By/dapyHvW97PSgd0IqbDzjDrlxmbZbuuS1F8=
=j3On
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-gmem-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM guest_memfd changes for 6.19:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easer to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
- Misc cleanups.
so_peek_off.c is reported to be flaky on NIPA:
# # so_peek_off.c:149:two_chunks_overlap_blocking:Expected -1 (-1) != bytes (-1)
# # two_chunks_overlap_blocking: Test terminated by assertion
# # FAIL so_peek_off.stream.two_chunks_overlap_blocking
The test fork()s a child process to send() data after 1ms to
wake up the parent process being blocked (up to 3ms) on recv().
But, from the log, the parent woke up after 3ms timeout, so it
could be too short when the host is overloaded.
Let's extend it to 5s.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251124070722.1e828c53@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Somehow AF_UNIX tests have reused ../.gitignore,
but now NIPA warns about it.
Let's create .gitignore under af_unix/.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sk->sk_timer has been used for TCP keepalives.
Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.
Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.
Added space is reclaimed in the following patch.
This includes changes to MPTCP, which was also using sk_timer.
Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow users to configure the critical section delay for both task/normal
and NMI contexts, and set to 20ms and 10ms as before by default.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add statistics per-CPU broken down by context and various timing windows
for the time taken to acquire an rqspinlock. Cases where all
acquisitions fit into the 10ms window are skipped from printing,
otherwise the full breakdown is displayed when printing the summary.
This allows capturing precisely the number of times outlier attempts
happened for a given lock in a given context.
A critical detail is that time is captured regardless of success or
failure, which is important to capture events for failed but long
waiting timeout attempts.
Output:
[ 64.279459] rqspinlock acquisition latency histogram (ms):
[ 64.279472] cpu1: total 528426 (normal 526559, nmi 1867)
[ 64.279477] 0-1ms: total 524697 (normal 524697, nmi 0)
[ 64.279480] 2-2ms: total 3652 (normal 1811, nmi 1841)
[ 64.279482] 3-3ms: total 66 (normal 47, nmi 19)
[ 64.279485] 4-4ms: total 2 (normal 1, nmi 1)
[ 64.279487] 5-5ms: total 1 (normal 1, nmi 0)
[ 64.279489] 6-6ms: total 1 (normal 0, nmi 1)
[ 64.279490] 101-150ms: total 1 (normal 0, nmi 1)
[ 64.279492] >= 251ms: total 6 (normal 2, nmi 4)
...
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is
calculated nicely by adding to the mode enum. Enables running single CPU
AA tests.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the calling to bpf_get_numa_node_id()
should be considered as composition of the baseline. So, let's call it in
trigger_count(). Meanwhile, rename trigger_count() to
trigger_kernel_count() to make it easier understand.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
Basic tests of establishing a dmabuf and revoking it. The selftest kernel
side provides a basic small dmabuf for this testing.
Link: https://patch.msgid.link/r/9-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
netdev CI reserves SKIP in selftests for cases which can't be executed
due to setup issues, like missing or old commands. Tests which are
expected to fail must use XFAIL.
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251123021601.158709-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit ensures that the required log level is set at the start of
the test iteration.
Part of the cleanup performed at the end of each test iteration resets
the log level (do_cleanup in lib_netcons.sh) to the values defined at the
time test script started. This may cause further test iterations to fail
if the default values are not sufficient.
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251121-netcons-basic-loglevel-v1-1-577f8586159c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increase the receiver timeout. When running between machines
in different geographic regions the test needs more than
a second to SSH across and send the frames.
The bkg() command that runs the receiver defaults to 5 sec timeout,
so using 4 sec sounds like a reasonable value for the receiver itself.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the simple modulo math with the real indirection table
read from the device. This makes the tests pass for mlx5 and
bnxt NICs.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have YNL support for RSS accessing the RSS info from
C is very easy. Instead of passing the RSS key from Python do it
directly in the C code.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that the NIC under test is configured for pure Toeplitz
hashing, and no input key transform (no symmetric hashing).
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Looks like the liburing is not updated by distros very aggressively.
Presumably because a lot of packages depend on it. I just updated
to Fedora 43 and it's still on liburing 2.9. The test is 9mo old,
at this stage I think this warrants handling the build failure
more gracefully.
Detect if iouring is recent enough and if not print a warning
and exclude the C prog from build. The Python test will just
fail since the binary won't exist. But it removes the major
annoyance of having to update liburing from sources when
developing other tests.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251121040259.3647749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 31158ad02d ("rqspinlock: Add deadlock detection
and recovery") the updated path on re-entrancy now reports deadlock
via -EDEADLK instead of the previous -EBUSY.
Also, the way reentrancy was exercised (via fentry/lookup_elem_raw)
has been fragile because lookup_elem_raw may be inlined
(find_kernel_btf_id() will return -ESRCH).
To fix this fentry is attached to bpf_obj_free_fields() instead of
lookup_elem_raw() and:
- The htab map is made to use a BTF-described struct val with a
struct bpf_timer so that check_and_free_fields() reliably calls
bpf_obj_free_fields() on element replacement.
- The selftest is updated to do two updates to the same key (insert +
replace) in prog_test.
- The selftest is updated to align with expected errno with the
kernel’s current behavior.
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
vm_flags_reset() is not available in the userland VMA tests, so add a stub
which const-casts vma->vm_flags and avoids the upcoming removal of the
vma->__vm_flags field.
Link: https://lkml.kernel.org/r/4aff8bf7-d367-4ba3-90ad-13eef7a063fa@lucifer.local
Fixes: c5c67c1de357 ("tools/testing/vma: eliminate dependency on vma->__vm_flags")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15.
This patchset adds support for Svrsw60t59b [1] extension which is ratified
now, also add soft dirty and userfaultfd write protect tracking for
RISC-V.
The patches 1 and 2 add macros to allow architectures to define their own
checks if the soft-dirty / uffd_wp PTE bits are available, in other words
for RISC-V, the Svrsw60t59b extension is supported on which device the
kernel is running. Also patch1-2 are removing "ifdef
CONFIG_MEM_SOFT_DIRTY" "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef
CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which if not overridden by
the architecture, no change in behavior is expected.
This patchset has been tested with kselftest mm suite in which soft-dirty,
madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass, and
no regressions are observed in any of the other tests.
This patch (of 6):
Some platforms can customize the PTE PMD entry soft-dirty bit making it
unavailable even if the architecture provides the resource.
Add an API which architectures can define their specific implementations
to detect if soft-dirty bit is available on which device the kernel is
running.
This patch is removing "ifdef CONFIG_MEM_SOFT_DIRTY" in favor of
pgtable_supports_soft_dirty() checks that defaults to
IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), if not overridden by the architecture,
no change in behavior is expected.
We make sure to never set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(),
so we will never run into VM_SOFTDIRTY checks.
[lorenzo.stoakes@oracle.com: fix VMA selftests]
Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local
Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn
Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn
Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1]
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The 'FOLL_WRITE' of the copied source is located in mm_types.h of mm, not
mm.h, so fix it.
Link: https://lkml.kernel.org/r/20251117154012.197499-2-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 0f20bba168 ("mm/gup: explicitly define and check internal GUP
flags, disallow FOLL_TOUCH") marked FOLL_TOUCH as a GUP-internal flag.
This causes a warning to fire when running gup_test, for example:
$ ./gup_test -L -r 100 -z
dmesg:
WARNING: CPU: 1 PID: 117 at mm/gup.c:2512 is_valid_gup_args+0x66/0x8c
Therefore, remove the "FOLL_TOUCH" test code from gup_test.c.
Link: https://lkml.kernel.org/r/20251117154012.197499-1-peng8420.li@gmail.com
Signed-off-by: Peng Li <peng8420.li@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add new benchmark style support to test transfer bandwidth for zone device
memory operations.
Link: https://lkml.kernel.org/r/20251001065707.920170-16-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add partial unmap test case which munmaps memory while in the device.
Add tests exercising mremap on faulted-in memory (CPU and GPU) at various
offsets and verify correctness.
Update anon_write_child to read device memory after fork verifying this
flow works in the kernel.
Both THP and non-THP cases are updated.
Link: https://lkml.kernel.org/r/20251001065707.920170-15-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add new tests for migrating anon THP pages, including anon_huge,
anon_huge_zero and error cases involving forced splitting of pages during
migration.
Link: https://lkml.kernel.org/r/20251001065707.920170-14-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a new test case that triggers the HW deactivation emulation path
when trapping ICV_DIR_EL1. This is obviously tied to the way KVM
works now, but the test follows the expected architectural behaviour.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-50-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Add a new test case that inject a Group-0 interrupt together
with a bunch of Group-1 interrupts, Ack/EOI the G1 interrupts,
and only then enable G0, expecting to get the G0 interrupt.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-49-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Add a new test case that makes an interrupt pending on a vcpu,
activates it, do the priority drop, and then get *another* vcpu
to do the deactivation.
Special care is taken not to trigger an exit in the process, so
that we are sure that the active interrupt is in an LR. Joy.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-48-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
When EOImode==1, perform the deactivation in the order of activation,
just to make things a bit worse for KVM. Yes, I'm nasty.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-47-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Good news: our GIC emulation is not completely broken, and we can
activate as many interrupts as we want.
Bump the test to cover all the SGIs, all the allowed PPIs, and
31 SPIs. Yes, 31, because we have 31 available priorities, and the
test is not happy with having two interrupts with the same priority.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-46-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
The PPI injection API is clear that you can't inject the timer PPIs
from userspace, since they are controlled by the timers themselves.
Add an exclusion list for this purpose.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-45-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
The architecture is pretty clear that changing the configuration of
an enable interrupt is not OK. It doesn't really matter here, but
doing the right thing is not more expensive.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-44-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
No, 0 is not a spurious INTID. Never been, never was.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-43-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Make sure G0 is disabled at the point of initialising the GIC.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Being able to set the group of an interrupt is pretty useful.
Add such a helper.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
Commit 4dfd4bba85 ("selftests/mm/uffd: refactor non-composite global
vars into struct") moved some of the operations previously implemented in
uffd_setup_environment() earlier in the main test loop.
The calculation of nr_pages, which involves a division by page_size, now
occurs before checking that default_huge_page_size() returns a non-zero
This leads to a division-by-zero error on systems with !CONFIG_HUGETLB.
Fix this by relocating the non-zero page_size check before the nr_pages
calculation, as it was originally implemented.
Link: https://lkml.kernel.org/r/20251113034623.3127012-1-cmllamas@google.com
Fixes: 4dfd4bba85 ("selftests/mm/uffd: refactor non-composite global vars into struct")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently selftests require xxd with the "-n <name>" option
which allows the user to specify a name not derived from
the input object path. Instead of relying on this newer
feature, older xxd can be used if we link our desired name
("test_progs_verification_cert") to the input object.
Many distros ship xxd in vim-common package and do not have
the latest xxd with -n support.
Fixes: b720903e2b ("selftests/bpf: Enable signature verification for some lskel tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The KVM RISC-V allows SBI MPXY extensions for Guest/VM so add
it to the get-reg-list test.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20251017155925.361560-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
If the linker emits warnings these should abort the build.
Otherwise they will be swallowed by run-tests.sh and not shown.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
LLVM 21 switched to -mcmodel=medium for LoongArch64 compilations.
This code model uses R_LARCH_ECALL36 relocations which might not be
supported by GNU ld which to nolibc testsuite uses by default.
ld will not resolve the relocation and all function calls will end up
as busy loops.
Use lld instead.
We can not switch to lld for all LLVM builds, as it does not support all
necessary architectures.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
As verifier now supports nested rcu critical sections, add new test
cases to make sure unbalanced usage of rcu_read_lock()/unlock() is
rejected.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, nested rcu critical sections are rejected by the verifier and
rcu_lock state is managed by a boolean variable. Add support for nested
rcu critical sections by make active_rcu_locks a counter similar to
active_preempt_locks. bpf_rcu_read_lock() increments this counter and
bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition
happens when active_rcu_locks drops to 0.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A new test is added: caller_stack_write_tail_call tests that the live
stack is correctly tracked for a tail call.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Three tests are added:
- invalidate_pkt_pointers_by_tail_call checks that one can use the
packet pointer after a tail call. This was originally possible
and also poses not problems, but was made impossible by 1a4607ffba.
- invalidate_pkt_pointers_by_static_tail_call tests a corner case
found by Eduard Zingerman during the discussion of the original fix,
which was broken in that fix.
- subprog_result_tail_call tests that precision propagation works
correctly across tail calls. This did not work before.
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
commit 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
changed digest of prog_tag to SHA256 but forgot to update tests
correspondingly. Fix it.
Fixes: 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, test_perf_branches_no_hw() relies on the busy loop within
test_perf_branches_common() being slow enough to allow at least one
perf event sample tick to occur before starting to tear down the
backing perf event BPF program. With a relatively small fixed
iteration count of 1,000,000, this is not guaranteed on modern fast
CPUs, resulting in the test run to subsequently fail with the
following:
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:PASS:output not valid 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
test_perf_branches_no_hw:PASS:perf_event_open 0 nsec
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_bad_sample:FAIL:output not valid no valid sample from prog
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1
million increments of a volatile integer can take significantly less
than 1 millisecond. If the spin loop and detachment of the perf event
BPF program elapses before the first 1 ms sampling interval elapses,
the perf event will never end up firing. Fix this by bumping the loop
iteration counter a little within test_perf_branches_common(), along
with ensuring adding another loop termination condition which is
directly influenced by the backing perf event BPF program
executing. Notably, a concious decision was made to not adjust the
sample_freq value as that is just not a reliable way to go about
fixing the problem. It effectively still leaves the race window open.
Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Gracefully skip the test_perf_branches_hw subtest on platforms that
do not support LBR or require specialized perf event attributes
to enable branch sampling.
For example, AMD's Milan (Zen 3) supports BRS rather than traditional
LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW,
attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the
generic setup used within this test. Notably, it also probably doesn't
hold much value to special case perf event configurations for selected
micro architectures.
Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
arm64 JIT now supports gotox instruction and jumptables, so run tests in
verifier_gotox.c for arm64.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The select_reuseport selftest uses a custom sa46 union to represent
IPv4 and IPv6 addresses. This custom wrapper requires extra manual
handling for address family and field extraction.
Replace sa46 with sockaddr_storage and update the helper functions to
operate on native socket structures. This simplifies the code and
removes unnecessary custom address-handling logic. No functional
changes intended.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
The cls_redirect test uses a custom addr_port/tuple wrapper to represent
IPv4/IPv6 addresses and ports. This custom wrapper requires extra
conversion logic and specific helpers such as fill_addr_port(), which
are no longer necessary when using standard socket address structures.
This commit replaces addr_port/tuple with the standard sockaddr_storage
so test handles address families and ports using native socket types.
It removes the custom helper, eliminates redundant casts, and simplifies
the setup helpers without functional changes. set_up_conn() and
build_input() now take src/dst sockaddr_storage directly.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
Call paths leading to __virt_pg_map() are currently:
(a) virt_pg_map() -> virt_arch_pg_map() -> __virt_pg_map()
(b) virt_map_level() -> __virt_pg_map()
For (a), calls to virt_pg_map() from kvm_util.c make sure they update
vm->vpages_mapped, but other callers do not. Move the sparsebit_set()
call into virt_pg_map() to make sure all callers are captured.
For (b), call sparsebit_set_num() from virt_map_level().
It's tempting to have a single the call inside __virt_pg_map(), however:
- The call path in (a) is not x86-specific, while (b) is. Moving the
call into __virt_pg_map() would require doing something similar for
other archs implementing virt_pg_map().
- Future changes will reusue __virt_pg_map() for nested PTEs, which should
not update vm->vpages_mapped, i.e. a triple underscore version that does
not update vm->vpages_mapped would need to be provided.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-12-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace __virt_pg_map() calls in tests by high-level equivalent
functions, removing some loops in the process.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-11-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation
exceptions to user space. This also includes the 0x0000 instructions
managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants
to emulate instructions which do not (yet) have an opcode.
While we're at it refine the documentation for
KVM_CAP_S390_USER_INSTR0.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Test querying default values and resetting to default values for
netdevsim devlink params.
This should cover the basic paths of interest: driverinit and
non-driverinit cmodes, as well as bool and non-bool value
type. Default param values of type bool are encoded with u8 netlink
type as opposed to flag type, so that userspace can distinguish
"not-present" from false.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251119025038.651131-7-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increase MAX_USERDATA_ITEMS from 16 to 256 entries now that the userdata
buffer is allocated dynamically.
The previous limit of 16 was necessary because the buffer was statically
allocated for all targets. With dynamic allocation, we can support more
entries without wasting memory on targets that don't use userdata.
This allows users to attach more metadata to their netconsole messages,
which is useful for complex debugging and logging scenarios.
Also update the testcase accordingly.
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251119-netconsole_dynamic_extradata-v3-4-497ac3191707@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gro.sh and toeplitz.sh used to source in one of two setup scripts
depending on whether the test was expected to be run against
veth or a real device. veth testing is replaced by netdevsim
and existing "remote endpoint" support in our Python tests.
Add a script which sets up loopback mode.
The usage is a little bit more complicated than running
the scripts used to be. Testing used to work like this:
./../gro.sh -i eth0 ...
now the "setup script" has to be run explicitly:
NETIF=eth0 ./../ksft_setup_loopback.sh ./../gro.sh
But the functionality itself is retained.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rewrite the existing toeplitz.sh test in Python. The conversion
is a lot less exact than the GRO one. We use Netlink APIs to
get the device RSS and IRQ information. We expect that the device
has neither RPS nor RFS configured, and set RPS up as part of
the test.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rewrite the existing gro.sh test in Python. The conversion
not exact, the changes are related to integrating the test
with our "remote endpoint" paradigm. The test now reads
the IP addresses from the user config. It resolves the MAC
address (including running over Layer 3 networks).
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We're already saving the info about the local dev in env.dev
for the tests, save remote dev as well. This is more symmetric,
env generally provides the same info for local and remote end.
While at it make sure that we reliably get the detailed info
about the local dev. nsim used to read the dev info without -d.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's a common synchronization problem when a script (Python test)
uses a C program to set up some state (usually start a receiving
process for traffic). The script needs to know when the process
has fully initialized. The inverse of the problem exists for shutting
the process down - we need a reliable way to tell the process to exit.
We added helpers to do this safely in
commit 7147713799 ("selftests: drv-net: add a way to wait for a local process")
unfortunately the two operations (wait for init, and shutdown) are
controlled by a single parameter (ksft_wait). Add support for using
ksft_ready without using the second fd for exit.
This is useful for programs which wait for a specific number of packets
to rx so exit_wait is a good match, but we still need to wait for init.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251120021024.2944527-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GRO test can run on a real device or a veth.
The Toeplitz hash test can only run on a real device.
Move them from net/ to drivers/net/ and drivers/net/hw/ respectively.
There are two scripts which set up the environment for these tests
setup_loopback.sh and setup_veth.sh. Move those scripts to net/lib.
The paths to the setup files are a little ugly but they will be
deleted shortly.
toeplitz_client.sh is not a test in itself, but rather a helper
to send traffic, so add it to TEST_FILES rather than TEST_PROGS.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use just-added ksft variants for XDP qstat tests.
While at it correct the number of packets, we're sending
1000 packets now.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's a lot of cases where we try to re-run the same code with
different parameters. We currently need to either use a generator
method or create a "main" case implementation which then gets called
by trivial case functions:
def _test(x, y, z):
...
def case_int():
_test(1, 2, 3)
def case_str():
_test('a', 'b', 'c')
Add support for variants, similar to kselftests_harness.h and
a lot of other frameworks. Variants can be added as decorator
to test functions:
@ksft_variants([(1, 2, 3), ('a', 'b', 'c')])
def case(x, y, z):
...
ksft_run() will auto-generate case names:
case.1_2_3
case.a_b_c
Because the names may not always be pretty (and to avoid forcing
classes to implement case-friendly __str__()) add a wrapper class
KsftNamedVariant which lets the user specify the name for the variant.
Note that ksft_run's args are still supported. ksft_run splices args
and variant params together.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for adding test variants move the test case
collection logic to a dedicated function. New helper returns
(function, args, name, )
tuples. The main test loop can simply run them, not much
logic or discernment needed.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251120021024.2944527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We're about to add more features here and finding new issues with old
ones in place is hard. Address ruff checks:
- bare exceptions
- f-string with no params
- unused import
We need to use BaseException when handling defer(), as Petr points out.
This retains the old behavior of ignoring SIGTERM while running cleanups.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251120021024.2944527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a selftest that verifies KVM's ability to save and restore
nested state when the L1 guest is using 5-level paging and the L2
guest is using 4-level paging. Specifically, canonicality tests of
the VMCS12 host-state fields should accept 57-bit virtual addresses.
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-5-jmattson@google.com
[sean: rename to vmx_nested_la57_state_test to prep nested_<test> namespace]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use 57-bit addresses with 5-level paging on hardware that supports
LA57. Continue to use 48-bit addresses with 4-level paging on hardware
that doesn't support LA57.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-4-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Walk the guest page tables via a loop when searching for a PTE,
instead of using unique variables for each level of the page tables.
This simplifies the code and makes it easier to support 5-level paging
in the future.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Walk the guest page tables via a loop when creating new mappings,
instead of using unique variables for each level of the page tables.
This simplifies the code and makes it easier to support 5-level paging
in the future.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-8-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
vmx_tsc_adjust_test currently verifies that a nested VMLAUNCH fails with
an invalid CR3. This is irrelevant to TSC scaling, move it to a
standalone test.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-6-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-4-yosry.ahmed@linux.dev
[sean: rename to "nested_close_kvm_test" to provide nested_* sorting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
For pen and stylus, the ABS_Z event reports ABS_DISTANCE values
in the hid generic kernel driver. This test is to make sure that
the assignment is properly done for all pen and stylus tools.
Same as tilt, distance is an optional event.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Tatsunosuke Tobit <tatsunosuke.tobita@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Assert that we correctly merge VMAs containing VM_SOFTDIRTY flags now that
we correctly handle these as sticky.
In order to do so, we have to account for the fact the pagemap interface
checks soft dirty PTEs and additionally that newly merged VMAs are marked
VM_SOFTDIRTY.
We do this by using use unfaulted anon VMAs, establishing one and clearing
references on that one, before establishing another and merging the two
before checking that soft-dirty is propagated as expected.
We check that this functions correctly with mremap() and mprotect() as
sample cases, because VMA merge of adjacent newly mapped VMAs will
automatically be made soft-dirty due to existing logic which does so.
We are therefore exercising other means of merging VMAs.
Link: https://lkml.kernel.org/r/d5a0f735783fb4f30a604f570ede02ccc5e29be9.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2.
Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().
However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.
Now we have the concept of making VMA flags 'sticky', that is that they
both don't prevent merge and, importantly, are propagated to merged VMAs,
this seems a sensible alternative to the existing special-casing of
VM_SOFTDIRTY.
We additionally add a self-test that demonstrates that this logic behaves
as expected.
This patch (of 2):
Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().
However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.
This is because currently we simply ignore VM_SOFTDIRTY for the purposes
of merge, so one VMA may possess the flag and another not, and whichever
happens to be the target VMA will be the one upon which the merge is
performed which may or may not have VM_SOFTDIRTY set.
Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one
which solves this issue.
Additionally update VMA userland tests to propagate changes.
[akpm@linux-foundation.org: update comments, per Lorenzo]
Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local
Link: https://lkml.kernel.org/r/cover.1763399675.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Andrey Vagin <avagin@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For each test case, sysfs.py makes changes to DAMON, dumps DAMON internal
status and asserts the expectation is met. The dumping part should be the
same for all cases, so it is duplicated for each test case. Which means
it is easy to make mistakes. Actually a few of those duplicates are not
turning DAMON off in case of the dumping failure. It makes following
selftests that need to turn DAMON on fails with -EBUSY. Merge the status
dumping into commitment assertion with proper dumping failure handling, to
deduplicate and avoid the unnecessary following tests failures.
Link: https://lkml.kernel.org/r/20251112154114.66053-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMOS filters that are handled by the ops layer are linked to
damos->ops_filters. Owing to the ops_ prefix on the name, it is easy to
understand it is for ops layer handled filters. The other types of
filters, which are handled by the core layer, are linked to
damos->filters. Because of the name, it is easy to confuse the list is
there for not only core layer handled ones but all filters. Avoid such
confusions by renaming the field to core_filters.
Link: https://lkml.kernel.org/r/20251112154114.66053-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current shmem_allocate_area() implementation uses a hardcoded virtual
base address (BASE_PMD_ADDR) as a hint for mmap() when creating
shmem-backed test areas. This approach is fragile and may fail on systems
with ASLR or different virtual memory layouts, where the chosen address is
unavailable.
Replace the static base address with a dynamically reserved address range
obtained via mmap(NULL, ..., PROT_NONE). The memfd-backed areas and their
alias are then mapped into that reserved region using MAP_FIXED,
preserving the original layout and aliasing semantics while avoiding
collisions with unrelated mappings.
This change improves robustness and portability of the test suite without
altering its behavior or coverage.
[mehdi.benhadjkhelifa@gmail.com: make cleanup code more clear, per Mike]
Link: https://lkml.kernel.org/r/20251113142050.108638-1-mehdi.benhadjkhelifa@gmail.com
Link: https://lkml.kernel.org/r/20251111205739.420009-1-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "vma_start_write_killable"", v2.
When we added the VMA lock, we made a major oversight in not adding a
killable variant. That can run us into trouble where a thread takes the
VMA lock for read (eg handling a page fault) and then goes out to lunch
for an hour (eg doing reclaim). Another thread tries to modify the VMA,
taking the mmap_lock for write, then attempts to lock the VMA for write.
That blocks on the first thread, and ensures that every other page fault
now tries to take the mmap_lock for read. Because everything's in an
uninterruptible sleep, we can't kill the task, which makes me angry.
This patchset just adds vma_start_write_killable() and converts one caller
to use it. Most users are somewhat tricky to convert, so expect follow-up
individual patches per call-site which need careful analysis to make sure
we've done proper cleanup.
This patch (of 2):
The vma can be held read-locked for a substantial period of time, eg if
memory allocation needs to go into reclaim. It's useful to be able to
send fatal signals to threads which are waiting for the write lock.
Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Assert that we observe guard regions appearing in /proc/$pid/smaps as
expected, and when split/merge is performed too (with expected sticky
behaviour).
Also add handling for file systems which don't sanely handle mmap() VMA
merging so we don't incorrectly encounter a test failure in this
situation.
Link: https://lkml.kernel.org/r/059e62b8c67e55e6d849878206a95ea1d7c1e885.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
To ensure the retract_page_tables() logic functions correctly with the
introduction of VM_MAYBE_GUARD, add a test to assert that madvise collapse
fails when guard regions are established in the collapsed range in all
cases.
Unfortunately we cannot differentiate between e.g.
CONFIG_READ_ONLY_THP_FOR_FS not being set vs. a file-backed VMA having
collapse correctly disallowed, so in each instance we will get an assert
pass here.
We add an additional check to see whether guard regions are preserved
across collapse in case of a bug causing the collapse to succeed, which
will give us more data to debug with should this occur in future.
Link: https://lkml.kernel.org/r/0748beeb864525b8ddfa51adad7128dd32eb3ac4.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Modify existing merge new/existing userland VMA tests to assert that
sticky VMA flags behave as expected.
We do so by generating every possible permutation of VMAs being
manipulated being sticky/not sticky and asserting that VMA flags with this
property retain are retained upon merge.
Link: https://lkml.kernel.org/r/5e2c7244485867befd052f8afc8188be6a4be670.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gather all the VMA flags whose presence implies that page tables must be
copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this
rather than specifying individual flags in vma_needs_copy().
We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies
that there may be metadata contained in the page tables (that is - guard
markers) which would will not and cannot be propagated upon fork.
This was already being done manually previously in vma_needs_copy(), but
this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and
VM_UFFD_WP all of which imply the same.
Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too
- because equally a flag being VM_STICKY indicates that the VMA contains
metadat that is not propagated by being faulted in - i.e. that the VMA
metadata does not fully describe the VMA alone, and thus we must propagate
whatever metadata there is on a fork.
However, for maximum flexibility, we do not make this necessarily the case
here.
Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is useful to be able to designate that certain flags are 'sticky', that
is, if two VMAs are merged one with a flag of this nature and one without,
the merged VMA sets this flag.
As a result we ignore these flags for the purposes of determining VMA flag
differences between VMAs being considered for merge.
This patch therefore updates the VMA merge logic to perform this action,
with flags possessing this property being described in the VM_STICKY
bitmap.
Those flags which ought to be ignored for the purposes of VMA merge are
described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also
updated to use.
As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
already had this behaviour, alongside VM_STICKY as sticky flags by
implication must not disallow merge.
Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its
own right, but this change is out of scope for this series.
The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result
of this change, once the VMA flag is set upon guard region installation,
VMAs with guard ranges will now not have their merge behaviour impacted as
a result and can be freely merged with other VMAs without VM_MAYBE_GUARD
set.
Also update the comments for vma_modify_flags() to directly reference
sticky flags now we have established the concept.
We also update the VMA userland tests to account for the changes.
Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The vma_modify_*() family of functions each either perform splits, a merge
or no changes at all in preparation for the requested modification to
occur.
When doing so for a VMA flags change, we currently don't account for any
flags which may remain (for instance, VM_SOFTDIRTY) despite the requested
change in the case that a merge succeeded.
This is made more important by subsequent patches which will introduce the
concept of sticky VMA flags which rely on this behaviour.
This patch fixes this by passing the VMA flags parameter as a pointer and
updating it accordingly on merge and updating callers to accommodate for
this.
Additionally, while we are here, we add kdocs for each of the
vma_modify_*() functions, as the fact that the requested modification is
not performed is confusing so it is useful to make this abundantly clear.
We also update the VMA userland tests to account for this change.
Link: https://lkml.kernel.org/r/23b5b549b0eaefb2922625626e58c2a352f3e93c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.
Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.
This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.
This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.
The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.
It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.
To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:
a. if set in one VMA and not in another still permit those VMAs to be
merged (if otherwise compatible).
b. When they are merged, the resultant VMA must have the flag set.
The VMA logic is updated to propagate these flags correctly.
Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.
We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.
Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.
The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.
This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.
Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.
This patch (of 9):
Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).
Since commit 8e2f2aeb8b ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.
This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.
This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).
We utilise 0x800 for this flag which makes it available to 32-bit
architectures also, a flag that was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5 ("mm: remove VM_DENYWRITE") and hasn't
bee reused yet.
We also update the smaps logic and documentation to identify these VMAs.
Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.
We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.
Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab
cache names parsed out from /proc/slabinfo against those stored within
struct kmem_cache_result. The current problem is that the slab cache
name within struct kmem_cache_result is stored within a bounded
fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name
parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ()
can certainly lead to test failures, particularly when dealing with
slab cache names that are longer than SLAB_NAME_MAX(32)
bytes. Notably, kmem_cache_create() allows callers to create slab
caches with somewhat arbitrarily sized names via its __name identifier
argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place
now can certainly happen.
Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only
checking up to sizeof(struct kmem_cache_result.name) - 1 using
ASSERT_STRNEQ().
Fixes: a496d0cdc8 ("selftests/bpf: Add a test for kmem_cache_iter")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
Cross-merge networking fixes after downstream PR (net-6.18-rc7).
No conflicts, adjacent changes:
tools/testing/selftests/net/af_unix/Makefile
e1bb28bf13 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
45a1cd8346 ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previous releases - regressions:
- prevent NULL deref in generic_hwtstamp_ioctl_lower(),
newer APIs don't populate all the pointers in the request
- phylink: add missing supported link modes for the fixed-link
- mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
Previous releases - always broken:
- openvswitch: remove never-working support for setting NSH fields
- xfrm: number of fixes for error paths of xfrm_state creation/
modification/deletion
- xfrm: fixes for offload
- fix the determination of the protocol of the inner packet
- don't push locally generated packets directly to L2 tunnel
mode offloading, they still need processing from the standard
xfrm path
- mptcp: fix a couple of corner cases in fallback and fastclose
handling
- wifi: rtw89: hw_scan: prevent connections from getting stuck,
work around apparent bug in FW by tweaking messages we send
- af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
- veth: more robust handing of race to avoid txq getting stuck
- eth: ps3_gelic_net: handle skb allocation failures
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkfQv8ACgkQMUZtbf5S
IruptQ//aa3wFISt/xfCxaygNnQAL1gxWbKg15cLnv88S5r5wrh+Mf/r+qHkT0IG
qaxPToztcTKNGzRqMI/7x/OEXjCdfEmJtNcb5JfYRxGkYVcje81bhHrxZN+nLLcj
jx1xD7rXCjfFNDVFSl+OzQkoTZtfGphX+JgCTKLPZ8t4qR/UhU1IwYgpoaiK0uCT
gitXJPHrDGJLHud0ilaXbn+q0ZH9ZU08QSRhsaAq7C6YPafUseEr2fuQYgFr1kkp
mtzFm84bEnRSneQ5+noVzoc5llbu3vf3Wd9Y5tr5sBaBjh5OpH/kK0kuPy6Lzhvd
8y05jl8kyASMtRBoTvmpYOdi3xUNbge5AwJN3FIo4KPCmyr1bQoNoQVIivUpoiGz
Zvv7QcceP7E5057CRuhi06krld/QaScHkmhUododbqJKmL8NWaSoXLRO7oYjB4kU
1MuAwOJWVpsqnTUEbMw7dHJPZFTqRkLYv6oAmIqb/AWzTcqg9OXjLEc+OzUqWEl6
mHv/SonSjAM81m3GxCxtNtH2ErKwdMIgI/ado39+KvksKTH1bhN9ZpmuyyktzpCj
PBn9FVdd7zyct92u5xXtsKVkDWKGyD419Z2lFHC0FaAsNQmPrSqzi+U50pAtie5M
JWABm8hsSdia4dTDrdldMgM50dzuBd5nSceY7XCqPA8nZ5+Af/8=
=89Xl
-----END PGP SIGNATURE-----
Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from IPsec and wireless.
Previous releases - regressions:
- prevent NULL deref in generic_hwtstamp_ioctl_lower(),
newer APIs don't populate all the pointers in the request
- phylink: add missing supported link modes for the fixed-link
- mptcp: fix false positive warning in mptcp_pm_nl_rm_addr
Previous releases - always broken:
- openvswitch: remove never-working support for setting NSH fields
- xfrm: number of fixes for error paths of xfrm_state creation/
modification/deletion
- xfrm: fixes for offload
- fix the determination of the protocol of the inner packet
- don't push locally generated packets directly to L2 tunnel
mode offloading, they still need processing from the standard
xfrm path
- mptcp: fix a couple of corner cases in fallback and fastclose
handling
- wifi: rtw89: hw_scan: prevent connections from getting stuck,
work around apparent bug in FW by tweaking messages we send
- af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait
- veth: more robust handing of race to avoid txq getting stuck
- eth: ps3_gelic_net: handle skb allocation failures"
* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
vsock: Ignore signal/timeout on connect() if already established
be2net: pass wrb_params in case of OS2BMC
l2tp: reset skb control buffer on xmit
net: dsa: microchip: lan937x: Fix RGMII delay tuning
selftests: mptcp: add a check for 'add_addr_accepted'
mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
selftests: mptcp: join: userspace: longer timeout
selftests: mptcp: join: endpoints: longer timeout
selftests: mptcp: join: fastclose: remove flaky marks
mptcp: fix duplicate reset on fastclose
mptcp: decouple mptcp fastclose from tcp close
mptcp: do not fallback when OoO is present
mptcp: fix premature close in case of fallback
mptcp: avoid unneeded subflow-level drops
mptcp: fix ack generation for fallback msk
wifi: rtw89: hw_scan: Don't let the operating channel be last
net: phylink: add missing supported link modes for the fixed-link
selftest: af_unix: Add test for SO_PEEK_OFF.
af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
net/mlx5: Clean up only new IRQ glue on request_irq() failure
...
The previous patch fixed an issue with the 'add_addr_accepted' counter.
This was not spot by the test suite.
Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add
signal' test. This should help spotting similar regressions later on.
These counters are crucial for ensuring the MPTCP path manager correctly
handles the subflow creation via 'ADD_ADDR'.
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.
To play it safe, all userspace tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.
The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.
Fixes: 290493078b ("selftests: mptcp: join: userspace: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some endpoints
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.
To play it safe, all endpoints tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.
The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.
Fixes: 6457595db9 ("selftests: mptcp: join: endpoints: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After recent fixes like the parent commit, and "selftests: mptcp:
connect: trunc: read all recv data", the two fastclose subtests no
longer look flaky any more.
It then feels fine to remove these flaky marks, to no longer ignore
these subtests in case of errors.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-7-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the ftrace fprobe is both fgraph and ftrace based implemented,
the selftest needs to be updated. This does not count the actual
number of lines, but just check the differences.
Link: https://lore.kernel.org/r/176295318112.431538.11780280333728368327.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 2867495dea ("tracing: tprobe-events: Register tracepoint when
enable tprobe event") caused regression bug and tprobe did not work.
To prevent similar problems, add a testcase which enables/disables a
tprobe and check the results.
Link: https://lore.kernel.org/r/176252610176.214996.3978515319000806265.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Parsing KTAP is quite an inconvenience, but most of the time the thing
you really want to know is "did anything fail"?
Let's give the user the his information without them needing
to parse anything.
Because of the use of subshells and namespaces, this needs to be
communicated via a file. Just write arbitrary data into the file and
treat non-empty content as a signal that something failed.
In case any user depends on the current behaviour, such as running this
from a script with `set -e` and parsing the result for failures
afterwards, add a flag they can set to get the old behaviour, namely
--no-error-on-fail.
Link: https://lore.kernel.org/r/20251111-b4-ksft-error-on-fail-v3-1-0951a51135f6@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The printf statement attempts to print the DMA direction string using
the syntax 'dir[directions]', which is an invalid array access. The
variable 'dir' is an integer, and 'directions' is a char pointer array.
This incorrect syntax should be 'directions[dir]', using 'dir' as the
index into the 'directions' array. Fix this by correcting the array
access from 'dir[directions]' to 'directions[dir]'.
Link: https://lore.kernel.org/r/20251104025234.2363-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to
map interrupt events to ITT entries and collection IDs to
redistributors, respectively.
We have no guarantee that the ITS will finish handling these mapping
commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the
guest. If LPIs are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.
Fix by adding a SYNC command to the selftests ITS library, then calling
SYNC after ITS mapping to ensure mapping completes before signal_lpi()
writes to GITS_TRANSLATER.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
The selftests GIC library and tests assume that the
GICR_TYPER.Processor_number associated with a given CPU is the same as
the CPU's selftest index.
Since this assumption is not guaranteed by specification, add an assert
in gicv3_cpu_init() that validates this is true.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
Add selftests to cbo.c to verify Zicbop extension behavior, and split
the previous `--sigill` mode into two options so they can be tested
independently.
The test checks:
- That hwprobe correctly reports Zicbop presence and block size.
- That prefetch instructions execute without exception on valid and NULL
addresses when Zicbop is present.
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://patch.msgid.link/20251118162436.15485-3-zihong.plct@isrc.iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add a test case that does some basic verification of the Vector ptrace
interface. This forks a child process then using ptrace to inspect and
manipulate the v31 register of the child.
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251013091318.467864-3-yongxuan.wang@sifive.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
The new test checks that a route that has been promoted from RA-learned
to static does not switch back when a new RA message arrives. In
addition, it checks that the route is owned by RA again when the static
address is removed.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20251115095939.6967-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test covers various cases to verify SO_PEEK_OFF behaviour
for all AF_UNIX socket types.
two_chunks_blocking and two_chunks_overlap_blocking reproduce
the issue mentioned in the previous patch.
Without the patch, the two tests fail:
# RUN so_peek_off.stream.two_chunks_blocking ...
# so_peek_off.c:121:two_chunks_blocking:Expected 'bbbb' == 'aaaabbbb'.
# two_chunks_blocking: Test terminated by assertion
# FAIL so_peek_off.stream.two_chunks_blocking
not ok 3 so_peek_off.stream.two_chunks_blocking
# RUN so_peek_off.stream.two_chunks_overlap_blocking ...
# so_peek_off.c:159:two_chunks_overlap_blocking:Expected 'bbbb' == 'aaaabbbb'.
# two_chunks_overlap_blocking: Test terminated by assertion
# FAIL so_peek_off.stream.two_chunks_overlap_blocking
not ok 5 so_peek_off.stream.two_chunks_overlap_blocking
With the patch, all tests pass:
# PASSED: 15 / 15 tests passed.
# Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251117174740.3684604-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mock_get_event() uses an uninitialized local variable, nr_overflow, to
populate the overflow_err_count field. That results in incorrect
overflow_err_count values in mocked cxl_overflow trace events, such as
this case where the records are reported as 0 and should be non-zero:
[] cxl_overflow: memdev=mem7 host=cxl_mem.6 serial=7: log=Failure : 0 records from 1763228189130895685 to 1763228193130896180
Fix by using log->nr_overflow and remove the unused local variable.
A follow-up change was considered in cxl_mem_get_records_log() to
confirm that the overflow_err_count is non-zero when the overflow flag
is set [1]. Since the driver has no functional dependency on this
constraint, and a device that violates this specific requirement does
not cause incorrect driver behavior, no validation check is added.
[1] CXL 3.2, Table 8-65 Get Event Records Output Payload
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251116013036.1713313-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Commit 364ee9f326 ("cxl/test: Enhance event testing") changed the
loop iterator in mock_get_event() from a static constant,
CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
intent was to vary the number of events returned per call to simulate
events occurring while logs are being read.
However, ret_limit is modified without synchronization. When multiple
threads call mock_get_event() concurrently, one thread may read
ret_limit, another thread may increment it, and the first thread's
loop condition and size calculation see and use the updated value.
This is visible during cxl_test module load when all memdevs are
initializing simultaneously, which includes getting event records. It
is not tied to the cxl-events.sh unit test specifically, as that
operates on a single memdev.
While no actual harm results (the buffer is always large enough and
the record count fields correctly reflect what was written), this is
a correctness issue. The race creates an inconsistent state within
mock_get_event() and adding variability based on a race appears
unintended.
Make ret_limit a local variable populated from an atomic counter. Each
call gets a stable value that won't change during execution. That
preserves the intended behavior of varying the return counts across
calls while eliminating the race condition.
This implementation uses "+ 1" to produce the full range of 1 to
CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
produced.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251116013819.1713780-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
The connect4_prog and bpf_iter_setsockopt tests duplicate the same
open-coded TCP congestion control string comparison logic. Since
bpf_strncmp() provides the same functionality, use it instead to
avoid repeated open-coded loops.
This change applies only to functional BPF tests and does not affect
the verifier performance benchmarks (veristat.cfg). No functional
changes intended.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
Some BPF selftests contain identical copies of the min(), max(),
before(), and after() helpers. These repeated snippets are the same
across the tests and do not need to be defined separately.
Move these helpers into bpf_tracing_net.h so they can be shared by
TCP related BPF programs. This removes repeated code and keeps the
helpers in a single place.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
Since commit 733b57f262 ("cxl/pci: Early setup RCH dport component registers from RCRB")
is not necessary under mocking tests.
[ dj: Fixup commit representation flagged by checkpatch. ]
[ dj: Ammend subject line to indicate which function. ]
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20251118182202.2083244-1-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Some of the vsyscall selftests expect a #PF when vsyscalls are disabled.
However, with LASS enabled, an invalid access results in a SIGSEGV due
to a #GP instead of a #PF. One such negative test fails because it is
expecting X86_PF_INSTR to be set.
Update the failing test to expect either a #GP or a #PF. Also, update
the printed messages to show the trap number (denoting the type of
fault) instead of assuming a #PF.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-8-sohil.mehta%40intel.com
Add selftests to verify and document Linux’s intended behaviour for
UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes.
The tests verify that:
1. SOCK_STREAM returns EOF when the peer closes normally.
2. SOCK_STREAM returns ECONNRESET if the peer closes with unread data.
3. SOCK_SEQPACKET returns EOF when the peer closes normally.
4. SOCK_SEQPACKET returns ECONNRESET if the peer closes with unread data.
5. SOCK_DGRAM does not return ECONNRESET when the peer closes.
This follows up on review feedback suggesting a selftest to clarify
Linux’s semantics.
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Link: https://patch.msgid.link/20251113112802.44657-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ret_set_ksft_status() calls ksft_status_merge() with the current return
status and the last one. It treats a non-zero return code from
ksft_status_merge() as an indication that the return status was
overwritten by the last one and therefore overwrites the return message
with the last one.
Currently, ksft_status_merge() returns a non-zero return code even if
the current return status and the last one are equal. This results in
return messages being overwritten which is counter-productive since we
are more interested in the first failure message and not the last one.
Fix by changing ksft_status_merge() to only return a non-zero return
code if the current return status was actually changed.
Add a test case which checks that the first error message is not
overwritten.
Before:
# ./lib_sh_test.sh
[...]
TEST: RET tfail2 tfail -> fail [FAIL]
retmsg=tfail expected tfail2
[...]
# echo $?
1
After:
# ./lib_sh_test.sh
[...]
TEST: RET tfail2 tfail -> fail [ OK ]
[...]
# echo $?
0
Fixes: 596c8819cb ("selftests: forwarding: Have RET track kselftest framework constants")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251116081029.69112-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently, some debugging happened around a test that was timing out. The
stats were showing connections being closed which was confusing because
the closing state was caused by the timeout stopping the transfer.
To avoid such confusion, the timeout is no longer done per mptcp_connect
process, but separately. In case of timeout, the stats are now printed,
then the apps are killed.
The stats will still be printed after the kill, but that's fine, and
this might even be useful, just in case. Timeout should be exceptional.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-8-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After having started mptcp_connect in listening mode,
'mptcp_lib_wait_local_port_listen' can be used to wait for the listening
socket to be ready.
This is better than using the 'sleep' command, not to pause for a fixed
amount of time, but waiting for an event. This helper is used in all
other MPTCP selftests, but not in these two.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-7-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the same netns is used for the listener and the connector, no need
to take exactly the same packet trace twice, one is enough.
This avoids confusions when the traces are the same, and wasting
resources which might not help reproducing an issue.
While at it, avoid long lines and double spaces now that these lines are
no longer aligned.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-6-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before, 'nstat' was used to retrieve each individual counter: this means
querying 4 different sources from /proc/net and iterating over 100+
counters each time. Instead, the stats could be retrieved once, and the
output file could be parsed for each counter. Even better, such file is
already present: the nstat history file.
To be able to get this working, the nstat history file also needs to
contains zero counters too, so it is still possible to know if a counter
is missing or set to 0.
This also simplifies mptcp_connect.sh: instead of checking multiple
counters before and after a test to compute the difference, the stats
history files can be reset before each test, and nstat can display only
the difference.
mptcp_lib_get_counter() continues to work when no history file is
available: by fetching nstat directly, like before. This is the case in
diag.sh and userspace_pm.sh where there is no need to save the history
file. This is also the case in mptcp_join.sh, when 'run_tests' is
executed in the background: easier to continue fetching counters than
updating the history each time it is needed.
Note: 'nstat' is called with '-s' in mptcp_lib_nstat_get(), so this
helper can be called multiple times during the test if needed.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-5-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of errors, dump the stats from history instead of using nstat.
There are multiple advantages to that:
- The same filters from pr_err_stats are used, e.g. the unused 'rate'
column is not displayed.
- The counters are closer to the ones from when the test stopped.
- While at it, the errors can be better presented: error colours, a
small indentation to distinguish the different parts, extra new lines.
Even if it should only happen in rare cases -- internal errors, or netns
issues -- if no history is available, 'nstat' is used like before, just
in case.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-4-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the MPTCP selftests, the nstat daemon is not used. It means that
the last column (the rate) is always 0.0, and that's not something
interesting to display.
Then, this last column can be filtered out.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-3-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that these files are written from MPTCP lib helpers, the stats file
paths are uniformed. Then, no need to specify them from the each
selftest.
No behavioural changes intended.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-2-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These new helpers are easier to read than the long and multi lines
commands. Plus it will ease the addition of new features related to that
in the next commits.
No behavioural changes intended.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251114-net-next-mptcp-sft-count-cache-stats-timeout-v1-1-863cb04e1b7b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On a system which support SME but not SVE we can now disable streaming mode
via ptrace by writing FPSIMD formatted data through NT_ARM_SVE with a VL of
0. Extend fp-ptrace to cover rather than skip these cases, relax the check
for SVE writes of FPSIMD format data to not skip if SME is supported and
accept 0 as the VL when performing the ptrace write.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow exiting streaming mode on systems with SME but not SVE
we allow writes of FPSIMD format data via NT_ARM_SVE even when SVE is not
supported, add a test case that covers this to sve-ptrace.
We do not support reads.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Extended linear cache unit testing support
- Standardize CXL auto region size
- Add cxl_test CFMWS support for extended linear cache
- Add support for acpi extended linear cache
Add the mock wrappers for hmat_get_extended_linear_cache_size() in order
to emulate the ACPI helper function for the regions that are mock'd by
cxl_test.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Add a module parameter to allow activation of extended linear cache
on the auto region for cxl_test. The current platform implementation
for extended linear cache is 1:1 of DRAM and CXL memory. A CFMWS is
created with the size of both memory together where DRAM takes the
first part of the memory range and CXL covers the second part. The
current CXL auto region on cxl_test consists of 2 256M devices that
creates a 512M region. The new extended linear cache setup will have
512M DRAM and 512M CXL memory for a total of 1G CFMWS. The hardware
decoders must have their starting offset moved to after the DRAM region
to handle the CXL regions.
[ dj: Fixup commenting style. (Jonathan) ]
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Create a global define for the size of the mock CXL auto region used
in cxl_test. Remove the declared size in mock_init_hdm_decoder()
function.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20251117144611.903692-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Remove s390 compat support from everything within tools, since s390 compat
support will be removed from the kernel.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO
Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
A new DAMON sysfs file for pin-point target removal, namely
obsolete_target, has been added. Add a test for the functionality. It
starts DAMON with three monitoring target processes, mark one in the
middle as obsolete, commit it, and confirm the internal DAMON status is
updated to remove the target in the middle.
Link: https://lkml.kernel.org/r/20251023012535.69625-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
assert_ctx_committed() is not asserting monitoring targets commitment,
since all existing callers of the function assume no target changes.
Extend it for future usage.
Link: https://lkml.kernel.org/r/20251023012535.69625-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A new field of damon_target for pin-point target removal, namely obsolete,
has newly been added. Extend drgn_dump_damon_status.py to dump it, for
easily writing a future DAMON selftests of it.
Link: https://lkml.kernel.org/r/20251023012535.69625-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A DAMON sysfs file, namely obsolete_target, has been newly introduced.
Add a support of that file to _damon_sysfs.py so that DAMON selftests for
the file can be easily written.
Link: https://lkml.kernel.org/r/20251023012535.69625-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bijan Tabatabai <bijan311@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Some drivers/filesystems need to perform additional tasks after the VMA is
set up. This is typically in the form of pre-population.
The forms of pre-population most likely to be performed are a PFN remap
or the insertion of normal folios and PFNs into a mixed map.
We start by implementing the PFN remap functionality, ensuring that we
perform the appropriate actions at the appropriate time - that is setting
flags at the point of .mmap_prepare, and performing the actual remap at the
point at which the VMA is fully established.
This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.
Unfortunately callers still do often require some kind of custom action,
so we add an optional success/error _hook to allow the caller to do
something after the action has succeeded or failed.
This is done at the point when the VMA has already been established, so
the harm that can be done is limited.
The error hook can be used to filter errors if necessary.
There may be cases in which the caller absolutely must hold the file rmap
lock until the operation is entirely complete. It is an edge case, but
certainly the hugetlbfs mmap hook requires it.
To accommodate this, we add the hide_from_rmap_until_complete flag to the
mmap_action type. In this case, if a new VMA is allocated, we will hold the
file rmap lock until the operation is entirely completed (including any
success/error hooks).
Note that we do not need to update __compat_vma_mmap() to accommodate this
flag, as this function will be invoked from an .mmap handler whose VMA is
not yet visible, so we implicitly hide it from the rmap.
If any error arises on these final actions, we simply unmap the VMA
altogether.
Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.
While we're here, rename __compat_vma_mmap_prepare() to __compat_vma_mmap()
as we are now performing actions invoked by the mmap_prepare in addition to
just the mmap_prepare hook.
Link: https://lkml.kernel.org/r/2601199a7b2eaeadfcd8ab6e199c6d1706650c94.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
To reproduce the issue mentioned by [1], this add a setting of
pages_to_scan and sleep_millisecs at the start of test_prctl_fork_exec().
The main change is just raise the scanning frequency of ksmd.
[1] https://lore.kernel.org/all/202510012256278259zrhgATlLA2C510DMD3qI@zte.com.cn/
Link: https://lkml.kernel.org/r/20251007182935207jm31wCIgLpZg5XbXQY64S@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All are singletons - please see the respective changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaRoauQAKCRDdBJ7gKXxA
jtNFAQDEMH0+zRGz/Larkf9cgmdKcDgij1DP2gP/3i8PWAoaGQD8C9evZxu1h9wC
rFbaSkPDeSdDafo3RZfpo1gqE0LdEA4=
=oew8
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 hotfixes. 5 are cc:stable, 4 are against mm/
All are singletons - please see the respective changelogs for details"
* tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm, swap: fix potential UAF issue for VMA readahead
selftests/user_events: fix type cast for write_index packed member in perf_test
lib/test_kho: check if KHO is enabled
mm/huge_memory: fix folio split check for anon folios in swapcache
MAINTAINERS: update David Hildenbrand's email address
crash: fix crashkernel resource shrink
mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member
warning due to potential unaligned pointer access:
perf_test.c:239:38: warning: taking address of packed member 'write_index'
of class or structure 'user_reg' may result in an unaligned pointer value
[-Waddress-of-packed-member]
239 | ASSERT_NE(-1, write(self->data_fd, ®.write_index,
| ^~~~~~~~~~~~~~~
Since write(2) works with any alignment. Casting '®.write_index'
explicitly to 'void *' to suppress this warning.
Link: https://lkml.kernel.org/r/20251106095532.15185-1-ankitkhushwaha.linux@gmail.com
Fixes: 42187bdc3c ("selftests/user_events: Add perf self-test for empty arguments events")
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: sunliming <sunliming@kylinos.cn>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is
rejected (-EINVAL) if skb->transport_header is not set. The test
needs to lower the MTU of the loopback device. Thus, take this
opportunity to run the test in a netns by adding "ns_" to the test
name. The "serial_" prefix can then be removed.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_task_work_schedule_resume() and bpf_task_work_schedule_signal() have
been renamed in bpf tree to bpf_task_work_schedule_resume_impl() and
bpf_task_work_schedule_signal_impl() accordingly.
There are few uses of these kfuncs in selftests that are not in bpf
tree, so that when we port [1] into bpf-next, those BPF programs will
not compile.
This patch aligns those remaining callsites with the kfunc renaming.
It should go on top of [1] when applying on bpf-next.
1: https://lore.kernel.org/all/20251104-implv2-v3-0-4772b9ae0e06@meta.com/
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251105132105.597344-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The XDP qstats tests send 2k packets over a single socket.
Looks like when netdev CI is busy running those tests in QEMU
occasionally flakes. The target doesn't get to run at all
before all 2000 packets are sent.
Lower the number of packets to 1000 and reopen the socket
every 50 packets, to give RSS a chance to spread the packets
to multiple queues.
For the netdev CI testing either lowering the count or using
multiple sockets is enough, but let's do both for extra resiliency.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20251113152703.3819756-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On clang 20.1.8 the XDP program fails to load with a register spill error.
Since hdr_len is a __u32, the compiler decided it only needed the lower
32-bits of ctx->data, which later triggers the register spill verifier
error.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251113043102.4062150-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When an IPv6 address with a finite lifetime (configured with valid_lft
and preferred_lft) is manually deleted, the kernel does not clean up the
associated prefix route. This results in orphaned routes (marked "proto
kernel") remaining in the routing table even after their corresponding
address has been deleted.
This is particularly problematic on networks using combination of SLAAC
and bridges.
1. Machine comes up and performs RA on eth0.
2. User creates a bridge
- does an ip -6 addr flush dev eth0;
- adds the eth0 under the bridge.
3. SLAAC happens on br0.
Even tho the address has "moved" to br0 there will still be a route
pointing to eth0, but eth0 is not usable for IP any more.
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251113031700.3736285-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add several ./test_progs tests:
1. btf/dedup:recursive typedef ensures that deduplication no
longer fails on recursive typedefs.
2. btf/dedup:typedef ensures that typedefs are deduplicated correctly
just as they were before this patch.
Signed-off-by: Paul Houssel <paul.houssel@orange.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/9fac2f744089f6090257d4c881914b79f6cd6c6a.1763037045.git.paul.houssel@orange.com
When test_send_signal_kern__open_and_load() fails parent closes the
pipe which cases ASSERT_EQ(read(pipe_p2c...)) to fail, but child
continues and enters infinite loop, while parent is stuck in wait(NULL).
Other error paths have similar issue, so kill the child before waiting on it.
The bug was discovered while compiling all of selftests with -O1 instead of -O2
which caused progs/test_send_signal_kern.c to fail to load.
Fixes: ab8b7f0cb3 ("tools/bpf: Add self tests for bpf_send_signal_thread()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251113171153.2583-1-alexei.starovoitov@gmail.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmkXpZUACgkQ6rmadz2v
bTrGCw//UCx+KBXbzvv7m0A1QGOUL3oHL/Qd+OJA3RW3B+saVbYYzn9jjl0SRgFP
X0q/DwbDOjFtOSORV9oFgJkrucn7+BM/yxPaC4sE1SQZJAjDFA/CSaF0r8duuGsM
Mvat9TTiwwetOMAkNB9WZ1e6AKGovBLguLFGAWZc6vLeQZopcER5+pFwS44a9RrK
dq0Th8O/oY3VmUDgSKJ2KyY51KxpJU7k2ipifiIbu1M1MWZ7s2vERkMEkzJ/lB8/
nldMsTZUdknGFzVH/W6Rc9ScFYlH+h/x1gkOHwTibMsqDBm92mWVo6O7hvuUbsEO
NlPDgMtkhBp7PDSx9SA0UBcriMs1M6ovNBOpj/cI4AL1k8WNubf/FHZtrBwoy8C9
3HaM+8lkA2uiHVPUvT5dImzWqshweN0GXoXAoa9xPSQPchJ38UdzCHqYRAg/kWFZ
5jUK2j4e5+yyII44pD7Xti0PrfoP81giliqmTbGFV8+Y89dQnk+WK12vnbv34ER7
unLwId8HLtq0ZN7FVG4F6s/4qNdEMKqXbAkve0WWFXn4vKZMCju4ol6NYVGisRAg
zcn7Yk+weSuY3UOzC+/4SxhfTEAD0Kg6fUoG/1JdflgNsm8XhLBja0DZaAlIVO0p
xz5UaljwcNvjAKGGMYbCGrf3XN2tOmGpVyJkMj17Vcq88y3bJBU=
=JJui
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix interaction between livepatch and BPF fexit programs (Song Liu)
With Steven and Masami acks.
- Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa)
With Steven and Masami acks.
- Fix out of bounds access in widen_imprecise_scalars() in the verifier
(Eduard Zingerman)
- Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen)
- Fix net_sched storage collision with BPF data_meta/data_end (Eric
Dumazet)
- Add _impl suffix to BPF kfuncs with implicit args to avoid breaking
them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test widen_imprecise_scalars() with different stack depth
bpf: account for current allocated stack depth in widen_imprecise_scalars()
bpf: Add bpf_prog_run_data_pointers()
selftests/bpf: Add mptcp test with sockmap
mptcp: Fix proto fallback detection with BPF
mptcp: Disallow MPTCP subflows from sockmap
selftests/bpf: Add stacktrace ips test for raw_tp
selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
bpf: add _impl suffix for bpf_stream_vprintk() kfunc
bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
selftests/bpf: Add tests for livepatch + bpf trampoline
ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
ftrace: Fix BPF fexit with livepatch
Increase arena test coverage.
Convert glob_match() to bpf arena in two steps:
1.
Copy paste lib/glob.c into bpf_arena_strsearch.h
Copy paste lib/globtests.c into progs/arena_strsearch.c
2.
Add __arena to pointers
Add __arg_arena to global functions that accept arena pointers
Add cond_break to loops
The test also serves as a good example of what's possible
with bpf arena and how existing algorithms can be converted.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251111032931.21430-1-alexei.starovoitov@gmail.com
If interrupted by a signal clock_nanosleep() returns the remaining time
into the structure pointed to by the rmtp parameter. So far this
functionality was not tested by the timer selftests.
Extend the nanosleep selftest to cover this feature.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106-nanosleep-rtmp-selftest-v1-1-f9212fb295fe@linutronix.de
Several tests in the posix_timers selftest which test timer behavior
related to SIG_IGN fail on kernels older than 6.13. This is due to
a refactoring of signal handling in commit caf77435dd ("signal:
Handle ignored signals in do_sigaction(action != SIG_IGN)").
A previous attempt to fix this by adding a kernel version check to each
of the nine affected tests was suboptimal, as it resulted in emitting
the same skip message nine times.
Following the suggestion from Thomas Gleixner, this is refactored to
perform a single version check in main(). To satisfy the kselftest
framework's requirement for the test count to match the declared plan,
the plan is now conditionally set to 10 (for older kernels) or 19.
While setting the plan conditionally may seem complex, it is the
better approach to avoid the alternatives: either running tests on
unsupported kernels that are known to fail, or emitting a noisy series
of nine identical skip messages. A single informational message is now
printed instead when the tests are skipped.
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250807085042.1690931-1-wakel@google.com/
Link: https://patch.msgid.link/20251103114502.584940-1-wakel@google.com
Conform the layout, informational and status messages to KTAP. No
functional change is intended other than the layout of output messages.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Suggested-by: Sebastian Chlad <sebastian.chlad@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A test case for a situation when widen_imprecise_scalars() is called
with old->allocated_stack > cur->allocated_stack. Test structure:
def widening_stack_size_bug():
r1 = 0
for r6 in 0..1:
iterator_with_diff_stack_depth(r1)
r1 = 42
def iterator_with_diff_stack_depth(r1):
if r1 != 42:
use 128 bytes of stack
iterator based loop
iterator_with_diff_stack_depth() is verified with r1 == 0 first and
r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8
bytes on a second. Such arrangement triggered a KASAN error in
widen_imprecise_scalars().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Create a test for the robust list mechanism. Test the following uAPI
operations:
- Creating a robust mutex where the lock waiter is wake by the kernel
when the lock owner died
- Setting a robust list to the current task
- Getting a robust list from the current task
- Getting a robust list from another task
- Using the list_op_pending field from robust_list_head struct to test
robustness when the lock owner dies before completing the locking
- Setting a invalid size for syscall argument `len`
- Adding multiple elements to a robust list wait waiting for each of
them
- Creating a circular list and checking that the kernel does not get
stuck in an infinity loop
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251110224130.3044761-1-andrealmeid@igalia.com
On systems where the shmget() syscall is not supported, tests like
anon_page and shared_waitv will fail. Skip these tests in such cases to
allow the rest of the test suite to run.
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251016162009.3270784-1-cmllamas@google.com
This was missed in commit e5c04d0f3e ("selftests/futex: Refactor
futex_wait with kselftest_harness.h") while replacing previous perror()
calls, which automatically append the newline character.
Fixes: e5c04d0f3e ("selftests/futex: Refactor futex_wait with kselftest_harness.h")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251015173556.2899646-1-cmllamas@google.com
Commit ed323aeda5 ("selftest/futex: Compile also with libnuma <
2.0.16") removed the unused function test_futex_mpol() and commit
d35ca2f642 ("selftests/futex: Refactor futex_numa_mpol with
kselftest_harness.h") added it back by accident. Remove it again.
Fixes: d35ca2f642 ("selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h")
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251001220438.66227-1-andrealmeid@igalia.com
- Fix vfio selftests to remove the expectation that the IOMMU
supports a 64-bit IOVA space. These manifest both in the original
set of tests introduced this development cycle in identity mapping
the IOVA to buffer virtual address space, as well as the more
recent boundary testing. Implement facilities for collecting the
valid IOVA ranges from the backend, implement a simple IOVA
allocator, and use the information for determining extents.
(Alex Mastro)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkWXwYRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyK96w//ep97M1B5RSA4/AIP4j+fovnlM2RVUrPy
RKxVYK+rzftO4Hq8sg1Co3aB68hItbYiw/0BDUgQBFFCEoInQ9zht89tsHhuXq55
lYq9hyYMnp60FLzpX8Fi2HKf8SKTdD+6P1hX5UWZfYExc/tCD/9FPE2QsfUTbnh/
u2aguEp0RhgC8xCcR/5EsGFzcBgioDcTBQq5UQ/5By51nUNvdmHKRHKRC7ZmQfiK
bwFpYseWHRdzdBH2KggNaDZUh0q7Y9+cwL2UndQV5IfwGhGWODztOsVhhoxAAu0n
c6gXYDg34i5i1epPrr8Yvv8trDzMFCRYJi4g8BsLPOBXe8Lr2bWj/4wA1ftdW6Ha
0ii4Gx1UO1vz0KZQBajMHv5pO06lJZshzEQgHr3sYgjaRg8AVTV6hI3IVaty2JWk
DPKPNSGKhjkWvWnx73KFiiepz5zXEL7r7ooFBimtBwFr/5bDB8hfPjpXMh+mhiZu
QW6DV+Mrqg1G1KDCwdHSvbBwH8IvfX/tQbs9eKvGt+6ICpH1GJ1cUr58vjcc6ESi
sJzKI0ceTjGZAhHJogO4xAzhI8H6kwtDHB1ZC9QLHIwbkXFN1m64VjOwwXuHk0HT
559gjqjH0yKNSABwRMV1nKzn9EFnivQn1tAoGkUGSFE9ZI/VfCNNlspPziZEz0z9
u14SILGMqWU=
=ywvT
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio
Pull VFIO seftest fixes from Alex Williamson:
- Fix vfio selftests to remove the expectation that the IOMMU supports
a 64-bit IOVA space.
These manifest both in the original set of tests introduced this
development cycle in identity mapping the IOVA to buffer virtual
address space, as well as the more recent boundary testing.
Implement facilities for collecting the valid IOVA ranges from the
backend, implement a simple IOVA allocator, and use the information
for determining extents (Alex Mastro)
* tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio:
vfio: selftests: replace iova=vaddr with allocated iovas
vfio: selftests: add iova allocator
vfio: selftests: fix map limit tests to use last available iova
vfio: selftests: add iova range query helpers
Executing the test_maps binary on platforms with extremely high core
counts may cause intermittent assertion failures in
test_update_delete() (called via test_map_parallel()). This can occur
because bpf_map_update_elem() under some circumstances (specifically
in this case while performing bpf_map_update_elem() with BPF_NOEXIST
on a BPF_MAP_TYPE_HASH with its map_flags set to BPF_F_NO_PREALLOC)
can return an E2BIG error code i.e.
error -7 7 tools/testing/selftests/bpf/test_maps.c:#: void
test_update_delete(unsigned int, void *): Assertion `err == 0' failed.
tools/testing/selftests/bpf/test_maps.c:#: void
__run_parallel(unsigned int, void (*)(unsigned int, void *), void *):
Assertion `status == 0' failed.
As it turns out, is_map_full() which is called from alloc_htab_elem()
can take on a conservative approach when htab->use_percpu_counter is
true (which is the case here because the percpu_counter is used when a
BPF_MAP_TYPE_HASH is created with its map_flags set to
BPF_F_NO_PREALLOC). This conservative approach prioritizes preventing
over-allocation and potential issues that could arise from possibly
exceeding htab->map.max_entries in highly concurrent environments,
even if it means slightly under-utilizing the htab map's capacity.
Given that bpf_map_update_elem() from test_update_delete() can return
E2BIG, update can_retry() such that it also accounts for the E2BIG
error code (specifically only when running with map_flags being set to
BPF_F_NO_PREALLOC). The retry loop will allow the global count
belonging to the percpu_counter to become synchronized and better
reflect the current htab map's capacity.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251113092519.2632079-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.
Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
Cross-merge networking fixes after downstream PR (net-6.18-rc6).
No conflicts, adjacent changes in:
drivers/net/phy/micrel.c
96a9178a29 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
61b7ade9ba ("net: phy: micrel: Add support for non PTP SKUs for lan8814")
and a trivial one in tools/testing/selftests/drivers/net/Makefile.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes event-filter-function.tc tracing test failure caused when a first
run to sample events triggers kmem_cache_free which interferes with the
rest of the test. Fix this calling sample_events twice to eliminate the
kmem_cache_free related noise from the sampling.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkWC6YACgkQCwJExA0N
QxzbIQ/9G93/8N7G9otEc5MRalvJlv2+NZmD2DOvbitc1zpKyYlGJc2ZakBkuQrA
z3WBa1bei4cZfrWEZDtkU7YzcRiCmLFPV1cDzAudc30InnJDLVN8IwMpFZY61Gwp
gBanPFHpxpcccolFzDQ5rtP2pJL6/XFsHqDZ5wUFgV8MbosgNskpIeGk7f33JZrh
xqUi+fBTOnR1TszsDCT4pRr1bM2jB/HhcJIQNPLkD75VIOXer+6DGenREkbiMubf
Wss9JdpA8x2L40GgkMRn6dqXW/1KSL7CT9UgMoPxgs6F1s1R6iDd4f5PO4IXa2sU
FnD8x4QIh1cZHirtRC+ChQwO9KKCoHq5CeEXtr2zGs3Xus2YbS+2OBGFCVoxivQp
BPE093Auglunk3Zf4ZQSophYoc8IC1xC8ie99TOJexxCANpXAcVd5mDoK9tIjsKu
mCmY0ng9yVsHaNyiX/NcsViyp5LcIRlL3P9q0ZKXwOxB4MB0c4IOJeAyN8KtTH/4
GXjYzIq0+u7Pk47rMvkHHOi3myIYm1mUfMQsWtIqXracsFN824AXeoWtcxuHoZ9Q
puobPEwNBKcj61JYymogS58Eu7+3AFWLagO+U3zKORDBq3OAq7nb2qF/VMcpGlF6
LQEo7wdzcrRmaU4OVw7NTIddy1E035rjFsuKNjBls5PK5NSfAHc=
=OZMi
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"Fixes event-filter-function.tc tracing test failure caused when a
first run to sample events triggers kmem_cache_free which interferes
with the rest of the test.
Fix this by calling sample_events twice to eliminate the
kmem_cache_free related noise from the sampling"
* tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/tracing: Run sample events to clear page cache events
Add test to verify that updating [lru_,]percpu_hash maps decrements
refcount when BPF_KPTR_REF objects are involved.
The tests perform the following steps:
. Call update_elem() to insert an initial value.
. Use bpf_refcount_acquire() to increment the refcount.
. Store the node pointer in the map value.
. Add the node to a linked list.
. Probe-read the refcount and verify it is *2*.
. Call update_elem() again to trigger refcount decrement.
. Probe-read the refcount and verify it is *1*.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251105151407.12723-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For NICs with a large (1024+) number of queues, this test can cause
excessive memory fragmentation. This results in OOM errors, and in the
worst case driver/kernel crashes. We don't need to test with the max number
of queues, just enough to create a high likelihood of races between
reconfiguration and stats getting read.
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251111225319.3019542-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The hung_task_panic sysctl is currently a blunt instrument: it's all or
nothing.
Panicking on a single hung task can be an overreaction to a transient
glitch. A more reliable indicator of a systemic problem is when
multiple tasks hang simultaneously.
Extend hung_task_panic to accept an integer threshold, allowing the
kernel to panic only when N hung tasks are detected in a single scan.
This provides finer control to distinguish between isolated incidents
and system-wide failures.
The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan
The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.
[lance.yang@linux.dev: new changelog]
Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Wesphal <fw@strlen.de>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr
as part of DMA mapping operations. However, not all IOMMUs support the
same virtual address width as the processor. For instance, older Intel
consumer platforms only support 39-bits of IOMMU address space. On such
platforms, using the virtual address as the IOVA fails.
Make the tests more robust by using iova_allocator to vend IOVAs, which
queries legally accessible IOVAs from the underlying IOMMUFD or VFIO
container.
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Add struct iova_allocator, which gives tests a convenient way to generate
legally-accessible IOVAs to map. This allocator traverses the sorted
available IOVA ranges linearly, requires power-of-two size allocations,
and does not support freeing iova allocations. The assumption is that
tests are not IOVA space-bounded, and will not need to recycle IOVAs.
This is based on Alex Williamson's patch series for adding an IOVA
allocator [1].
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-3-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Use the newly available vfio_pci_iova_ranges() to determine the last
legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests.
Fixes: de8d1f2fd5 ("vfio: selftests: add end of address space DMA map/unmap tests")
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
VFIO selftests need to map IOVAs from legally accessible ranges, which
could vary between hardware. Tests in vfio_dma_mapping_test.c are making
excessively strong assumptions about which IOVAs can be mapped.
Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the
IOMMUFD or VFIO container associated with the device. The queried ranges
are normalized to IOMMUFD's iommu_iova_range representation so that
handling of IOVA ranges up the stack can be implementation-agnostic.
iommu_iova_range and vfio_iova_range are equivalent, so bias to using the
new interface's struct.
Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES.
Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.
The underlying vfio_iommu_type1_info buffer-related functionality has
been kept generic so the same helpers can be used to query other
capability chain information, if needed.
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Disable shellcheck rules SC2317 and SC2119. These rules are being
triggered due to false positives. For SC2317, many `return
"${KSFT_PASS}"` lines are reported as unreachable, even though they are
executed during normal runs. For SC2119, the fact that
log_guest/log_host accept either stdin or arguments triggers SC2119,
despite being valid.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-12-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add vsock_loopback module loading to the loopback test so that vmtest.sh
can be used for kernels built with loopback as a module.
This is not technically a fix as kselftest expects loopback to be
built-in already (defined in selftests/vsock/config). This is useful
only for using vmtest.sh outside of kselftest.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-11-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Testing with 1.37 shows all tests passing but emits the warning:
warning: vng version 'virtme-ng 1.37' has not been tested and may not function properly.
The following versions have been tested: 1.33 1.36
This patch adds 1.37 to the virtme-ng versions to get rid of the above
warning.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-10-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the definition for BUILD and initialize it to zero. This avoids
'bash -u vmtest.sh` from throwing 'unbound variable' when BUILD is not
set to 1 and is later checked for its value.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-9-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for future patches that introduce tests that cannot
re-use the same VM, add functions to identify those that *can* re-use a
VM.
By continuing to re-use the same VM for these tests we can save time by
avoiding the delay of booting a VM for every test.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-8-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add check_result() function to reuse logic for incrementing the
pass/fail counters. This function will get used by different callers as
we add different types of tests in future patches (namely, namespace and
non-namespace tests will be called at different places, and re-use this
function).
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-7-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reduce the time waiting for the QEMU pidfile from three minutes to five
seconds. The three minute time window was chosen to make sure QEMU had
enough time to fully boot up. This, however, is an unreasonably long
delay for QEMU to write the pidfile, which happens earlier when the QEMU
process starts (not after VM boot). The three minute delay becomes
noticeably wasteful in future tests that expect QEMU to fail and wait a
full three minutes for a pidfile that will never exist.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-6-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If QEMU fails to boot, then set the returncode (via timeout) instead of
unconditionally dying. This is in preparation for tests that expect QEMU
to fail to boot. In that case, we just want to know if the boot failed
or not so we can test the pass/fail criteria, and continue executing the
next test.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-5-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change QEMU to use generated pidfile names instead of just a single
globally-defined pidfile. This allows multiple QEMU instances to
co-exist with different pidfiles. This is required for future tests that
use multiple VMs to check for CID collissions.
Additionally, this also places the burden of killing the QEMU process
and cleaning up the pidfile on the caller of vm_start(). To help with
this, a function terminate_pidfiles() is introduced that callers use to
perform the cleanup. The terminate_pidfiles() function supports multiple
pidfile removals because future patches will need to process two
pidfiles at a time.
Change QEMU_OPTS to be initialized inside the vm_start(). This allows
the generated pidfile to be passed to the string assignment, and
prepares for future vm-specific options as well (e.g., cid).
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-4-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add wrapper functions vm_vsock_test() and host_vsock_test() to invoke
the vsock_test binary. This encapsulates several items of repeat logic,
such as waiting for the server to reach listening state and
enabling/disabling the bash option pipefail to avoid pipe-style logging
from hiding failures.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-3-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rewrite wait_for_listener()'s pattern matching to avoid tripping the
if-condition when pipefail is on.
awk doesn't gracefully handle SIGPIPE with a non-zero exit code, so grep
exiting upon finding a match causes false-positives when the pipefail
option is used (grep exits, SIGPIPE emits, and awk complains with a
non-zero exit code). Instead, move all of the pattern matching into awk
so that SIGPIPE cannot happen and the correct exit code is returned.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-2-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve usability of logging functions. Remove the test name prefix from
logging functions so that logging calls can be made deeper into the call
stack without passing down the test name or setting some global. Teach
log function to accept a LOG_PREFIX variable to avoid unnecessary
argument shifting.
Remove log_setup() and instead use log_host(). The host/guest prefixes
are useful to show whether a failure happened on the guest or host side,
but "setup" doesn't really give additional useful information. Since all
log_setup() calls happen on the host, lets just use log_host() instead.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251108-vsock-selftests-fixes-and-improvements-v4-1-d5e8d6c87289@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.
The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
guest and KVM injects expected SEA into the VCPU.
Tested on a data center server running Siryn AmpereOne processor
that has RAS support.
Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
APEI and GHES to offline the memory before guest can consume
injected UER, and making test unable to trigger SEA.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
The 'run_tests' function is executed in the background, but killing its
associated PID would not kill the children tasks running in the
background.
To properly kill all background tasks, 'kill -- -PID' could be used, but
this requires kill from procps-ng. Instead, all children tasks are
listed using 'ps', and 'kill' is called with all PIDs of this group.
Fixes: 31ee4ad86a ("selftests: mptcp: join: stop transfer when check is done (part 1)")
Cc: stable@vger.kernel.org
Fixes: 04b57c9e09 ("selftests: mptcp: join: stop transfer when check is done (part 2)")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MPTCP Join "fastclose server" selftest is sometimes failing because the
client output file doesn't have the expected size, e.g. 296B instead of
1024B.
When looking at a packet trace when this happens, the server sent the
expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE.
It is then strange to see the client only receiving 296B, which would
mean it only got a part of the second packet. The problem is then not on
the networking side, but rather on the data reception side.
When mptcp_connect is launched with '-f -1', it means the connection
might stop before having sent everything, because a reset has been
received. When this happens, the program was directly stopped. But it is
also possible there are still some data to read, simply because the
previous 'read' step was done with a buffer smaller than the pending
data, see do_rnd_read(). In this case, it is important to read what's
left in the kernel buffers before stopping without error like before.
SIGPIPE is now ignored, not to quit the app before having read
everything.
Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications
To play it safe, all userspace tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.
Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Fixes: b2e2248f36 ("selftests: mptcp: userspace pm create id 0 subflow")
Fixes: e3b47e460b ("selftests: mptcp: userspace pm remove initial subflow")
Fixes: b9fb176081 ("selftests: mptcp: userspace pm send RM_ADDR for ID 0")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.
Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
avoid it to end before the end of the verifications
To play it safe, all endpoints tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.
Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Fixes: e274f71540 ("selftests: mptcp: add subflow limits test-cases")
Fixes: b5e2fb832f ("selftests: mptcp: add explicit test case for remove/readd")
Fixes: e06959e9ee ("selftests: mptcp: join: test for flush/re-add endpoints")
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some of these 'remove' tests rarely fail because a subflow has been
reset instead of cleanly removed. This can happen when one extra subflow
which has never carried data is being closed (FIN) on one side, while
the other is sending data for the first time.
To avoid such subflows to be used right at the end, the backup flag has
been added. With that, data will be only carried on the initial subflow.
Fixes: d2c4333a80 ("selftests: mptcp: add testcases for removing addrs")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-2-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "fallback due to TCP OoO" was never printed because the stat_ooo_now
variable was checked twice: once in the parent if-statement, and one in
the child one. The second condition was then always true then, and the
'else' branch was never taken.
The idea is that when there are more ACK + MP_CAPABLE than expected, the
test either fails if there was no out of order packets, or a notice is
printed.
Fixes: 69ca3d29a7 ("mptcp: update selftest for fallback due to OoO")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-1-a4332c714e10@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a series of tests to validate the RV tracefs API and basic
functionality.
* available monitors:
Check that all monitors (from the monitors folder) appear as
available and have a description. Works with nested monitors.
* enable/disable:
Enable and disable all monitors and validate both the enabled file
and the enabled_monitors. Check that enabling container monitors
enables all nested monitors.
* reactors:
Set all reactors and validate the setting, also for nested monitors.
* wwnr with printk:
wwnr is broken on purpose, run it with a load and check that the
printk reactor works. Also validate disabling reacting_on or
monitoring_on prevents reactions.
These tests use the ftracetest suite.
Acked-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20251017115203.140080-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
The ftracetest script is a fairly complete test framework for tracefs-like
subsystem, but it can only be used for ftrace selftests.
If OPT_TEST_DIR is provided and includes a function file, use that as
test directory going forward rather than just grabbing tests from it.
Generalise function names like initialize_ftrace to initialize_system.
Add the --rv argument to set up the test for rv, basically changing the
trace directory to $TRACING_DIR/rv and displaying an error if that
cannot be found.
This prepares for rv selftests inclusion.
Link: https://lore.kernel.org/r/20251017115203.140080-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Bring in the shared branch with the kbuild tree to enable
'-fms-extensions' for 6.19. Further namespace cleanup work
requires this extension.
Signed-off-by: Christian Brauner <brauner@kernel.org>
This patch adds a selftest that verifies netconsole functionality
over bonded network interfaces using netdevsim. It sets up two bonded
interfaces acting as transmit (TX) and receive (RX) ends, placed in
separate network namespaces. The test sends kernel log messages and
verifies that they are properly received on the bonded RX interfaces
with both IPv4 and IPv6, and using basic and extended netconsole
formats.
This patchset aims to test a long-standing netpoll subsystem where
netpoll has multiple users. (in this case netconsole and bonding). A
similar selftest has been discussed in [1] and [2].
This test also tries to enable bonding and netpoll in different order,
just to guarantee that all the possibilities are exercised.
Link: https://lore.kernel.org/all/20250905-netconsole_torture-v3-0-875c7febd316@debian.org/ [1]
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/ [2]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-4-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
messages are being sent, as creating parallel targets.
The code launches three background jobs on distinct schedules:
* Toggle netcons target every 30 iterations
* create and delete random_target every 50 iterations
* toggle iface every 70 iterations
This creates multiple concurrency sources that interact with netconsole
states. This is good practice to simulate stress, and exercise netpoll
and netconsole locks.
This test already found an issue as reported in [1]
Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-3-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extract the netconsole target creation from create_dynamic_target(), by
moving it from create_dynamic_target() into a new helper function. This
enables other tests to use the creation of netconsole targets with
arbitrary parameters and no sleep.
The new helper will be utilized by forthcoming torture-type selftests
that require dynamic target management.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251107-netconsole_torture-v10-2-749227b55f63@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tracing selftest "event-filter-function.tc" was failing because it
first runs the "sample_events" function that triggers the kmem_cache_free
event and it looks at what function was used during a call to "ls".
But the first time it calls this, it could trigger events that are used to
pull pages into the page cache.
The rest of the test uses the function it finds during that call to see if
it will be called in subsequent "sample_events" calls. But if there's no
need to pull pages into the page cache, it will not trigger that function
and the test will fail.
Call the "sample_events" twice to trigger all the page cache work before
it calls it to find a function to use in subsequent checks.
Cc: stable@vger.kernel.org
Fixes: eb50d0f250 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This tests introduces a tiny smc_hs_ctrl for filtering SMC connections
based on IP pairs, and also adds a realistic topology model to verify it.
Also, we can only use SMC loopback under CI test, so an additional
configuration needs to be enabled.
Follow the steps below to run this test.
make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc
Results shows:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.
The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.
Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.
Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com
- Fix trapping regression when no in-kernel irqchip is present
- Check host-provided, untrusted ranges and offsets in pKVM
- Fix regression restoring the ID_PFR1_EL1 register
- Fix vgic ITS locking issues when LPIs are not directly injected
Arm selftests:
- Correct target CPU programming in vgic_lpi_stress selftest
- Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
RISC-V:
- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
x86:
- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as KVM
doesn't support virtualization the instructions, but the instructions
are gated only by VMXON. That is, they will VM-Exit instead of taking
a #UD and until now this resulted in KVM exiting to userspace with an
emulation error.
- Unload the "FPU" when emulating INIT of XSTATE features if and only if
the FPU is actually loaded, instead of trying to predict when KVM will
emulate an INIT (CET support missed the MP_STATE path). Add sanity
checks to detect and harden against similar bugs in the future.
- Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is unloaded.
- Use a raw spinlock for svm->ir_list_lock as the lock is taken during
schedule(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y.
- Remove guest_memfd bindings on memslot deletion when a gmem file is dying
to fix a use-after-free race found by syzkaller.
- Fix a goof in the EPT Violation handler where KVM checks the wrong
variable when determining if the reported GVA is valid.
- Fix and simplify the handling of LBR virtualization on AMD, which was made
buggy and unnecessarily complicated by nested VM support
Misc:
- Update Oliver's email address
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkQSAAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMDzQf9GW6F8yAo1QTnLNfSIvHMQcJAKwr3
pd+bOLtMN/OzdQO/dvGVdkYuYYg/PQv/+zw6dVIeiLjtlrsTg40rsqVZEAXYCsbB
q7TFM634thgON6R6fD6eLa+72UP0wMai7xqEfEyXVW3enAEEe+lrPWC9BiwJRqKZ
pY1MpVIdYa0XNfUBeiOhO5AH+y9OmUDq5AHptrYn9X5xdsU2OWqQCyHW/1RLPWvA
9bkyuz1A8+EQ20ngHUd0hrQx4UeJ7jvPblbryUxXaMwqahPC9sA2iDI12gAu8a84
skvWbIPHSMgj5qDO/CkAsHb47GiqudU4LH7LniDZNsq21iTekiURSbdklw==
=SM6O
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Fix trapping regression when no in-kernel irqchip is present
- Check host-provided, untrusted ranges and offsets in pKVM
- Fix regression restoring the ID_PFR1_EL1 register
- Fix vgic ITS locking issues when LPIs are not directly injected
Arm selftests:
- Correct target CPU programming in vgic_lpi_stress selftest
- Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest
RISC-V:
- Fix check for local interrupts on riscv32
- Read HGEIP CSR on the correct cpu when checking for IMSIC
interrupts
- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()
x86:
- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
KVM doesn't support virtualization the instructions, but the
instructions are gated only by VMXON. That is, they will VM-Exit
instead of taking a #UD and until now this resulted in KVM exiting
to userspace with an emulation error.
- Unload the "FPU" when emulating INIT of XSTATE features if and only
if the FPU is actually loaded, instead of trying to predict when
KVM will emulate an INIT (CET support missed the MP_STATE path).
Add sanity checks to detect and harden against similar bugs in the
future.
- Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
unloaded.
- Use a raw spinlock for svm->ir_list_lock as the lock is taken
during schedule(), and "normal" spinlocks are sleepable locks when
PREEMPT_RT=y.
- Remove guest_memfd bindings on memslot deletion when a gmem file is
dying to fix a use-after-free race found by syzkaller.
- Fix a goof in the EPT Violation handler where KVM checks the wrong
variable when determining if the reported GVA is valid.
- Fix and simplify the handling of LBR virtualization on AMD, which
was made buggy and unnecessarily complicated by nested VM support
Misc:
- Update Oliver's email address"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: nSVM: Fix and simplify LBR virtualization handling with nested
KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
MAINTAINERS: Switch myself to using kernel.org address
KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
KVM: arm64: Make all 32bit ID registers fully writable
KVM: VMX: Fix check for valid GVA on an EPT violation
KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
KVM: SVM: switch to raw spinlock for svm->ir_list_lock
KVM: SVM: Make avic_ga_log_notifier() local to avic.c
KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
KVM: arm64: Check the untrusted offset in FF-A memory share
KVM: arm64: Check range args for pKVM mem transitions
...
Add support for the file descriptor based variant of chdir().
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
It seems that most of the tests prepare the interfaces once before the test
run (setup_prepare()), rely on setup_wait() to wait for link and only then
run the test(s).
local_termination brings the physical interfaces down and up during test
run but never wait for them to come up. If the auto-negotiation takes
some seconds, first test packets are being lost, which leads to
false-negative test results.
Use setup_wait() in run_test() to make sure auto-negotiation has been
completed after all simple_if_init() calls on physical interfaces and test
packets will not be lost because of the race against link establishment.
Fixes: 90b9566aa5 ("selftests: forwarding: add a test for local_termination.sh")
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20251106161213.459501-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This script sets net.ipv4.tcp_rto_max_ms to 1000 and checks
if SYN+ACK RTO is capped at 1s for TFO and non-TFO.
Without the previous patch, the max RTO is applied to TFO
SYN+ACK only, and non-TFO SYN+ACK RTO increases exponentially.
# selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
# TAP version 13
# 1..2
# tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
expected outbound packet at 5.091936 sec but happened at 6.107826 sec; tolerance 0.127974 sec
# script packet: 5.091936 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
# actual packet: 6.107826 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
# not ok 1 ipv4
# tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
expected outbound packet at 5.075901 sec but happened at 6.091841 sec; tolerance 0.127976 sec
# script packet: 5.075901 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
# actual packet: 6.091841 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
# not ok 2 ipv6
# # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
not ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt # exit=1
With the previous patch, all SYN+ACKs are retransmitted
after 1s.
# selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
# TAP version 13
# 1..2
# ok 1 ipv4
# ok 2 ipv6
# # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Three fixes:
- Syzkaller found a case where maths overflows can cause divide by 0
- Typo in a compiler bug warning fix in the selftests broke the selftests
- type1 compatability had a mismatch when unmapping an already unmaped
range, it should succeed
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaQ473AAKCRCFwuHvBreF
Ya8gAP9Qifu4AYobrhgD9rKKXLniGRejXUHBQ0G8y8y9sBHJdgEApwxzCUcI1jLZ
GTaPEABblBBWxUmN+Psya2Hm3Oa53AA=
=TJHp
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- Syzkaller found a case where maths overflows can cause divide by 0
- Typo in a compiler bug warning fix in the selftests broke the
selftests
- type1 compatability had a mismatch when unmapping an already unmapped
range, it should succeed
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
iommufd: Don't overflow during division for dirty tracking
The zt-test output is awkward to read, as the 'Expected' value isn't
dumped on its own line and isn't aligned with the 'Got' value beneath.
For example:
Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
SVCR: 2
Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get
output that can be read more easily:
Mismatch: PID=5281, iteration=3270249
Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
SVCR: 2
Admittedly this isn't all that important when the 'Got' value is all
zeroes, but otherwise this would be a major help for identifying which
portion of the 'Got' value is not as expected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This commit loosens the kvm.sh script's regular expressions to permit
negative-valued Kconfig options, for example:
--kconfig CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN=-1
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Now that start_server_str enforces SO_REUSEADDR, there's no need to keep
using start_reusport_server in tc_tunnel, especially since it only uses
one server at a time.
Replace start_reuseport_server with start_server_str in tc_tunnel test.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-2-1bbd9c1f8d65@bootlin.com
Some tests have to stop/start a server multiple time with the same
listening address. Doing so without SO_REUSADDR leads to failures due to
the socket still being in TIME_WAIT right after the first instance
stop/before the second instance start. Instead of letting each test
manually set SO_REUSEADDR on their servers, it can be done automatically
by start_server_addr for all tests (and without any major downside).
Enforce SO_REUSEADDR in start_server_addr for all tests.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-1-1bbd9c1f8d65@bootlin.com
Commit 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read
user space") made an update that fixed both trace_marker and
trace_marker_raw. But the small difference made to trace_marker_raw had a
blatant bug in it that any basic testing would have uncovered.
Unfortunately, the self tests have tests for trace_marker but nothing for
trace_marker_raw which allowed the bug to get upstream.
Add basic selftests to test trace_marker_raw so that this doesn't happen
again.
Link: https://lore.kernel.org/r/20251014145149.3e3c1033@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Cross-merge networking fixes after downstream PR (net-6.18-rc5).
Conflicts:
drivers/net/wireless/ath/ath12k/mac.c
9222582ec5 ("Revert "wifi: ath12k: Fix missing station power save configuration"")
6917e268c4 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon")
https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net
Adjacent changes:
drivers/net/ethernet/intel/Kconfig
b1d16f7c00 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG")
93f53db9f9 ("ice: switch to Page Pool")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - new code bugs:
- ptp: expose raw cycles only for clocks with free-running counter
- bonding: fix null-deref in actor_port_prio setting
- mdio: ERR_PTR-check regmap pointer returned by device_node_to_regmap()
- eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG
Previous releases - regressions:
- virtio_net: fix perf regression due to bad alignment of
virtio_net_hdr_v1_hash
- Revert "wifi: ath10k: avoid unnecessary wait for service ready message"
caused regressions for QCA988x and QCA9984
- Revert "wifi: ath12k: Fix missing station power save configuration"
caused regressions for WCN7850
- eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
corruptions after kexec
Previous releases - always broken:
- virtio-net: fix received packet length check for big packets
- sctp: fix races in socket diag handling
- wifi: add an hrtimer-based delayed work item to avoid low granularity
of timers set relatively far in the future, and use it where it matters
(e.g. when performing AP-scheduled channel switch)
- eth: mlx5e:
- correctly propagate error in case of module EEPROM read failure
- fix HW-GRO on systems with PAGE_SIZE == 64kB
- dsa: b53: fixes for tagging, link configuration / RMII, FDB, multicast
- phy: lan8842: implement latest errata
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkMyx0ACgkQMUZtbf5S
Irs4gA//fRGPHfEksbVO0lAR9QVUKq/Wvokt6DnLXQsi7js7kun7ymVFyU8tVacP
sGoYbTTXO34XpijKVMTFe5WnnnSF3cri1/e+PYNXndKGHOheLvz0aq+CniiLexaT
ag/opHX9+Io6x8DyOVkkRJYByvEcXgy1vFjhk5O04wn7tGPBNzvaKQJzjfVIiiWP
thma/e77jVUqsNptS/VzJNCDwPB3Qi0fqylRovu7A59Kmr7GLZxaqZQpvM5/XjU3
s3NhNjNDawmiwxN2AIztXo+vMReqFaiCr+z36OcruE04CFslkQGmE/5g3FdDGg+Q
RE7Wfgk2UJ+Q5Y1jnQWNYOkGaMd6bMgPQD/8zZvwEf163Gh1YBh0rtDY07DWV5FR
wp5cmeNDo2Mf+nnCM2UaoUQS2AAmkub2aAIjnlvrefeJMKciyMqtQe+V02aIxuvK
BPmeSCN1IOyAIDCRE3Fd6JFyk+4Z4YC6Hdm/WOG517eGrDh3yV9JE/+C3jdh3JwT
lvaScKhCdyq/9mFqEcU/yR5Kc4Od6Rw0K0s5DM4LAbKSsxI6hp6SkuoFHUsqFsRR
HL3ucy4c4hmLxz6AA5/+UVIPzFsY5cwOnhs3AU2UVuA95fH0QWqX2u2ll0rC9mAW
xDXa/ai0/KAGvVYqjC+MDyF8zrtzSG7C40ZsxhdcU84s94ZNVFI=
=a/Ml
-----END PGP SIGNATURE-----
Merge tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
Including fixes from bluetooth and wireless.
Current release - new code bugs:
- ptp: expose raw cycles only for clocks with free-running counter
- bonding: fix null-deref in actor_port_prio setting
- mdio: ERR_PTR-check regmap pointer returned by
device_node_to_regmap()
- eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG
Previous releases - regressions:
- virtio_net: fix perf regression due to bad alignment of
virtio_net_hdr_v1_hash
- Revert "wifi: ath10k: avoid unnecessary wait for service ready
message" caused regressions for QCA988x and QCA9984
- Revert "wifi: ath12k: Fix missing station power save configuration"
caused regressions for WCN7850
- eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
corruptions after kexec
Previous releases - always broken:
- virtio-net: fix received packet length check for big packets
- sctp: fix races in socket diag handling
- wifi: add an hrtimer-based delayed work item to avoid low
granularity of timers set relatively far in the future, and use it
where it matters (e.g. when performing AP-scheduled channel switch)
- eth: mlx5e:
- correctly propagate error in case of module EEPROM read failure
- fix HW-GRO on systems with PAGE_SIZE == 64kB
- dsa: b53: fixes for tagging, link configuration / RMII, FDB,
multicast
- phy: lan8842: implement latest errata"
* tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
selftests/vsock: avoid false-positives when checking dmesg
net: bridge: fix MST static key usage
net: bridge: fix use-after-free due to MST port state bypass
lan966x: Fix sleeping in atomic context
bonding: fix NULL pointer dereference in actor_port_prio setting
net: dsa: microchip: Fix reserved multicast address table programming
net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
net: libwx: fix device bus LAN ID
net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
net/mlx5e: SHAMPO, Fix skb size check for 64K pages
net/mlx5e: SHAMPO, Fix header mapping for 64K pages
net: ti: icssg-prueth: Fix fdb hash size configuration
net/mlx5e: Fix return value in case of module EEPROM read error
net: gro_cells: Reduce lock scope in gro_cell_poll
libie: depend on DEBUG_FS when building LIBIE_FWLOG
wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
netpoll: Fix deadlock in memory allocation under spinlock
net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
virtio-net: fix received length check in big packets
bnxt_en: Fix warning in bnxt_dl_reload_down()
...
Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.
Fixes: a4a65c6fe0 ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add C-level selftests for indirect jumps to validate LLVM and libbpf
functionality. The tests are intentionally disabled, to be run
locally by developers, but will not make the CI red.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-13-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a specific test for instructions arrays with blinding enabled.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-7-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the following selftests for new insn_array map:
* Incorrect instruction indexes are rejected
* Two programs can't use the same map
* BPF progs can't operate the map
* no changes to code => map is the same
* expected changes when instructions are added
* expected changes when instructions are deleted
* expected changes when multiple functions are present
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-5-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding test that verifies we get expected initial 2 entries from
stacktrace for rawtp probe via ORC unwind.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Adding test that attaches kprobe/kretprobe multi and verifies the
ORC stacktrace matches expected functions.
Adding bpf_testmod_stacktrace_test function to bpf_testmod kernel
module which is called through several functions so we get reliable
call path for stacktrace.
The test is only for ORC unwinder to keep it simple.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
We now have an RCU_EXPERT config for testing small-sized RCU dynticks
counter: CONFIG_RCU_DYNTICKS_TORTURE.
Modify scenario TREE04 to exercise to use this config in order to test a
ridiculously small counter (2 bits).
Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds "inplace" and "inplace-force" values to the kvm-again.sh
"--link" argument, which causes the run's output to be placed into the
build directory. This could be used to save build time if the machine
went down partway into a run, but it can also be used to do a large
number of builds, and run the resulting kernels concurrently even if the
builds are based on different commits. A later commit will add this
latter capability to kvm-series.sh in order to produce large speedups
for branch-checking operations.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds a kvm-series.sh script that takes a list of scenarios and
a list of commits, and then runs each scenario on all of the commits.
A given scenario is run on all the commits before advancing to the
next scenario to minimize build times. The successes and failures are
summarized at the end of the run.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Store the tools/testing/selftests/vfio/lib outputs (e.g. object files)
in $(OUTPUT)/libvfio rather than in $(OUTPUT)/lib. This is in
preparation for building the VFIO selftests library into the KVM
selftests (see Link below).
Specifically this will avoid name conflicts between
tools/testing/selftests/{vfio,kvm/lib and also avoid leaving behind
empty directories under tools/testing/selftests/kvm after a make clean.
Link: https://lore.kernel.org/kvm/20250912222525.2515416-2-dmatlack@google.com/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250922224857.2528737-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
The iommufd self test uses an xarray to store the pfns and their orders to
emulate a page table. Make it act more like a real iommu driver by
replacing the xarray with an iommupt based page table. The new AMDv1 mock
format behaves similarly to the xarray.
Add set_dirty() as a iommu_pt operation to allow the test suite to
simulate HW dirty.
Userspace can select between several formats including the normal AMDv1
format and a special MOCK_IOMMUPT_HUGE variation for testing huge page
dirty tracking. To make the dirty tracking test work the page table must
only store exactly 2M huge pages otherwise the logic the test uses
fails. They cannot be broken up or combined.
Aside from aligning the selftest with a real page table implementation,
this helps test the iommupt code itself.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl().
This makes bpf_stream_vprintk() follow the already established "_impl"
suffix-based naming convention for kfuncs with the bpf_prog_aux
argument provided by the verifier implicitly. This convention will be
taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to
preserve backwards compatibility to BPF programs.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Rename:
bpf_task_work_schedule_resume()->bpf_task_work_schedule_resume_impl()
bpf_task_work_schedule_signal()->bpf_task_work_schedule_signal_impl()
This aligns task work scheduling kfuncs with the established naming
scheme for kfuncs with the bpf_prog_aux argument provided by the
verifier implicitly. This convention will be taken advantage of with the
upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to
BPF programs.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-1-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.
In other words, it means that if the 'server' listens on a specific
address / device, it cannot accept MP_JOIN sent to a different address /
device. Except if there is another MPTCP listening socket accepting
them.
This is what the new tests are validating:
- Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
to announced addresses are not accepted.
- Also forcing a bind on the main v4/v6 address, but before, another
listening socket is created to accept additional subflows. Note that
'mptcpize run nc -l' -- or something else only doing: socket(MPTCP),
bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is
reused not to depend on another tool just for that.
- Same as the previous one, but using v6 link-local addresses: this is
a bit particular because it is required to specify the outgoing
network interface when connecting to a link-local address announced
by the other peer. When using the routing rules, this doesn't work
(the outgoing interface is not known) ; but it does work with a
'laminar' endpoint having a specified interface.
Note that extra small modifications are needed for these tests to work:
- mptcp_connect's check_getpeername_connect() check should strip the
specified interface when comparing addresses.
- With IPv6 link-local addresses, it is required to wait for them to
be ready (no longer in 'tentative' mode) before using them, otherwise
the bind() will not be allowed.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The same extra long commands are present twice, with small differences:
the variable for the stdin file is different.
Use new dedicated variables in one command to avoid this code
duplication.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-3-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Our documentation is saying that the in-kernel PM is only using fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.
This is confusing, and it seems clearer not to have differences
depending on the address family.
So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.
One selftest needs to be adapted for this behaviour change.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Write raw BTF to files, parse it and compare to original;
this allows us to test parsing of (multi-)split BTF code.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251104203309.318429-3-alan.maguire@oracle.com
Verify that when using simple socket-based coredump (@ pattern),
the coredump_signal field is correctly exposed as SIGABRT.
Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-22-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Verify that when using simple socket-based coredump (@ pattern),
the coredump_signal field is correctly exposed as SIGSEGV.
Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-21-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
If we crash multiple processes at the same time we may run out of space.
Just ignore those errors. They're not actually all that relevant for the
test.
Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-20-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
PIDFD_INFO_COREDUMP is only retrievable until the task has exited. After
it has exited task->mm is NULL. So if the task didn't actually coredump
we can't retrieve it's dumpability settings anymore. Only if the task
did coredump will we have stashed the coredump information in the
respective struct pid.
Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-15-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Verify that when PIDFD_INFO_SUPPORTED_MASK is requested, the kernel
returns the supported_mask field indicating which flags the kernel
supports.
Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-10-ca449b7b7aa0@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add testcase to run busy poll test with threaded napi busy poll enabled.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test cases to verify the correctness of the BPF verifier's branch analysis
when conditional jumps are performed on the same scalar register. And make sure
that JGT does not trigger verifier BUG.
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251103063108.1111764-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Both livepatch and BPF trampoline use ftrace. Special attention is needed
when livepatch and fexit program touch the same function at the same
time, because livepatch updates a kernel function and the BPF trampoline
need to call into the right version of the kernel function.
Use samples/livepatch/livepatch-sample.ko for the test.
The test covers two cases:
1) When a fentry program is loaded first. This exercises the
modify_ftrace_direct code path.
2) When a fentry program is loaded first. This exercises the
register_ftrace_direct code path.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
KASAN reports a global-out-of-bounds access when running these nfit
tests: clear.sh, pmem-errors.sh, pfn-meta-errors.sh, btt-errors.sh,
daxdev-errors.sh, and inject-error.sh.
[] BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x769f/0x7840 [nfit_test]
[] Read of size 4 at addr ffffffffc03ea01c by task ndctl/1215
[] The buggy address belongs to the variable:
[] handle+0x1c/0x1df4 [nfit_test]
nfit_test_search_spa() uses handle[nvdimm->id] to retrieve a device
handle and triggers a KASAN error when it reads past the end of the
handle array. It should not be indexing the handle array at all.
The correct device handle is stored in per-DIMM test data. Each DIMM
has a struct nfit_mem that embeds a struct acpi_nfit_memdev that
describes the NFIT device handle. Use that device handle here.
Fixes: 10246dc84d ("acpi nfit: nfit_test supports translate SPA")
Cc: stable@vger.kernel.org
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>> ---
Link: https://patch.msgid.link/20251031234227.1303113-1-alison.schofield@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Pull cgroup/for-6.19 to receive:
16dad7801a ("cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()")
260fbcb92b ("cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()")
d245698d72 ("cgroup: Defer task cgroup unlink until after the task is done switching out")
These are needed for the sched_ext cgroup exit ordering fix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Rename "guest_paddr" variables in vm_userspace_mem_region_add() and
vm_mem_add() to KVM's de facto standard "gpa", both for consistency and
to shorten line lengths.
Opportunistically fix the indentation of the
vm_userspace_mem_region_add() declaration.
Link: https://patch.msgid.link/20251007223625.369939-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Stress tests for namespace active reference counting.
These tests validate that the active reference counting system can
handle high load scenarios including rapid namespace
creation/destruction, large numbers of concurrent namespaces, and
various edge cases under stress.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-71-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that namespaces become inactive after subprocess with multiple
threads exits. Create a subprocess that unshares user and network
namespaces, then creates two threads that share those namespaces. Verify
that after all threads and subprocess exit, the namespaces are no longer
listed by listns() and cannot be opened by open_by_handle_at().
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-69-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that a namespace remains active while a thread holds an fd to it.
Even after the thread exits, the namespace should remain active as long
as another thread holds a file descriptor to it.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-68-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test multi-level namespace resurrection across three user namespace levels.
This test creates a complex namespace hierarchy with three levels of user
namespaces and a network namespace at the deepest level. It verifies that
the resurrection semantics work correctly when SIOCGSKNS is called on a
socket from an inactive namespace tree, and that listns() and
open_by_handle_at() correctly respect visibility rules.
Hierarchy after child processes exit (all with 0 active refcount):
net_L3A (0) <- Level 3 network namespace
|
+
userns_L3 (0) <- Level 3 user namespace
|
+
userns_L2 (0) <- Level 2 user namespace
|
+
userns_L1 (0) <- Level 1 user namespace
|
x
init_user_ns
The test verifies:
1. SIOCGSKNS on a socket from inactive net_L3A resurrects the entire chain
2. After resurrection, all namespaces are visible in listns()
3. Resurrected namespaces can be reopened via file handles
4. Closing the netns FD cascades down: the entire ownership chain
(userns_L3 -> userns_L2 -> userns_L1) becomes inactive again
5. Inactive namespaces disappear from listns() and cannot be reopened
6. Calling SIOCGSKNS again on the same socket resurrects the tree again
7. After second resurrection, namespaces are visible and can be reopened
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-66-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test combined listns() and file handle operations with socket-kept
netns. Create a netns, keep it alive with a socket, verify it appears in
listns(), then reopen it via file handle obtained from listns() entry.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-65-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that socket-kept netns can be reopened via file handle.
Verify that a network namespace kept alive by a socket FD can be
reopened using file handles even after the creating process exits.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-64-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that socket-kept netns appears in listns() output.
Verify that a network namespace kept alive by a socket FD appears in
listns() output even after the creating process exits, and that it
disappears when the socket is closed.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-63-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that user namespace as a child also propagates correctly.
Create user_A -> user_B, verify when user_B is active that user_A
is also active. This is different from non-user namespace children.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-36-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test that parent stays active as long as ANY child is active.
Create parent user namespace with two child net namespaces.
Parent should remain active until BOTH children are inactive.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-35-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test hierarchical propagation with deep namespace hierarchy.
Create: init_user_ns -> user_A -> user_B -> net_ns
When net_ns is active, both user_A and user_B should be active.
This verifies the conditional recursion in __ns_ref_active_put() works.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-34-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test hierarchical active reference propagation.
When a child namespace is active, its owning user namespace should also
be active automatically due to hierarchical active reference propagation.
This ensures parents are always reachable when children are active.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-29-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Test namespace lifecycle: create a namespace in a child process, get a
file handle while it's active, then try to reopen after the process
exits (namespace becomes inactive).
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-24-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a loadable test module that validates CXL address translation
calculations using parameterized test vectors. The module tests both
host-to-device and device-to-host address translations for Modulo and
XOR interleave arithmetic.
Two types of testing are provided:
1. Parameterized test vectors:
Test vectors are passed as module parameters in the format:
"dpa pos r_eiw r_eig hb_ways math expected_spa".
Round-trip validation is performed:
- Translate a DPA and position to a SPA
- Verify the result matches expected SPA
- Translate that SPA back to a DPA and position
- Verify round-trip consistency
2. Internal validation testing:
When no test vectors are provided, the module performs validation
of the translation functions by checking parameter boundaries and
running 10,000 iterations of randomly generated valid parameters
to exercise the core calculation functions.
The module uses the CXL Driver translation functions through symbols
exported exclusively for cxl_translate.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
devm_cxl_port_enumerate_dports() is not longer used after below commit
commit 4f06d81e7c ("cxl: Defer dport allocation for switch ports")
Delete it and the relevant interface implemented in cxl_test.
Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Refactor ublk_thread to be a thread-local variable instead of storing
it in ublk_dev:
- Remove pthread_t thread field from struct ublk_thread and move it to
struct ublk_thread_info
- Remove struct ublk_thread array from struct ublk_dev, reducing memory
footprint
- Define struct ublk_thread as local variable in __ublk_io_handler_fn()
instead of accessing it from dev->threads[]
- Extract main IO handling logic into __ublk_io_handler_fn() which is
marked as noinline
- Move CPU affinity setup to ublk_io_handler_fn() before calling
__ublk_io_handler_fn()
- Update ublk_thread_set_sched_affinity() to take struct ublk_thread_info *
instead of struct ublk_thread *, and use pthread_setaffinity_np()
instead of sched_setaffinity()
- Reorder struct ublk_thread fields to group related state together
This change makes each thread's ublk_thread structure truly local to
the thread, improving cache locality and reducing memory usage.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move ublk_thread_set_sched_affinity() call before ublk_thread_init()
to ensure memory allocations during thread initialization occur on
the correct NUMA node. This leverages Linux's first-touch memory
policy for better NUMA locality.
Also convert ublk_thread_set_sched_affinity() to use
pthread_setaffinity_np() instead of sched_setaffinity(), as the
pthread API is the proper interface for setting thread affinity in
multithreaded programs.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Surprisingly we forgot to add this common one. It was added with a
per-arch guard allowing to later implement it in arch-specific asm
code like was done for a few other ones.
The test verifies that we don't search past the indicated length.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
The script "ethtool-common.sh" is not installed in INSTALL_PATH, and
triggers some errors when I try to run the test
'drivers/net/netdevsim/ethtool-coalesce.sh':
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
# ./ethtool-coalesce.sh: line 25: make_netdev: command not found
# ethtool: bad command line argument(s)
# ./ethtool-coalesce.sh: line 124: check: command not found
# ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
# FAILED /0 checks
not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1
Install this file to avoid this error. After this patch:
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# PASSED all 22 checks
ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh
Fixes: fbb8531e58 ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251030040340.3258110-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GRO self-test, gro.c, currently constructs IPv6 packets containing a
Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path
correctly handles IPv6 extension headers.
However, network elements may be configured to drop packets with the
Hop-by-Hop Options header (HBH). This causes the self-test to fail
in environments where such network elements are present.
To improve the robustness and reliability of this test in diverse
network environments, switch from using IPPROTO_HOPOPTS to
IPPROTO_DSTOPTS (Destination Options).
The Destination Options header is less likely to be dropped by
intermediate routers and still serves the core purpose of the test:
validating GRO's handling of an IPv6 extension header. This change
ensures the test can execute successfully without being incorrectly
failed by network policies outside the kernel's control.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030060436.1556664-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to the gro_sender sending data packets and FIN packets
in very quick succession, these are received almost simultaneously
by the gro_receiver. FIN packets are sometimes processed before the
data packets leading to intermittent (~1/100) test failures.
This change adds a delay of 100ms before sending FIN packets
in gro:tcp test to avoid the out-of-order delivery. The same
mitigation already exists for the gro:ip test.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030062818.1562228-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
test_tc_tunnel is missing checks on any open_netns. Add those checks
anytime we try to enter a net namespace, and skip the related operations
if we fail. While at it, reduce the number of open_netns/close_netns for
cases involving operations in two distinct namespaces: the test
currently does the following:
nstoken = open_netns("foo")
do_operation();
close(nstoken);
nstoken = open_netns("bar")
do_another_operation();
close(nstoken);
As already stated in reviews for the initial test, we don't need to go
back to the root net namespace to enter a second namespace, so just do:
ntoken_client = open_netns("foo")
do_operation();
nstoken_server = open_netns("bar")
do_another_operation();
close(nstoken_server);
close(nstoken_client);
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-2-0ffe44d27eda@bootlin.com
A subtest setup can fail in a wide variety of ways, so make sure not to
run it if an issue occurs during its setup. The return value is
already representing whether the setup succeeds or fails, it is just
about wiring it.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-1-0ffe44d27eda@bootlin.com
- Fix overflows in vfio type1 backend for mappings at the end of the
64-bit address space, resulting in leaked pinned memory. New
selftest support included to avoid such issues in the future.
(Alex Mastro)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmkFF0YRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyLZsQ/+OzFOEOOw5UIpVrMsnp7DmLOSYgiorlpS
q9jtsP99dh4U1JqJaONAGb6uu1SL6yX6VL3QZNEppPY83WXVSxX44lo3qpTZ90iq
xpeUTNHv/YOhAYJ+eXbGfjoJsxQqRLECkWyJwQAQl227yRsqWzAR5twRhBrLi9Gf
hd6d+ZQuTbEQo3oa7bNA2Q+EUevn7gEiNEBAz6L+zz0DWiVrd+XEk4N+37MR7UKr
6og/p3lI6AyjFG3dqUGBB2uUVfCGzdCXi6F01reZm3q7aPUdO+PuP7MnYjpJ5j4n
Ph3qN5iZ+bF24wXtZEGs+NDvaNXsfk5RoPxjc62DeYRG7WfH83KZCyrQju6vuCsd
CFirCf/97MK/LdDaj46nOi37vNQGap4EibmJTI9hMRh2x2RRlxTVdOv+j/lkQAEr
GgNwPMZO1yp2tJKQ5DJEgaXgQyOfEpWAtG2qbVCXHUc6lzsN2HQTpIDSbXsZdpfK
IuCLtmaR20tfUoRQJYCryb2hww2+6Z5j6W+CDdZpCfkqmQnsC0duoebru1g32vc0
20PrVE/xMMtw+I1CBTDLWFDLs/M8sg0AbBicoHFXh1Ea8n9EC3b/aczPk0ff0b9g
x2BcDBaoCyslE7pSpeCwemCrACtreXu7Ttzb5CgAIQAuBcZCvKAFGC80KGOt8Tbh
M0bQYDkeouM=
=sfMw
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Fix overflows in vfio type1 backend for mappings at the end of the
64-bit address space, resulting in leaked pinned memory.
New selftest support included to avoid such issues in the future
(Alex Mastro)
* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
vfio: selftests: add end of address space DMA map/unmap tests
vfio: selftests: update DMA map/unmap helpers to support more test kinds
vfio/type1: handle DMA map/unmap up to the addressable limit
vfio/type1: move iova increment to unmap_unpin_*() caller
vfio/type1: sanitize for overflow using check_*_overflow()
test_xsk.c isn't part of the test_progs framework.
Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.
Move test_xsk{.c/.h} to prog_tests/.
Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-15-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their
flakyness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long
Remove these tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a skip_ci_tests table to hold them.
Use this skip_ci table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-14-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.
Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.
Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-13-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.
Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-12-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.
Add return value to functions that can fail.
Handle failures more smoothly through report_failure().
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-11-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.
Return TEST_FAILURE instead of calling exit_on_error().
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-10-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.
Add a return value to the functions handling XDP attachments to handle
errors more smoothly.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-9-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI
Add a return value to init_iface() so errors can be handled more
smoothly.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-8-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.
Clean the sockets and the umem in case of swap_xsk_resources() failure.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-7-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.
Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-6-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.
Release the missing pkt_stream at the end of testapp_xdp_shared_umem()
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-5-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.
Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-4-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.
Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-3-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bitmap is used before being initialized.
Initialize it to zero before using it.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-2-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>