Commit Graph

22457 Commits

Author SHA1 Message Date
Jiayuan Chen
8dd1bdde38 selftests/bpf: add test for xdp_master_redirect with bond not up
Add a selftest that reproduces the null-ptr-deref in
bond_rr_gen_slave_id() when XDP redirect targets a bond device in
round-robin mode that was never brought up. The test verifies the fix
by ensuring no crash occurs.

Test setup:
- bond0: active-backup mode, UP, with native XDP (enables
  bpf_master_redirect_enabled_key globally)
- bond1: round-robin mode, never UP
- veth1: slave of bond1, with generic XDP (XDP_TX)
- BPF_PROG_TEST_RUN with live frames triggers the redirect path

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260411005524.201200-3-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-14 10:39:24 +02:00
Christian Brauner
cad3bf1c33
selftests/namespaces: remove unused utils.h include from listns_efault_test
Remove the inclusion of ../filesystems/utils.h from listns_efault_test.c.
The test doesn't use any symbols from that header. Including it alongside
../pidfd/pidfd.h causes a build failure because both headers define
wait_for_pid() with conflicting linkage:

  ../filesystems/utils.h:  extern int wait_for_pid(pid_t pid);
  ../pidfd/pidfd.h:        static inline int wait_for_pid(pid_t pid)

All symbols the test actually uses (create_child, read_nointr,
write_nointr, sys_pidfd_send_signal) come from pidfd.h.

Reported-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/all/acPV19IY3Gna6Ira@sirena.org.uk
Fixes: 07d7ad46da ("selftests/namespaces: test for efault")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-14 09:31:18 +02:00
Christian Brauner
660c09404c
selftests/fsmount_ns: add missing TARGETS and fix cap test
Add missing top-level kselftest TARGETS entries for empty_mntns and
fsmount_ns so that 'make kselftest' discovers and runs these tests.

Fix requires_cap_sys_admin test which always SKIPped because fsopen()
was called after enter_userns(), where CAP_SYS_ADMIN in the mount
namespace's user_ns is unavailable. Move fsopen/fsconfig before fork so
the configured fs_fd is inherited by the child, which then only needs to
call fsmount() after dropping privileges.

Fixes: 3ac7ea91f3 ("selftests: add FSMOUNT_NAMESPACE tests")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-14 09:31:07 +02:00
Christian Brauner
d38aa6cdee
selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment
CLONE_EMPTY_MNTNS is (1ULL << 37) = 0x2000000000ULL, not 0x400000000ULL.

Fixes: 5b8ffd63fb ("selftests/filesystems: add clone3 tests for empty mount namespaces")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-14 09:30:57 +02:00
Christian Brauner
1a398a2378
selftests/empty_mntns: fix statmount_alloc() signature mismatch
empty_mntns.h includes ../statmount/statmount.h which provides a
4-argument statmount_alloc(mnt_id, mnt_ns_id, mask, flags), but then
redefines its own 3-argument version without the flags parameter. This
causes a build failure due to conflicting types.

Remove the duplicate definition from empty_mntns.h and update all
callers to pass 0 for the flags argument.

Fixes: 32f54f2bbc ("selftests/filesystems: add tests for empty mount namespaces")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-14 09:30:48 +02:00
Christian Brauner
a27e464262
selftests/statmount: remove duplicate wait_for_pid()
Remove the local static wait_for_pid() definition from
statmount_test_ns.c as it conflicts with the extern declaration in
utils.h. The identical function is already provided by utils.c.

Fixes: 3ac7ea91f3 ("selftests: add FSMOUNT_NAMESPACE tests")
Cc: <stable@kernel.org> # mainline only
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-14 09:30:31 +02:00
Linus Torvalds
1334d2a3b3 gpio updates for v7.1-rc1
GPIO core:
 - defer probe on software node lookups when the remote software node
   exists but has not been registered as a firmware node yet
 - unify GPIO hog handling by moving code duplicated in OF and ACPI
   modules into GPIO core and allow setting up hogs with software nodes
 - allow matching GPIO controllers by secondary firmware node if matching
   by primary does not succeed
 - demote deferral warnings to debug level as they are quite normal when
   using software nodes which don't support fw_devlink yet
 - disable the legacy GPIO character device uAPI v1 supprt in Kconfig by
   default
 - rework several core functions in preparation for the upcoming Revocable
   helper library for protecting resources against sudden removal, this
   reduces the number of SRCU dereferences in GPIO core
 - simplify file descriptor logic in GPIO character device code by using
   FD_PREPARE()
 - introduce a header defining symbols used by both GPIO consumers and
   providers to avoid having to include provider-specific headers from
   drivers which only consume GPIOs
 - replace snprintf() with strscpy() where formatting is not required
 
 New drivers:
 - add the gpio-by-pinctrl generic driver using the ARM SCMI protocol to
   control GPIOs (along with SCMI changes pulled from the pinctrl tree)
 - add a driver providing support for handling of platform events via
   GPIO-signalled ACPI events (used on Intel Nova Lake and later platforms)
 
 Driver changes:
 - extend the gpio-kempld driver with support for more recent models,
   interrupts and setting/getting multiple values at once
 - improve interrupt handling in gpio-brcmstb
 - add support for multi-SoC systems in gpio-tegra186
 - make sure we return correct values from the .get() callbacks in several
   GPIO drivers by normalizing any values other than 0, 1 or negative error
   numbers
 - use flexible arrays in several drivers to reduce the number of required
   memory allocations
 - simplify synchronous waiting for virtual drivers to probe and remove the
   dedicated, a bit overengineered helper library dev-sync-probe
 - remove unneeded Kconfig dependencies on OF_GPIO in several drivers and
   subsystems
 - convert the two remaining users of of_get_named_gpio() to using GPIO
   descriptors and remove the (no longer used) function along with the
   header that declares it
 - add missing includes in gpio-mmio
 - shrink and simplify code in gpio-max732x by using guard(mutex)
 - remove duplicated code handling the 'ngpios' property from gpio-ts4800,
   it's already handled in GPIO core
 - use correct variable type in gpio-aspeed
 - add support for a new model in gpio-realtek-otto
 - allow to specify the active-low setting of simulated hogs over the
   configfs interface (in addition to existing devicetree support) in
   gpio-sim
 
 Bug fixes:
 - clear the OF_POPULATED flag on hog nodes in GPIO chip remove path on
   OF systems
 - fix resource leaks in error path in gpiochip_add_data_with_key()
 - drop redundant device reference in gpio-mpsse
 
 Tests:
 - add selftests for use-after-free cases in GPIO character device code
 
 DT bindings:
 - add a DT binding document for SCMI based, gpio-over-pinctrl devices
 - fix interrupt description in microchip,mpfs-gpio
 - add new compatible for gpio-realtek-otto
 - describe the resets of the mpfs-gpio controller
 - fix maintainer's email in gpio-delay bindings
 - remove the binding document for cavium,thunder-8890 as the corresponding
   device is bound over PCI and not firmware nodes
 
 Documentation:
 - update the recommended way of converting legacy boards to using software
   nodes for GPIO description
 - describe GPIO line value semantics
 - misc updates to kerneldocs
 
 Misc:
 - convert OMAP1 ams-delta board to using GPIO hogs described with software
   nodes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEkeUTLeW1Rh17omX8BZ0uy/82hMMFAmnYsngACgkQBZ0uy/82
 hMO+Tw/+N8eX1GOWkEdZBZRzd6QW+2qjmeMgvizMUu2CzfYpcuO9kpOqiSjguj2S
 6hODOajwU6EDdrxEHy7zJrAc6Tw1xIQeCnmSsFjC+4OePsCP6bU0QBgdb0V5waIx
 3kEdBsM3Msw48SCDFTUQJ0XyUD/VP4TZXzhDmU0X1OJsGYDSMchiBytNpZrFt18q
 qTq/NJm6mT6h5XlTeTCmfBcf/TG7MZhAPzXw8YZp+ZIgDsTRtD/P6CAZgJ0OU9f4
 MQwJO5+JFkTO7XhL+qOfJcnKnC2lRHaa7mJiSQ+XS43NOqO7NGeGH2l7hU/Lx/fR
 NIIZk27uBRV1akpnMGtgbqL2A8SFeH5yj3/o6S4rp9IzDwOouN+1seaL2RUHpTns
 TgIm037MNIZI8eQ2lSA9/+f4vwF1bml8mA/6lVBGHI9ZcaZbcUyWjXPcsuVIyiqY
 HlV+A3sVjchpaH9Eie78nbVm2X7Wm5slEazXAl3zVjlekQut+Fp+xoBWwulEjp+H
 7PZXqLP2hV/Xw4C2mn/zwwokzj+1S1DPW5Inn5Y7Qi6/j4GmvdVmI1zBculrf7jj
 GE6UUz+Vm9v6oE+q19jsksudVrDPapASYV/TLGQZk48IXhy+KxB8lep6L2rEAQtS
 YgBltnjJlbrx99u1sZkPECzgCRYSgm59Lt0aOv93CL2fURI1Seg=
 =VknJ
 -----END PGP SIGNATURE-----

Merge tag 'gpio-updates-for-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
 "For this merge window we have two new drivers: support for
  GPIO-signalled ACPI events on Intel platforms and a generic
  GPIO-over-pinctrl driver using the ARM SCMI protocol for
  controlling pins.

  Several things have been reworked in GPIO core: we unduplicated GPIO
  hog handling, reduced the number of SRCU locks and dereferences,
  improved support for software-node-based lookup and removed more
  legacy code after converting remaining users to modern alternatives.

  There's also a number of driver reworks and refactoring, documentation
  updates, some bug-fixes and new tests.

  GPIO core:
   - defer probe on software node lookups when the remote software node
     exists but has not been registered as a firmware node yet
   - unify GPIO hog handling by moving code duplicated in OF and ACPI
     modules into GPIO core and allow setting up hogs with software
     nodes
   - allow matching GPIO controllers by secondary firmware node if
     matching by primary does not succeed
   - demote deferral warnings to debug level as they are quite normal
     when using software nodes which don't support fw_devlink yet
   - disable the legacy GPIO character device uAPI v1 supprt in Kconfig
     by default
   - rework several core functions in preparation for the upcoming
     Revocable helper library for protecting resources against sudden
     removal, this reduces the number of SRCU dereferences in GPIO core
   - simplify file descriptor logic in GPIO character device code by
     using FD_PREPARE()
   - introduce a header defining symbols used by both GPIO consumers and
     providers to avoid having to include provider-specific headers from
     drivers which only consume GPIOs
   - replace snprintf() with strscpy() where formatting is not required

  New drivers:
   - add the gpio-by-pinctrl generic driver using the ARM SCMI protocol
     to control GPIOs (along with SCMI changes pulled from the pinctrl
     tree)
   - add a driver providing support for handling of platform events via
     GPIO-signalled ACPI events (used on Intel Nova Lake and later
     platforms)

  Driver changes:
   - extend the gpio-kempld driver with support for more recent models,
     interrupts and setting/getting multiple values at once
   - improve interrupt handling in gpio-brcmstb
   - add support for multi-SoC systems in gpio-tegra186
   - make sure we return correct values from the .get() callbacks in
     several GPIO drivers by normalizing any values other than 0, 1 or
     negative error numbers
   - use flexible arrays in several drivers to reduce the number of
     required memory allocations
   - simplify synchronous waiting for virtual drivers to probe and
     remove the dedicated, a bit overengineered helper library
     dev-sync-probe
   - remove unneeded Kconfig dependencies on OF_GPIO in several drivers
     and subsystems
   - convert the two remaining users of of_get_named_gpio() to using
     GPIO descriptors and remove the (no longer used) function along
     with the header that declares it
   - add missing includes in gpio-mmio
   - shrink and simplify code in gpio-max732x by using guard(mutex)
   - remove duplicated code handling the 'ngpios' property from
     gpio-ts4800, it's already handled in GPIO core
   - use correct variable type in gpio-aspeed
   - add support for a new model in gpio-realtek-otto
   - allow to specify the active-low setting of simulated hogs over the
     configfs interface (in addition to existing devicetree support) in
     gpio-sim

  Bug fixes:
   - clear the OF_POPULATED flag on hog nodes in GPIO chip remove path
     on OF systems
   - fix resource leaks in error path in gpiochip_add_data_with_key()
   - drop redundant device reference in gpio-mpsse

  Tests:
   - add selftests for use-after-free cases in GPIO character device
     code

  DT bindings:
   - add a DT binding document for SCMI based, gpio-over-pinctrl devices
   - fix interrupt description in microchip,mpfs-gpio
   - add new compatible for gpio-realtek-otto
   - describe the resets of the mpfs-gpio controller
   - fix maintainer's email in gpio-delay bindings
   - remove the binding document for cavium,thunder-8890 as the
     corresponding device is bound over PCI and not firmware nodes

  Documentation:
   - update the recommended way of converting legacy boards to using
     software nodes for GPIO description
   - describe GPIO line value semantics
   - misc updates to kerneldocs

  Misc:
   - convert OMAP1 ams-delta board to using GPIO hogs described with
     software nodes"

* tag 'gpio-updates-for-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (79 commits)
  gpio: swnode: defer probe on references to unregistered software nodes
  dt-bindings: gpio: cavium,thunder-8890: Remove DT binding
  Documentation: gpio: update the preferred method for using software node lookup
  gpio: gpio-by-pinctrl: s/used to do/is used to do/
  gpio: aspeed: fix unsigned long int declaration
  gpio: rockchip: convert to dynamic GPIO base allocation
  gpio: remove dev-sync-probe
  gpio: virtuser: stop using dev-sync-probe
  gpio: aggregator: stop using dev-sync-probe
  gpio: sim: stop using dev-sync-probe
  gpio: Add Intel Nova Lake ACPI GPIO events driver
  gpiolib: Make deferral warnings debug messages
  gpiolib: fix hogs with multiple lines
  gpio: fix up CONFIG_OF dependencies
  gpio: gpio-by-pinctrl: add pinctrl based generic GPIO driver
  gpio: dt-bindings: Add GPIO on top of generic pin control
  firmware: arm_scmi: Allow PINCTRL_REQUEST to return EOPNOTSUPP
  pinctrl: scmi: ignore PIN_CONFIG_PERSIST_STATE
  pinctrl: scmi: Delete PIN_CONFIG_OUTPUT_IMPEDANCE_OHMS support
  pinctrl: scmi: Add SCMI_PIN_INPUT_VALUE
  ...
2026-04-13 20:10:58 -07:00
Linus Torvalds
4793dae01f Driver core changes for 7.1-rc1
- debugfs:
   - Fix NULL pointer dereference in debugfs_create_str()
   - Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str()
   - Fix soundwire debugfs NULL pointer dereference from uninitialized
     firmware_file
 
 - device property:
   - Make fwnode flags modifications thread safe; widen the field to
     unsigned long and use set_bit() / clear_bit() based accessors
   - Document how to check for the property presence
 
 - devres:
   - Separate struct devres_node from its "subclasses" (struct devres,
     struct devres_group); give struct devres_node its own release and
     free callbacks for per-type dispatch
   - Introduce struct devres_action for devres actions, avoiding the
     ARCH_DMA_MINALIGN alignment overhead of struct devres
   - Export struct devres_node and its init/add/remove/dbginfo
     primitives for use by Rust Devres<T>
   - Fix missing node debug info in devm_krealloc()
   - Use guard(spinlock_irqsave) where applicable; consolidate unlock
     paths in devres_release_group()
 
 - driver_override:
   - Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the
     generic driver_override infrastructure, replacing per-bus
     driver_override strings, sysfs attributes, and match logic; fixes
     a potential UAF from unsynchronized access to driver_override in
     bus match() callbacks
   - Simplify __device_set_driver_override() logic
 
 - kernfs:
   - Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs
     file and directory removal
   - Add corresponding selftests for memcg
 
 - platform:
   - Allow attaching software nodes when creating platform devices via
     a new 'swnode' field in struct platform_device_info
   - Add kerneldoc for struct platform_device_info
 
 - software node:
   - Move software node initialization from postcore_initcall() to
     driver_init(), making it available early in the boot process
   - Move kernel_kobj initialization (ksysfs_init) earlier to support
     the above
   - Remove software_node_exit(); dead code in a built-in unit
 
 - SoC:
   - Introduce of_machine_read_compatible() and of_machine_read_model()
     OF helpers and export soc_attr_read_machine() to replace direct
     accesses to of_root from SoC drivers; also enables
     CONFIG_COMPILE_TEST coverage for these drivers
 
 - sysfs:
   - Constify attribute group array pointers to
     'const struct attribute_group *const *' in sysfs functions,
     device_add_groups() / device_remove_groups(), and struct class
 
 - Rust:
   - Devres:
     - Embed struct devres_node directly in Devres<T> instead of going
       through devm_add_action(), avoiding the extra allocation and
       the unnecessary ARCH_DMA_MINALIGN alignment
 
   - I/O:
     - Turn IoCapable from a marker trait into a functional trait
       carrying the raw I/O accessor implementation (io_read /
       io_write), providing working defaults for the per-type Io
       methods
     - Add RelaxedMmio wrapper type, making relaxed accessors usable
       in code generic over the Io trait
     - Remove overloaded per-type Io methods and per-backend macros
       from Mmio and PCI ConfigSpace
 
   - I/O (Register):
     - Add IoLoc trait and generic read/write/update methods to the Io
       trait, making I/O operations parameterizable by typed locations
     - Add register! macro for defining hardware register types with
       typed bitfield accessors backed by Bounded values; supports
       direct, relative, and array register addressing
     - Add write_reg() / try_write_reg() and LocatedRegister trait
     - Update PCI sample driver to demonstrate the register! macro
 
         Example:
 
         ```
             register! {
                 /// UART control register.
                 CTRL(u32) @ 0x18 {
                     /// Receiver enable.
                     19:19   rx_enable => bool;
                     /// Parity configuration.
                     14:13   parity ?=> Parity;
                 }
 
                 /// FIFO watermark and counter register.
                 WATER(u32) @ 0x2c {
                     /// Number of datawords in the receive FIFO.
                     26:24   rx_count;
                     /// RX interrupt threshold.
                     17:16   rx_water;
                 }
             }
 
             impl WATER {
                 fn rx_above_watermark(&self) -> bool {
                     self.rx_count() > self.rx_water()
                 }
             }
 
             fn init(bar: &pci::Bar<BAR0_SIZE>) {
                 let water = WATER::zeroed()
                     .with_const_rx_water::<1>(); // > 3 would not compile
                 bar.write_reg(water);
 
                 let ctrl = CTRL::zeroed()
                     .with_parity(Parity::Even)
                     .with_rx_enable(true);
                 bar.write_reg(ctrl);
             }
 
             fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) {
                 if bar.read(WATER).rx_above_watermark() {
                     // drain the FIFO
                 }
             }
 
             fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) {
                 bar.update(CTRL, |r| r.with_parity(parity));
             }
         ```
 
   - IRQ:
     - Move 'static bounds from where clauses to trait declarations
       for IRQ handler traits
 
   - Misc:
     - Enable the generic_arg_infer Rust feature
     - Extend Bounded with shift operations, single-bit bool conversion,
       and const get()
 
 - Misc:
   - Make deferred_probe_timeout default a Kconfig option
   - Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM
     callbacks when no bus type PM ops are set
   - Add conditional guard support for device_lock()
   - Add ksysfs.c to the DRIVER CORE MAINTAINERS entry
   - Fix kernel-doc warnings in base.h
   - Fix stale reference to memory_block_add_nid() in documentation
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCadl5SwAKCRBFlHeO1qrK
 LpjDAQCSG3vYznwrngfpmRU5bCB9sdUy/pZiX5px1357+amJkwEA9LgIVQvtHAZW
 ZXcQ7Jr+mR3mJEdlatbkWHp3w1VHqAQ=
 =y1DV
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core updates from Danilo Krummrich:
 "debugfs:
   - Fix NULL pointer dereference in debugfs_create_str()
   - Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str()
   - Fix soundwire debugfs NULL pointer dereference from uninitialized
     firmware_file

  device property:
   - Make fwnode flags modifications thread safe; widen the field to
     unsigned long and use set_bit() / clear_bit() based accessors
   - Document how to check for the property presence

  devres:
   - Separate struct devres_node from its "subclasses" (struct devres,
     struct devres_group); give struct devres_node its own release and
     free callbacks for per-type dispatch
   - Introduce struct devres_action for devres actions, avoiding the
     ARCH_DMA_MINALIGN alignment overhead of struct devres
   - Export struct devres_node and its init/add/remove/dbginfo
     primitives for use by Rust Devres<T>
   - Fix missing node debug info in devm_krealloc()
   - Use guard(spinlock_irqsave) where applicable; consolidate unlock
     paths in devres_release_group()

  driver_override:
   - Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the
     generic driver_override infrastructure, replacing per-bus
     driver_override strings, sysfs attributes, and match logic; fixes a
     potential UAF from unsynchronized access to driver_override in bus
     match() callbacks
   - Simplify __device_set_driver_override() logic

  kernfs:
   - Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs file
     and directory removal
   - Add corresponding selftests for memcg

  platform:
   - Allow attaching software nodes when creating platform devices via a
     new 'swnode' field in struct platform_device_info
   - Add kerneldoc for struct platform_device_info

  software node:
   - Move software node initialization from postcore_initcall() to
     driver_init(), making it available early in the boot process
   - Move kernel_kobj initialization (ksysfs_init) earlier to support
     the above
   - Remove software_node_exit(); dead code in a built-in unit

  SoC:
   - Introduce of_machine_read_compatible() and of_machine_read_model()
     OF helpers and export soc_attr_read_machine() to replace direct
     accesses to of_root from SoC drivers; also enables
     CONFIG_COMPILE_TEST coverage for these drivers

  sysfs:
   - Constify attribute group array pointers to
     'const struct attribute_group *const *' in sysfs functions,
     device_add_groups() / device_remove_groups(), and struct class

  Rust:
   - Devres:
      - Embed struct devres_node directly in Devres<T> instead of going
        through devm_add_action(), avoiding the extra allocation and the
        unnecessary ARCH_DMA_MINALIGN alignment

   - I/O:
      - Turn IoCapable from a marker trait into a functional trait
        carrying the raw I/O accessor implementation (io_read /
        io_write), providing working defaults for the per-type Io
        methods
      - Add RelaxedMmio wrapper type, making relaxed accessors usable in
        code generic over the Io trait
      - Remove overloaded per-type Io methods and per-backend macros
        from Mmio and PCI ConfigSpace

   - I/O (Register):
      - Add IoLoc trait and generic read/write/update methods to the Io
        trait, making I/O operations parameterizable by typed locations
      - Add register! macro for defining hardware register types with
        typed bitfield accessors backed by Bounded values; supports
        direct, relative, and array register addressing
      - Add write_reg() / try_write_reg() and LocatedRegister trait
      - Update PCI sample driver to demonstrate the register! macro

         Example:

         ```
             register! {
                 /// UART control register.
                 CTRL(u32) @ 0x18 {
                     /// Receiver enable.
                     19:19   rx_enable => bool;
                     /// Parity configuration.
                     14:13   parity ?=> Parity;
                 }

                 /// FIFO watermark and counter register.
                 WATER(u32) @ 0x2c {
                     /// Number of datawords in the receive FIFO.
                     26:24   rx_count;
                     /// RX interrupt threshold.
                     17:16   rx_water;
                 }
             }

             impl WATER {
                 fn rx_above_watermark(&self) -> bool {
                     self.rx_count() > self.rx_water()
                 }
             }

             fn init(bar: &pci::Bar<BAR0_SIZE>) {
                 let water = WATER::zeroed()
                     .with_const_rx_water::<1>(); // > 3 would not compile
                 bar.write_reg(water);

                 let ctrl = CTRL::zeroed()
                     .with_parity(Parity::Even)
                     .with_rx_enable(true);
                 bar.write_reg(ctrl);
             }

             fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) {
                 if bar.read(WATER).rx_above_watermark() {
                     // drain the FIFO
                 }
             }

             fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) {
                 bar.update(CTRL, |r| r.with_parity(parity));
             }
         ```

   - IRQ:
      - Move 'static bounds from where clauses to trait declarations for
        IRQ handler traits

   - Misc:
      - Enable the generic_arg_infer Rust feature
      - Extend Bounded with shift operations, single-bit bool
        conversion, and const get()

  Misc:
   - Make deferred_probe_timeout default a Kconfig option
   - Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM
     callbacks when no bus type PM ops are set
   - Add conditional guard support for device_lock()
   - Add ksysfs.c to the DRIVER CORE MAINTAINERS entry
   - Fix kernel-doc warnings in base.h
   - Fix stale reference to memory_block_add_nid() in documentation"

* tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (67 commits)
  bus: fsl-mc: use generic driver_override infrastructure
  s390/ap: use generic driver_override infrastructure
  s390/cio: use generic driver_override infrastructure
  vdpa: use generic driver_override infrastructure
  platform/wmi: use generic driver_override infrastructure
  PCI: use generic driver_override infrastructure
  driver core: make software nodes available earlier
  software node: remove software_node_exit()
  kernel: ksysfs: initialize kernel_kobj earlier
  MAINTAINERS: add ksysfs.c to the DRIVER CORE entry
  drivers/base/memory: fix stale reference to memory_block_add_nid()
  device property: Document how to check for the property presence
  soundwire: debugfs: initialize firmware_file to empty string
  debugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str()
  debugfs: check for NULL pointer in debugfs_create_str()
  driver core: Make deferred_probe_timeout default a Kconfig option
  driver core: simplify __device_set_driver_override() clearing logic
  driver core: auxiliary bus: Drop auxiliary_dev_pm_ops
  device property: Make modifications of fwnode "flags" thread safe
  rust: devres: embed struct devres_node directly
  ...
2026-04-13 19:03:11 -07:00
Linus Torvalds
d568788baa hardening updates for v7.1-rc1
- randomize_kstack: Improve implementation across arches (Ryan Roberts)
 
 - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
 
 - refcount: Remove unused __signed_wrap function annotations
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCad16PwAKCRA2KwveOeQk
 u7crAP4qz8gXCjes76KsZm/YQS8PtOG5JroAVu5Oa4ohw0RfaQD+K/XLow1plcNF
 4Bi8zSuv2ifcLysh9qEAbx5+wcHijgo=
 =woB3
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - randomize_kstack: Improve implementation across arches (Ryan Roberts)

 - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test

 - refcount: Remove unused __signed_wrap function annotations

* tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
  refcount: Remove unused __signed_wrap function annotations
  randomize_kstack: Unify random source across arches
  randomize_kstack: Maintain kstack_offset per task
2026-04-13 17:52:29 -07:00
Linus Torvalds
cea4a90faf seccomp update for v7.1-rc1
- selftests: Add hard-coded __NR_uprobe for x86_64 (Oleg Nesterov)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCad146wAKCRA2KwveOeQk
 u9LrAP4hemiMrgvEIo2DPVJ8H+QiRZtsJvFLFC/yvdgrllxyUQEA24RjRKXoZcL4
 G9hD3FL31MTcbTV9Gdf0bOlLCqqUqg4=
 =7sDF
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp update from Kees Cook:

 - selftests: Add hard-coded __NR_uprobe for x86_64 (Oleg Nesterov)

* tag 'seccomp-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: Add hard-coded __NR_uprobe for x86_64
2026-04-13 17:45:25 -07:00
Linus Torvalds
d142ab35ee CRC updates for 7.1
- Several improvements related to crc_kunit, to align with the
   standard KUnit conventions and make it easier for developers and CI
   systems to run this test suite
 
 - Add an arm64-optimized implementation of CRC64-NVME
 
 - Remove unused code for big endian arm64
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadWAVxQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK4zEAP9Asz9U6aA8XdlYa66nSORpUz0zQxVL
 tIvSOaQis6K0LwEAm4xABXc+rrVHqYhePRRqDteAEOzhBOX9mwY8wPjR3Ag=
 =VknF
 -----END PGP SIGNATURE-----

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

 - Several improvements related to crc_kunit, to align with the standard
   KUnit conventions and make it easier for developers and CI systems to
   run this test suite

 - Add an arm64-optimized implementation of CRC64-NVME

 - Remove unused code for big endian arm64

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crc: arm64: Simplify intrinsics implementation
  lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
  lib/crc: arm64: Drop unnecessary chunking logic from crc64
  lib/crc: arm64: Assume a little-endian kernel
  lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
  lib/crc: arm64: Drop check for CONFIG_KERNEL_MODE_NEON
  crypto: crc32c - Remove another outdated comment
  crypto: crc32c - Remove more outdated usage information
  kunit: configs: Enable all CRC tests in all_tests.config
  lib/crc: tests: Add a .kunitconfig file
  lib/crc: tests: Add CRC_ENABLE_ALL_FOR_KUNIT
  lib/crc: tests: Make crc_kunit test only the enabled CRC variants
2026-04-13 17:36:04 -07:00
Linus Torvalds
370c388319 Crypto library updates for 7.1
- Migrate more hash algorithms from the traditional crypto subsystem
   to lib/crypto/.
 
   Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
   the implementations, improves performance, enables further
   simplifications in calling code, and solves various other issues:
 
     - AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
 
         - Support these algorithms in lib/crypto/ using the AES
           library and the existing arm64 assembly code
 
         - Reimplement the traditional crypto API's "cmac(aes)",
           "xcbc(aes)", and "cbcmac(aes)" on top of the library
 
         - Convert mac80211 to use the AES-CMAC library. Note: several
           other subsystems can use it too and will be converted later
 
         - Drop the broken, nonstandard, and likely unused support for
           "xcbc(aes)" with key lengths other than 128 bits
 
         - Enable optimizations by default
 
     - GHASH
 
         - Migrate the standalone GHASH code into lib/crypto/
 
         - Integrate the GHASH code more closely with the very similar
           POLYVAL code, and improve the generic GHASH implementation
           to resist cache-timing attacks and use much less memory
 
         - Reimplement the AES-GCM library and the "gcm" crypto_aead
           template on top of the GHASH library. Remove "ghash" from
           the crypto_shash API, as it's no longer needed
 
         - Enable optimizations by default
 
     - SM3
 
         - Migrate the kernel's existing SM3 code into lib/crypto/, and
           reimplement the traditional crypto API's "sm3" on top of it
 
         - I don't recommend using SM3, but this cleanup is worthwhile
           to organize the code the same way as other algorithms
 
 - Testing improvements
 
     - Add a KUnit test suite for each of the new library APIs
 
     - Migrate the existing ChaCha20Poly1305 test to KUnit
 
     - Make the KUnit all_tests.config enable all crypto library tests
 
     - Move the test kconfig options to the Runtime Testing menu
 
 - Other updates to arch-optimized crypto code
 
     - Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
 
     - Remove some MD5 implementations that are no longer worth keeping
 
     - Drop big endian and voluntary preemption support from the arm64
       code, as those configurations are no longer supported on arm64
 
 - Make jitterentropy and samples/tsm-mr use the crypto library APIs
 
 Note: the overall diffstat is neutral, but when the test code is
 excluded it is significantly negative:
 
     Tests:     13 files changed, 1982 insertions(+),  888 deletions(-)
     Non-test: 141 files changed, 2897 insertions(+), 3987 deletions(-)
     All:      154 files changed, 4879 insertions(+), 4875 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadWPyxQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK8QCAQD0i98miI1mu01RKuEwrBzmn7L/2sUH
 ReYV/dFDtnN0GwD+KMCiNAM2XTVLRKq5t3OxPHpKZ4y+gZwRowAJeFA02Q8=
 =5rip
 -----END PGP SIGNATURE-----

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library updates from Eric Biggers:

 - Migrate more hash algorithms from the traditional crypto subsystem to
   lib/crypto/

   Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
   the implementations, improves performance, enables further
   simplifications in calling code, and solves various other issues:

     - AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)

         - Support these algorithms in lib/crypto/ using the AES library
           and the existing arm64 assembly code

         - Reimplement the traditional crypto API's "cmac(aes)",
           "xcbc(aes)", and "cbcmac(aes)" on top of the library

         - Convert mac80211 to use the AES-CMAC library. Note: several
           other subsystems can use it too and will be converted later

         - Drop the broken, nonstandard, and likely unused support for
           "xcbc(aes)" with key lengths other than 128 bits

         - Enable optimizations by default

     - GHASH

         - Migrate the standalone GHASH code into lib/crypto/

         - Integrate the GHASH code more closely with the very similar
           POLYVAL code, and improve the generic GHASH implementation to
           resist cache-timing attacks and use much less memory

         - Reimplement the AES-GCM library and the "gcm" crypto_aead
           template on top of the GHASH library. Remove "ghash" from the
           crypto_shash API, as it's no longer needed

         - Enable optimizations by default

     - SM3

         - Migrate the kernel's existing SM3 code into lib/crypto/, and
           reimplement the traditional crypto API's "sm3" on top of it

         - I don't recommend using SM3, but this cleanup is worthwhile
           to organize the code the same way as other algorithms

 - Testing improvements:

     - Add a KUnit test suite for each of the new library APIs

     - Migrate the existing ChaCha20Poly1305 test to KUnit

     - Make the KUnit all_tests.config enable all crypto library tests

     - Move the test kconfig options to the Runtime Testing menu

 - Other updates to arch-optimized crypto code:

     - Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine

     - Remove some MD5 implementations that are no longer worth keeping

     - Drop big endian and voluntary preemption support from the arm64
       code, as those configurations are no longer supported on arm64

 - Make jitterentropy and samples/tsm-mr use the crypto library APIs

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
  lib/crypto: arm64: Assume a little-endian kernel
  arm64: fpsimd: Remove obsolete cond_yield macro
  lib/crypto: arm64/sha3: Remove obsolete chunking logic
  lib/crypto: arm64/sha512: Remove obsolete chunking logic
  lib/crypto: arm64/sha256: Remove obsolete chunking logic
  lib/crypto: arm64/sha1: Remove obsolete chunking logic
  lib/crypto: arm64/poly1305: Remove obsolete chunking logic
  lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
  lib/crypto: arm64/chacha: Remove obsolete chunking logic
  lib/crypto: arm64/aes: Remove obsolete chunking logic
  lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
  lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
  lib/crypto: aescfb: Don't disable IRQs during AES block encryption
  lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
  lib/crypto: sparc: Drop optimized MD5 code
  lib/crypto: mips: Drop optimized MD5 code
  lib: Move crypto library tests to Runtime Testing menu
  crypto: sm3 - Remove 'struct sm3_state'
  crypto: sm3 - Remove the original "sm3_block_generic()"
  crypto: sm3 - Remove sm3_base.h
  ...
2026-04-13 17:31:39 -07:00
Linus Torvalds
7fe6ac157b for-7.1/block-20260411
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmna0tgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptEbD/0ZMEsz5pcN+/bpM9Qva5lVVkByRieua+JA
 T7L+JMcEigp1Hf2idAPlv1e9dbrtgOGhkjZNlbZenP2MHXBmbUTnzTWDKW5w0ZQ4
 UqnVC7fMmxzI57DPt7iG/1WQo8O6QPHWwBof5ZXn0b83qwByTB2oVkAb9ysT7CdM
 wGk5KnPRLIAWf5o+aZ4LoWE+196jQiszx1m6U58FTqnCgvJ/GyKyrgzx+uvGUgF+
 owZT/6TrN7cN9A68fOnmcjEZ7beZXygOQPTn32sF9rEOi8JsgK71EE2LofdVVSNU
 ES/tyKVJbSNDgUH2b0T84rErT4MtZcw5J29V3k7CVndC+DcT2uLSroPz3lYQjDg9
 TLeq7ZLjnyoBG+muboWdXcvBKn3aKLec3nfVSbz6J1xb/Z22gWYy5TZbrGnGH8fJ
 zBiyKkHMaZi55IdTDWQT3a48h36qFh0Y2wbvZ6uhyYOfXHyj4pA4ccJZgFfmf4ZG
 flVRFGEL9Tqc82lB8dfy9DBp0ZQSjeBUCd+gyDKjiuWVau5L5iTUeMMkt8yr7qbg
 PY+ATJcHk5S5zwM2xcZUt5EcHBBbCaKQ6DdRZKwzMMUvCjHlvnWvENVjUtRa9Dng
 1vUKpB/e5NGpqD05Iqgyai+OD9/tALc4sUEI2yQ7/dk9pKIXQ4RE9HR/pSkgbjeR
 LGokj08cgg==
 =ga3t
 -----END PGP SIGNATURE-----

Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
2026-04-13 15:51:31 -07:00
Linus Torvalds
b8f82cb0d8 Landlock update for v7.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCadfgCxAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSl5cA/0QZjJ+4V2DVzJQM5qzmNK9He9uYaOs7F2Ks
 xRvg7IebAPwMEcVY+CVQxD+YGj08UgM753yx4CRbhsu4k5mowEEJDQ==
 =Lz9R
 -----END PGP SIGNATURE-----

Merge tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Landlock update from Mickaël Salaün:
 "This adds a new Landlock access right for pathname UNIX domain sockets
  thanks to a new LSM hook, and a few fixes"

* tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (23 commits)
  landlock: Document fallocate(2) as another truncation corner case
  landlock: Document FS access right for pathname UNIX sockets
  selftests/landlock: Simplify ruleset creation and enforcement in fs_test
  selftests/landlock: Check that coredump sockets stay unrestricted
  selftests/landlock: Audit test for LANDLOCK_ACCESS_FS_RESOLVE_UNIX
  selftests/landlock: Test LANDLOCK_ACCESS_FS_RESOLVE_UNIX
  selftests/landlock: Replace access_fs_16 with ACCESS_ALL in fs_test
  samples/landlock: Add support for named UNIX domain socket restrictions
  landlock: Clarify BUILD_BUG_ON check in scoping logic
  landlock: Control pathname UNIX domain socket resolution by path
  landlock: Use mem_is_zero() in is_layer_masks_allowed()
  lsm: Add LSM hook security_unix_find
  landlock: Fix kernel-doc warning for pointer-to-array parameters
  landlock: Fix formatting in tsync.c
  landlock: Improve kernel-doc "Return:" section consistency
  landlock: Add missing kernel-doc "Return:" sections
  selftests/landlock: Fix format warning for __u64 in net_test
  selftests/landlock: Skip stale records in audit_match_record()
  selftests/landlock: Drain stale audit records on init
  selftests/landlock: Fix socket file descriptor leaks in audit helpers
  ...
2026-04-13 15:42:19 -07:00
Linus Torvalds
ef3da345cc vfs-7.1-rc1.misc
Please consider pulling these changes from the signed vfs-7.1-rc1.misc tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc
 ohhBAQCAmQMlMRAXAgUZFYMTZpeQlcujP5rv+/vT2Tf/xS76YwD/dRDaw1FH294+
 qtk/Z1NjleNixzE2sld1K9J32NxeyAc=
 =+g9q
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:
   - coredump: add tracepoint for coredump events
   - fs: hide file and bfile caches behind runtime const machinery

  Fixes:
   - fix architecture-specific compat_ftruncate64 implementations
   - dcache: Limit the minimal number of bucket to two
   - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START
   - fs/mbcache: cancel shrink work before destroying the cache
   - dcache: permit dynamic_dname()s up to NAME_MAX

  Cleanups:
   - remove or unexport unused fs_context infrastructure
   - trivial ->setattr cleanups
   - selftests/filesystems: Assume that TIOCGPTPEER is defined
   - writeback: fix kernel-doc function name mismatch for wb_put_many()
   - autofs: replace manual symlink buffer allocation in autofs_dir_symlink
   - init/initramfs.c: trivial fix: FSM -> Finite-state machine
   - fs: remove stale and duplicate forward declarations
   - readdir: Introduce dirent_size()
   - fs: Replace user_access_{begin/end} by scoped user access
   - kernel: acct: fix duplicate word in comment
   - fs: write a better comment in step_into() concerning .mnt assignment
   - fs: attr: fix comment formatting and spelling issues"

* tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  dcache: permit dynamic_dname()s up to NAME_MAX
  fs: attr: fix comment formatting and spelling issues
  fs: hide file and bfile caches behind runtime const machinery
  fs: write a better comment in step_into() concerning .mnt assignment
  proc: rename proc_notify_change to proc_setattr
  proc: rename proc_setattr to proc_nochmod_setattr
  affs: rename affs_notify_change to affs_setattr
  adfs: rename adfs_notify_change to adfs_setattr
  hfs: update comments on hfs_inode_setattr
  kernel: acct: fix duplicate word in comment
  fs: Replace user_access_{begin/end} by scoped user access
  readdir: Introduce dirent_size()
  coredump: add tracepoint for coredump events
  fs: remove do_sys_truncate
  fs: pass on FTRUNCATE_* flags to do_truncate
  fs: fix archiecture-specific compat_ftruncate64
  fs: remove stale and duplicate forward declarations
  init/initramfs.c: trivial fix: FSM -> Finite-state machine
  autofs: replace manual symlink buffer allocation in autofs_dir_symlink
  fs/mbcache: cancel shrink work before destroying the cache
  ...
2026-04-13 14:20:11 -07:00
Linus Torvalds
07c3ef5822 vfs-7.1-rc1.pidfs
Please consider pulling these changes from the signed vfs-7.1-rc1.pidfs tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc
 omfuAQDckt5g7vxBr9hKdyrq1//nsu44fst/mRqr2iSYjuKfPQD/VN6Lw9e56Y/q
 l4hHxsPPrSSxbijwng7im36iPIGdfwI=
 =BbFh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull clone and pidfs updates from Christian Brauner:
 "Add three new clone3() flags for pidfd-based process lifecycle
  management.

  CLONE_AUTOREAP:

     CLONE_AUTOREAP makes a child process auto-reap on exit without ever
     becoming a zombie. This is a per-process property in contrast to
     the existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for
     SIGCHLD which applies to all children of a given parent.

     Currently the only way to automatically reap children is to set
     SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped
     property affecting all children which makes it unsuitable for
     libraries or applications that need selective auto-reaping of
     specific children while still being able to wait() on others.

     CLONE_AUTOREAP stores an autoreap flag in the child's
     signal_struct. When the child exits do_notify_parent() checks this
     flag and causes exit_notify() to transition the task directly to
     EXIT_DEAD. Since the flag lives on the child it survives
     reparenting: if the original parent exits and the child is
     reparented to a subreaper or init the child still auto-reaps when
     it eventually exits. This is cleaner than forcing the subreaper to
     get SIGCHLD and then reaping it. If the parent doesn't care the
     subreaper won't care. If there's a subreaper that would care it
     would be easy enough to add a prctl() that either just turns back
     on SIGCHLD and turns off auto-reaping or a prctl() that just
     notifies the subreaper whenever a child is reparented to it.

     CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent
     to monitor the child's exit via poll() and retrieve exit status via
     PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
     pattern. No exit signal is delivered so exit_signal must be zero.
     CLONE_THREAD and CLONE_PARENT are rejected: CLONE_THREAD because
     autoreap is a process-level property, and CLONE_PARENT because an
     autoreap child reparented via CLONE_PARENT could become an
     invisible zombie under a parent that never calls wait().

     The flag is not inherited by the autoreap process's own children.
     Each child that should be autoreaped must be explicitly created
     with CLONE_AUTOREAP.

  CLONE_NNP:

     CLONE_NNP sets no_new_privs on the child at clone time. Unlike
     prctl(PR_SET_NO_NEW_PRIVS) which a process sets on itself,
     CLONE_NNP allows the parent to impose no_new_privs on the child at
     creation without affecting the parent's own privileges.
     CLONE_THREAD is rejected because threads share credentials.
     CLONE_NNP is useful on its own for any spawn-and-sandbox pattern
     but was specifically introduced to enable unprivileged usage of
     CLONE_PIDFD_AUTOKILL.

  CLONE_PIDFD_AUTOKILL:

     This flag ties a child's lifetime to the pidfd returned from
     clone3(). When the last reference to the struct file created by
     clone3() is closed the kernel sends SIGKILL to the child. A pidfd
     obtained via pidfd_open() for the same process does not keep the
     child alive and does not trigger autokill - only the specific
     struct file from clone3() has this property. This is useful for
     container runtimes, service managers, and sandboxed subprocess
     execution - any scenario where the child must die if the parent
     crashes or abandons the pidfd or just wants a throwaway helper
     process.

     CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP.
     It requires CLONE_PIDFD because the whole point is tying the
     child's lifetime to the pidfd. It requires CLONE_AUTOREAP because a
     killed child with no one to reap it would become a zombie - the
     primary use case is the parent crashing or abandoning the pidfd so
     no one is around to call waitpid(). CLONE_THREAD is rejected
     because autokill targets a process not a thread.

     If CLONE_NNP is specified together with CLONE_PIDFD_AUTOKILL an
     unprivileged user may spawn a process that is autokilled. The child
     cannot escalate privileges via setuid/setgid exec after being
     spawned. If CLONE_PIDFD_AUTOKILL is specified without CLONE_NNP the
     caller must have have CAP_SYS_ADMIN in its user namespace"

* tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: check pidfd_info->coredump_code correctness
  pidfds: add coredump_code field to pidfd_info
  kselftest/coredump: reintroduce null pointer dereference
  selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests
  selftests/pidfd: add CLONE_NNP tests
  selftests/pidfd: add CLONE_AUTOREAP tests
  pidfd: add CLONE_PIDFD_AUTOKILL
  clone: add CLONE_NNP
  clone: add CLONE_AUTOREAP
2026-04-13 13:27:11 -07:00
Linus Torvalds
c8db08110c vfs-7.1-rc1.xattr
Please consider pulling these changes from the signed vfs-7.1-rc1.xattr tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc
 olWpAQCf//do9MtHmqTjgVfMIv18vw5woI8Bdx/X3m5S9e2D9gD/QddDnkFiVMmu
 txsT+N8BGuW+jbAkYwax5LnUFGUTYgY=
 =a4gm
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs xattr updates from Christian Brauner:
 "This reworks the simple_xattr infrastructure and adds support for
  user.* extended attributes on sockets.

  The simple_xattr subsystem currently uses an rbtree protected by a
  reader-writer spinlock. This series replaces the rbtree with an
  rhashtable giving O(1) average-case lookup with RCU-based lockless
  reads. This sped up concurrent access patterns on tmpfs quite a bit
  and it's an overall easy enough conversion to do and gets rid or
  rwlock_t.

  The conversion is done incrementally: a new rhashtable path is added
  alongside the existing rbtree, consumers are migrated one at a time
  (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
  consumers switch from embedded structs to pointer-based lazy
  allocation so the rhashtable overhead is only paid for inodes that
  actually use xattrs.

  With this infrastructure in place the series adds support for user.*
  xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
  from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
  that is everything created via socket() including abstract namespace
  AF_UNIX sockets - had no xattr support at all.

  The xattr_permission() checks are reworked to allow user.* xattrs on
  S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
  128KB total value size matching the limits already in use for kernfs.

  The practical motivation comes from several directions. systemd and
  GNOME are expanding their use of Varlink as an IPC mechanism.

  For D-Bus there are tools like dbus-monitor that can observe IPC
  traffic across the system but this only works because D-Bus has a
  central broker.

  For Varlink there is no broker and there is currently no way to
  identify which sockets speak Varlink. With user.* xattrs on sockets a
  service can label its socket with the IPC protocol it speaks (e.g.,
  user.varlink=1) and an eBPF program can then selectively capture
  traffic on those sockets. Enumerating bound sockets via netlink
  combined with these xattr labels gives a way to discover all Varlink
  IPC entrypoints for debugging and introspection.

  Similarly, systemd-journald wants to use xattrs on the /dev/log socket
  for protocol negotiation to indicate whether RFC 5424 structured
  syslog is supported or whether only the legacy RFC 3164 format should
  be used.

  In containers these labels are particularly useful as high-privilege
  or more complicated solutions for socket identification aren't
  available.

  The series comes with comprehensive selftests covering path-based
  AF_UNIX sockets, sockfs socket operations, per-inode limit
  enforcement, and xattr operations across multiple address families
  (AF_INET, AF_INET6, AF_NETLINK, AF_PACKET)"

* tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/xattr: test xattrs on various socket families
  selftests/xattr: sockfs socket xattr tests
  selftests/xattr: path-based AF_UNIX socket xattr tests
  xattr: support extended attributes on sockets
  xattr,net: support limited amount of extended attributes on sockfs sockets
  xattr: move user limits for xattrs to generic infra
  xattr: switch xattr_permission() to switch statement
  xattr: add xattr_permission_error()
  xattr: remove rbtree-based simple_xattr infrastructure
  pidfs: adapt to rhashtable-based simple_xattrs
  kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation
  shmem: adapt to rhashtable-based simple_xattrs with lazy allocation
  xattr: add rhashtable-based simple_xattr infrastructure
  xattr: add rcu_head and rhash_head to struct simple_xattr
2026-04-13 10:10:28 -07:00
Cao Ruichuang
f8e0a5a174 selftests/ftrace: Quote check_requires comparisons
check_requires() compares requirement strings that can contain shell
pattern characters such as '[' and ']'. Under /bin/sh, the unquoted
test expressions can emit 'unexpected operator' warnings while parsing
README-backed requirements.

Quote the relevant comparisons and path checks so the helper handles
those patterns without spurious shell warnings.

Validated by rerunning fprobe_syntax_errors.tc and confirming the
previous '/bin/sh: unexpected operator' lines disappear from the
detailed ftracetest log.

Signed-off-by: Cao Ruichuang <create0818@163.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20260408043212.8063-1-create0818@163.com
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-13 11:05:39 -06:00
Ricardo B. Marlière
7e47389142 selftests: Preserve subtarget failures in all/install
Track failures explicitly in the top-level selftests all/install loops.

The current code multiplies `ret` by each sub-make exit status. For
example, with `TARGETS=net`, the implicit `net/lib` dependency runs after
`net`, so a failed `net` build can be followed by a successful `net/lib`
build and reset the final result to success.

Set `ret` to 1 on any non-zero sub-make exit code and keep it sticky, so
the top-level make returns failure when any selected selftest target
fails.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://lore.kernel.org/r/20260320-selftests-fixes-v1-5-79144f76be01@suse.com
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-13 11:05:39 -06:00
Ricardo B. Marlière
d6ea9f404b selftests/run_kselftest.sh: Allow choosing per-test log directory
The --per-test-log option currently hard-codes /tmp. However, the system
under test will most likely have tmpfs mounted there. Since it's not clear
which filenames the log files will have, the user should be able to specify
a persistent directory to store the logs. Keeping those logs are important
because the run_kselftest.sh runner will only yield KTAP output, trimming
information that is otherwise available through running individual tests
directly.

Allow --per-test-log to take an optional directory argument. Keep the
existing behaviour when the option is passed without an argument, but if
a directory is provided, create it if needed, reject non-directory paths
and non-writable directories, canonicalize it, and have runner.sh write
per-test logs there instead of /tmp.

This also makes relative paths safe by resolving them before the runner
changes into a collection directory.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://lore.kernel.org/r/20260320-selftests-fixes-v1-4-79144f76be01@suse.com
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-13 11:05:39 -06:00
Ricardo B. Marlière
a82e076f4a selftests/run_kselftest.sh: Resolve BASE_DIR with pwd -P
run_kselftest.sh only needs to canonicalize the directory containing the
script itself. Use shell-native path resolution for that by changing into
the directory and calling pwd -P.

This avoids depending on either realpath or readlink -f while still
producing a physical absolute path for BASE_DIR.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Link: https://lore.kernel.org/r/20260320-selftests-fixes-v1-3-79144f76be01@suse.com
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-13 11:05:39 -06:00
Paolo Bonzini
6b80203187 - ESA nesting support
- 4k memslots
 - LPSW/E fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmncpfkACgkQ41TmuOI4
 ufj4ehAA0fTpaA4VdUbF/uH1o4BLu/hElPXhJYnyDa6hUK0XiFS6bpouz50wTMz/
 QjbmM+uCLKxVBK2FPE0cPj3iobvlfTTgP0tNkgwHDFlLfuZ9914cxYc4HYPrRJ/y
 Ey+6TT4ynkf2mihiLFHKKuBPi4DjfC3rAjy8ZHOnNh5ro+00uXVCGhssBUKvXNST
 X45q6JaN6p3eDVjC/ov/K593BJgMoW5x/kDmoyICuhDYs+8TiY+n+61BdVARKdtu
 3+vwkjQ/mrl+IwJMvfeH+nO2qnjREc6EZd9YTJOCheThhELw0tX4jeha4PldeeZY
 fg+8uObSmbzxcmsvWRGTuVpobEBpOqRP9sdADxF77dq1ExFXwthXFT8AQw8NzI2k
 leU8DQqXVUOkykmpvacV96AGlYrRWb47806TdVM+fJmLkvmt0llS/MK6fQNz+Jlb
 okFx1kLnqSKz7x0O6Avgz/+F6yjFAwTp7mwKmd8bHzKCkLCYq8Gl6WPxx/peFY0P
 dwEwq0k89Wld7gjkAXwtwjttIrQcwghacqBCJAu4cA/3NnM2DCAPf3gSiY1PoYPX
 06ZUYBzLH8wQJRZLToWpYvH9xOOfMmTETx7LDsYuMztxyesS+ReR/dVkCCei/2oD
 KeoGD0vBA0d8/wW+ZmB6YYxUiWT0WOllb/9s26NG/7lCTY1UgLI=
 =YTUc
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-7.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- ESA nesting support
- 4k memslots
- LPSW/E fix
2026-04-13 19:01:15 +02:00
Paolo Bonzini
92cdeac6a4 KVM SVM changes for 7.1
- Fix and optimize IRQ window inhibit handling for AVIC (the tracking needs to
    be per-vCPU, e.g. so that KVM doesn't prematurely re-enable AVIC if multiple
    vCPUs have to-be-injected IRQs).
 
  - Fix an undefined behavior warning where a crafty userspace can read the
    "avic" module param before it's fully initialized.
 
  - Fix a (likely benign) bug in the "OS-visible workarounds" handling, where
    KVM could clobber state when enabling virtualization on multiple CPUs in
    parallel, and clean up and optimize the code.
 
  - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a
    "too large" size based purely on user input, and clean up and harden the
    related pinning code.
 
  - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as
    doing so for an SNP guest will trigger an RMP violation #PF and crash the
    host.
 
  - Protect all of sev_mem_enc_register_region() with kvm->lock to ensure
    sev_guest() is stable for the entire of the function.
 
  - Lock all vCPUs when synchronizing VMSAs for SNP guests to ensure the VMSA
    page isn't actively being used.
 
  - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are
    required to hold kvm->lock (KVM has had multiple bugs due "is SEV?" checks
    becoming stale), enforced by lockdep.  Add and use vCPU-scoped APIs when
    possible/appropriate, as all checks that originate from a vCPU are
    guaranteed to be stable.
 
  - Convert a pile of kvm->lock SEV code to guard().
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmnZK4wACgkQOlYIJqCj
 N/2uOQ/+LzGQD7myCn47rUhiMo/aY3qjrS+u6PSuFeEMFyaATiWpf/s50hIMHh+/
 VCRAptKgL0PBV/RbOqhZdx4Zn/Yb/NNBwraqc7xQgMOlQwFedOetuFtRveJ4z6Af
 8ycwMxYYtz6SbaT+R3AdK51Nb8S2ZRpd082CiaLgChVcdodkeFtS5KVBqrlBGB21
 EKFbW+QXMHrpmGbgZ8YWMrL5UCSmJFG8ZztcncNfsLS6WxbUjdo/MEiLEDIsrXZd
 oGViwmnY7hcJ5ClcF8UMPtXHHP1+EOk6BKAsmYguG3qUxbX+EEbymb8o16k+h6iw
 ybUZWF7cq44Pl1FModTFAB5LQPg6z6XNhjZ8L+0kjAI05lvszf3QDtezQ+BF24tW
 S18x6yCIpdEJ3VxM4r5Yqf10CRbxMtHKU6EUjL7C4KNNYOz2sX+Tqgi/uHtbgzUJ
 zPG9faY5M3hMjfj5tOCpy/fAEF3fD1mg4GE8pfXZa8d/ppqI4hU0ASpFzw/d4LnH
 PJSaeJhmmEIdRj+RtIGIRSZ9flHM61/+clKngaoR+c/mPQPnDbapivl2kgKWbVJ4
 47c44pYQLTWI01nuwcEILCEj8D1mABJygPjNoO79b2mitmYazMnO42mV3lI5oP0c
 QyzX7sSed6ImIRn8xadfE+tIz3ji9r/ak+ekZvdNiqiNEoi2YG8=
 =AjgE
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-7.1' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 7.1

 - Fix and optimize IRQ window inhibit handling for AVIC (the tracking needs to
   be per-vCPU, e.g. so that KVM doesn't prematurely re-enable AVIC if multiple
   vCPUs have to-be-injected IRQs).

 - Fix an undefined behavior warning where a crafty userspace can read the
   "avic" module param before it's fully initialized.

 - Fix a (likely benign) bug in the "OS-visible workarounds" handling, where
   KVM could clobber state when enabling virtualization on multiple CPUs in
   parallel, and clean up and optimize the code.

 - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a
   "too large" size based purely on user input, and clean up and harden the
   related pinning code.

 - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as
   doing so for an SNP guest will trigger an RMP violation #PF and crash the
   host.

 - Protect all of sev_mem_enc_register_region() with kvm->lock to ensure
   sev_guest() is stable for the entire of the function.

 - Lock all vCPUs when synchronizing VMSAs for SNP guests to ensure the VMSA
   page isn't actively being used.

 - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are
   required to hold kvm->lock (KVM has had multiple bugs due "is SEV?" checks
   becoming stale), enforced by lockdep.  Add and use vCPU-scoped APIs when
   possible/appropriate, as all checks that originate from a vCPU are
   guaranteed to be stable.

 - Convert a pile of kvm->lock SEV code to guard().
2026-04-13 19:00:43 +02:00
Linus Torvalds
28483203f7 RCU changes for v7.1
NOCB CPU management:
 
 - Consolidate rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() to reduce
   code duplication.
 - Extract nocb_bypass_needs_flush() helper to reduce duplication in NOCB
   bypass path.
 
 rcutorture/torture infrastructure:
 
 - Add NOCB01 config for RCU_LAZY torture testing.
 - Add NOCB02 config for NOCB poll mode testing.
 - Add TRIVIAL-PREEMPT config for textbook-style preemptible RCU torture.
 - Test call_srcu() with preemption both disabled and enabled.
 - Remove kvm-check-branches.sh in favor of kvm-series.sh.
 - Make hangs more visible in torture.sh output.
 - Add informative message for tests without a recheck file.
 - Fix numeric test comparison in srcu_lockdep.sh.
 - Use torture_shutdown_init() in refscale and rcuscale instead of open-coded
   shutdown functions.
 - Fix modulo-zero error in torture_hrtimeout_ns().
 
 SRCU:
 
 - Fix SRCU read flavor macro comments.
 - Fix s/they disables/they disable/ typo in srcu_read_unlock_fast().
 
 RCU Tasks:
 
 - Document that RCU Tasks Trace grace periods now imply RCU grace periods.
 - Remove unnecessary smp_store_release() in cblist_init_generic().
 
 RCU stall:
 
 - Add BOOTPARAM_RCU_STALL_PANIC Kconfig option to allow triggering a kernel
   panic on RCU stall via kernel boot parameter.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmnMBEAWHGpvZWxhZ25l
 bGZAbnZpZGlhLmNvbQAKCRCoDid/ajjmEBCdD/4rUNIocKBmSJWHFeDzl8rvMGSx
 RISjeaj8m+HUsQRSq7hecygUkSbGEQuj8KwFDJOsmuzhMRg3hR7yEz1AEphQiiIR
 TN0IdGblQd57KqAk6sHsXA5mNjSWZfJQuOkvHsMxIJdhhDRDHPTGtaCBwlYoLc6b
 de31LCvCfRL1QgiNAznTlPLRyehI9/+6VASLvKbmi3sQQv9B/KBBWydkryvMJbfW
 uc05SUn2qQq2h/NNeYvo6RllXWMPqO4/0ll8Y71ltwFhg8JdS9qXDSFX5cWlLISA
 uBA4pC9bk0xi7RGE5v4n9+fNNTr64Tjtd9QoxFvd/Jb6jdKDGjabwojoAYXCsdNw
 r8Dp3fKwGofruhUqgO6LyHyG54Ro2paVYjsqr5HW9C6jalZ9+HmH5wOnKw2h5E/i
 VRAM6MpwQ869ZeHhDxF8meRKf3+E+UIR95qfNZkABcFnwxgXheT+RASP/UOnoTM7
 Zexuzp9c6GQu34MTt0Rz9vNfJta+ZmPlnccan1T3DgoHvcWip7IDhUD1Vqerd9/4
 VK4JeeKI0t/fR5ydjs5qiA/O5caIFqtQTD0Ag2vLGQiP72pe2lRroIjD3xNZ2bWS
 DN4k6ZCxR2hdPUcWzdYlGe2pYmvAlqCBxZ4fNgrs55uP2EuqW+Y8aJd9RwvRPb2B
 wmTZepOh7Ct9+cjTBA==
 =wjsx
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2026.03.31a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Joel Fernandes:
 "NOCB CPU management:

   - Consolidate rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() to
     reduce code duplication

   - Extract nocb_bypass_needs_flush() helper to reduce duplication in
     NOCB bypass path

  rcutorture/torture infrastructure:

   - Add NOCB01 config for RCU_LAZY torture testing

   - Add NOCB02 config for NOCB poll mode testing

   - Add TRIVIAL-PREEMPT config for textbook-style preemptible RCU
     torture

   - Test call_srcu() with preemption both disabled and enabled

   - Remove kvm-check-branches.sh in favor of kvm-series.sh

   - Make hangs more visible in torture.sh output

   - Add informative message for tests without a recheck file

   - Fix numeric test comparison in srcu_lockdep.sh

   - Use torture_shutdown_init() in refscale and rcuscale instead of
     open-coded shutdown functions

   - Fix modulo-zero error in torture_hrtimeout_ns().

  SRCU:

   - Fix SRCU read flavor macro comments

   - Fix s/they disables/they disable/ typo in srcu_read_unlock_fast()

  RCU Tasks:

   - Document that RCU Tasks Trace grace periods now imply RCU grace
     periods

   - Remove unnecessary smp_store_release() in cblist_init_generic()"

* tag 'rcu.2026.03.31a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  rcutorture: Test call_srcu() with preemption disabled and not
  rcu: Add BOOTPARAM_RCU_STALL_PANIC Kconfig option
  torture: Avoid modulo-zero error in torture_hrtimeout_ns()
  rcu/nocb: Extract nocb_bypass_needs_flush() to reduce duplication
  rcu/nocb: Consolidate rcu_nocb_cpu_offload/deoffload functions
  rcu-tasks: Remove unnecessary smp_store_release() in cblist_init_generic()
  rcutorture: Add NOCB02 config for nocb poll mode testing
  rcutorture: Add NOCB01 config for RCU_LAZY torture testing
  rcu-tasks: Document that RCU Tasks Trace grace periods now imply RCU grace periods
  srcu: Fix s/they disables/they disable/ typo in srcu_read_unlock_fast()
  srcu: Fix SRCU read flavor macro comments
  rcuscale: Ditch rcu_scale_shutdown in favor of torture_shutdown_init()
  refscale: Ditch ref_scale_shutdown in favor of torture_shutdown_init()
  rcutorture: Fix numeric "test" comparison in srcu_lockdep.sh
  torture: Print informative message for test without recheck file
  torture: Make hangs more visible in torture.sh output
  kvm-check-branches.sh: Remove in favor of kvm-series.sh
  rcutorture: Add a textbook-style trivial preemptible RCU
2026-04-13 09:36:45 -07:00
Marc Harvey
d3870724eb selftests: net: Add tests for team driver decoupled tx and rx control
Use ping and tcpdump to verify that independent rx and tx enablement
of team driver member interfaces works as intended.

Signed-off-by: Marc Harvey <marcharvey@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260409-teaming-driver-internal-v7-10-f47e7589685d@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-13 15:09:49 +02:00
Marc Harvey
10407eebe8 selftests: net: Add test for enablement of ports with teamd
There are no tests that verify enablement and disablement of team driver
ports with teamd. This should work even with changes to the enablement
option, so it is important to test.

This test sets up an active-backup network configuration across two
network namespaces, and tries to send traffic while changing which
link is the active one.

Also increase the team test timeout to 300 seconds, because gracefully
killing teamd can take 30 seconds for each instance.

Signed-off-by: Marc Harvey <marcharvey@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260409-teaming-driver-internal-v7-5-f47e7589685d@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-13 15:09:49 +02:00
Marc Harvey
05e352444b selftests: net: Add tests for failover of team-aggregated ports
There are currently no kernel tests that verify the effect of setting
the enabled team driver option. In a followup patch, there will be
changes to this option, so it will be important to make sure it still
behaves as it does now.

The test verifies that tcp continues to work across two different team
devices in separate network namespaces, even when member links are
manually disabled.

Signed-off-by: Marc Harvey <marcharvey@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260409-teaming-driver-internal-v7-4-f47e7589685d@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-13 15:09:49 +02:00
Paolo Bonzini
ea8bc95fbb KVM nested SVM changes for 7.1 (with one common x86 fix)
- To minimize the probability of corrupting guest state, defer KVM's
    non-architectural delivery of exception payloads (e.g. CR2 and DR6) until
    consumption of the payload is imminent, and force delivery of the payload
    in all paths where userspace saves relevant state.
 
  - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT to fix a
    bug where L2's CR2 can get corrupted after a save/restore, e.g. if the VM
    is migrated while L2 is faulting in memory.
 
  - Fix a class of nSVM bugs where some fields written by the CPU are not
    synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not
    up-to-date when saved by KVM_GET_NESTED_STATE.
 
  - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and
    KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after
    save+restore.
 
  - Add a variety of missing nSVM consistency checks.
 
  - Fix several bugs where KVM failed to correctly update VMCB fields on nested
    #VMEXIT.
 
  - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for
    SVM-related instructions.
 
  - Add support for save+restore of virtualized LBRs (on SVM).
 
  - Refactor various helpers and macros to improve clarity and (hopefully) make
    the code easier to maintain.
 
  - Aggressively sanitize fields when copying from vmcb12 to guard against
    unintentionally allowing L1 to utilize yet-to-be-defined features.
 
  - Fix several bugs where KVM botched rAX legality checks when emulating SVM
    instructions.  Note, KVM is still flawed in that KVM doesn't address size
    prefix overrides for 64-bit guests; this should probably be documented as a
    KVM erratum.
 
  - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of
    somewhat arbitrarily synthesizing #GP (i.e. don't bastardize AMD's already-
    sketchy behavior of generating #GP if for "unsupported" addresses).
 
  - Cache all used vmcb12 fields to further harden against TOCTOU bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmnZfbwACgkQOlYIJqCj
 N/0pVRAAkys8LLtIekQtEVkaX3EPaXk0lGGmnzXbihgHFsS5lMAS4tcsr7oyk4TI
 rvJUGmkaTKTboQdTaCq0G7lwCu5hMuXsZ10WvmKfivMFxy3kSppqfffux5zVXng2
 U/8oyJSorkX1WPC7d5QAZYMqqcSwQaR+a0FxowghGWBXMRHylerSuH00CiGr6Ron
 QQbZaKBNtkYwYFNos2tLuT4tueyFogk8FPAmdejEQ9CMxUjeAivlKm8JVXaDvGik
 lyPYbJJLukjuxSYGYmeRyGLLwK7VBGkFHQp/KBYSBgzGdweabhsQa1Z0CGm24+w1
 q626W0sxsq97dZ0cd7oE6Cw+AdlMBK+mjpxB9gX4uLGyYlnFkdJV7OSlHVTR9d96
 cqKduT0JvlBnVb7Yd5jyaGVl1YD62p0nwcrTuWidR5IJ16b4mYwwPzvkkQKHLt64
 VAhH8lBVtATtblI9gfsbwGezV74xXnuLb0L1G7xeh1VIWu7pubFdqyRwIA+qiXQa
 OkyxzoDlFl+QF2Uf3cBCFMojBOrSZRiGiLzIkUnjBsN4N2uOPYTsQEfr9BXVVcv7
 obT9xl/wUwry2fAJhUL+IBCDE42+8C62UaWT5KJHQLttBL7Mm06e75hFN5ObbE/x
 nExL+NmAcsSUUbbdojjnD0KWxYKkosNiONBVrjqqXdmBjmzzOvI=
 =ys7N
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-nested-7.1' of https://github.com/kvm-x86/linux into HEAD

KVM nested SVM changes for 7.1 (with one common x86 fix)

 - To minimize the probability of corrupting guest state, defer KVM's
   non-architectural delivery of exception payloads (e.g. CR2 and DR6) until
   consumption of the payload is imminent, and force delivery of the payload
   in all paths where userspace saves relevant state.

 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT to fix a
   bug where L2's CR2 can get corrupted after a save/restore, e.g. if the VM
   is migrated while L2 is faulting in memory.

 - Fix a class of nSVM bugs where some fields written by the CPU are not
   synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not
   up-to-date when saved by KVM_GET_NESTED_STATE.

 - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and
   KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after
   save+restore.

 - Add a variety of missing nSVM consistency checks.

 - Fix several bugs where KVM failed to correctly update VMCB fields on nested
   #VMEXIT.

 - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for
   SVM-related instructions.

 - Add support for save+restore of virtualized LBRs (on SVM).

 - Refactor various helpers and macros to improve clarity and (hopefully) make
   the code easier to maintain.

 - Aggressively sanitize fields when copying from vmcb12 to guard against
   unintentionally allowing L1 to utilize yet-to-be-defined features.

 - Fix several bugs where KVM botched rAX legality checks when emulating SVM
   instructions.  Note, KVM is still flawed in that KVM doesn't address size
   prefix overrides for 64-bit guests; this should probably be documented as a
   KVM erratum.

 - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of
   somewhat arbitrarily synthesizing #GP (i.e. don't bastardize AMD's already-
   sketchy behavior of generating #GP if for "unsupported" addresses).

 - Cache all used vmcb12 fields to further harden against TOCTOU bugs.
2026-04-13 13:01:50 +02:00
Paolo Bonzini
c13008ed3d KVM selftests changes for 7.1
- Add support for Hygon CPUs in KVM selftests.
 
  - Fix a bug in the MSR test where it would get false failures on AMD/Hygon
    CPUs with exactly one of RDPID or RDTSCP.
 
  - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a
    bug where the kernel would attempt to collapse guest_memfd folios against
    KVM's will.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmnZJ9oACgkQOlYIJqCj
 N/1vxBAAkpV4HCG6Q5BV/2jxnRLDwRllg8YOAQk1p+DttmOraNe0PI8zYSzZG+f1
 Np8HnWwjhEiQZLlBpcTpab937mlaCK45xr1CPLfqIvAD16sIZIt0/g/DDIOH+IkQ
 NJGSyQySVBoo50ln8WzC3vDmxaVKZntFzhilRYY+g+Kgo+mgAnuLHM1grPwM/oVU
 q621DGQHDeJzeovMNy+bJoZg75AybZIV+GvBlF1/pZXUMkZp7K7z8NJeilKcptBb
 vCeIfNwDSdPZ5zTPfAoPhts90IGkIdgsmVhG3/j29OApbiADj5Wgcdgae96BkYPD
 hleeQXrTNNDfQKYhHdNkl+d+9Ab/j4/dQ5gAwtciF+LkgT7HPs+q6t9qNAYAJvRS
 c7FJNZPQqtVr1chbc3nI7FOBJwK+R9UY9biHR1DE1Uwfpuh+fjhCMiSFr8MktSWO
 GQ6ZgTvXbkwqYsjGJNCCn03egkj+2B4i18De0j8Lzmrz0FVv1Y1WIQZ+vaEwoF9g
 hzpGvyMarqJR2QGezthHGhjO6eRbeZkTq9Ya1t6NYfNQIvkfy86nl6b5CUJjZ4/1
 Xj7mNeOsfRO/Ez+LwO60jXsUI7YGmcFKOpivhI5QtUCwEqj8NkJ4xmYerB1K8E8L
 5llsPETeWxGm4FDf/hadmLy9FBzE7Gltd9/oDHPKdE6wunvsAiI=
 =aySp
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-7.1' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 7.1

 - Add support for Hygon CPUs in KVM selftests.

 - Fix a bug in the MSR test where it would get false failures on AMD/Hygon
   CPUs with exactly one of RDPID or RDTSCP.

 - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a
   bug where the kernel would attempt to collapse guest_memfd folios against
   KVM's will.
2026-04-13 11:53:46 +02:00
Paolo Bonzini
e74c3a8891 KVM/arm64 updates for 7.1
* New features:
 
 - Add support for tracing in the standalone EL2 hypervisor code,
   which should help both debugging and performance analysis.
   This comes with a full infrastructure for 'remote' trace buffers
   that can be exposed by non-kernel entities such as firmware.
 
 - Add support for GICv5 Per Processor Interrupts (PPIs), as the
   starting point for supporting the new GIC architecture in KVM.
 
 - Finally add support for pKVM protected guests, with anonymous
   memory being used as a backing store. About time!
 
 * Improvements and bug fixes:
 
 - Rework the dreaded user_mem_abort() function to make it more
   maintainable, reducing the amount of state being exposed to
   the various helpers and rendering a substantial amount of
   state immutable.
 
 - Expand the Stage-2 page table dumper to support NV shadow
   page tables on a per-VM basis.
 
 - Tidy up the pKVM PSCI proxy code to be slightly less hard
   to follow.
 
 - Fix both SPE and TRBE in non-VHE configurations so that they
   do not generate spurious, out of context table walks that
   ultimately lead to very bad HW lockups.
 
 - A small set of patches fixing the Stage-2 MMU freeing in error
   cases.
 
 - Tighten-up accepted SMC immediate value to be only #0 for host
   SMCCC calls.
 
 - The usual cleanups and other selftest churn.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmnWdswACgkQI9DQutE9
 ekNYvBAAxj5Zmsx8sJ2CYDTJc2w4XkEjSgDugA+J/s0TMgrzExeBlWCstdhVTncy
 68nwOjQl3TotnIrt7q36kko9u7IdD0pHNrk34NtlggLjHfB61n9SNcAA6j4F6zJa
 GFkHpJSrSnZuUPqapkDnlyhuPkgTIAkEUk2Am9siksSfY4HvRyHZJm2FTdxsdIBn
 NN9wvQqw2wefTXOQ8gS+oHbPVp1cPbwrF2a3EhzXXv/6W3mUBstXgsijgo07UzCp
 W6vHCv2wqHbHdf67z3Q3hL+VXlVH6oHlyW99/swqISvqRkH/iSB90+oUojnMRrSm
 yB6Wmhh8jboCaajWMJhG+veZw+7GMXU4nOrGd1rbnY8cwRl/TQ5YibhRm7DIdvjO
 xeUluTLJ0NdweQUwE2k4OlgKOuGang3E2p0clmkUO4SstA48MdqR/kpST6guIlWw
 U5syuNaaaiuwP5QOi9qZmMCNmQ3ZfnZG3nseJFdoyGjhVhf5jyQyv4Du9vGZQFF/
 Zkg7yTqC4OWiC+3GkW9YYAySM1MyetivLtd47PGzHPTdtaZziWhNvQ0y+8QjQ+R+
 CJNvyS/DvsT7epSya4sLgMP1ZAlih9xkz5sQ6k8NJLBYYXi0v33qwqditErgLLyj
 S4Ci4WNhHHWIusvCVM7JUBkH0AElpmi506f7F6iHoFLlkYR4t9U=
 =/SuQ
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 7.1

* New features:

- Add support for tracing in the standalone EL2 hypervisor code,
  which should help both debugging and performance analysis.
  This comes with a full infrastructure for 'remote' trace buffers
  that can be exposed by non-kernel entities such as firmware.

- Add support for GICv5 Per Processor Interrupts (PPIs), as the
  starting point for supporting the new GIC architecture in KVM.

- Finally add support for pKVM protected guests, with anonymous
  memory being used as a backing store. About time!

* Improvements and bug fixes:

- Rework the dreaded user_mem_abort() function to make it more
  maintainable, reducing the amount of state being exposed to
  the various helpers and rendering a substantial amount of
  state immutable.

- Expand the Stage-2 page table dumper to support NV shadow
  page tables on a per-VM basis.

- Tidy up the pKVM PSCI proxy code to be slightly less hard
  to follow.

- Fix both SPE and TRBE in non-VHE configurations so that they
  do not generate spurious, out of context table walks that
  ultimately lead to very bad HW lockups.

- A small set of patches fixing the Stage-2 MMU freeing in error
  cases.

- Tighten-up accepted SMC immediate value to be only #0 for host
  SMCCC calls.

- The usual cleanups and other selftest churn.
2026-04-13 11:49:54 +02:00
Paolo Bonzini
05578316ca LoongArch KVM changes for v7.1
1. Use CSR_CRMD_PLV in kvm_arch_vcpu_in_kernel().
 2. Let vcpu_is_preempted() a macro & some enhanments.
 3. Add DMSINTC irqchip in kernel support.
 4. Add KVM PMU test cases for tools/selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmnXiPMWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImehVBD/4v8Y6S4Sxkc/EBUDKbPwLGGhGR
 aZEe2dzHr10C/mx7Q2cYwjKhR9bpgPcBe0xiEomxAopLwK15qMai2mnNRX6SJA8P
 00J/3xWpKR6XsgsMv2KF9XvqdT1SlnzOC04D2v/wbkjlebWaCRIZgWG7yoRTHRIj
 TOpzf7XFBOnNpuzg94DjXsgAlSOo0qbHAMGMgbQ3k7OKzomAIlD4ljCyPD+JdvCz
 T7jW7n4Nho1SoOYPeWwXyxbIeorgtRB3JQ8RakMCjkJYyChICe1BGXJ66qeTLizd
 G5GOhiePtU5LLXQlRUU/uOLmxsJ5jZjJWs3tfsQOFz9f2i8JmF5nSw3DqmpTaQSF
 IF3v+3Iu9o+1dUBPsZVUjPWORWuRSFrXnnrUF3JPBZazXPwJHq8Gvbt3z6QFE8RO
 Z+Z9zDDcVrSWfJkYV3uHocPPnkCNTcIUdT2QFAWZYkBQCVlbbKET43dY0MbeFR9R
 n+mQcQVJOfp/a5oXwQyiiov6c67JX9yTT8wCB3tyPJVsLsiCOR8hN9UHSiQBc+Sx
 TLCuSkt0uVgwkTEM+pnJqLofRZGc9A6z8RPubwCgyxJp3+YPX5d3FtgHYcdNmAfK
 fQ2ILp7K0L52FcjVSr3uV8QacqUMhxLknODdjBhBcU0sh2V7yJPd9zoJq41xjtLy
 e8PC1D6NHGneLiuCCA==
 =zbdN
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v7.1

1. Use CSR_CRMD_PLV in kvm_arch_vcpu_in_kernel().
2. Let vcpu_is_preempted() a macro & some enhanments.
3. Add DMSINTC irqchip in kernel support.
4. Add KVM PMU test cases for tools/selftests.
2026-04-13 11:46:11 +02:00
Paolo Bonzini
d880d2a9c6 KVM/riscv changes for 7.1
- Fix steal time shared memory alignment checks
  - Fix vector context allocation leak
  - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
  - Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
  - Fix integer overflow in kvm_pmu_validate_counter_mask()
  - Fix shift-out-of-bounds in make_xfence_request()
  - Fix lost write protection on huge pages during dirty logging
  - Split huge pages during fault handling for dirty logging
  - Skip CSR restore if VCPU is reloaded on the same core
  - Implement kvm_arch_has_default_irqchip() for KVM selftests
  - Factored-out ISA checks into separate sources
  - Added hideleg to struct kvm_vcpu_config
  - Factored-out VCPU config into separate sources
  - Support configuration of per-VM HGATP mode from KVM user space
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmnbi74ACgkQrUjsVaLH
 LAejGw/3XRqexEOrxJ74GvAylGtGkQRuTw003mhIBIyshosYw6PiOSHEuAu7TQLc
 N074wt0QSfwbmQmeNaa0q3gIqb7Sp6gC0Eidurv1zHXuKSaAGbppnKD0VOZibtm6
 CK4+HQqBXFxf2mMeJSQX7+EOWNO+rf2jfw80c3SiKTnPE8mtb8Xfn3G6Zw22UmBZ
 gOrDrW4IijNKNkrBItb8V1IJgsfFIdUY+1Il2n1MSRuuqQL+tJcmHWXEPs7GdfMf
 9siV7asCfhKdf6xDys/Px42DBgQxLASG72Q8X2cESCxkO1kDplJZAt1AitAGfzXL
 bk20uAWikO0j7/Su93pWDOwMxRqb8c9dIMrnyRsCpmx1ovWA/odEnkrhslS1lafN
 hpUr6DWmTrE+2I6PW9UgBRAbleMomtK411fLhZh28PSyvp42sxG8841PwUNd/hFP
 lfcFO/ksJS7HFxQK5RaGZaMUuSMsgtZAu5P7zGGZm9uQh60eaHRF1vFcA+c5GAHe
 hibOegHjdYMkcLFqbVJDegU6a5+kohO3I8R21WXItp+VQjGvezOWuTDJtcR+jPAJ
 RNe4R1QeRgbsIwRuP9+fm6QnLeTPmBfxkCPFS1FvzrimLWskcQxqPWUq+eDvNDp+
 EdnE1KPjPsTykWq2baiG70+0xnetHP4ZGIrWwOQNOREJPEGOqg==
 =LbnE
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-7.1-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 7.1

 - Fix steal time shared memory alignment checks
 - Fix vector context allocation leak
 - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
 - Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
 - Fix integer overflow in kvm_pmu_validate_counter_mask()
 - Fix shift-out-of-bounds in make_xfence_request()
 - Fix lost write protection on huge pages during dirty logging
 - Split huge pages during fault handling for dirty logging
 - Skip CSR restore if VCPU is reloaded on the same core
 - Implement kvm_arch_has_default_irqchip() for KVM selftests
 - Factored-out ISA checks into separate sources
 - Added hideleg to struct kvm_vcpu_config
 - Factored-out VCPU config into separate sources
 - Support configuration of per-VM HGATP mode from KVM user space
2026-04-13 11:42:26 +02:00
Sun Jian
f1cc94665d selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room
Add a selftest covering ETH_HLEN-sized IPv4/IPv6 EtherType inputs for
bpf_prog_test_run_skb().

Reuse a single zero-initialized struct ethhdr eth_hlen and set
eth_hlen.h_proto from the per-test h_proto field.

Also add a dedicated tc_adjust_room program and route the short
IPv4/IPv6 cases to it, so the selftest actually exercises the
bpf_skb_adjust_room() path from the report.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260408034623.180320-3-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 15:42:57 -07:00
Alexei Starovoitov
47687a29b2
selftests/bpf: Use memfd_create instead of shm_open in cgroup_iter_memcg
Replace shm_open/shm_unlink with memfd_create in the shmem subtest.
shm_open requires /dev/shm to be mounted, which is not always available
in test environments, causing the test to fail with ENOENT.
memfd_create creates an anonymous shmem-backed fd without any filesystem
dependency while exercising the same shmem accounting path.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20260412210636.47516-1-alexei.starovoitov@gmail.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-04-12 23:46:32 +02:00
Alexei Starovoitov
fa2942918a Merge patch series "bpf: Fix OOB in pcpu_init_value and add a test"
xulang <xulang@uniontech.com> says:
====================

Fix OOB read when copying element from a BPF_MAP_TYPE_CGROUP_STORAGE
map to another pcpu map with the same value_size that is not rounded
up to 8 bytes, and add a test case to reproduce the issue.

The root cause is that pcpu_init_value() uses copy_map_value_long() which
rounds up the copy size to 8 bytes, but CGROUP_STORAGE map values are not
8-byte aligned (e.g., 4-byte). This causes a 4-byte OOB read when
the copy is performed.
====================

Link: https://lore.kernel.org/r/7653EEEC2BAB17DF+20260402073948.2185396-1-xulang@uniontech.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 13:36:55 -07:00
Lang Xu
171609f047 selftests/bpf: Add test for cgroup storage OOB read
Add a test case to reproduce the out-of-bounds read issue when copying
from a cgroup storage map to a pcpu map with a value_size not rounded
up to 8 bytes.

The test creates:
1. A CGROUP_STORAGE map with 4-byte value (not 8-byte aligned)
2. A LRU_PERCPU_HASH map with 4-byte value (same size)

When a socket is created in the cgroup, the BPF program triggers
bpf_map_update_elem() which calls copy_map_value_long(). This function
rounds up the copy size to 8 bytes, but the cgroup storage buffer is
only 4 bytes, causing an OOB read (before the fix).

Signed-off-by: Lang Xu <xulang@uniontech.com>
Link: https://lore.kernel.org/r/D63BF0DBFF1EA122+20260402074236.2187154-2-xulang@uniontech.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 13:36:32 -07:00
Paul Chaignon
2fefa9c81a selftests/bpf: Fix reg_bounds to match new tnum-based refinement
Commit efc11a6678 ("bpf: Improve bounds when tnum has a single
possible value") improved the bounds refinement to detect when the tnum
and u64 range overlap in a single value (and the bounds can thus be set
to that value).

Eduard then noticed that it broke the slow-mode reg_bounds selftests
because they don't have an equivalent logic and are therefore unable to
refine the bounds as much as the verifier. The following test case
illustrates this.

  ACTUAL   TRUE1:  scalar(u64=0xffffffff00000000,u32=0,s64=0xffffffff00000000,s32=0)
  EXPECTED TRUE1:  scalar(u64=[0xfffffffe00000001; 0xffffffff00000000],u32=0,s64=[0xfffffffe00000001; 0xffffffff00000000],s32=0)
  [...]
  #323/1007 reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (s32)<op> S64_MIN:FAIL

with the verifier logs:

  [...]
  19: w0 = w6                 ; R0=scalar(smin=0,smax=umax=0xffffffff,
                                          var_off=(0x0; 0xffffffff))
                                R6=scalar(smin=0xfffffffe00000001,smax=0xffffffff00000000,
                                          umin=0xfffffffe00000001,umax=0xffffffff00000000,
                                          var_off=(0xfffffffe00000000; 0x1ffffffff))
  20: w0 = w7                 ; R0=0 R7=0x8000000000000000
  21: if w6 == w7 goto pc+3
  [...]
  from 21 to 25: [...]
  25: w0 = w6                 ; R0=0 R6=0xffffffff00000000
                              ;         ^
                              ;         unexpected refined value
  26: w0 = w7                 ; R0=0 R7=0x8000000000000000
  27: exit

When w6 == w7 is true, the verifier can deduce that the R6's tnum is
equal to (0xfffffffe00000000; 0x100000000) and then use that information
to refine the bounds: the tnum only overlap with the u64 range in
0xffffffff00000000. The reg_bounds selftest doesn't know about tnums
and therefore fails to perform the same refinement.

This issue happens when the tnum carries information that cannot be
represented in the ranges, as otherwise the selftest could reach the
same refined value using just the ranges. The tnum thus needs to
represent non-contiguous values (ex., R6's tnum above, after the
condition). The only way this can happen in the reg_bounds selftest is
at the boundary between the 32 and 64bit ranges. We therefore only need
to handle that case.

This patch fixes the selftest refinement logic by checking if the u32
and u64 ranges overlap in a single value. If so, the ranges can be set
to that value. We need to handle two cases: either they overlap in
umin64...

  u64 values
  matching u32 range:     xxx        xxx        xxx        xxx
                      |--------------------------------------|
  u64 range:          0                xxxxx                 UMAX64

or in umax64:

  u64 values
  matching u32 range:     xxx        xxx        xxx        xxx
                      |--------------------------------------|
  u64 range:          0          xxxxx                       UMAX64

To detect the first case, we decrease umax64 to the maximum value that
matches the u32 range. If that happens to be umin64, then umin64 is the
only overlap. We proceed similarly for the second case, increasing
umin64 to the minimum value that matches the u32 range.

Note this is similar to how the verifier handles the general case using
tnum, but we don't need to care about a single-value overlap in the
middle of the range. That case is not possible when comparing two
ranges.

This patch also adds two test cases reproducing this bug as part of the
normal test runs (without SLOW_TESTS=1).

Fixes: efc11a6678 ("bpf: Improve bounds when tnum has a single possible value")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Closes: https://lore.kernel.org/bpf/4e6dd64a162b3cab3635706ae6abfdd0be4db5db.camel@gmail.com/
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ada9UuSQi2SE2IfB@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 13:15:31 -07:00
Emil Tsalapatis
4c5f21d4df selftests/bpf: Add tests for non-arena/arena operations
Add a selftest that ensures instructions with arena source and
non-arena destination registers are accepted by the verifier.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20260412174546.18684-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 12:48:11 -07:00
Menglong Dong
9fd19e3ed7 bpf: add missing fsession to the verifier log
The fsession attach type is missed in the verifier log in
check_get_func_ip(), bpf_check_attach_target() and check_attach_btf_id().
Update them to make the verifier log proper. Meanwhile, update the
corresponding selftests.

Acked-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260412060346.142007-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-12 12:42:38 -07:00
Jiayuan Chen
04013c3ca0 selftests/bpf: Add tests for sock_ops ctx access with same src/dst register
Add selftests to verify SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() correctly
return NULL/zero when dst_reg == src_reg and is_fullsock == 0.

Three subtests are included:
 - get_sk: ctx->sk with same src/dst register (SOCK_OPS_GET_SK)
 - get_field: ctx->snd_cwnd with same src/dst register (SOCK_OPS_GET_FIELD)
 - get_sk_diff_reg: ctx->sk with different src/dst register (baseline)

Each BPF program uses inline asm (__naked) to force specific register
allocation, reads is_fullsock first, then loads the field using the same
(or different) register. The test triggers TCP_NEW_SYN_RECV via a TCP
handshake and checks that the result is NULL/zero when is_fullsock == 0.

Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260407022720.162151-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12 12:28:05 -07:00
Joe Damato
5d3b12d1a2 selftests: drv-net: Add USO test
Add a simple test for USO. Tests both ipv4 and ipv6 with several full
segments and a partial segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-11-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12 10:54:33 -07:00
Florian Westphal
6111954266 selftests: netfilter: nft_tproxy.sh: adjust to socat changes
Like e65d8b6f30 ("selftests: drv-net: adjust to socat changes") we
need to add shut-none for this test too.

The extra 0-packet can trigger a second (unexpected) reply from the server.

Fixes: 7e37e0eacd ("selftests: netfilter: nft_tproxy.sh: add tcp tests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260408152432.24b8ad0d@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260409224506.27072-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12 09:41:16 -07:00
Jakub Kicinski
e46ff213f7 selftests: net: py: add test case filtering and listing
When developing new test cases and reproducing failures in
existing ones we currently have to run the entire test which
can take minutes to finish.

Add command line options for test selection, modeled after
kselftest_harness.h:

  -l       list tests (filtered, if filters were specified)
  -t name  include test
  -T name  exclude test

Since we don't have as clean separation into fixture / variant /
test as kselftest_harness this is not really a 1 to 1 match.
We have to lean on glob patterns instead.

Like in kselftest_harness filters are evaluated in order, first
match wins. If only exclusions are specified everything else is
included and vice versa.

Glob patterns (*, ?, [) are supported in addition to exact
matching.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260410013921.1710295-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12 09:09:09 -07:00
Alexei Starovoitov
ae3f8ca2ba bpf: Move checks for reserved fields out of the main pass
Check reserved fields of each insn once in a prepass
instead of repeatedly rechecking them during the main verifier pass.

Link: https://lore.kernel.org/r/20260411200932.41797-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-11 15:24:41 -07:00
Eduard Zingerman
335a6ca041 selftests/bpf: inline TEST_TAG constants in test_loader.c
After str_has_pfx() refactoring each TEST_TAG_* / TEST_BTF_PATH
constant is used exactly once. Since constant definitions are not
shared between BPF-side bpf_misc.h and userspace side test_loader.c,
there is no need in the additional redirection layer.

Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-selftests-global-tags-ordering-v2-4-c566ec9781bf@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-11 07:17:06 -07:00
Eduard Zingerman
713db9fd03 selftests/bpf: impose global ordering for test decl_tags
Impose global ordering for all decl tags used by test_loader.c based
tests (__success, __failure, __msg, etc):
- change every tag to expand as
  __attribute__((btf_decl_tag("comment:" XSTR(__COUNTER__) ...)))
- change parse_test_spec() to collect all decl tags before
  processing and sort them using strverscmp().

The ordering is necessary for gcc-bpf.
Neither GCC nor the C standard defines the order in which function
attributes are consumed. While Clang tends to preserve definition order,
GCC may process them out of sequence. This inconsistency causes BPF
tests with multiple __msg entries to fail when compiled with GCC.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-selftests-global-tags-ordering-v2-3-c566ec9781bf@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-11 07:17:06 -07:00
Eduard Zingerman
5160e584c3 selftests/bpf: make str_has_pfx return pointer past the prefix
Change str_has_pfx() to return a pointer to the first character after
the prefix, thus eliminating the repetitive (s + sizeof(PFX) - 1)
patterns.

Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-selftests-global-tags-ordering-v2-2-c566ec9781bf@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-11 07:17:06 -07:00
Eduard Zingerman
cdd54fe98c selftests/bpf: fix __jited_unpriv tag name
__jited_unpriv was using "test_jited=" as its tag name, same as the
priv variant __jited. Fix by using "test_jited_unpriv=".

Fixes: 7d743e4c75 ("selftests/bpf: __jited test tag to check disassembly after jit")
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-selftests-global-tags-ordering-v2-1-c566ec9781bf@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-11 07:17:06 -07:00
Amery Hung
78ee02a966 selftests/bpf: Remove kmalloc tracing from local storage create bench
Remove the raw_tp/kmalloc BPF program and its associated reporting from
the local storage create benchmark. The kmalloc count per create is not
a useful metric as different code paths use different allocators (e.g.
kmalloc_nolock vs kzalloc), introducing noise that makes the number
hard to interpret.

Keep total_creates in the summary output as it is useful for normalizing
perf statistics collected alongside the benchmark.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260411015419.114016-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 21:22:31 -07:00
Daniel Borkmann
497fa510ee selftests/bpf: Add test for add_const base_id consistency
Add a test to verifier_linked_scalars that exercises the base_id
consistency check for BPF_ADD_CONST linked scalars during state
pruning.

With the fix, pruning fails and the verifier discovers the true
branch's R3 is too wide for the stack access.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
  [...]
  #613/22  verifier_linked_scalars/scalars_stale_delta_from_cleared_id:OK
  #613/23  verifier_linked_scalars/scalars_stale_delta_from_cleared_id_alu32:OK
  #613/24  verifier_linked_scalars/linked scalars: add_const base_id must be consistent for pruning:OK
  #613     verifier_linked_scalars:OK
  Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260410232651.559778-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 17:39:09 -07:00
Linus Torvalds
e774d5f1bc RISC-V updates for v7.0-rc8
Before v7.0 is released, fix a few issues with the CFI patchset,
 merged earlier in v7.0-rc, that primarily affect interfaces to
 non-kernel code:
 
 - Improve the prctl() interface for per-task indirect branch landing
   pad control to expand abbreviations and to resemble the speculation
   control prctl() interface
 
 - Expand the "LP" and "SS" abbreviations in the ptrace uapi header
   file to "branch landing pad" and "shadow stack", to improve
   readability
 
 - Fix a typo in a CFI-related macro name in the ptrace uapi header
   file
 
 - Ensure that the indirect branch tracking state and shadow stack
   state are unlocked immediately after an exec() on the new task so
   that libc subsequently can control it
 
 - While working in this area, clean up the kernel-internal,
   cross-architecture prctl() function names by expanding the
   abbreviations mentioned above
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmnYP5YACgkQx4+xDQu9
 KkuoPQ//Yye5D+35EqfA12yP96Vrtg0QCKiqMotz3yLo0T7zh5KosAs/QIE5eQi7
 vWRnCld5PsFa0ZS2822oPfQo8pKVO1y7M2ecFWSwaOWq865Xs82M/puqEQF3GFCS
 219cg1dTVBGvvKSf4MINUBRprfZmZRT9pzhSk79qHEbHKzwCDk7uah51iUdyPJyd
 KX3hshYMLq3rooTHR2wD/ChTpV+pCrt2rSUVbW8+sTUWDfv2sTLauHmemKw7LpdW
 C0SulXvcYkGyiqsB5AXW9x2ttJ5hX9diPb73XS6eBCU0CaMl9BVZWNKeqhEMJxKR
 wmqIadD8pelf7Jh7wGAbNW4hWqTsO3xRpZH38Y/cGLdhs3cqvKjEmT3fOFWUP9bP
 hWv5027gVXVSOmvxhPiUJs7D5WWAz4Q64JZfdJSmDdEWVXcI0v/hzdukuPw4iiT6
 DaqOyClTcwc+j1jawFTICXTF7wXfvZT5sjulrmPk1HX4nZ5padKpfQ77AdKHF9Q6
 9pC25QHQk42h/R4ynA4lm15YnCOfYvjP25hU7K64gQnqO6qBrolfrA4kJOmdYv/g
 1IXsA2YZafJbcXwyFZjWy50uu5gaCM5JhRRFdUrjmB6j3gv9HfBlWJXQywReUjPo
 Kq4tnFppxzFVm23COj9j5kyjsFjUhZ8KCft3+n7lrndeOCk5Z3E=
 =5/Ct
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:
 "Before v7.0 is released, fix a few issues with the CFI patchset,
  merged earlier in v7.0-rc, that primarily affect interfaces to
  non-kernel code:

   - Improve the prctl() interface for per-task indirect branch landing
     pad control to expand abbreviations and to resemble the speculation
     control prctl() interface

   - Expand the "LP" and "SS" abbreviations in the ptrace uapi header
     file to "branch landing pad" and "shadow stack", to improve
     readability

   - Fix a typo in a CFI-related macro name in the ptrace uapi header
     file

   - Ensure that the indirect branch tracking state and shadow stack
     state are unlocked immediately after an exec() on the new task so
     that libc subsequently can control it

   - While working in this area, clean up the kernel-internal,
     cross-architecture prctl() function names by expanding the
     abbreviations mentioned above"

* tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  prctl: cfi: change the branch landing pad prctl()s to be more descriptive
  riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers
  prctl: rename branch landing pad implementation functions to be more explicit
  riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers
  riscv: cfi: clear CFI lock status in start_thread()
  riscv: ptrace: cfi: fix "PRACE" typo in uapi header
2026-04-10 17:27:08 -07:00
Andy Roulin
20ae6d76e3 selftests: net: add bridge STP mode selection test
Add a selftest for the IFLA_BR_STP_MODE bridge attribute that verifies:

1. stp_mode defaults to auto on new bridges
2. stp_mode can be toggled between user, kernel, and auto
3. Changing stp_mode while STP is active is rejected with -EBUSY
4. Re-setting the same stp_mode while STP is active succeeds
5. stp_mode user in a network namespace yields userspace STP (stp_state=2)
6. stp_mode kernel forces kernel STP (stp_state=1)
7. stp_mode auto in a netns preserves traditional fallback to kernel STP
8. stp_mode and stp_state can be set atomically in a single message
9. stp_mode persists across STP disable/enable cycles

Test 5 is the key use case: it demonstrates that userspace STP can now
be enabled in non-init network namespaces by setting stp_mode to user
before enabling STP.

Test 8 verifies the atomic usage pattern where both attributes are set
in a single netlink message, which is supported because br_changelink()
processes IFLA_BR_STP_MODE before IFLA_BR_STP_STATE.

The test gracefully skips if the installed iproute2 does not support
the stp_mode attribute.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Link: https://patch.msgid.link/20260405205224.3163000-4-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10 15:52:25 -07:00
Dimitri Daskalakis
a66374a3eb selftests: drv-net: ntuple: Add dst-ip, src-port, dst-port fields
Extend the ntuple flow steering test to cover dst-ip, src-port, and
dst-port fields. The test supports arbitrary combinations of the fields,
for now we test src_ip/dst_ip, and src_ip/dst_ip/src_port/dst_port.

The tests currently match full fields, but we can consider adding
support for masked fields in the future.

 TAP version 13
 1..24
 ok 1 ntuple.queue.tcp4.src_ip
 ok 2 ntuple.queue.tcp4.dst_ip
 ok 3 ntuple.queue.tcp4.src_port
 ok 4 ntuple.queue.tcp4.dst_port
 ok 5 ntuple.queue.tcp4.src_ip.dst_ip
 ok 6 ntuple.queue.tcp4.src_ip.dst_ip.src_port.dst_port
 ok 7 ntuple.queue.udp4.src_ip
 ok 8 ntuple.queue.udp4.dst_ip
 ok 9 ntuple.queue.udp4.src_port
 ok 10 ntuple.queue.udp4.dst_port
 ok 11 ntuple.queue.udp4.src_ip.dst_ip
 ok 12 ntuple.queue.udp4.src_ip.dst_ip.src_port.dst_port
 ok 13 ntuple.queue.tcp6.src_ip
 ok 14 ntuple.queue.tcp6.dst_ip
 ok 15 ntuple.queue.tcp6.src_port
 ok 16 ntuple.queue.tcp6.dst_port
 ok 17 ntuple.queue.tcp6.src_ip.dst_ip
 ok 18 ntuple.queue.tcp6.src_ip.dst_ip.src_port.dst_port
 ok 19 ntuple.queue.udp6.src_ip
 ok 20 ntuple.queue.udp6.dst_ip
 ok 21 ntuple.queue.udp6.src_port
 ok 22 ntuple.queue.udp6.dst_port
 ok 23 ntuple.queue.udp6.src_ip.dst_ip
 ok 24 ntuple.queue.udp6.src_ip.dst_ip.src_port.dst_port
 # Totals: pass:24 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260407164954.2977820-3-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10 15:32:11 -07:00
Dimitri Daskalakis
18589df934 selftests: drv-net: Add ntuple (NFC) flow steering test
Add a test for ethtool NFC (ntuple) flow steering rules. The test
creates an ntuple rule matching on various flow fields and verifies
that traffic is steered to the correct queue.

The test forces all traffic to queue 0 via the indirection table,
then installs an ntuple rule to steer select traffic to a specific
queue. The test then verifies the expected number of packets is received
on the queue.

This test has variants for TCP/UDP over IPv4/IPv6, with rules matching
the source IP. Additional match fields will be added in the next commit.

 TAP version 13
 1..4
 ok 1 ntuple.queue.tcp4.src_ip
 ok 2 ntuple.queue.udp4.src_ip
 ok 3 ntuple.queue.tcp6.src_ip
 ok 4 ntuple.queue.udp6.src_ip
 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260407164954.2977820-2-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10 15:32:11 -07:00
Alexei Starovoitov
2cb27158ad bpf: poison dead stack slots
As a sanity check poison stack slots that stack liveness determined
to be dead, so that any read from such slots will cause program rejection.
If stack liveness logic is incorrect the poison can cause
valid program to be rejected, but it also will prevent unsafe program
to be accepted.

Allow global subprogs "read" poisoned stack slots.
The static stack liveness determined that subprog doesn't read certain
stack slots, but sizeof(arg_type) based global subprog validation
isn't accurate enough to know which slots will actually be read by
the callee, so it needs to check full sizeof(arg_type) at the caller.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-patch-set-v4-14-5d4eecb343db@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 15:13:38 -07:00
Alexei Starovoitov
27417e5eb9 selftests/bpf: add new tests for static stack liveness analysis
Add a bunch of new tests to verify the static stack
liveness analysis.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-patch-set-v4-13-5d4eecb343db@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 15:13:38 -07:00
Alexei Starovoitov
957c30c067 selftests/bpf: adjust verifier_log buffers
The new liveness analysis in liveness.c adds verbose output at
BPF_LOG_LEVEL2, making the verifier log for good_prog exceed the 1024-byte
reference buffer. When the reference is truncated in fixed mode, the
rolling mode captures the actual tail of the full log, which doesn't match
the truncated reference.

The fix is to increase the buffer sizes in the test.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-patch-set-v4-12-5d4eecb343db@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 15:13:38 -07:00
Alexei Starovoitov
b42eb55f6c selftests/bpf: update existing tests due to liveness changes
The verifier cleans all dead registers and stack slots in the current
state. Adjust expected output in tests or insert dummy stack/register
reads. Also update verifier_live_stack tests to adhere to new logging
scheme.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260410-patch-set-v4-11-5d4eecb343db@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 15:13:37 -07:00
Alan Maguire
0e4dc6fbdd selftests/bpf: Add BTF sanitize test covering BTF layout
Add test that fakes up a feature cache of supported BPF
features to simulate an older kernel that does not support
BTF layout information.  Ensure that BTF is sanitized correctly
to remove layout info between types and strings, and that all
offsets and lengths are adjusted appropriately.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260408165735.843763-3-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 12:34:36 -07:00
Venkat Rao Bagalkote
aacee214d5 selftests/bpf: Remove test_access_variable_array
test_access_variable_array relied on accessing struct sched_domain::span
to validate variable-length array handling via BTF. Recent scheduler
refactoring removed or hid this field, causing the test
to fail to build.

Given that this test depends on internal scheduler structures that are
subject to refactoring, and equivalent variable-length array coverage
already exists via bpf_testmod-based tests, remove
test_access_variable_array entirely.

Link: https://lore.kernel.org/all/177434340048.1647592.8586759362906719839.tip-bot2@tip-bot2/

Signed-off-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Naveen Kumar Thummalapenta <naveen66@linux.ibm.com>
Link: https://lore.kernel.org/r/20260410105404.91126-1-venkat88@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10 12:32:53 -07:00
Catalin Marinas
480a9e57cc Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core
* arm64/for-next/perf:
  : Perf updates
  perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc()
  perf/arm-cmn: Fix incorrect error check for devm_ioremap()
  perf: add NVIDIA Tegra410 C2C PMU
  perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
  perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
  perf/arm_cspmu: nvidia: Rename doc to Tegra241
  perf/arm-cmn: Stop claiming entire iomem region
  arm64: cpufeature: Use pmuv3_implemented() function
  arm64: cpufeature: Make PMUVer and PerfMon unsigned
  KVM: arm64: Read PMUVer as unsigned

* arm64/for-next/read-once:
  : Fixes for __READ_ONCE() with CONFIG_LTO=y
  arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  arm64: Optimize __READ_ONCE() with CONFIG_LTO=y

* for-next/misc:
  : Miscellaneous cleanups/fixes
  arm64: rsi: use linear-map alias for realm config buffer
  arm64: Kconfig: fix duplicate word in CMDLINE help text
  arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
  arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
  arm64: kexec: Remove duplicate allocation for trans_pgd
  arm64: mm: Use generic enum pgtable_level
  arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0
  arm64: remove ARCH_INLINE_*

* for-next/tlbflush:
  : Refactor the arm64 TLB invalidation API and implementation
  arm64: mm: __ptep_set_access_flags must hint correct TTL
  arm64: mm: Provide level hint for flush_tlb_page()
  arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range()
  arm64: mm: More flags for __flush_tlb_range()
  arm64: mm: Refactor __flush_tlb_range() to take flags
  arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid()
  arm64: mm: Simplify __flush_tlb_range_limit_excess()
  arm64: mm: Simplify __TLBI_RANGE_NUM() macro
  arm64: mm: Re-implement the __flush_tlb_range_op macro in C
  arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range()
  arm64: mm: Push __TLBI_VADDR() into __tlbi_level()
  arm64: mm: Implicitly invalidate user ASID based on TLBI operation
  arm64: mm: Introduce a C wrapper for by-range TLB invalidation
  arm64: mm: Re-implement the __tlbi_level macro as a C function

* for-next/ttbr-macros-cleanup:
  : Cleanups of the TTBR1_* macros
  arm64/mm: Directly use TTBRx_EL1_CnP
  arm64/mm: Directly use TTBRx_EL1_ASID_MASK
  arm64/mm: Describe TTBR1_BADDR_4852_OFFSET

* for-next/kselftest:
  : arm64 kselftest updates
  selftests/arm64: Implement cmpbr_sigill() to hwcap test

* for-next/feat_lsui:
  : Futex support using FEAT_LSUI instructions to avoid toggling PAN
  arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
  arm64: Kconfig: Add support for LSUI
  KVM: arm64: Use CAST instruction for swapping guest descriptor
  arm64: futex: Support futex with FEAT_LSUI
  arm64: futex: Refactor futex atomic operation
  KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI
  KVM: arm64: Expose FEAT_LSUI to guests
  arm64: cpufeature: Add FEAT_LSUI

* for-next/mpam: (40 commits)
  : Expose MPAM to user-space via resctrl:
  :  - Add architecture context-switch and hiding of the feature from KVM.
  :  - Add interface to allow MPAM to be exposed to user-space using resctrl.
  :  - Add errata workaoround for some existing platforms.
  :  - Add documentation for using MPAM and what shape of platforms can use resctrl
  arm64: mpam: Add initial MPAM documentation
  arm_mpam: Quirk CMN-650's CSU NRDY behaviour
  arm_mpam: Add workaround for T241-MPAM-6
  arm_mpam: Add workaround for T241-MPAM-4
  arm_mpam: Add workaround for T241-MPAM-1
  arm_mpam: Add quirk framework
  arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
  arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
  arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
  arm_mpam: resctrl: Update the rmid reallocation limit
  arm_mpam: resctrl: Add resctrl_arch_rmid_read()
  arm_mpam: resctrl: Allow resctrl to allocate monitors
  arm_mpam: resctrl: Add support for csu counters
  arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
  arm_mpam: resctrl: Add kunit test for control format conversions
  arm_mpam: resctrl: Add support for 'MB' resource
  arm_mpam: resctrl: Wait for cacheinfo to be ready
  arm_mpam: resctrl: Add rmid index helpers
  arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
  arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
  ...

* for-next/hotplug-batched-tlbi:
  : arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
  arm64/mm: Reject memory removal that splits a kernel leaf mapping
  arm64/mm: Enable batched TLB flush in unmap_hotplug_range()

* for-next/bbml2-fixes:
  : Fixes for realm guest and BBML2_NOABORT
  arm64: mm: Remove pmd_sect() and pud_sect()
  arm64: mm: Handle invalid large leaf mappings correctly
  arm64: mm: Fix rodata=full block mapping support for realm guests

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06

* for-next/generic-entry:
  : More arm64 refactoring towards using the generic entry code
  arm64: Check DAIF (and PMR) at task-switch time
  arm64: entry: Use split preemption logic
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Don't preempt with SError or Debug masked
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Move irqentry_enter() prototype later
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  entry: Fix stale comment for irqentry_enter()

* for-next/acpi:
  : arm64 ACPI updates
  ACPI: AGDI: fix missing newline in error message
2026-04-10 14:22:24 +01:00
fangqiurong
dcd47f27c0 selftests/sched_ext: Fix wrong DSQ ID in peek_dsq error message
The error path after scx_bpf_create_dsq(real_dsq_id, ...) was reporting
test_dsq_id instead of real_dsq_id in the error message, which would
mislead debugging.

Signed-off-by: fangqiurong <fangqiurong@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-09 23:00:44 -10:00
Jakub Kicinski
3d2c3d2eea selftests: net: py: explicitly forbid multiple ksft_run() calls
People (do people still write code or is it all AI?) seem to not
get that ksft_run() can only be called once. If we call it
multiple times KTAP parsers will likely cut off after the first
batch has finished.

Link: https://patch.msgid.link/20260408221952.819822-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 20:38:33 -07:00
Cosmin Ratiu
26555673bc selftests: Add MACsec VLAN propagation traffic test
Add VLAN filter propagation tests through offloaded MACsec devices via
actual traffic.

The tests create MACsec tunnels with matching SAs on both endpoints,
stack VLANs on top, and verify connectivity with ping. Covered:
- Offloaded MACsec with VLAN (filters propagate to HW)
- Software MACsec with VLAN (no HW filter propagation)
- Offload on/off toggle and verifying traffic still works

On netdevsim this makes use of the VLAN filter debugfs file to actually
validate that filters are applied/removed correctly.
On real hardware the traffic should validate actual VLAN filter
propagation.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260408115240.1636047-4-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 19:38:42 -07:00
Cosmin Ratiu
e1ab601bb2 selftests: Migrate nsim-only MACsec tests to Python
Move MACsec offload API and ethtool feature tests from
tools/testing/selftests/drivers/net/netdevsim/macsec-offload.sh to
tools/testing/selftests/drivers/net/macsec.py using the NetDrvEnv
framework so tests can run against both netdevsim (default) and real
hardware (NETIF=ethX). As some real hardware requires MACsec to use
encryption, add that to the tests.

Netdevsim-specific limit checks (max SecY, max RX SC) were moved into
separate test cases to avoid failures on real hardware.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260408115240.1636047-2-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 19:38:41 -07:00
Jakub Kicinski
1508922588 Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'
Daniel Borkmann says:

====================
netkit: Support for io_uring zero-copy and AF_XDP

Containers use virtual netdevs to route traffic from a physical netdev
in the host namespace. They do not have access to the physical netdev
in the host and thus can't use memory providers or AF_XDP that require
reconfiguring/restarting queues in the physical netdev.

This patchset adds the concept of queue leasing to virtual netdevs that
allow containers to use memory providers and AF_XDP at native speed.
Leased queues are bound to a real queue in a physical netdev and act
as a proxy.

Memory providers and AF_XDP operations take an ifindex and queue id,
so containers would pass in an ifindex for a virtual netdev and a queue
id of a leased queue, which then gets proxied to the underlying real
queue.

We have implemented support for this concept in netkit and tested the
latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
(bnxt_en) 100G NICs. For more details see the individual patches.
====================

Link: https://patch.msgid.link/20260402231031.447597-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 18:24:35 -07:00
David Wei
65d657d806 selftests/net: Add queue leasing tests with netkit
Add extensive selftests for netkit queue leasing, using io_uring zero
copy test binary inside of a netns with netkit. This checks that memory
providers can be bound against virtual queues in a netkit within a
netns that are leasing from a physical netdev in the default netns.
Also add various test cases around corner cases for the queue creation
itself as well as queue info dumping and teardown in case of netkit in
device pair and single mode.

Signed-off-by: David Wei <dw@davidwei.uk>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260402231031.447597-15-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 18:21:47 -07:00
Thomas Weißschuh
b070dc3629 selftests/nolibc: use gcc 15
Newer compilers tend to detect more problematic code.

Update the testsuite to use gcc 15.2.0 by default.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260408-nolibc-gcc-15-v1-3-330d0c40f894@weissschuh.net
2026-04-09 23:25:45 +02:00
Jakub Kicinski
b6e39e4846 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-7.0-rc8).

Conflicts:

net/ipv6/seg6_iptunnel.c
  c3812651b5 ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel")
  78723a62b9 ("seg6: add per-route tunnel source address")
https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk

net/ipv4/icmp.c
  fde29fd934 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()")
  d98adfbdd5 ("ipv4: drop ipv6_stub usage and use direct function calls")
https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/chain_mode.c
  51f4e090b9 ("net: stmmac: fix integer underflow in chain mode")
  6b4286e055 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09 13:20:59 -07:00
Daniel Borkmann
8697bdd67b selftests/bpf: Add test for stale pkt range after scalar arithmetic
Extend the verifier_direct_packet_access BPF selftests to exercise the
verifier code paths which ensure that the pkt range is cleared after
add/sub alu with a known scalar. The tests reject the invalid access.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_direct
  [...]
  #592/35  verifier_direct_packet_access/direct packet access: pkt_range cleared after sub with known scalar:OK
  #592/36  verifier_direct_packet_access/direct packet access: pkt_range cleared after add with known scalar:OK
  #592/37  verifier_direct_packet_access/direct packet access: test3:OK
  #592/38  verifier_direct_packet_access/direct packet access: test3 @unpriv:OK
  #592/39  verifier_direct_packet_access/direct packet access: test34 (non-linear, cgroup_skb/ingress, too short eth):OK
  #592/40  verifier_direct_packet_access/direct packet access: test35 (non-linear, cgroup_skb/ingress, too short 1):OK
  #592/41  verifier_direct_packet_access/direct packet access: test36 (non-linear, cgroup_skb/ingress, long enough):OK
  #592     verifier_direct_packet_access:OK
  [...]
  Summary: 2/47 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260409155016.536608-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-09 13:11:31 -07:00
Linus Torvalds
a55f7f5f29 Including fixes from netfilter, IPsec and wireless. This is again
considerably bigger than the old average. No known outstanding
 regressions.
 
 Current release - regressions:
 
   - net: increase IP_TUNNEL_RECURSION_LIMIT to 5
 
   - eth: ice: fix PTP timestamping broken by SyncE code on E825C
 
 Current release - new code bugs:
 
   - eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure
 
 Previous releases - regressions:
 
   - core: fix cross-cache free of KFENCE-allocated skb head
 
   - sched: act_csum: validate nested VLAN headers
 
   - rxrpc: fix call removal to use RCU safe deletion
 
   - xfrm:
     - wait for RCU readers during policy netns exit
     - fix refcount leak in xfrm_migrate_policy_find
 
   - wifi: rt2x00usb: fix devres lifetime
 
   - mptcp: fix slab-use-after-free in __inet_lookup_established
 
   - ipvs: fix NULL deref in ip_vs_add_service error path
 
   - eth: airoha: fix memory leak in airoha_qdma_rx_process()
 
   - eth: lan966x: fix use-after-free and leak in lan966x_fdma_reload()
 
 Previous releases - always broken:
 
   - ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data()
 
   - ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump
 
   - bridge: guard local VLAN-0 FDB helpers against NULL vlan group
 
   - xsk: tailroom reservation and MTU validation
 
   - rxrpc:
     - fix to request an ack if window is limited
     - fix RESPONSE authenticator parser OOB read
 
   - netfilter: nft_ct: fix use-after-free in timeout object destroy
 
   - batman-adv: hold claim backbone gateways by reference
 
   - eth: stmmac: fix PTP ref clock for Tegra234
 
   - eth: idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
 
   - eth: ipa: fix GENERIC_CMD register field masks for IPA v5.0+
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmnXtnsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkZeYQAKfZCL4rCkeO7VuoZn8lMN4YrBqVphuU
 MFpLKnvU8muDamBSmXGwpsdryrzQdUtEl0C7E/YyKO8TKpmFkjQRKe/Ay5XSsmJi
 fqjQiZIC9TKgVbJJbQZ4yZqOO2EZXHMRx8awnDjIwIrSLTyJtD29XaJqvmm+rojw
 uAVECbXpVOWdRVyIgHf0N3y99ItvwQycv6npjXWGHDryGVH1uXz4CiWFgltFd827
 MgNx5gZ7wn6ls1B4E1EsIXZeCnVOoNMUBX+CtkSl7ctZD/nvqLZ0PqGEViqGZ+w7
 kEK9jWWvsmST3j0wG4IldbnQJORZrDXR5lAmvOJILxUDD4jG4zaqHPYs4ELS5sHK
 E1QOs6uNBNvu40neGe7zcH4DpQzv5/W5yj0ELPBZJhV/5madjEpETOh6yO7EJRBl
 sdd32LD0z8wFt8yJGEbXM7YC4A8tzNagWF0wKpRqbiKFlWHdJffwqcmEe6+2CiXx
 rg0q2DAfvTesmzdMgGuk4ZOeczfZ9JbxPYA0IYrUegYmbI6tAuCK5slaKGOwoyml
 hX2lXNBxaVmTk7F9Qq6I9Ona78XqO0Tg0UBzC2dIsQITvkue7ItJBpkurOwYSOGt
 a8SAVV0JwXSfPquKlOfLhagPZcuQuTQfIqRKVqM47KPPO/i99okRXQbfJGrpHJKM
 8bzRl6654nAs
 =uzl/
 -----END PGP SIGNATURE-----

Merge tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter, IPsec and wireless. This is again
  considerably bigger than the old average. No known outstanding
  regressions.

  Current release - regressions:

   - net: increase IP_TUNNEL_RECURSION_LIMIT to 5

   - eth: ice: fix PTP timestamping broken by SyncE code on E825C

  Current release - new code bugs:

   - eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure

  Previous releases - regressions:

   - core: fix cross-cache free of KFENCE-allocated skb head

   - sched: act_csum: validate nested VLAN headers

   - rxrpc: fix call removal to use RCU safe deletion

   - xfrm:
      - wait for RCU readers during policy netns exit
      - fix refcount leak in xfrm_migrate_policy_find

   - wifi: rt2x00usb: fix devres lifetime

   - mptcp: fix slab-use-after-free in __inet_lookup_established

   - ipvs: fix NULL deref in ip_vs_add_service error path

   - eth:
      - airoha: fix memory leak in airoha_qdma_rx_process()
      - lan966x: fix use-after-free and leak in lan966x_fdma_reload()

  Previous releases - always broken:

   - ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data()

   - ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group
     dump

   - bridge: guard local VLAN-0 FDB helpers against NULL vlan group

   - xsk: tailroom reservation and MTU validation

   - rxrpc:
      - fix to request an ack if window is limited
      - fix RESPONSE authenticator parser OOB read

   - netfilter: nft_ct: fix use-after-free in timeout object destroy

   - batman-adv: hold claim backbone gateways by reference

   - eth:
      - stmmac: fix PTP ref clock for Tegra234
      - idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
      - ipa: fix GENERIC_CMD register field masks for IPA v5.0+"

* tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (104 commits)
  net: lan966x: fix use-after-free and leak in lan966x_fdma_reload()
  net: lan966x: fix page pool leak in error paths
  net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool()
  nfc: pn533: allocate rx skb before consuming bytes
  l2tp: Drop large packets with UDP encap
  net: ipa: fix event ring index not programmed for IPA v5.0+
  net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+
  MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver
  devlink: Fix incorrect skb socket family dumping
  af_unix: read UNIX_DIAG_VFS data under unix_state_lock
  Revert "mptcp: add needs_id for netlink appending addr"
  mptcp: fix slab-use-after-free in __inet_lookup_established
  net: txgbe: leave space for null terminators on property_entry
  net: ioam6: fix OOB and missing lock
  rxrpc: proc: size address buffers for %pISpc output
  rxrpc: only handle RESPONSE during service challenge
  rxrpc: Fix buffer overread in rxgk_do_verify_authenticator()
  rxrpc: Fix leak of rxgk context in rxgk_verify_response()
  rxrpc: Fix integer overflow in rxgk_verify_response()
  rxrpc: Fix missing error checks for rxkad encryption/decryption failure
  ...
2026-04-09 08:39:25 -07:00
Will Deacon
f8d5e7066d Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2026-04-09 13:18:27 +01:00
Song Gao
e47b8e1db9 KVM: LoongArch: selftests: Add PMU overflow interrupt test
Extend the PMU test suite to cover overflow interrupts. The test enables
the PMI (Performance Monitor Interrupt), sets counter 0 to one less than
the overflow value, and verifies that an interrupt is raised when the
counter overflows. A guest interrupt handler checks the interrupt cause
and disables further PMU interrupts upon success.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09 18:56:38 +08:00
Song Gao
11c8401927 KVM: LoongArch: selftests: Add basic PMU event counting test
Introduce a basic PMU test that verifies hardware event counting for
four performance counters. The test enables the events for CPU cycles,
instructions retired, branch instructions, and branch misses, runs a
fixed number of loops, and checks that the counter values fall within
expected ranges. It also validates that the host supports PMU and that
the VM feature is enabled.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09 18:56:37 +08:00
Song Gao
fa19ea9a7b KVM: LoongArch: selftests: Add cpucfg read/write helpers
Add helper macros and functions to read and write CPU configuration
registers (cpucfg) from the guest and from the VMM. This interface is
required in upcoming selftests for querying and setting CPU features,
such as PMU capabilities.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09 18:56:37 +08:00
Takashi Iwai
3f44bccdd6 Merge branch 'for-linus' into for-next
Pull 7.0-devel branch for further development of HD-audio codec quirks.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-04-09 08:32:17 +02:00
Or Har-Toov
2a8e912352 selftest: netdevsim: Add resource dump and scope filter test
Add resource_dump_test() which verifies dumping resources for all
devices and ports, and tests that scope=dev returns only device-level
resources and scope=port returns only port resources.

Skip if userspace does not support the scope parameter.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:55:39 -07:00
Or Har-Toov
3961353771 selftest: netdevsim: Add devlink port resource doit test
Tests that querying a specific port handle returns the expected
resource name and size.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:55:39 -07:00
Leon Hwang
5ae4ba98d7 selftests/drivers/net: Add an xdp test to xdp.py
In "bpf: Disallow freplace on XDP with mismatched xdp_has_frags values" [1],
this XDP test is suggested to add to xdp.py.

1. Verify the failure of updating frag-capable prog with non-frag-capable
   prog, when the frag-capable prog attaches to mtu=9k driver.

The test has been verified against Mellanox CX6 and Intel 82599ES NICs.

With dropping other tests, here is the test log.

 # ethtool -i eth0
 driver: mlx5_core
 version: 6.19.0-061900-generic

 # NETIF=eth0 python3 xdp.py
 TAP version 13
 1..1
 ok 1 xdp.test_xdp_native_update_mb_to_sb
 # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

 # ethtool -i eth0
 driver: ixgbe
 version: 6.19.0-061900-generic

 # NETIF=eth0 python3 xdp.py
 TAP version 13
 1..1
 # CMD: ip  link set dev eth0 xdpdrv obj /path/to/tools/testing/selftests/net/lib/xdp_dummy.bpf.o sec xdp.frags
 #   EXIT: 2
 #   STDERR: RTNETLINK answers: Invalid argument
 ok 1 xdp.test_xdp_native_update_mb_to_sb # SKIP device does not support multi-buffer XDP
 # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0

Signed-off-by: Leon Hwang <leon.huangfu@shopee.com>
Link: https://patch.msgid.link/20260406072655.368173-1-leon.huangfu@shopee.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:48:56 -07:00
Ioana Ciornei
fff75dba79 selftests: forwarding: lib: rewrite processing of command line arguments
The piece of code which processes the command line arguments and
populates NETIFS based on them is really unobvious. Rewrite it so that
the intention is clear and the code is easy to follow.

Suggested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260407102058.867279-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 19:26:44 -07:00
Jakub Kicinski
ea0f90d1ed ipsec-next-2026-04-08
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmnWIe8ACgkQrB3Eaf9P
 W7dDqRAAho59mSQlQAaoj6lkPlBCR8/TZrEHWXeTZvWzzyILiE8GJGdkMoUOk47S
 QR2YJ7xTg/eAALJFFPCKj82k5GOt2CjOo30BS901zdBhSZbN/H+tW57QfYRegR3o
 BFZ0eBCDc5FHQYRl8QbCi2XtF4Sqr8erLIvNwfaOiuoPCZmoehD2kyMpPhb/w9qQ
 DD0OsYWjZuhBP+MwHGCsmtMBoesVKI/86HV0LpeyH7uU+928Tf+TcACJzkLMrUcE
 AwrvTL3Mvp2ljsm9mw6mElyiAqemQHM87yg8BrR7NoXlahAEOJx8UWchKpAgGXv5
 bO8ng0Y8lNcuG+tN7rVk4/KeyjGNSW6ubRKfZbast6aoj5LfUhOIxxMTyYOEU5rH
 wKbIX00ilONs8S+kK/S4D0/1EdszOB/WVUTN5yEH1+FxkpvMGs3LUfhEjzfk9Lnz
 sT1ZF65YNwR0qa1SaIU4kYM543mlr/CrFgoPx5VOu0+jG+xCVWiC8fy+/SD688ht
 VTQGf8Y6gGX0yRMYJeauHHCBeMwbF7WEu7MYSi+4+7uUCYexh700QpOjaYLrTpgS
 NLpT9JPvuyWQ389DjJ+h5cpTqIsLrNs6+SXo+mZ6nkubGe+HRKZnLFwXj+41p3hE
 tUv+EcZTKDa+YVGymVcjORC5JjqvJXXklqQFeROuoamdJF0M96c=
 =0E3L
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2026-04-08

1) Update outdated comment in xfrm_dst_check().
   From kexinsun.

2) Drop support for HMAC-RIPEMD-160 from IPsec.
   From Eric Biggers.

* tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Drop support for HMAC-RIPEMD-160
  xfrm: update outdated comment
====================

Link: https://patch.msgid.link/20260408094258.148555-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08 18:51:54 -07:00
Daniel Borkmann
e0fcb42bc6 selftests/bpf: Add tests for ld_{abs,ind} failure path in subprogs
Extend the verifier_ld_ind BPF selftests with subprogs containing
ld_{abs,ind} and craft the test in a way where the invalid register
read is rejected in the fixed case. Also add a success case each,
and add additional coverage related to the BTF return type enforcement.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_ld_ind
  [...]
  #611/1   verifier_ld_ind/ld_ind: check calling conv, r1:OK
  #611/2   verifier_ld_ind/ld_ind: check calling conv, r1 @unpriv:OK
  #611/3   verifier_ld_ind/ld_ind: check calling conv, r2:OK
  #611/4   verifier_ld_ind/ld_ind: check calling conv, r2 @unpriv:OK
  #611/5   verifier_ld_ind/ld_ind: check calling conv, r3:OK
  #611/6   verifier_ld_ind/ld_ind: check calling conv, r3 @unpriv:OK
  #611/7   verifier_ld_ind/ld_ind: check calling conv, r4:OK
  #611/8   verifier_ld_ind/ld_ind: check calling conv, r4 @unpriv:OK
  #611/9   verifier_ld_ind/ld_ind: check calling conv, r5:OK
  #611/10  verifier_ld_ind/ld_ind: check calling conv, r5 @unpriv:OK
  #611/11  verifier_ld_ind/ld_ind: check calling conv, r7:OK
  #611/12  verifier_ld_ind/ld_ind: check calling conv, r7 @unpriv:OK
  #611/13  verifier_ld_ind/ld_abs: subprog early exit on ld_abs failure:OK
  #611/14  verifier_ld_ind/ld_ind: subprog early exit on ld_ind failure:OK
  #611/15  verifier_ld_ind/ld_abs: subprog with both paths safe:OK
  #611/16  verifier_ld_ind/ld_ind: subprog with both paths safe:OK
  #611/17  verifier_ld_ind/ld_abs: reject void return subprog:OK
  #611/18  verifier_ld_ind/ld_ind: reject void return subprog:OK
  #611     verifier_ld_ind:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260408191242.526279-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08 18:43:28 -07:00
Cheng-Yang Chou
ff1befcb16 selftests/sched_ext: Improve runner error reporting for invalid arguments
Report an error for './runner foo' (positional arg instead of -t) and
for './runner -t foo' when the filter matches no tests. Previously both
cases produced no error output.

Pre-scan the test list before the main loop so the error is reported
immediately, avoiding spurious SKIP output from '-s' when no tests
match.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-08 15:20:44 -10:00
Varun R Mallya
c7cab53f9d selftests/bpf: Add test to ensure kprobe_multi is not sleepable
Add a selftest to ensure that kprobe_multi programs cannot be attached
using the BPF_F_SLEEPABLE flag. This test succeeds when the kernel
rejects attachment of kprobe_multi when the BPF_F_SLEEPABLE flag is set.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Link: https://lore.kernel.org/r/20260408190137.101418-3-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08 18:15:56 -07:00
Christian Bruel
1d3225cb5d selftests: pci_endpoint: Skip BAR subrange test on -ENOSPC
In pci-epf-test.c, set the STATUS_NO_RESOURCE status bit if
pci_epc_set_bar() returns -ENOSPC.  This status bit is used to indicate
that there are not enough inbound window resources to allocate the
subrange.

In pci_endpoint_test.c, return -ENOSPC instead of -EIO when
STATUS_NO_RESOURCE is set.

In pci_endpoint_test.c, skip the BAR subrange test if -ENOSPC, i.e., there
are not enough inbound window resources to run the test.

Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: squash related commits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Koichiro Den <den@valinux.co.jp>
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-1-6f2e65f2298c@foss.st.com
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-2-6f2e65f2298c@foss.st.com
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-3-6f2e65f2298c@foss.st.com
2026-04-08 14:41:39 -05:00
Fernando Fernandez Mancera
dde1a6084c selftests: nft_queue.sh: add a parallel stress test
Introduce a new stress test to check for race conditions in the
nfnetlink_queue subsystem, where an entry is freed while another CPU is
concurrently walking the global rhashtable.

To trigger this, `nf_queue.c` is extended with two new flags:
  * -O (out-of-order): Buffers packet IDs and flushes them in reverse.
  * -b (bogus verdicts): Floods the kernel with non-existent packet IDs.

The bogus verdict loop forces the kernel's lookup function to perform
full rhashtable bucket traversals (-ENOENT). Combined with reverse-order
flushing and heavy parallel UDP/ping flooding across 8 queues, this puts
the nfnetlink_queue code under pressure.

Joint work with Florian Westphal.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-08 13:34:51 +02:00
Marc Zyngier
94b4ae79eb Merge branch kvm-arm64/misc-7.1 into kvmarm-master/next
* kvm-arm64/misc-7.1:
  KVM: arm64: selftests: Avoid testing the IMPDEF behavior
  KVM: arm64: Destroy stage-2 page-table in kvm_arch_destroy_vm()
  KVM: arm64: Don't leave mmu->pgt dangling on kvm_init_stage2_mmu() error
  KVM: arm64: Prevent the host from using an smc with imm16 != 0

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:26:11 +01:00
Marc Zyngier
d77f4792db Merge branch kvm-arm64/vgic-fixes-7.1 into kvmarm-master/next
* kvm-arm64/vgic-fixes-7.1:
  : .
  : FIrst pass at fixing a number of vgic-v5 bugs that were found
  : after the merge of the initial series.
  : .
  KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE
  KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs
  KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime
  KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported
  KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling
  KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid()
  KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer()
  KVM: arm64: Kill arch_timer_context::direct field
  KVM: arm64: vgic-v5: Correctly set dist->ready once initialised
  KVM: arm64: vgic-v5: Make the effective priority mask a strict limit
  KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours
  KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2
  KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs
  KVM: arm64: Account for RESx bits in __compute_fgt()
  KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1
  arm64: Fix field references for ICH_PPI_DVIR[01]_EL2
  KVM: arm64: Don't skip per-vcpu NV initialisation
  KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:26:00 +01:00
Marc Zyngier
f8078d51ee Merge branch kvm-arm64/vgic-v5-ppi into kvmarm-master/next
* kvm-arm64/vgic-v5-ppi: (40 commits)
  : .
  : Add initial GICv5 support for KVM guests, only adding PPI support
  : for the time being. Patches courtesy of Sascha Bischoff.
  :
  : From the cover letter:
  :
  : "This is v7 of the patch series to add the virtual GICv5 [1] device
  : (vgic_v5). Only PPIs are supported by this initial series, and the
  : vgic_v5 implementation is restricted to the CPU interface,
  : only. Further patch series are to follow in due course, and will add
  : support for SPIs, LPIs, the GICv5 IRS, and the GICv5 ITS."
  : .
  KVM: arm64: selftests: Add no-vgic-v5 selftest
  KVM: arm64: selftests: Introduce a minimal GICv5 PPI selftest
  KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI
  Documentation: KVM: Introduce documentation for VGICv5
  KVM: arm64: gic-v5: Probe for GICv5 device
  KVM: arm64: gic-v5: Set ICH_VCTLR_EL2.En on boot
  KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register them
  KVM: arm64: gic-v5: Hide FEAT_GCIE from NV GICv5 guests
  KVM: arm64: gic: Hide GICv5 for protected guests
  KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5
  KVM: arm64: gic-v5: Enlighten arch timer for GICv5
  irqchip/gic-v5: Introduce minimal irq_set_type() for PPIs
  KVM: arm64: gic-v5: Initialise ID and priority bits when resetting vcpu
  KVM: arm64: gic-v5: Create and initialise vgic_v5
  KVM: arm64: gic-v5: Support GICv5 interrupts with KVM_IRQ_LINE
  KVM: arm64: gic-v5: Implement direct injection of PPIs
  KVM: arm64: Introduce set_direct_injection irq_op
  KVM: arm64: gic-v5: Trap and mask guest ICC_PPI_ENABLERx_EL1 writes
  KVM: arm64: gic-v5: Check for pending PPIs
  KVM: arm64: gic-v5: Clear TWI if single task running
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:22:35 +01:00
Marc Zyngier
2de32a25a3 Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next
* kvm-arm64/hyp-tracing: (40 commits)
  : .
  : EL2 tracing support, adding both 'remote' ring-buffer
  : infrastructure and the tracing itself, courtesy of
  : Vincent Donnefort. From the cover letter:
  :
  : "The growing set of features supported by the hypervisor in protected
  : mode necessitates debugging and profiling tools. Tracefs is the
  : ideal candidate for this task:
  :
  :   * It is simple to use and to script.
  :
  :   * It is supported by various tools, from the trace-cmd CLI to the
  :     Android web-based perfetto.
  :
  :   * The ring-buffer, where are stored trace events consists of linked
  :     pages, making it an ideal structure for sharing between kernel and
  :     hypervisor.
  :
  : This series first introduces a new generic way of creating remote events and
  : remote buffers. Then it adds support to the pKVM hypervisor."
  : .
  tracing: selftests: Extend hotplug testing for trace remotes
  tracing: Non-consuming read for trace remotes with an offline CPU
  tracing: Adjust cmd_check_undefined to show unexpected undefined symbols
  tracing: Restore accidentally removed SPDX tag
  KVM: arm64: avoid unused-variable warning
  tracing: Generate undef symbols allowlist for simple_ring_buffer
  KVM: arm64: tracing: add ftrace dependency
  tracing: add more symbols to whitelist
  tracing: Update undefined symbols allow list for simple_ring_buffer
  KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing
  tracing: selftests: Add hypervisor trace remote tests
  KVM: arm64: Add selftest event support to nVHE/pKVM hyp
  KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
  KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
  KVM: arm64: Add trace reset to the nVHE/pKVM hyp
  KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
  KVM: arm64: Add trace remote for the nVHE/pKVM hyp
  KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
  KVM: arm64: Support unaligned fixmap in the pKVM hyp
  KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:21:51 +01:00
Andrea Mayer
32dfd742f0 selftests: seg6: add test for dst_cache isolation in seg6 lwtunnel
Add a selftest that verifies the dst_cache in seg6 lwtunnel is not
shared between the input (forwarding) and output (locally generated)
paths.

The test creates three namespaces (ns_src, ns_router, ns_dst)
connected in a line. An SRv6 encap route on ns_router encapsulates
traffic destined to cafe::1 with SID fc00::100. The SID is
reachable only for forwarded traffic (from ns_src) via an ip rule
matching the ingress interface (iif veth-r0 lookup 100), and
blackholed in the main table.

The test verifies that:

  1. A packet generated locally on ns_router does not reach
     ns_dst with an empty cache, since the SID is blackholed;
  2. A forwarded packet from ns_src populates the input cache
     from table 100 and reaches ns_dst;
  3. A packet generated locally on ns_router still does not
     reach ns_dst after the input cache is populated,
     confirming the output path does not reuse the input
     cache entry.

Both the forwarded and local packets are pinned to the same CPU
with taskset, since dst_cache is per-cpu.

Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Link: https://patch.msgid.link/20260404004405.4057-3-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 20:20:56 -07:00
Daniel Golle
efaa71faf2 selftests: net: bridge_vlan_mcast: wait for h1 before querier check
The querier-interval test adds h1 (currently a slave of the VRF created
by simple_if_init) to a temporary bridge br1 acting as an outside IGMP
querier. The kernel VRF driver (drivers/net/vrf.c) calls cycle_netdev()
on every slave add and remove, toggling the interface admin-down then up.
Phylink takes the PHY down during the admin-down half of that cycle.
Since h1 and swp1 are cable-connected, swp1 also loses its link may need
several seconds to re-negotiate.

Use setup_wait_dev $h1 0 which waits for h1 to return to UP state, so the
test can rely on the link being back up at this point.

Fixes: 4d8610ee8b ("selftests: net: bridge: add vlan mcast_querier_interval tests")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/c830f130860fd2efae08bfb9e5b25fd028e58ce5.1775424423.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 20:16:16 -07:00
Jakub Kicinski
e65d8b6f30 selftests: drv-net: adjust to socat changes
socat v1.8.1.0 now defaults to shut-null, it sends an extra
0-length UDP packet when sender disconnects. This breaks
our tests which expect the exact packet sequence.

Add shut-none which was the old default where necessary.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260404230103.2719103-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07 18:54:03 -07:00
Amery Hung
4cfb09a383 selftests/bpf: Test overwriting referenced dynptr
Test overwriting referenced dynptr and clones to make sure it is only
allow when there is at least one other dynptr with the same ref_obj_id.
Also make sure slice is still invalidated after the dynptr's stack slot
is destroyed.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406150548.1354271-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 18:20:49 -07:00
Daniel Borkmann
cac16ce1e3 selftests/bpf: Add tests for stale delta leaking through id reassignment
Extend the verifier_linked_scalars BPF selftest with a stale delta test
such that the div-by-zero path is rejected in the fixed case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
  [...]
  ./test_progs -t verifier_linked_scalars
  #612/1   verifier_linked_scalars/scalars: find linked scalars:OK
  #612/2   verifier_linked_scalars/sync_linked_regs_preserves_id:OK
  #612/3   verifier_linked_scalars/scalars_neg:OK
  #612/4   verifier_linked_scalars/scalars_neg_sub:OK
  #612/5   verifier_linked_scalars/scalars_neg_alu32_add:OK
  #612/6   verifier_linked_scalars/scalars_neg_alu32_sub:OK
  #612/7   verifier_linked_scalars/scalars_pos:OK
  #612/8   verifier_linked_scalars/scalars_sub_neg_imm:OK
  #612/9   verifier_linked_scalars/scalars_double_add:OK
  #612/10  verifier_linked_scalars/scalars_sync_delta_overflow:OK
  #612/11  verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
  #612/12  verifier_linked_scalars/scalars_alu32_big_offset:OK
  #612/13  verifier_linked_scalars/scalars_alu32_basic:OK
  #612/14  verifier_linked_scalars/scalars_alu32_wrap:OK
  #612/15  verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
  #612/16  verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
  #612/17  verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
  #612/18  verifier_linked_scalars/alu32_negative_offset:OK
  #612/19  verifier_linked_scalars/spurious_precision_marks:OK
  #612/20  verifier_linked_scalars/scalars_self_add_clears_id:OK
  #612/21  verifier_linked_scalars/scalars_self_add_alu32_clears_id:OK
  #612/22  verifier_linked_scalars/scalars_stale_delta_from_cleared_id:OK
  #612/23  verifier_linked_scalars/scalars_stale_delta_from_cleared_id_alu32:OK
  #612     verifier_linked_scalars:OK
  Summary: 1/23 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260407192421.508817-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 18:15:43 -07:00
Daniel Borkmann
ed2eecdc0c selftests/bpf: Add tests for delta tracking when src_reg == dst_reg
Extend the verifier_linked_scalars BPF selftest with a rX += rX test
such that the div-by-zero path is rejected in the fixed case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
  [...]
  ./test_progs -t verifier_linked_scalars
  #612/1   verifier_linked_scalars/scalars: find linked scalars:OK
  #612/2   verifier_linked_scalars/sync_linked_regs_preserves_id:OK
  #612/3   verifier_linked_scalars/scalars_neg:OK
  #612/4   verifier_linked_scalars/scalars_neg_sub:OK
  #612/5   verifier_linked_scalars/scalars_neg_alu32_add:OK
  #612/6   verifier_linked_scalars/scalars_neg_alu32_sub:OK
  #612/7   verifier_linked_scalars/scalars_pos:OK
  #612/8   verifier_linked_scalars/scalars_sub_neg_imm:OK
  #612/9   verifier_linked_scalars/scalars_double_add:OK
  #612/10  verifier_linked_scalars/scalars_sync_delta_overflow:OK
  #612/11  verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
  #612/12  verifier_linked_scalars/scalars_alu32_big_offset:OK
  #612/13  verifier_linked_scalars/scalars_alu32_basic:OK
  #612/14  verifier_linked_scalars/scalars_alu32_wrap:OK
  #612/15  verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
  #612/16  verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
  #612/17  verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
  #612/18  verifier_linked_scalars/alu32_negative_offset:OK
  #612/19  verifier_linked_scalars/spurious_precision_marks:OK
  #612/20  verifier_linked_scalars/scalars_self_add_clears_id:OK
  #612/21  verifier_linked_scalars/scalars_self_add_alu32_clears_id:OK
  #612     verifier_linked_scalars:OK
  Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260407192421.508817-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 18:15:43 -07:00
Andrey Grodzovsky
cea4323f1c selftests/bpf: Add tests for kprobe attachment with duplicate symbols
bpf_fentry_shadow_test exists in both vmlinux (net/bpf/test_run.c) and
bpf_testmod (bpf_testmod.c), creating a duplicate symbol condition when
bpf_testmod is loaded. Add subtests that verify kprobe behavior with
this duplicate symbol:

In attach_probe:
- dup-sym-{default,legacy,perf,link}: unqualified attach succeeds
  across all four modes, preferring vmlinux over module shadow.
- MOD:SYM qualification attaches to the module version.

In kprobe_multi_test:
- dup_sym: kprobe_multi attach with kprobe and kretprobe succeeds.

bpf_fentry_shadow_test is not invoked via test_run, so tests verify
attach and detach succeed without triggering the probe.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Link: https://lore.kernel.org/r/20260407203912.1787502-3-andrey.grodzovsky@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 16:28:12 -07:00
Qi Tang
a4985a1755 selftests/bpf: add test for nullable PTR_TO_BUF access
Add iter_buf_null_fail with two tests and a test runner:
  - iter_buf_null_deref: verifier must reject direct dereference of
    ctx->key (PTR_TO_BUF | PTR_MAYBE_NULL) without a null check
  - iter_buf_null_check_ok: verifier must accept dereference after
    an explicit null check

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
Link: https://lore.kernel.org/r/20260407145421.4315-1-tpluszz77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 15:53:45 -07:00
Kumar Kartikeya Dwivedi
a8aa306741 selftests/bpf: Allow prog name matching for tests with __description
For tests that carry a __description tag, allow matching on both the
description string and program name for convenience. Before this commit,
the description string must be spelt out to filter the tests.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260407145606.3991770-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07 12:24:29 -07:00
Günther Noack
dc75f89046
selftests/landlock: Simplify ruleset creation and enforcement in fs_test
* Add enforce_fs() for defining and enforcing a ruleset in one step
* In some places, dropped "ASSERT_LE(0, fd)" checks after
  create_ruleset() call -- create_ruleset() already checks that.
* In some places, rename "file_fd" to "fd" if it is not needed to
  disambiguate any more.

Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-12-gnoack3000@gmail.com
[mic: Tweak subjet]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:10 +02:00
Günther Noack
f433fd3fa2
selftests/landlock: Check that coredump sockets stay unrestricted
Even when a process is restricted with the new
LANDLOCK_ACCESS_FS_RESOLVE_UNIX right, the kernel can continue writing
its coredump to the configured coredump socket.

In the test, we create a local server and rewire the system to write
coredumps into it.  We then create a child process within a Landlock
domain where LANDLOCK_ACCESS_FS_RESOLVE_UNIX is restricted and make
the process crash.  The test uses SO_PEERCRED to check that the
connecting client process is the expected one.

Includes a fix by Mickaël Salaün for setting the EUID to 0 (see [1]).

Link[1]: https://lore.kernel.org/all/20260218.ohth8theu8Yi@digikod.net/
Suggested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-11-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:10 +02:00
Günther Noack
0f42f5be0b
selftests/landlock: Audit test for LANDLOCK_ACCESS_FS_RESOLVE_UNIX
Add an audit test to check that Landlock denials from
LANDLOCK_ACCESS_FS_RESOLVE_UNIX result in audit logs in the expected
format.  (There is one audit test for each filesystem access right, so
we should add one for LANDLOCK_ACCESS_FS_RESOLVE_UNIX as well.)

Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-10-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:09 +02:00
Günther Noack
9da41c65c9
selftests/landlock: Test LANDLOCK_ACCESS_FS_RESOLVE_UNIX
* Extract common helpers from an existing IOCTL test that
  also uses pathname unix(7) sockets.
* These tests use the common scoped domains fixture which is also used
  in other Landlock scoping tests and which was used in Tingmao Wang's
  earlier patch set in [1].

These tests exercise the cross product of the following scenarios:

* Stream connect(), Datagram connect(), Datagram sendmsg() and
  Seqpacket connect().
* Child-to-parent and parent-to-child communication
* The Landlock policy configuration as listed in the scoped_domains
  fixture.
  * In the default variant, Landlock domains are only placed where
    prescribed in the fixture.
  * In the "ALL_DOMAINS" variant, Landlock domains are also placed in
    the places where the fixture says to omit them, but with a
    LANDLOCK_RULE_PATH_BENEATH that allows connection.

Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Tingmao Wang <m@maowtm.org>
Cc: Mickaël Salaün <mic@digikod.net>
Link[1]: https://lore.kernel.org/all/53b9883648225d5a08e82d2636ab0b4fda003bc9.1767115163.git.m@maowtm.org/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-9-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:09 +02:00
Günther Noack
db8201a3fa
selftests/landlock: Replace access_fs_16 with ACCESS_ALL in fs_test
The access_fs_16 variable was originally intended to stay frozen at 16
access rights so that audit tests would not need updating when new
access rights are added.  Now that we have 17 access rights, the name
is confusing.

Replace all uses of access_fs_16 with ACCESS_ALL and delete the
variable.

Suggested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-8-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:08 +02:00
Günther Noack
ae97330d1b
landlock: Control pathname UNIX domain socket resolution by path
* Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
  controls the lookup operations for named UNIX domain sockets.  The
  resolution happens during connect() and sendmsg() (depending on
  socket type).
* Change access_mask_t from u16 to u32 (see below)
* Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
  LSM hook.  Make policy decisions based on the new access rights
* Increment the Landlock ABI version.
* Minor test adaptations to keep the tests working.
* Document the design rationale for scoped access rights,
  and cross-reference it from the header documentation.

With this access right, access is granted if either of the following
conditions is met:

* The target socket's filesystem path was allow-listed using a
  LANDLOCK_RULE_PATH_BENEATH rule, *or*:
* The target socket was created in the same Landlock domain in which
  LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.

In case of a denial, connect() and sendmsg() return EACCES, which is
the same error as it is returned if the user does not have the write
bit in the traditional UNIX file system permissions of that file.

The access_mask_t type grows from u16 to u32 to make space for the new
access right.  This also doubles the size of struct layer_access_masks
from 32 byte to 64 byte.  To avoid memory layout inconsistencies between
architectures (especially m68k), pack and align struct access_masks [2].

Document the (possible future) interaction between scoped flags and
other access rights in struct landlock_ruleset_attr, and summarize the
rationale, as discussed in code review leading up to [3].

This feature was created with substantial discussion and input from
Justin Suess, Tingmao Wang and Mickaël Salaün.

Cc: Tingmao Wang <m@maowtm.org>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Link[1]: https://github.com/landlock-lsm/linux/issues/36
Link[2]: https://lore.kernel.org/all/20260401.Re1Eesu1Yaij@digikod.net/
Link[3]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20260327164838.38231-5-gnoack3000@gmail.com
[mic: Fix kernel-doc formatting, pack and align access_masks]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:06 +02:00
Mickaël Salaün
a060ac0b8c
selftests/landlock: Fix format warning for __u64 in net_test
On architectures where __u64 is unsigned long (e.g. powerpc64), using
%llx to format a __u64 triggers a -Wformat warning because %llx expects
unsigned long long.  Cast the argument to unsigned long long.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: a549d055a2 ("selftests/landlock: Add network tests")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-6-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:03 +02:00
Mickaël Salaün
07c2572a87
selftests/landlock: Skip stale records in audit_match_record()
Domain deallocation records are emitted asynchronously from kworker
threads (via free_ruleset_work()).  Stale deallocation records from a
previous test can arrive during the current test's deallocation read
loop and be picked up by audit_match_record() instead of the expected
record, causing a domain ID mismatch.  The audit.layers test (which
creates 16 nested domains) is particularly vulnerable because it reads
16 deallocation records in sequence, providing a large window for stale
records to interleave.

The same issue affects audit_flags.signal, where deallocation records
from a previous test (audit.layers) can leak into the next test and be
picked up by audit_match_record() instead of the expected record.

Fix this by continuing to read records when the type matches but the
content pattern does not.  Stale records are silently consumed, and the
loop only stops when both type and pattern match (or the socket times
out with -EAGAIN).

Additionally, extend matches_log_domain_deallocated() with an
expected_domain_id parameter.  When set, the regex pattern includes the
specific domain ID as a literal hex value, so that deallocation records
for a different domain do not match the pattern at all.  This handles
the case where the stale record has the same denial count as the
expected one (e.g. both have denials=1), which the type+pattern loop
alone cannot distinguish.  Callers that already know the expected domain
ID (from a prior denial or allocation record) now pass it to filter
precisely.

When expected_domain_id is set, matches_log_domain_deallocated() also
temporarily increases the socket timeout to audit_tv_dom_drop (1 second)
to wait for the asynchronous kworker deallocation, and restores
audit_tv_default afterward.  This removes the need for callers to manage
the timeout switch manually.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Link: https://lore.kernel.org/r/20260402192608.1458252-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:02 +02:00
Mickaël Salaün
3647a4977f
selftests/landlock: Drain stale audit records on init
Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration).  These records
accumulate in the kernel audit backlog while no audit daemon socket is
open.  When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.

Fix this by draining all pending records in audit_init() right after
setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.

Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain.  Remove records.domain
== 0 checks that are not preceded by audit_match_record() calls, which
would otherwise consume stale records before the count.  Document this
constraint above audit_count_records().

Increasing the drain timeout to catch in-flight deallocation records was
considered but rejected: a longer timeout adds latency to every
audit_init() call even when no stale record is pending, and any fixed
timeout is still not guaranteed to catch all records under load.
Removing the unprotected checks is simpler and avoids the spurious
failures.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:01 +02:00
Mickaël Salaün
9143d79033
selftests/landlock: Fix socket file descriptor leaks in audit helpers
audit_init() opens a netlink socket and configures it, but leaks the
file descriptor if audit_set_status() or setsockopt() fails.  Fix this
by jumping to an error path that closes the socket before returning.

Apply the same fix to audit_init_with_exe_filter(), which leaks the file
descriptor from audit_init() if audit_init_filter_exe() or
audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:01 +02:00
Mickaël Salaün
b566f7a4f0
selftests/landlock: Fix snprintf truncation checks in audit helpers
snprintf() returns the number of characters that would have been
written, excluding the terminating NUL byte.  When the output is
truncated, this return value equals or exceeds the buffer size.  Fix
matches_log_domain_allocated() and matches_log_domain_deallocated() to
detect truncation with ">=" instead of ">".

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b2297 ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:00 +02:00
Mickaël Salaün
e75e38055b
landlock: Allow TSYNC with LOG_SUBDOMAINS_OFF and fd=-1
LANDLOCK_RESTRICT_SELF_TSYNC does not allow
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF with ruleset_fd=-1, preventing
a multithreaded process from atomically propagating subdomain log muting
to all threads without creating a domain layer.  Relax the fd=-1
condition to accept TSYNC alongside LOG_SUBDOMAINS_OFF, and update the
documentation accordingly.

Add flag validation tests for all TSYNC combinations with ruleset_fd=-1,
and audit tests verifying both transition directions: muting via TSYNC
(logged to not logged) and override via TSYNC (not logged to logged).

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 42fc7e6543 ("landlock: Multithreading support for landlock_restrict_self()")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260407164107.2012589-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:00 +02:00
Mickaël Salaün
874c8f8382
landlock: Fix LOG_SUBDOMAINS_OFF inheritance across fork()
hook_cred_transfer() only copies the Landlock security blob when the
source credential has a domain.  This is inconsistent with
landlock_restrict_self() which can set LOG_SUBDOMAINS_OFF on a
credential without creating a domain (via the ruleset_fd=-1 path): the
field is committed but not preserved across fork() because the child's
prepare_creds() calls hook_cred_transfer() which skips the copy when
domain is NULL.

This breaks the documented use case where a process mutes subdomain logs
before forking sandboxed children: the children lose the muting and
their domains produce unexpected audit records.

Fix this by unconditionally copying the Landlock credential blob.

Cc: Günther Noack <gnoack@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Fixes: ead9079f75 ("landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260407164107.2012589-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:50:56 +02:00
Claudio Imbrenda
857e92662c KVM: s390: selftests: enable some common memory-related tests
Enable the following tests on s390:
* memslot_modification_stress_test
* memslot_perf_test
* mmu_stress_test

Since the first two tests are now supported on all architectures, move
them into TEST_GEN_PROGS_COMMON and out of the indiviual architectures.

Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:20:30 +02:00
Claudio Imbrenda
c10e2771c7 KVM: selftests: Remove 1M alignment requirement for s390
Remove the 1M memslot alignment requirement for s390, since it is not
needed anymore.

Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:07:27 +02:00
Ming Lei
affb5f67d7 selftests/ublk: add read-only buffer registration test
Add --rdonly_shmem_buf option to kublk that registers shared memory
buffers with UBLK_SHMEM_BUF_READ_ONLY (read-only pinning without
FOLL_WRITE) and mmaps with PROT_READ only.

Add test_shmemzc_04.sh which exercises the new flag with a null target,
hugetlbfs buffer, and write workload. Write I/O works because the
server only reads from the shared buffer — the data flows from client
to kernel to the shared pages, and the server reads them out.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:42:39 -06:00
Ming Lei
12075992c6 selftests/ublk: add filesystem fio verify test for shmem_zc
Add test_shmemzc_03.sh which exercises shmem_zc through the full
filesystem stack: mkfs ext4 on the ublk device, mount it, then run
fio verify on a file inside the filesystem with --mem=mmaphuge.

Extend _mkfs_mount_test() to accept an optional command that runs
between mount and umount. The function cd's into the mount directory
so the command can use relative file paths. Existing callers that
pass only the device are unaffected.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:42:23 -06:00
Ming Lei
d486650332 selftests/ublk: add hugetlbfs shmem_zc test for loop target
Add test_shmem_zc_02.sh which tests the UBLK_IO_F_SHMEM_ZC zero-copy
path on the loop target using a hugetlbfs shared buffer. Both kublk and
fio mmap the same hugetlbfs file with MAP_SHARED, sharing physical
pages. The kernel's PFN matching enables zero-copy — the loop target
reads/writes directly from the shared buffer to the backing file.

Uses standard fio --mem=mmaphuge:<path> (supported since fio 1.10),
no patched fio required.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:42:23 -06:00
Ming Lei
2f1e9468bd selftests/ublk: add shared memory zero-copy test
Add test_shmem_zc_01.sh which tests UBLK_IO_F_SHMEM_ZC on the null
target using a hugetlbfs shared buffer. Both kublk (--htlb) and fio
(--mem=mmaphuge:<path>) mmap the same hugetlbfs file with MAP_SHARED,
sharing physical pages. The kernel PFN match enables zero-copy I/O.

Uses standard fio --mem=mmaphuge:<path> (supported since fio 1.10),
no patched fio required.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:42:23 -06:00
Ming Lei
ec20aa44ac selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
Add loop_queue_shmem_zc_io() which handles I/O requests marked with
UBLK_IO_F_SHMEM_ZC. When the kernel sets this flag, the request data
lives in a registered shared memory buffer — decode index + offset
from iod->addr and use the server's mmap as the I/O buffer.

The dispatch check in loop_queue_tgt_rw_io() routes SHMEM_ZC requests
to this new function, bypassing the normal buffer registration path.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:42:23 -06:00
Ming Lei
166b476b8d selftests/ublk: add shared memory zero-copy support in kublk
Add infrastructure for UBLK_F_SHMEM_ZC shared memory zero-copy:

- kublk.h: struct ublk_shmem_entry and table for tracking registered
  shared memory buffers
- kublk.c: per-device unix socket listener that accepts memfd
  registrations from clients via SCM_RIGHTS fd passing. The listener
  mmaps the memfd and registers the VA range with the kernel for PFN
  matching. Also adds --shmem_zc command line option.
- kublk.c: --htlb <path> option to open a pre-allocated hugetlbfs
  file, mmap it with MAP_SHARED|MAP_POPULATE, and register it with
  the kernel via ublk_ctrl_reg_buf(). Any process that mmaps the same
  hugetlbfs file shares the same physical pages, enabling zero-copy
  without socket-based fd passing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:41:55 -06:00
Qingfang Deng
dfecb0c5af selftests: net: add tests for PPP
Add ping and iperf3 tests for ppp_async.c and pppoe.c.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/20260403034908.30017-1-qingfang.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-07 12:08:46 +02:00
Eric Biggers
05d42dc8ab xfrm: Drop support for HMAC-RIPEMD-160
Drop support for HMAC-RIPEMD-160 from IPsec to reduce the UAPI surface
and simplify future maintenance.  It's almost certainly unused.

RIPEMD-160 received some attention in the early 2000s when SHA-* weren't
quite as well established.  But it never received much adoption outside
of certain niches such as Bitcoin.

It's actually unclear that Linux + IPsec + HMAC-RIPEMD-160 has *ever*
been used, even historically.  When support for it was added in 2003, it
was done so in a "cleanup" commit without any justification [1].  It
didn't actually work until someone happened to fix it 5 years later [2].
That person didn't use or test it either [3].  Finally, also note that
"hmac(rmd160)" is by far the slowest of the algorithms in aalg_list[].

Of course, today IPsec is usually used with an AEAD, such as AES-GCM.
But even for IPsec users still using a dedicated auth algorithm, they
almost certainly aren't using, and shouldn't use, HMAC-RIPEMD-160.

Thus, let's just drop support for it.  Note: no kconfig update is
needed, since CRYPTO_RMD160 wasn't actually being selected anyway.

References:
  [1] linux-history commit d462985fc1941a47
      ("[IPSEC]: Clean up key manager algorithm handling.")
  [2] linux commit a13366c632
      ("xfrm: xfrm_algo: correct usage of RIPEMD-160")
  [3] https://lore.kernel.org/all/1212340578-15574-1-git-send-email-rueegsegger@swiss-it.ch

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07 10:47:58 +02:00
Thomas Weißschuh
598b670af3 selftests/nolibc: don't skip tests for unimplemented syscalls anymore
The automatic skipping of tests on ENOSYS returns was introduced in
commit 349afc8a52 ("selftests/nolibc: skip tests for unimplemented
syscalls"). It handled the fact that nolibc would return ENOSYS for many
syscall wrappers on riscv32.

Nowadays nolibc handles all these correctly, so this logic is not used
anymore. To make missing nolibc functionality more obvious fail the
tests again if something is not implemented.

Revert the mentioned commit again.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260406-nolibc-no-skip-enosys-v1-2-c046b1ac7d73@weissschuh.net/
2026-04-07 09:29:26 +02:00
Thomas Weißschuh
9a5206f256 selftests/nolibc: explicitly handle ENOSYS from ptrace()
The automatic ENOSYS handling in EXPECT_SYSER() is about to be removed.
ptrace() will return legitimately return ENOSYS on qemu-user, so handle
it explicitly.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260406-nolibc-no-skip-enosys-v1-1-c046b1ac7d73@weissschuh.net/
2026-04-07 09:28:32 +02:00
Thomas Weißschuh
ce834c9cb9 tools/nolibc: add byteorder conversions
Add some standard functions to convert between different byte orders.
Conveniently the UAPI headers provide all the necessary functionality.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260405-nolibc-bswap-v1-1-f7699ca9cee0@weissschuh.net
2026-04-07 09:27:25 +02:00
Thomas Weißschuh
2eb64b936d tools/nolibc: add the _syscall() macro
The standard syscall() function or macro uses the libc return value
convention. Errors returned from the kernel as negative values are
stored in errno and -1 is returned. Users who want to avoid using
errno don't have a way to call raw syscalls and check the returned
error.

Add a new macro _syscall() which works like the standard syscall()
but passes through the return value from the kernel unchanged.
The naming scheme and return values match the named _sys_foo()
system call wrappers already part of nolibc.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260405-nolibc-syscall-v1-3-e5b12bc63211@weissschuh.net
2026-04-07 09:27:07 +02:00
Matthieu Baerts (NGI0)
c4a5cb2f00 selftests: mptcp: join: recreate signal endp with same ID
In this "delete re-add signal" MPTCP Join subtest, the endpoint linked
to the initial subflow is removed, but readded once with different ID.

It appears that there was an issue when reusing the same ID, recently
fixed by commit d191101dee ("mptcp: pm: in-kernel: always set ID as
avail when rm endp"). The test then now reuses the same ID the first
time, but continue to use another one (88) the second time.

This should then cover more cases.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/615
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-5-b0b33bea3fed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 19:14:30 -07:00
Stefano Garzarella
24ad7ff668 vsock/test: fix send_buf()/recv_buf() EINTR handling
When send() or recv() returns -1 with errno == EINTR, the code skips
the break but still adds the return value to nwritten/nread, making it
decrease by 1. This leads to wrong buffer offsets and wrong bytes count.

Fix it by explicitly continuing the loop on EINTR, so the return value
is only added when it is positive.

Fixes: a8ed71a27e ("vsock/test: add recv_buf() utility function")
Fixes: 12329bd51f ("vsock/test: add send_buf() utility function")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20260403093251.30662-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:46:03 -07:00
Maciej Fijalkowski
62838e363e selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom
Since we have changed how big user defined headroom in umem can be,
change the logic in testapp_stats_rx_dropped() so we pass updated
headroom validation in xdp_umem_reg() and still drop half of frames.

Test works on non-mbuf setup so __xsk_pool_get_rx_frame_size() that is
called on xsk_rcv_check() will not account skb_shared_info size. Taking
the tailroom size into account in test being fixed is needed as
xdp_umem_reg() defaults to respect it.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-9-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:43:52 -07:00
Maciej Fijalkowski
16546954e1 selftests: bpf: have a separate variable for drop test
Currently two different XDP programs share a static variable for
different purposes (picking where to redirect on shared umem test &
whether to drop a packet). This can be a problem when running full test
suite - idx can be written by shared umem test and this value can cause
a false behavior within XDP drop half test.

Introduce a dedicated variable for drop half test so that these two
don't step on each other toes. There is no real need for using
__sync_fetch_and_add here as XSK tests are executed on single CPU.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-8-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:43:52 -07:00
Maciej Fijalkowski
3197c51ce2 selftests: bpf: fix pkt grow tests
Skip tail adjust tests in xskxceiver for SKB mode as it is not very
friendly for it. multi-buffer case does not work as xdp_rxq_info that is
registered for generic XDP does not report ::frag_size. The non-mbuf
path copies packet via skb_pp_cow_data() which only accounts for
headroom, leaving us with no tailroom and causing underlying XDP prog to
drop packets therefore.

For multi-buffer test on other modes, change the amount of bytes we use
for growth, assume worst-case scenario and take care of headroom and
tailroom.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-7-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:43:51 -07:00
Maciej Fijalkowski
c5866a6be4 selftests: bpf: introduce a common routine for reading procfs
Parametrize current way of getting MAX_SKB_FRAGS value from {sys,proc}fs
so that it can be re-used to get cache line size of system's CPU. All
that just to mimic and compute size of kernel's struct skb_shared_info
which for xsk and test suite interpret as tailroom.

Introduce two variables to ifobject struct that will carry count of skb
frags and tailroom size. Do the reading and computing once, at the
beginning of test suite execution in xskxceiver, but for test_progs such
way is not possible as in this environment each test setups and torns
down ifobject structs.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-6-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 18:43:51 -07:00
Anton Protopopov
1c2e217ad3 selftests/bpf: Add more tests for loading insn arrays with offsets
A `gotox rX` instruction accepts only values of type PTR_TO_INSN.
The only way to create such a value is to load it from a map of
type insn_array:

   rX = *(rY + offset) # rY was read from an insn_array
   ...
   gotox rX

Add instruction-level and C-level selftests to validate loads
with nonzero offsets.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260406160141.36943-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 18:38:32 -07:00
Jakub Kicinski
3b45559f6c selftests: net: py: color the basics in the output
Sometimes it's hard to spot the ok / not ok lines in the output.
This is especially true for the GRO tests which retries a lot
so there's a wall of non-fatal output printed.

Try to color the crucial lines green / red / yellow when running
in a terminal.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260402215444.1589893-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06 17:47:59 -07:00
Kumar Kartikeya Dwivedi
171580e432 selftests/bpf: Add tests for syscall ctx accesses beyond U16_MAX
Ensure we reject programs that access beyond the maximum syscall ctx
size, i.e. U16_MAX either through direct accesses or helpers/kfuncs.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:27 -07:00
Kumar Kartikeya Dwivedi
0dca817f4d selftests/bpf: Add tests for unaligned syscall ctx accesses
Add coverage for unaligned access with fixed offsets and variable
offsets, and through helpers or kfuncs.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:27 -07:00
Kumar Kartikeya Dwivedi
02c68b10d8 selftests/bpf: Test modified syscall ctx for ARG_PTR_TO_CTX
Ensure that global subprogs and tail calls can only accept an unmodified
PTR_TO_CTX for syscall programs. For all other program types, fixed or
variable offsets on PTR_TO_CTX is rejected when passed into an argument
of any call instruction type, through the unified logic of
check_func_arg_reg_off.

Finally, add a positive example of a case that should succeed with all
our previous changes.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:27 -07:00
Kumar Kartikeya Dwivedi
5a34139b27 selftests/bpf: Add syscall ctx variable offset tests
Add various tests to exercise fixed and variable offsets on PTR_TO_CTX
for syscall programs, and cover disallowed cases for other program types
lacking convert_ctx_access callback. Load verifier_ctx with CAP_SYS_ADMIN
so that kfunc related logic can be tested. While at it, convert assembly
tests to C. Unfortunately, ctx_pointer_to_helper_2's unpriv case conflicts
with usage of kfuncs in the file and cannot be run.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:26 -07:00
Kumar Kartikeya Dwivedi
02f500ce01 selftests/bpf: Convert ctx tests from ASM to C
Convert existing tests from ASM to C, in prep for future changes to add
more comprehensive tests.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:26 -07:00
Kumar Kartikeya Dwivedi
ae5ef001aa bpf: Support variable offsets for syscall PTR_TO_CTX
Allow accessing PTR_TO_CTX with variable offsets in syscall programs.
Fixed offsets are already enabled for all program types that do not
convert their ctx accesses, since the changes we made in the commit
de6c7d99f8 ("bpf: Relax fixed offset check for PTR_TO_CTX"). Note
that we also lift the restriction on passing syscall context into
helpers, which was not permitted before, and passing modified syscall
context into kfuncs.

The structure of check_mem_access can be mostly shared and preserved,
but we must use check_mem_region_access to correctly verify access with
variable offsets.

The check made in check_helper_mem_access is hardened to only allow
PTR_TO_CTX for syscall programs to be passed in as helper memory. This
was the original intention of the existing code anyway, and it makes
little sense for other program types' context to be utilized as a memory
buffer. In case a convincing example presents itself in the future, this
check can be relaxed further.

We also no longer use the last-byte access to simulate helper memory
access, but instead go through check_mem_region_access. Since this no
longer updates our max_ctx_offset, we must do so manually, to keep track
of the maximum offset at which the program ctx may be accessed.

Take care to ensure that when arg_type is ARG_PTR_TO_CTX, we do not
relax any fixed or variable offset constraints around PTR_TO_CTX even in
syscall programs, and require them to be passed unmodified. There are
several reasons why this is necessary. First, if we pass a modified ctx,
then the global subprog's accesses will not update the max_ctx_offset to
its true maximum offset, and can lead to out of bounds accesses. Second,
tail called program (or extension program replacing global subprog) where
their max_ctx_offset exceeds the program they are being called from can
also cause issues. For the latter, unmodified PTR_TO_CTX is the first
requirement for the fix, the second is ensuring max_ctx_offset >= the
program they are being called from, which has to be a separate change
not made in this commit.

All in all, we can hint using arg_type when we expect ARG_PTR_TO_CTX and
make our relaxation around offsets conditional on it.

Drop coverage of syscall tests from verifier_ctx.c temporarily for
negative cases until they are updated in subsequent commits.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06 15:27:26 -07:00
David Gow
8f260b02ee kunit: tool: Terminate kernel under test on SIGINT
kunit.py will attempt to catch SIGINT / ^C in order to ensure the TTY isn't
messed up, but never actually attempts to terminate the running kernel (be
it UML or QEMU). This can lead to a bit of frustration if the kernel has
crashed or hung.

Terminate the kernel process in the signal handler, if it's running. This
requires plumbing through the process handle in a few more places (and
having some checks to see if the kernel is still running in places where it
may have already been killed).

Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Closes: https://lore.kernel.org/all/aaFmiAmg9S18EANA@smile.fi.intel.com/
Signed-off-by: David Gow <david@davidgow.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-06 14:09:04 -06:00
Shuvam Pandey
e42c349f4c kunit: tool: skip stty when stdin is not a tty
run_kernel() cleanup and signal_handler() invoke stty unconditionally.
When stdin is not a tty (for example in CI or unit tests), this writes
noise to stderr.

Call stty only when stdin is a tty.

Add regression tests for these paths:
- run_kernel() with non-tty stdin
- signal_handler() with non-tty stdin
- signal_handler() with tty stdin

Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-06 14:07:29 -06:00
David Gow
b73f50ffd4 kunit: tool: Recommend --raw_output=all if no KTAP found
If no KTAP header is found in the kernel output (e.g., because the kernel
crashed before the KUnit executor was run), it's very useful to re-run the
test with --raw_output=all, as that will show any error output (such as a
stacktrace, log message, BUG, etc). This is not particularly intuitive,
however, as --raw_output=all is not well known.

Add an extra log line to advertise --raw_output=all in this case, as it's
a terrible user experience to just get "Did any KUnit tests run?"

Signed-off-by: David Gow <david@davidgow.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-06 13:47:52 -06:00
Ryota Sakamoto
b5f92fc4a7 kunit: Add --list_suites to show suites
Currently, kunit.py allows listing all individual tests via --list_tests.
However, users often need to see only the available test suites.

Add --list_suites to show suites. This option parses the test list output
from the kernel and prints only the suite names.

Example of the output of --list_suites:
  example_init
  miscdev_init
  printk-ringbuffer

Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2026-04-06 13:47:52 -06:00
Thomas Weißschuh
08b96aa962 selftests/nolibc: only use libgcc when really necessary
nolibc should work without libgcc to be compatible with as many
toolchains as possible. Currently the functionality tested by
nolibc-test does not contain any dependencies, make sure it stays
this way by not linking libgcc anymore.

On the ppc target GCC always emits references to '_restgpr_' functions,
so keep linking libgcc there.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-libgcc-v1-1-eb3ecfe0e176@weissschuh.net
2026-04-06 19:46:54 +02:00
Thomas Weißschuh
e70a7bb575 selftests/nolibc: test the memory allocator
The memory allocator has not seen any testing so far.

Add a simple testcase for it.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/adDRK8D6YBZgv36H@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-asprintf-v2-2-17d2d0df9763@weissschuh.net
2026-04-06 19:46:52 +02:00
Thomas Weißschuh
12496aad10 tools/nolibc: add support for asprintf()
Add support for dynamically allocating formatted strings through
asprintf() and vasprintf().

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260401-nolibc-asprintf-v1-3-46292313439f@weissschuh.net
2026-04-06 19:46:51 +02:00
Uday Shankar
320f9b1c6a selftests: ublk: test that teardown after incomplete recovery completes
Before the fix, teardown of a ublk server that was attempting to recover
a device, but died when it had submitted a nonempty proper subset of the
fetch commands to any queue would loop forever. Add a test to verify
that, after the fix, teardown completes. This is done by:

- Adding a new argument to the fault_inject target that causes it die
  after fetching a nonempty proper subset of the IOs to a queue
- Using that argument in a new test while trying to recover an
  already-created device
- Attempting to delete the ublk device at the end of the test; this
  hangs forever if teardown from the fault-injected ublk server never
  completed.

It was manually verified that the test passes with the fix and hangs
without it.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260405-cancel-v2-2-02d711e643c2@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-06 08:37:48 -06:00
Alexis Lothoré (eBPF Foundation)
f254fb58dd selftests/bpf: remove unused toggle in tc_tunnel
tc_tunnel test is based on a send_and_test_data function which takes a
subtest configuration, and a boolean indicating whether the connection
is supposed to fail or not. This boolean is systematically passed to
true, and is a remnant from the first (not integrated) attempts to
convert tc_tunnel to test_progs: those versions validated for
example that a connection properly fails when only one side of the
connection has tunneling enabled. This specific testing has not been
integrated because it involved large timeouts which increased quite a
lot the test duration, for little added value.

Remove the unused boolean from send_and_test_data to simplify the
generic part of subtests.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260403-tc_tunnel_cleanup-v1-1-4f1bb113d3ab@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-05 18:45:48 -07:00
Weiming Shi
262b857da6 selftests/bpf: add get_next_key boundary test for cgroup_storage
Verify that bpf_map__get_next_key() correctly returns -ENOENT when
called on the last (and only) key in a cgroup_storage map. Before the
fix in the previous patch, this would succeed with bogus key data
instead of failing.

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260403132951.43533-3-bestswngs@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-05 18:45:05 -07:00
Mykyta Yatsenko
f64eb44ce9 selftests/bpf: Add torn write detection test for htab BPF_F_LOCK
Add a consistency subtest to htab_reuse that detects torn writes
caused by the BPF_F_LOCK lockless update racing with element
reallocation in alloc_htab_elem().

The test uses three thread roles started simultaneously via a pipe:
 - locked updaters: BPF_F_LOCK|BPF_EXIST in-place updates
 - delete+update workers: delete then BPF_ANY|BPF_F_LOCK insert
 - locked readers: BPF_F_LOCK lookup checking value consistency

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260401-bpf_map_torn_writes-v1-2-782d071c55e7@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-05 18:37:32 -07:00
Linus Torvalds
85fb6da43a RISC-V updates for v7.0-rc7
- Fix a CONFIG_SPARSEMEM crash on RV32 by avoiding early phys_to_page()
 
 - Prevent runtime const infrastructure from being used by modules, similar
   to what was done for x86
 
 - Avoid problems when shutting down ACPI systems with IOMMUs by adding
   a device dependency between IOMMU and devices that use it
 
 - Fix a bug where the CPU pointer masking state isn't properly reset
   when tagged addresses aren't enabled for a task
 
 - Fix some incorrect register assignments, and add some missing ones,
   in kgdb support code
 
 - Fix compilation of non-kernel code that uses the ptrace uapi header
   by replacing BIT() with _BITUL()
 
 - Fix compilation of the validate_v_ptrace kselftest by working around
   kselftest macro expansion issues
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmnSgysACgkQx4+xDQu9
 KksznQ//UKuNcpTgGoTOSAi9m5XrLNG7B0Z2Es5n3IuuFLeX4uFwD8pJjUouAqja
 Y89HKHcbuawAZLxoEj5QImbFxyM6zgdA24R2kM76+Ds5nMM4hetL1hR1Gphs1ghs
 Vg/klLkSQ/QkV8xTZlWe9A3s96PeiYKgwQUYdENjL/OXWjTbi4Ho/EQYjsXWGyuc
 sGkWVbGeqPhNlv8bMcA11kM8rCsvyhFnAC5yIbmybmup6ObzS1tEnOXodp1jVDlZ
 TPzi7SyjSLiTbsaJGZ1O5oFXSrr8zBLFt2RinR7rUt/8Aq8c5xSSvK9n808jytNP
 ubIgqWjW3wGjzbZfQw4WhOIihtAsp2VssWZlt1p0Q7EGOx0g+/zMA6Uq1VVIuEML
 +Xm6BwxLFm43NDSa7HPtytCoN/qqIQmiRkiLAG7WHL3mSkYDXYjTXZxTmp0awJ8R
 WTlZsQFQlnNd8VydP++cwqi/lCPPqWqZbc8ys0lLt57+oe6eE91W3a4jXnIn/5YR
 dtHLdmHF6xG3pVdilEfFgH7CkA1DMlFox5qQRFx4lLWBY7tTEY1S2o1tmIG1zqKd
 QTcaO1VbuobTLAy06kD8XNUNh8jzW0zedk37BcxA+J+1B59c0N9J7rW8rkRYu4Le
 eeIy9p8kPWUB/JfcMY+6jKUjZgQL9un8M4PpVZ/uWJDxQVDJcRs=
 =d0PH
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

 - Fix a CONFIG_SPARSEMEM crash on RV32 by avoiding early phys_to_page()

 - Prevent runtime const infrastructure from being used by modules,
   similar to what was done for x86

 - Avoid problems when shutting down ACPI systems with IOMMUs by adding
   a device dependency between IOMMU and devices that use it

 - Fix a bug where the CPU pointer masking state isn't properly reset
   when tagged addresses aren't enabled for a task

 - Fix some incorrect register assignments, and add some missing ones,
   in kgdb support code

 - Fix compilation of non-kernel code that uses the ptrace uapi header
   by replacing BIT() with _BITUL()

 - Fix compilation of the validate_v_ptrace kselftest by working around
   kselftest macro expansion issues

* tag 'riscv-for-linus-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  ACPI: RIMT: Add dependency between iommu and devices
  selftests: riscv: Add braces around EXPECT_EQ()
  riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftests
  riscv: Reset pmm when PR_TAGGED_ADDR_ENABLE is not set
  riscv: make runtime const not usable by modules
  riscv: patch: Avoid early phys_to_page()
  riscv: kgdb: fix several debug register assignment bugs
2026-04-05 14:43:47 -07:00
Lorenzo Stoakes (Oracle)
62c65fd740 mm: add mmap_action_map_kernel_pages[_full]()
A user can invoke mmap_action_map_kernel_pages() to specify that the
mapping should map kernel pages starting from desc->start of a specified
number of pages specified in an array.

In order to implement this, adjust mmap_action_prepare() to be able to
return an error code, as it makes sense to assert that the specified
parameters are valid as quickly as possible as well as updating the VMA
flags to include VMA_MIXEDMAP_BIT as necessary.

This provides an mmap_prepare equivalent of vm_insert_pages().  We
additionally update the existing vm_insert_pages() code to use
range_in_vma() and add a new range_in_vma_desc() helper function for the
mmap_prepare case, sharing the code between the two in range_is_subset().

We add both mmap_action_map_kernel_pages() and
mmap_action_map_kernel_pages_full() to allow for both partial and full VMA
mappings.

We update the documentation to reflect the new features.

Finally, we update the VMA tests accordingly to reflect the changes.

Link: https://lkml.kernel.org/r/926ac961690d856e67ec847bee2370ab3c6b9046.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:45 -07:00
Lorenzo Stoakes (Oracle)
668937b7b2 mm: allow handling of stacked mmap_prepare hooks in more drivers
While the conversion of mmap hooks to mmap_prepare is underway, we will
encounter situations where mmap hooks need to invoke nested mmap_prepare
hooks.

The nesting of mmap hooks is termed 'stacking'.  In order to flexibly
facilitate the conversion of custom mmap hooks in drivers which stack, we
must split up the existing __compat_vma_mmap() function into two separate
functions:

* compat_set_desc_from_vma() - This allows the setting of a vm_area_desc
  object's fields to the relevant fields of a VMA.

* __compat_vma_mmap() - Once an mmap_prepare hook has been executed upon a
  vm_area_desc object, this function performs any mmap actions specified by
  the mmap_prepare hook and then invokes its vm_ops->mapped() hook if any
  were specified.

In ordinary cases, where a file's f_op->mmap_prepare() hook simply needs
to be invoked in a stacked mmap() hook, compat_vma_mmap() can be used.

However some drivers define their own nested hooks, which are invoked in
turn by another hook.

A concrete example is vmbus_channel->mmap_ring_buffer(), which is invoked
in turn by bin_attribute->mmap():

vmbus_channel->mmap_ring_buffer() has a signature of:

int (*mmap_ring_buffer)(struct vmbus_channel *channel,
			struct vm_area_struct *vma);

And bin_attribute->mmap() has a signature of:

	int (*mmap)(struct file *, struct kobject *,
		    const struct bin_attribute *attr,
		    struct vm_area_struct *vma);

And so compat_vma_mmap() cannot be used here for incremental conversion of
hooks from mmap() to mmap_prepare().

There are many such instances like this, where conversion to mmap_prepare
would otherwise cascade to a huge change set due to nesting of this kind.

The changes in this patch mean we could now instead convert
vmbus_channel->mmap_ring_buffer() to
vmbus_channel->mmap_prepare_ring_buffer(), and implement something like:

	struct vm_area_desc desc;
	int err;

	compat_set_desc_from_vma(&desc, file, vma);
	err = channel->mmap_prepare_ring_buffer(channel, &desc);
	if (err)
		return err;

	return __compat_vma_mmap(&desc, vma);

Allowing us to incrementally update this logic, and other logic like it.

Unfortunately, as part of this change, we need to be able to flexibly
assign to the VMA descriptor, so have to remove some of the const
declarations within the structure.

Also update the VMA tests to reflect the changes.

Link: https://lkml.kernel.org/r/24aac3019dd34740e788d169fccbe3c62781e648.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:44 -07:00
Lorenzo Stoakes (Oracle)
a1b7fb40cb mm: add mmap_action_simple_ioremap()
Currently drivers use vm_iomap_memory() as a simple helper function for
I/O remapping memory over a range starting at a specified physical address
over a specified length.

In order to utilise this from mmap_prepare, separate out the core logic
into __simple_ioremap_prep(), update vm_iomap_memory() to use it, and add
simple_ioremap_prepare() to do the same with a VMA descriptor object.

We also add MMAP_SIMPLE_IO_REMAP and relevant fields to the struct
mmap_action type to permit this operation also.

We use mmap_action_ioremap() to set up the actual I/O remap operation once
we have checked and figured out the parameters, which makes
simple_ioremap_prepare() easy to implement.

We then add mmap_action_simple_ioremap() to allow drivers to make use of
this mode.

We update the mmap_prepare documentation to describe this mode.  Finally,
we update the VMA tests to reflect this change.

Link: https://lkml.kernel.org/r/a08ef1c4542202684da63bb37f459d5dbbeddd91.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:43 -07:00
Lorenzo Stoakes (Oracle)
c50ca15dd4 mm: add vm_ops->mapped hook
Previously, when a driver needed to do something like establish a
reference count, it could do so in the mmap hook in the knowledge that the
mapping would succeed.

With the introduction of f_op->mmap_prepare this is no longer the case, as
it is invoked prior to actually establishing the mapping.

mmap_prepare is not appropriate for this kind of thing as it is called
before any merge might take place, and after which an error might occur
meaning resources could be leaked.

To take this into account, introduce a new vm_ops->mapped callback which
is invoked when the VMA is first mapped (though notably - not when it is
merged - which is correct and mirrors existing mmap/open/close behaviour).

We do better that vm_ops->open() here, as this callback can return an
error, at which point the VMA will be unmapped.

Note that vm_ops->mapped() is invoked after any mmap action is complete
(such as I/O remapping).

We intentionally do not expose the VMA at this point, exposing only the
fields that could be used, and an output parameter in case the operation
needs to update the vma->vm_private_data field.

In order to deal with stacked filesystems which invoke inner filesystem's
mmap() invocations, add __compat_vma_mapped() and invoke it on vfs_mmap()
(via compat_vma_mmap()) to ensure that the mapped callback is handled when
an mmap() caller invokes a nested filesystem's mmap_prepare() callback.

Update the mmap_prepare documentation to describe the mapped hook and make
it clear what its intended use is.

The vm_ops->mapped() call is handled by the mmap complete logic to ensure
the same code paths are handled by both the compatibility and VMA layers.

Additionally, update VMA userland test headers to reflect the change.

Link: https://lkml.kernel.org/r/4c5e98297eb0aae9565c564e1c296a112702f144.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:43 -07:00
Lorenzo Stoakes (Oracle)
382c0f2895 mm: have mmap_action_complete() handle the rmap lock and unmap
Rather than have the callers handle this both the rmap lock release and
unmapping the VMA on error, handle it within the mmap_action_complete()
logic where it makes sense to, being careful not to unlock twice.

This simplifies the logic and makes it harder to make mistake with this,
while retaining correct behaviour with regard to avoiding deadlocks.

Also replace the call_action_complete() function with a direct invocation
of mmap_action_complete() as the abstraction is no longer required.

Also update the VMA tests to reflect this change.

Link: https://lkml.kernel.org/r/8d1ee8ebd3542d006a47e8382fb80cf5b57ecf10.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:42 -07:00
Lorenzo Stoakes (Oracle)
33506d4bae mm: switch the rmap lock held option off in compat layer
In the mmap_prepare compatibility layer, we don't need to hold the rmap
lock, as we are being called from an .mmap handler.

The .mmap_prepare hook, when invoked in the VMA logic, is called prior to
the VMA being instantiated, but the completion hook is called after the VMA
is linked into the maple tree, meaning rmap walkers can reach it.

The mmap hook does not link the VMA into the tree, so this cannot happen.

Therefore it's safe to simply disable this in the mmap_prepare
compatibility layer.

Also update VMA tests code to reflect current compatibility layer state.

[akpm@linux-foundation.org: fix comment typo, per Vlastimil]
Link: https://lkml.kernel.org/r/dda74230d26a1fcd79a3efab61fa4101dd1cac64.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:42 -07:00
Lorenzo Stoakes (Oracle)
827e97cf4b mm: document vm_operations_struct->open the same as close()
Describe when the operation is invoked and the context in which it is
invoked, matching the description already added for vm_op->close().

While we're here, update all outdated references to an 'area' field for
VMAs to the more consistent 'vma'.

Link: https://lkml.kernel.org/r/7d0ca833c12014320f0fa00f816f95e6e10076f2.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:42 -07:00
Lorenzo Stoakes (Oracle)
3e4bb27068 mm: various small mmap_prepare cleanups
Patch series "mm: expand mmap_prepare functionality and usage", v4.

This series expands the mmap_prepare functionality, which is intended to
replace the deprecated f_op->mmap hook which has been the source of bugs
and security issues for some time.

This series starts with some cleanup of existing mmap_prepare logic, then
adds documentation for the mmap_prepare call to make it easier for
filesystem and driver writers to understand how it works.

It then importantly adds a vm_ops->mapped hook, a key feature that was
missing from mmap_prepare previously - this is invoked when a driver which
specifies mmap_prepare has successfully been mapped but not merged with
another VMA.

mmap_prepare is invoked prior to a merge being attempted, so you cannot
manipulate state such as reference counts as if it were a new mapping.

The vm_ops->mapped hook allows a driver to perform tasks required at this
stage, and provides symmetry against subsequent vm_ops->open,close calls.

The series uses this to correct the afs implementation which wrongly
manipulated reference count at mmap_prepare time.

It then adds an mmap_prepare equivalent of vm_iomap_memory() -
mmap_action_simple_ioremap(), then uses this to update a number of drivers.

It then splits out the mmap_prepare compatibility layer (which allows for
invocation of mmap_prepare hooks in an mmap() hook) in such a way as to
allow for more incremental implementation of mmap_prepare hooks.

It then uses this to extend mmap_prepare usage in drivers.

Finally it adds an mmap_prepare equivalent of vm_map_pages(), which lays
the foundation for future work which will extend mmap_prepare to DMA
coherent mappings.


This patch (of 21):

Rather than passing arbitrary fields, pass a vm_area_desc pointer to mmap
prepare functions to mmap prepare, and an action and vma pointer to mmap
complete in order to put all the action-specific logic in the function
actually doing the work.

Additionally, allow mmap prepare functions to return an error so we can
error out as soon as possible if there is something logically incorrect in
the input.

Update remap_pfn_range_prepare() to properly check the input range for the
CoW case.

Also remove io_remap_pfn_range_complete(), as we can simply set up the
fields correctly in io_remap_pfn_range_prepare() and use
remap_pfn_range_complete() for this.

While we're here, make remap_pfn_range_prepare_vma() a little neater, and
pass mmap_action directly to call_action_complete().

Then, update compat_vma_mmap() to perform its logic directly, as
__compat_vma_map() is not used by anything so we don't need to export it.

Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than
calling the mmap_prepare op directly.

Finally, update the VMA userland tests to reflect the changes.

Link: https://lkml.kernel.org/r/cover.1774045440.git.ljs@kernel.org
Link: https://lkml.kernel.org/r/99f408e4694f44ab12bdc55fe0bd9685d3bd1117.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:41 -07:00
Lorenzo Stoakes (Oracle)
90cb921c4d mm/vma: convert __mmap_region() to use vma_flags_t
Update the mmap() implementation logic implemented in __mmap_region() and
functions invoked by it.  The mmap_region() function converts its input
vm_flags_t parameter to a vma_flags_t value which it then passes to
__mmap_region() which uses the vma_flags_t value consistently from then
on.

As part of the change, we convert map_deny_write_exec() to using
vma_flags_t (it was incorrectly using unsigned long before), and place it
in vma.h, as it is only used internal to mm.

With this change, we eliminate the legacy is_shared_maywrite_vm_flags()
helper function which is now no longer required.

We are also able to update the MMAP_STATE() and VMG_MMAP_STATE() macros to
use the vma_flags_t value.

Finally, we update the VMA tests to reflect the change.

Link: https://lkml.kernel.org/r/1fc33a404c962f02da778da100387cc19bd62153.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:41 -07:00
Lorenzo Stoakes (Oracle)
a06eb2f827 mm/vma: convert vma_modify_flags[_uffd]() to use vma_flags_t
Update the vma_modify_flags() and vma_modify_flags_uffd() functions to
accept a vma_flags_t parameter rather than a vm_flags_t one, and propagate
the changes as needed to implement this change.

Also add vma_flags_reset_once() in replacement of vm_flags_reset_once(). We
still need to be careful here because we need to avoid tearing, so maintain
the assumption that the first system word set of flags are the only ones
that require protection from tearing, and retain this functionality.

We can copy the remainder of VMA flags above 64 bits normally. But
hopefully by the time that happens, we will have replaced the logic that
requires these WRITE_ONCE()'s with something else.

We also replace instances of vm_flags_reset() with a simple write of VMA
flags. We are no longer perform a number of checks, most notable of all the
VMA flags asserts becase:

1. We might be operating on a VMA that is not yet added to the tree.

2. We might be operating on a VMA that is now detached.

3. Really in all but core code, you should be using vma_desc_xxx().

4. Other VMA fields are manipulated with no such checks.

5. It'd be egregious to have to add variants of flag functions just to
   account for cases such as the above, especially when we don't do so for
   other VMA fields. Drivers are the problematic cases and why it was
   especially important (and also for debug as VMA locks were introduced),
   the mmap_prepare work is solving this generally.

Additionally, we can fairly safely assume by this point the soft dirty
flags are being set correctly, so it's reasonable to drop this also.

Finally, update the VMA tests to reflect this.

Link: https://lkml.kernel.org/r/51afbb2b8c3681003cc7926647e37335d793836e.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:41 -07:00
Lorenzo Stoakes (Oracle)
769669bd9c mm/vma: convert as much as we can in mm/vma.c to vma_flags_t
Now we have established a good foundation for vm_flags_t to vma_flags_t
changes, update mm/vma.c to utilise vma_flags_t wherever possible.

We are able to convert VM_STARTGAP_FLAGS entirely as this is only used in
mm/vma.c, and to account for the fact we can't use VM_NONE to make life
easier, place the definition of this within existing #ifdef's to be
cleaner.

Generally the remaining changes are mechanical.

Also update the VMA tests to reflect the changes.

Link: https://lkml.kernel.org/r/5fdeaf8af9a12c2a5d68497495f52fa627d05a5b.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:41 -07:00
Lorenzo Stoakes (Oracle)
a6f14fb593 tools/testing/vma: update VMA tests to test vma_clear_flags[_mask]()
The tests have existing flag clearing logic, so simply expand this to use
the new VMA-specific flag clearing helpers.

Also correct some trivial formatting issue in a macro define.

Link: https://lkml.kernel.org/r/f5da681d3c33039dd4a838188385796eb8d58373.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:41 -07:00
Lorenzo Stoakes (Oracle)
d720b81d01 mm/vma: introduce vma_clear_flags[_mask]()
Introduce a helper function and helper macro to easily clear a VMA's flags
using the new vma_flags_t vma->flags field:

* vma_clear_flags_mask() - Clears all of the flags in a specified mask in
			   the VMA's flags field.
* vma_clear_flags() - Clears all of the specified individual VMA flag bits
		      in a VMA's flags field.

Also update the VMA tests to reflect the change.

Link: https://lkml.kernel.org/r/9bd15da35c2c90e7441265adf01b5c2d3b5c6d41.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
3a6455d56b mm: convert do_brk_flags() to use vma_flags_t
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS
and friends and update the architecture-specific definitions also.

We then have to update some KSM logic to handle VMA flags, and introduce
VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS.

We also introduce two helper functions for use during the time we are
converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and
legacy_to_vma_flags().

This enables us to iteratively make changes to break these changes up into
separate parts.

We use these explicitly here to keep VM_STACK_FLAGS around for certain
users which need to maintain the legacy vm_flags_t values for the time
being.

We are no longer able to rely on the simple VM_xxx being set to zero if
the feature is not enabled, so in the case of VM_DROPPABLE we introduce
VMA_DROPPABLE as the vma_flags_t equivalent, which is set to
EMPTY_VMA_FLAGS if the droppable flag is not available.

While we're here, we make the description of do_brk_flags() into a kdoc
comment, as it almost was already.

We use vma_flags_to_legacy() to not need to update the vm_get_page_prot()
logic as this time.

Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON()
with a VM_WARN_ON_ONCE() as the tested values are no longer build time
available.

We also update mprotect_fixup() to use VMA flags where possible, though we
have to live with a little duplication between vm_flags_t and vma_flags_t
values for the time being until further conversions are made.

While we're here, update VM_SPECIAL to be defined in terms of
VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy().

Finally, we update the VMA tests to reflect these changes.

Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>	[SELinux]
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
bbbc17cb02 tools/testing/vma: test vma_flags_count,vma[_flags]_test_single_mask
Update the VMA tests to assert that vma_flags_count() behaves as expected,
as well as vma_flags_test_single_mask() and vma_test_single_mask().

For the test functions we can simply update the existing vma_test(), et
al.  test to also test the single_mask variants.

We also add some explicit testing of an empty VMA flag to this test to
ensure this is handled properly.

In order to test vma_flags_count() we simply take an existing set of flags
and gradually remove flags ensuring the count remains as expected
throughout.

We also update the vma[_flags]_test_all() tests to make clear the
semantics that we expect vma[_flags]_test_all(..., EMPTY_VMA_FLAGS) to
return true, as trivially, all flags of none are always set in VMA flags.

Link: https://lkml.kernel.org/r/4af95d559cd2af0ba3388de1e1386b9f94c0e009.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
e79d1c500f mm: introduce vma_flags_count() and vma[_flags]_test_single_mask()
vma_flags_count() determines how many bits are set in VMA flags, using
bitmap_weight().

vma_flags_test_single_mask() determines if a vma_flags_t set of flags
contains a single flag specified as another vma_flags_t value, or if the
sought flag mask is empty, it is defined to return false.

This is useful when we want to declare a VMA flag as optionally a single
flag in a mask or empty depending on kernel configuration.

This allows us to have VM_NONE-like semantics when checking whether the
flag is set.

In a subsequent patch, we introduce the use of VMA_DROPPABLE of type
vma_flags_t using precisely these semantics.

It would be actively confusing to use vma_flags_test_any_single_mask() for
this (and vma_flags_test_all_mask() is not correct to use here, as it
trivially returns true when tested against an empty vma flags mask).

We introduce vma_flags_count() to be able to assert that the compared flag
mask is singular or empty, checked when CONFIG_DEBUG_VM is enabled.

Also update the VMA tests as part of this change.

Link: https://lkml.kernel.org/r/cd778dd02b9f2a01eb54d25a49dea8ec2ddf7753.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
63cdb667d1 tools/testing/vma: update VMA flag tests to test vma_test[_any_mask]()
Update the existing test logic to assert that vma_test(), vma_test_any()
and vma_test_any_mask() (implicitly tested via vma_test_any()) are
functioning correctly.

We already have tests for other variants like this, so it's simply a
matter of expanding those tests to also include tests for the VMA-specific
helpers.

Link: https://lkml.kernel.org/r/dea3e97c6c3dd86f1a3f1a0703241b03f6e3a33f.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
fb67bba5d9 mm/vma: introduce vma_test[_any[_mask]](), and make inlining consistent
Introduce helper functions and macros to make it convenient to test flags
and flag masks for VMAs, specifically:

* vma_test() - determine if a single VMA flag is set in a VMA.
* vma_test_any_mask() - determine if any flags in a vma_flags_t value are
			set in a VMA.
* vma_test_any() - Helper macro to test if any of specific flags are set.

Also, there are a mix of 'inline's and '__always_inline's in VMA helper
function declarations, update to consistently use __always_inline.

Finally, update the VMA tests to reflect the changes.

Link: https://lkml.kernel.org/r/be1d71f08307d747a82232cbd8664a88c0f41419.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Lorenzo Stoakes (Oracle)
a8add93f80 tools/testing/vma: test that legacy flag helpers work correctly
Update the existing compare_legacy_flags() predicate function to assert
that legacy_to_vma_flags() and vma_flags_to_legacy() behave as expected.

Link: https://lkml.kernel.org/r/3374e50053adb65818fde948ae3488e1e29ae8b1.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
c8555bc95d mm/vma: introduce [vma_flags,legacy]_to_[legacy,vma_flags]() helpers
While we are still converting VMA flags from vma_flags_t to vm_flags_t,
introduce helpers to convert between the two to allow for iterative
development without having to 'change the world' in a single commit'.

Also update VMA flags tests to reflect the change.

Finally, refresh vma_flags_overwrite_word(),
vma_flag_overwrite_word_once(), vma_flags_set_word() and
vma_flags_clear_word() in the VMA tests to reflect current kernel
implementations - this should make no functional difference, but keeps the
logic consistent between the two.

Link: https://lkml.kernel.org/r/d3569470dbb3ae79134ca7c3eb3fc4df7086e874.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
3ee5845382 mm/vma: introduce vma_flags_same[_mask/_pair]()
Add helpers to determine if two sets of VMA flags are precisely the same,
that is - that every flag set one is set in another, and neither contain
any flags not set in the other.

We also introduce vma_flags_same_pair() for cases where we want to compare
two sets of VMA flags which are both non-const values.

Also update the VMA tests to reflect the change, we already implicitly
test that this functions correctly having used it for testing purposes
previously.

Link: https://lkml.kernel.org/r/4f764bf619e77205837c7c819b62139ef6337ca3.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
b22a48ec09 tools/testing/vma: add simple test for append_vma_flags()
Add a simple test for append_vma_flags() to assert that it behaves as
expected.

Additionally, include the VMA_REMAP_FLAGS definition in the VMA tests to
allow us to use this value in the testing.

Link: https://lkml.kernel.org/r/eebd946c5325ad7fae93027245a562eb1aeb68a2.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
e8d464f4a9 mm/vma: add append_vma_flags() helper
In order to be able to efficiently combine VMA flag masks with additional
VMA flag bits we need to extend the concept introduced in mk_vma_flags()
and __mk_vma_flags() by allowing the specification of a VMA flag mask to
append VMA flag bits to.

Update __mk_vma_flags() to allow for this and update mk_vma_flags()
accordingly, and also provide append_vma_flags() to allow for the caller
to specify which VMA flags mask to append to.

Finally, update the VMA flags tests to reflect the change.

Link: https://lkml.kernel.org/r/9f928cd4688270002f2c0c3777fcc9b49cc7a8ea.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
06531d2bf3 tools/testing/vma: fix VMA flag tests
The VMA tests are incorrectly referencing NUM_VMA_FLAGS, which doesn't
exist, rather they should reference NUM_VMA_FLAG_BITS.

Additionally, remove the custom-written implementation of __mk_vma_flags()
as this means we are not testing the code as present in the kernel, rather
add the actual __mk_vma_flags() to dup.h and add #ifdef's to handle
declarations differently depending on NUM_VMA_FLAG_BITS.

Link: https://lkml.kernel.org/r/b19c63af3d5efdfe712bf5d5f89368a5360a60f7.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:39 -07:00
Lorenzo Stoakes (Oracle)
7ec1885a7e mm/vma: use new VMA flags for sticky flags logic
Use the new vma_flags_t flags implementation to perform the logic around
sticky flags and what flags are ignored on VMA merge.

We make use of the new vma_flags_empty(), vma_flags_diff_pair(), and
vma_flags_and_mask() functionality.

Also update the VMA tests accordingly.

Link: https://lkml.kernel.org/r/369574f06360ffa44707047e3b58eb4897345fba.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:38 -07:00
Lorenzo Stoakes (Oracle)
bd44d91d0c tools/testing/vma: convert bulk of test code to vma_flags_t
Convert the test code to utilise vma_flags_t as opposed to the deprecate
vm_flags_t as much as possible.

As part of this change, add VMA_STICKY_FLAGS and VMA_SPECIAL_FLAGS as
early versions of what these defines will look like in the kernel logic
once this logic is implemented.

Link: https://lkml.kernel.org/r/df90efe29300bd899989f695be4ae3adc901a828.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:38 -07:00
Lorenzo Stoakes (Oracle)
8228e42b5f mm/vma: add further vma_flags_t unions
In order to utilise the new vma_flags_t type, we currently place it in
union with legacy vm_flags fields of type vm_flags_t to make the
transition smoother.

Add vma_flags_t union entries for mm->def_flags and vmg->vm_flags -
mm->def_vma_flags and vmg->vma_flags respectively.

Once the conversion is complete, these will be replaced with vma_flags_t
entries alone.

Also update the VMA tests to reflect the change.

Link: https://lkml.kernel.org/r/d507d542c089ba132e9da53f2ff7f80ca117c3b4.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:38 -07:00
Lorenzo Stoakes (Oracle)
e4fd34b84b tools/testing/vma: add unit tests flag empty, diff_pair, and[_mask]
Add VMA unit tests to assert that:

* vma_flags_empty()
* vma_flags_diff_pair()
* vma_flags_and_mask()
* vma_flags_and()

All function as expected.

In additional to the added tests, in order to make testing easier, add
vma_flags_same_mask() and vma_flags_same() for testing only.  If/when
these are required in kernel code, they can be moved over.

Also add ASSERT_FLAGS_[NOT_]SAME[_MASK](), ASSERT_FLAGS_[NON]EMPTY() test
helpers to make asserting flag state easier and more convenient.

Link: https://lkml.kernel.org/r/471ce7ceb1d32e5fc9c0660966b9eacdf899b4d1.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:38 -07:00
Lorenzo Stoakes (Oracle)
6bc0987d0b mm/vma: add vma_flags_empty(), vma_flags_and(), vma_flags_diff_pair()
Patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code", v4.

This series converts a lot of the existing use of the legacy vm_flags_t
data type to the new vma_flags_t type which replaces it.

In order to do so it adds a number of additional helpers:

* vma_flags_empty() - Determines whether a vma_flags_t value has no bits
  set.

* vma_flags_and() - Performs a bitwise AND between two vma_flags_t values.

* vma_flags_diff_pair() - Determines which flags are not shared between a
  pair of VMA flags (typically non-constant values)

* append_vma_flags() - Similar to mk_vma_flags(), but allows a vma_flags_t
  value to be specified (typically a constant value) which will be copied
  and appended to to create a new vma_flags_t value, with additional flags
  specified to append to it.

* vma_flags_same() - Determines if a vma_flags_t value is exactly equal to
  a set of VMA flags.

* vma_flags_same_mask() - Determines if a vma_flags_t value is eactly equal
  to another vma_flags_t value (typically constant).

* vma_flags_same_pair() - Determines if a pair of vma_flags_t values are
  exactly equal to one another (typically both non-constant).

* vma_flags_to_legacy() - Converts a vma_flags_t value to a vm_flags_t
  value, used to enable more iterative introduction of the use of
  vma_flags_t.

* legacy_to_vma_flags() - Converts a vm_flags_t value to a vma_flags-t
  value, for the same purpose.

* vma_flags_test_single_mask() - Tests whether a vma_flags_t value contain
  the single flag specified in an input vma_flags_t flag mask, or if that
  flag mask is empty, is defined to return false. Useful for
  config-predicated VMA flag mask defines.

* vma_test() - Tests whether a VMA's flags contain a specific singular VMA
  flag.

* vma_test_any() - Tests whether a VMA's flags contain any of a set of VMA
  flags.

* vma_test_any_mask() - Tests whether a VMA's flags contain any of the
  flags specified in another, typically constant, vma_flags_t value.

* vma_test_single_mask() - Tests whether a VMA's flags contain the single
  flag specified in an input vma_flags_t flag mask, or if that flag mask is
  empty, is defined to return false. Useful for config-predicated VMA flag
  mask defines.

* vma_clear_flags() - Clears a specific set of VMA flags from a vma_flags_t
  value.

* vma_clear_flags_mask() - Clears those flag set in a vma_flags_t value
  (typically constant) from a (typically not constant) vma_flags_t value.

The series mostly focuses on the the VMA specific code, especially that
contained in mm/vma.c and mm/vma.h.

It updates both brk() and mmap() logic to utils vma_flags_t values as much
as is practiaclly possible at this point, changing surrounding logic to be
able to do so.

It also updates the vma_modify_xxx() functions where they interact with VMA
flags directly to use vm_flags_t values where possible.

There is extensive testing added in the VMA userland tests to assert that
all of these new VMA flag functions work correctly.


This patch (of 25):

Firstly, add the ability to determine if VMA flags are empty, that is no
flags are set in a vma_flags_t value.

Next, add the ability to obtain the equivalent of the bitwise and of two
vma_flags_t values, via vma_flags_and_mask().

Next, add the ability to obtain the difference between two sets of VMA
flags, that is the equivalent to the exclusive bitwise OR of the two sets
of flags, via vma_flags_diff_pair().

vma_flags_xxx_mask() typically operates on a pointer to a vma_flags_t
value, which is assumed to be an lvalue of some kind (such as a field in a
struct or a stack variable) and an rvalue of some kind (typically a
constant set of VMA flags obtained e.g.  via mk_vma_flags() or
equivalent).

However vma_flags_diff_pair() is intended to operate on two lvalues, so
use the _pair() suffix to make this clear.

Finally, update VMA userland tests to add these helpers.

We also port bitmap_xor() and __bitmap_xor() to the tools/ headers and
source to allow the tests to work with vma_flags_diff_pair().

Link: https://lkml.kernel.org/r/cover.1774034900.git.ljs@kernel.org
Link: https://lkml.kernel.org/r/53ab55b7da91425775e42c03177498ad6de88ef4.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:38 -07:00
Zi Yan
224f129261 selftests/mm: add folio_split() and filemap_get_entry() race test
The added folio_split_race_test is a modified C port of the race condition
test from [1].  The test creates shmem huge pages, where the main thread
punches holes in the shmem to cause folio_split() in the kernel and a set
of 16 threads reads the shmem to cause filemap_get_entry() in the kernel. 
filemap_get_entry() reads the folio and xarray split by folio_split()
locklessly.  The original test[2] is written in rust and uses memfd (shmem
backed).  This C port uses shmem directly and use a single process.

Note: the initial rust to C conversion is done by Cursor.

Link: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/ [1]
Link: https://github.com/dfinity/thp-madv-remove-test [2]
Link: https://lkml.kernel.org/r/20260323163717.184107-1-ziy@nvidia.com
Co-developed-by: Bas van Dijk <bas@dfinity.org>
Signed-off-by: Bas van Dijk <bas@dfinity.org>
Co-developed-by: Adam Bratschi-Kaye <adam.bratschikaye@dfinity.org>
Signed-off-by: Adam Bratschi-Kaye <adam.bratschikaye@dfinity.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:36 -07:00
Lorenzo Stoakes (Oracle)
2d1e54aab6 mm: abstract reading sysctl_max_map_count, and READ_ONCE()
Concurrent reads and writes of sysctl_max_map_count are possible, so we
should READ_ONCE() and WRITE_ONCE().

The sysctl procfs logic already enforces WRITE_ONCE(), so abstract the
read side with get_sysctl_max_map_count().

While we're here, also move the field to mm/internal.h and add the getter
there since only mm interacts with it, there's no need for anybody else to
have access.

Finally, update the VMA userland tests to reflect the change.

Link: https://lkml.kernel.org/r/0715259eb37cbdfde4f9e5db92a20ec7110a1ce5.1773249037.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Jianzhou Zhao <luckd0g@163.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:28 -07:00
Waiman Long
2d028f3e4b selftest: memcg: skip memcg_sock test if address family not supported
The test_memcg_sock test in memcontrol.c sets up an IPv6 socket and send
data over it to consume memory and verify that memory.stat.sock and
memory.current values are close.

On systems where IPv6 isn't enabled or not configured to support
SOCK_STREAM, the test_memcg_sock test always fails.  When the socket()
call fails, there is no way we can test the memory consumption and verify
the above claim.  I believe it is better to just skip the test in this
case instead of reporting a test failure hinting that there may be
something wrong with the memcg code.

Link: https://lkml.kernel.org/r/20260311200526.885899-1-longman@redhat.com
Fixes: 5f8f019380 ("selftests: cgroup/memcontrol: add basic test for socket accounting")
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:27 -07:00
Mike Rapoport (Microsoft)
f08f610ea0 selftests/mm: pagemap_ioctl: remove hungarian notation
Replace lpBaseAddress with addr and dwRegionSize with size.

Link: https://lkml.kernel.org/r/20260311180737.3767545-1-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:27 -07:00
SeongJae Park
ddac713da3 selftests/damon/sysfs.py: test goal_tuner commit
Extend the near-full DAMON parameters commit selftest to commit goal_tuner
and confirm the internal status is updated as expected.

Link: https://lkml.kernel.org/r/20260310010529.91162-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
c2b0cb96e7 selftests/damon/drgn_dump_damon_status: support quota goal_tuner dumping
Update drgn_dump_damon_status.py, which is being used to dump the
in-kernel DAMON status for tests, to dump goal_tuner setup status.

Link: https://lkml.kernel.org/r/20260310010529.91162-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
SeongJae Park
c00863bc7c selftests/damon/_damon_sysfs: support goal_tuner setup
Add support of goal_tuner setup to the test-purpose DAMON sysfs interface
control helper, _damon_sysfs.py.

Link: https://lkml.kernel.org/r/20260310010529.91162-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:26 -07:00
Anthony Yznaga
d239462787 mm: prevent droppable mappings from being locked
Droppable mappings must not be lockable.  There is a check for VMAs with
VM_DROPPABLE set in mlock_fixup() along with checks for other types of
unlockable VMAs which ensures this when calling mlock()/mlock2().

For mlockall(MCL_FUTURE), the check for unlockable VMAs is different.  In
apply_mlockall_flags(), if the flags parameter has MCL_FUTURE set, the
current task's mm's default VMA flag field mm->def_flags has VM_LOCKED
applied to it.  VM_LOCKONFAULT is also applied if MCL_ONFAULT is also set.
When these flags are set as default in this manner they are cleared in
__mmap_complete() for new mappings that do not support mlock.  A check for
VM_DROPPABLE in __mmap_complete() is missing resulting in droppable
mappings created with VM_LOCKED set.  To fix this and reduce that chance
of similar bugs in the future, introduce and use vma_supports_mlock().

Link: https://lkml.kernel.org/r/20260310155821.17869-1-anthony.yznaga@oracle.com
Fixes: 9651fcedf7 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings")
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Tested-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:25 -07:00
David Hildenbrand (Arm)
a9496e9e4b mm: move vma_mmu_pagesize() from hugetlb to vma.c
vma_mmu_pagesize() is also queried on non-hugetlb VMAs and does not really
belong into hugetlb.c.

PPC64 provides a custom overwrite with CONFIG_HUGETLB_PAGE, see
arch/powerpc/mm/book3s64/slice.c, so we cannot easily make this a static
inline function.

So let's move it to vma.c and add some proper kerneldoc.

To make vma tests happy, add a simple vma_kernel_pagesize() stub in
tools/testing/vma/include/custom.h.

Link: https://lkml.kernel.org/r/20260309151901.123947-3-david@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:23 -07:00
SeongJae Park
300252ebb1 selftests/damon/config: enable DAMON_DEBUG_SANITY
CONFIG_DAMON_DEBUG_SANITY is recommended for DAMON development and test
setups.  Enable it on the build config for DAMON selftests.

Link: https://lkml.kernel.org/r/20260306152914.86303-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:21 -07:00
Lorenzo Stoakes (Oracle)
5cfb95f38a tools/testing/vma: add test for vma_flags_test(), vma_desc_test()
Now we have helpers which test singular VMA flags - vma_flags_test() and
vma_desc_test() - add a test to explicitly assert that these behave as
expected.

[ljs@kernel.org: test_vma_flags_test(): use struct initializer, per David]
  Link: https://lkml.kernel.org/r/f6f396d2-1ba2-426f-b756-d8cc5985cc7c@lucifer.local
Link: https://lkml.kernel.org/r/376a39eb9e134d2c8ab10e32720dd292970b080a.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:19 -07:00
Lorenzo Stoakes (Oracle)
0c2aa66357 mm: reintroduce vma_desc_test() as a singular flag test
Similar to vma_flags_test(), we have previously renamed vma_desc_test() to
vma_desc_test_any().  Now that is in place, we can reintroduce
vma_desc_test() to explicitly check for a single VMA flag.

As with vma_flags_test(), this is useful as often flag tests are against a
single flag, and vma_desc_test_any(flags, VMA_READ_BIT) reads oddly and
potentially causes confusion.

As with vma_flags_test() a combination of sparse and vma_flags_t being a
struct means that users cannot misuse this function without it getting
flagged.

Also update the VMA tests to reflect this change.

Link: https://lkml.kernel.org/r/3a65ca23defb05060333f0586428fe279a484564.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:19 -07:00
Lorenzo Stoakes (Oracle)
5e6d45d720 mm: reintroduce vma_flags_test() as a singular flag test
Since we've now renamed vma_flags_test() to vma_flags_test_any() to be
very clear as to what we are in fact testing, we now have the opportunity
to bring vma_flags_test() back, but for explicitly testing a single VMA
flag.

This is useful, as often flag tests are against a single flag, and
vma_flags_test_any(flags, VMA_READ_BIT) reads oddly and potentially causes
confusion.

We use sparse to enforce that users won't accidentally pass vm_flags_t to
this function without it being flagged so this should make it harder to
get this wrong.

Of course, passing vma_flags_t to the function is impossible, as it is a
struct.

Also update the VMA tests to reflect this change.

Link: https://lkml.kernel.org/r/f33f8d7f16c3f3d286a1dc2cba12c23683073134.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:18 -07:00
Lorenzo Stoakes (Oracle)
a5eee1128d mm: always inline __mk_vma_flags() and invoked functions
Be explicit about __mk_vma_flags() (which is used by the mk_vma_flags()
macro) always being inline, as we rely on the compiler to evaluate the
loop in this function and determine that it can replace the code with the
an equivalent constant value, e.g. that:

__mk_vma_flags(2, (const vma_flag_t []){ VMA_WRITE_BIT, VMA_EXEC_BIT });

Can be replaced with:

(1UL << VMA_WRITE_BIT) | (1UL << VMA_EXEC_BIT)

= (1UL << 1) | (1UL << 2) = 6

Most likely an 'inline' will suffice for this, but be explicit as we can
be.

Also update all of the functions __mk_vma_flags() ultimately invokes to be
always inline too.

Note that test_bitmap_const_eval() asserts that the relevant bitmap
functions result in build time constant values.

Additionally, vma_flag_set() operates on a vma_flags_t type, so it is
inconsistently named versus other VMA flags functions.

We only use vma_flag_set() in __mk_vma_flags() so we don't need to worry
about its new name being rather cumbersome, so rename it to
vma_flags_set_flag() to disambiguate it from vma_flags_set().

Also update the VMA test headers to reflect the changes.

Link: https://lkml.kernel.org/r/241f49c52074d436edbb9c6a6662a8dc142a8f43.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:18 -07:00
Lorenzo Stoakes (Oracle)
0b3ed2a495 mm: add vma_desc_test_all() and use it
erofs and zonefs are using vma_desc_test_any() twice to check whether all
of VMA_SHARED_BIT and VMA_MAYWRITE_BIT are set, this is silly, so add
vma_desc_test_all() to test all flags and update erofs and zonefs to use
it.

While we're here, update the helper function comments to be more
consistent.

Also add the same to the VMA test headers.

Link: https://lkml.kernel.org/r/568c8f8d6a84ff64014f997517cba7a629f7eed6.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:18 -07:00
Lorenzo Stoakes (Oracle)
e650bb30ca mm: rename VMA flag helpers to be more readable
Patch series "mm: vma flag tweaks".

The ongoing work around introducing non-system word VMA flags has
introduced a number of helper functions and macros to make life easier
when working with these flags and to make conversions from the legacy use
of VM_xxx flags more straightforward.

This series improves these to reduce confusion as to what they do and to
improve consistency and readability.

Firstly the series renames vma_flags_test() to vma_flags_test_any() to
make it abundantly clear that this function tests whether any of the flags
are set (as opposed to vma_flags_test_all()).

It then renames vma_desc_test_flags() to vma_desc_test_any() for the same
reason.  Note that we drop the 'flags' suffix here, as
vma_desc_test_any_flags() would be cumbersome and 'test' implies a flag
test.

Similarly, we rename vma_test_all_flags() to vma_test_all() for
consistency.

Next, we have a couple of instances (erofs, zonefs) where we are now
testing for vma_desc_test_any(desc, VMA_SHARED_BIT) &&
vma_desc_test_any(desc, VMA_MAYWRITE_BIT).

This is silly, so this series introduces vma_desc_test_all() so these
callers can instead invoke vma_desc_test_all(desc, VMA_SHARED_BIT,
VMA_MAYWRITE_BIT).

We then observe that quite a few instances of vma_flags_test_any() and
vma_desc_test_any() are in fact only testing against a single flag.

Using the _any() variant here is just confusing - 'any' of single item
reads strangely and is liable to cause confusion.

So in these instances the series reintroduces vma_flags_test() and
vma_desc_test() as helpers which test against a single flag.

The fact that vma_flags_t is a struct and that vma_flag_t utilises sparse
to avoid confusion with vm_flags_t makes it impossible for a user to
misuse these helpers without it getting flagged somewhere.

The series also updates __mk_vma_flags() and functions invoked by it to
explicitly mark them always inline to match expectation and to be
consistent with other VMA flag helpers.

It also renames vma_flag_set() to vma_flags_set_flag() (a function only
used by __mk_vma_flags()) to be consistent with other VMA flag helpers.

Finally it updates the VMA tests for each of these changes, and introduces
explicit tests for vma_flags_test() and vma_desc_test() to assert that
they behave as expected.


This patch (of 6):

On reflection, it's confusing to have vma_flags_test() and
vma_desc_test_flags() test whether any comma-separated VMA flag bit is
set, while also having vma_flags_test_all() and vma_test_all_flags()
separately test whether all flags are set.

Firstly, rename vma_flags_test() to vma_flags_test_any() to eliminate this
confusion.

Secondly, since the VMA descriptor flag functions are becoming rather
cumbersome, prefer vma_desc_test*() to vma_desc_test_flags*(), and also
rename vma_desc_test_flags() to vma_desc_test_any().

Finally, rename vma_test_all_flags() to vma_test_all() to keep the
VMA-specific helper consistent with the VMA descriptor naming convention
and to help avoid confusion vs.  vma_flags_test_all().

While we're here, also update whitespace to be consistent in helper
functions.

Link: https://lkml.kernel.org/r/cover.1772704455.git.ljs@kernel.org
Link: https://lkml.kernel.org/r/0f9cb3c511c478344fac0b3b3b0300bb95be95e9.1772704455.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Suggested-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Chunhai Guo <guochunhai@vivo.com>
Cc: Damien Le Maol <dlemoal@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Yue Hu <zbestahu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:18 -07:00
Jason Miu
6b0dd42d76 kho: remove finalize state and clients
Eliminate the `kho_finalize()` function and its associated state from the
KHO subsystem.  The transition to a radix tree for memory tracking makes
the explicit "finalize" state and its serialization step obsolete.

Remove the `kho_finalize()` and `kho_finalized()` APIs and their stub
implementations.  Update KHO client code and the debugfs interface to no
longer call or depend on the `kho_finalize()` mechanism.

Complete the move towards a stateless KHO, simplifying the overall design
by removing unnecessary state management.

Link: https://lkml.kernel.org/r/20260206021428.3386442-3-jasonmiu@google.com
Signed-off-by: Jason Miu <jasonmiu@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:04 -07:00
Chen Ni
15c578d0dc selftests/mm: remove duplicate include of unistd.h
Remove duplicate inclusion of unistd.h in memory-failure.c to clean up
redundant code.

Link: https://lkml.kernel.org/r/20260211064311.2981726-1-nichen@iscas.ac.cn
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:01 -07:00
Jiayuan Chen
4e89004eeb selftests/cgroup: add test for zswap incompressible pages
Add test_zswap_incompressible() to verify that the zswap_incomp memcg stat
correctly tracks incompressible pages.

The test allocates memory filled with random data from /dev/urandom, which
cannot be effectively compressed by zswap.  When this data is swapped out
to zswap, it should be stored as-is and tracked by the zswap_incomp
counter.

The test verifies that:
1. Pages are swapped out to zswap (zswpout increases)
2. Incompressible pages are tracked (zswap_incomp increases)

test:
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo Y > /sys/module/zswap/parameters/enabled

./test_zswap
 TAP version 13
 1..8
 ok 1 test_zswap_usage
 ok 2 test_swapin_nozswap
 ok 3 test_zswapin
 ok 4 test_zswap_writeback_enabled
 ok 5 test_zswap_writeback_disabled
 ok 6 test_no_kmem_bypass
 ok 7 test_no_invasive_cgroup_shrink
 ok 8 test_zswap_incompressible
 Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://lkml.kernel.org/r/20260213071827.5688-3-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:00 -07:00
AnishMulay
54218f10df selftests/mm: skip migration tests if NUMA is unavailable
Currently, the migration test asserts that numa_available() returns 0.  On
systems where NUMA is not available (returning -1), such as certain ARM64
configurations or single-node systems, this assertion fails and crashes
the test.

Update the test to check the return value of numa_available().  If it is
less than 0, skip the test gracefully instead of failing.

This aligns the behavior with other MM selftests (like rmap) that skip
when NUMA support is missing.

Link: https://lkml.kernel.org/r/20260218163941.13499-1-anishm7030@gmail.com
Fixes: 0c2d087284 ("mm: add selftests for migration entries")
Signed-off-by: AnishMulay <anishm7030@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: Sayali Patil <sayalip@linux.ibm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:57 -07:00
Liam R. Howlett
280b792cac maple_tree: use maple copy node for mas_wr_split()
Instead of using the maple big node, use the maple copy node for reduced
stack usage and aligning with mas_wr_rebalance() and
mas_wr_spanning_store().

Splitting a node is similar to rebalancing, but a new evaluation of when
to ascend is needed.  The only other difference is that the data is pushed
and never rebalanced at each level.

The testing must also align with the changes to this commit to ensure the
test suite continues to pass.

Link: https://lkml.kernel.org/r/20260130205935.2559335-27-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andrew Ballance <andrewjballance@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:56 -07:00
Liam R. Howlett
ebfee00c0b maple_tree: add test for rebalance calculation off-by-one
During the big node removal, an incorrect rebalance step went too far up
the tree causing insufficient nodes.  Test the faulty condition by
recreating the scenario in the userspace testing.

Link: https://lkml.kernel.org/r/20260130205935.2559335-24-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andrew Ballance <andrewjballance@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:56 -07:00
Liam R. Howlett
a9c6716e08 maple_tree: start using maple copy node for destination
Stop using the maple subtree state and big node in favour of using three
destinations in the maple copy node.  That is, expand the way leaves were
handled to all levels of the tree and use the maple copy node to track the
new nodes.

Extract out the sibling init into the data calculation since this is where
the insufficient data can be detected.  The remainder of the sibling code
to shift the next iteration is moved to the spanning_ascend() function,
since it is not always needed.

Next introduce the dst_setup() function which will decide how many nodes
are needed to contain the data at this level.  Using the destination
count, populate the copy node's dst array with the new nodes and set
d_count to the correct value.  Note that this can be tricky in the case of
a leaf node with exactly enough room because of the rule against NULLs at
the end of leaves.

Once the destinations are ready, copy the data by altering the
cp_data_write() function to copy from the sources to the destinations
directly.  This eliminates the use of the big node in this code path.  On
node completion, node_finalise() will zero out the remaining area and set
the metadata, if necessary.

spanning_ascend() is used to decide if the operation is complete.  It may
create a new root, converge into one destination, or continue upwards by
ascending the left and right write maple states.

One test case setup needed to be tweaked so that the targeted node was
surrounded by full nodes.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20260130205935.2559335-18-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andrew Ballance <andrewjballance@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:55 -07:00
Liam R. Howlett
b14ffd2c6c maple_tree: testing update for spanning store
Spanning store had some corner cases which showed up during rcu stress
testing.  Add explicit tests for those cases.

At the same time add some locking for easier visibility of the rcu stress
testing.  Only a single dump of the tree will happen on the first detected
issue instead of flooding the console with output.

Link: https://lkml.kernel.org/r/20260130205935.2559335-13-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andrew Ballance <andrewjballance@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:54 -07:00
Charlie Jenkins
7eb2e29546 selftests: riscv: Add license to cfi selftest
The cfi selftest was missing a license so add it.

Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Link: https://patch.msgid.link/20260309-fix_selftests-v2-4-9d5a553a531e@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:42:41 -06:00
Paul Walmsley
08ee155905 prctl: cfi: change the branch landing pad prctl()s to be more descriptive
Per Linus' comments requesting the replacement of "INDIR_BR_LP" in the
indirect branch tracking prctl()s with something more readable, and
suggesting the use of the speculation control prctl()s as an exemplar,
reimplement the prctl()s and related constants that control per-task
forward-edge control flow integrity.

This primarily involves two changes.  First, the prctls are
restructured to resemble the style of the speculative execution
workaround control prctls PR_{GET,SET}_SPECULATION_CTRL, to make them
easier to extend in the future.  Second, the "indir_br_lp" abbrevation
is expanded to "branch_landing_pads" to be less telegraphic.  The
kselftest and documentation is adjusted accordingly.

Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:40:58 -06:00
Paul Walmsley
e5342fe2c1 riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers
Similar to the recent change to expand "LP" to "branch landing pad",
let's expand "SS" in the ptrace uapi macros to "shadow stack" as well.
This aligns with the existing prctl() arguments, which use the
expanded "shadow stack" names, rather than just the abbreviation.

Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/
Cc: Deepak Gupta <debug@rivosinc.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:40:58 -06:00
Paul Walmsley
ac4e61c730 riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers
Per Linus' comments about the unreadability of abbreviations such as
"LP", rename the RISC-V ptrace landing pad CFI macro names to be more
explicit.  This primarily involves expanding "LP" in the names to some
variant of "branch landing pad."

Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/
Cc: Deepak Gupta <debug@rivosinc.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:40:58 -06:00
Charlie Jenkins
511361fe7a selftests: riscv: Add braces around EXPECT_EQ()
EXPECT_EQ() expands to multiple lines, breaking up one-line if
statements. This issue was not present in the patch on the mailing list
but was instead introduced by the maintainer when attempting to fix up
checkpatch warnings. Add braces around EXPECT_EQ() to avoid the error
even though checkpatch suggests them to be removed:

validate_v_ptrace.c:626:17: error: ‘else’ without a previous ‘if’

Fixes: 3789d5eecd ("selftests: riscv: verify syscalls discard vector context")
Fixes: 30eb191c89 ("selftests: riscv: verify ptrace rejects invalid vector csr inputs")
Fixes: 849f05ae1e ("selftests: riscv: verify ptrace accepts valid vector csr values")
Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Reviewed-and-tested-by: Sergey Matyukevich <geomatsi@gmail.com>
Link: https://patch.msgid.link/20260309-fix_selftests-v2-2-9d5a553a531e@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:37:57 -06:00
Paul Walmsley
87ad7cc9aa riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftests
Fix the build of non-kernel code that includes the RISC-V ptrace uapi
header, and the RISC-V validate_v_ptrace.c kselftest, by using the
_BITUL() macro rather than BIT().  BIT() is not available outside
the kernel.

Based on patches and comments from Charlie Jenkins, Michael Neuling,
and Andreas Schwab.

Fixes: 30eb191c89 ("selftests: riscv: verify ptrace rejects invalid vector csr inputs")
Fixes: 2af7c9cf02 ("riscv/ptrace: expose riscv CFI status and state via ptrace and in core files")
Cc: Andreas Schwab <schwab@suse.de>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Charlie Jenkins <thecharlesjenkins@gmail.com>
Link: https://patch.msgid.link/20260330024248.449292-1-mikey@neuling.org
Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-1-9d5a553a531e@gmail.com/
Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-3-9d5a553a531e@gmail.com/
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:37:54 -06:00
Thomas Weißschuh
572246dcdd tools/nolibc: handle all major and minor numbers in makedev() and friends
Remove the limitation of only handling small major and minor numbers.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-makedev-v2-5-456a429bf60c@weissschuh.net
2026-04-04 10:29:02 +02:00
Thomas Weißschuh
5afc7e9b90 selftests/nolibc: add a test for stat().st_rdev
The handling of 'dev_t' values is about to be changed.

Add a test to make sure they are returned correctly from stat().

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-makedev-v2-2-456a429bf60c@weissschuh.net
2026-04-04 10:28:44 +02:00
Thomas Weißschuh
9c0ff257fb selftests/nolibc: add some tests for makedev() and friends
These functions/macros are about to be changed.

Add some tests to make sure they continue working.
As they only handle small dev_t values, only test those for now.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://patch.msgid.link/20260404-nolibc-makedev-v2-1-456a429bf60c@weissschuh.net
2026-04-04 10:28:35 +02:00
Yosry Ahmed
052ca584bd KVM: selftests: Drop 'invalid' from svm_nested_invalid_vmcb12_gpa's name
The test checks both invalid GPAs as well as unmappable GPAs, so drop
'invalid' from its name.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260316202732.3164936-10-yosry@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-03 16:08:05 -07:00
Yosry Ahmed
428543fbf0 KVM: selftests: Rework svm_nested_invalid_vmcb12_gpa
The test currently allegedly makes sure that VMRUN causes a #GP in
vmcb12 GPA is valid but unmappable. However, it calls run_guest() with
an the test vmcb12 GPA, and the #GP is produced from VMLOAD, not VMRUN.

Additionally, the underlying logic just changed to match architectural
behavior, and all of VMRUN/VMLOAD/VMSAVE fail emulation if vmcb12 cannot
be mapped. The CPU still injects a #GP if the vmcb12 GPA exceeds
maxphyaddr.

Rework the test such to use the KVM_ONE_VCPU_TEST[_SUITE] harness, and
test all of VMRUN/VMLOAD/VMSAVE with both an invalid GPA (-1ULL) causing
a #GP, and a valid but unmappable GPA causing emulation failure. Execute
the instructions directly from L1 instead of run_guest() to make sure
the #GP or emulation failure is produced by the right instruction.

Leave the #VMEXIT with unmappable GPA test case as-is, but wrap it with
a test harness as well.

Opportunisitically drop gp_triggered, as the test already checks that
a #GP was injected through a SYNC. Also, use the first unmapped GPA
instead of the maximum legal GPA, as some CPUs inject a #GP for the
maximum legal GPA (likely in a reserved area).

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260316202732.3164936-9-yosry@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-03 16:08:04 -07:00
Jakub Kicinski
764d0833e7 selftests: drv-net: gro: add a test for bad IPv4 csum
We have a test for coalescing with bad TCP checksum, let's also
test bad IPv4 header checksum.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:45 -07:00
Jakub Kicinski
9a84a4047d selftests: drv-net: gro: test ip6ip6
We explicitly test ipip encap. Let's add ip6ip6, too. Having
just ipip seems like favoring IPv4 which we should not do :)
Testing all combinations is left for future work, not sure
it's actually worth it.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:44 -07:00
Jakub Kicinski
024597cc20 selftests: drv-net: gro: make large packet math more precise
When constructing the packets for large_* test cases we use
a static value for packet count and MSS. It works okay for
ipv4 vs ipv6 but the gap between ipv4 and ip6ip6 is going to
be quite significant.

Make the defines calculate the worst case values, those
are only used for sizing stack arrays. Create helpers for
calculating precise values based on the exact test case.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:44 -07:00
Jakub Kicinski
166b0cc6df selftests: drv-net: gro: remove TOTAL_HDR_LEN
Willem points out TOTAL_HDR_LEN is identical to MAX_HDR_LEN.
This seems to have been the case ever since the test was added.
Replace the uses of TOTAL_HDR_LEN with MAX_HDR_LEN, MAX seems
more common for what this value is.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:44 -07:00
Jakub Kicinski
5469b695f2 selftests: drv-net: gro: prepare for ip6ip6 support
Try to use already calculated offsets and not depend on the ipip
flag as much. This patch should not change any functionality,
it's just a cleanup to make ip6ip6 support easier.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
d973484747 selftests: drv-net: gro: always wait for FIN in the capacity test
The new capacity/order test exits as soon as it sees the expected
packet sequence. This may allow the "flushing" FIN packet to spill
over to the next test. Let's always wait for the FIN before exiting.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
436ea8a1b7 selftests: drv-net: gro: add 1 byte payload test
Small IPv4 packets get padded to 60B, this may break / confuse
some buggy implementations. Add a test to coalesce a 1B payload.
Keep this separate from the lrg_sml test because I suspect some
implementations may not handle this case (treat padded frames
as ineligible for coalescing).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:43 -07:00
Jakub Kicinski
30f831b44a selftests: drv-net: gro: add data burst test case
Add a test trying to induce a GRO context timeout followed
by another sequence of packets for the same flow. The second
burst arrives 100ms after the first one so any implementation
(SW or HW) must time out waiting at that point. We expect both
bursts to be aggregated successfully but separately.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260402210000.1512696-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-03 15:05:42 -07:00
Dave Jiang
202432ae8a Merge branch 'for-7.1/cxl-region-refactor' into cxl-for-next
Refactor CXL core/region code to make region code more manageable by
splitting out DAX and PMEM code from RAM handling code.

cxl/core: use cleanup.h for devm_cxl_add_dax_region
cxl/core/region: move dax region device logic into region_dax.c
cxl/core/region: move pmem region driver logic into region_pmem.c
2026-04-03 12:30:57 -07:00
Dave Jiang
303d32843b Merge branch 'for-7.1/dax-hmem' into cxl-for-next
The series addresses conflicts between HMEM and CXL when handling Soft
Reserved memory ranges. CXL will try best effort in claiming the Soft
Reserved memory region that are CXL regions. If fails, it will punt
back to HMEM.

tools/testing/cxl: Test dax_hmem takeover of CXL regions
tools/testing/cxl: Simulate auto-assembly failure
dax/hmem: Parent dax_hmem devices
dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
dax/hmem: Reduce visibility of dax_cxl coordination symbols
cxl/region: Constify cxl_region_resource_contains()
cxl/region: Limit visibility of cxl_region_contains_resource()
dax/cxl: Fix HMEM dependencies
cxl/region: Fix use-after-free from auto assembly failure
dax/hmem, cxl: Defer and resolve Soft Reserved ownership
cxl/region: Add helper to check Soft Reserved containment by CXL regions
dax: Track all dax_region allocations under a global resource tree
dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
dax/hmem: Factor HMEM registration into __hmem_register_device()
dax/bus: Use dax_region_put() in alloc_dax_region() error path
2026-04-03 12:24:18 -07:00
Dave Jiang
7aacc62557 Merge branch 'for-7.1/cxl-type2-support' into cxl-for-next
Prep patches for CXL type2 accelerator basic support

cxl/region: Factor out interleave granularity setup
cxl/region: Factor out interleave ways setup
cxl: Make region type based on endpoint type
cxl/pci: Remove redundant cxl_pci_find_port() call
cxl: Move pci generic code from cxl_pci to core/cxl_pci
cxl: export internal structs for external Type2 drivers
cxl: support Type2 when initializing cxl_dev_state
2026-04-03 12:18:23 -07:00
Alison Schofield
f3b1d22607 tools/testing/cxl: Enable replay of user regions as auto regions
The cxl_test module currently hard-codes auto regions in the mock
topology, limiting coverage of the driver's region auto-assembly
logic.

Teach cxl_test to replay previously committed decoder programming
across a cxl_acpi unbind/bind cycle. Decoder programming is recorded
in a registry keyed by a stable port identity and decoder id. The
registry is updated on decoder commit and reset events and consulted
during enumeration to restore previously enabled decoders.

This allows regions created through the user interface to be replayed
during enumeration and treated as auto-discovered regions, enabling
testing of region auto-assembly using configurations created in the
cxl_test topology.

Example workflow:
  # cxl create-region ...
  # echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_reset_preserve_registry
  # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
  # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind
  # echo 0 > /sys/bus/platform/devices/cxl_acpi.0/decoder_reset_preserve_registry

The NDCTL CXL unit test, cxl-region-replay.sh, demonstrates the usage.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260314061952.2221030-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-04-03 11:45:48 -07:00
Sean Christopherson
25a642b6ab KVM: selftests: Remove duplicate LAUNCH_UPDATE_VMSA call in SEV-ES migrate test
Drop the explicit KVM_SEV_LAUNCH_UPDATE_VMSA call when creating an SEV-ES
VM in the SEV migration test, as sev_vm_create() automatically updates the
VMSA pages for SEV-ES guests.  The only reason the duplicate call doesn't
cause visible problems is because the test doesn't actually try to run the
vCPUs.  That will change when KVM adds a check to prevent userspace from
re-launching a VMSA (which corrupts the VMSA page due to KVM writing
encrypted private memory).

Fixes: 69f8e15ab6 ("KVM: selftests: Use the SEV library APIs in the intra-host migration test")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310234829.2608037-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-03 09:37:35 -07:00
Oleg Nesterov
41fa043273 selftests/seccomp: Add hard-coded __NR_uprobe for x86_64
This complements the commit 18f7686a1c ("selftests/seccomp:
Add hard-coded __NR_uretprobe for x86_64").

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://patch.msgid.link/ac_BAMSggw-_ABPE@redhat.com
Signed-off-by: Kees Cook <kees@kernel.org>
2026-04-03 08:41:38 -07:00
Alexei Starovoitov
f1606dd0ac bpf: Add bpf_compute_const_regs() and bpf_prune_dead_branches() passes
Add two passes before the main verifier pass:

bpf_compute_const_regs() is a forward dataflow analysis that tracks
register values in R0-R9 across the program using fixed-point
iteration in reverse postorder. Each register is tracked with
a six-state lattice:

  UNVISITED -> CONST(val) / MAP_PTR(map_index) /
               MAP_VALUE(map_index, offset) / SUBPROG(num) -> UNKNOWN

At merge points, if two paths produce the same state and value for
a register, it stays; otherwise it becomes UNKNOWN.

The analysis handles:
 - MOV, ADD, SUB, AND with immediate or register operands
 - LD_IMM64 for plain constants, map FDs, map values, and subprogs
 - LDX from read-only maps: constant-folds the load by reading the
   map value directly via bpf_map_direct_read()

Results that fit in 32 bits are stored per-instruction in
insn_aux_data and bitmasks.

bpf_prune_dead_branches() uses the computed constants to evaluate
conditional branches. When both operands of a conditional jump are
known constants, the branch outcome is determined statically and the
instruction is rewritten to an unconditional jump.
The CFG postorder is then recomputed to reflect new control flow.
This eliminates dead edges so that subsequent liveness analysis
doesn't propagate through dead code.

Also add runtime sanity check to validate that precomputed
constants match the verifier's tracked state.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-5-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03 08:34:36 -07:00
Alexei Starovoitov
427c07ddb9 selftests/bpf: Add tests for subprog topological ordering
Add few tests for topo sort:
- linear chain: main -> A -> B
- diamond: main -> A, main -> B, A -> C, B -> C
- mixed global/static: main -> global -> static leaf
- shared callee: main -> leaf, main -> global -> leaf
- duplicate calls: main calls same subprog twice
- no calls: single subprog

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-4-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03 08:34:33 -07:00
Alexei Starovoitov
e6898ec751 bpf: Sort subprogs in topological order after check_cfg()
Add a pass that sorts subprogs in topological order so that iterating
subprog_topo_order[] walks leaf subprogs first, then their callers.
This is computed as a DFS post-order traversal of the CFG.

The pass runs after check_cfg() to ensure the CFG has been validated
before traversing and after postorder has been computed to avoid
walking dead code.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-3-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03 08:34:30 -07:00
Alexei Starovoitov
503d21ef8e bpf: Do register range validation early
Instead of checking src/dst range multiple times during
the main verifier pass do them once.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03 08:34:26 -07:00
Alexei Starovoitov
891a05ccba Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc6+
Cross-merge BPF and other fixes after downstream PR.

Minor conflict in kernel/bpf/verifier.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-03 08:14:13 -07:00
Abhishek Dubey
e1f7a0e196 selftest/bpf: Enable gotox tests for powerpc64
With gotox instruction and jumptable now supported,
enable corresponding bpf selftest on powerpc.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-5-adubey@linux.ibm.com
2026-04-03 14:14:25 +05:30
Abhishek Dubey
66cad93ad3 selftest/bpf: Enable instruction array test for powerpc
With instruction array now supported, enable corresponding bpf
selftest for powerpc.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-3-adubey@linux.ibm.com
2026-04-03 14:14:25 +05:30
Abhishek Dubey
e640bcd1bf selftests/bpf: Enable private stack tests for powerpc64
With support of private stack, relevant tests must pass
on powerpc64.

#./test_progs -t struct_ops_private_stack
#434/1   struct_ops_private_stack/private_stack:OK
#434/2   struct_ops_private_stack/private_stack_fail:OK
#434/3   struct_ops_private_stack/private_stack_recur:OK
#434     struct_ops_private_stack:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401103215.104438-2-adubey@linux.ibm.com
2026-04-03 14:09:43 +05:30
Linus Torvalds
7b9e74c5a4 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmnPGdMACgkQ6rmadz2v
 bTrNxw/9Hcn2V/Jqp/cEagmKIKqSAUFgEE+AwRbQU5YL2Yem/6Q15rnOk8pOSDT5
 jqk7VbuchVmWa+a9DVy7d3XVWohk332QbvQRHfqV8P0ZpnfJa0YqdZlKg2/4/8P/
 yVhLzVrGIGcvvz9CfhIynRhq/fvr7iYbSSv9JT3nig4qCYpUf7kPbXSLtxyElNWN
 xX36KfTxQO4xI2+iezsNwklXF25Tv59V1fNuKF2lshxS+DwaroAzAJLd3MGvTHRj
 8y5kU1UDb+HeJh9DpEFjppQp4qUQjIKAiNVvXGUOe7TI/i9VTIiMfesniWKNwzYv
 Alo2G8fLb4nJhzNL2ol4R0I5BCYmMT55tBFvSNJQ+9Esy6azkbExmKuE1hXsUXo1
 jY0TbNt58zSZEmyz9SYoFKlg4lOW4ZIMl0RtnSBRoDwtK3ThGV7QFlnKq3uPZ6ce
 RcpMk7cOnERLzwPnpSiACrQmzhMk+j5HG1u+Eb3rXKxYCQO6bAhpQyPDKsiXNgkL
 uezq2zqAnNho0/CInHGlRj7E1JnvRoHCcLBT4zzyIY/jruI8fzK0aMqGMvk/qOby
 BWDnJ9GG3VmGSUc/FOp3IchKCnxXhkYqsjBCP03cbIZgr1MuixZeom81OsPNmSX8
 Ke+FeGNsU5zOUJ1iG2BZjdya/DAgP8hd85WVtaXyX60KKhuu45c=
 =w0RY
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix register equivalence for pointers to packet (Alexei Starovoitov)

 - Fix incorrect pruning due to atomic fetch precision tracking (Daniel
   Borkmann)

 - Fix grace period wait for bpf_link-ed tracepoints (Kumar Kartikeya
   Dwivedi)

 - Fix use-after-free of sockmap's sk->sk_socket (Kuniyuki Iwashima)

 - Reject direct access to nullable PTR_TO_BUF pointers (Qi Tang)

 - Reject sleepable kprobe_multi programs at attach time (Varun R
   Mallya)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add more precision tracking tests for atomics
  bpf: Fix incorrect pruning due to atomic fetch precision tracking
  bpf: Reject sleepable kprobe_multi programs at attach time
  bpf: reject direct access to nullable PTR_TO_BUF pointers
  bpf: sockmap: Fix use-after-free of sk->sk_socket in sk_psock_verdict_data_ready().
  bpf: Fix grace period wait for tracepoint bpf_link
  bpf: Fix regsafe() for pointers to packet
2026-04-02 18:59:56 -07:00
Paul Chaignon
7cbded6ed9 selftests/bpf: Remove invariant violation flags
With the changes to the verifier in previous commits, we're not
expecting any invariant violations anymore. We should therefore always
enable BPF_F_TEST_REG_INVARIANTS to fail on invariant violations. Turns
out that's already the case and we've been explicitly setting this flag
in selftests when it wasn't necessary. This commit removes those flags
from selftests, which should hopefully make clearer that it's always
enabled.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/9afce92510a7d44569dc3af63c9b8c608e69298a.1775142354.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 18:23:25 -07:00
Paul Chaignon
2ba199067b selftests/bpf: Cover invariant violation case from syzbot
This patch adds a selftest for the change in the previous patch. The
selftest is derived from a syzbot reproducer from [1] (among the 22
reproducers on that page, only 4 still reproduced on latest bpf tree,
all being small variants of the same invariant violation).

The test case failure without the previous patch is shown below.

  0: R1=ctx() R10=fp0
  0: (85) call bpf_get_prandom_u32#7    ; R0=scalar()
  1: (bf) r5 = r0                       ; R0=scalar(id=1) R5=scalar(id=1)
  2: (57) r5 &= -4                      ; R5=scalar(smax=0x7ffffffffffffffc,umax=0xfffffffffffffffc,smax32=0x7ffffffc,umax32=0xfffffffc,var_off=(0x0; 0xfffffffffffffffc))
  3: (bf) r7 = r0                       ; R0=scalar(id=1) R7=scalar(id=1)
  4: (57) r7 &= 1                       ; R7=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
  5: (07) r7 += -43                     ; R7=scalar(smin=smin32=-43,smax=smax32=-42,umin=0xffffffffffffffd5,umax=0xffffffffffffffd6,umin32=0xffffffd5,umax32=0xffffffd6,var_off=(0xffffffffffffffd4; 0x3))
  6: (5e) if w5 != w7 goto pc+1
  verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xffffffd5, 0xffffffffffffffd4] s64=[0x80000000ffffffd5, 0x7fffffffffffffd4] u32=[0xffffffd5, 0xffffffd4] s32=[0xffffffd5, 0xffffffd4] var_off=(0xffffffd4, 0xffffffff00000000)

R5 and R7 are prepared such that their tnums intersection results in a
known constant but that constant isn't within R7's u32 bounds.
is_branch_taken isn't able to detect this case today, so the verifier
walks the impossible fallthrough branch. After regs_refine_cond_op and
reg_bounds_sync refine R5 on the assumption that the branch is taken,
the impossibility becomes apparent and results in an invariant violation
for R5: umin32 is greater than umax32.

The previous patch fixes this by using regs_refine_cond_op and
reg_bounds_sync in is_branch_taken to detect the impossible branch. The
fallthrough branch is therefore correctly detected as dead code.

Link: https://syzkaller.appspot.com/bug?extid=c950cc277150935cc0b5 [1]
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/b1e22233a3206ead522f02eda27b9c5c991a0de9.1775142354.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 18:23:25 -07:00
Amery Hung
63f5156a9c selftests/bpf: Improve task local data documentation and fix potential memory leak
If TLD_FREE_DATA_ON_THREAD_EXIT is not enabled in a translation unit
that calls __tld_create_key() first, another translation unit that
enables it will not get the auto cleanup feature as pthread key is only
created once when allocation metadata. Fix it by always try to create
the pthread key when __tld_create_key() is called.

Also improve the documentation:
- Discourage user from using different options in different translation
  units
- Specify calling tld_free() before thread exit as undefined behavior

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 15:11:08 -07:00
Amery Hung
0b481a6915 selftests/bpf: Remove TLD_READ_ONCE() in the user space header
TLD_READ_ONCE() is redundant as the only reference passed to it is
defined as _Atomic. The load is guaranteed to be atomic in C11 standard
(6.2.6.1). Drop the macro.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 15:11:08 -07:00
Amery Hung
80aa8e9c64 selftests/bpf: Make sure TLD_DEFINE_KEY runs first
Without specifying constructor priority of the hidden constructor
function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data =
false) may run after tld_get_data() called from other constructors.
Threads calling tld_get_data() before __tld_create_key(..., dyn_data
= false) will not allocate enough memory for all TLDs and later result
in OOB access. Therefore, set it to the lowest value available to
users. Note that lower means higher priority and 0-100 is reserved to
the compiler.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 15:11:08 -07:00
Amery Hung
bb6d9f5cf1 selftests/bpf: Simplify task_local_data memory allocation
Simplify data allocation by always using aligned_alloc() and passing
size_pot, size rounded up to the closest power of two to alignment.

Currently, aligned_alloc(page_size, size) is only intended to be used
with memory allocators that can fulfill the request without rounding
size up to page_size to conserve memory. This is enabled by defining
TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to
the limitation of UPTR where only a page can be pinned to the kernel.
Otherwise, malloc(size * 2) is used to allocate memory for data.

However, we don't need to call aligned_alloc(page_size, size) to get
a contiguous memory of size bytes within a page. aligned_alloc(size_pot,
...) will also do the trick. Therefore, just use aligned_alloc(size_pot,
...) universally.

As for the size argument, create a new option,
TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size.
This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing
memory allocators with low overhead aligned_alloc() to not waste memory.
To enable this, users need to make sure it is not an undefined behavior
for the memory allocator to have size not being an integral multiple of
alignment.

Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC
used to always waste size-byte of memory due to malloc(size * 2).
Now the worst case becomes size - 1 and the best case is 0 when the size
is already a power of two.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 15:11:08 -07:00
Amery Hung
7c8ca532a7 selftests/bpf: Fix task_local_data data allocation size
Currently, when allocating memory for data, size of tld_data_u->start
is not taken into account. This may cause OOB access. Fixed it by adding
the non-flexible array part of tld_data_u.

Besides, explicitly align tld_data_u->data to 8 bytes in case some
fields are added before data in the future. It could break the
assumption that every data field is 8 byte aligned and
sizeof(tld_data_u) will no longer be equal to
offsetof(struct tld_data_u, data), which we use interchangeably.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 15:11:08 -07:00
Hoyeon Lee
9d77cefe8f selftests/bpf: Add test for raw-address single kprobe attach
Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.

This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com
2026-04-02 13:23:19 -07:00
Jakub Kicinski
8ffb33d770 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c833888 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4f ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e567 ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc7 ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02 ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v
2")
  323156c354 ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02 11:03:13 -07:00
Daniel Borkmann
e1b5687a86 selftests/bpf: Add more precision tracking tests for atomics
Add verifier precision tracking tests for BPF atomic fetch operations.
Validate that backtrack_insn correctly propagates precision from the
fetch dst_reg to the stack slot for {fetch_add,xchg,cmpxchg} atomics.
For the first two src_reg gets the old memory value, and for the last
one r0. The fetched register is used for pointer arithmetic to trigger
backtracking. Also add coverage for fetch_{or,and,xor} flavors which
exercises the bitwise atomic fetch variants going through the same
insn->imm & BPF_FETCH check but with different imm values.

Add dual-precision regression tests for fetch_add and cmpxchg where
both the fetched value and a reread of the same stack slot are tracked
for precision. After the atomic operation, the stack slot is STACK_MISC,
so the ldx does not set INSN_F_STACK_ACCESS. These tests verify that
stack precision propagates solely through the atomic fetch's load side.

Add map-based tests for fetch_add and cmpxchg which validate that non-
stack atomic fetch completes precision tracking without falling back
to mark_all_scalars_precise. Lastly, add 32-bit variants for {fetch_add,
cmpxchg} on map values to cover the second valid atomic operand size.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_precision
  [...]
  + /etc/rcS.d/S50-startup
  ./test_progs -t verifier_precision
  [    1.697105] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.700220] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.777043] tsc: Refined TSC clocksource calibration: 3407.986 MHz
  [    1.777619] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc6d7268, max_idle_ns: 440795260133 ns
  [    1.778658] clocksource: Switched to clocksource tsc
  #633/1   verifier_precision/bpf_neg:OK
  #633/2   verifier_precision/bpf_end_to_le:OK
  #633/3   verifier_precision/bpf_end_to_be:OK
  #633/4   verifier_precision/bpf_end_bswap:OK
  #633/5   verifier_precision/bpf_load_acquire:OK
  #633/6   verifier_precision/bpf_store_release:OK
  #633/7   verifier_precision/state_loop_first_last_equal:OK
  #633/8   verifier_precision/bpf_cond_op_r10:OK
  #633/9   verifier_precision/bpf_cond_op_not_r10:OK
  #633/10  verifier_precision/bpf_atomic_fetch_add_precision:OK
  #633/11  verifier_precision/bpf_atomic_xchg_precision:OK
  #633/12  verifier_precision/bpf_atomic_fetch_or_precision:OK
  #633/13  verifier_precision/bpf_atomic_fetch_and_precision:OK
  #633/14  verifier_precision/bpf_atomic_fetch_xor_precision:OK
  #633/15  verifier_precision/bpf_atomic_cmpxchg_precision:OK
  #633/16  verifier_precision/bpf_atomic_fetch_add_dual_precision:OK
  #633/17  verifier_precision/bpf_atomic_cmpxchg_dual_precision:OK
  #633/18  verifier_precision/bpf_atomic_fetch_add_map_precision:OK
  #633/19  verifier_precision/bpf_atomic_cmpxchg_map_precision:OK
  #633/20  verifier_precision/bpf_atomic_fetch_add_32bit_precision:OK
  #633/21  verifier_precision/bpf_atomic_cmpxchg_32bit_precision:OK
  #633/22  verifier_precision/bpf_neg_2:OK
  #633/23  verifier_precision/bpf_neg_3:OK
  #633/24  verifier_precision/bpf_neg_4:OK
  #633/25  verifier_precision/bpf_neg_5:OK
  #633     verifier_precision:OK
  Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-02 09:57:59 -07:00
Linus Torvalds
f8f5627a8a With fixes from wireless, bluetooth and Netfilter included we're back
to each PR carrying 30%+ more fixes than in previous era. The good
 news is that so far none of the "extra" fixes are themselves
 causing real regressions. Not sure how much comfort that is.
 
 Current release - fix to a fix:
 
  - netdevsim: fix build if SKB_EXTENSIONS=n
 
  - eth: stmmac: skip VLAN restore when VLAN hash ops are missing
 
 Previous releases - regressions:
 
  - wifi: iwlwifi: mvm: don't send a 6E related command when
    not supported
 
 Previous releases - always broken:
 
  - some info leak fixes
 
  - add missing clearing of skb->cb[] on ICMP paths from tunnels
 
  - ipv6: flowlabel: defer exclusive option free until RCU teardown
 
  - ipv6: avoid overflows in ip6_datagram_send_ctl()
 
  - mpls: add seqcount to protect platform_labels from OOB access
 
  - bridge: improve safety of parsing ND options
 
  - Bluetooth: fix leaks, overflows and races in hci_sync
 
  - netfilter: add more input validation, some to address bugs directly
    some to prevent exploits from cooking up broken configurations
 
  - wifi: ath: avoid poor performance due to stopping the wrong
    aggregation session
 
  - wifi: virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free
 
  - eth: fec: fix the PTP periodic output sysfs interface
 
  - eth: enetc: safely reinitialize TX BD ring when it has unsent frames
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmnOldMACgkQMUZtbf5S
 IrvaPQ/9EdZIY8AnvdgZmzVrMkTbbshpOy/lLxkpFE4yX1Hgw9BLSZqoC3rq2b41
 78Q6Zk7tbOHQb8rBLawi3+YuY+Eq5R4ajt4MNWWd1sYaaHnOXwp91jO4rvocSCjz
 8o8/Z3VU4znG+cK85mcuYqNZcar/0dI8m01136Dtoi0dtZ4KKdUBBDT/Zq7Ov3gJ
 pKrSMZBFT5UwnhlLi+xZ65KjdUMlbTujlQf0vH815p+iM+5E8fJNK5h+a6ZefXB4
 Un+jXxhD/Vj5TBwq8ZouDSAWVCAG26Yy9RGcn5O7w0mlzv48mWB1bIoXFEyc2F8s
 EbsiEqCNygHLoVTsBU1+0psYqey7aZDfceokzYMONHpJgpWbFmmHjfcFxfgeq9Of
 iI3DU7IQMBKdN7uC4dCKc94Ty9Jye+DvCnkeMUEwxV4Dkhnr+2wP0pGqo6r2K0sT
 9mFBh8YP2KyRd5+Ei8D4zmQrGpqpsXwSIwrhnGHEkWGjMAW+TltyOPzPzUgvMBHX
 XllZIAFpTFaZiR9ZZU8PRyUNRfh93AmV0tY4xYCqVArf85A/LjqmJCw6K6Pthcmw
 RzezpyQUCJ044EyDfDhjVgK/YEEkdT+wUcKKLw31pdOvQVAPJ4pI95pWbeVz4kLk
 30DE7PR+2hExm44GHUfG/v8MJTE2OkSRu26Ci4dQsm3sT2zvv2g=
 =3Pjk
 -----END PGP SIGNATURE-----

Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "With fixes from wireless, bluetooth and netfilter included we're back
  to each PR carrying 30%+ more fixes than in previous era.

  The good news is that so far none of the "extra" fixes are themselves
  causing real regressions. Not sure how much comfort that is.

  Current release - fix to a fix:

   - netdevsim: fix build if SKB_EXTENSIONS=n

   - eth: stmmac: skip VLAN restore when VLAN hash ops are missing

  Previous releases - regressions:

   - wifi: iwlwifi: mvm: don't send a 6E related command when
     not supported

  Previous releases - always broken:

   - some info leak fixes

   - add missing clearing of skb->cb[] on ICMP paths from tunnels

   - ipv6:
      - flowlabel: defer exclusive option free until RCU teardown
      - avoid overflows in ip6_datagram_send_ctl()

   - mpls: add seqcount to protect platform_labels from OOB access

   - bridge: improve safety of parsing ND options

   - bluetooth: fix leaks, overflows and races in hci_sync

   - netfilter: add more input validation, some to address bugs directly
     some to prevent exploits from cooking up broken configurations

   - wifi:
      - ath: avoid poor performance due to stopping the wrong
        aggregation session
      - virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free

   - eth:
      - fec: fix the PTP periodic output sysfs interface
      - enetc: safely reinitialize TX BD ring when it has unsent frames"

* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
  ipv6: avoid overflows in ip6_datagram_send_ctl()
  net: hsr: fix VLAN add unwind on slave errors
  net: hsr: serialize seq_blocks merge across nodes
  vsock: initialize child_ns_mode_locked in vsock_net_init()
  selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
  net/sched: cls_flow: fix NULL pointer dereference on shared blocks
  net/sched: cls_fw: fix NULL pointer dereference on shared blocks
  net/x25: Fix overflow when accumulating packets
  net/x25: Fix potential double free of skb
  bnxt_en: Restore default stat ctxs for ULP when resource is available
  bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
  bnxt_en: Refactor some basic ring setup and adjustment logic
  net/mlx5: Fix switchdev mode rollback in case of failure
  net/mlx5: Avoid "No data available" when FW version queries fail
  net/mlx5: lag: Check for LAG device before creating debugfs
  net: macb: properly unregister fixed rate clocks
  net: macb: fix clk handling on PCI glue driver removal
  virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
  net/sched: sch_netem: fix out-of-bounds access in packet corruption
  ...
2026-04-02 09:57:06 -07:00