Commit Graph

1781 Commits

Author SHA1 Message Date
Helge Deller
e6a650acbd parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set
The CONFIG_PA11 option can not be used as a reliable check if we build a
32-bit kernel which needs the 32-bit VDSO.
Instead depend on CONFIG_64BIT and CONFIG_COMPAT only.

Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-30 09:10:07 +02:00
Johan Hovold
b8425ceefe parisc: drivers: switch to dynamic root device
Driver core expects devices to be dynamically allocated and will, for
example, complain loudly if a device that lacks a release function
is ever freed.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

While at it, drop the redundant additional reference taken at init.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-28 17:18:52 +02:00
Helge Deller
3dce917902 parisc: Allow to build without VDSO32
When building for 64-bit and without CONFIG_COMPAT, leave out the
vdso32 binary.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-17 15:46:45 +02:00
Helge Deller
35493b28e7 parisc: Fix default stack size when COMPAT=n
The CONFIG_STACK_MAX_DEFAULT_SIZE_MB config option does not exist
when CONFIG_COMPAT is disabled. Use default 1 GB stack in this case.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-17 15:46:45 +02:00
Helge Deller
7dc9ee6e5e parisc: Fix signal code to depend on CONFIG_COMPAT instead of CONFIG_64BIT
The signal handler code used CONFIG_64BIT to decide if compat handling
code should be compiled in. Fix it to use CONFIG_COMPAT instead.
This allows to disable CONFIG_COMPAT even when running a 64-bit kernel.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-17 15:46:45 +02:00
Helge Deller
97bfda4520 parisc: Avoid compat syscalls when COMPAT=n
Drop unnecessary code and syscall tables when we run a 64-bit
kernel with conpat mode disabled.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-17 15:46:45 +02:00
Helge Deller
da3680f564 parisc: _llseek syscall is only available for 32-bit userspace
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2026-04-17 11:32:46 +02:00
Linus Torvalds
40286d6379 pci-v7.1-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmnfwfMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxwIRAAlN1h5er8aFDbjON5YMXBZqlQmzaC
 bjlUHgwm7HkdErTFozyuqhE8QUO1kCm4uMQzeyJdfY9nRWqMDOuKYxMD5j0exk+o
 4tbbJg6Xx4dq7Qrawy9PhxyQm/PDAcvs+FRRlGala+qq9o3fxPDOAZVDE/1C8qFQ
 Jd7GGd7NZn/NN4xrqST4RQHjO8fwaMwmksWCStsb79kfesQWP6kLADGfIMcWxNUB
 2s+oTnK6Hw0tkBv56n6i8mbb0EzS3/RN1daTevGAta1rmfUVVtWGRZ4paMvv0Owi
 Rl5+O5Jz6/c1qiXZbUqu5CRQPIy7Dr3JPvURcZX6qbsV8PzWXZr0Wi+geWefGOnp
 55y+3OT0vdBGAuXLJhrcU7Clzq9D/TZOt8oTI8IFArUfDlmrAIdozPn7gr+VGre5
 QuKymSk3XWtyIbe4o8UeZ4f9g0y6ZY1XvtvB7K1tze+OOmqlkfq966+z8aZuGOKx
 ZvAU/NIat5H02EgB4dEVOP8R5vPZlXGT0RLGl1JWRypPWyZDbVVA3z927qRQG5md
 IsVq8WaIrB1zyl9g37lZeEaYwP/qCIQsHkMGPYcP4wdOQEV9AQqi5pmjMXnWyQJD
 PR1nvmTKW7USRCJ+pz8xPhZh0cj3ENaddORTD3I/0CGVV0y452bU/5rr4T+K04bK
 PCJBpxTIDuWDwXc=
 =FFRz
 -----END PGP SIGNATURE-----

Merge tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Allow TLP Processing Hints to be enabled for RCiEPs (George Abraham
     P)

   - Enable AtomicOps only if we know the Root Port supports them (Gerd
     Bayer)

   - Don't enable AtomicOps for RCiEPs since none of them need Atomic
     Ops and we can't tell whether the Root Complex would support them
     (Gerd Bayer)

   - Leave Precision Time Measurement disabled until a driver enables it
     to avoid PCIe errors (Mika Westerberg)

   - Make pci_set_vga_state() fail if bridge doesn't support VGA
     routing, i.e., PCI_BRIDGE_CTL_VGA is not writable, and return
     errors to vga_get() callers including userspace via
     /dev/vga_arbiter (Simon Richter)

   - Validate max-link-speed from DT in j721e, brcmstb, mediatek-gen3,
     rzg3s drivers (where the actual controller constraints are known),
     and remove validation from the generic OF DT accessor (Hans Zhang)

   - Remove pc110pad driver (no longer useful after 486 CPU support
     removed) and no_pci_devices() (pc110pad was the last user) (Dmitry
     Torokhov, Heiner Kallweit)

  Resource management:

   - Prevent assigning space to unimplemented bridge windows; previously
     we mistakenly assumed prefetchable window existed and assigned
     space and put a BAR there (Ahmed Naseef)

   - Avoid shrinking bridge windows to fit in the initial Root Port
     window; fixes one problem with devices with large BARs connected
     via switches, e.g., Thunderbolt (Ilpo Järvinen)

   - Pass full extent of empty space, not just the aligned space, to
     resource_alignf callback so free space before the requested
     alignment can be used (Ilpo Järvinen)

   - Place small resources before larger ones for better utilization of
     address space (Ilpo Järvinen)

   - Fix alignment calculation for resource size larger than align,
     e.g., bridge windows larger than the 1MB required alignment (Ilpo
     Järvinen)

  Reset:

   - Update slot handling so all ARI functions are treated as being in
     the same slot. They're all reset by Secondary Bus Reset, but
     previously drivers of ARI functions that appeared to be on a
     non-zero device weren't notified and fatal hardware errors could
     result (Keith Busch)

   - Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug
     events (Keith Busch)

   - Hide Secondary Bus Reset ('bus') from sysfs reset_methods if masked
     by CXL because it has no effect (Vidya Sagar)

   - Avoid FLR for AMD NPU device, where it causes the device to hang
     (Lizhi Hou)

  Error handling:

   - Clear only error bits in PCIe Device Status to avoid accidentally
     clearing Emergency Power Reduction Detected (Shuai Xue)

   - Check for AER errors even in devices without drivers (Lukas Wunner)

   - Initialize ratelimit info so DPC and EDR paths log AER error
     information (Kuppuswamy Sathyanarayanan)

  Power control:

   - Add UPD720201/UPD720202 USB 3.0 xHCI Host Controller .compatible so
     generic pwrctrl driver can control it (Neil Armstrong)

  Hotplug:

   - Set LED_HW_PLUGGABLE for NPEM hotplug-capable ports so LED core
     doesn't complain when setting brightness fails because the endpoint
     is gone (Richard Cheng)

  Peer-to-peer DMA:

   - Allow wildcards in list of host bridges that support peer-to-peer
     DMA between hierarchy domains and add all Google SoCs (Jacob
     Moroni)

  Endpoint framework:

   - Advertise dynamic inbound mapping support in pci-epf-test and
     update host pci_endpoint_test to skip doorbell testing if not
     advertised by endpoint (Koichiro Den)

   - Return 0, not remaining timeout, when MHI eDMA ops complete so
     mhi_ep_ring_add_element() doesn't interpret non-zero as failure
     (Daniel Hodges)

   - Remove vntb and ntb duplicate resource teardown that leads to oops
     when .allow_link() fails or .drop_link() is called (Koichiro Den)

   - Disable vntb delayed work before clearing BAR mappings and
     doorbells to avoid oops caused by doing the work after resources
     have been torn down (Koichiro Den)

   - Add a way to describe reserved subregions within BARs, e.g.,
     platform-owned fixed register windows, and use it for the RK3588
     BAR4 DMA ctrl window (Koichiro Den)

   - Add BAR_DISABLED for BARs that will never be available to an EPF
     driver, and change some BAR_RESERVED annotations to BAR_DISABLED
     (Niklas Cassel)

   - Add NTB .get_dma_dev() callback for cases where DMA API requires a
     different device, e.g., vNTB devices (Koichiro Den)

   - Add reserved region types for MSI-X Table and PBA so Endpoint
     controllers can them as describe hardware-owned regions in a
     BAR_RESERVED BAR (Manikanta Maddireddy)

   - Make Tegra194/234 BAR0 programmable and remove 1MB size limit
     (Manikanta Maddireddy)

   - Expose Tegra BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED
     (Manikanta Maddireddy)

   - Add Tegra194 and Tegra234 device table entries to pci_endpoint_test
     (Manikanta Maddireddy)

   - Skip the BAR subrange selftest if there are not enough inbound
     window resources to run the test (Christian Bruel)

  New native PCIe controller drivers:

   - Add DT binding and driver for Andes QiLai SoC PCIe host controller
     (Randolph Lin)

   - Add DT binding and driver for ESWIN PCIe Root Complex (Senchuan
     Zhang)

  Baikal T-1 PCIe controller driver:

   - Remove driver since it never quite became usable (Andy Shevchenko)

  Cadence PCIe controller driver:

   - Implement byte/word config reads with dword (32-bit) reads because
     some Cadence controllers don't support sub-dword accesses (Aksh
     Garg)

  CIX Sky1 PCIe controller driver:

   - Add 'power-domains' to DT binding for SCMI power domain (Gary Yang)

  Freescale i.MX6 PCIe controller driver:

   - Add i.MX94 and i.MX943 to fsl,imx6q-pcie-ep DT binding (Richard
     Zhu)

   - Delay instead of polling for L2/L3 Ready after PME_Turn_off when
     suspending i.MX6SX because LTSSM registers are inaccessible
     (Richard Zhu)

   - Separate PERST# assertion (for resetting endpoints) from core reset
     (for resetting the RC itself) to prepare for new DTs with PERST#
     GPIO in per-Root Port nodes (Sherry Sun)

   - Retain Root Port MSI capability on i.MX7D, i.MX8MM, and i.MX8MQ so
     MSI from downstream devices will work (Richard Zhu)

   - Fix i.MX95 reference clock source selection when internal refclk is
     used (Franz Schnyder)

  Freescale Layerscape PCIe controller driver:

   - Allow building as a removable module (Sascha Hauer)

  MediaTek PCIe Gen3 controller driver:

   - Use dev_err_probe() to simplify error paths and make deferred probe
     messages visible in /sys/kernel/debug/devices_deferred (Chen-Yu
     Tsai)

   - Power off device if setup fails (Chen-Yu Tsai)

   - Integrate new pwrctrl API to enable power control for WiFi/BT
     adapters on mainboard or in PCIe or M.2 slots (Chen-Yu Tsai)

  NVIDIA Tegra194 PCIe controller driver:

   - Poll less aggressively and non-atomically for PME_TO_Ack during
     transition to L2 (Vidya Sagar)

   - Disable LTSSM after transition to Detect on surprise link down to
     stop toggling between Polling and Detect (Manikanta Maddireddy)

   - Don't force the device into the D0 state before L2 when suspending
     or shutting down the controller (Vidya Sagar)

   - Disable PERST# IRQ only in Endpoint mode because it's not
     registered in Root Port mode (Manikanta Maddireddy)

   - Handle 'nvidia,refclk-select' as optional (Vidya Sagar)

   - Disable direct speed change in Endpoint mode so link speed change
     is controlled by the host (Vidya Sagar)

   - Set LTR values before link up to avoid bogus LTR messages with 0
     latency (Vidya Sagar)

   - Allow system suspend when the Endpoint link is down (Vidya Sagar)

   - Use DWC IP core version, not Tegra custom values, to avoid DWC core
     version check warnings (Manikanta Maddireddy)

   - Apply ECRC workaround to devices based on DesignWare 5.00a as well
     as 4.90a (Manikanta Maddireddy)

   - Disable PM Substate L1.2 in Endpoint mode to work around Tegra234
     erratum (Vidya Sagar)

   - Delay post-PERST# cleanup until core is powered on to avoid CBB
     timeout (Manikanta Maddireddy)

   - Assert CLKREQ# so switches that forward it to their downstream side
     can bring up those links successfully (Vidya Sagar)

   - Calibrate pipe to UPHY for Endpoint mode to reset stale PLL state
     from any previous bad link state (Vidya Sagar)

   - Remove IRQF_ONESHOT flag from Endpoint interrupt registration so
     DMA driver and Endpoint controller driver can share the interrupt
     line (Vidya Sagar)

   - Enable DMA interrupt to support DMA in both Root Port and Endpoint
     modes (Vidya Sagar)

   - Enable hardware link retraining after link goes down in Endpoint
     mode (Vidya Sagar)

   - Add DT binding and driver support for core clock monitoring (Vidya
     Sagar)

  Qualcomm PCIe controller driver:

   - Advertise 'Hot-Plug Capable' and set 'No Command Completed Support'
     since Qcom Root Ports support hotplug events like DL_Up/Down and
     can accept writes to Slot Control without delays between writes
     (Krishna Chaitanya Chundru)

  Renesas R-Car PCIe controller driver:

   - Mark Endpoint BAR0 and BAR2 as Resizable (Koichiro Den)

   - Reduce EPC BAR alignment requirement to 4K (Koichiro Den)

  Renesas RZ/G3S PCIe controller driver:

   - Add RZ/G3E to DT binding and to driver (John Madieu)

   - Assert (not deassert) resets in probe error path (John Madieu)

   - Assert resets in suspend path in reverse order they were deasserted
     during probe (John Madieu)

   - Rework inbound window algorithm to prevent mapping more than
     intended region and enforce alignment on size, to prepare for
     RZ/G3E support (John Madieu)

  Rockchip DesignWare PCIe controller driver:

   - Add tracepoints for PCIe controller LTSSM transitions and link rate
     changes (Shawn Lin)

   - Trace LTSSM events collected by the dw-rockchip debug FIFO (Shawn
     Lin)

  SOPHGO PCIe controller driver:

   - Disable ASPM L0s and L1 on Sophgo 2042 PCIe Root Ports that
     advertise support for them (Yao Zi)

  Synopsys DesignWare PCIe controller driver:

   - Continue with system suspend even if an Endpoint doesn't respond
     with PME_TO_Ack message (Manivannan Sadhasivam)

   - Set Endpoint MSI-X Table Size in the correct function of a
     multi-function device when configuring MSI-X, not in Function 0
     (Aksh Garg)

   - Set Max Link Width and Max Link Speed for all functions of a
     multi-function device, not just Function 0 (Aksh Garg)

   - Expose PCIe event counters in groups 5-7 in debugfs (Hans Zhang)

  Miscellaneous:

   - Warn only once about invalid ACS kernel parameter format (Richard
     Cheng)

   - Suppress FW_BUG warning when writing sysfs 'numa_node' with the
     current value (Li RongQing)

   - Drop redundant 'depends on PCI' from Kconfig (Julian Braha)"

* tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (165 commits)
  PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge list
  PCI/P2PDMA: Allow wildcard Device IDs in host bridge list
  PCI: sg2042: Avoid L0s and L1 on Sophgo 2042 PCIe Root Ports
  PCI: cadence: Add flags for disabling ASPM capability for broken Root Ports
  PCI: tegra194: Add core monitor clock support
  dt-bindings: PCI: tegra194: Add monitor clock support
  PCI: tegra194: Enable hardware hot reset mode in Endpoint mode
  PCI: tegra194: Enable DMA interrupt
  PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration
  PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode
  PCI: tegra194: Assert CLKREQ# explicitly by default
  PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on
  PCI: tegra194: Disable L1.2 capability of Tegra234 EP
  PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well
  PCI: tegra194: Use DWC IP core version
  PCI: tegra194: Free up Endpoint resources during remove()
  PCI: tegra194: Allow system suspend when the Endpoint link is not up
  PCI: tegra194: Set LTR message request before PCIe link up in Endpoint mode
  PCI: tegra194: Disable direct speed change for Endpoint mode
  PCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select"
  ...
2026-04-15 14:41:21 -07:00
Linus Torvalds
334fbe734e mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       368
 Reviews/patch:       1.56
 Reviewed rate:       74%
 
 Excluding DAMON:
 
 Total patches:       316
 Reviews/patch:       1.77
 Reviewed rate:       81%
 
 Excluding DAMON and zram:
 
 Total patches:       306
 Reviews/patch:       1.81
 Reviewed rate:       82%
 
 Excluding DAMON, zram and maple_tree:
 
 Total patches:       276
 Reviews/patch:       2.01
 Reviewed rate:       91%
 
 Significant patch series in this merge:
 
 - The 30 patch series "maple_tree: Replace big node with maple copy"
   from Liam Howlett is mainly prepararatory work for ongoing development
   but it does reduce stack usage and is an improvement.
 
 - The 12 patch series "mm, swap: swap table phase III: remove swap_map"
   from Kairui Song offers memory savings by removing the static swap_map.
   It also yields some CPU savings and implements several cleanups.
 
 - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush
   Yadav adds file seal preservation to LUO's memfd code.
 
 - The 2 patch series "mm: zswap: add per-memcg stat for incompressible
   pages" from Jiayuan Chen adds additional userspace stats reportng to
   zswap.
 
 - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike
   Rapoport implements some cleanups for our handling of ZERO_PAGE() and
   zero_pfn.
 
 - The 2 patch series "mm/kmemleak: Improve scan_should_stop()
   implementation" from Zhongqiu Han provides an robustness improvement and
   some cleanups in the kmemleak code.
 
 - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang
   "improves the khugepaged scan logic and reduces CPU consumption by
   prioritizing scanning tasks that access memory frequently".
 
 - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies
   Kexec Handover by "transitioning KHO from an xarray-based metadata
   tracking system with serialization to a radix tree data structure that
   can be passed directly to the next kernel"
 
 - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan
   tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's
   tracepointing.
 
 - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper
   and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow
   stack code: remove per-arch code in favour of a generic implementation.
 
 - The 2 patch series "Fix KASAN support for KHO restored vmalloc
   regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO
   restores a vmalloc area.
 
 - The 4 patch series "mm: Remove stray references to pagevec" from Tal
   Zussman provides several cleanups, mainly udpating references to "struct
   pagevec", which became folio_batch three years ago.
 
 - The 17 patch series "mm: Eliminate fake head pages from vmemmap
   optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap
   optimization (HVO) by changing how tail pages encode their relationship
   to the head page.
 
 - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for
   core layer filters" from SeongJae Park improves two problematic
   behaviors of DAMOS that makes it less efficient when core layer filters
   are used.
 
 - The 3 patch series "mm/damon: strictly respect min_nr_regions" from
   SeongJae Park improves DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter.
 
 - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil
   Babka is a proper fix for a previously hotfixed SMP=n issue.  Code
   simplifications and cleanups ennsed.
 
 - The 16 patch series "mm: cleanups around unmapping / zapping" from
   David Hildenbrand implements "a bunch of cleanups around unmapping and
   zapping.  Mostly simplifications, code movements, documentation and
   renaming of zapping functions".
 
 - The 6 patch series "support batched checking of the young flag for
   MGLRU" from Baolin Wang supports batched checking of the young flag for
   MGLRU.  It's part cleanups; one benchmark shows large performance
   benefits for arm64.
 
 - The 5 patch series "memcg: obj stock and slab stat caching cleanups"
   from Johannes Weiner provides memcg cleanup and robustness improvements.
 
 - The 5 patch series "Allow order zero pages in page reporting" from
   Yuvraj Sakshith enhances page_reporting's free page reporting - it is
   presently and undesirably order-0 pages when reporting free memory.
 
 - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is
   cleanup work following from the recent conversion of the VMA flags to a
   bitmap.
 
 - The 10 patch series "mm/damon: add optional debugging-purpose sanity
   checks" from SeongJae Park adds some more developer-facing debug checks
   into DAMON core.
 
 - The 2 patch series "mm/damon: test and document power-of-2
   min_region_sz requirement" from SeongJae Park adds an additional DAMON
   kunit test and makes some adjustments to the addr_unit parameter
   handling.
 
 - The 3 patch series "mm/damon/core: make passed_sample_intervals
   comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time
   overflow issue in DAMON core.
 
 - The 7 patch series "mm/damon: improve/fixup/update ratio calculation,
   test and documentation" from SeongJae Park is a "batch of misc/minor
   improvements and fixups" for DAMON.
 
 - The 4 patch series "mm: move vma_(kernel|mmu)_pagesize() out of
   hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device
   when CONFIG_HUGETLB=n.  Some code movement was required.
 
 - The 6 patch series "zram: recompression cleanups and tweaks" from
   Sergey Senozhatsky provides "a somewhat random mix of fixups,
   recompression cleanups and improvements" in the zram code.
 
 - The 11 patch series "mm/damon: support multiple goal-based quota
   tuning algorithms" from SeongJae Park extend DAMOS quotas goal
   auto-tuning to support multiple tuning algorithms that users can select.
 
 - The 4 patch series "mm: thp: reduce unnecessary
   start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs
   handling so we no longer spam the logs with reams of junk when
   starting/stopping khugepaged.
 
 - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes
   provides some cleanups and slight fixes in the mremap, mmap and vma
   code.
 
 - The 5 patch series "mm/damon: support addr_unit on default monitoring
   targets for modules" from SeongJae Park extends the use of DAMON core's
   addr_unit tunable.
 
 - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites"
   from Nico Pache provides cleanups in the khugepaged and is a base for
   Nico's planned khugepaged mTHP support.
 
 - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups"
   from David Hildenbrand implements code movement and cleanups in the
   memhotplug and sparsemem code.
 
 - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and
   cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some
   memhotplug Kconfig support.
 
 - The 6 patch series "change young flag check functions to return bool"
   from Baolin Wang is "a cleanup patchset to change all young flag check
   functions to return bool".
 
 - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL
   dereference issues" from Josh Law and SeongJae Park fixes a few
   potential DAMON bugs.
 
 - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma
   code" from "converts a lot of the existing use of the legacy vm_flags_t
   data type to the new vma_flags_t type which replaces it".  Mainly in the
   vma code.
 
 - The 21 patch series "mm: expand mmap_prepare functionality and usage"
   from Lorenzo Stoakes "expands the mmap_prepare functionality, which is
   intended to replace the deprecated f_op->mmap hook which has been the
   source of bugs and security issues for some time".  Cleanups,
   documentation, extension of mmap_prepare into filesystem drivers.
 
 - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from
   Lorenzo Stoakes simplifies and cleans up zap_huge_pmd().  Additional
   cleanups around vm_normal_folio_pmd() and the softleaf functionality are
   performed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA
 jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi
 GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ=
 =1Q7o
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly prepararatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   File seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
   Chen)

   Additional userspace stats reportng to zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
   Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted the KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly udpating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that makes it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it is presently and undesirably order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   An additional DAMON kunit test and makes some adjustments to the
   addr_unit parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugpaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged and is a base for Nico's planned khugepaged
   mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
2026-04-15 12:59:16 -07:00
Linus Torvalds
c1fe867b5b Updates for the timer/timekeeping core:
- A rework of the hrtimer subsystem to reduce the overhead for frequently
     armed timers, especially the hrtick scheduler timer.
 
       - Better timer locality decision
 
       - Simplification of the evaluation of the first expiry time by
         keeping track of the neighbor timers in the RB-tree by providing a
         RB-tree variant with neighbor links. That avoids walking the
         RB-tree on removal to find the next expiry time, but even more
         important allows to quickly evaluate whether a timer which is
         rearmed changes the position in the RB-tree with the modified
         expiry time or not. If not, the dequeue/enqueue sequence which both
         can end up in rebalancing can be completely avoided.
 
       - Deferred reprogramming of the underlying clock event device. This
         optimizes for the situation where a hrtimer callback sets the need
         resched bit. In that case the code attempts to defer the
         re-programming of the clock event device up to the point where the
         scheduler has picked the next task and has the next hrtick timer
         armed. In case that there is no immediate reschedule or soft
         interrupts have to be handled before reaching the reschedule point
         in the interrupt entry code the clock event is reprogrammed in one
         of those code paths to prevent that the timer becomes stale.
 
       - Support for clocksource coupled clockevents
 
       	The TSC deadline timer is coupled to the TSC. The next event is
       	programmed in TSC time. Currently this is done by converting the
       	CLOCK_MONOTONIC based expiry value into a relative timeout,
       	converting it into TSC ticks, reading the TSC adding the delta
       	ticks and writing the deadline MSR.
 
 	As the timekeeping core has the conversion factors for the TSC
 	already, the whole back and forth conversion can be completely
 	avoided. The timekeeping core calculates the reverse conversion
 	factors from nanoseconds to TSC ticks and utilizes the base
 	timestamps of TSC and CLOCK_MONOTONIC which are updated once per
 	tick. This allows a direct conversion into the TSC deadline value
 	without reading the time and as a bonus keeps the deadline
 	conversion in sync with the TSC conversion factors, which are
 	updated by adjtimex() on systems with NTP/PTP enabled.
 
      - Allow inlining of the clocksource read and clockevent write
        functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR.
 
     With all those enhancements in place a hrtick enabled scheduler
     provides the same performance as without hrtick. But also other hrtimer
     users obviously benefit from these optimizations.
 
   - Robustness improvements and cleanups of historical sins in the hrtimer
     and timekeeping code.
 
   - Rewrite of the clocksource watchdog.
 
     The clocksource watchdog code has over time reached the state of an
     impenetrable maze of duct tape and staples. The original design, which was
     made in the context of systems far smaller than today, is based on the
     assumption that the to be monitored clocksource (TSC) can be trivially
     compared against a known to be stable clocksource (HPET/ACPI-PM timer).
 
     Over the years this rather naive approach turned out to have major
     flaws. Long delays between the watchdog invocations can cause wrap
     arounds of the reference clocksource. The access to the reference
     clocksource degrades on large multi-sockets systems dure to
     interconnect congestion. This has been addressed with various
     heuristics which degraded the accuracy of the watchdog to the point
     that it fails to detect actual TSC problems on older hardware which
     exposes slow inter CPU drifts due to firmware manipulating the TSC to
     hide SMI time.
 
     The rewrite addresses this by:
 
       - Restricting the validation against the reference clocksource to the
         boot CPU which is usually closest to the legacy block which
         contains the reference clocksource (HPET/ACPI-PM).
 
       - Do a round robin validation betwen the boot CPU and the other CPUs
         based only on the TSC with an algorithm similar to the TSC
         synchronization code during CPU hotplug.
 
       - Being more leniant versus remote timeouts
 
   - The usual tiny fixes, cleanups and enhancements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnbxdMQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoeq6EAC4h9wuBr5yCkxmog1Bhlk9cnK0oX1THb7V
 Q4z6DrYAiDXP6z4IDwqSW+3vvmNw1QXOeqpyMTiiIcQ5mNSs1IDnCt5HOEwY+ICm
 fiSUMYkXkH6xdFWspYWFkD7aExHJRT3hd/bo+WnXGHhHclPj5NHZssLMIDboHrzX
 jLV1hljmthfwg/uOXDGmQUPRFjqr2ZQjo7zGA5SwfVg8Krz7g/qRVy2wUns9TdW/
 NYwihDm1YV7qkK/+f1GnMdd70toqb1OZo/fS9FPbBrPLdyi8V+UbnFSUeZu8kCwV
 KubAzjLZR4xYCnrlaHhoi208GMd0TOvHMOrdAA0zkQHfhmszGl4N0pbF/EI29Ft2
 tQG/FUTG+nzgNOrMCPN2nr3u/UOXLP+gO+2hkyyQjqUP35IyaTYQn10JgBmPTdJL
 Ab6E8WL9gTMCd+t/bVjdU/B8W9ruMihKBtWkTfMBCcQ9uNJFCEGzrcMF8hzFYRTs
 /4rMDr3NlGYydAnbKPj6bkC5gtjBvh/L08kOdUFyXCMSqIzvJkZJ4241ogl1Awi6
 VfdwjF5KZCQo3M1ujpep+1L010wC/yulqLt2brQMO9Nt05dRhgwM3lxy7cnlMNm3
 NdfMgi+OG0CzQ+ZUpvo20hCgTDUVgWN9g5R3rar8FJX+Ym3T+ZoEKlShZF+fSRjf
 YAUIbUyi7A==
 =2qc8
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - A rework of the hrtimer subsystem to reduce the overhead for
   frequently armed timers, especially the hrtick scheduler timer:

     - Better timer locality decision

     - Simplification of the evaluation of the first expiry time by
       keeping track of the neighbor timers in the RB-tree by providing
       a RB-tree variant with neighbor links. That avoids walking the
       RB-tree on removal to find the next expiry time, but even more
       important allows to quickly evaluate whether a timer which is
       rearmed changes the position in the RB-tree with the modified
       expiry time or not. If not, the dequeue/enqueue sequence which
       both can end up in rebalancing can be completely avoided.

     - Deferred reprogramming of the underlying clock event device. This
       optimizes for the situation where a hrtimer callback sets the
       need resched bit. In that case the code attempts to defer the
       re-programming of the clock event device up to the point where
       the scheduler has picked the next task and has the next hrtick
       timer armed. In case that there is no immediate reschedule or
       soft interrupts have to be handled before reaching the reschedule
       point in the interrupt entry code the clock event is reprogrammed
       in one of those code paths to prevent that the timer becomes
       stale.

     - Support for clocksource coupled clockevents

       The TSC deadline timer is coupled to the TSC. The next event is
       programmed in TSC time. Currently this is done by converting the
       CLOCK_MONOTONIC based expiry value into a relative timeout,
       converting it into TSC ticks, reading the TSC adding the delta
       ticks and writing the deadline MSR.

       As the timekeeping core has the conversion factors for the TSC
       already, the whole back and forth conversion can be completely
       avoided. The timekeeping core calculates the reverse conversion
       factors from nanoseconds to TSC ticks and utilizes the base
       timestamps of TSC and CLOCK_MONOTONIC which are updated once per
       tick. This allows a direct conversion into the TSC deadline value
       without reading the time and as a bonus keeps the deadline
       conversion in sync with the TSC conversion factors, which are
       updated by adjtimex() on systems with NTP/PTP enabled.

     - Allow inlining of the clocksource read and clockevent write
       functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR.

   With all those enhancements in place a hrtick enabled scheduler
   provides the same performance as without hrtick. But also other
   hrtimer users obviously benefit from these optimizations.

 - Robustness improvements and cleanups of historical sins in the
   hrtimer and timekeeping code.

 - Rewrite of the clocksource watchdog.

   The clocksource watchdog code has over time reached the state of an
   impenetrable maze of duct tape and staples. The original design,
   which was made in the context of systems far smaller than today, is
   based on the assumption that the to be monitored clocksource (TSC)
   can be trivially compared against a known to be stable clocksource
   (HPET/ACPI-PM timer).

   Over the years this rather naive approach turned out to have major
   flaws. Long delays between the watchdog invocations can cause wrap
   arounds of the reference clocksource. The access to the reference
   clocksource degrades on large multi-sockets systems dure to
   interconnect congestion. This has been addressed with various
   heuristics which degraded the accuracy of the watchdog to the point
   that it fails to detect actual TSC problems on older hardware which
   exposes slow inter CPU drifts due to firmware manipulating the TSC to
   hide SMI time.

   The rewrite addresses this by:

     - Restricting the validation against the reference clocksource to
       the boot CPU which is usually closest to the legacy block which
       contains the reference clocksource (HPET/ACPI-PM).

     - Do a round robin validation betwen the boot CPU and the other
       CPUs based only on the TSC with an algorithm similar to the TSC
       synchronization code during CPU hotplug.

     - Being more leniant versus remote timeouts

 - The usual tiny fixes, cleanups and enhancements all over the place

* tag 'timers-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  alarmtimer: Access timerqueue node under lock in suspend
  hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check
  posix-timers: Fix stale function name in comment
  timers: Get this_cpu once while clearing the idle state
  clocksource: Rewrite watchdog code completely
  clocksource: Don't use non-continuous clocksources as watchdog
  x86/tsc: Handle CLOCK_SOURCE_VALID_FOR_HRES correctly
  MIPS: Don't select CLOCKSOURCE_WATCHDOG
  parisc: Remove unused clocksource flags
  hrtimer: Add a helper to retrieve a hrtimer from its timerqueue node
  hrtimer: Remove trailing comma after HRTIMER_MAX_CLOCK_BASES
  hrtimer: Mark index and clockid of clock base as const
  hrtimer: Drop unnecessary pointer indirection in hrtimer_expire_entry event
  hrtimer: Drop spurious space in 'enum hrtimer_base_type'
  hrtimer: Don't zero-initialize ret in hrtimer_nanosleep()
  hrtimer: Remove hrtimer_get_expires_ns()
  timekeeping: Mark offsets array as const
  timekeeping/auxclock: Consistently use raw timekeeper for tk_setup_internals()
  timer_list: Print offset as signed integer
  tracing: Use explicit array size instead of sentinel elements in symbol printing
  ...
2026-04-14 10:27:07 -07:00
Linus Torvalds
ef3da345cc vfs-7.1-rc1.misc
Please consider pulling these changes from the signed vfs-7.1-rc1.misc tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc
 ohhBAQCAmQMlMRAXAgUZFYMTZpeQlcujP5rv+/vT2Tf/xS76YwD/dRDaw1FH294+
 qtk/Z1NjleNixzE2sld1K9J32NxeyAc=
 =+g9q
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:
   - coredump: add tracepoint for coredump events
   - fs: hide file and bfile caches behind runtime const machinery

  Fixes:
   - fix architecture-specific compat_ftruncate64 implementations
   - dcache: Limit the minimal number of bucket to two
   - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START
   - fs/mbcache: cancel shrink work before destroying the cache
   - dcache: permit dynamic_dname()s up to NAME_MAX

  Cleanups:
   - remove or unexport unused fs_context infrastructure
   - trivial ->setattr cleanups
   - selftests/filesystems: Assume that TIOCGPTPEER is defined
   - writeback: fix kernel-doc function name mismatch for wb_put_many()
   - autofs: replace manual symlink buffer allocation in autofs_dir_symlink
   - init/initramfs.c: trivial fix: FSM -> Finite-state machine
   - fs: remove stale and duplicate forward declarations
   - readdir: Introduce dirent_size()
   - fs: Replace user_access_{begin/end} by scoped user access
   - kernel: acct: fix duplicate word in comment
   - fs: write a better comment in step_into() concerning .mnt assignment
   - fs: attr: fix comment formatting and spelling issues"

* tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  dcache: permit dynamic_dname()s up to NAME_MAX
  fs: attr: fix comment formatting and spelling issues
  fs: hide file and bfile caches behind runtime const machinery
  fs: write a better comment in step_into() concerning .mnt assignment
  proc: rename proc_notify_change to proc_setattr
  proc: rename proc_setattr to proc_nochmod_setattr
  affs: rename affs_notify_change to affs_setattr
  adfs: rename adfs_notify_change to adfs_setattr
  hfs: update comments on hfs_inode_setattr
  kernel: acct: fix duplicate word in comment
  fs: Replace user_access_{begin/end} by scoped user access
  readdir: Introduce dirent_size()
  coredump: add tracepoint for coredump events
  fs: remove do_sys_truncate
  fs: pass on FTRUNCATE_* flags to do_truncate
  fs: fix archiecture-specific compat_ftruncate64
  fs: remove stale and duplicate forward declarations
  init/initramfs.c: trivial fix: FSM -> Finite-state machine
  autofs: replace manual symlink buffer allocation in autofs_dir_symlink
  fs/mbcache: cancel shrink work before destroying the cache
  ...
2026-04-13 14:20:11 -07:00
Thomas Gleixner
ff1c0c5d07 Merge branch 'timers/urgent' into timers/core
to resolve the conflict with urgent fixes.
2026-04-11 07:58:33 +02:00
Baolin Wang
06c4dfa3ce mm: change to return bool for ptep_clear_flush_young()/clear_flush_young_ptes()
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to
clear the young flag and flush the TLB, returning whether the young flag
was set.  Change the return type to bool to make the intention clearer.

Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:35 -07:00
Ilpo Järvinen
9036bd0efc PCI: Align head space better
When a bridge window contains big and small resource(s), the small
resource(s) may not amount to the half of the size of the big resource
which would allow calculate_head_align() to shrink the head alignment.
This results in always placing the small resource(s) after the big
resource.

In general, it would be good to be able to place the small resource(s)
before the big resource to achieve better utilization of the address space.
In the cases where the large resource can only fit at the end of the
window, it is even required.

However, carrying the information over from pbus_size_mem() and
calculate_head_align() to __pci_assign_resource() and
pcibios_align_resource() is not easy with the current data structures.

A somewhat hacky way to move the non-aligning tail part to the head is
possible within pcibios_align_resource(). The free space between the start
of the free space span and the aligned start address can be compared with
the non-aligning remainder of the size. If the free space is larger than
the remainder, placing the remainder before the start address is possible.
This relocation should generally work, because PCI resources consist only
power-of-2 atoms.

Various arch requirements may still need to override the relocation, so the
relocation is only applied selectively in such cases.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-10-ilpo.jarvinen@linux.intel.com
2026-03-27 10:19:08 -05:00
Ilpo Järvinen
38ec53e16f parisc/PCI: Clean up align handling
Caller of pcibios_align_resource() (__find_resource_space()) already aligns
the start address by 'alignment' so aligning is only necessary if align >
alignment.

Change also to use ALIGN() instead of open-coding.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260324165633.4583-8-ilpo.jarvinen@linux.intel.com
2026-03-27 10:19:08 -05:00
Ilpo Järvinen
f699bcc8bc resource: Pass full extent of empty space to resource_alignf callback
__find_resource_space() calculates the full extent of empty space but only
passes the aligned space to resource_alignf callback. In some situations,
the callback may choose take advantage of the free space before the
requested alignment.

Pass the full extent of the calculated empty space to resource_alignf
callback as an additional parameter.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-3-ilpo.jarvinen@linux.intel.com
2026-03-27 10:18:39 -05:00
Christoph Hellwig
e43dce8a0b
fs: fix archiecture-specific compat_ftruncate64
The "small" argument to do_sys_ftruncate indicates if > 32-bit size
should be reject, but all the arch-specific compat ftruncate64
implementations get this wrong.  Merge do_sys_ftruncate and
ksys_ftruncate, replace the integer as boolean small flag with a
descriptive one about LFS semantics, and use it correctly in the
architecture-specific ftruncate64 implementations.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 3dd681d944 ("arm64: 32-bit (compat) applications support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260323070205.2939118-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-23 12:41:57 +01:00
Ingo Molnar
f6472b1793 Linux 7.0-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmm3G/UeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZJUH/R0vQ3Vha48QDEic
 1NREwaHxAoTFi0i3y7OPPklqrP2V09D1qg4Q6fExYQVTQgV6F2DRjVbyPKrmr4ay
 BA6aHrUdnFngYHpDlI1b1r7rJiAIN4WFHl7StO70bS+EB+UPsP9cfP3CKXUfKfqT
 kyHXzUrd5QnjYmlb9rQw1E6rzsRamNtGUtZf7TwDidJYjtm3sPeDHUkjyRy4xkYd
 UouIu6W7UXoicl38bJAgaWBY5BiYtjN6ktnY4/gcqDeqYd7mTM3Eb1B+OSXgFfip
 F0OYfJhfWn+63WnPA+1I5jXWC1UrdVXTMK/NTYjhmGlfdmkLcWDlNGtu+qKZbpwj
 fmF3Kyo=
 =6nX1
 -----END PGP SIGNATURE-----

Merge tag 'v7.0-rc4' into timers/core, to resolve conflict

Resolve conflict between this change in the upstream kernel:

  4c652a4772 ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline")

... and this pending change in timers/core:

  0e98eb1481 ("entry: Prepare for deferred hrtimer rearming")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-21 08:02:36 +01:00
Helge Deller
2c98a8fbd6 parisc: Flush correct cache in cacheflush() syscall
The assembly flush instructions were swapped for I- and D-cache flags:

SYSCALL_DEFINE3(cacheflush, ...)
{
	if (cache & DCACHE) {
			"fic ...\n"
	}
	if (cache & ICACHE && error == 0) {
			"fdc ...\n"
	}

Fix it by using fdc for DCACHE, and fic for ICACHE flushing.

Reported-by: Felix Lechner <felix.lechner@lease-up.com>
Fixes: c6d96328fe ("parisc: Add cacheflush() syscall")
Cc: <stable@vger.kernel.org> # v6.5+
Signed-off-by: Helge Deller <deller@gmx.de>
2026-03-15 09:28:49 +01:00
Thomas Gleixner
2a14f7c5ee parisc: Remove unused clocksource flags
PARISC does not enable the clocksource watchdog, so the VERIFY flags are
pointless as they are not evaluated. Remove them from the clocksource.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>
Link: https://patch.msgid.link/20260123231521.655892451@kernel.org
2026-03-12 12:23:26 +01:00
Linus Torvalds
6deccafcb4 parisc architecture fixes for kernel v7.0-rc3:
Three initial kernel mapping fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaayE4AAKCRD3ErUQojoP
 X4U4AQDtHPc9nlM3areu5yTQnOcPTExuEoIpvBm9ktwNCdrwCgEAt4tqv3hhxCvG
 /lwb6XBCHfyw3d/AsTRbOIH1MGCnaQQ=
 =itGt
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "While testing Sasha Levin's 'kallsyms: embed source file:line info in
  kernel stack traces' patch series, which increases the typical kernel
  image size, I found some issues with the parisc initial kernel mapping
  which may prevent the kernel to boot.

  The three small patches here fix this"

* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix initial page table creation for boot
  parisc: Check kernel mapping earlier at bootup
  parisc: Increase initial mapping to 64 MB with KALLSYMS
2026-03-07 12:38:16 -08:00
Helge Deller
8475d8fe21 parisc: Fix initial page table creation for boot
The KERNEL_INITIAL_ORDER value defines the initial size (usually 32 or
64 MB) of the page table during bootup. Up until now the whole area was
initialized with PTE entries, but there was no check if we filled too
many entries.  Change the code to fill up with so many entries that the
"_end" symbol can be reached by the kernel, but not more entries than
actually fit into the initial PTE tables.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.0+
2026-03-06 11:33:13 +01:00
Helge Deller
17c144f110 parisc: Check kernel mapping earlier at bootup
The check if the initial mapping is sufficient needs to happen much
earlier during bootup. Move this test directly to the start_parisc()
function and use native PDC iodc functions to print the warning, because
panic() and printk() are not functional yet.

This fixes boot when enabling various KALLSYSMS options which need
much more space.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.0+
2026-03-06 11:33:13 +01:00
Nathan Chancellor
8678591b47
kbuild: Split .modinfo out from ELF_DETAILS
Commit 3e86e4d74c ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it
from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and
ELF_DETAILS was present in all architecture specific vmlinux linker
scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and
COMMON_DISCARDS may be used by other linker scripts, such as the s390
and x86 compressed boot images, which may not expect to have a .modinfo
section. In certain circumstances, this could result in a bootloader
failing to load the compressed kernel [1].

Commit ddc6cbef3e ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with
SecureBoot trailer") recently addressed this for the s390 bzImage but
the same bug remains for arm, parisc, and x86. The presence of .modinfo
in the x86 bzImage was the root cause of the issue worked around with
commit d50f210913 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes
lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its
MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition.

Split .modinfo out from ELF_DETAILS into its own macro and handle it in
all vmlinux linker scripts. Discard .modinfo in the places where it was
previously being discarded from being in COMMON_DISCARDS, as it has
never been necessary in those uses.

Cc: stable@vger.kernel.org
Fixes: 3e86e4d74c ("kbuild: keep .modinfo section in vmlinux.unstripped")
Reported-by: Ed W <lists@wildgooses.com>
Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1]
Tested-by: Ed W <lists@wildgooses.com> # x86_64
Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-26 11:50:19 -07:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
8ad8d24d96 parisc architecture fixes and updates for kernel v7.0-rc1:
- Fix device reference leak in error path
 - Check if system provides a 64-bit free running platform counter
 - Minor fixes in debug code
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaYuPEgAKCRD3ErUQojoP
 X3z9AQCfvU14/vkMab/iXt7mn7yf1CZbvo7cczAMA8XTRvLNUQD7BL+6tHqSxwCh
 rZax1BQR8TKArZYWkdOADtldog0plQ4=
 =NhTL
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:

 - Fix device reference leak in error path

 - Check if system provides a 64-bit free running platform counter

 - Minor fixes in debug code

* tag 'parisc-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: lba_pci: Add debug code to show IO and PA ranges
  parisc: Detect 64-bit free running platform counter
  parisc: Fix minor printk issues in iosapic debug code
  parisc: Enhance debug code for PAT firmware
  parisc: Add PDC PAT call to get free running 64-bit counter
  parisc: Fix module path output in qemu tables
  parisc: Export model name for MPE/ix
  parisc: Prevent interrupts during reboot
  parisc: Print hardware IDs as 4 digit hex strings
  parisc: kernel: replace kfree() with put_device() in create_tree_node()
2026-02-10 21:42:10 -08:00
Helge Deller
62c544bc10 parisc: Detect 64-bit free running platform counter
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:19 +01:00
Helge Deller
db7e826030 parisc: Enhance debug code for PAT firmware
Enhance output when running PAT firmware, which is helpful when
developing the QEMU emulator. The added code is behind an ifdef,
so no performance overhead for normal kernels.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:19 +01:00
Helge Deller
0b3b90a0f9 parisc: Add PDC PAT call to get free running 64-bit counter
PDC PAT defines this optional function. Testing on my C8000 workstation
and a rp3440 server did not indicate that they provide such counter.
Nevertheless, add the function since we should try to use such a
counter if it's available. In Qemu it should be simple to add it.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:19 +01:00
Helge Deller
5ff7842103 parisc: Fix module path output in qemu tables
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:19 +01:00
Helge Deller
97cb9150ff parisc: Export model name for MPE/ix
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:18 +01:00
Helge Deller
35ac5a728c parisc: Prevent interrupts during reboot
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:18 +01:00
Helge Deller
ba74652c30 parisc: Print hardware IDs as 4 digit hex strings
The hardware IDs are 32-bit unsigned integers, so print them as 4-digit
hex numbers. Additionally fix some whitespace glitches.

Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:18 +01:00
Haoxiang Li
dcf69599c4 parisc: kernel: replace kfree() with put_device() in create_tree_node()
If device_register() fails, put_device() is the correct way to
drop the device reference.

Found by code review.

Fixes: 1070c9655b ("[PA-RISC] Fix must_check warnings in drivers.c")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-07 00:45:18 +01:00
Thomas Gleixner
99d2592023 rseq: Implement sys_rseq_slice_yield()
Provide a new syscall which has the only purpose to yield the CPU after the
kernel granted a time slice extension.

sched_yield() is not suitable for that because it unconditionally
schedules, but the end of the time slice extension is not required to
schedule when the task was already preempted. This also allows to have a
strict check for termination to catch user space invoking random syscalls
including sched_yield() from a time slice extension region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de
2026-01-22 11:11:17 +01:00
Linus Torvalds
50471f8b73 parisc architecture fixes and updates for kernel v6.19-rc1:
- Fix boot on 710 workstation by not reprogramming ASP chip
 - Fix 64bit userspace syscalls (64-bit userspace is still being
   developed)
 - minor code cleanups in asm/bug.h and perf_regs.c
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaTSEZgAKCRD3ErUQojoP
 XyssAQDdM4gi+BUSGPx9OSnylp6H5OY1FHjIHqYWziT1fTVaRQD/RQM5phzXj25+
 NAmG2EEVfS9mj6fA39QqZ8yw9MOGmgs=
 =8ZMr
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc architecture updates from Helge Deller:
 "A fix which allows booting on the very old 710 workstations, and two
  fixes in the syscall entry/exit path which allow to execute 64-bit
  userspace binaries.

  Note that although we currently have a 64-bit (static) kernel to allow
  more than 4 GB physical RAM, there is no support for 64-bit userspace
  for parisc-linux yet, but Dave and Sven are making slowly progress to
  port and fix glibc and gcc.

  Summary:

   - Fix boot on 710 workstation by not reprogramming ASP chip

   - Fix 64bit userspace syscalls (64-bit userspace is still being
     developed)

   - minor code cleanups in asm/bug.h and perf_regs.c"

* tag 'parisc-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Do not reprogram affinitiy on ASP chip
  parisc: Drop linux/kernel.h include from asm/bug.h header
  parisc: remove unneeded semicolon in perf_regs.c
  parisc: entry.S: fix space adjustment on interruption for 64-bit userspace
  parisc: entry: set W bit for !compat tasks in syscall_restore_rfi()
  parisc: Drop padding fields and layers entries from inventory log
2025-12-06 16:24:52 -08:00
Linus Torvalds
415d34b92c namespace-6.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
 oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
 =TExi
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains substantial namespace infrastructure changes including a new
  system call, active reference counting, and extensive header cleanups.
  The branch depends on the shared kbuild branch for -fms-extensions support.

  Features:

   - listns() system call

     Add a new listns() system call that allows userspace to iterate
     through namespaces in the system. This provides a programmatic
     interface to discover and inspect namespaces, addressing
     longstanding limitations:

     Currently, there is no direct way for userspace to enumerate
     namespaces. Applications must resort to scanning /proc/*/ns/ across
     all processes, which is:
      - Inefficient - requires iterating over all processes
      - Incomplete - misses namespaces not attached to any running
        process but kept alive by file descriptors, bind mounts, or
        parent references
      - Permission-heavy - requires access to /proc for many processes
      - No ordering or ownership information
      - No filtering per namespace type

     The listns() system call solves these problems:

       ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                      size_t nr_ns_ids, unsigned int flags);

       struct ns_id_req {
             __u32 size;
             __u32 spare;
             __u64 ns_id;
             struct /* listns */ {
                     __u32 ns_type;
                     __u32 spare2;
                     __u64 user_ns_id;
             };
       };

     Features include:
      - Pagination support for large namespace sets
      - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
      - Filtering by owning user namespace
      - Permission checks respecting namespace isolation

   - Active Reference Counting

     Introduce an active reference count that tracks namespace
     visibility to userspace. A namespace is visible in the following
     cases:
      - The namespace is in use by a task
      - The namespace is persisted through a VFS object (namespace file
        descriptor or bind-mount)
      - The namespace is a hierarchical type and is the parent of child
        namespaces

     The active reference count does not regulate lifetime (that's still
     done by the normal reference count) - it only regulates visibility
     to namespace file handles and listns().

     This prevents resurrection of namespaces that are pinned only for
     internal kernel reasons (e.g., user namespaces held by
     file->f_cred, lazy TLB references on idle CPUs, etc.) which should
     not be accessible via (1)-(3).

   - Unified Namespace Tree

     Introduce a unified tree structure for all namespaces with:
      - Fixed IDs assigned to initial namespaces
      - Lookup based solely on inode number
      - Maintained list of owned namespaces per user namespace
      - Simplified rbtree comparison helpers

   Cleanups

    - Header Reorganization:
      - Move namespace types into separate header (ns_common_types.h)
      - Decouple nstree from ns_common header
      - Move nstree types into separate header
      - Switch to new ns_tree_{node,root} structures with helper functions
      - Use guards for ns_tree_lock

   - Initial Namespace Reference Count Optimization
      - Make all reference counts on initial namespaces a nop to avoid
        pointless cacheline ping-pong for namespaces that can never go
        away
      - Drop custom reference count initialization for initial namespaces
      - Add NS_COMMON_INIT() macro and use it for all namespaces
      - pid: rely on common reference count behavior

   - Miscellaneous Cleanups
      - Rename exit_task_namespaces() to exit_nsproxy_namespaces()
      - Rename is_initial_namespace() and make argument const
      - Use boolean to indicate anonymous mount namespace
      - Simplify owner list iteration in nstree
      - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
      - nsfs: use inode_just_drop()
      - pidfs: raise DCACHE_DONTCACHE explicitly
      - pidfs: simplify PIDFD_GET__NAMESPACE ioctls
      - libfs: allow to specify s_d_flags
      - cgroup: add cgroup namespace to tree after owner is set
      - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

  Fixes:

   - setns(pidfd, ...) race condition

     Fix a subtle race when using pidfds with setns(). When the target
     task exits after prepare_nsset() but before commit_nsset(), the
     namespace's active reference count might have been dropped. If
     setns() then installs the namespaces, it would bump the active
     reference count from zero without taking the required reference on
     the owner namespace, leading to underflow when later decremented.

     The fix resurrects the ownership chain if necessary - if the caller
     succeeded in grabbing passive references, the setns() should
     succeed even if the target task exits or gets reaped.

   - Return EFAULT on put_user() error instead of success

   - Make sure references are dropped outside of RCU lock (some
     namespaces like mount namespace sleep when putting the last
     reference)

   - Don't skip active reference count initialization for network
     namespace

   - Add asserts for active refcount underflow

   - Add asserts for initial namespace reference counts (both passive
     and active)

   - ipc: enable is_ns_init_id() assertions

   - Fix kernel-doc comments for internal nstree functions

   - Selftests
      - 15 active reference count tests
      - 9 listns() functionality tests
      - 7 listns() permission tests
      - 12 inactive namespace resurrection tests
      - 3 threaded active reference count tests
      - commit_creds() active reference tests
      - Pagination and stress tests
      - EFAULT handling test
      - nsid tests fixes"

* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
  nstree: fix kernel-doc comments for internal functions
  nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
  selftests/namespaces: fix nsid tests
  ns: drop custom reference count initialization for initial namespaces
  pid: rely on common reference count behavior
  ns: add asserts for initial namespace active reference counts
  ns: add asserts for initial namespace reference counts
  ns: make all reference counts on initial namespace a nop
  ipc: enable is_ns_init_id() assertions
  fs: use boolean to indicate anonymous mount namespace
  ns: rename is_initial_namespace()
  ns: make is_initial_namespace() argument const
  nstree: use guards for ns_tree_lock
  nstree: simplify owner list iteration
  nstree: switch to new structures
  nstree: add helper to operate on struct ns_tree_{node,root}
  nstree: move nstree types into separate header
  nstree: decouple from ns_common header
  ns: move namespace types into separate header
  ...
2025-12-01 09:47:41 -08:00
Jiapeng Chong
3317aaca33 parisc: remove unneeded semicolon in perf_regs.c
No functional modification involved.

./arch/parisc/kernel/perf_regs.c:30:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26159
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511090323.OwYsZkev-lkp@intel.com/
Signed-off-by: Helge Deller <deller@gmx.de>
2025-11-08 23:15:26 +01:00
Helge Deller
fd9f30d103 parisc: Avoid crash due to unaligned access in unwinder
Guenter Roeck reported this kernel crash on his emulated B160L machine:

Starting network: udhcpc: started, v1.36.1
 Backtrace:
  [<104320d4>] unwind_once+0x1c/0x5c
  [<10434a00>] walk_stackframe.isra.0+0x74/0xb8
  [<10434a6c>] arch_stack_walk+0x28/0x38
  [<104e5efc>] stack_trace_save+0x48/0x5c
  [<105d1bdc>] set_track_prepare+0x44/0x6c
  [<105d9c80>] ___slab_alloc+0xfc4/0x1024
  [<105d9d38>] __slab_alloc.isra.0+0x58/0x90
  [<105dc80c>] kmem_cache_alloc_noprof+0x2ac/0x4a0
  [<105b8e54>] __anon_vma_prepare+0x60/0x280
  [<105a823c>] __vmf_anon_prepare+0x68/0x94
  [<105a8b34>] do_wp_page+0x8cc/0xf10
  [<105aad88>] handle_mm_fault+0x6c0/0xf08
  [<10425568>] do_page_fault+0x110/0x440
  [<10427938>] handle_interruption+0x184/0x748
  [<11178398>] schedule+0x4c/0x190
  BUG: spinlock recursion on CPU#0, ifconfig/2420
  lock: terminate_lock.2+0x0/0x1c, .magic: dead4ead, .owner: ifconfig/2420, .owner_cpu: 0

While creating the stack trace, the unwinder uses the stack pointer to guess
the previous frame to read the previous stack pointer from memory.  The crash
happens, because the unwinder tries to read from unaligned memory and as such
triggers the unalignment trap handler which then leads to the spinlock
recursion and finally to a deadlock.

Fix it by checking the alignment before accessing the memory.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v6.12+
2025-11-04 12:21:59 +01:00
Christian Brauner
b36d4b6aa8
arch: hookup listns() system call
Add the listns() system call to all architectures.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-20-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:18 +01:00
Sven Schnelle
1aa4524c0c parisc: entry.S: fix space adjustment on interruption for 64-bit userspace
In wide mode, the IASQ contain the upper part of the GVA
during interruption. This needs to be reversed before
the space is used - otherwise it contains parts of IAOQ.
See Page 2-13 "Processing Resources / Interruption Instruction
Address Queues" in the Parisc 2.0 Architecture Manual page 2-13
for an explanation.

The IAOQ/IASQ space_adjust was skipped for other interruptions
than itlb misses. However, the code in handle_interruption()
checks whether iasq[0] contains a valid space. Due to the not
masked out bits this match failed and the process was killed.

Also add space_adjust for IAOQ1/IASQ1 so ptregs contains sane values.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-30 15:16:32 +01:00
Sven Schnelle
5fb1d3ce3e parisc: entry: set W bit for !compat tasks in syscall_restore_rfi()
When the kernel leaves to userspace via syscall_restore_rfi(), the
W bit is not set in the new PSW. This doesn't cause any problems
because there's no 64 bit userspace for parisc. Simple static binaries
are usually loaded at addresses way below the 32 bit limit so the W bit
doesn't matter.

Fix this by setting the W bit when TIF_32BIT is not set.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-16 01:37:39 +02:00
Helge Deller
3631b9cb2a parisc: Drop padding fields and layers entries from inventory log
Drop those, as they are zero and not used by HPPA-SeaBIOS.

Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-13 21:37:18 +02:00
Linus Torvalds
8cc8ea228c parisc architecture updates for kernel v6.18-rc1:
Minor enhancements and fixes, specifically:
 - report emulation and alignment faults via perf
 - add initial kernel-side support for perf_events
 - small initialization fixes in the parisc firmware layer
 - adjust TC* constants and avoid referencing termio structs to avoid
   userspace build errors.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaOjZoAAKCRD3ErUQojoP
 X3cJAQCoSmeEmrWePAQQg/OkUntwkqxb7JEkmzJ6jqR7Cr9ncAD+NKlOiFSRkAvl
 FSGG3YZ/oqDSVvjkIGbhCDp16H/pGg4=
 =EHkQ
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
 "Minor enhancements and fixes, specifically:

   - report emulation and alignment faults via perf

   - add initial kernel-side support for perf_events

   - small initialization fixes in the parisc firmware layer

   - adjust TC* constants and avoid referencing termio structs to avoid
     userspace build errors"

* tag 'parisc-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix iodc and device path return values on old machines
  parisc: Firmware: Fix returned path for PDC_MODULE_FIND on older machines
  parisc: Add initial kernel-side perf_event support
  parisc: Report software alignment faults via perf
  parisc: Report emulation faults via perf
  parisc: don't reference obsolete termio struct for TC* constants
  parisc: Remove spurious if statement from raw_copy_from_user()
2025-10-10 10:01:55 -07:00
Helge Deller
f4edb5c52c parisc: Fix iodc and device path return values on old machines
Older machines may not fully initialize the return values when asking for IODC
and device path data when building the inventory.  Work around possible
firmware leaks by proper initialization of the variables.

Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-09 23:45:04 +02:00
Helge Deller
44ac7f5c6d parisc: Firmware: Fix returned path for PDC_MODULE_FIND on older machines
Older machines (like my 715/64) don't correctly initialize the
device path when returning from the PDC_MODULE_FIND firmware call.
Work around that shortcoming by initializing the path with the
known values.

Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-09 23:45:04 +02:00
Helge Deller
610cb23bcc parisc: Add initial kernel-side perf_event support
Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-07 19:35:51 +02:00
Helge Deller
912b9fd7c7 parisc: Report software alignment faults via perf
Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-07 18:01:53 +02:00
Helge Deller
6fb2e09c3a parisc: Report emulation faults via perf
Signed-off-by: Helge Deller <deller@gmx.de>
2025-10-07 18:01:53 +02:00