Commit Graph

1099 Commits

Author SHA1 Message Date
Linus Torvalds
334fbe734e mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       368
 Reviews/patch:       1.56
 Reviewed rate:       74%
 
 Excluding DAMON:
 
 Total patches:       316
 Reviews/patch:       1.77
 Reviewed rate:       81%
 
 Excluding DAMON and zram:
 
 Total patches:       306
 Reviews/patch:       1.81
 Reviewed rate:       82%
 
 Excluding DAMON, zram and maple_tree:
 
 Total patches:       276
 Reviews/patch:       2.01
 Reviewed rate:       91%
 
 Significant patch series in this merge:
 
 - The 30 patch series "maple_tree: Replace big node with maple copy"
   from Liam Howlett is mainly prepararatory work for ongoing development
   but it does reduce stack usage and is an improvement.
 
 - The 12 patch series "mm, swap: swap table phase III: remove swap_map"
   from Kairui Song offers memory savings by removing the static swap_map.
   It also yields some CPU savings and implements several cleanups.
 
 - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush
   Yadav adds file seal preservation to LUO's memfd code.
 
 - The 2 patch series "mm: zswap: add per-memcg stat for incompressible
   pages" from Jiayuan Chen adds additional userspace stats reportng to
   zswap.
 
 - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike
   Rapoport implements some cleanups for our handling of ZERO_PAGE() and
   zero_pfn.
 
 - The 2 patch series "mm/kmemleak: Improve scan_should_stop()
   implementation" from Zhongqiu Han provides an robustness improvement and
   some cleanups in the kmemleak code.
 
 - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang
   "improves the khugepaged scan logic and reduces CPU consumption by
   prioritizing scanning tasks that access memory frequently".
 
 - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies
   Kexec Handover by "transitioning KHO from an xarray-based metadata
   tracking system with serialization to a radix tree data structure that
   can be passed directly to the next kernel"
 
 - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan
   tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's
   tracepointing.
 
 - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper
   and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow
   stack code: remove per-arch code in favour of a generic implementation.
 
 - The 2 patch series "Fix KASAN support for KHO restored vmalloc
   regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO
   restores a vmalloc area.
 
 - The 4 patch series "mm: Remove stray references to pagevec" from Tal
   Zussman provides several cleanups, mainly udpating references to "struct
   pagevec", which became folio_batch three years ago.
 
 - The 17 patch series "mm: Eliminate fake head pages from vmemmap
   optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap
   optimization (HVO) by changing how tail pages encode their relationship
   to the head page.
 
 - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for
   core layer filters" from SeongJae Park improves two problematic
   behaviors of DAMOS that makes it less efficient when core layer filters
   are used.
 
 - The 3 patch series "mm/damon: strictly respect min_nr_regions" from
   SeongJae Park improves DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter.
 
 - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil
   Babka is a proper fix for a previously hotfixed SMP=n issue.  Code
   simplifications and cleanups ennsed.
 
 - The 16 patch series "mm: cleanups around unmapping / zapping" from
   David Hildenbrand implements "a bunch of cleanups around unmapping and
   zapping.  Mostly simplifications, code movements, documentation and
   renaming of zapping functions".
 
 - The 6 patch series "support batched checking of the young flag for
   MGLRU" from Baolin Wang supports batched checking of the young flag for
   MGLRU.  It's part cleanups; one benchmark shows large performance
   benefits for arm64.
 
 - The 5 patch series "memcg: obj stock and slab stat caching cleanups"
   from Johannes Weiner provides memcg cleanup and robustness improvements.
 
 - The 5 patch series "Allow order zero pages in page reporting" from
   Yuvraj Sakshith enhances page_reporting's free page reporting - it is
   presently and undesirably order-0 pages when reporting free memory.
 
 - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is
   cleanup work following from the recent conversion of the VMA flags to a
   bitmap.
 
 - The 10 patch series "mm/damon: add optional debugging-purpose sanity
   checks" from SeongJae Park adds some more developer-facing debug checks
   into DAMON core.
 
 - The 2 patch series "mm/damon: test and document power-of-2
   min_region_sz requirement" from SeongJae Park adds an additional DAMON
   kunit test and makes some adjustments to the addr_unit parameter
   handling.
 
 - The 3 patch series "mm/damon/core: make passed_sample_intervals
   comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time
   overflow issue in DAMON core.
 
 - The 7 patch series "mm/damon: improve/fixup/update ratio calculation,
   test and documentation" from SeongJae Park is a "batch of misc/minor
   improvements and fixups" for DAMON.
 
 - The 4 patch series "mm: move vma_(kernel|mmu)_pagesize() out of
   hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device
   when CONFIG_HUGETLB=n.  Some code movement was required.
 
 - The 6 patch series "zram: recompression cleanups and tweaks" from
   Sergey Senozhatsky provides "a somewhat random mix of fixups,
   recompression cleanups and improvements" in the zram code.
 
 - The 11 patch series "mm/damon: support multiple goal-based quota
   tuning algorithms" from SeongJae Park extend DAMOS quotas goal
   auto-tuning to support multiple tuning algorithms that users can select.
 
 - The 4 patch series "mm: thp: reduce unnecessary
   start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs
   handling so we no longer spam the logs with reams of junk when
   starting/stopping khugepaged.
 
 - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes
   provides some cleanups and slight fixes in the mremap, mmap and vma
   code.
 
 - The 5 patch series "mm/damon: support addr_unit on default monitoring
   targets for modules" from SeongJae Park extends the use of DAMON core's
   addr_unit tunable.
 
 - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites"
   from Nico Pache provides cleanups in the khugepaged and is a base for
   Nico's planned khugepaged mTHP support.
 
 - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups"
   from David Hildenbrand implements code movement and cleanups in the
   memhotplug and sparsemem code.
 
 - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and
   cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some
   memhotplug Kconfig support.
 
 - The 6 patch series "change young flag check functions to return bool"
   from Baolin Wang is "a cleanup patchset to change all young flag check
   functions to return bool".
 
 - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL
   dereference issues" from Josh Law and SeongJae Park fixes a few
   potential DAMON bugs.
 
 - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma
   code" from "converts a lot of the existing use of the legacy vm_flags_t
   data type to the new vma_flags_t type which replaces it".  Mainly in the
   vma code.
 
 - The 21 patch series "mm: expand mmap_prepare functionality and usage"
   from Lorenzo Stoakes "expands the mmap_prepare functionality, which is
   intended to replace the deprecated f_op->mmap hook which has been the
   source of bugs and security issues for some time".  Cleanups,
   documentation, extension of mmap_prepare into filesystem drivers.
 
 - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from
   Lorenzo Stoakes simplifies and cleans up zap_huge_pmd().  Additional
   cleanups around vm_normal_folio_pmd() and the softleaf functionality are
   performed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA
 jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi
 GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ=
 =1Q7o
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly prepararatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   File seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
   Chen)

   Additional userspace stats reportng to zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
   Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted the KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly udpating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that makes it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it is presently and undesirably order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   An additional DAMON kunit test and makes some adjustments to the
   addr_unit parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugpaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged and is a base for Nico's planned khugepaged
   mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
2026-04-15 12:59:16 -07:00
Linus Torvalds
f2729827ae nilfs2 updates for v7.1
- nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
 - nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCadlsMAAKCRAhI4xTh21N
 nVLWAQC9fY9kH2fzW7MQKt0/33iN+KwnWPCTChVjSspoSG14QAEAiWqEiVcJtvGN
 ARJgvHKX7fUzZf2YoVnVeGheOrbdvgE=
 =XmKt
 -----END PGP SIGNATURE-----

Merge tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2

Pull nilfs2 updates from Viacheslav Dubeyko:
 "This contains fixes of syzbot reported issues in NILFS2 functionality:

   - The DAT inode's btree node cache (i_assoc_inode) is initialized
     lazily during btree operations.

     However, nilfs_mdt_save_to_shadow_map() assumes i_assoc_inode is
     already initialized when copying dirty pages to the shadow map
     during GC. If NILFS_IOCTL_CLEAN_SEGMENTS is called immediately
     after mount before any btree operation has occurred on the DAT
     inode, i_assoc_inode is NULL leading to a general protection fault.

     Fix this by calling nilfs_attach_btree_node_cache() on the DAT
     inode in nilfs_dat_read() at mount time, ensuring i_assoc_inode is
     always initialized before any GC operation can use it (Deepanshu
     Kartikey)

   - nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead
     blocks by comparing it with the current block number bd_blocknr. If
     they differ, the block is considered dead and skipped.

     A corrupted ioctl request with bd_oblocknr set to 0 causes the
     comparison to incorrectly match when the lookup returns -ENOENT and
     sets bd_blocknr to 0, bypassing the dead block check and calling
     nilfs_bmap_mark() on a non- existent block. This causes
     nilfs_btree_do_lookup() to return -ENOENT, triggering the
     WARN_ON(ret == -ENOENT).

     Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at
     the beginning of each iteration (Deepanshu Kartikey)"

* tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
  nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
  nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map
2026-04-13 16:53:19 -07:00
Tal Zussman
4e1d77a8f3 folio_batch: rename pagevec.h to folio_batch.h
struct pagevec was removed in commit 1e0877d58b ("mm: remove struct
pagevec").  Rename include/linux/pagevec.h to reflect reality and update
includes tree-wide.  Add the new filename to MAINTAINERS explicitly, as it
no longer matches the "include/linux/page[-_]*" pattern in MEMORY
MANAGEMENT - CORE.

Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-3-716868cc2d11@columbia.edu
Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:07 -07:00
Deepanshu Kartikey
be3e5d1064 nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks
by comparing it with the current block number bd_blocknr. If they differ,
the block is considered dead and skipped.

However, bd_oblocknr should never be 0 since block 0 typically stores the
primary superblock and is never a valid GC target block. A corrupted ioctl
request with bd_oblocknr set to 0 causes the comparison to incorrectly
match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing
the dead block check and calling nilfs_bmap_mark() on a non-existent
block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering
the WARN_ON(ret == -ENOENT).

Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the
beginning of each iteration.

[ryusuke: slightly modified the commit message and comments for accuracy]

Fixes: 7942b919f7 ("nilfs2: ioctl operations")
Reported-by: syzbot+98a040252119df0506f8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=98a040252119df0506f8
Suggested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
Reported-by: syzbot+466a45fcfb0562f5b9a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=466a45fcfb0562f5b9a0
Cc: Junjie Cao <junjie.cao@linux.dev>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-03-31 11:05:55 -07:00
Deepanshu Kartikey
4a4e0328ed nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map
The DAT inode's btree node cache (i_assoc_inode) is initialized lazily
during btree operations. However, nilfs_mdt_save_to_shadow_map()
assumes i_assoc_inode is already initialized when copying dirty pages
to the shadow map during GC.

If NILFS_IOCTL_CLEAN_SEGMENTS is called immediately after mount before
any btree operation has occurred on the DAT inode, i_assoc_inode is
NULL leading to a general protection fault.

Fix this by calling nilfs_attach_btree_node_cache() on the DAT inode
in nilfs_dat_read() at mount time, ensuring i_assoc_inode is always
initialized before any GC operation can use it.

Reported-by: syzbot+4b4093b1f24ad789bf37@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4b4093b1f24ad789bf37
Tested-by: syzbot+4b4093b1f24ad789bf37@syzkaller.appspotmail.com
Fixes: e897be17a4 ("nilfs2: fix lockdep warnings in page operations for btree nodes")
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-03-30 19:57:15 -07:00
Jeff Layton
81359c146f
nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group()
With the change to make inode->i_ino a u64, the build started failing on
32-bit ARM with:

    ERROR: modpost: "__aeabi_uldivmod" [fs/nilfs2/nilfs2.ko] undefined!

Fix this by using udiv_u64() for the division.

For finding the index into the group, switch to using a bitwise &
operation since that's more efficient. With this change however,
NILFS_BMAP_GROUP_DIV must be a power of two, so add a compile-time
assertion for that.

Fixes: 998a59d371 ("treewide: fix missed i_ino format specifier conversions")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603100602.KPxiClIO-lkp@intel.com/
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260316-iino-u64-v3-2-d1076b8f7a20@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-17 15:38:49 +01:00
Jeff Layton
0b2600f81c
treewide: change inode->i_ino from unsigned long to u64
On 32-bit architectures, unsigned long is only 32 bits wide, which
causes 64-bit inode numbers to be silently truncated. Several
filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that
exceed 32 bits, and this truncation can lead to inode number collisions
and other subtle bugs on 32-bit systems.

Change the type of inode->i_ino from unsigned long to u64 to ensure that
inode numbers are always represented as 64-bit values regardless of
architecture. Update all format specifiers treewide from %lu/%lx to
%llu/%llx to match the new type, along with corresponding local variable
types.

This is the bulk treewide conversion. Earlier patches in this series
handled trace events separately to allow trace field reordering for
better struct packing on 32-bit.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-06 14:31:28 +01:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
d10a88ce16 nilfs2 updates for v7.0
- nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
 - nilfs2: convert nilfs_super_block to kernel-doc
 - nilfs2: Fix potential block overflow that cause system hang
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaYZw2gAKCRAhI4xTh21N
 nZuFAQD8vJk3OJKbxxroCyTGxDT3C/WL7PMC9Z1QNQ/yF0zBYwD+JSPN/dfQY6uF
 LracWCoFQcdaWRRWn1SA2k7L1OchFgQ=
 =WXsp
 -----END PGP SIGNATURE-----

Merge tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2

Pull nilfs2 updates from Viacheslav Dubeyko:

 - Fix potential block overflow that cause system hang

   When executing the FITRIM command, an underflow can occur in the
   calculation of nblocks. This ultimately leads to the block layer
   function __blkdev_issue_discard() taking an excessively long time
   to process the bio chain, and the ns_segctor_sem lock remains held
   for a long period.

   This prevents other tasks from acquiring the ns_segctor_sem lock,
   resulting in a hang reported by syzbot (Edward Adam Davis)

 - Fix missing struct keywords in nilfs2_api.h kernel-doc (Ryusuke
   Konishi)

 - Convert nilfs_super_block to kernel-doc

   Eliminate 40+ kernel-doc warnings in nilfs2_ondisk.h by converting
   all of the struct member comments to kernel-doc comments (Randy
   Dunlap)

* tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
  nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
  nilfs2: convert nilfs_super_block to kernel-doc
  nilfs2: Fix potential block overflow that cause system hang
2026-02-09 15:55:41 -08:00
Jeff Layton
f46bb13dc5
nilfs2: add setlease file operation
Add the setlease file_operation to nilfs_file_operations and
nilfs_dir_operations, pointing to generic_setlease.  A future patch
will change the default behavior to reject lease attempts with -EINVAL
when there is no setlease file operation defined. Add generic_setlease
to retain the ability to set leases on this filesystem.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-13-ea4dec9b67fa@kernel.org
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:55:46 +01:00
Edward Adam Davis
ed527ef0c2 nilfs2: Fix potential block overflow that cause system hang
When a user executes the FITRIM command, an underflow can occur when
calculating nblocks if end_block is too small. Since nblocks is of
type sector_t, which is u64, a negative nblocks value will become a
very large positive integer. This ultimately leads to the block layer
function __blkdev_issue_discard() taking an excessively long time to
process the bio chain, and the ns_segctor_sem lock remains held for a
long period. This prevents other tasks from acquiring the ns_segctor_sem
lock, resulting in the hang reported by syzbot in [1].

If the ending block is too small, typically if it is smaller than 4KiB
range, depending on the usage of the segment 0, it may be possible to
attempt a discard request beyond the device size causing the hang.

Exiting successfully and assign the discarded size (0 in this case)
to range->len.

Although the start and len values in the user input range are too small,
a conservative strategy is adopted here to safely ignore them, which is
equivalent to a no-op; it will not perform any trimming and will not
throw an error.

[1]
task:segctord state:D stack:28968 pid:6093 tgid:6093  ppid:2 task_flags:0x200040 flags:0x00080000
Call Trace:
 rwbase_write_lock+0x3dd/0x750 kernel/locking/rwbase_rt.c:272
 nilfs_transaction_lock+0x253/0x4c0 fs/nilfs2/segment.c:357
 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2569 [inline]
 nilfs_segctor_thread+0x6ec/0xe00 fs/nilfs2/segment.c:2684

[ryusuke: corrected part of the commit message about the consequences]

Fixes: 82e11e857b ("nilfs2: add nilfs_sufile_trim_fs to trim clean segs")
Reported-by: syzbot+7eedce5eb281acd832f0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7eedce5eb281acd832f0
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2025-12-22 15:45:29 -08:00
Linus Torvalds
509d3f4584 Significant patch series in this pull request:
- The 6 patch series "panic: sys_info: Refactor and fix a potential
   issue" from Andy Shevchenko fixes a build issue and does some cleanup in
   ib/sys_info.c.
 
 - The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from
   David Laight enhances the 64-bit math code on behalf of a PWM driver and
   beefs up the test module for these library functions.
 
 - The 2 patch series "scripts/gdb/symbols: make BPF debug info available
   to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line
   numbers available to the GDB debugger.
 
 - The 4 patch series "Enable hung_task and lockup cases to dump system
   info on demand" from Feng Tang adds a sysctl which can be used to cause
   additional info dumping when the hung-task and lockup detectors fire.
 
 - The 6 patch series "lib/base64: add generic encoder/decoder, migrate
   users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/
   and migrates several users away from their private implementations.
 
 - The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric
   Dumazet makes TCP a little faster.
 
 - The 9 patch series "liveupdate: Rework KHO for in-kernel users" from
   Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for
   Live Update Orchestrator (LUO), and possibly for other future clients.
 
 - The 13 patch series "kho: simplify state machine and enable dynamic
   updates" from Pasha Tatashin increases the flexibility of KEXEC
   Handover.  Also preparation for LUO.
 
 - The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is
   a major new feature targeted at cloud environments.  Quoting the [0/N]:
 
     This series introduces the Live Update Orchestrator, a kernel subsystem
     designed to facilitate live kernel updates using a kexec-based reboot.
     This capability is critical for cloud environments, allowing hypervisors
     to be updated with minimal downtime for running virtual machines.  LUO
     achieves this by preserving the state of selected resources, such as
     memory, devices and their dependencies, across the kernel transition.
 
     As a key feature, this series includes support for preserving memfd file
     descriptors, which allows critical in-memory data, such as guest RAM or
     any other large memory region, to be maintained in RAM across the kexec
     reboot.
 
   Mike Rappaport merits a mention here, for his extensive review and
   testing work.
 
 - The 3 patch series "kexec: reorganize kexec and kdump sysfs" from
   Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/
   to /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day.
 
 - The 2 patch series "kho: fixes for vmalloc restoration" from Mike
   Rapoport fixes a BUG which was being hit during KHO restoration of
   vmalloc() regions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA
 jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52
 n0pVpQmG5P/KyOmpLztn96ejL4heKwQ=
 =JY96
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
2025-12-06 14:01:20 -08:00
Linus Torvalds
afdf0fb340 vfs-6.19-rc1.fs_header
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
 oq2EAQD09y/qVU81E7Qg7Cn4n5/3WTlnQjx0aSvhb4p6dFUcFwD+K9uVJNP8x8tA
 xTaPt59nZbEX9BIAwtLChSPa4CZsnwM=
 =XrvE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fs header updates from Christian Brauner:
 "This contains initial work to start splitting up fs.h.

  Begin the long-overdue work of splitting up the monolithic fs.h
  header. The header has grown to over 3000 lines and includes types and
  functions for many different subsystems, making it difficult to
  navigate and causing excessive compilation dependencies.

  This series introduces new focused headers for superblock-related
  code:

   - Rename fs_types.h to fs_dirent.h to better reflect its actual
     content (directory entry types)

   - Add fs/super_types.h containing superblock type definitions

   - Add fs/super.h containing superblock function declarations

  This is the first step in a longer effort to modularize the VFS
  headers.

  Cleanups:

   - Inode Field Layout Optimization (Mateusz Guzik)

     Move inode fields used during fast path lookup closer together to
     improve cache locality during path resolution.

   - current_umask() Optimization (Mateusz Guzik)

     Inline current_umask() and move it to fs_struct.h. This improves
     performance by avoiding function call overhead for this
     frequently-used function, and places it in a more appropriate
     header since it operates on fs_struct"

* tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: move inode fields used during fast path lookup closer together
  fs: inline current_umask() and move it to fs_struct.h
  fs: add fs/super.h header
  fs: add fs/super_types.h header
  fs: rename fs_types.h to fs_dirent.h
2025-12-01 14:18:01 -08:00
Linus Torvalds
9368f0f941 vfs-6.19-rc1.inode
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc
 omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC
 RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM=
 =zo/J
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
 "Features:

   - Hide inode->i_state behind accessors. Open-coded accesses prevent
     asserting they are done correctly. One obvious aspect is locking,
     but significantly more can be checked. For example it can be
     detected when the code is clearing flags which are already missing,
     or is setting flags when it is illegal (e.g., I_FREEING when
     ->i_count > 0)

   - Provide accessors for ->i_state, converts all filesystems using
     coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
     overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
     compile

   - Rework I_NEW handling to operate without fences, simplifying the
     code after the accessor infrastructure is in place

  Cleanups:

   - Move wait_on_inode() from writeback.h to fs.h

   - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
     for clarity

   - Cosmetic fixes to LRU handling

   - Push list presence check into inode_io_list_del()

   - Touch up predicts in __d_lookup_rcu()

   - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage

   - Assert on ->i_count in iput_final()

   - Assert ->i_lock held in __iget()

  Fixes:

   - Add missing fences to I_NEW handling"

* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  dcache: touch up predicts in __d_lookup_rcu()
  fs: push list presence check into inode_io_list_del()
  fs: cosmetic fixes to lru handling
  fs: rework I_NEW handling to operate without fences
  fs: make plain ->i_state access fail to compile
  xfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors
  gfs2: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  smb: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  btrfs: use the new ->i_state accessors
  Manual conversion to use ->i_state accessors of all places not covered by coccinelle
  Coccinelle-based conversion to use ->i_state accessors
  fs: provide accessors for ->i_state
  fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
  fs: move wait_on_inode() from writeback.h to fs.h
  fs: add missing fences to I_NEW handling
  ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
  ...
2025-12-01 09:02:34 -08:00
Andrew Morton
bc947af677 Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able
to merge "kho: make debugfs interface optional" into mm-nonmm-stable.
2025-11-27 14:17:02 -08:00
Thorsten Blum
3e4b89e970 nilfs2: replace vmalloc + copy_from_user with vmemdup_user
Replace vmalloc() followed by copy_from_user() with vmemdup_user() to
improve nilfs_ioctl_clean_segments() and nilfs_ioctl_set_suinfo().  Use
kvfree() to free the buffers created by vmemdup_user().

Use u64_to_user_ptr() instead of manually casting the pointers and
remove the obsolete 'out_free' label.

No functional changes intended.

Link: https://lkml.kernel.org/r/20251030154700.7444-1-konishi.ryusuke@gmail.com
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12 10:00:17 -08:00
Edward Adam Davis
9a6b60cb14 nilfs2: avoid having an active sc_timer before freeing sci
Because kthread_stop did not stop sc_task properly and returned -EINTR,
the sc_timer was not properly closed, ultimately causing the problem [1]
reported by syzbot when freeing sci due to the sc_timer not being closed.

Because the thread sc_task main function nilfs_segctor_thread() returns 0
when it succeeds, when the return value of kthread_stop() is not 0 in
nilfs_segctor_destroy(), we believe that it has not properly closed
sc_timer.

We use timer_shutdown_sync() to sync wait for sc_timer to shutdown, and
set the value of sc_task to NULL under the protection of lock
sc_state_lock, so as to avoid the issue caused by sc_timer not being
properly shutdowned.

[1]
ODEBUG: free active (active state 0) object: 00000000dacb411a object type: timer_list hint: nilfs_construction_timeout
Call trace:
 nilfs_segctor_destroy fs/nilfs2/segment.c:2811 [inline]
 nilfs_detach_log_writer+0x668/0x8cc fs/nilfs2/segment.c:2877
 nilfs_put_super+0x4c/0x12c fs/nilfs2/super.c:509

Link: https://lkml.kernel.org/r/20251029225226.16044-1-konishi.ryusuke@gmail.com
Fixes: 3f66cc261c ("nilfs2: use kthread_create and kthread_stop for the log writer thread")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=24d8b70f039151f65590
Tested-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Cc: <stable@vger.kernel.org>	[6.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-09 21:19:46 -08:00
Mateusz Guzik
5b8ed52866
fs: inline current_umask() and move it to fs_struct.h
There is no good reason to have this as a func call, other than avoiding
the churn of adding fs_struct.h as needed.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251104170448.630414-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-05 22:51:23 +01:00
Mateusz Guzik
a18d43041b
nilfs2: use the new ->i_state accessors
Change generated with coccinelle and fixed up by hand as appropriate.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20 20:22:27 +02:00
Andrew Morton
bc9950b56f Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up
changes required by mm-stable material: hugetlb and damon.
2025-09-21 14:19:36 -07:00
Matthew Wilcox (Oracle)
53fbef56e0 mm: introduce memdesc_flags_t
Patch series "Add and use memdesc_flags_t".

At some point struct page will be separated from struct slab and struct
folio.  This is a step towards that by introducing a type for the 'flags'
word of all three structures.  This gives us a certain amount of type
safety by establishing that some of these unsigned longs are different
from other unsigned longs in that they contain things like node ID,
section number and zone number in the upper bits.  That lets us have
functions that can be easily called by anyone who has a slab, folio or
page (but not easily by anyone else) to get the node or zone.

There's going to be some unusual merge problems with this as some odd bits
of the kernel decide they want to print out the flags value or something
similar by writing page->flags and now they'll need to write page->flags.f
instead.  That's most of the churn here.  Maybe we should be removing
these things from the debug output?


This patch (of 11):

Wrap the unsigned long flags in a typedef.  In upcoming patches, this will
provide a strong hint that you can't just pass a random unsigned long to
functions which take this as an argument.

[willy@infradead.org: s/flags/flags.f/ in several architectures]
  Link: https://lkml.kernel.org/r/aKMgPRLD-WnkPxYm@casper.infradead.org
[nicola.vetrini@gmail.com: mips: fix compilation error]
  Link: https://lore.kernel.org/lkml/CA+G9fYvkpmqGr6wjBNHY=dRp71PLCoi2341JxOudi60yqaeUdg@mail.gmail.com/
  Link: https://lkml.kernel.org/r/20250825214245.1838158-1-nicola.vetrini@gmail.com
Link: https://lkml.kernel.org/r/20250805172307.1302730-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20250805172307.1302730-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 16:55:07 -07:00
Nathan Chancellor
025e87f8ea nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:

  CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
  ...
  Call Trace:
   <TASK>
   sysfs_kf_seq_show+0x2a6/0x390
   ? __cfi_kobj_attr_show+0x10/0x10
   kernfs_seq_show+0x104/0x15b
   seq_read_iter+0x580/0xe2b
  ...

When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops.  When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called. 
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show().  kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.

Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.

Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f684 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 13:05:38 -07:00
Linus Torvalds
57fcb7d930 vfs-6.17-rc1.fileattr
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc
 oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp
 YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4=
 =geAl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening and calling
  FS_IOC_FSSETXATTR on each inode. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with empty project ID. Those inodes then are not shown in the quota
  accounting but still exist in the directory. This is not critical but
  in the case when special files are created in the directory with
  already existing project quota, these new inodes inherit extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes don't have a
  possibility to become clear or change the attributes. This, in turn,
  prevents userspace from re-creating quota project on these existing
  files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Linus Torvalds
7031769e10 vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
 os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
 pcYE40CacaekD8rFWwYUyszpgmv6ewc=
 =wCwp
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
 "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does require
  a VMA to be established yet, thus allowing the mmap logic to invoke
  this hook far, far earlier, prior to inserting a VMA into the virtual
  address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series coverts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctl - as this requires an update to
  remap_pfn_range() (or an alternative to it) which we defer to a later
  series, equally we do not update cramfs which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  syfs or kernfs all of which require VMAs for internal state and hooks.
  We shall return to all of these later"

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 13:43:25 -07:00
Linus Torvalds
7879d7aff0 vfs-6.17-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
 opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
 cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
 =CHha
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc VFS updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add ext4 IOCB_DONTCACHE support

     This refactors the address_space_operations write_begin() and
     write_end() callbacks to take const struct kiocb * as their first
     argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
     to the filesystem's buffered I/O path.

     Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
     and advertises support via the FOP_DONTCACHE file operation flag.

     Additionally, the i915 driver's shmem write paths are updated to
     bypass the legacy write_begin/write_end interface in favor of
     directly calling write_iter() with a constructed synchronous kiocb.
     Another i915 change replaces a manual write loop with
     kernel_write() during GEM shmem object creation.

  Cleanups:

   - don't duplicate vfs_open() in kernel_file_open()

   - proc_fd_getattr(): don't bother with S_ISDIR() check

   - fs/ecryptfs: replace snprintf with sysfs_emit in show function

   - vfs: Remove unnecessary list_for_each_entry_safe() from
     evict_inodes()

   - filelock: add new locks_wake_up_waiter() helper

   - fs: Remove three arguments from block_write_end()

   - VFS: change old_dir and new_dir in struct renamedata to dentrys

   - netfs: Remove unused declaration netfs_queue_write_request()

  Fixes:

   - eventpoll: Fix semi-unbounded recursion

   - eventpoll: fix sphinx documentation build warning

   - fs/read_write: Fix spelling typo

   - fs: annotate data race between poll_schedule_timeout() and
     pollwake()

   - fs/pipe: set FMODE_NOWAIT in create_pipe_files()

   - docs/vfs: update references to i_mutex to i_rwsem

   - fs/buffer: remove comment about hard sectorsize

   - fs/buffer: remove the min and max limit checks in __getblk_slow()

   - fs/libfs: don't assume blocksize <= PAGE_SIZE in
     generic_check_addressable

   - fs_context: fix parameter name in infofc() macro

   - fs: Prevent file descriptor table allocations exceeding INT_MAX"

* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
  netfs: Remove unused declaration netfs_queue_write_request()
  eventpoll: fix sphinx documentation build warning
  ext4: support uncached buffered I/O
  mm/pagemap: add write_begin_get_folio() helper function
  fs: change write_begin/write_end interface to take struct kiocb *
  drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
  drm/i915: Use kernel_write() in shmem object create
  eventpoll: Fix semi-unbounded recursion
  vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
  fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
  fs/buffer: remove the min and max limit checks in __getblk_slow()
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  fs: Remove three arguments from block_write_end()
  fs/ecryptfs: replace snprintf with sysfs_emit in show function
  fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
  docs/vfs: update references to i_mutex to i_rwsem
  fs/buffer: remove comment about hard sectorsize
  fs_context: fix parameter name in infofc() macro
  VFS: change old_dir and new_dir in struct renamedata to dentrys
  proc_fd_getattr(): don't bother with S_ISDIR() check
  ...
2025-07-28 11:22:56 -07:00
Ryusuke Konishi
4aead50caf nilfs2: reject invalid file types when reading inodes
To prevent inodes with invalid file types from tripping through the vfs
and causing malfunctions or assertion failures, add a missing sanity check
when reading an inode from a block device.  If the file type is not valid,
treat it as a filesystem error.

Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com
Fixes: 05fe58fdc1 ("nilfs2: inode operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:26:16 -07:00
Taotao Chen
e9d8e2bf23
fs: change write_begin/write_end interface to take struct kiocb *
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.

Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.

Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-16 14:48:18 +02:00
Christian Brauner
ca115d7e75
tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Matthew Wilcox (Oracle)
b39f7d75dc
fs: Remove three arguments from block_write_end()
block_write_end() looks like it can be used as a ->write_end()
implementation.  However, it can't as it does not unlock nor put
the folio.  Since it does not use the 'file', 'mapping' nor 'fsdata'
arguments, remove them.

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/20250624132130.1590285-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-24 15:53:40 +02:00
Lorenzo Stoakes
2e3b37a7e4
fs: replace mmap hook with .mmap_prepare for simple mappings
Since commit c84bf6dd2b ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().

This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.

This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.

It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.

Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c27 ("mm: add mmap_prepare()
compatibility layer for nested file systems").

In this patch we apply this change to file systems with relatively simple
mmap() hook logic - exfat, ceph, f2fs, bcachefs, zonefs, btrfs, ocfs2,
orangefs, nilfs2, romfs, ramfs and aio.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/f528ac4f35b9378931bd800920fee53fc0c5c74d.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-19 13:56:59 +02:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Christoph Hellwig
84e437640b nilfs2: remove wbc->for_reclaim handling
Since commit 013a07052a ("nilfs2: convert metadata aops from writepage
to writepages"), nilfs_mdt_write_folio can't be called from reclaim
context any more.  Remove the code keyed of the wbc->for_reclaim flag,
which is now only set for writing out swap or shmem pages inside the swap
code, but never passed to file systems.

Link: https://lkml.kernel.org/r/20250508054938.15894-7-hch@lst.de
Link: https://lkml.kernel.org/r/20250516123417.6779-1-konishi.ryusuke@gmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 10:48:23 -07:00
Ryusuke Konishi
8e39fbb1ed nilfs2: do not propagate ENOENT error from nilfs_btree_propagate()
In preparation for writing logs, in nilfs_btree_propagate(), which makes
parent and ancestor node blocks dirty starting from a modified data block
or b-tree node block, if the starting block does not belong to the b-tree,
i.e.  is isolated, nilfs_btree_do_lookup() called within the function
fails with -ENOENT.

In this case, even though -ENOENT is an internal code, it is propagated to
the log writer via nilfs_bmap_propagate() and may be erroneously returned
to system calls such as fsync().

Fix this issue by changing the error code to -EINVAL in this case, and
having the bmap layer detect metadata corruption and convert the error
code appropriately.

Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7d ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:12 -07:00
Wentao Liang
f43f024292 nilfs2: add pointer check for nilfs_direct_propagate()
Patch series "nilfs2: improve sanity checks in dirty state propagation".

This fixes one missed check for block mapping anomalies and one improper
return of an error code during a preparation step for log writing, thereby
improving checking for filesystem corruption on writeback.


This patch (of 2):

In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer.

If the pointer value obtained by nilfs_direct_get_ptr() is
NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in
the nilfs_inode_info struct) that should point to the data block at the
buffer head of the argument is corrupted and the data block is orphaned,
meaning that the file system has lost consistency.

Add a value check and return -EINVAL when it is an invalid pointer.

Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com
Fixes: 36a580eb48 ("nilfs2: direct block mapping")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:12 -07:00
Ryusuke Konishi
fb881cd760 nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
After commit c0e473a0d2 ("block: fix race between set_blocksize and read
paths") was merged, set_blocksize() called by sb_set_blocksize() now locks
the inode of the backing device file.  As a result of this change, syzbot
started reporting deadlock warnings due to a circular dependency involving
the semaphore "ns_sem" of the nilfs object, the inode lock of the backing
device file, and the locks that this inode lock is transitively dependent
on.

This is caused by a new lock dependency added by the above change, since
init_nilfs() calls sb_set_blocksize() in the lock section of "ns_sem". 
However, these warnings are false positives because init_nilfs() is called
in the early stage of the mount operation and the filesystem has not yet
started.

The reason why "ns_sem" is locked in init_nilfs() was to avoid a race
condition in nilfs_fill_super() caused by sharing a nilfs object among
multiple filesystem instances (super block structures) in the early
implementation.  However, nilfs objects and super block structures have
long ago become one-to-one, and there is no longer any need to use the
semaphore there.

So, fix this issue by removing the use of the semaphore "ns_sem" in
init_nilfs().

Link: https://lkml.kernel.org/r/20250503053327.12294-1-konishi.ryusuke@gmail.com
Fixes: c0e473a0d2 ("block: fix race between set_blocksize and read paths")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00f7f5b884b117ee6773
Tested-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Reported-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f30591e72bfc24d4715b
Tested-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07 23:39:42 -07:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
NeilBrown
88d5baf690
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in

This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some file-systems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 20:00:17 +01:00
Nikita Zhandarovich
6438ef381c nilfs2: fix possible int overflows in nilfs_fiemap()
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.

While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.

Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.

Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01 03:53:26 -08:00
Ryusuke Konishi
e30ccbb9c0 nilfs2: do not update mtime of renamed directory that is not moved
A minor issue with nilfs_rename, originating from an old ext2
implementation, is that the mtime is updated even if the rename target is
a directory and it is renamed within the same directory, rather than moved
to a different directory.

In this case, the child directory being renamed does not change in any
way, so changing its mtime is unnecessary according to the specification,
and can unnecessarily confuse backup tools.

In ext2, this issue was fixed by commit 39fe7557b4 ("ext2: Do not update
mtime of a moved directory") and a few subsequent fixes, but it remained
in nilfs2.

Fix this issue by not calling nilfs_set_link(), which rewrites the inode
number of the directory entry that refers to the parent directory, when
the move target is a directory and the source and destination are the same
directory.

Here, the directory to be moved only needs to be read if the inode number
of the parent directory is rewritten with nilfs_set_link, so also adjust
the execution conditions of the preparation work to avoid unnecessary
directory reads.

Link: https://lkml.kernel.org/r/20250111143518.7901-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:26 -08:00
Ryusuke Konishi
ee70999a98 nilfs2: handle errors that nilfs_prepare_chunk() may return
Patch series "nilfs2: fix issues with rename operations".

This series fixes BUG_ON check failures reported by syzbot around rename
operations, and a minor behavioral issue where the mtime of a child
directory changes when it is renamed instead of moved.


This patch (of 2):

The directory manipulation routines nilfs_set_link() and
nilfs_delete_entry() rewrite the directory entry in the folio/page
previously read by nilfs_find_entry(), so error handling is omitted on the
assumption that nilfs_prepare_chunk(), which prepares the buffer for
rewriting, will always succeed for these.  And if an error is returned, it
triggers the legacy BUG_ON() checks in each routine.

This assumption is wrong, as proven by syzbot: the buffer layer called by
nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may
fail due to metadata corruption or other reasons.  This has been there all
along, but improved sanity checks and error handling may have made it more
reproducible in fuzzing tests.

Fix this issue by adding missing error paths in nilfs_set_link(),
nilfs_delete_entry(), and their caller nilfs_rename().

Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1
Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
2e62857f13 nilfs2: revise the return value description style for consistency.
Also for comments that do not cause kernel-doc warnings (those that list
multiple error codes), revise the return value description style to match
Brian G.'s suggestion of "..., or one of the following negative error
codes on failure:".

Link: https://lkml.kernel.org/r/CAAq45aNh1qV8P6XgDhKeNstT=PvcPUaCXsAF-f9rvmzznsZL5A@mail.gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
fd4e7fadfd nilfs2: add missing return value kernel-doc descriptions
There are a number of kernel-doc comments for functions that are missing
return values, which also causes a number of warnings when the kernel-doc
script is run with the "-Wall" option.

Fix this issue by adding proper return value descriptions, and improve
code maintainability.

Link: https://lkml.kernel.org/r/20250110010530.21872-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
343d4a33f2 nilfs2: correct return value kernel-doc descriptions for the rest
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for the
rest.

Link: https://lkml.kernel.org/r/20250110010530.21872-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
b8268501f1 nilfs2: correct return value kernel-doc descriptions for metadata files
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions for metadata file
functions other than sufile.

Link: https://lkml.kernel.org/r/20250110010530.21872-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
d33544adcb nilfs2: correct return value kernel-doc descriptions for sufile
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for
sufile-related functions, eliminating a dozen warnings emitted by the
kernel-doc script.

Link: https://lkml.kernel.org/r/20250110010530.21872-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
e4a99d48bb nilfs2: correct return value kernel-doc descriptions for bmap functions
Similar to the previous patch to fix the ioctl return value descriptions,
this fixes the format of the return value descriptions for bmap (and
btree)-related functions, which was causing the kernel-doc script to emit
a number of warnings.

Link: https://lkml.kernel.org/r/20250110010530.21872-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:25 -08:00
Ryusuke Konishi
17c46a45cd nilfs2: correct return value kernel-doc descriptions for ioctl functions
Patch series "nilfs2: fix kernel-doc comments for function return values",
v2.

This series fixes the inadequacies in the return value descriptions in
nilfs2's kernel-doc comments (mainly incorrect formatting), as well as the
lack of return value descriptions themselves, and fixes most of the
remaining warnings that are output when the kernel-doc script is run with
the "-Wall" option.


This patch (of 7):

In the kernel-doc comments for functions, there are many cases where the
format of the return value description is inaccurate, such as "Return
Value: ...", which causes many warnings to be output when the kernel-doc
script is executed with the "-Wall" option.

This fixes such incorrectly formatted return value descriptions for ioctl
functions.

Link: https://lkml.kernel.org/r/20250110010530.21872-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:24 -08:00
Ryusuke Konishi
367a9bffab nilfs2: protect access to buffers with no active references
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers
attached to dirty data folios/pages, accesses the attached buffers without
locking the folios/pages.

For data cache, nilfs_clear_folio_dirty() may be called asynchronously
when the file system degenerates to read only, so
nilfs_lookup_dirty_data_buffers() still has the potential to cause use
after free issues when buffers lose the protection of their dirty state
midway due to this asynchronous clearing and are unintentionally freed by
try_to_free_buffers().

Eliminate this race issue by adjusting the lock section in this function.

Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:24 -08:00
Ryusuke Konishi
ca76bb226b nilfs2: do not force clear folio if buffer is referenced
Patch series "nilfs2: protect busy buffer heads from being force-cleared".

This series fixes the buffer head state inconsistency issues reported by
syzbot that occurs when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.


This patch (of 2):

Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.

One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, but it detects that the buffer
is not in the uptodate state:

 WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520
  fs/buffer.c:1177
 ...
 Call Trace:
  <TASK>
  nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598
  nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73
  nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344
  nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218
  vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
  do_mkdirat+0x264/0x3a0 fs/namei.c:4280
  __do_sys_mkdirat fs/namei.c:4295 [inline]
  __se_sys_mkdirat fs/namei.c:4293 [inline]
  __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:

 WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089
  nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089
 ...
 Call Trace:
  <TASK>
  nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345
  nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587
  nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006
  nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045
  nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline]
  nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline]
  nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115
  nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479
  nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline]
  nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
  </TASK>

Both of these issues are caused by the callbacks that handle the
page/folio write requests, forcibly clear various states, including the
working state of the buffers they hold, at unexpected times when they
detect read-only fallback.

Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.

Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7
Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:24 -08:00