Commit Graph

95369 Commits

Author SHA1 Message Date
Linus Torvalds
b5287c55de vfs-6.13.exec.deny_write_access.revert
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ0cIXAAKCRCRxhvAZXjc
 orkEAP0WCHag28YUS8IYteRSQx+rIxYCacMWExfeJsXCKMTlOQD/Rli3A8eu2sfE
 PgU8xIYwd/f7MyE9wz5s5nZUNijKnAc=
 =zjsd
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.exec.deny_write_access.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull deny_write_access revert from Christian Brauner:
 "It turns out that the mold linker relies on the deny_write_access()
  mechanism for executables.

  The mold linker tries to open a file for writing and if ETXTBSY is
  returned mold falls back to creating a new file"

* tag 'vfs-6.13.exec.deny_write_access.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "fs: don't block i_writecount during exec"
2024-11-27 08:03:38 -08:00
Christian Brauner
3b83203538
Revert "fs: don't block i_writecount during exec"
This reverts commit 2a010c4128.

Rui Ueyama <rui314@gmail.com> writes:

> I'm the creator and the maintainer of the mold linker
> (https://github.com/rui314/mold). Recently, we discovered that mold
> started causing process crashes in certain situations due to a change
> in the Linux kernel. Here are the details:
>
> - In general, overwriting an existing file is much faster than
> creating an empty file and writing to it on Linux, so mold attempts to
> reuse an existing executable file if it exists.
>
> - If a program is running, opening the executable file for writing
> previously failed with ETXTBSY. If that happens, mold falls back to
> creating a new file.
>
> - However, the Linux kernel recently changed the behavior so that
> writing to an executable file is now always permitted
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a010c412853).
>
> That caused mold to write to an executable file even if there's a
> process running that file. Since changes to mmap'ed files are
> immediately visible to other processes, any processes running that
> file would almost certainly crash in a very mysterious way.
> Identifying the cause of these random crashes took us a few days.
>
> Rejecting writes to an executable file that is currently running is a
> well-known behavior, and Linux had operated that way for a very long
> time. So, I don’t believe relying on this behavior was our mistake;
> rather, I see this as a regression in the Linux kernel.

Quoting myself from commit 2a010c4128 ("fs: don't block i_writecount during exec")

> Yes, someone in userspace could potentially be relying on this. It's not
> completely out of the realm of possibility but let's find out if that's
> actually the case and not guess.

It seems we found out that someone is relying on this obscure behavior.
So revert the change.

Link: https://github.com/rui314/mold/issues/1361
Link: https://lore.kernel.org/r/4a2bc207-76be-4715-8e12-7fc45a76a125@leemhuis.info
Cc: <stable@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-27 12:51:30 +01:00
Linus Torvalds
6daf0882c6 vfs-6.13.ecryptfs.mount.api
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcoSAAKCRCRxhvAZXjc
 otZlAP9h7WaciOu+BWvPJzAhCy+zfjaMdB3HffjKZ7m6Kg2YIAEApu/FftxXEyLC
 XCKVmuHQzb58X+bIyuS62YIBjOfA2gY=
 =EUT7
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull ecryptfs mount api conversion from Christian Brauner:
 "Convert ecryptfs to the new mount api"

* tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ecryptfs: Fix spelling mistake "validationg" -> "validating"
  ecryptfs: Convert ecryptfs to use the new mount API
  ecryptfs: Factor out mount option validation
2024-11-26 13:39:02 -08:00
Linus Torvalds
1675db5c42 vfs-6.13.exportfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzckSQAKCRCRxhvAZXjc
 oroxAQDqt3NN64UCM14rrmiC2rw48SYjaYj4Nu0sRaJgScOFLgEA3B2I5Lh+bw6b
 fVH/uUjOsm50eYuFbqjOmEAp2DNP/QY=
 =14eq
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs exportfs updates from Christian Brauner:
 "This contains work to bring NFS connectable file handles to userspace
  servers.

  The name_to_handle_at() system call is extended to encode connectable
  file handles. Such file handles can be resolved to an open file with a
  connected path. So far userspace NFS servers couldn't make use of this
  functionality even though the kernel does already support it. This is
  achieved by introducing a new flag for name_to_handle_at().

  Similarly, the open_by_handle_at() system call is tought to understand
  connectable file handles explicitly created via name_to_handle_at()"

* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: open_by_handle_at() support for decoding "explicit connectable" file handles
  fs: name_to_handle_at() support for "explicit connectable" file handles
  fs: prepare for "explicit connectable" file handles
2024-11-26 13:26:15 -08:00
Linus Torvalds
445d9f05fa NFSD 6.13 Release Notes
Jeff Layton contributed a scalability improvement to NFSD's NFSv4
 backchannel session implementation. This improvement is intended to
 increase the rate at which NFSD can safely recall NFSv4 delegations
 from clients, to avoid the need to revoke them. Revoking requires
 a slow state recovery process.
 
 A wide variety of bug fixes and other incremental improvements make
 up the bulk of commits in this series. As always I am grateful to
 the NFSD contributors, reviewers, testers, and bug reporters who
 participated during this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmdEgLQACgkQM2qzM29m
 f5cwmg/9HcfG7blepU/2qNHopzSYRO5vZw1YNJQ5/Wi3bmqIea83lf8OcCY1G/aj
 6K+jnenzHrwfhaA4u7N2FPXPVl8sPSMuOrJXY5zC4yE5QnIbranjcyEW5l5zlj3n
 ukkTYQgjUsKre3pHlvn3JmDHfUhNPEfzirsJeorP7DS3omne+OFA1LNncNP6emRu
 h0aEC6EJ43zUkYiz9nZYqPwIAwrUIA0WOrvVnq7vsi6gR4/Muk7nS+X/y4qFjli3
 9enVskEv8sFmmOAIMK3CHJq+exEeKtKEKUuYkD23QgPt2R4+IwqS70o9IM/S1ypf
 APiv958BIhxm/SwUn1IjoxIckTB5EdksMxU5/4qGr1ZxprPG4/ruKO80BkrxLzW2
 n1HmJ4ZNnpWPQvHN7RQ0WOsPNzL8byxJbGr1bpNgU4AGXnTFWPrAnB6juiyX4xb+
 YNfgkQGDY79o7r1OJ5UUdCyx0QBSnaLNACTGm2u2FpI/ukMFPdrWIE99QbBgSe1p
 MgWaiPwSY+9crFfGPJeQ4t6/siRAec6L3RO9KT9Epcd2S7/Uts3NXYRdJfwZ+Qza
 TkPY2bm7T/WCcMhW7DN372hqgfRHPWOf4tacJ1Tob+As1d6p6qXEX2zi6piCCOLj
 dmTVDSVPClRXt8YigF9WqosyWv1jUzSnh9ne+eYPBpj93Ag2YBY=
 =wBvS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed a scalability improvement to NFSD's NFSv4
  backchannel session implementation. This improvement is intended to
  increase the rate at which NFSD can safely recall NFSv4 delegations
  from clients, to avoid the need to revoke them. Revoking requires a
  slow state recovery process.

  A wide variety of bug fixes and other incremental improvements make up
  the bulk of commits in this series. As always I am grateful to the
  NFSD contributors, reviewers, testers, and bug reporters who
  participated during this cycle"

* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
  nfsd: allow for up to 32 callback session slots
  nfs_common: must not hold RCU while calling nfsd_file_put_local
  nfsd: get rid of include ../internal.h
  nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
  NFSD: Add nfsd4_copy time-to-live
  NFSD: Add a laundromat reaper for async copy state
  NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
  NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
  NFSD: Free async copy information in nfsd4_cb_offload_release()
  NFSD: Fix nfsd4_shutdown_copy()
  NFSD: Add a tracepoint to record canceled async COPY operations
  nfsd: make nfsd4_session->se_flags a bool
  nfsd: remove nfsd4_session->se_bchannel
  nfsd: make use of warning provided by refcount_t
  nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
  svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
  xdrgen: Remove program_stat_to_errno() call sites
  xdrgen: Update the files included in client-side source code
  xdrgen: Remove check for "nfs_ok" in C templates
  xdrgen: Remove tracepoint call site
  ...
2024-11-26 12:59:30 -08:00
Linus Torvalds
44b4d13b70 f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out
 partitions but reclaim the space back by deleting aliased file in root dir.
 In addition to that, there're numerous minor bug fixes in zoned device support,
 checkpoint=disable, extent cache management, fiemap, and lazytime mount option.
 The full list of noticeable changes can be found below.
 
 Enhancement:
  - introduce device aliasing file
  - add stats in debugfs to show multiple devices
  - add a sysfs node to limit max read extent count per-inode
  - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
  - decrease spare area for pinned files for zoned devices
 
 Bug fix:
  - Revert "f2fs: remove unreachable lazytime mount option parsing"
  - adjust unusable cap before checkpoint=disable mode
  - fix to drop all discards after creating snapshot on lvm device
  - fix to shrink read extent node in batches
  - fix changing cursegs if recovery fails on zoned device
  - fix to adjust appropriate length for fiemap
  - fix fiemap failure issue when page size is 16KB
  - fix to avoid forcing direct write to use buffered IO on inline_data inode
  - fix to map blocks correctly for direct write
  - fix to account dirty data in __get_secs_required()
  - fix null-ptr-deref in f2fs_submit_page_bio()
  - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI
 UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9
 O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG
 QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe
 a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/
 Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6
 Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC
 j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn
 2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE
 6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt
 J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V
 MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU=
 =LOgM
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This series introduces a device aliasing feature where user can carve
  out partitions but reclaim the space back by deleting aliased file in
  root dir.

  In addition to that, there're numerous minor bug fixes in zoned device
  support, checkpoint=disable, extent cache management, fiemap, and
  lazytime mount option. The full list of noticeable changes can be
  found below.

  Enhancements:
   - introduce device aliasing file
   - add stats in debugfs to show multiple devices
   - add a sysfs node to limit max read extent count per-inode
   - modify f2fs_is_checkpoint_ready logic to allow more data to be
     written with the CP disable
   - decrease spare area for pinned files for zoned devices

  Fixes:
   - Revert "f2fs: remove unreachable lazytime mount option parsing"
   - adjust unusable cap before checkpoint=disable mode
   - fix to drop all discards after creating snapshot on lvm device
   - fix to shrink read extent node in batches
   - fix changing cursegs if recovery fails on zoned device
   - fix to adjust appropriate length for fiemap
   - fix fiemap failure issue when page size is 16KB
   - fix to avoid forcing direct write to use buffered IO on inline_data
     inode
   - fix to map blocks correctly for direct write
   - fix to account dirty data in __get_secs_required()
   - fix null-ptr-deref in f2fs_submit_page_bio()
   - fix inconsistent update of i_blocks in release_compress_blocks and
     reserve_compress_blocks"

* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to drop all discards after creating snapshot on lvm device
  f2fs: add a sysfs node to limit max read extent count per-inode
  f2fs: fix to shrink read extent node in batches
  f2fs: print message if fscorrupted was found in f2fs_new_node_page()
  f2fs: clear SBI_POR_DOING before initing inmem curseg
  f2fs: fix changing cursegs if recovery fails on zoned device
  f2fs: adjust unusable cap before checkpoint=disable mode
  f2fs: fix to requery extent which cross boundary of inquiry
  f2fs: fix to adjust appropriate length for fiemap
  f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
  f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
  f2fs: replace deprecated strcpy with strscpy
  Revert "f2fs: remove unreachable lazytime mount option parsing"
  f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
  f2fs: fix to map blocks correctly for direct write
  f2fs: fix race in concurrent f2fs_stop_gc_thread
  f2fs: fix fiemap failure issue when page size is 16KB
  f2fs: remove redundant atomic file check in defragment
  f2fs: fix to convert log type to segment data type correctly
  f2fs: clean up the unused variable additional_reserved_segments
  ...
2024-11-26 12:50:58 -08:00
Linus Torvalds
fb527fc1f3 fuse update for 6.13
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ
 PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1
 0YaREow2aDANQQtSUrfPtgva1ZXFwQU=
 =Cyx5
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Add page -> folio conversions (Joanne Koong, Josef Bacik)

 - Allow max size of fuse requests to be configurable with a sysctl
   (Joanne Koong)

 - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun)

 - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao)

 - Fix attribute inconsistency in case readdirplus (and plain lookup in
   corner cases) is racing with inode eviction (Zhang Tianci)

 - Fix a WARN_ON triggered by virtio_fs (Asahi Lina)

* tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits)
  virtiofs: dax: remove ->writepages() callback
  fuse: check attributes staleness on fuse_iget()
  fuse: remove pages for requests and exclusively use folios
  fuse: convert direct io to use folios
  mm/writeback: add folio_mark_dirty_lock()
  fuse: convert writebacks to use folios
  fuse: convert retrieves to use folios
  fuse: convert ioctls to use folios
  fuse: convert writes (non-writeback) to use folios
  fuse: convert reads to use folios
  fuse: convert readdir to use folios
  fuse: convert readlink to use folios
  fuse: convert cuse to use folios
  fuse: add support in virtio for requests using folios
  fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
  fuse: convert fuse_notify_store to use folios
  fuse: convert fuse_retrieve to use folios
  fuse: use the folio based vmstat helpers
  fuse: convert fuse_writepage_need_send to take a folio
  fuse: convert fuse_do_readpage to use folios
  ...
2024-11-26 12:41:27 -08:00
Linus Torvalds
ff2a7a064a gfs2 changes
- Fix the code that cleans up left-over unlinked files.  Various fixes
   and minor improvements in deleting files cached or held open remotely.
 
 - Simplify the use of dlm's DLM_LKF_QUECVT flag.
 
 - A few other minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmdESTYUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdyA/9EWDxx2Y6JeVeAC+J138pSOYqHtwn
 wLtMeTdwbycW6M8V5kyW3vCh+lLLS6s0dZuwn2Xv8jx5QytrD4c51Wj3bRYjuidM
 Zt0L+wohOQISvL1+AViYuIns2pzQQvNZUC2aAVr9J3KGhdIFonbU6PdLOeEN0cZe
 R08Nseux9oJ/geaKJ3jh/ReX2VZehp2WAaQ4I+PoQkkNflBULPkyysxjkv9sc8tW
 9hN1sK7dk/U5OLKr4H6SSi1Uu6N6Wek0x2zo4NxTRqyfBiRXYtZYnXPkdftuB+6N
 M7N2dAIuhnXiAhQdo7OOe9hZZVXTFhmeQK1tyTsw/FZkQJNMX+bdBn4g7NV94drz
 CpTliqm+Z5dTnkSdS4cIozkQZ7zID1eibX8uF7QsnozBWm7bjbW6fi7a+z+u5ykN
 hsWanoMKhH1524oNKaiSjIxT0b1oda114DJQVpdU68HjkyHf5l0GXUTcVpg0dxs3
 peXhpZ+CjHbaTMXl5xqGOucD+ACPhMOGXPAX1lF2bIcfbLqgbTVn0fMMUYWeb8j1
 medJtQ0itwpiCHZTl62xUOLEOCqCiS5J1/TjrwNuJ1HLJ5JP1UePNl5kjT9nDfsA
 KXB31sKFfPX99rPVYJgjLXPgRLwslcniSHOg9p+bWwq7ZI1PSxPSauUgtKZpSe6A
 E3YfnIxjPxRMKks=
 =BP41
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix the code that cleans up left-over unlinked files.

   Various fixes and minor improvements in deleting files cached or held
   open remotely.

 - Simplify the use of dlm's DLM_LKF_QUECVT flag.

 - A few other minor cleanups.

* tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits)
  gfs2: Prevent inode creation race
  gfs2: Only defer deletes when we have an iopen glock
  gfs2: Simplify DLM_LKF_QUECVT use
  gfs2: gfs2_evict_inode clarification
  gfs2: Make gfs2_inode_refresh static
  gfs2: Use get_random_u32 in gfs2_orlov_skip
  gfs2: Randomize GLF_VERIFY_DELETE work delay
  gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict
  gfs2: Update to the evict / remote delete documentation
  gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode
  gfs2: Clean up delete work processing
  gfs2: Minor delete_work_func cleanup
  gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock
  gfs2: Rename dinode_demise to evict_behavior
  gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE
  gfs2: Faster gfs2_upgrade_iopen_glock wakeups
  KMSAN: uninit-value in inode_go_dump (5)
  gfs2: Fix unlinked inode cleanup
  gfs2: Allow immediate GLF_VERIFY_DELETE work
  gfs2: Initialize gl_no_formal_ino earlier
  ...
2024-11-26 12:34:50 -08:00
Linus Torvalds
f5f4745a7f - The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code.
 
 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of task_struct.comm[].
 
 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest.
 
 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code.
 
 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification.
 
 - The series "add detect count for hung tasks" from Lance Yang adds more
   userspace visibility into the hung-task detector's activity.
 
 - Apart from that, singelton patches in many places - please see the
   individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA
 jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m
 Ii8igH0UBibIgva7MrCyJedDI1O23AA=
 =8BIU
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "resource: A couple of cleanups" from Andy Shevchenko
   performs some cleanups in the resource management code

 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of
   task_struct.comm[]

 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest

 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code

 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification

 - The series "add detect count for hung tasks" from Lance Yang adds
   more userspace visibility into the hung-task detector's activity

 - Apart from that, singelton patches in many places - please see the
   individual changelogs for details

* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  gdb: lx-symbols: do not error out on monolithic build
  kernel/reboot: replace sprintf() with sysfs_emit()
  lib: util_macros_kunit: add kunit test for util_macros.h
  util_macros.h: fix/rework find_closest() macros
  Improve consistency of '#error' directive messages
  ocfs2: fix uninitialized value in ocfs2_file_read_iter()
  hung_task: add docs for hung_task_detect_count
  hung_task: add detect count for hung tasks
  dma-buf: use atomic64_inc_return() in dma_buf_getfile()
  fs/proc/kcore.c: fix coccinelle reported ERROR instances
  resource: avoid unnecessary resource tree walking in __region_intersects()
  ocfs2: remove unused errmsg function and table
  ocfs2: cluster: fix a typo
  lib/scatterlist: use sg_phys() helper
  checkpatch: always parse orig_commit in fixes tag
  nilfs2: convert metadata aops from writepage to writepages
  nilfs2: convert nilfs_recovery_copy_block() to take a folio
  nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
  nilfs2: remove nilfs_writepage
  nilfs2: convert checkpoint file to be folio-based
  ...
2024-11-25 16:09:48 -08:00
Linus Torvalds
5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  More
   consistent cleaner and potentiall saves a large number of pagefaults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcgv2 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Chao Yu
bc8aeb04fd f2fs: fix to drop all discards after creating snapshot on lvm device
Piergiorgio reported a bug in bugzilla as below:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
 __issue_discard_cmd+0x1ca/0x350 [f2fs]
 issue_discard_thread+0x191/0x480 [f2fs]
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30

w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs

The root cause is: it will update discard_max_bytes of mounted lvm
device to zero after creating snapshot on this lvm device, then,
__submit_discard_cmd() will pass parameter @nr_sects w/ zero value
to __blkdev_issue_discard(), it returns a NULL bio pointer, result
in panic.

This patch changes as below for fixing:
1. Let's drop all remained discards in f2fs_unfreeze() if snapshot
of lvm device is created.
2. Checking discard_max_bytes before submitting discard during
__submit_discard_cmd().

Cc: stable@vger.kernel.org
Fixes: 35ec7d5748 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:15 +00:00
Chao Yu
009a8241a8 f2fs: add a sysfs node to limit max read extent count per-inode
Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.

crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
  node_cnt = {
    counter = 1086911
  },
"

As reported by Xiuhong, there will be a huge number of extent nodes
in extent tree, it may potentially cause:
- slab memory fragments
- extreme long time shrink on extent tree
- low mapping efficiency

Let's add a sysfs node to limit max read extent count for each inode,
by default, value of this threshold is 10240, it can be updated
according to user's requirement.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:13 +00:00
Linus Torvalds
228a1157fb 15 smb3 client fixes, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmdA/W0ACgkQiiy9cAdy
 T1HN6Av/ai3dqbNO0d9IrmM7JONITa6CYqBA+DC2zyv7LtNmSOVGAeP+LyEuM0rE
 jHbQBNSRVhoNKCv5ywT6GhjMNnDeO0ctZG2aXoQnGfCFdZ5dE/08r8Cc24xQW+x5
 cdLbBGR22HBi4MjqqWALL2goL9TpLYo9ht31P1xmiM1pw3/pbh2lsEDLVMS/veeG
 RQ1pIg5YpWcWQWAnuwI6RDxGHF2taj7tcm8gZ1mYRUlaPHmsCHeNg6lgHdDXLFKw
 0HA4dx8AeH2rdMWgwP42UVIVoxG/H//xPpLym+A8yV+11EJcurjYkskhEsSUxeWq
 vzsf0xFxN2MhjwI5DYfr5kIQknjS3qCTOofRsJi6s5GhxJy2hl0Wfto0AJs6+Fl4
 /sukKWjWSgrkcpILLokDUZptYEBH7pUNcBVpfcQOScnUeV4qQk8oko3bTKDSfgTq
 q3zrh2mPanTOpM6UgcBRfywR4r5tAKzoQOUfJpQuItLvLhgrgAm+WGCWFepztJlT
 LPEGTxrG
 =K4Ai
 -----END PGP SIGNATURE-----

Merge tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - Fix two SMB3.1.1 POSIX Extensions problems

 - Fixes for special file handling (symlinks and FIFOs)

 - Improve compounding

 - Four cleanup patches

 - Fix use after free in signing

 - Add support for handling namespaces for reconnect related upcalls
   (e.g. for DNS names resolution and auth)

 - Fix various directory lease problems (directory entry caching),
   including some important potential use after frees

* tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: prevent use-after-free due to open_cached_dir error paths
  smb: Don't leak cfid when reconnect races with open_cached_dir
  smb: client: handle max length for SMB symlinks
  smb: client: get rid of bounds check in SMB2_ioctl_init()
  smb: client: improve compound padding in encryption
  smb3: request handle caching when caching directories
  cifs: Recognize SFU char/block devices created by Windows NFS server on Windows Server <<2012
  CIFS: New mount option for cifs.upcall namespace resolution
  smb/client: Prevent error pointer dereference
  fs/smb/client: implement chmod() for SMB3 POSIX Extensions
  smb: cached directories can be more than root file handle
  smb: client: fix use-after-free of signing key
  smb: client: Use str_yes_no() helper function
  smb: client: memcpy() with surrounding object base address
  cifs: Remove pre-historic unused CIFSSMBCopy
2024-11-22 21:54:14 -08:00
Linus Torvalds
e7675238b9 overlayfs updates for 6.13
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmc90jsACgkQEVvVyTe/
 1Wol0A//RhzFCG8geR7Grbptp40CUm9kVISvkr50mPBdvVk3jX9WvH9m/10qapGP
 tcGHSdHt+q5qabqutKLmQRiFbwpGEaBMaFOe7JH8na8xWvmSa3p7sJC5kLByS3rm
 D2F+cVx3Di7MTscz/Ma724bHdHOUO5RbDuMIcjp7uXRvaNWJ0uZg5xWlBKsNa3h8
 DbNSYi5ICihLYpUxI9NglHZ6iqcS2jHsUHSAw52/GJ2Zon1LAAmKoSn6s7hZ27ZJ
 f8Rv5fFuYmkRV7nYo/gjLY1gt7KXZFcfUtMT05yd7zcnqDayKEFXEiwI/Bz5fXZL
 HmZpOP4RV2M9B8HzhReVR/yG8gZaaUezX+aVQp7plZSc73GhMdFFd1bUyjgJ4Lzf
 C2BlBMWafc/Zc7a7r0+X5577i34nED8lGuVMEdYMtjSjstpzIP+1Wlzn2cGi4+5K
 VAb+kEravjP9ck7YrmbruRYfVhDaE37BDs4XML4S8gzcZgdaTcEMyGw1ifEhvPjA
 vLbRs24a5VO7/cKlks7PWS6i9uExaz7g4re0jUPwUuc+nS+Hv+y8kLSPqLS4CtNY
 MxhS2IhKK5gp1Z9XGpLsak+ancTYLSV0OJ15qsAChpqoqSG5Xd9Lt4CWACnF33Ea
 ny8z5QpOAHWVb97k6xaEvu/r0dl+PHdG7vfb0MNhXaajNF8SKiU=
 =pgoX
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs updates from Amir Goldstein:

 - Fix a syzbot reported NULL pointer deref with bfs lower layers

 - Fix a copy up failure of large file from lower fuse fs

 - Followup cleanup of backing_file API from Miklos

 - Introduction and use of revert/override_creds_light() helpers, that
   were suggested by Christian as a mitigation to cache line bouncing
   and false sharing of fields in overlayfs creator_cred long lived
   struct cred copy.

 - Store up to two backing file references (upper and lower) in an
   ovl_file container instead of storing a single backing file in
   file->private_data.

   This is used to avoid the practice of opening a short lived backing
   file for the duration of some file operations and to avoid the
   specialized use of FDPUT_FPUT in such occasions, that was getting in
   the way of Al's fd_file() conversions.

* tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: Filter invalid inodes with missing lookup function
  ovl: convert ovl_real_fdget() callers to ovl_real_file()
  ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
  ovl: store upper real file in ovl_file struct
  ovl: allocate a container struct ovl_file for ovl private context
  ovl: do not open non-data lower file for fsync
  ovl: Optimize override/revert creds
  ovl: pass an explicit reference of creators creds to callers
  ovl: use wrapper ovl_revert_creds()
  fs/backing-file: Convert to revert/override_creds_light()
  cred: Add a light version of override/revert_creds()
  backing-file: clean up the API
  ovl: properly handle large files in ovl_security_fileattr
2024-11-22 20:55:42 -08:00
Linus Torvalds
060fc106b6 unicode updates
This update includes:
 
   - A patch by Thomas Weißschuh constifying a read-only struct.
 
   - A patch by André Almeida fixing the error path of unicode_load,
   which might trigger a kernel oops if it fails to find the unicode
   module.
 
   - One documentation fix by Gan Jie, updating a filename in the README.
 
   - A patch by André Almeida adding the link of my tree to MAINTAINERS.
 
 All but the MAINTAINERS patch have been sitting on my tree and in
 linux-next since early in the 6.12 cycle.
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZ0D4JwAKCRBOwQgI3t8R
 JmjZAP988O9eB4ITF6KHKsHyY3pOhxSRXU5jpr78v7ofDDuGwAD/UBJZyF35wgJz
 S2q295kCAEP8bUKxj6RJtyMyQnamQg8=
 =irw2
 -----END PGP SIGNATURE-----

Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of unicode_load, avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
2024-11-22 20:50:55 -08:00
Linus Torvalds
980f8f8fd4 Summary
* sysctl ctl_table constification
 
   Constifying ctl_table structs prevents the modification of proc_handler
   function pointers. All ctl_table struct arguments are const qualified in the
   sysctl API in such a way that the ctl_table arrays being defined elsewhere
   and passed through sysctl can be constified one-by-one. We kick the
   constification off by qualifying user_table in kernel/ucount.c and expect all
   the ctl_tables to be constified in the coming releases.
 
 * Misc fixes
 
   Adjust comments in two places to better reflect the code. Remove superfluous
   dput calls. Remove Luis from sysctl maintainership. Replace comments about
   holding a lock with calls to lockdep_assert_held.
 
 * Testing
 
   All these went through 0-day and they have all been in linux-next for at
   least 1 month (since Oct-24). I also rand these through the sysctl selftest
   for x86_64.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmdAXMsACgkQupfNUreW
 QU/KfQv8Daq9sew98ohmS/lkdoE1dfpI72motzEn1993CbLjN2h3CZauaHjBPFnr
 rpr8qPrphdWTyDbDMgx63oxcNxM07g7a9H0y/K3IwdUsx7fGINgHF5kfWeVn09ov
 X8I3NuL/+xSHAZRsLQeBykbY6BD5e0uuxL6ayGzkejrgRd+80dmC3MzXqX207v1z
 rlrUFXEXwqKYgxP/H+pxmvmVWKAeFsQt/E49GOkg2qSg9mVFhtKpxHwMJVqS2a8u
 qAKHgcZhB5T8TQSb1eKnyCzXLDLpzqUBj9ejqJSsQm16fweawv221Ji6a1k53QYG
 chreoB9R8qCZ/jGoWI3ZKGRZ/Vl37l+GF/82X/sDrMbKwVlxvaERpb1KXrnh/D1v
 qNze1Eea0eYv22weGGEa3J5N2tKfgX6NcRFioDNe9VEXX6zDcAtJKTKZtbMB3gXX
 CzQicH5yXApyAk3aNCq0S3s+WRQR0syGAYCmtxhaRgXRnSu9qifKZ1XhZQyhgKIG
 Flt9MsU2
 =bOJ0
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:
 "sysctl ctl_table constification:

   - Constifying ctl_table structs prevents the modification of
     proc_handler function pointers. All ctl_table struct arguments are
     const qualified in the sysctl API in such a way that the ctl_table
     arrays being defined elsewhere and passed through sysctl can be
     constified one-by-one.

     We kick the constification off by qualifying user_table in
     kernel/ucount.c and expect all the ctl_tables to be constified in
     the coming releases.

  Misc fixes:

   - Adjust comments in two places to better reflect the code

   - Remove superfluous dput calls

   - Remove Luis from sysctl maintainership

   - Replace comments about holding a lock with calls to
     lockdep_assert_held"

* tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: Reduce dput(child) calls in proc_sys_fill_cache()
  sysctl: Reorganize kerneldoc parameter names
  ucounts: constify sysctl table user_table
  sysctl: update comments to new registration APIs
  MAINTAINERS: remove me from sysctl
  sysctl: Convert locking comments to lockdep assertions
  const_structs.checkpatch: add ctl_table
  sysctl: make internal ctl_tables const
  sysctl: allow registration of const struct ctl_table
  sysctl: move internal interfaces to const struct ctl_table
  bpf: Constify ctl_table argument of filter function
2024-11-22 20:36:11 -08:00
Linus Torvalds
563cb0b1e7 cxl changes for v6.13
- Constify range_contains() input parameters to prevent changes.
 - Add support for displaying RCD capabilities in sysfs to support lspci for CXL device.
 - Downgrade warning message to debug in cxl_probe_component_regs().
 - Add support for adding a printf specifier '$pra' to emit 'struct range' content.
   - Add sanity tests for 'struct resource'.
   - Add documentation for special case.
   - Add %pra for 'struct range'.
   - Add %pra usage in CXL code.
 - Add preparation code for DCD support
   - Add range_overlaps().
   - Add CDAT DSMAS table shared and read only flag in ACPICA.
   - Add documentation to 'struct dev_dax_range'.
   - Delay event buffer allocation in CXL PCI code until needed.
   - Use guard() in cxl_dpa_set_mode().
   - Refactor create region code to consolidate common code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmc84dMACgkQYGjFFmlT
 OEoGTg//cSJlQ9X7+xZDbngnzpJwcLzQkR/FXDfe3obtmgs7woDJgNNcYnKSlgyf
 wal47Q0UM/1Hv8Dtfrt62Ay1fmOvDL2GSpey35NVJGCEpIsfOqqk1zTCgfgwRHTO
 MZJLnOSFUIlDYlVz8ljLNHnNqPjr7dCoUh9tdBefvkw59FqbkHNcWI8hG1lh1SR4
 2frtJcqVg54S6vJa2eeWmNVpxz7RZvPFrb8TJzhdrGM8PkTMNFA2oJINAf0j00Ev
 8/T6HXTxXvFtNhBH0dtMO1MFh1d6Qr/zFnX/gmrnPWl1l/12HFDMBIZIzq/Whjpo
 +7hQ5xK3cwkMevFgFrAhwdZMj8maR84x1dbFItoThaoeDIQ4sGfyQEMPsbkZP/Sc
 67i5hQFIBZc+ORLB0W+z9Da52ZFGyVw/xsCmDRzXCw4s7N2twpydIoA7Pvu9NN1X
 3JVF35NrsRZ+PyuGWEitNjo0Rj6swNpBC5Xv/T1mgFtSgvVuk1T2QtSHJcPoQyzQ
 zbijsCKmvJYbdJBnPiotdrBs1BUxBsP9dBT9IxWzMy6lcEpTJrYpUheRCk2tSHFa
 Kk8O8IYNiBKZaSpN9UHKaGzr43H8gNbLf4svSIiu1lZJTSSdtWqfZZYjXFBgB1Vb
 l2gBCDmPJ0y7WKZSCa53UmQiOusr+l3Pi+OflZEfCy6JxbSqTTM=
 =GNlu
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dave Jiang:

 - Constify range_contains() input parameters to prevent changes

 - Add support for displaying RCD capabilities in sysfs to support lspci
   for CXL device

 - Downgrade warning message to debug in cxl_probe_component_regs()

 - Add support for adding a printf specifier '%pra' to emit 'struct
   range' content:
     - Add sanity tests for 'struct resource'
     - Add documentation for special case
     - Add %pra for 'struct range'
     - Add %pra usage in CXL code

 - Add preparation code for DCD support:
     - Add range_overlaps()
     - Add CDAT DSMAS table shared and read only flag in ACPICA
     - Add documentation to 'struct dev_dax_range'
     - Delay event buffer allocation in CXL PCI code until needed
     - Use guard() in cxl_dpa_set_mode()
     - Refactor create region code to consolidate common code

* tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/region: Refactor common create region code
  cxl/hdm: Use guard() in cxl_dpa_set_mode()
  cxl/pci: Delay event buffer allocation
  dax: Document struct dev_dax_range
  ACPI/CDAT: Add CDAT/DSMAS shared and read only flag values
  range: Add range_overlaps()
  cxl/cdat: Use %pra for dpa range outputs
  printf: Add print format (%pra) for struct range
  Documentation/printf: struct resource add start == end special case
  test printf: Add very basic struct resource tests
  cxl: downgrade a warning message to debug level in cxl_probe_component_regs()
  cxl/pci: Add sysfs attribute for CXL 1.1 device link status
  cxl/core/regs: Add rcd_pcie_cap initialization
  kernel/range: Const-ify range_contains parameters
2024-11-22 12:33:52 -08:00
Linus Torvalds
40f48f82a1 configfs updates for Linux 6.13
- remove unused code (Dr. David Alan Gilbert)
  - improve item creation performance (Seamus Connor)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmc8w7wLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNInBAAw8SDXF/rsJPic44XqRr4+nSgFif0Bhd0l8Xu1EZl
 ujE6EDGpbpk35CPK2cqhCk6tboFDayNeACWur52/NHxm+fO/gQq22LZU618HFtxJ
 QgS40naf6PczCY3SgEe4pOp/KlpGiO7y/FbgGYF2mVsTXpLzFq6ZC/RkAc3iir5E
 VUrnXRfsw0B7uZ/ndS+kBh6q0wTHkGqppi8rf4IpK051b/b5boFG+Kn0UpwK0iLX
 47W7mwMgamAOWXHTHRxb/1dL7w5Q7YHvEqD39yHgs1iS+B9ooLZxSkdrkaigjWTM
 q6VQfemq9UQuDfbuAH8RKTbsopTxAUbPok/ioFe8SCiqbc/3CiU6ejQoIIWEh7Gy
 WUvD1ObMQKyP1T+rzblIzg1IU8XxddDjE/qpwZxhAKSmiihh6roRfjtTXmIGJdD4
 TbwaqIOROSFCHFSbjLraXZ2YqE0OznIT01ZaV9SYEfi4AdbbVyy7bt8YOGX6zd36
 Z73Z6G/t+ksL252KHQqtbUOTz2S3bVI3RePTqD6tl+J4sXLIR8CVC5zYn2z+gY8I
 0fDBGt1Ur6XHOf/RGAuLo5PFo9xRntgKKPIMjTah4BFYAhIGhPGlxiX0l/XeokuX
 3z/3zPJNX82o7QeI8bGo4qYz3oAs7PDsK4xmW2m6Ho8xXKVzLoohRQe8Aiex8o4o
 zQg=
 =nC4V
 -----END PGP SIGNATURE-----

Merge tag 'configfs-6.13-2024-11-19' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:

 - remove unused code (Dr. David Alan Gilbert)

 - improve item creation performance (Seamus Connor)

* tag 'configfs-6.13-2024-11-19' of git://git.infradead.org/users/hch/configfs:
  configfs: improve item creation performance
  configfs: remove unused configfs_hash_and_remove
2024-11-21 11:26:38 -08:00
Linus Torvalds
fc39fb5691 A few more patches to add sanity checks in jfs
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmc/WUoACgkQNqiEXrVA
 jGRbDQ/8DEhFbtbr0ugUJmStEfn+8oBXhsKhYZE1Y+U69SZT924Vn1gfbhxIBH0d
 W1aKcRVvmP0ASfplQemIyiB9sY4Nxd4CUlFD3RqycFR0fAyGIb7XccpkJBhms5/J
 itscgATuE8u5wms3CDByAiEj3kMu8Luf+jDTFVB2SNP0xPeqsqluXtTQwEFHKAmS
 4IhbAH6hQ9E+mqufID8WjhLyKZLhCe8PQ4ccsenNZo8km4xDAv5m0CZD1Ks6k87b
 3n6iG6Wk+5gjzGC/vGOJJIX3eZchMFhCwG+SkJ/7UKT475oKYDjgflpO2yVXWjXz
 mGTFdTLg+wNH+Jh9/nUNCN/2KsFAIx2wcaFmnLC7fmCJFKjVEIQV5Pm0eGi+vdt5
 hU/onwn8rlPeKD8OIh4zKGMzDZj2uX825lfVXlYJP64UE/rOPkJgd7MH8TtNiFlu
 lpxkhM/xpCMfyShUHMWe38C2VwjqMNhW5Mz2EoT9OqSFsTBXTpeoPZOFG+iiJ1Gc
 AFWYM9UrofVj4/bSOOVfDMGSqx8dGfGc/QtEDwvKmTsaZ9imfxDRGmQyTwwuBKmi
 EyuKUjwvIQmF8Z/fqLov+0JpW0XsJLmpVJJ3t451sJgzwrffob0afYrMsW/e7a+f
 zHDz+WSJBKYsb8Wq9owmY29oHGtaoNQ3IguxnbFmD2pkNNnmg/g=
 =V1oj
 -----END PGP SIGNATURE-----

Merge tag 'jfs-6.13' of github.com:kleikamp/linux-shaggy

Pull jfs updates from Dave Kleikamp:
 "A few more patches to add sanity checks in jfs"

* tag 'jfs-6.13' of github.com:kleikamp/linux-shaggy:
  jfs: add a check to prevent array-index-out-of-bounds in dbAdjTree
  jfs: xattr: check invalid xattr size more strictly
  jfs: fix array-index-out-of-bounds in jfs_readdir
  jfs: fix shift-out-of-bounds in dbSplit
  jfs: array-index-out-of-bounds fix in dtReadFirst
2024-11-21 09:59:59 -08:00
Linus Torvalds
6a550ae556 dlm for 6.13
- Fix recovery of locks that are being converted between PR/CW modes.
 - Fix cleanup of rsb list if recovery is interrupted during recover_members.
 - Fix null dereference in debug code if dlm api is called improperly.
 - Fix wrong args passed to trace function.
 - Move error checks out of add_to_waiters so the function can't fail.
 - Clean up some code for configfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEcGkeEvkvjdvlR90nOBtzx/yAaaoFAmc/VzgACgkQOBtzx/yA
 aapoARAA2PziTYUrYOPkvGAhFb78Sa64s2ECL0EGJTaVMzwniNp3lIL4T6Pr7d3Y
 nSM+Kg/HMEXT5lC9jJHMfGrOLoUicfWlBPEmoWDe+eNvGYBQVXSFoKnOEATMoHx3
 OaH+UwbO1sXW13I0mflRTM/T4cTJr8BfaHsCf2yvp68VcZDXLo/HcbMX5AYxc1er
 7xOeCcEUOnAGtb4hG4Dm/PaFKcsC1k1GL/ePq/OgiEKVYNeI4k6MWmbpULjiZd7y
 kd6SfFqb+V/YTXWMTuheQ04eBxvMovzXjhGpxQLWyofOsij8lTjndHVm8AkZCMRo
 oqa40dh1GzN6llvsIMJbuXfvp7aQR9EcYU4jc5FLx1rLH0KJzbmKursOlvcV7oRn
 qBWixB3fUN8as47od86vpZIwkW/GTn7UTrrtu602lyp0pjhdq0DErYFwJuoYStti
 Q0Txf7S0mxqQ5jNvMEk3Rx0sV6zuwu2yq/QU5yiUa9eVUrukUuIBpdbHLdpXMgRR
 SDAENw70PsHobW29i9ipZnttNJ3DrG61beYdizLxMxfuaRK+532i8GETIW83wn0o
 Aj/B5EZPB8z8pbw0PvDFKDK4VzuVHkCBMVfdOGS8RgRVtYiyfyC2QJ+u7FQfu2U7
 LXvYqmFDH3VqqAYuYwMGpx1Iz9RgFipZCmauFyIdFmLL6ClWDp4=
 =5K+Y
 -----END PGP SIGNATURE-----

Merge tag 'dlm-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:

 - Fix recovery of locks that are being converted between PR/CW modes

 - Fix cleanup of rsb list if recovery is interrupted during
   recover_members

 - Fix null dereference in debug code if dlm api is called improperly

 - Fix wrong args passed to trace function

 - Move error checks out of add_to_waiters so the function can't fail

 - Clean up some code for configfs

* tag 'dlm-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix dlm_recover_members refcount on error
  dlm: fix recovery of middle conversions
  dlm: make add_to_waiters() that it can't fail
  dlm: dlm_config_info config fields to unsigned int
  dlm: use dlm_config as only cluster configuration
  dlm: handle port as __be16 network byte order
  dlm: disallow different configs nodeid storages
  dlm: fix possible lkb_resource null dereference
  dlm: fix swapped args sb_flags vs sb_status
2024-11-21 09:57:46 -08:00
Linus Torvalds
2dde263d81 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WikACgkQnJ2qBz9k
 QNnZdwf9FfT95zhnNWk3ohNOh5BO0P/uTY2fNkQBDPLPY3Bi8nywPIjXYCDSOgX1
 SBV0rakkWp+rVO1/qkg5J1mUvBoefzT7O17rG0LfRw3zjHPX+XeO+e3Xf/kPmJHJ
 3fvN//VTZQ6uPcn8PWgLe8VVQqNXD3nlUrwz/JKaxyodsdm0ERej4QZjG6Cikotk
 aKuDPAnOiS37/lIFZGdJRca/rwJPwMekNt1SxVrnmin0/QfB/Uubba2+NNdQ+z3W
 SCA/26PK822T3ELB8BkfwpdINC17WUwDJlkC8qha/JRzDlxJC/ysr43fHn/7Adfb
 CthG8V4JDGm51jcC0qe0Yk2HV75U4A==
 =htHs
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "A couple of smaller random fsnotify fixes"

* tag 'fsnotify_for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Fix ordering of iput() and watched_objects decrement
  fsnotify: fix sending inotify event with unexpected filename
  fanotify: allow reporting errors on failure to open fd
  fsnotify, lsm: Decouple fsnotify from lsm
2024-11-21 09:55:45 -08:00
Linus Torvalds
c01f664e4c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WXkACgkQnJ2qBz9k
 QNnwjAf/c8K3Vhw9RuKMtPF0K+gC//0mLsq+WmgrtXfMLvbSymrACnwHFJzpNGeS
 iEqCYlCC7vlqzPXpsVRlFeHpM52oVnE/wFF0Hp1h/Y1oqbRSzur6iSl4epmmBN+K
 AsPoWEXco7ABqtrhoZb0b1n7io9VorHN4nLhO6KWD83nZAawJDWgSw0sNCqcT6to
 vVxR3baP/EhONxNquxXe2lxq26dMilehmTk4AOyYslNYb0iG4r18TPyNb7fmuuKG
 M+nFfMnM9EPH8lnmgx6Mg/X77d/eZoq4pMRmeqSsroB5k/AQJnNrGweNL1+yr7OY
 adWNOMGWdNNQXPFgGbL5yZwNZ64kRA==
 =Eq1B
 -----END PGP SIGNATURE-----

Merge tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull reiserfs removal from Jan Kara:
 "The deprecation period of reiserfs is ending at the end of this year
  so it is time to remove it"

* tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: The last commit
2024-11-21 09:50:18 -08:00
Linus Torvalds
597861d6cd \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WPcACgkQnJ2qBz9k
 QNkRXwgAlN2gbf1O2Df3NFl7CXt5jXhlvQ9UdA77F/j1YFL90kieVrHvQY1yManO
 +Z36nwWPTaigR7COuF9oFpCZsp6H1YY8yeqlATE+kZBFqSFQjFP8jpgZ/1JObs0M
 C2DeNibrhYm81FJhnetXyTnXifUv20Fa7xwHv5ULtZjPF0MY6ry4zqy+t4fE68Uq
 OhGyjwrTW2k1Dv36Nx6CAt3KOWvM5KPHVBpMu2frSRwNB7XBEfmfeCjqISjYccPl
 WzHsEQof7Re14GZOa9M6G/efLdXAiwFJF/w2BFCqcijLpjbBrHheEH8N9WBWoATd
 wJLt94PRTK2uJqc3jQ3tANR3gQO9Xg==
 =XneM
 -----END PGP SIGNATURE-----

Merge tag 'for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota and isofs updates from Jan Kara:
 "Fix a memory leak in isofs and a cleanup of includes in quota"

* tag 'for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  dquot.c: get rid of include ../internal.h
  isofs: avoid memory leak in iocharset
2024-11-21 09:47:44 -08:00
Linus Torvalds
2edc8f933d New xfs code for 6.13
* convert perag to use xarrays
 * create a new generic allocation group structure
 * Add metadata inode dir trees
 * Create in-core rt allocation groups
 * Shard the RT section into allocation groups
 * Persist quota options with the enw metadata dir tree
 * Enable quota for RT volumes
 * Enable metadata directory trees
 * Some bugfixes
 
 Signed-off-by: Carlos Maiolino <cem@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZzyNwAAKCRBGdaER5Qtf
 psV3AYCncK/pVhFfKQSFbnCvgPSoAe7N9n0Wt5gmjy0Ill2mbQXVl9ADXkH6a015
 gcGM3t4BgIHLJQndL/Uz+3a0L5IriEb9QkAfzmx8t3vjiRBzBe3WfywEx9Yt7kZe
 xbxEJ2HQpA==
 =3ngC
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:
 "The bulk of this pull request is a major rework that Darrick and
  Christoph have been doing on XFS's real-time volume, coupled with a
  few features to support this rework. It does also includes some bug
  fixes.

   - convert perag to use xarrays

   - create a new generic allocation group structure

   - add metadata inode dir trees

   - create in-core rt allocation groups

   - shard the RT section into allocation groups

   - persist quota options with the enw metadata dir tree

   - enable quota for RT volumes

   - enable metadata directory trees

   - some bugfixes"

* tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (146 commits)
  xfs: port ondisk structure checks from xfs/122 to the kernel
  xfs: separate space btree structures in xfs_ondisk.h
  xfs: convert struct typedefs in xfs_ondisk.h
  xfs: enable metadata directory feature
  xfs: enable realtime quota again
  xfs: update sb field checks when metadir is turned on
  xfs: reserve quota for realtime files correctly
  xfs: create quota preallocation watermarks for realtime quota
  xfs: report realtime block quota limits on realtime directories
  xfs: persist quota flags with metadir
  xfs: advertise realtime quota support in the xqm stat files
  xfs: scrub quota file metapaths
  xfs: fix chown with rt quota
  xfs: use metadir for quota inodes
  xfs: refactor xfs_qm_destroy_quotainos
  xfs: use rtgroup busy extent list for FITRIM
  xfs: implement busy extent tracking for rtgroups
  xfs: port the perag discard code to handle generic groups
  xfs: move the min and max group block numbers to xfs_group
  xfs: adjust min_block usage in xfs_verify_agbno
  ...
2024-11-21 09:20:07 -08:00
Linus Torvalds
90a19b744d Changes since last update:
- Add SEEK_{DATA,HOLE} support;
 
  - Free redundant pclusters if no cached compressed data is valid;
 
  - Add sysfs entry to drop internal caches;
 
  - Several bugfixes & cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmc8nosRHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qoxDg/+JvYJjf0xXzCrSaIoSq7Xsfw0d1VLMHqP
 /ZSNCvixoUnkVouPNSR2HaBlX7xhiVSXaaAFuzAOM3Zf7UO/delVcwQNMeYYNTC2
 Wk3ndMZR0C7jP6CpSRgU7of/KOO+W+8levFMJa/6TJCKb3IKgqrU894aiDNEFYMN
 LaZjP42rFmX0MLahAt5byqbKWMfr7YrGkcCxp8agQ3wFe+ssRXO0jU8MEflDk0zV
 5Ar09HxzVSt5MRdvByPhcE66A4RBmdc71o77mN7uCFDxE8MdOxUjgyKBtcstYfUV
 A2ewHhTNt5zt9QaEA1NBK8mzhj2oAHjO8DsIZYgCyARLBvUk8RMeu6LGeu339Qjs
 n8nqpAg6v3rSCGVygBWNqovXaCzynhp3pt1Jh2463BAJXQc71pksPBXxbbzJgQVs
 O7pNJ1H+uQku/B0FfUCOAn1QgInWoGXMalZsmbh7ar9p0XLiVbXtNtUrrXtf0Ehd
 Px5OGmx3Cm3SwyGS8X18o4PKN3fAwc6Ff/C0EbJS+yl0lI0uN4IoVHywhabg9Njx
 TstfHBwz35emO6jI11XkfPgbrRLpwehWWTSu5yLTEpOC8jhsYXtUIEB7KgIz0llG
 IyFFWpnILUxEDkHvQoluHmTc9sACAAAG27anIIao5gIvhua43N0rGeara4BtD+Ni
 J3ZgND6yFA4=
 =CRuc
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "There is no outstanding feature for this cycle. The most useful
  changes are SEEK_{DATA,HOLE} support and some decompression
  micro-optimization. Other than those, there are some bugfixes and
  cleanups as usual:

   - Add SEEK_{DATA,HOLE} support

   - Free redundant pclusters if no cached compressed data is valid

   - Add sysfs entry to drop internal caches

   - Several bugfixes & cleanups"

* tag 'erofs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: handle NONHEAD !delta[1] lclusters gracefully
  erofs: clarify direct I/O support
  erofs: fix blksize < PAGE_SIZE for file-backed mounts
  erofs: get rid of `buf->kmap_type`
  erofs: fix file-backed mounts over FUSE
  erofs: simplify definition of the log functions
  erofs: add sysfs node to drop internal caches
  erofs: free pclusters if no cached folio is attached
  erofs: sunset `struct erofs_workgroup`
  erofs: move erofs_workgroup operations into zdata.c
  erofs: get rid of erofs_{find,insert}_workgroup
  erofs: add SEEK_{DATA,HOLE} support
2024-11-21 09:17:33 -08:00
Paul Aurich
a9685b409a smb: prevent use-after-free due to open_cached_dir error paths
If open_cached_dir() encounters an error parsing the lease from the
server, the error handling may race with receiving a lease break,
resulting in open_cached_dir() freeing the cfid while the queued work is
pending.

Update open_cached_dir() to drop refs rather than directly freeing the
cfid.

Have cached_dir_lease_break(), cfids_laundromat_worker(), and
invalidate_all_cached_dirs() clear has_lease immediately while still
holding cfids->cfid_list_lock, and then use this to also simplify the
reference counting in cfids_laundromat_worker() and
invalidate_all_cached_dirs().

Fixes this KASAN splat (which manually injects an error and lease break
in open_cached_dir()):

==================================================================
BUG: KASAN: slab-use-after-free in smb2_cached_lease_break+0x27/0xb0
Read of size 8 at addr ffff88811cc24c10 by task kworker/3:1/65

CPU: 3 UID: 0 PID: 65 Comm: kworker/3:1 Not tainted 6.12.0-rc6-g255cf264e6e5-dirty #87
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Workqueue: cifsiod smb2_cached_lease_break
Call Trace:
 <TASK>
 dump_stack_lvl+0x77/0xb0
 print_report+0xce/0x660
 kasan_report+0xd3/0x110
 smb2_cached_lease_break+0x27/0xb0
 process_one_work+0x50a/0xc50
 worker_thread+0x2ba/0x530
 kthread+0x17c/0x1c0
 ret_from_fork+0x34/0x60
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 2464:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0xaa/0xb0
 open_cached_dir+0xa7d/0x1fb0
 smb2_query_path_info+0x43c/0x6e0
 cifs_get_fattr+0x346/0xf10
 cifs_get_inode_info+0x157/0x210
 cifs_revalidate_dentry_attr+0x2d1/0x460
 cifs_getattr+0x173/0x470
 vfs_statx_path+0x10f/0x160
 vfs_statx+0xe9/0x150
 vfs_fstatat+0x5e/0xc0
 __do_sys_newfstatat+0x91/0xf0
 do_syscall_64+0x95/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 2464:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x51/0x70
 kfree+0x174/0x520
 open_cached_dir+0x97f/0x1fb0
 smb2_query_path_info+0x43c/0x6e0
 cifs_get_fattr+0x346/0xf10
 cifs_get_inode_info+0x157/0x210
 cifs_revalidate_dentry_attr+0x2d1/0x460
 cifs_getattr+0x173/0x470
 vfs_statx_path+0x10f/0x160
 vfs_statx+0xe9/0x150
 vfs_fstatat+0x5e/0xc0
 __do_sys_newfstatat+0x91/0xf0
 do_syscall_64+0x95/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Last potentially related work creation:
 kasan_save_stack+0x33/0x60
 __kasan_record_aux_stack+0xad/0xc0
 insert_work+0x32/0x100
 __queue_work+0x5c9/0x870
 queue_work_on+0x82/0x90
 open_cached_dir+0x1369/0x1fb0
 smb2_query_path_info+0x43c/0x6e0
 cifs_get_fattr+0x346/0xf10
 cifs_get_inode_info+0x157/0x210
 cifs_revalidate_dentry_attr+0x2d1/0x460
 cifs_getattr+0x173/0x470
 vfs_statx_path+0x10f/0x160
 vfs_statx+0xe9/0x150
 vfs_fstatat+0x5e/0xc0
 __do_sys_newfstatat+0x91/0xf0
 do_syscall_64+0x95/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The buggy address belongs to the object at ffff88811cc24c00
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
 freed 1024-byte region [ffff88811cc24c00, ffff88811cc25000)

Cc: stable@vger.kernel.org
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:50 -06:00
Paul Aurich
7afb867336 smb: Don't leak cfid when reconnect races with open_cached_dir
open_cached_dir() may either race with the tcon reconnection even before
compound_send_recv() or directly trigger a reconnection via
SMB2_open_init() or SMB_query_info_init().

The reconnection process invokes invalidate_all_cached_dirs() via
cifs_mark_open_files_invalid(), which removes all cfids from the
cfids->entries list but doesn't drop a ref if has_lease isn't true. This
results in the currently-being-constructed cfid not being on the list,
but still having a refcount of 2. It leaks if returned from
open_cached_dir().

Fix this by setting cfid->has_lease when the ref is actually taken; the
cfid will not be used by other threads until it has a valid time.

Addresses these kmemleaks:

unreferenced object 0xffff8881090c4000 (size 1024):
  comm "bash", pid 1860, jiffies 4295126592
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de  ........".......
    00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff  ..E"......O.....
  backtrace (crc 6f58c20f):
    [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350
    [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0
    [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50
    [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0
    [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200
    [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0
    [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881044fdcf8 (size 8):
  comm "bash", pid 1860, jiffies 4295126592
  hex dump (first 8 bytes):
    00 cc cc cc cc cc cc cc                          ........
  backtrace (crc 10c106a9):
    [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480
    [<ffffffff8b7d7256>] kstrdup+0x36/0x60
    [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0
    [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50
    [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0
    [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200
    [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0
    [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

And addresses these BUG splats when unmounting the SMB filesystem:

BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100
Modules linked in:
CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty #49
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:umount_check+0xd0/0x100
Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41
RSP: 0018:ffff88811cc27978 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3
R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08
R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0
FS:  00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 d_walk+0x6a/0x530
 shrink_dcache_for_umount+0x6a/0x200
 generic_shutdown_super+0x52/0x2a0
 kill_anon_super+0x22/0x40
 cifs_kill_sb+0x159/0x1e0
 deactivate_locked_super+0x66/0xe0
 cleanup_mnt+0x140/0x210
 task_work_run+0xfb/0x170
 syscall_exit_to_user_mode+0x29f/0x2b0
 do_syscall_64+0xa1/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f23bfb93ae7
Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050
RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0
RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610
R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000
 </TASK>
irq event stamp: 1163486
hardirqs last  enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60
hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0
softirqs last  enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990
softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0
---[ end trace 0000000000000000 ]---

VFS: Busy inodes after unmount of cifs (cifs)
------------[ cut here ]------------
kernel BUG at fs/super.c:661!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G        W          6.12.0-rc4-g850925a8133c-dirty #49
Tainted: [W]=WARN
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:generic_shutdown_super+0x290/0x2a0
Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90
RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246
RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319
R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0
R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 kill_anon_super+0x22/0x40
 cifs_kill_sb+0x159/0x1e0
 deactivate_locked_super+0x66/0xe0
 cleanup_mnt+0x140/0x210
 task_work_run+0xfb/0x170
 syscall_exit_to_user_mode+0x29f/0x2b0
 do_syscall_64+0xa1/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f23bfb93ae7
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:generic_shutdown_super+0x290/0x2a0
Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90
RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246
RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319
R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0
R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0

This reproduces eventually with an SMB mount and two shells running
these loops concurrently

- while true; do
      cd ~; sleep 1;
      for i in {1..3}; do cd /mnt/test/subdir;
          echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1;
      done;
      echo ...;
  done
- while true; do
      iptables -F OUTPUT; mount -t cifs -a;
      for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done;
      iptables -I OUTPUT -p tcp --dport 445 -j DROP;
      sleep 10
      echo "unmounting"; umount -l -t cifs -a; echo "done unmounting";
      sleep 20
      echo "recovering"; iptables -F OUTPUT;
      sleep 10;
  done

Fixes: ebe98f1447 ("cifs: enable caching of directories for which a lease is held")
Fixes: 5c86919455 ("smb: client: fix use-after-free in smb2_query_info_compound()")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:50 -06:00
Paulo Alcantara
0812340811 smb: client: handle max length for SMB symlinks
We can't use PATH_MAX for SMB symlinks because

  (1) Windows Server will fail FSCTL_SET_REPARSE_POINT with
      STATUS_IO_REPARSE_DATA_INVALID when input buffer is larger than
      16K, as specified in MS-FSA 2.1.5.10.37.

  (2) The client won't be able to parse large SMB responses that
      includes SMB symlink path within SMB2_CREATE or SMB2_IOCTL
      responses.

Fix this by defining a maximum length value (4060) for SMB symlinks
that both client and server can handle.

Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:50 -06:00
Paulo Alcantara
9f544d26b1 smb: client: get rid of bounds check in SMB2_ioctl_init()
smb2_set_next_command() no longer squashes request iovs into a single
iov, so the bounds check can be dropped.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:50 -06:00
Paulo Alcantara
bc925c1216 smb: client: improve compound padding in encryption
After commit f7f291e14d ("cifs: fix oops during encryption"), the
encryption layer can handle vmalloc'd buffers as well as kmalloc'd
buffers, so there is no need to inefficiently squash request iovs
into a single one to handle padding in compound requests.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:49 -06:00
Steve French
9ed9d83a51 smb3: request handle caching when caching directories
This client was only requesting READ caching, not READ and HANDLE caching
in the LeaseState on the open requests we send for directories.  To
delay closing a handle (e.g. for caching directory contents) we should
be requesting HANDLE as well as READ (as we already do for deferred
close of files).   See MS-SMB2 3.3.1.4 e.g.

Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:45:15 -06:00
Pali Rohár
0d6b0d2e38 cifs: Recognize SFU char/block devices created by Windows NFS server on Windows Server <<2012
Windows NFS server versions on Windows Server older than 2012 release use
for storing char and block devices modified SFU format, not compatible with
the original SFU. Windows NFS server on Windows Server 2012 and new
versions use different format (reparse points), not related to SFU-style.

SFU / SUA / Interix subsystem stores the major and major numbers as pair of
64-bit integer, but Windows NFS server stores as pair of 32-bit integers.

Which makes char and block devices between Windows NFS server <<2012 and
Windows SFU/SUA/Interix subsytem incompatible.

So improve Linux SMB client.

When SFU mode is enabled (mount option -o sfu is specified) then recognize
also these kind of char and block devices and its major and minor numbers,
which are used by Windows Server versions older than 2012.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:44:29 -06:00
Ritvik Budhiraja
db363b0a1d CIFS: New mount option for cifs.upcall namespace resolution
In the current implementation, the SMB filesystem on a mount point can
trigger upcalls from the kernel to the userspace to enable certain
functionalities like spnego, dns_resolution, amongst others. These upcalls
usually either happen in the context of the mount or in the context of an
application/user. The upcall handler for cifs, cifs.upcall already has
existing code which switches the namespaces to the caller's namespace
before handling the upcall. This behaviour is expected for scenarios like
multiuser mounts, but might not cover all single user scenario with
services such as Kubernetes, where the mount can happen from different
locations such as on the host, from an app container, or a driver pod
which does the mount on behalf of a different pod.

This patch introduces a new mount option called upcall_target, to
customise the upcall behaviour. upcall_target can take 'mount' and 'app'
as possible values. This aids use cases like Kubernetes where the mount
happens on behalf of the application in another container altogether.
Having this new mount option allows the mount command to specify where the
upcall should happen: 'mount' for resolving the upcall to the host
namespace, and 'app' for resolving the upcall to the ns of the calling
thread. This will enable both the scenarios where the Kerberos credentials
can be found on the application namespace or the host namespace to which
just the mount operation is "delegated".

Reviewed-by: Shyam Prasad <shyam.prasad@microsoft.com>
Reviewed-by: Bharath S M <bharathsm@microsoft.com>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:44:03 -06:00
Dan Carpenter
7a2158b73c smb/client: Prevent error pointer dereference
The cifs_sb_tlink() function can return error pointers, but this code
dereferences it before checking for error pointers.  Re-order the code
to fix that.

Fixes: 0f9b6b045bb2 ("fs/smb/client: implement chmod() for SMB3 POSIX Extensions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:43:23 -06:00
Ralph Boehme
d413eabff1 fs/smb/client: implement chmod() for SMB3 POSIX Extensions
The NT ACL format for an SMB3 POSIX Extensions chmod() is a single ACE with the
magic S-1-5-88-3-mode SID:

  NT Security Descriptor
      Revision: 1
      Type: 0x8004, Self Relative, DACL Present
      Offset to owner SID: 56
      Offset to group SID: 124
      Offset to SACL: 0
      Offset to DACL: 20
      Owner: S-1-5-21-3177838999-3893657415-1037673384-1000
      Group: S-1-22-2-1000
      NT User (DACL) ACL
          Revision: NT4 (2)
          Size: 36
          Num ACEs: 1
          NT ACE: S-1-5-88-3-438, flags 0x00, Access Allowed, mask 0x00000000
              Type: Access Allowed
              NT ACE Flags: 0x00
              Size: 28
              Access required: 0x00000000
              SID: S-1-5-88-3-438

Owner and Group should be NULL, but the server is not required to fail the
request if they are present.

Signed-off-by: Ralph Boehme <slow@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:43:01 -06:00
Paul Aurich
128630e1db smb: cached directories can be more than root file handle
Update this log message since cached fids may represent things other
than the root of a mount.

Fixes: e4029e0726 ("cifs: find and use the dentry for cached non-root directories also")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-21 10:41:35 -06:00
Linus Torvalds
fcc79e1714 Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
 behavior is disabled by default, regression risk should be contained.
 
 Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
 default value from PTP_1588_CLOCK_KVM, as the first is intended to be
 a more reliable replacement for the latter.
 
 Core
 ----
 
  - Started a very large, in-progress, effort to make the RTNL lock
    scope per network-namespace, thus reducing the lock contention
    significantly in the containerized use-case, comprising:
    - RCU-ified some relevant slices of the FIB control path
    - introduce basic per netns locking helpers
    - namespacified the IPv4 address hash table
    - remove rtnl_register{,_module}() in favour of rtnl_register_many()
    - refactor rtnl_{new,del,set}link() moving as much validation as
      possible out of RTNL lock
    - convert all phonet doit() and dumpit() handlers to RCU
    - convert IPv4 addresses manipulation to per-netns RTNL
    - convert virtual interface creation to per-netns RTNL
    the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
    knob, disabled by default ad interim.
 
  - Introduce NAPI suspension, to efficiently switching between busy
    polling (NAPI processing suspended) and normal processing.
 
  - Migrate the IPv4 routing input, output and control path from direct
    ToS usage to DSCP macros. This is a work in progress to make ECN
    handling consistent and reliable.
 
  - Add drop reasons support to the IPv4 rotue input path, allowing
    better introspection in case of packets drop.
 
  - Make FIB seqnum lockless, dropping RTNL protection for read
    access.
 
  - Make inet{,v6} addresses hashing less predicable.
 
  - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
    and timestamps
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add small file operations for debugfs, to reduce the struct ops size.
 
  - Refactoring and optimization for the implementation of page_frag API,
    This is a preparatory work to consolidate the page_frag
    implementation.
 
 Netfilter
 ---------
 
  - Optimize set element transactions to reduce memory consumption
 
  - Extended netlink error reporting for attribute parser failure.
 
  - Make legacy xtables configs user selectable, giving users
    the option to configure iptables without enabling any other config.
 
  - Address a lot of false-positive RCU issues, pointed by recent
    CI improvements.
 
 BPF
 ---
 
  - Put xsk sockets on a struct diet and add various cleanups. Overall,
    this helps to bump performance by 12% for some workloads.
 
  - Extend BPF selftests to increase coverage of XDP features in
    combination with BPF cpumap.
 
  - Optimize and homogenize bpf_csum_diff helper for all archs and also
    add a batch of new BPF selftests for it.
 
  - Extend netkit with an option to delegate skb->{mark,priority}
    scrubbing to its BPF program.
 
  - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
    programs.
 
 Protocols
 ---------
 
  - Introduces 4-tuple hash for connected udp sockets, speeding-up
    significantly connected sockets lookup.
 
  - Add a fastpath for some TCP timers that usually expires after close,
    the socket lock contention.
 
  - Add inbound and outbound xfrm state caches to speed up state lookups.
 
  - Avoid sending MPTCP advertisements on stale subflows, reducing
    risks on loosing them.
 
  - Make neighbours table flushing more scalable, maintaining per device
    neigh lists.
 
 Driver API
 ----------
 
  - Introduce a unified interface to configure transmission H/W shaping,
    and expose it to user-space via generic-netlink.
 
  - Add support for per-NAPI config via netlink. This makes napi
    configuration persistent across queues removal and re-creation.
    Requires driver updates, currently supported drivers are:
    nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
 
  - Add ethtool support for writing SFP / PHY firmware blocks.
 
  - Track RSS context allocation from ethtool core.
 
  - Implement support for mirroring to DSA CPU port, via TC mirror
    offload.
 
  - Consolidate FDB updates notification, to avoid duplicates on
    device-specific entries.
 
  - Expose DPLL clock quality level to the user-space.
 
  - Support master-slave PHY config via device tree.
 
 Tests and tooling
 -----------------
 
  - forwarding: introduce deferred commands, to simplify
    the cleanup phase
 
 Drivers
 -------
 
  - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
    Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
    IRQs and queues to NAPI IDs, allowing busy polling and better
    introspection.
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - mlx5:
        - a large refactor to implement support for cross E-Switch
          scheduling
        - refactor H/W conter management to let it scale better
        - H/W GRO cleanups
    - Intel (100G, ice)::
      - adds support for ethtool reset
      - implement support for per TX queue H/W shaping
    - AMD/Solarflare:
      - implement per device queue stats support
    - Broadcom (bnxt):
      - improve wildcard l4proto on IPv4/IPv6 ntuple rules
    - Marvell Octeon:
      - Adds representor support for each Resource Virtualization Unit
        (RVU) device.
    - Hisilicon:
      - adds support for the BMC Gigabit Ethernet
    - IBM (EMAC):
      - driver cleanup and modernization
    - Cisco (VIC):
      - raise the queues number limit to 256
 
  - Ethernet virtual:
    - Google vNIC:
      - implements page pool support
    - macsec:
      - inherit lower device's features and TSO limits when offloading
    - virtio_net:
      - enable premapped mode by default
      - support for XDP socket(AF_XDP) zerocopy TX
    - wireguard:
      - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
        packets.
 
  - Ethernet NICs embedded and virtual:
    - Broadcom ASP:
      - enable software timestamping
    - Freescale:
      - add enetc4 PF driver
    - MediaTek: Airoha SoC:
      - implement BQL support
    - RealTek r8169:
      - enable TSO by default on r8168/r8125
      - implement extended ethtool stats
    - Renesas AVB:
      - enable TX checksum offload
    - Synopsys (stmmac):
      - support header splitting for vlan tagged packets
      - move common code for DWMAC4 and DWXGMAC into a separate FPE
        module.
      - Add the dwmac driver support for T-HEAD TH1520 SoC
    - Synopsys (xpcs):
      - driver refactor and cleanup
    - TI:
      - icssg_prueth: add VLAN offload support
    - Xilinx emaclite:
      - adds clock support
 
  - Ethernet switches:
    - Microchip:
      - implement support for the lan969x Ethernet switch family
      - add LAN9646 switch support to KSZ DSA driver
 
  - Ethernet PHYs:
    - Marvel: 88q2x: enable auto negotiation
    - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
 
  - PTP:
    - Add support for the Amazon virtual clock device
    - Add PtP driver for s390 clocks
 
  - WiFi:
    - mac80211
      - EHT 1024 aggregation size for transmissions
      - new operation to indicate that a new interface is to be added
      - support radio separation of multi-band devices
      - move wireless extension spy implementation to libiw
    - Broadcom:
      - brcmfmac: optional LPO clock support
    - Microchip:
      - add support for Atmel WILC3000
    - Qualcomm (ath12k):
      - firmware coredump collection support
      - add debugfs support for a multitude of statistics
    - Qualcomm (ath5k):
      -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
    - Realtek:
      - rtw88: 8821au and 8812au USB adapters support
      - rtw89: add thermal protection
      - rtw89: fine tune BT-coexsitence to improve user experience
      - rtw89: firmware secure boot for WiFi 6 chip
 
  - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
 wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
 vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
 jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
 b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
 kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
 BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
 lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
 Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
 eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
 THcvVM+bupEZ
 =GzPr
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most significant set of changes is the per netns RTNL. The new
  behavior is disabled by default, regression risk should be contained.

  Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
  default value from PTP_1588_CLOCK_KVM, as the first is intended to be
  a more reliable replacement for the latter.

  Core:

   - Started a very large, in-progress, effort to make the RTNL lock
     scope per network-namespace, thus reducing the lock contention
     significantly in the containerized use-case, comprising:
       - RCU-ified some relevant slices of the FIB control path
       - introduce basic per netns locking helpers
       - namespacified the IPv4 address hash table
       - remove rtnl_register{,_module}() in favour of
         rtnl_register_many()
       - refactor rtnl_{new,del,set}link() moving as much validation as
         possible out of RTNL lock
       - convert all phonet doit() and dumpit() handlers to RCU
       - convert IPv4 addresses manipulation to per-netns RTNL
       - convert virtual interface creation to per-netns RTNL
     the per-netns lock infrastructure is guarded by the
     CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.

   - Introduce NAPI suspension, to efficiently switching between busy
     polling (NAPI processing suspended) and normal processing.

   - Migrate the IPv4 routing input, output and control path from direct
     ToS usage to DSCP macros. This is a work in progress to make ECN
     handling consistent and reliable.

   - Add drop reasons support to the IPv4 rotue input path, allowing
     better introspection in case of packets drop.

   - Make FIB seqnum lockless, dropping RTNL protection for read access.

   - Make inet{,v6} addresses hashing less predicable.

   - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
     and timestamps

  Things we sprinkled into general kernel code:

   - Add small file operations for debugfs, to reduce the struct ops
     size.

   - Refactoring and optimization for the implementation of page_frag
     API, This is a preparatory work to consolidate the page_frag
     implementation.

  Netfilter:

   - Optimize set element transactions to reduce memory consumption

   - Extended netlink error reporting for attribute parser failure.

   - Make legacy xtables configs user selectable, giving users the
     option to configure iptables without enabling any other config.

   - Address a lot of false-positive RCU issues, pointed by recent CI
     improvements.

  BPF:

   - Put xsk sockets on a struct diet and add various cleanups. Overall,
     this helps to bump performance by 12% for some workloads.

   - Extend BPF selftests to increase coverage of XDP features in
     combination with BPF cpumap.

   - Optimize and homogenize bpf_csum_diff helper for all archs and also
     add a batch of new BPF selftests for it.

   - Extend netkit with an option to delegate skb->{mark,priority}
     scrubbing to its BPF program.

   - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
     programs.

  Protocols:

   - Introduces 4-tuple hash for connected udp sockets, speeding-up
     significantly connected sockets lookup.

   - Add a fastpath for some TCP timers that usually expires after
     close, the socket lock contention.

   - Add inbound and outbound xfrm state caches to speed up state
     lookups.

   - Avoid sending MPTCP advertisements on stale subflows, reducing
     risks on loosing them.

   - Make neighbours table flushing more scalable, maintaining per
     device neigh lists.

  Driver API:

   - Introduce a unified interface to configure transmission H/W
     shaping, and expose it to user-space via generic-netlink.

   - Add support for per-NAPI config via netlink. This makes napi
     configuration persistent across queues removal and re-creation.
     Requires driver updates, currently supported drivers are:
     nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.

   - Add ethtool support for writing SFP / PHY firmware blocks.

   - Track RSS context allocation from ethtool core.

   - Implement support for mirroring to DSA CPU port, via TC mirror
     offload.

   - Consolidate FDB updates notification, to avoid duplicates on
     device-specific entries.

   - Expose DPLL clock quality level to the user-space.

   - Support master-slave PHY config via device tree.

  Tests and tooling:

   - forwarding: introduce deferred commands, to simplify the cleanup
     phase

  Drivers:

   - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
     Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
     IRQs and queues to NAPI IDs, allowing busy polling and better
     introspection.

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - mlx5:
           - a large refactor to implement support for cross E-Switch
             scheduling
           - refactor H/W conter management to let it scale better
           - H/W GRO cleanups
      - Intel (100G, ice)::
         - add support for ethtool reset
         - implement support for per TX queue H/W shaping
      - AMD/Solarflare:
         - implement per device queue stats support
      - Broadcom (bnxt):
         - improve wildcard l4proto on IPv4/IPv6 ntuple rules
      - Marvell Octeon:
         - Add representor support for each Resource Virtualization Unit
           (RVU) device.
      - Hisilicon:
         - add support for the BMC Gigabit Ethernet
      - IBM (EMAC):
         - driver cleanup and modernization
      - Cisco (VIC):
         - raise the queues number limit to 256

   - Ethernet virtual:
      - Google vNIC:
         - implement page pool support
      - macsec:
         - inherit lower device's features and TSO limits when
           offloading
      - virtio_net:
         - enable premapped mode by default
         - support for XDP socket(AF_XDP) zerocopy TX
      - wireguard:
         - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
           packets.

   - Ethernet NICs embedded and virtual:
      - Broadcom ASP:
         - enable software timestamping
      - Freescale:
         - add enetc4 PF driver
      - MediaTek: Airoha SoC:
         - implement BQL support
      - RealTek r8169:
         - enable TSO by default on r8168/r8125
         - implement extended ethtool stats
      - Renesas AVB:
         - enable TX checksum offload
      - Synopsys (stmmac):
         - support header splitting for vlan tagged packets
         - move common code for DWMAC4 and DWXGMAC into a separate FPE
           module.
         - add dwmac driver support for T-HEAD TH1520 SoC
      - Synopsys (xpcs):
         - driver refactor and cleanup
      - TI:
         - icssg_prueth: add VLAN offload support
      - Xilinx emaclite:
         - add clock support

   - Ethernet switches:
      - Microchip:
         - implement support for the lan969x Ethernet switch family
         - add LAN9646 switch support to KSZ DSA driver

   - Ethernet PHYs:
      - Marvel: 88q2x: enable auto negotiation
      - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2

   - PTP:
      - Add support for the Amazon virtual clock device
      - Add PtP driver for s390 clocks

   - WiFi:
      - mac80211
         - EHT 1024 aggregation size for transmissions
         - new operation to indicate that a new interface is to be added
         - support radio separation of multi-band devices
         - move wireless extension spy implementation to libiw
      - Broadcom:
         - brcmfmac: optional LPO clock support
      - Microchip:
         - add support for Atmel WILC3000
      - Qualcomm (ath12k):
         - firmware coredump collection support
         - add debugfs support for a multitude of statistics
      - Qualcomm (ath5k):
         -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
      - Realtek:
         - rtw88: 8821au and 8812au USB adapters support
         - rtw89: add thermal protection
         - rtw89: fine tune BT-coexsitence to improve user experience
         - rtw89: firmware secure boot for WiFi 6 chip

   - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature"

* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
  mm: page_frag: fix a compile error when kernel is not compiled
  Documentation: tipc: fix formatting issue in tipc.rst
  selftests: nic_performance: Add selftest for performance of NIC driver
  selftests: nic_link_layer: Add selftest case for speed and duplex states
  selftests: nic_link_layer: Add link layer selftest for NIC driver
  bnxt_en: Add FW trace coredump segments to the coredump
  bnxt_en: Add a new ethtool -W dump flag
  bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
  bnxt_en: Add functions to copy host context memory
  bnxt_en: Do not free FW log context memory
  bnxt_en: Manage the FW trace context memory
  bnxt_en: Allocate backing store memory for FW trace logs
  bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
  bnxt_en: Refactor bnxt_free_ctx_mem()
  bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
  bnxt_en: Update firmware interface spec to 1.10.3.85
  selftests/bpf: Add some tests with sockmap SK_PASS
  bpf: fix recursive lock when verdict program return SK_PASS
  wireguard: device: support big tcp GSO
  wireguard: selftests: load nf_conntrack if not present
  ...
2024-11-21 08:28:08 -08:00
Chao Yu
3fc5d5a182 f2fs: fix to shrink read extent node in batches
We use rwlock to protect core structure data of extent tree during
its shrink, however, if there is a huge number of extent nodes in
extent tree, during shrink of extent tree, it may hold rwlock for
a very long time, which may trigger kernel hang issue.

This patch fixes to shrink read extent node in batches, so that,
critical region of the rwlock can be shrunk to avoid its extreme
long time hold.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:54 +00:00
Chao Yu
81520c684c f2fs: print message if fscorrupted was found in f2fs_new_node_page()
If fs corruption occurs in f2fs_new_node_page(), let's print
more information about corrupted metadata into kernel log.

Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead
of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not
NULL_ADDR which means nat bitmap and nat entry is inconsistent.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:39 +00:00
Sheng Yong
f88c7904b5 f2fs: clear SBI_POR_DOING before initing inmem curseg
SBI_POR_DOING can be cleared after recovery is completed, so that
changes made before recovery can be persistent, and subsequent
errors can be recorded into cp/sb.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:14:07 +00:00
Sheng Yong
1015035609 f2fs: fix changing cursegs if recovery fails on zoned device
Fsync data recovery attempts to check and fix write pointer consistency
of cursegs and all other zones. If the write pointers of cursegs are
unaligned, cursegs are changed to new sections.

If recovery fails, zone write pointers are still checked and fixed,
but the latest checkpoint cannot be written back. Additionally, retry-
mount skips recovery and rolls back to reuse the old cursegs whose
zones are already finished. This can lead to unaligned write later.

This patch addresses the issue by leaving writer pointers untouched if
recovery fails. When retry-mount is performed, cursegs and other zones
are checked and fixed after skipping recovery.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:13:53 +00:00
Daeho Jeong
a35749b1ed f2fs: adjust unusable cap before checkpoint=disable mode
The unusable cap value must be adjusted before checking whether
checkpoint=disable is feasible.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Chao Yu
6787a82245 f2fs: fix to requery extent which cross boundary of inquiry
dd if=/dev/zero of=file bs=4k count=5
xfs_io file -c "fiemap -v 2 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
     1: [32..39]:        139304..139311       8 0x1001
xfs_io file -c "fiemap -v 0 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
xfs_io file -c "fiemap -v 0 16385"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..39]:         139272..139311      40 0x1001

There are two problems:
- continuous extent is split to two
- FIEMAP_EXTENT_LAST is missing in last extent

The root cause is: if upper boundary of inquiry crosses extent,
f2fs_map_blocks() will truncate length of returned extent to
F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter
extent or hole to make sure current extent is last or not.

In order to fix this issue, once we found an extent locates
in the end of inquiry range by f2fs_map_blocks(), we need to
expand inquiry range to requiry.

Cc: stable@vger.kernel.org
Fixes: 7f63eb77af ("f2fs: report unwritten area in f2fs_fiemap")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Zhiguo Niu
77569f785c f2fs: fix to adjust appropriate length for fiemap
If user give a file size as "length" parameter for fiemap
operations, but if this size is non-block size aligned,
it will show 2 segments fiemap results even this whole file
is contiguous on disk, such as the following results:

 ./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
        logical addr.    physical addr.   length           flags
0       0000000000000000 0000000020baa000 0000000000004000 00001000
1       0000000000004000 0000000020bae000 0000000000001000 00001001

after this patch:
./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
    logical addr.    physical addr.   length           flags
0    0000000000000000 00000000315f3000 0000000000005000 00001001

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Chao Yu
7461f37094 f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
f2fs doesn't support different blksize in one instance, so
bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK
and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES
instead for cleanup.

Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Chao Yu
3273d8ad94 f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
It missed to cast variable to unsigned long long type before
bit shift, which will cause overflow, fix it.

Fixes: f7ef9b83b5 ("f2fs: introduce macros to convert bytes and blocks in f2fs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Daniel Yang
789ca0eb47 f2fs: replace deprecated strcpy with strscpy
strcpy is deprecated. Kernel docs recommend replacing strcpy with
strscpy. The function strcpy() return value isn't used so there
shouldn't be an issue replacing with the safer alternative strscpy.

Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Jaegeuk Kim
acff9409dd Revert "f2fs: remove unreachable lazytime mount option parsing"
This reverts commit 54f43a10fa.

The above commit broke the lazytime mount, given

mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime");

CC: stable@vger.kernel.org # 6.11+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:15 +00:00
Vasiliy Kovalev
c8b359dddb ovl: Filter invalid inodes with missing lookup function
Add a check to the ovl_dentry_weird() function to prevent the
processing of directory inodes that lack the lookup function.
This is important because such inodes can cause errors in overlayfs
when passed to the lowerstack.

Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-20 10:23:04 +01:00
Linus Torvalds
bf9aa14fc5 A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
 
     posix-timers are currently auto-rearmed by the kernel when the signal
     of the timer is ignored so that the timer signal can be delivered once
     the corresponding signal is unignored.
 
     This requires to throttle the timer to prevent a DoS by small intervals
     and keeps the system pointlessly out of low power states for no value.
     This is a long standing non-trivial problem due to the lock order of
     posix-timer lock and the sighand lock along with life time issues as
     the timer and the sigqueue have different life time rules.
 
     Cure this by:
 
      * Embedding the sigqueue into the timer struct to have the same life
        time rules. Aside of that this also avoids the lookup of the timer
        in the signal delivery and rearm path as it's just a always valid
        container_of() now.
 
      * Queuing ignored timer signals onto a seperate ignored list.
 
      * Moving queued timer signals onto the ignored list when the signal is
        switched to SIG_IGN before it could be delivered.
 
      * Walking the ignored list when SIG_IGN is lifted and requeue the
        signals to the actual signal lists. This allows the signal delivery
        code to rearm the timer.
 
     This also required to consolidate the signal delivery rules so they are
     consistent across all situations. With that all self test scenarios
     finally succeed.
 
   - Core infrastructure for VFS multigrain timestamping
 
     This is required to allow the kernel to use coarse grained time stamps
     by default and switch to fine grained time stamps when inode attributes
     are actively observed via getattr().
 
     These changes have been provided to the VFS tree as well, so that the
     VFS specific infrastructure could be built on top.
 
   - Cleanup and consolidation of the sleep() infrastructure
 
     * Move all sleep and timeout functions into one file
 
     * Rework udelay() and ndelay() into proper documented inline functions
       and replace the hardcoded magic numbers by proper defines.
 
     * Rework the fsleep() implementation to take the reality of the timer
       wheel granularity on different HZ values into account. Right now the
       boundaries are hard coded time ranges which fail to provide the
       requested accuracy on different HZ settings.
 
     * Update documentation for all sleep/timeout related functions and fix
       up stale documentation links all over the place
 
     * Fixup a few usage sites
 
   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks
 
     A system can have multiple PTP clocks which are participating in
     seperate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK TAI relevant as that's
     the clock which drives the timekeeping adjustments via the various user
     space daemons through adjtimex(2).
 
     The non TAI based clock domains are accessible via the file descriptor
     based posix clocks, but their usability is very limited. They can't be
     accessed fast as they always go all the way out to the hardware and
     they cannot be utilized in the kernel itself.
 
     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.
 
     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a similar way how the kernel
     provides access to clock MONOTONIC, REALTIME etc.
 
     Instead of creating a duplicated infrastructure this rework converts
     timekeeping and adjtimex(2) into generic functionality which operates
     on pointers to data structures instead of using static variables.
 
     This allows to provide time accessors and adjtimex(2) functionality for
     the independent PTP clocks in a subsequent step.
 
   - Consolidate hrtimer initialization
 
     hrtimers are set up by initializing the data structure and then
     seperately setting the callback function for historical reasons.
 
     That's an extra unnecessary step and makes Rust support less straight
     forward than it should be.
 
     Provide a new set of hrtimer_setup*() functions and convert the core
     code and a few usage sites of the less frequently used interfaces over.
 
     The bulk of the htimer_init() to hrtimer_setup() conversion is already
     prepared and scheduled for the next merge window.
 
   - Drivers:
 
     * Ensure that the global timekeeping clocksource is utilizing the
       cluster 0 timer on MIPS multi-cluster systems.
 
       Otherwise CPUs on different clusters use their cluster specific
       clocksource which is not guaranteed to be synchronized with other
       clusters.
 
     * Mostly boring cleanups, fixes, improvements and code movement
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kPITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZKkD/9OUL6fOJrDUmOYBa4QVeMyfTef4EaL
 tvwIMM/29XQFeiq3xxCIn+EMnHjXn2lvIhYGQ7GKsbKYwvJ7ZBDpQb+UMhZ2nKI9
 6D6BP6WomZohKeH2fZbJQAdqOi3KRYdvQdIsVZUexkqiaVPphRvOH9wOr45gHtZM
 EyMRSotPlQTDqcrbUejDMEO94GyjDCYXRsyATLxjmTzL/N4xD4NRIiotjM2vL/a9
 8MuCgIhrKUEyYlFoOxxeokBsF3kk3/ez2jlG9b/N8VLH3SYIc2zgL58FBgWxlmgG
 bY71nVG3nUgEjxBd2dcXAVVqvb+5widk8p6O7xxOAQKTLMcJ4H0tQDkMnzBtUzvB
 DGAJDHAmAr0g+ja9O35Pkhunkh4HYFIbq0Il4d1HMKObhJV0JumcKuQVxrXycdm3
 UZfq3seqHsZJQbPgCAhlFU0/2WWScocbee9bNebGT33KVwSp5FoVv89C/6Vjb+vV
 Gusc3thqrQuMAZW5zV8g4UcBAA/xH4PB0I+vHib+9XPZ4UQ7/6xKl2jE0kd5hX7n
 AAUeZvFNFqIsY+B6vz+Jx/yzyM7u5cuXq87pof5EHVFzv56lyTp4ToGcOGYRgKH5
 JXeYV1OxGziSDrd5vbf9CzdWMzqMvTefXrHbWrjkjhNOe8E1A8O88RZ5uRKZhmSw
 hZZ4hdM9+3T7cg==
 =2VC6
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A rather large update for timekeeping and timers:

   - The final step to get rid of auto-rearming posix-timers

     posix-timers are currently auto-rearmed by the kernel when the
     signal of the timer is ignored so that the timer signal can be
     delivered once the corresponding signal is unignored.

     This requires to throttle the timer to prevent a DoS by small
     intervals and keeps the system pointlessly out of low power states
     for no value. This is a long standing non-trivial problem due to
     the lock order of posix-timer lock and the sighand lock along with
     life time issues as the timer and the sigqueue have different life
     time rules.

     Cure this by:

       - Embedding the sigqueue into the timer struct to have the same
         life time rules. Aside of that this also avoids the lookup of
         the timer in the signal delivery and rearm path as it's just a
         always valid container_of() now.

       - Queuing ignored timer signals onto a seperate ignored list.

       - Moving queued timer signals onto the ignored list when the
         signal is switched to SIG_IGN before it could be delivered.

       - Walking the ignored list when SIG_IGN is lifted and requeue the
         signals to the actual signal lists. This allows the signal
         delivery code to rearm the timer.

     This also required to consolidate the signal delivery rules so they
     are consistent across all situations. With that all self test
     scenarios finally succeed.

   - Core infrastructure for VFS multigrain timestamping

     This is required to allow the kernel to use coarse grained time
     stamps by default and switch to fine grained time stamps when inode
     attributes are actively observed via getattr().

     These changes have been provided to the VFS tree as well, so that
     the VFS specific infrastructure could be built on top.

   - Cleanup and consolidation of the sleep() infrastructure

       - Move all sleep and timeout functions into one file

       - Rework udelay() and ndelay() into proper documented inline
         functions and replace the hardcoded magic numbers by proper
         defines.

       - Rework the fsleep() implementation to take the reality of the
         timer wheel granularity on different HZ values into account.
         Right now the boundaries are hard coded time ranges which fail
         to provide the requested accuracy on different HZ settings.

       - Update documentation for all sleep/timeout related functions
         and fix up stale documentation links all over the place

       - Fixup a few usage sites

   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
     clocks

     A system can have multiple PTP clocks which are participating in
     seperate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK TAI relevant as
     that's the clock which drives the timekeeping adjustments via the
     various user space daemons through adjtimex(2).

     The non TAI based clock domains are accessible via the file
     descriptor based posix clocks, but their usability is very limited.
     They can't be accessed fast as they always go all the way out to
     the hardware and they cannot be utilized in the kernel itself.

     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.

     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a similar way how the
     kernel provides access to clock MONOTONIC, REALTIME etc.

     Instead of creating a duplicated infrastructure this rework
     converts timekeeping and adjtimex(2) into generic functionality
     which operates on pointers to data structures instead of using
     static variables.

     This allows to provide time accessors and adjtimex(2) functionality
     for the independent PTP clocks in a subsequent step.

   - Consolidate hrtimer initialization

     hrtimers are set up by initializing the data structure and then
     seperately setting the callback function for historical reasons.

     That's an extra unnecessary step and makes Rust support less
     straight forward than it should be.

     Provide a new set of hrtimer_setup*() functions and convert the
     core code and a few usage sites of the less frequently used
     interfaces over.

     The bulk of the htimer_init() to hrtimer_setup() conversion is
     already prepared and scheduled for the next merge window.

   - Drivers:

       - Ensure that the global timekeeping clocksource is utilizing the
         cluster 0 timer on MIPS multi-cluster systems.

         Otherwise CPUs on different clusters use their cluster specific
         clocksource which is not guaranteed to be synchronized with
         other clusters.

       - Mostly boring cleanups, fixes, improvements and code movement"

* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
  posix-timers: Fix spurious warning on double enqueue versus do_exit()
  clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
  clocksource/drivers/gpx: Remove redundant casts
  clocksource/drivers/timer-ti-dm: Fix child node refcount handling
  dt-bindings: timer: actions,owl-timer: convert to YAML
  clocksource/drivers/ralink: Add Ralink System Tick Counter driver
  clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
  clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
  clocksource/drivers:sp804: Make user selectable
  clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
  hrtimers: Delete hrtimer_init_on_stack()
  alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
  io_uring: Switch to use hrtimer_setup_on_stack()
  sched/idle: Switch to use hrtimer_setup_on_stack()
  hrtimers: Delete hrtimer_init_sleeper_on_stack()
  wait: Switch to use hrtimer_setup_sleeper_on_stack()
  timers: Switch to use hrtimer_setup_sleeper_on_stack()
  net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
  futex: Switch to use hrtimer_setup_sleeper_on_stack()
  fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
  ...
2024-11-19 16:35:06 -08:00