- Migrate more hash algorithms from the traditional crypto subsystem
to lib/crypto/.
Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
the implementations, improves performance, enables further
simplifications in calling code, and solves various other issues:
- AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
- Support these algorithms in lib/crypto/ using the AES
library and the existing arm64 assembly code
- Reimplement the traditional crypto API's "cmac(aes)",
"xcbc(aes)", and "cbcmac(aes)" on top of the library
- Convert mac80211 to use the AES-CMAC library. Note: several
other subsystems can use it too and will be converted later
- Drop the broken, nonstandard, and likely unused support for
"xcbc(aes)" with key lengths other than 128 bits
- Enable optimizations by default
- GHASH
- Migrate the standalone GHASH code into lib/crypto/
- Integrate the GHASH code more closely with the very similar
POLYVAL code, and improve the generic GHASH implementation
to resist cache-timing attacks and use much less memory
- Reimplement the AES-GCM library and the "gcm" crypto_aead
template on top of the GHASH library. Remove "ghash" from
the crypto_shash API, as it's no longer needed
- Enable optimizations by default
- SM3
- Migrate the kernel's existing SM3 code into lib/crypto/, and
reimplement the traditional crypto API's "sm3" on top of it
- I don't recommend using SM3, but this cleanup is worthwhile
to organize the code the same way as other algorithms
- Testing improvements
- Add a KUnit test suite for each of the new library APIs
- Migrate the existing ChaCha20Poly1305 test to KUnit
- Make the KUnit all_tests.config enable all crypto library tests
- Move the test kconfig options to the Runtime Testing menu
- Other updates to arch-optimized crypto code
- Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
- Remove some MD5 implementations that are no longer worth keeping
- Drop big endian and voluntary preemption support from the arm64
code, as those configurations are no longer supported on arm64
- Make jitterentropy and samples/tsm-mr use the crypto library APIs
Note: the overall diffstat is neutral, but when the test code is
excluded it is significantly negative:
Tests: 13 files changed, 1982 insertions(+), 888 deletions(-)
Non-test: 141 files changed, 2897 insertions(+), 3987 deletions(-)
All: 154 files changed, 4879 insertions(+), 4875 deletions(-)
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadWPyxQcZWJpZ2dlcnNA
a2VybmVsLm9yZwAKCRDzXCl4vpKOK8QCAQD0i98miI1mu01RKuEwrBzmn7L/2sUH
ReYV/dFDtnN0GwD+KMCiNAM2XTVLRKq5t3OxPHpKZ4y+gZwRowAJeFA02Q8=
=5rip
-----END PGP SIGNATURE-----
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
- Migrate more hash algorithms from the traditional crypto subsystem to
lib/crypto/
Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
the implementations, improves performance, enables further
simplifications in calling code, and solves various other issues:
- AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
- Support these algorithms in lib/crypto/ using the AES library
and the existing arm64 assembly code
- Reimplement the traditional crypto API's "cmac(aes)",
"xcbc(aes)", and "cbcmac(aes)" on top of the library
- Convert mac80211 to use the AES-CMAC library. Note: several
other subsystems can use it too and will be converted later
- Drop the broken, nonstandard, and likely unused support for
"xcbc(aes)" with key lengths other than 128 bits
- Enable optimizations by default
- GHASH
- Migrate the standalone GHASH code into lib/crypto/
- Integrate the GHASH code more closely with the very similar
POLYVAL code, and improve the generic GHASH implementation to
resist cache-timing attacks and use much less memory
- Reimplement the AES-GCM library and the "gcm" crypto_aead
template on top of the GHASH library. Remove "ghash" from the
crypto_shash API, as it's no longer needed
- Enable optimizations by default
- SM3
- Migrate the kernel's existing SM3 code into lib/crypto/, and
reimplement the traditional crypto API's "sm3" on top of it
- I don't recommend using SM3, but this cleanup is worthwhile
to organize the code the same way as other algorithms
- Testing improvements:
- Add a KUnit test suite for each of the new library APIs
- Migrate the existing ChaCha20Poly1305 test to KUnit
- Make the KUnit all_tests.config enable all crypto library tests
- Move the test kconfig options to the Runtime Testing menu
- Other updates to arch-optimized crypto code:
- Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
- Remove some MD5 implementations that are no longer worth keeping
- Drop big endian and voluntary preemption support from the arm64
code, as those configurations are no longer supported on arm64
- Make jitterentropy and samples/tsm-mr use the crypto library APIs
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
lib/crypto: arm64: Assume a little-endian kernel
arm64: fpsimd: Remove obsolete cond_yield macro
lib/crypto: arm64/sha3: Remove obsolete chunking logic
lib/crypto: arm64/sha512: Remove obsolete chunking logic
lib/crypto: arm64/sha256: Remove obsolete chunking logic
lib/crypto: arm64/sha1: Remove obsolete chunking logic
lib/crypto: arm64/poly1305: Remove obsolete chunking logic
lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
lib/crypto: arm64/chacha: Remove obsolete chunking logic
lib/crypto: arm64/aes: Remove obsolete chunking logic
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
lib/crypto: sparc: Drop optimized MD5 code
lib/crypto: mips: Drop optimized MD5 code
lib: Move crypto library tests to Runtime Testing menu
crypto: sm3 - Remove 'struct sm3_state'
crypto: sm3 - Remove the original "sm3_block_generic()"
crypto: sm3 - Remove sm3_base.h
...
- Various cleanups for the interface between fs/crypto/ and
filesystems, from Christoph Hellwig
- Simplify and optimize the implementation of v1 key derivation by
using the AES library instead of the crypto_skcipher API
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadV8xhQcZWJpZ2dlcnNA
a2VybmVsLm9yZwAKCRDzXCl4vpKOK0wSAPsGg/zd0bMiF9dcKKVESAVIePSKFbvx
5e1speATaXTSVAEAnkLLL1PZJJq9HOKpQY8Wkqpzy8Kmt8x53VeI4YW0fQc=
=Rb7e
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
- Various cleanups for the interface between fs/crypto/ and
filesystems, from Christoph Hellwig
- Simplify and optimize the implementation of v1 key derivation by
using the AES library instead of the crypto_skcipher API
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: use AES library for v1 key derivation
ext4: use a byte granularity cursor in ext4_mpage_readpages
fscrypt: pass a real sector_t to fscrypt_zeroout_range
fscrypt: pass a byte length to fscrypt_zeroout_range
fscrypt: pass a byte offset to fscrypt_zeroout_range
fscrypt: pass a byte length to fscrypt_zeroout_range_inline_crypt
fscrypt: pass a byte offset to fscrypt_zeroout_range_inline_crypt
fscrypt: pass a byte offset to fscrypt_set_bio_crypt_ctx
fscrypt: pass a byte offset to fscrypt_mergeable_bio
fscrypt: pass a byte offset to fscrypt_generate_dun
fscrypt: move fscrypt_set_bio_crypt_ctx_bh to buffer.c
ext4, fscrypt: merge fscrypt_mergeable_bio_bh into io_submit_need_new_bio
ext4: factor out a io_submit_need_new_bio helper
ext4: open code fscrypt_set_bio_crypt_ctx_bh
ext4: initialize the write hint in io_submit_init_bio
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmndBe8ACgkQiiy9cAdy
T1FpBwv/aFC7A+t2Cc7YAfbcy6reyOWO7SSHFh3drnlHMnwSMuGenVT5mHl0NuqF
ZHaOu7NL9G0jwiorPFhNvfcrS9cAfKeNj/+0Zjl7XaaZg8gg4/yOBvYHbYb6k+/d
s9eNcLZncZxsx9hY6afPOFM3gZLY6sMreoe0DdILk8Ez7J2FfGDPY2bKB8OIUE40
Gt41PHhFJmd8kd1AfE0ihKtfmkLjuFcF9gBHFgq6o8TZj2Zjgzvw3SP+wjwKwIjL
WG5F6HKke1mgFJn7oB3qrzbNowoK61Il7BCdII+sDlyQLWoAXk10+3IeBvVCMjN6
mX531RmITm9NgiBS8PABizf/6GYP2zw3TKQqHGWy0zt4Rb+fzFEVquyG10xTr+s4
b7NA9UJt8iRrVZzEY9NQVU/wspXgSkBeB7OzlnCOeuxbQN9bKowHlI25EC4ZWSV3
gw6A0XPpMSwe1WPOytDKxTBTwlZ0tdeMXh3Kj/iZgnhf0KVFlZ0CJY4bJf/YbKmL
I250Z3Xl
=FbUp
-----END PGP SIGNATURE-----
Merge tag 'v7.1-rc1-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix EAs bounds check
- Fix OOB read in symlink response parsing
- Add support for creating tmpfiles
- Minor debug improvement for mount failure
- Minor crypto cleanup
- Add missing module description
- mount fix for lease vs. nolease
- Add Metze as maintainer for smbdirect
- Minor error mapping header cleanup
- Improve search speed of SMB1 maperror
- Fix potential null ptr ref in smb2 map error tests
* tag 'v7.1-rc1-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (26 commits)
smb: client: allow both 'lease' and 'nolease' mount options
smb: client: get rid of d_drop()+d_add()
smb: client: set ATTR_TEMPORARY with O_TMPFILE | O_EXCL
smb: client: add support for O_TMPFILE
vfs: introduce d_mark_tmpfile_name()
MAINTAINERS: create entry for smbdirect
smb: client: add missing MODULE_DESCRIPTION() to smb1maperror_test
smb: client: fix OOB reads parsing symlink error response
smb: client: fix off-by-8 bounds check in check_wsl_eas()
smb: client: Remove unnecessary selection of CRYPTO_ECB
smb/client: move smb2maperror declarations to smb2proto.h
smb/client: introduce KUnit tests to check DOS/SRV err mapping search
smb/client: check if SMB1 DOS/SRV error mapping arrays are sorted
smb/client: use binary search for SMB1 DOS/SRV error mapping
smb/client: autogenerate SMB1 DOS/SRV to POSIX error mapping
smb/client: annotate smberr.h with POSIX error codes
smb/client: move ERRnetlogonNotStarted to DOS error class
smb/client: introduce KUnit test to check ntstatus_to_dos_map search
smb/client: check if ntstatus_to_dos_map is sorted
smb/client: use binary search for NT status to DOS mapping
...
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCadyo1AAKCRBcsMJ8RxYu
YzqJAYDic0mU0VPFyPVQKGvlEw0QTPbWNrbnLa7POD3pyZ4yhTuroA5016YuMM2m
DouQN6sBgOTZ/kSRH/Rgui6dXmnG5VdR5dZo/ADcTmimfycqwXSgFnxEgTNN4eQE
Uq0drKXU1A==
=l6Ln
-----END PGP SIGNATURE-----
Merge tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Carlos Maiolino:
"There aren't any new features.
The whole series is just a collection of bug fixes and code
refactoring. There is some new information added a couple new
tracepoints, new data added to mountstats, but no big changes"
* tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits)
xfs: fix number of GC bvecs
xfs: untangle the open zones reporting in mountinfo
xfs: expose the number of open zones in sysfs
xfs: reduce special casing for the open GC zone
xfs: streamline GC zone selection
xfs: refactor GC zone selection helpers
xfs: rename xfs_zone_gc_iter_next to xfs_zone_gc_iter_irec
xfs: put the open zone later xfs_open_zone_put
xfs: add a separate tracepoint for stealing an open zone for GC
xfs: delay initial open of the GC zone
xfs: fix a resource leak in xfs_alloc_buftarg()
xfs: handle too many open zones when mounting
xfs: refactor xfs_mount_zones
xfs: fix integer overflow in busy extent sort comparator
xfs: fix integer overflow in deferred intent sort comparators
xfs: fold xfs_setattr_size into xfs_vn_setattr_size
xfs: remove a duplicate assert in xfs_setattr_size
xfs: return default quota limits for IDs without a dquot
xfs: start gc on zonegc_low_space attribute updates
xfs: don't decrement the buffer LRU count for in-use buffers
...
PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration performed in
airoha_fe_init() is used to duplicate multicast packets and send a copy
to the CPU when the traffic is offloaded. This is necessary just if
it is requested by the user. Disable multicast packets duplication by
default.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260412-airoha_fe_init_remove_mc_en_bit-v1-1-7b6a5a25a74d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Validate xattr h_shared_count to report -EFSCORRUPTED explicitly
for crafted images
- Verify metadata accesses for file-backed mounts via rw_verify_area()
- Fix FS_IOC_GETFSLABEL to include the trailing NUL byte, consistent
with ext4 and xfs
- Properly handle 48-bit on-disk blocks/uniaddr for extra devices
- Fix an index underflow in the LZ4 in-place decompression that can
cause out-of-bounds accesses with crafted images
- Minor fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmncchsRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qoXXA//S3iYT8i7iyyeMVwrP5/10k6AbFGDJTar
eTeLx9itzsxDDNIqih6hmcQKb4fBhHbLA7fMt183+HnO8j2gW9oGUWRfYRWuuRj2
7xz0JOe5lcRkwS/fzthywOmtHWVGqLZ/JME3bCoZkLOpK8cQSG0o0ZTJjYufdcev
2UZAIjU2eWrA1lSRfZdnfM5N5oKixOTzEb9/serifgBygYZvNSSPRLecEQPldYKY
5Db00Q5RnxVwX7HVBmAg8pYyX6ogeIo4Mnh9Itra/vyIJZ9OdgCUVfjx9Q2qAAA5
gMOv6KppnMZGa8JNvpxg/7+/K/vx2wuNxcf2w0Bz4dVs/6Sby8tbrn2iLJjNYCPh
Y72aysH51EZK5yD+GEGFKG0rDuBOeW+kSicl8IlfZWDj5hLU4hjboOKNXAZBh921
DQcoHfkPhf8Ay0z0dDW7Xod9w2VNVtiSXP08T0TvAEelHVMYwkTVCST2bQlOFVrO
8zy54QwQ5e6deFV5LSpTQn4Nj4+iPqZRaojqX2xlqFd17DXCzu6VolpCr9cIiNmO
WlAKg3WkomuLcyb1cYt7eNK4PbHZGjUmNjUS+OG6hbqXBMLn5IUF+F31syTf/Q9f
erSM89entNNk6kVwqiRIVxO314iPp5q9rFDGzBJjxyR99V+oYqv0JVSb7X3SWPTu
w4VmhbQRsiM=
=rEW8
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
- Validate xattr h_shared_count to report -EFSCORRUPTED explicitly for
crafted images
- Verify metadata accesses for file-backed mounts via rw_verify_area()
- Fix FS_IOC_GETFSLABEL to include the trailing NUL byte, consistent
with ext4 and xfs
- Properly handle 48-bit on-disk blocks/uniaddr for extra devices
- Fix an index underflow in the LZ4 in-place decompression that can
cause out-of-bounds accesses with crafted images
- Minor fixes and cleanups
* tag 'erofs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: error out obviously illegal extents in advance
erofs: clean up encoded map flags
erofs: fix unsigned underflow in z_erofs_lz4_handle_overlap()
erofs: handle 48-bit blocks/uniaddr for extra devices
erofs: include the trailing NUL in FS_IOC_GETFSLABEL
erofs: ensure all folios are managed in erofs_try_to_free_all_cached_folios()
erofs: verify metadata accesses for file-backed mounts
erofs: harden h_shared_count in erofs_init_inode_xattrs()
- Implement FALLOC_FL_ALLOCATE_RANGE to add support for preallocating
clusters without zeroing, helping to reduce file fragmentation.
- Add a unified block readahead helper for FAT chain conversion, bitmap
allocation, and directory entry lookups.
- Optimize exfat_chain_cont_cluster() by caching buffer heads to minimize
mark_buffer_dirty() and mirroring overhead during NO_FAT_CHAIN to
FAT_CHAIN conversion.
- Switch to truncate_inode_pages_final() in evict_inode() to prevent
BUG_ON caused by shadow entries during reclaim.
- Fix a 32-bit truncation bug in directory entry calculations by ensuring
proper bitwise coercion.
- Fix sb->s_maxbytes calculation to correctly reflect the maximum possible
volume size for a given cluster size, resolving xfstests generic/213.
- Introduced exfat_cluster_walk() helper to traverse FAT chains by
a specified step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN
modes.
- Introduced exfat_chain_advance() helper to advance an exfat_chain
structure, updating both the current cluster and remaining size.
- Remove dead assignments and fix Smatch warnings.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmnbHMUWHGxpbmtpbmpl
b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCHXZEACDyb7s63to6V0tdry1D+/Q0hrX
blgUDnD+oGM5J+rFsvmfU+GhUJhe3hIgvLReUpBzrjegdVXCc6wBn+pBzQkjq81A
emJOIjOzyTgwusLuneSMGBZ3oCBJDiREGWqFeoeV1IHZoXvL4oqVr11V4b+8l2ms
Y2FqMljg8AYsgD/Wf3nXPJ1kph/5qpxeKQB4O1RIMEqJp5uoU0byDAtJSjde7oKy
cBRyYeGxcJ0UkQNX3pOfJHe2+Hb2wvCOPWpZ7fw4OfsOYcJMM7IFEKoT2tH67sx9
fo3J/Q74A2Gd0pwyWUwX2xK70uSkoLCR6TxInIgLaKK2IHmgpSHUJa+F5Nz9QFED
PDsBMedt38QXlI4jHNfYTIZuFXgFpIuDJyiphX7tQ72n+JVTiWNnKEF9mRRpRzZ+
2AV6h6YXN0Sd4/EJts+hf1S5admwEKNYpaBXgNTUBD/BbUQcKmvw8JQNk5/tYjU4
9jr4ab1jj1tvWtw9r2Ceu/W71W3l+Ys8gxcJB2DJB1vvo9h23NyHoNwnBMAwmNRv
5ykE7LFFHzwt8QUTMiGlMy5NFpd9cwcVKjzXd+Iogxqa1pDW/lFXrEYxzfvsqPwo
cBNIu2hUf2ESqctAmOWbpCRZwSWMQjh7cUwIHPqNcG2f3JMCzxxVlQC6SUJYr1qv
LoSPVMnQU/ReqFfK6w==
=JG0E
-----END PGP SIGNATURE-----
Merge tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Implement FALLOC_FL_ALLOCATE_RANGE to add support for preallocating
clusters without zeroing, helping to reduce file fragmentation
- Add a unified block readahead helper for FAT chain conversion, bitmap
allocation, and directory entry lookups
- Optimize exfat_chain_cont_cluster() by caching buffer heads to
minimize mark_buffer_dirty() and mirroring overhead during
NO_FAT_CHAIN to FAT_CHAIN conversion
- Switch to truncate_inode_pages_final() in evict_inode() to prevent
BUG_ON caused by shadow entries during reclaim
- Fix a 32-bit truncation bug in directory entry calculations by
ensuring proper bitwise coercion
- Fix sb->s_maxbytes calculation to correctly reflect the maximum
possible volume size for a given cluster size, resolving xfstests
generic/213
- Introduced exfat_cluster_walk() helper to traverse FAT chains by a
specified step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN
modes
- Introduced exfat_chain_advance() helper to advance an exfat_chain
structure, updating both the current cluster and remaining size
- Remove dead assignments and fix Smatch warnings
* tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: use exfat_chain_advance helper
exfat: introduce exfat_chain_advance helper
exfat: remove NULL cache pointer case in exfat_ent_get
exfat: use exfat_cluster_walk helper
exfat: introduce exfat_cluster_walk helper
exfat: fix incorrect directory checksum after rename to shorter name
exfat: fix s_maxbytes
exfat: fix passing zero to ERR_PTR() in exfat_mkdir()
exfat: fix error handling for FAT table operations
exfat: optimize exfat_chain_cont_cluster with cached buffer heads
exfat: drop redundant sec parameter from exfat_mirror_bh
exfat: use readahead helper in exfat_get_dentry
exfat: use readahead helper in exfat_allocate_bitmap
exfat: add block readahead in exfat_chain_cont_cluster
exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE support
exfat: Fix bitwise operation having different size
exfat: Drop dead assignment of num_clusters
exfat: use truncate_inode_pages_final() at evict_inode()
- nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
- nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCadlsMAAKCRAhI4xTh21N
nVLWAQC9fY9kH2fzW7MQKt0/33iN+KwnWPCTChVjSspoSG14QAEAiWqEiVcJtvGN
ARJgvHKX7fUzZf2YoVnVeGheOrbdvgE=
=XmKt
-----END PGP SIGNATURE-----
Merge tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2
Pull nilfs2 updates from Viacheslav Dubeyko:
"This contains fixes of syzbot reported issues in NILFS2 functionality:
- The DAT inode's btree node cache (i_assoc_inode) is initialized
lazily during btree operations.
However, nilfs_mdt_save_to_shadow_map() assumes i_assoc_inode is
already initialized when copying dirty pages to the shadow map
during GC. If NILFS_IOCTL_CLEAN_SEGMENTS is called immediately
after mount before any btree operation has occurred on the DAT
inode, i_assoc_inode is NULL leading to a general protection fault.
Fix this by calling nilfs_attach_btree_node_cache() on the DAT
inode in nilfs_dat_read() at mount time, ensuring i_assoc_inode is
always initialized before any GC operation can use it (Deepanshu
Kartikey)
- nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead
blocks by comparing it with the current block number bd_blocknr. If
they differ, the block is considered dead and skipped.
A corrupted ioctl request with bd_oblocknr set to 0 causes the
comparison to incorrectly match when the lookup returns -ENOENT and
sets bd_blocknr to 0, bypassing the dead block check and calling
nilfs_bmap_mark() on a non- existent block. This causes
nilfs_btree_do_lookup() to return -ENOENT, triggering the
WARN_ON(ret == -ENOENT).
Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at
the beginning of each iteration (Deepanshu Kartikey)"
* tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map
- hfsplus: fix generic/642 failure
- hfsplus: rework logic of map nodes creation in xattr b-tree
- hfsplus: fix logic of alloc/free b-tree node
- hfsplus: fix error processing issue in hfs_bmap_free()
- hfsplus: fix potential race conditions in b-tree functionality
- hfsplus: extract hidden directory search into a helper function
- hfsplus: fix held lock freed on hfsplus_fill_super()
- hfsplus: fix generic/523 test-case failure
- hfsplus: validate b-tree node 0 bitmap at mount time
- hfsplus: refactor b-tree map page access and add node-type validation
- hfsplus: fix to update ctime after rename
- hfsplus: fix generic/533 test-case failure
- hfsplus: set ctime after setxattr and removexattr
- hfsplus: fix uninit-value by validating catalog record size
- hfsplus: fix potential Allocation File corruption after fsync
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCadlj7wAKCRAhI4xTh21N
nRzsAQDhjGNvUe9QJUDk7D/GjMxWUs+8jrGw9Ruu8GhvRIePzwEAtkZoJRwNQzwa
9XEyenrKqxLTaqaPtRChicbmdKw7eQQ=
=TUrR
-----END PGP SIGNATURE-----
Merge tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfsplus updates from Viacheslav Dubeyko:
"This contains several fixes of syzbot reported issues and HFS+ fixes
of xfstests failures.
- Fix a syzbot reported issue of a KMSAN uninit-value in
hfsplus_strcasecmp().
The root cause was that hfs_brec_read() doesn't validate that the
on-disk record size matches the expected size for the record type
being read. The fix introduced hfsplus_brec_read_cat() wrapper that
validates the record size based on the type field and returns -EIO
if size doesn't match (Deepanshu Kartikey)
- Fix a syzbot reported issue of processing corrupted HFS+ images
where the b-tree allocation bitmap indicates that the header node
(Node 0) is free. Node 0 must always be allocated. Violating this
invariant leads to allocator corruption, which cascades into kernel
panics or undefined behavior.
Prevent trusting a corrupted allocator state by adding a validation
check during hfs_btree_open(). If corruption is detected, print a
warning identifying the specific corrupted tree and force the
filesystem to mount read-only (SB_RDONLY).
This prevents kernel panics from corrupted images while enabling
data recovery (Shardul Bankar)
- Fix a potential deadlock in hfsplus_fill_super().
hfsplus_fill_super() calls hfs_find_init() to initialize a search
structure, which acquires tree->tree_lock. If the subsequent call
to hfsplus_cat_build_key() fails, the function jumps to the
out_put_root error label without releasing the lock.
Fix this by adding the missing hfs_find_exit(&fd) call before
jumping to the out_put_root error label. This ensures that
tree->tree_lock is properly released on the error path (Zilin Guan)
- Update a files ctime after rename in hfsplus_rename() (Yangtao Li)
The rest of the patches introduce the HFS+ fixes for the case of
generic/348, generic/728, generic/533, generic/523, and generic/642
test-cases of xfstests suite"
* tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfsplus: fix generic/642 failure
hfsplus: rework logic of map nodes creation in xattr b-tree
hfsplus: fix logic of alloc/free b-tree node
hfsplus: fix error processing issue in hfs_bmap_free()
hfsplus: fix potential race conditions in b-tree functionality
hfsplus: extract hidden directory search into a helper function
hfsplus: fix held lock freed on hfsplus_fill_super()
hfsplus: fix generic/523 test-case failure
hfsplus: validate b-tree node 0 bitmap at mount time
hfsplus: refactor b-tree map page access and add node-type validation
hfsplus: fix to update ctime after rename
hfsplus: fix generic/533 test-case failure
hfsplus: set ctime after setxattr and removexattr
hfsplus: fix uninit-value by validating catalog record size
hfsplus: fix potential Allocation File corruption after fsync
udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with
BH disabled. After commit 6f1a9140ec ("add xmit recursion limit to
tunnel xmit functions"), on the path:
udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit()
dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU.
Without local_bh_disable(), the context may move between CPUs, which can
break the inc/dec pairing. This may lead to incorrect recursion level
detection and cause packets to be dropped in ip(6)_tunnel_xmit() or
__dev_queue_xmit().
Fix it by disabling BH around both IPv4 and IPv6 SCTP UDP xmit paths.
In my testing, after enabling the SCTP over UDP:
# ip net exec ha sysctl -w net.sctp.udp_port=9899
# ip net exec ha sysctl -w net.sctp.encap_port=9899
# ip net exec hb sysctl -w net.sctp.udp_port=9899
# ip net exec hb sysctl -w net.sctp.encap_port=9899
# ip net exec ha iperf3 -s
- without this patch:
# ip net exec hb iperf3 -c 192.168.0.1 --sctp
[ 5] 0.00-10.00 sec 37.2 MBytes 31.2 Mbits/sec sender
[ 5] 0.00-10.00 sec 37.1 MBytes 31.1 Mbits/sec receiver
- with this patch:
# ip net exec hb iperf3 -c 192.168.0.1 --sctp
[ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec sender
[ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec receiver
Fixes: 6f1a9140ec ("net: add xmit recursion limit to tunnel xmit functions")
Fixes: 046c052b47 ("sctp: enable udp tunneling socks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/c874a8548221dcd56ff03c65ba75a74e6cf99119.1776017727.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
encap_port in SCTP_INPUT_CB(skb) is used by sctp_vtag_verify() for
SCTP-over-UDP processing. In the GSO case, it is only set on the head
skb, while fragment skbs leave it 0.
This results in fragment skbs seeing encap_port == 0, breaking
SCTP-over-UDP connections.
Fix it by propagating encap_port from the head skb cb when initializing
fragment skbs in sctp_inq_pop().
Fixes: 046c052b47 ("sctp: enable udp tunneling socks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/ea65ed61b3598d8b4940f0170b9aa1762307e6c3.1776017631.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi says:
====================
net: airoha: Preliminary series to support multiple net_devices connected to the same GDM port
EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
manages the traffic in a TDM manner.
This series introduces some preliminary changes necessary to introduce
support for multiple net_devices connected to the same Frame Engine (FE)
GDM port (GDM3 or GDM4).
====================
Link: https://patch.msgid.link/20260412-airoha-multi-serdes-preliminary-patch-v1-0-08d5b670ca8f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove airoha_gdm_port dependency in ETS tc callback signatures and rely
on net_device pointer instead. Please note this patch does not introduce
any logical change and it is a preliminary patch in order to support
multiple net_devices connected to the same GDM3 or GDM4 port via an
external hw arbiter.
Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260412-airoha-multi-serdes-preliminary-patch-v1-3-08d5b670ca8f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove airoha_gdm_port dependency in HTB tc callback signatures and rely
on net_device pointer instead. Please note this patch does not introduce
any logical change and it is a preliminary patch in order to support
multiple net_devices connected to the same GDM3 or GDM4 port via an
external hw arbiter.
Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260412-airoha-multi-serdes-preliminary-patch-v1-2-08d5b670ca8f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove airoha_gdm_port dependency in airoha_dev_setup_tc_block routine
signature and rely on net_device pointer instead. Please note this patch
does not introduce any logical change and it is a preliminary patch to
support multiple net_devices connected to the GDM3 or GDM4 ports via an
external hw arbiter.
Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260412-airoha-multi-serdes-preliminary-patch-v1-1-08d5b670ca8f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle says:
====================
net: dsa: mxl862xx: add statistics support
Add per-port RMON statistics support for the MxL862xx DSA driver,
covering hardware-specific ethtool -S counters, standard IEEE 802.3
MAC/ctrl/pause statistics, and rtnl_link_stats64 via polled 64-bit
accumulation.
====================
Link: https://patch.msgid.link/cover.1775951347.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Poll free-running firmware RMON counters every 2 seconds and accumulate
deltas into 64-bit per-port statistics. 32-bit packet counters wrap
in ~220s at 10 Gbps line rate with minimum-size frames; the 2s polling
interval provides a comfortable margin. The .get_stats64 callback
forces a fresh poll so that counters are always up to date when queried.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/fa38548ba05866879e8912721edc91947ce4ff12.1775951347.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MxL862xx firmware exposes per-port RMON counters through the
RMON_PORT_GET command, covering standard IEEE 802.3 MAC statistics
(unicast/multicast/broadcast packet and byte counts, collision
counters, pause frames) as well as hardware-specific counters such
as extended VLAN discard and MTU exceed events.
Add the RMON counter firmware API structures and command definitions.
Implement .get_strings, .get_sset_count, and .get_ethtool_stats for
legacy ethtool -S support. Implement .get_eth_mac_stats,
.get_eth_ctrl_stats, and .get_pause_stats for the standardized
IEEE 802.3 statistics interface.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/480be14d5ed51f3db7b1681b298044dbf8e87494.1775951347.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnYSaIACgkQxWXV+ddt
WDtZiRAApB8mZe+pi2gDeqtzu6Gx78fyXTdxJFrdxIhrqZiKqVm+eaaxwhTHlqN6
/0sWKyb5k8n2e+kFrLTuudR6UpVABKad25FkhJfS5dG5sEyV4gRM8avamVv+8v5c
mT/guXh5b09ABfnkviIj9XLA5Vp7pIt1l86yaoIzMr34tmT9QrrsWPXpgO0dQ1g/
5agz987Kbn+3+IDY+uivSRuEkgQia7FopHD8M2VFSuU6OSlshr+5COPBlhIPICTF
rdmNkK75ms43AG3uYg84+niIFOE7XHz0vIAzoXoWWBBwlKwVtUvaDRhgeEZ6ta4S
2RxNFOEYnxl+K1/1H3UkKsBRYaGzRwb/p37yjviaN/JXrTdxZm8TT/9M9ojMKGUw
ftgd2bCz+2i7mhV+ZxCuoVBpRE/+erZYLEvhTYROoUzPGYwCb4pNiVi/iccSHbzK
v8ozw5S/36tgdYrnmq3HaeEXvUm7/i7sL8x10TjUKtALwmXupXAchmDIXAUAe9YX
lohtY5jwfRNWRPmGLGNTtCrpz8YCM702THJ+9QvfbqnAR08WZmoU3EOt2Ih0+5d+
/tK6I5n1IJTkbDl1NvWXbx5IE9GTJaElIO/FeOBZWJ93EgRNOwP6ZfGsn4sy43Xl
f1KSgy//L9SQbZ/JfFNkfLbPXXu+bWpTlpTTxmisgHnO9DT/tQg=
=jntW
-----END PGP SIGNATURE-----
Merge tag 'affs-for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull AFFS fix from David Sterba:
"There's a potential out-of-bounds read in the directory hash table
during readdir"
* tag 'affs-for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: bound hash_pos before table lookup in affs_readdir
Replace the magic number with an existing define for the LEDCR
register page number on the RTL8211F.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260411105150.184577-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnYSG0ACgkQxWXV+ddt
WDumDQ/9E8ms1vZcfMwZUf48o7Z2fHnZMUy6dXKHnH72NiRrqSP2jZnhluT6qGqb
MmmnqvmKFNfJ0J5QLZTgFz/MWzY7PQEIG8WkQ3JvT6iKO5Csa2vFzCXv1oaGWo+m
TIw++3IS+GliKYQedgVXMYRKFc24OP95RO+Grsh8pMOXWcpSO60oSrTPyzbkdfid
+Gv4CpSRTCCl/qQ8ZX2PRQ9tLJtR2IAnJBWkwE/MPWxFfkt0oBiauy/BoiddGwrl
ocDn5fH2CnORwONLGPbVg0ScVNMaRFJfYVrI18N8pfT+4ZVeJFiWGiRnrqSmk8PG
a8BT51VPZZunyGoVFZmpqOhsy8PtqpjX0ljpebY7K69fH+1ewrWVE9ovs/nZ6Hq+
DgB9pXu2OxKdyByHfr8Pl/0A2naWOrQ0JHOGnVsEg2qDi67vy5EBUIYQbiS9uo4s
IFdd5bA04DS0Khzp2Y8Crrc2tWootsRCcUs6oiwKgKVBoqtNbFvVHKJqfi8XZB6i
W4/rL+F0gBVzR127TZF+tejd1jq9u6WOBRKlwkHK5DoWXiv84oLv/zdwtqinTWLs
N7LOFfDgYwH1YNPx12tEm9DW3Ef76RlHPZiTAmG4NUphmgwkKaYOosqsX7WvrMqR
kkeKfbsRm4M/lQDLwd8IBUloMhl2+uspxJrkNUy/31pxWxByGvk=
=mVDJ
-----END PGP SIGNATURE-----
Merge tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes:
- move shutdown ioctl support out of experimental features, a forced
stop of filesystem operation until the next unmount; additionally
there's a super block operation to forcibly remove a device from
under the filesystem that could lead to a shutdown or not if the
redundancy allows that
- report filesystem shutdown using fserror mechanism
- tree-checker updates:
- verify free space info, extent and bitmap items
- verify remap-tree items and related data in block group items
Performance improvements:
- speed up clearing first extent in the tracked range (+10%
throughput on sample workload)
- reduce COW rewrites of extent buffers during the same transaction
- avoid taking big device lock to update device stats during
transaction commit
- fix unnecessary flush on close when truncating empty files
(observed in practice on a backup application)
- prevent direct reclaim during compressed readahead to avoid stalls
under memory pressure
Notable fixes:
- fix chunk allocation strategy on RAID1-like block groups with
disproportionate device sizes, this could lead to ENOSPC due to
skewed reservation estimates
- adjust metadata reservation overcommit ratio to be less aggressive
and also try to flush if possible, this avoids ENOSPC and potential
transaction aborts in some edge cases (that are otherwise hard to
reproduce)
- fix silent IO error in encoded writes and ordered extent split in
zoned mode, the error was not correctly propagated to the address
space and could lead to zeroed ranges
- don't mark inline files NOCOMPRESS unexpectedly, the intent was to
do that for single block writes of regular files
- fix deadlock between reflink and transaction commit when using
flushoncommit
- fix overly strict item check of a running dev-replace operation
Core:
- zoned mode space reservation fixes:
- cap delayed refs metadata reservation to avoid overcommit
- update logic to reclaim partially unusable zones
- add another state to flush and reclaim partially used zone
- limit number of zones reclaimed in one go to avoid blocking
other operations
- don't let log trees consume global reserve on overcommit and fall
back to transaction commit
- revalidate extent buffer when checking its up-to-date status
- add self tests for zoned mode block group specifics
- reduce atomic allocations in some qgroup paths
- avoid unnecessary root node COW during snapshotting
- start new transaction in block group relocation conditionally
- faster check of NOCOW files on currently snapshotted root
- change how compressed bio size is tracked from bio and reduce the
structure size
- new tracepoint for search slot restart tracking
- checksum list manipulation improvements
- type, parameter cleanups, refactoring
- error handling improvements, transaction abort call adjustments"
* tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (116 commits)
btrfs: btrfs_log_dev_io_error() on all bio errors
btrfs: fix silent IO error loss in encoded writes and zoned split
btrfs: skip clearing EXTENT_DEFRAG for NOCOW ordered extents
btrfs: use BTRFS_FS_UPDATE_UUID_TREE_GEN flag for UUID tree rescan check
btrfs: remove duplicate journal_info reset on failure to commit transaction
btrfs: tag as unlikely if statements that check for fs in error state
btrfs: fix double free in create_space_info() error path
btrfs: fix double free in create_space_info_sub_group() error path
btrfs: do not reject a valid running dev-replace
btrfs: only invalidate btree inode pages after all ebs are released
btrfs: prevent direct reclaim during compressed readahead
btrfs: replace BUG_ON() with error return in cache_save_setup()
btrfs: zstd: don't cache sectorsize in a local variable
btrfs: zlib: don't cache sectorsize in a local variable
btrfs: zlib: drop redundant folio address variable
btrfs: lzo: inline read/write length helpers
btrfs: use common eb range validation in read_extent_buffer_to_user_nofault()
btrfs: read eb folio index right before loops
btrfs: rename local variable for offset in folio
btrfs: unify types for binary search variables
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmna0vIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpu8MEACN6owH/1suaJp5HBhrKseVIPQl1ldmsGF3
ZDwZndUE6pWXaeuI3g5QjSPcfWIUuLG6vs/btkIh4M32zAcFsSD8zYPItvgFzMVp
X762WPCrUcfFwKt5GqeNn6IblO8BrsbzoJWNCaSVRhWqCdzQRVktq6684nNy/fj1
JBFnMsRpwGhoKzpg1oCLOrs0V57CRdJqFdmMzQHwRTWHemvfHf6SD2+h9axfKCaV
baqvXGOLQXLwr8qHFo1LIu8lqEltHUa7boU8EMFQn/v8sPjUv46EuqZ8VVtzXH08
fY2zqWI5atA3DZCfORCHnK0qh6tPiSUtVUilXbIffhqd6lCTs891RJf3TegRCGTZ
k8WfBFVKzVlhbgGk0Km6+tiHTaK1ZmcKU0Q+uucnb3RlOdOoPvXJy3u+I5BK74aV
36JmNPWRQfzh5icmrrGKySBTX0z7NPtMiEA+qHEndIO5FWrkf5pf9U5C5gu0WEMh
iK2gotbd0Vym3EpqKQnefxflce6IpYteOACeYPXAprcQOzPK+WYjiVUJ9JcH6DhP
RPUIXXck8+GkHnM9vWtBXBKaoR7gcATHUzLX8ZnhDkAhsTJ+tOXN8skq28gglUtj
8kLMzyXklbhAJsykxKn0rqcNUOcVMatFyK4VIFyp2tWRhzMDAY4xyXYSz0lRowkd
pZAm4eSkmw==
=IoaB
-----END PGP SIGNATURE-----
Merge tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Add a callback driven main loop for io_uring, and BPF struct_ops
on top to allow implementing custom event loop logic
- Decouple IOPOLL from being a ring-wide all-or-nothing setting,
allowing IOPOLL use cases to also issue certain white listed
non-polled opcodes
- Timeout improvements. Migrate internal timeout storage from
timespec64 to ktime_t for simpler arithmetic and avoid copying of
timespec data
- Zero-copy receive (zcrx) updates:
- Add a device-less mode (ZCRX_REG_NODEV) for testing and
experimentation where data flows through the copy fallback path
- Fix two-step unregistration regression, DMA length calculations,
xarray mark usage, and a potential 32-bit overflow in id
shifting
- Refactoring toward multi-area support: dedicated refill queue
struct, consolidated DMA syncing, netmem array refilling format,
and guard-based locking
- Zero-copy transmit (zctx) cleanup:
- Unify io_send_zc() and io_sendmsg_zc() into a single function
- Add vectorized registered buffer send for IORING_OP_SEND_ZC
- Add separate notification user_data via sqe->addr3 so
notification and completion CQEs can be distinguished without
extra reference counting
- Switch struct io_ring_ctx internal bitfields to explicit flag bits
with atomic-safe accessors, and annotate the known harmless races on
those flags
- Various optimizations caching ctx and other request fields in local
variables to avoid repeated loads, and cleanups for tctx setup, ring
fd registration, and read path early returns
* tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (58 commits)
io_uring: unify getting ctx from passed in file descriptor
io_uring/register: don't get a reference to the registered ring fd
io_uring/tctx: clean up __io_uring_add_tctx_node() error handling
io_uring/tctx: have io_uring_alloc_task_context() return tctx
io_uring/timeout: use 'ctx' consistently
io_uring/rw: clean up __io_read() obsolete comment and early returns
io_uring/zcrx: use correct mmap off constants
io_uring/zcrx: use dma_len for chunk size calculation
io_uring/zcrx: don't clear not allocated niovs
io_uring/zcrx: don't use mark0 for allocating xarray
io_uring: cast id to u64 before shifting in io_allocate_rbuf_ring()
io_uring/zcrx: reject REG_NODEV with large rx_buf_size
io_uring/cancel: validate opcode for IORING_ASYNC_CANCEL_OP
io_uring/rsrc: use io_cache_free() to free node
io_uring/zcrx: rename zcrx [un]register functions
io_uring/zcrx: check ctrl op payload struct sizes
io_uring/zcrx: cache fallback availability in zcrx ctx
io_uring/zcrx: warn on a repeated area append
io_uring/zcrx: consolidate dma syncing
io_uring/zcrx: netmem array as refiling format
...
When auxiliary_device_add() fails, the error block calls
auxiliary_device_uninit() but does not return. The uninit drops the
last reference and synchronously runs bnge_aux_dev_release(), which sets
bd->auxr_dev = NULL and frees the underlying object. The subsequent
bd->auxr_dev->net = bd->netdev then dereferences NULL, which is not a
good thing to have happen when trying to clean up from an error.
Add the missing return, as the auxiliary bus documentation states is a
requirement (seems that LLM tools read documentation better than humans
do...)
Cc: Vikas Gupta <vikas.gupta@broadcom.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Fixes: 8ac050ec3b ("bng_en: Add RoCE aux device support")
Cc: stable <stable@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026041124-banshee-molecular-0f70@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix several spelling and grammar mistakes in the IPE
documentation.
No functional change.
Signed-off-by: Evan Ducas <evan.j.ducas@gmail.com>
Acked-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Fan Wu <wufan@kernel.org>
Commit de5626b95e ("tcp: Factorise cookie-independent fields
initialisation in cookie_v[46]_check().") miscategorised
tcp_rsk(req)->req_usec_ts init to cookie_tcp_reqsk_init(),
which is used by both BPF/non-BPF SYN cookie reqsk.
Rather, it should have been moved to cookie_tcp_reqsk_alloc() by
commit 8e7bab6b96 ("tcp: Factorise cookie-dependent fields
initialisation in cookie_v[46]_check()") so that only non-BPF SYN
cookie sets tcp_rsk(req)->req_usec_ts to false.
Let's move the initialisation to cookie_tcp_reqsk_alloc() to
respect bpf_tcp_req_attrs.usec_ts_ok.
Fixes: e472f88891 ("bpf: tcp: Support arbitrary SYN Cookie.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260410235328.1773449-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In f2fs_init_sysfs(), all failure paths after kset_register() jump to
put_kobject, which unconditionally releases both f2fs_tune and
f2fs_feat.
If kobject_init_and_add(&f2fs_feat, ...) fails, f2fs_tune has not been
initialized yet, so calling kobject_put(&f2fs_tune) is invalid.
Fix this by splitting the unwind path so each error path only releases
objects that were successfully initialized.
Fixes: a907f3a68e ("f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_sbi_show(), the extension_list, extension_count and
hot_ext_count are read without holding sbi->sb_lock. If a concurrent
sysfs store modifies the extension list via f2fs_update_extension_list(),
the show path may read inconsistent count and array contents, potentially
leading to out-of-bounds access or displaying stale data.
Fix this by holding sb_lock around the entire extension list read
and format operation.
Fixes: b6a06cbbb5 ("f2fs: support hot file extension")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
An extension should not exist in both the cold and hot extension lists
simultaneously. When adding a hot extension, check whether it already
exists in the cold list, and vice versa. Reject the operation with
-EINVAL if a conflict is found.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmna0tgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptEbD/0ZMEsz5pcN+/bpM9Qva5lVVkByRieua+JA
T7L+JMcEigp1Hf2idAPlv1e9dbrtgOGhkjZNlbZenP2MHXBmbUTnzTWDKW5w0ZQ4
UqnVC7fMmxzI57DPt7iG/1WQo8O6QPHWwBof5ZXn0b83qwByTB2oVkAb9ysT7CdM
wGk5KnPRLIAWf5o+aZ4LoWE+196jQiszx1m6U58FTqnCgvJ/GyKyrgzx+uvGUgF+
owZT/6TrN7cN9A68fOnmcjEZ7beZXygOQPTn32sF9rEOi8JsgK71EE2LofdVVSNU
ES/tyKVJbSNDgUH2b0T84rErT4MtZcw5J29V3k7CVndC+DcT2uLSroPz3lYQjDg9
TLeq7ZLjnyoBG+muboWdXcvBKn3aKLec3nfVSbz6J1xb/Z22gWYy5TZbrGnGH8fJ
zBiyKkHMaZi55IdTDWQT3a48h36qFh0Y2wbvZ6uhyYOfXHyj4pA4ccJZgFfmf4ZG
flVRFGEL9Tqc82lB8dfy9DBp0ZQSjeBUCd+gyDKjiuWVau5L5iTUeMMkt8yr7qbg
PY+ATJcHk5S5zwM2xcZUt5EcHBBbCaKQ6DdRZKwzMMUvCjHlvnWvENVjUtRa9Dng
1vUKpB/e5NGpqD05Iqgyai+OD9/tALc4sUEI2yQ7/dk9pKIXQ4RE9HR/pSkgbjeR
LGokj08cgg==
=ga3t
-----END PGP SIGNATURE-----
Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
copies between kernel and userspace by matching registered buffer
PFNs at I/O time. Includes selftests.
- Refactor bio integrity to support filesystem initiated integrity
operations and arbitrary buffer alignment.
- Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
unify synchronous bi_end_io callbacks.
- Fix zone write plug refcount handling and plug removal races. Add
support for serializing zone writes at QD=1 for rotational zoned
devices, yielding significant throughput improvements.
- Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
command.
- Add io_uring passthrough (uring_cmd) support to the BSG layer.
- Replace pp_buf in partition scanning with struct seq_buf.
- zloop improvements and cleanups.
- drbd genl cleanup, switching to pre_doit/post_doit.
- NVMe pull request via Keith:
- Fabrics authentication updates
- Enhanced block queue limits support
- Workqueue usage updates
- A new write zeroes device quirk
- Tagset cleanup fix for loop device
- MD pull requests via Yu Kuai:
- Fix raid5 soft lockup in retry_aligned_read()
- Fix raid10 deadlock with check operation and nowait requests
- Fix raid1 overlapping writes on writemostly disks
- Fix sysfs deadlock on array_state=clear
- Proactive RAID-5 parity building with llbitmap, with
write_zeroes_unmap optimization for initial sync
- Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
version mismatch fallback
- Fix bcache use-after-free and uninitialized closure
- Validate raid5 journal metadata payload size
- Various cleanups
- Various other fixes, improvements, and cleanups
* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
block: refactor blkdev_zone_mgmt_ioctl
MAINTAINERS: update ublk driver maintainer email
Documentation: ublk: address review comments for SHMEM_ZC docs
ublk: allow buffer registration before device is started
ublk: replace xarray with IDA for shmem buffer index allocation
ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
ublk: verify all pages in multi-page bvec fall within registered range
ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
xfs: use bio_await in xfs_zone_gc_reset_sync
block: add a bio_submit_or_kill helper
block: factor out a bio_await helper
block: unify the synchronous bi_end_io callbacks
xfs: fix number of GC bvecs
selftests/ublk: add read-only buffer registration test
selftests/ublk: add filesystem fio verify test for shmem_zc
selftests/ublk: add hugetlbfs shmem_zc test for loop target
selftests/ublk: add shared memory zero-copy test
selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
...
f2fs_destroy_extent_node() does not set FI_NO_EXTENT before clearing
extent nodes. When called from f2fs_drop_inode() with I_SYNC set,
concurrent kworker writeback can insert new extent nodes into the same
extent tree, racing with the destroy and triggering f2fs_bug_on() in
__destroy_extent_node(). The scenario is as follows:
drop inode writeback
- iput
- f2fs_drop_inode // I_SYNC set
- f2fs_destroy_extent_node
- __destroy_extent_node
- while (node_cnt) {
write_lock(&et->lock)
__free_extent_tree
write_unlock(&et->lock)
- __writeback_single_inode
- f2fs_outplace_write_data
- f2fs_update_read_extent_cache
- __update_extent_tree_range
// FI_NO_EXTENT not set,
// insert new extent node
} // node_cnt == 0, exit while
- f2fs_bug_on(node_cnt) // node_cnt > 0
Additionally, __update_extent_tree_range() only checks FI_NO_EXTENT for
EX_READ type, leaving EX_BLOCK_AGE updates completely unprotected.
This patch set FI_NO_EXTENT under et->lock in __destroy_extent_node(),
consistent with other callers (__update_extent_tree_range and
__drop_extent_tree) and check FI_NO_EXTENT for both EX_READ and
EX_BLOCK_AGE tree.
Fixes: 3fc5d5a182 ("f2fs: fix to shrink read extent node in batches")
Cc: stable@vger.kernel.org
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The fsparam_string_empty() gives an error when mounting without string, since
its type is set to fsparam_flag in VFS. So, let's allow the flag as well.
This addresses xfstests/f2fs/015 and f2fs/021.
Fixes: d185351325 ("f2fs: separate the options parsing and options checking")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Back in 2024 I reported a 7-12% regression on an iperf3 UDP loopback
thoughput test that we traced to the extra overhead of calling
compute_score on two places, introduced by commit f0ea27e7bf ("udp:
re-score reuseport groups when connected sockets are present"). At the
time, I pointed out the overhead was caused by the multiple calls,
associated with cpu-specific mitigations, and merged commit
50aee97d15 ("udp: Avoid call to compute_score on multiple sites") to
jump back explicitly, to force the rescore call in a single place.
Recently though, we got another regression report against a newer distro
version, which a team colleague traced back to the same root-cause.
Turns out that once we updated to gcc-13, the compiler got smart enough
to unroll the loop, undoing my previous mitigation. Let's bite the
bullet and __always_inline compute_score on both ipv4 and ipv6 to
prevent gcc from de-optimizing it again in the future. These functions
are only called in two places each, udpX_lib_lookup1 and
udpX_lib_lookup2, so the extra size shouldn't be a problem and it is hot
enough to be very visible in profilings. In fact, with gcc13, forcing
the inline will prevent gcc from unrolling the fix from commit
50aee97d15, so we don't end up increasing udpX_lib_lookup2 at all.
I haven't recollected the results myself, as I don't have access to the
machine at the moment. But the same colleague reported 4.67%
inprovement with this patch in the loopback benchmark, solving the
regression report within noise margins.
Eric Dumazet reported no size change to vmlinux when built with clang.
I report the same also with gcc-13:
scripts/bloat-o-meter vmlinux vmlinux-inline
add/remove: 0/2 grow/shrink: 4/0 up/down: 616/-416 (200)
Function old new delta
udp6_lib_lookup2 762 949 +187
__udp6_lib_lookup 810 975 +165
udp4_lib_lookup2 757 906 +149
__udp4_lib_lookup 871 986 +115
__pfx_compute_score 32 - -32
compute_score 384 - -384
Total: Before=35011784, After=35011984, chg +0.00%
Fixes: 50aee97d15 ("udp: Avoid call to compute_score on multiple sites")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260410155936.654915-1-krisman@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCadfgCxAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSl5cA/0QZjJ+4V2DVzJQM5qzmNK9He9uYaOs7F2Ks
xRvg7IebAPwMEcVY+CVQxD+YGj08UgM753yx4CRbhsu4k5mowEEJDQ==
=Lz9R
-----END PGP SIGNATURE-----
Merge tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull Landlock update from Mickaël Salaün:
"This adds a new Landlock access right for pathname UNIX domain sockets
thanks to a new LSM hook, and a few fixes"
* tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (23 commits)
landlock: Document fallocate(2) as another truncation corner case
landlock: Document FS access right for pathname UNIX sockets
selftests/landlock: Simplify ruleset creation and enforcement in fs_test
selftests/landlock: Check that coredump sockets stay unrestricted
selftests/landlock: Audit test for LANDLOCK_ACCESS_FS_RESOLVE_UNIX
selftests/landlock: Test LANDLOCK_ACCESS_FS_RESOLVE_UNIX
selftests/landlock: Replace access_fs_16 with ACCESS_ALL in fs_test
samples/landlock: Add support for named UNIX domain socket restrictions
landlock: Clarify BUILD_BUG_ON check in scoping logic
landlock: Control pathname UNIX domain socket resolution by path
landlock: Use mem_is_zero() in is_layer_masks_allowed()
lsm: Add LSM hook security_unix_find
landlock: Fix kernel-doc warning for pointer-to-array parameters
landlock: Fix formatting in tsync.c
landlock: Improve kernel-doc "Return:" section consistency
landlock: Add missing kernel-doc "Return:" sections
selftests/landlock: Fix format warning for __u64 in net_test
selftests/landlock: Skip stale records in audit_match_record()
selftests/landlock: Drain stale audit records on init
selftests/landlock: Fix socket file descriptor leaks in audit helpers
...
David Carlier says:
====================
octeon_ep_vf: fix napi_build_skb() NULL dereference
napi_build_skb() can return NULL on allocation failure. In
__octep_vf_oq_process_rx(), the result is used directly without a
NULL check in both the single-buffer and multi-fragment paths,
leading to a NULL pointer dereference.
Patch 1 introduces a helper to deduplicate the ring index advance
pattern, patch 2 adds the actual NULL checks.
====================
Link: https://patch.msgid.link/20260409184009.930359-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi_build_skb() can return NULL on allocation failure. In
__octep_vf_oq_process_rx(), the result is used directly without a NULL
check in both the single-buffer and multi-fragment paths, leading to a
NULL pointer dereference.
Add NULL checks after both napi_build_skb() calls, properly advancing
descriptors and consuming remaining fragments on failure.
Fixes: 1cd3b40797 ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260409184009.930359-3-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce octep_vf_oq_next_idx() to consolidate the repeated
ring index advance and wraparound pattern in __octep_vf_oq_process_rx().
No functional change intended.
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260409184009.930359-2-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmnZehgUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP6QQ//atglS8FsdHDfnEVwHls6jhyEwi8+
7QIJUCEJcgSnVOonUeaq1Xl2m1xz7XOMFKfGPRoSvgnRI90FYavXbuPOjCWHmTgj
1U5RHs/4ytr25kpLmvds80UUGiZuFkjI26vvwt8Y0OL+PrVTbkFIaUcExFzsNbbj
11XHU1y5NCTpqNTaVZH7Uc0Leqh8cfMm8V4oGXo3sQHuDKcTVTURWcWvhkz4cmA0
Exx+ZHHXWUINJ4HZB3Ze1rLO/7p3X9ZTeIyKbcSEgfNrIQKeLRB8n9XXHAeI+vRU
uHdZaCy3DpT5CDZOuWPOQ58jp0UZr1n5HzJ/uSCpcM7Pn7TfNnK2oSQk/aPHe/Rf
p+su5B6xKdxcwUmPDe7bfP2xGlLZarPsLxxdkLKGDopXwRk6c08VRP6MfhQkxVea
x7owR4QYcZNWeTyZvV1fLKjVOF6Zp09WjpYUS7O8QYLG9b78EIGXfM1nG/wouh9W
xgvQhGCcCQ1Yu1RLzlnIqK5srV88DeENDdTTjfUbkq/dPBkwyQx+NnOOburHbIr2
kVJ9tMTkCLL0Ib2446fVPcR9/w0LsiXAtKWlSYsVM3+UhBpa70xmal/aZXEGdw/R
aecmvgPu5E/kI+nFUfVT+T6CPO1VkXvNmhR6FCITEJljSK+bM+p4FxKV0/BjQWcd
CdDqTd5/VJETspw=
=HzM1
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux update from Paul Moore:
- Annotate a known race condition to soothe KCSAN
* tag 'selinux-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: annotate intentional data race in inode_doinit_with_dentry()
Manivannan Sadhasivam says:
====================
net: qrtr: ns: A bunch of fixs
This series fixes a bunch of possible memory exhaustion issues in the QRTR
nameserver.
====================
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-0-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the remove callback, if a packet arrives after destroy_workqueue() is
called, but before sock_release(), the qrtr_ns_data_ready() callback will
try to queue the work, causing use-after-free issue.
Fix this issue by saving the default 'sk_data_ready' callback during
qrtr_ns_init() and use it to replace the qrtr_ns_data_ready() callback at
the start of remove(). This ensures that even if a packet arrives after
destroy_workqueue(), the work struct will not be dereferenced.
Note that it is also required to ensure that the RX threads are completed
before destroying the workqueue, because the threads could be using the
qrtr_ns_data_ready() callback.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-5-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the nameserver doesn't limit the number of nodes it handles.
This can be an attack vector if a malicious client starts registering
random nodes, leading to memory exhaustion.
Hence, limit the maximum number of nodes to 64. Note that, limit of 64 is
chosen based on the current platform requirements. If requirement changes
in the future, this limit can be increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-4-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A node sends the BYE packet when it is about to go down. So the nameserver
should advertise the removal of the node to all remote and local observers
and free the node finally. But currently, the nameserver doesn't free the
node memory even after processing the BYE packet. This causes the node
memory to leak.
Hence, remove the node from Xarray list and free the node memory during
both success and failure case of ctrl_cmd_bye().
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-3-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current code does no bound checking on the number of lookups a client can
perform. Though the code restricts the lookups to local clients, there is
still a possibility of a malicious local client sending a flood of
NEW_LOOKUP messages over the same socket.
Fix this issue by limiting the maximum number of lookups to 64 globally.
Since the nameserver allows only atmost one local observer, this global
lookup count will ensure that the lookups stay within the limit.
Note that, limit of 64 is chosen based on the current platform
requirements. If requirement changes in the future, this limit can be
increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-2-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current code does no bound checking on the number of servers added per
node. A malicious client can flood NEW_SERVER messages and exhaust memory.
Fix this issue by limiting the maximum number of server registrations to
256 per node. If the NEW_SERVER message is received for an old port, then
don't restrict it as it will get replaced. While at it, also rate limit
the error messages in the failure path of qrtr_ns_worker().
Note that the limit of 256 is chosen based on the current platform
requirements. If requirement changes in the future, this limit can be
increased.
Cc: stable@vger.kernel.org
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260409-qrtr-fix-v3-1-00a8a5ff2b51@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmnZeioUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN15w//e6Tgou0wgffKb+vZP+9xC53bQzmL
Z0en5gdfifbqeIvj6LzSdUlIlsDUw65S42eGhqIYwk5oYNd2lwFMXd16fggakc/n
TDdF1/7WTwcwMFKJxtew5tcE3pjwC96F6bqF9YmDdcNycjuQ5cbsJ/56hQsWZYxo
g8y5y3fmWrkQ28gst2NJiR6XQx7acFc3S2FRKZc8mldkDjrmb9gN9WdwWJ6/nw+A
xnNm6BdVjZ/gnrQliO4eL4J5T1ijvy3gddW3rXdytcIoH8js/pcZh3BpfTVWlzs+
5KLPy9Tm39g3BwNx5cHrmzz1Ug6fCqIJyJzPK0M3q3vr+7w1kEWZnM5IUQwdQPsg
dVmLBvhrvnKNBnMXd53seQJm33UkcKPpfWbaYQFpUC1ZocUiQDvE3eCH+q4SIwjo
kwB6Ycc27O3PjXzMtwQv5a1oKeOHuNKr0YCOAoOs1bshcJ4lTzVwqe4StuLeoN8z
7G+sfoT4JpM/izPTQjF+tNRcDECvX7j8b42BJcGx8Zf+JiP4HC1xBmLOg4egLc6m
hxwaT5ipLhBL95eNCp16bqVrw5vVzZ+HbtDCkXJmU+grdsuTsp0bUGmm1DjpsFVk
l/PyMDCvMNzi3uuNI9v9usQblv57XK6oNVBqcuoqejEDkEuX3MBSkov7DZr9nLLO
JO1EvsWAjQdyXeQ=
=giHS
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
"We only have five patches in the LSM tree, but three of the five are
for an important bugfix relating to overlayfs and the mmap() and
mprotect() access controls for LSMs. Highlights below:
- Fix problems with the mmap() and mprotect() LSM hooks on overlayfs
As we are dealing with problems both in mmap() and mprotect() there
are essentially two components to this fix, spread across three
patches with all marked for stable.
The simplest portion of the fix is the creation of a new LSM hook,
security_mmap_backing_file(), that is used to enforce LSM mmap()
access controls on backing files in the stacked/overlayfs case. The
existing security_mmap_file() does not have visibility past the
user file. You can see from the associated SELinux hook callback
the code is fairly straightforward.
The mprotect() fix is a bit more complicated as there is no way in
the mprotect() code path to inspect both the user and backing
files, and bolting on a second file reference to vm_area_struct
wasn't really an option.
The solution taken here adds a LSM security blob and associated
hooks to the backing_file struct that LSMs can use to capture and
store relevant information from the user file. While the necessary
SELinux information is relatively small, a single u32, I expect
other LSMs to require more than that, and a dedicated backing_file
LSM blob provides a storage mechanism without negatively impacting
other filesystems.
I want to note that other LSMs beyond SELinux have been involved in
the discussion of the fixes presented here and they are working on
their own related changes using these new hooks, but due to other
issues those patches will be coming at a later date.
- Use kstrdup_const()/kfree_const() for securityfs symlink targets
- Resolve a handful of kernel-doc warnings in cred.h"
* tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux: fix overlayfs mmap() and mprotect() access checks
lsm: add backing_file LSM hooks
fs: prepare for adding LSM blob to backing_file
securityfs: use kstrdup_const() to manage symlink targets
cred: fix kernel-doc warnings in cred.h
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmnZegUUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNydxAApWBVRWp/AY7jtCQGWRYAa+6y+bQ0
RWfu8putXaOyk3NTeWP64e87FKsdByR/yflefYxMH+bXc2mwbuUZYAreEVmLCJ1P
QxHKuwCkCNOz90n/Y7nlDSDK1GYdzlFkCgidfr4iNSCD58WMTtNNpZREzaNiR8a1
PZ3bFvJH+S7BRCGA6/S/20rNYeWTga56pSrWt6VpMwVHGJ1R4DsD60pT8z0NqMYI
BTBLeZ36HlZdwUp+APldKNNDRKG1ZQVKJRO68qcSkopr4vQzK7yL/SJsCdU8MHj2
LccXTCTHHWJbpdiE7BtzPO9UobVZIdcz2wsnJHWxzHYtXlPolgM7F31111GL4HSv
V/mq5o7dR3h6nn+1gkWHjOpd/f3J3xl3FaJsH9FIIhPmCRHb4oZI0WG0ZH3mHZBl
o6aaWja3PBl0XNA+q87DQVBYDOyVNB4RjuaKy+d7hm4eronTRaZkg3zutrB6/XxP
uFbp+Q3diWNMsYO52DKFThL/sStmnnCMIRJuTxd8QaPhLVakaFSkWZycSUH4HijD
8WMk3e4yo3TeD6rCAognwKclj0vCMHS3TLOMXlY0vMD04gwXJ2S81yfyXGT4F5De
KkXj61TFMxPyiZ6yrxk86BmoqHL0DUiCDn1rMKbNdIncHedKZoNuy+O/XNLS6No/
hLRvXSI7MNthJ5E=
=1rY2
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Improved handling of unknown status requests from userspace
The current kernel code ignores unknown/unused request bits sent from
userspace and returns an error code based on the results of the
request(s) it does understand. The patch from Ricardo fixes this so
that unknown requests return an -EINVAL to userspace, making
compatibility a bit easier moving forward.
- A number of small style and formatting cleanups
* tag 'audit-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: handle unknown status requests in audit_receive_msg()
audit: fix coding style issues
audit: remove redundant initialization of static variables to 0
audit: fix whitespace alignment in include/uapi/linux/audit.h
Breno Leitao says:
====================
net: move .getsockopt away from __user buffers
Currently, the .getsockopt callback requires __user pointers:
int (*getsockopt)(struct socket *sock, int level,
int optname, char __user *optval, int __user *optlen);
This prevents kernel callers (io_uring, BPF) from using getsockopt on
levels other than SOL_SOCKET, since they pass kernel pointers.
Following Linus' suggestion [0], this series introduces sockopt_t, a
type-safe wrapper around iov_iter, and a getsockopt_iter callback that
works with both user and kernel buffers. AF_PACKET and CAN raw are
converted as initial users, with selftests covering the trickiest
conversion patterns.
[0] https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/
====================
Link: https://patch.msgid.link/20260408-getsockopt-v3-0-061bb9cb355d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert CAN raw socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.
Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of copy_to_user()
- For CAN_RAW_FILTER and CAN_RAW_XL_VCID_OPTS: on -ERANGE, set
opt->optlen to the required buffer size. The wrapper writes this
back to userspace even on error, preserving the existing API that
lets userspace discover the needed allocation size.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260408-getsockopt-v3-4-061bb9cb355d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert AF_PACKET's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.
Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- For PACKET_HDRLEN which reads from optval: use opt->iter_in with
copy_from_iter() for the input read, then the common opt->iter_out
copy_to_iter() epilogue handles the output
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260408-getsockopt-v3-3-061bb9cb355d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update do_sock_getsockopt() to use the new getsockopt_iter callback
when available. Add do_sock_getsockopt_iter() helper that:
1. Reads optlen from user/kernel space
2. Initializes a sockopt_t with the appropriate iov_iter (kvec for
kernel, ubuf for user buffers) and sets opt.optlen
3. Calls the protocol's getsockopt_iter callback
4. Writes opt.optlen back to user/kernel space
The optlen is always written back, even on failure. Some protocols
(e.g. CAN raw) return -ERANGE and set optlen to the required buffer
size so userspace knows how much to allocate.
The callback is responsible for setting opt.optlen to indicate the
returned data size.
Important to say that iov_out does not need to be copied back in
do_sock_getsockopt().
When optval is not kernel (the userspace path), sockptr_to_sockopt()
sets up opt->iter_out as a ITER_DEST ubuf iterator pointing directly at
the userspace buffer (optval.user). So when getsockopt_iter
implementations call copy_to_iter(..., &opt->iter_out), the data is
written directly to userspace — no intermediate kernel buffer is
involved.
When optval.is_kernel is true (the in-kernel path, e.g. from io_uring),
the kvec points at the already-provided kernel buffer (optval.kernel),
so the data lands in the caller's buffer directly via the kvec-backed
iterator.
In both cases the iterator writes to the final destination in-place at
protocol callback. There's nothing to copy back — only optlen needs to
be written back.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260408-getsockopt-v3-2-061bb9cb355d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new getsockopt_iter callback to struct proto_ops that uses
sockopt_t, a type-safe wrapper around iov_iter. This provides a clean
interface for socket option operations that works with both user and
kernel buffers.
The sockopt_t type encapsulates an iov_iter and an optlen field.
The optlen field, although not suggested by Linus, serves as both input
(buffer size) and output (returned data size), allowing callbacks to
return random values independent of the bytes written via
copy_to_iter(), so, keep it separated from iov_iter.count.
This is preparatory work for removing the SOL_SOCKET level restriction
from io_uring getsockopt operations.
Keep in mind that both iter_out and iter_in always point to the same
data at all times, and we just have two of them to make the callback
implementation sane.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260408-getsockopt-v3-1-061bb9cb355d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>