Replace the conditional assertions with proper error handling and
transaction abort if we find an unexpected key type in the free space
tree.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In add_remap_tree_entries(), we only process a certain number of entries
at a time, meaning we may need to loop.
But because we weren't checking the return value of btrfs_insert_empty_items()
within the loop, this meant that if the last iteration of the loop
succeeded but a previous iteration failed, we were erroneously returning
0.
Fix this by breaking the loop early if btrfs_insert_empty_items() fails.
Fixes: b56f35560b ("btrfs: handle setting up relocation of block group with remap-tree")
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although commit 304076527c ("btrfs: move shutdown and remove_bdev
callbacks out of experimental features") tries to move both shutdown and
remove_bdev out of experimental features, that commit has only addressed
the super block operation callback, the ioctl one is left untouched.
Fix that missing aspect by also moving shutdown ioctl out of
experimental features.
Since we're here, also add unknown flag detection to reject any
unsupported shutdown flags.
Fixes: 304076527c ("btrfs: move shutdown and remove_bdev callbacks out of experimental features")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently for tree block readahead we never pass a
btrfs_tree_parent_check with @has_first_key set.
Without @has_first_key set, btrfs will skip the following extra
checks:
- Header generation check
This is a minor one.
- Empty leaf/node checks
This is more serious, for certain trees like the csum tree, they are
allowed to be empty, thus an empty leaf can pass the tree checker.
But if there is a parent node for such an empty leaf, it indicates
corruption.
Without @has_first_key set, we can no longer detect such a problem.
In fact there is already a fuzzed image report that a corrupted csum
leaf which has zero nritems but still has a parent node can trigger
a BUG_ON() during csum deletion.
However there are only two call sites of btrfs_readahead_tree_block():
- Inside relocate_tree_blocks()
At this call site we are trying to grab the first key of the tree
block, thus we are not able to pass a @first_key parameter.
- Inside btrfs_readahead_node_child()
This is the more common call site, where we have the parent node and
want to readahead the child tree blocks.
In this case we can easily grab the node key and pass it for checks.
Add a new parameter @first_key to btrfs_readahead_tree_block() and pass
the node key to it inside btrfs_readahead_node_child().
This should plug the gap in empty leaf detection during readahead.
Link: https://lore.kernel.org/linux-btrfs/20260409071255.3358044-1-gality369@gmail.com/
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If one of the calls made by do_remap_reloc_trans() fails, we can leave
the remap tree in an inconsistent state. Abort the transaction if this
happens, to prevent the corrupt state from reaching the disk.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the call to btrfs_reserve_extent() in do_remap_reloc_trans() returns
a smaller extent than we asked for, currently we're not undoing the
bytes_may_use change that we made. Fix this by calling
btrfs_space_info_update_bytes_may_use() again for the difference.
Fixes: fd6594b144 ("btrfs: replace identity remaps with actual remaps when doing relocations")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the call to btrfs_reserve_extent() in move_existing_remap() returns a
smaller extent than we asked for, currently we're not undoing the
bytes_may_use change that we made. Fix this by calling
btrfs_space_info_update_bytes_may_use() again for the difference.
Fixes: bbea42dfb9 ("btrfs: move existing remaps before relocating block group")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This odd file was added to automatically figure out tool-generated
symbols.
Honestly, it *should* have been just a real honest-to-goodness regular
file in git, instead of having strange code to generate it in the
Makefile, but that is not how that silly thing works. So now we need to
ignore it explicitly.
Fixes: 1211907ac0 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes regressions in non-bash shells and busybox support, and reverts
a commit that regression in build and installation when one or more
tests fail to build. Fixes duplicated test number reporting introduced
in ktap support patch.
- selftests: Fix duplicated test number reporting
- selftests: Fix runner.sh for non-bash shells
- selftests: Fix runner.sh busybox support
- selftests: Deescalate error reporting
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmnmPx8ACgkQCwJExA0N
QxwZ7g/+L4ZZDp3RyuauvCV4zhG5GQFudtvAkcLOSwBVnRbiLdPmdOXXz7IkW7DN
U/WQx3pDOYmvtr2QdNvXch3HOdk1vUfNViU5yPNqC4jVZEMON2N7oGr2Eq+WVhi+
gl63pRYk9ISh+5vOlzQY9UX1sLOxlME1foMJdHQEZHhgbNxlc7s/NfpqAnRC7a4l
SFuzL/PJl9kYiMUFeYLB9kwrelvoLrzItMVz7/m56dgNVuEmbNDESBXGwJQneH6l
SWOXPC96gu2cajluNfyhOqarkuGVD8x6J+2vWBwrDnSiyMLyealAOHnK5JGR17hW
NErJDpqpdlIue5/h/XFnZ+4o43J8uEiUxmP7UiPAmreBllajeNz4xZPuz+i2vLH2
O9dzzj/SV9War5txaFdqHXpbZE2zYOfhA07Xg6VdjcB0LTWaSOPqiIPr9UTwvT6o
T7vYkvE+w4rjXwTFEscHkZ5jXrvAiWMrgiK4BuzXWy03/BvOF6LiMf0NCELvKvZG
ZubLCJ1N/2EXgt+MX9dRmxq+7ZXCGu53TU5GeX1u/vT5lqsaPwoXT4RylZek5hwx
DfKjEOU22TOQXAV01z1sPJvNwPZa84Hejzf6c6v2xobY/vaf4XxgyXAIAJJGOfpj
25nEdecvjdX62kFfY/QrX7akpr7IreonpssmQVuGt3jilpq4uLg=
=1k+f
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-next-7.1-next-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fix regressions in non-bash shells and busybox support, and revert a
commit that regressed in build and installation when one or more tests
fail to build.
Fix duplicated test number reporting introduced in ktap support patch"
* tag 'linux_kselftest-next-7.1-next-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix duplicated test number reporting
selftests: Fix runner.sh for non-bash shells
selftests: Fix runner.sh busybox support
selftests: Deescalate error reporting
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a recent
commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation, uninitialised
variable and checker warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnmQZcACgkQa9axLQDI
XvGouA//SXlo7hQyM41rkgRru9oqftrGg0y6nxz4Z089kv50cm3Jlf/nUuti6vah
BMBLCGXA1iOQrGIVmuvtCxDRrfYZWpfKGuT9A0gmEoMqrGIpWl9gfBQG+uR+YrQX
4kp5DLqB85WrJIPiy7HUV6GQoCbFuMrRJwxl89IdWZSobaei3SczTmnttwyJtxG5
/BMitl024TYdiOPNo8bhiML1wIJCaTHvH4IrtCHPyUHEAtsHSMy00y0OrSKBtA/9
ZHZRpY7Po/jnL7YUs1AfYwsaSXjkvqXN0K1Tdavzm75k6lpJmbM3VsZabG/CEuvK
PCOGV++is4Y/A+7aQsCwXKeVnY3b6AC4sextytNq0g3GZ7I+Ht9O6nbsp5ZmyXzB
HRiFxmFS1pSQOMX9f1neKi3vxDMTy1tKPeccTTzL8dNnxTvUBXnoWfPoJh3cpbjm
Dbhe1kksiEn01WWFacGtkIPDa9c+Bkd2T+8wrsk85Z+u3Z0JPM5PfOn6v3X9YlKl
K7W8fhvlDL1wP+iyWcMT5zdo+xzHY4ZxuyWbi9a4RhKc6lFHVVG2mpUuPwSsh2ma
NnxkDouriuoADHBir89U71N483HSnNfSjhlVSFYD2LFCre5KOZM4KYZ2vwWb8Sy4
79q+BlVRUTQ5O6XjePoSPjUW4APPNviHJsF4E4IiqHkd9O5lMZU=
=LNY2
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"The main 'feature' is a workaround for C1-Pro erratum 4193714
requiring IPIs during TLB maintenance if a process is running in user
space with SME enabled.
The hardware acknowledges the DVMSync messages before completing
in-flight SME accesses, with security implications. The workaround
makes use of the mm_cpumask() to track the cores that need
interrupting (arm64 hasn't used this mask before).
The rest are fixes for MPAM, CCA and generated header that turned up
during the merging window or shortly before.
Summary:
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a
recent commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation,
uninitialised variable and checker warnings"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
- sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
- sh: Remove CONFIG_VSYSCALL reference from UAPI
- sh: Fix typo in SPDX license ID lines
- sh: Include <linux/io.h> in dac.h
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmnmHWgACgkQdCY7N/W1
+RNVSQ/8DCMVL6n3HWJ0hllF5q5GDjk2CrpRRvcexw9B/ewOkgdykxnlFU90xSy3
cHO+Y5ppeQuTLvnfYajCRUoR06OvBppAa3Ch1mWUYSDk/Ajs18gWnOYvvc+9isl9
Pc25RjUa1E0pqayL+1XVdihk/moWM4A25bM8ND/Gqc4A6TyfRzESKvrz+jwrm5KL
xDJBpD/tEKDqBnkiPE0Y/W3IjiZUG5ZpDuNIpkIW5JDWwlbT+4Xd7pcMQfor9JTM
UCO4ZHDuhyf7vuMptFx/x6h2D1ssfDS7+5uGBRFIMpXJsXtbKSrePwF9dp/C8jia
7XZVvmchtALZyUfyE/Z/UxaywtG8KLbPATMAlkb4veXJoqpKZXGNhtSE4M3P3h35
CFfaGeSZEZjNYYT7TUjjZfv3EgTOeuC5I2wsJ+0ZaGMa/r/lIXwo7t6GwEApXXMN
xGGG8/pS0Amyg2oWcf0xH0UGFeBt4AjO16NdC6z/5WRL42kqUn3V3KUGi2CWPx9L
oHMnTQVTXvvhB7Lml58BVauLq3NHAPSuOvc2B/aelhv2sheET9PWVt/FeKacmC4N
NyiNGFSlgonXUvMmh5YJwE1IwvQiziTF3BiHaCkuU4qS0lBpfxMsDGvmDusBlCjF
MMTjlhDowpWYEbswHZmHhi4w2/ATTemrEoaY6m9jZNvZRwOK01k=
=jFKW
-----END PGP SIGNATURE-----
Merge tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"Two patches from Thomas Zimmermann, one by Tim Bird and one by Thomas
Weißschuh.
The first patch by Thomas Zimmermann adds a missing include in dac.h
for SH-3 which became necessary after 243ce64b2b ("backlight: Do not
include <linux/fb.h> in header file") which made __raw_readb() and
__raw_writeb() inaccessible in dac.h.
Thomas' second patch drops CONFIG_FIRMWARE_EDID for SH as it depends
on X86 or EFI_GENERIC_STUB which are not defined on SH for obvious
reasons.
The patch by Tim Bird fixes just a small typo in two SPDX ID lines
which he stumbled over by accident.
And, least but not last, the patch by Thomas Weißschuh removes the
CONFIG_VSYSCALL reference from UAPI. This was necessary as the
definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe39 (arm64,ia64,ppc,s390,
sh,tile,um,x86,mm: remove default gate area)"
* tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
sh: Remove CONFIG_VSYSCALL reference from UAPI
sh: Fix typo in SPDX license ID lines
sh: Include <linux/io.h> in dac.h
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmnmEggbFIAAAAAABAAO
bWFudTIsMi41KzEuMTIsMiwyAAoJEFKgDEdIgJTyjQwP/j2a9EzdI4+DfGrmA56m
/heo44ObpFJppYaWEyAN7xX8Xpm7ErokLjZxVxhgQY7hGU5WLx1CmslnJbfWYkWy
r7q92Kd/QIDmsvwHzlE0xdaX3rp8AXp3O2iIhcDfv9FRe+UultrESBpw9JbRYXXQ
SLAFU1iOFMxyuzvhW/2l/B+PQDm0uoRMpMWTsLXo2JtNOsIGewNyV/7dhOumm+RD
/0lgVA9jMAQA33j4Hkr38REe8lYH7aGl66x1mZhDg+kYb7w7ndKW/QN21OJiP+2D
s4moi2/VLmC/UcxoDAOQTArKYkrYc2nzLMJRMaIun8jcNfCHEqaQAW2MTzSDAC78
+atdlWrfIxykORA31lTtjh4o5vg43sQPjFZCJr3RxNLfxFy15NULhT75QFrxe9sA
W3gs4Rz+LeynoQbJNCn8hgK0xcpKLtzC4dMY7dfQo6uyq2YnJauEvioJKbUYyoDh
3l3uYfnzHcfRUb+yNtIkNCiYXwrpTAnSnifCbWO3smWYxhCdAT14Rval/fxtTjSe
sTphd7m32U56WpWX0lYQbY001nMzso+gN/eQSK20IzrhvghwYUqvCKNm+fIM9tjR
nQBxka+B7XhO27hNV4QIT8ZQRUkabQQAseEFhLQI2Ptgw3eV0s85gMP60Lg4gpZ0
7OGA/VM+xrLqXvsFeDDWccSg
=O6QR
-----END PGP SIGNATURE-----
Merge tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Fix printk ring buffer initialization and sanity checks
- Workaround printf kunit test compilation with gcc < 12.1
- Add IPv6 address printf format tests
- Misc code and documentation cleanup
* tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING
lib/vsprintf: use bool for local decode variable
lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()
printk: ringbuffer: fix errors in comments
printk_ringbuffer: Add sanity check for 0-size data
printk_ringbuffer: Fix get_data() size sanity check
printf: add IPv6 address format tests
printk: Fix _DESCS_COUNT type for 64-bit systems
remove_waiter() is used by the slowlock paths, but it is also used for
proxy-lock rollback in rt_mutex_start_proxy_lock() when invoked from
futex_requeue().
In the latter case waiter::task is not current, but remove_waiter()
operates on current for the dequeue operation. That results in several
problems:
1) the rbtree dequeue happens without waiter::task::pi_lock being held
2) the waiter task's pi_blocked_on state is not cleared, which leaves a
dangling pointer primed for UAF around.
3) rt_mutex_adjust_prio_chain() operates on the wrong top priority waiter
task
Use waiter::task instead of current in all related operations in
remove_waiter() to cure those problems.
[ tglx: Fixup rt_mutex_adjust_prio_chain(), add a comment and amend the
changelog ]
Fixes: 8161239a8b ("rtmutex: Simplify PI algorithm and make highest prio task get lock")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Replace all variations of "paddr" variables in KVM selftests with "gpa",
with the exception of the ELF structures, as those fields are not specific
to guest virtual addresses, to complete the conversion from vm_paddr_t to
gpa_t.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In x86's nested TDP APIs, use the appropriate gpa_t typedef and rename
variables from nested_paddr to l2_gpa to match KVM x86's nomenclature.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace all variations of "vaddr" variables in KVM selftests with "gva",
with the exception of the ELF structures, as those fields are not specific
to guest virtual addresses, to complete the conversion from vm_vaddr_t to
gva_t.
Opportunistically use gva_t instead of u64 for relevant variables, and
fixup indentation as appropriate.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename inject_uer()'s @paddr to @hpa to make it more obvious that it
injects an error using a host PA, not a guest PA.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename arm64's translate_to_host_paddr() to translate_hva_to_hpa() and
update variable names to match, as using "vaddr" and "paddr" terminology
is super confusing due to selftests using those exact names for *guest*
addresses.
Opportunisitically drop superfluous local page_addr and paddr variables.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that KVM selftests use gva_t instead of vm_vaddr_t, rename the helper
for populating the initial GVA bitmap to drop the defunct terminology and
use "vm" for the scope.
Opportunistically fixup the declaration of the API, which has been broken
since day 1. The flaw went unnoticed because the sole caller is defined
after the weak version, i.e. can see the prototype without a previous
declaration.
No functional change intended.
Fixes: e8b9a055fa ("KVM: arm64: selftests: Align VA space allocator with TTBR0")
Link: https://patch.msgid.link/20260420212004.3938325-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that KVM selftests use gva_t instead of vm_vaddr_t, rename the API
for finding an unused range of virtual memory to drop the defunct
terminology and use "vm" for the scope.
Opportunistically clean up the function comment to drop superfluous
and redundant information.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that KVM selftests use gva_t instead of vm_vaddr_t, drop "vaddr_" from
the core memory allocation APIs as the information is extraneous and does
more harm than good. E.g. the APIs don't _just_ allocate virtual memory,
they allocate backing physical memory and install mappings in the guest
page tables. And as proven by kmalloc() and malloc(), developers generally
expect that allocations come with a working virtual address.
Opportunistically clean up the function comment for vm_alloc(), and drop
the misleading and superfluous comments for its wrappers.
No functional change intended.
Link: https://patch.msgid.link/20260420212004.3938325-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use u8 instead of uint8_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint8_t/u8/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use s16 instead of int16_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int16_t/s16/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use u16 instead of uint16_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint16_t/u16/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use s32 instead of int32_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int32_t/s32/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use u32 instead of uint32_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint32_t/u32/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use s64 instead of int64_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int64_t/s64/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use u64 instead of uint64_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint64_t/u64/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
Include <linux/types.h> in include/kvm_util_types.h, iinclude/test_util.h,
and include/x86/pmu.h to pick up the tools-defined u64. Arguably, all
headers (especially kvm_util_types.h) should have already been including
stdint.h to get uint64_t from the libc headers, but the missing dependency
only rears its head once KVM uses u64 instead of uint64_t.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
[sean: rename pread_uint64() => pread_u64, expand on types.h include]
Link: https://patch.msgid.link/20260420212004.3938325-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fix various Hyper-V selftests to use gpa_t for variables that contain
guest physical addresses, rather than gva_t. In practice, the bugs are
benign as both gva_t and gpa_t are u64 typedefs, i.e. gpa_t and gva_t are
interchangeable from a functional perspective, the code is just confusing.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
[sean: call out that both are u64 typedefs]
Link: https://patch.msgid.link/20260420212004.3938325-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace all occurrences of vm_paddr_t with gpa_t to align with KVM code
and with the conversion helpers (e.g. addr_hva2gpa()).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/vm_paddr_/gpa_/g'
Then by manually adjusting whitespace to make checkpatch.pl happy.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
[sean: drop bogus changelog blurb about renaming functions]
Link: https://patch.msgid.link/20260420212004.3938325-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace all occurrences of vm_vaddr_t with gva_t to align with KVM code
and with the conversion helpers (e.g. addr_gva2hva()).
This commit was generated with the following command:
git ls-files tools/testing/selftests/kvm | xargs sed -i 's/vm_vaddr_/gva_/g'
Then by manually adjusting whitespace to make checkpatch.pl happy, and
dropping renames of functions that allocate memory within a given VM.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
[sean: drop renames of allocator APIs]
Link: https://patch.msgid.link/20260420212004.3938325-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The nf_osf_ttl() function accessed skb->dev to perform a local interface
address lookup without verifying that the device pointer was valid.
Additionally, the implementation utilized an in_dev_for_each_ifa_rcu
loop to match the packet source address against local interface
addresses. It assumed that packets from the same subnet should not see a
decrement on the initial TTL. A packet might appear it is from the same
subnet but it actually isn't especially in modern environments with
containers and virtual switching.
Remove the device dereference and interface loop. Replace the logic with
a switch statement that evaluates the TTL according to the ttl_check.
Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Kito Xu (veritas501) <hxzene@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/20260414074556.2512750-1-hxzene@gmail.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In nf_osf_match(), the nf_osf_hdr_ctx structure is initialized once
and passed by reference to nf_osf_match_one() for each fingerprint
checked. During TCP option parsing, nf_osf_match_one() advances the
shared ctx->optp pointer.
If a fingerprint perfectly matches, the function returns early without
restoring ctx->optp to its initial state. If the user has configured
NF_OSF_LOGLEVEL_ALL, the loop continues to the next fingerprint.
However, because ctx->optp was not restored, the next call to
nf_osf_match_one() starts parsing from the end of the options buffer.
This causes subsequent matches to read garbage data and fail
immediately, making it impossible to log more than one match or logging
incorrect matches.
Instead of using a shared ctx->optp pointer, pass the context as a
constant pointer and use a local pointer (optp) for TCP option
traversal. This makes nf_osf_match_one() strictly stateless from the
caller's perspective, ensuring every fingerprint check starts at the
correct option offset.
Fixes: 1a6a0951fc ("netfilter: nfnetlink_osf: add missing fmatch check")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, IPVS skips MTU checks for GSO packets by excluding them with
the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
mode encapsulates GSO packets with IPIP headers.
The issue manifests in two ways:
1. MTU violation after encapsulation:
When a GSO packet passes through IPVS tunnel mode, the original MTU
check is bypassed. After adding the IPIP tunnel header, the packet
size may exceed the outgoing interface MTU, leading to unexpected
fragmentation at the IP layer.
2. Fragmentation with problematic IP IDs:
When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
is fragmented after encapsulation, each segment gets a sequentially
incremented IP ID (0, 1, 2, ...). This happens because:
a) The GSO packet bypasses MTU check and gets encapsulated
b) At __ip_finish_output, the oversized GSO packet is split into
separate SKBs (one per segment), with IP IDs incrementing
c) Each SKB is then fragmented again based on the actual MTU
This sequential IP ID allocation differs from the expected behavior
and can cause issues with fragment reassembly and packet tracking.
Fix this by properly validating GSO packets using
skb_gso_validate_network_len(). This function correctly validates
whether the GSO segments will fit within the MTU after segmentation. If
validation fails, send an ICMP Fragmentation Needed message to enable
proper PMTU discovery.
Fixes: 4cdd34084d ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
Signed-off-by: Yingnan Zhang <342144303@qq.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal says:
"Historically this is not an issue, even for normal base hooks: the data
path doesn't use the original nf_hook_ops that are used to register the
callbacks.
However, in v5.14 I added the ability to dump the active netfilter
hooks from userspace.
This code will peek back into the nf_hook_ops that are available
at the tail of the pointer-array blob used by the datapath.
The nat hooks are special, because they are called indirectly from
the central nat dispatcher hook. They are currently invisible to
the nfnl hook dump subsystem though.
But once that changes the nat ops structures have to be deferred too."
Update nf_nat_register_fn() to deal with partial exposition of the hooks
from error path which can be also an issue for nfnetlink_hook.
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a partial revert of:
commit ab4f21e6fb ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
to allow ipv4 and ipv6 only.
- xt_mac
- xt_owner
- xt_physdev
These extensions are not used by ebtables in userspace.
Moreover, xt_realm is only for ipv4, since dst->tclassid is ipv4
specific.
Fixes: ab4f21e6fb ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace it with scnprintf, the buffer sizes are expected to be large enough
to hold the result, no need for snprintf+overflow check.
Increase buffer size in mangle_content_len() while at it.
BUG: KASAN: stack-out-of-bounds in vsnprintf+0xea5/0x1270
Write of size 1 at addr [..]
vsnprintf+0xea5/0x1270
sprintf+0xb1/0xe0
mangle_content_len+0x1ac/0x280
nf_nat_sdp_session+0x1cc/0x240
process_sdp+0x8f8/0xb80
process_invite_request+0x108/0x2b0
process_sip_msg+0x5da/0xf50
sip_help_tcp+0x45e/0x780
nf_confirm+0x34d/0x990
[..]
Fixes: 9fafcd7b20 ("[NETFILTER]: nf_conntrack/nf_nat: add SIP helper port")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_osf_match_one() computes ctx->window % f->wss.val in the
OSF_WSS_MODULO branch with no guard for f->wss.val == 0. A
CAP_NET_ADMIN user can add such a fingerprint via nfnetlink; a
subsequent matching TCP SYN divides by zero and panics the kernel.
Reject the bogus fingerprint in nfnl_osf_add_callback() above the
per-option for-loop. f->wss is per-fingerprint, not per-option, so
the check must run regardless of f->opt_num (including 0). Also
reject wss.wc >= OSF_WSS_MAX; nf_osf_match_one() already treats that
as "should not happen".
Crash:
Oops: divide error: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:nf_osf_match_one (net/netfilter/nfnetlink_osf.c:98)
Call Trace:
<IRQ>
nf_osf_match (net/netfilter/nfnetlink_osf.c:220)
xt_osf_match_packet (net/netfilter/xt_osf.c:32)
ipt_do_table (net/ipv4/netfilter/ip_tables.c:348)
nf_hook_slow (net/netfilter/core.c:622)
ip_local_deliver (net/ipv4/ip_input.c:265)
ip_rcv (include/linux/skbuff.h:1162)
__netif_receive_skb_one_core (net/core/dev.c:6181)
process_backlog (net/core/dev.c:6642)
__napi_poll (net/core/dev.c:7710)
net_rx_action (net/core/dev.c:7945)
handle_softirqs (kernel/softirq.c:622)
Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This expression only supports for ipv4, restrict it.
Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
__io_uring_add_tctx_node() reads ctx->int_flags and
ctx->iowq_limits[0..1] without holding ctx->uring_lock, while
io_register_iowq_max_workers() writes these same fields under the lock.
Mostly an application problem if you try and make these race, but let's
silence KCSAN by just grabbing the ->uring_lock around the operation.
This is a slow path operation anyway, and ->uring_lock will be grabbed
by submission right after anyway.
Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
During sigreturn the shadow stack signal frame is popped. The kernel does
this by reading the shadow stack using normal read accesses. When it can't
assume the memory is shadow stack, it takes extra steps to makes sure it is
reading actual shadow stack memory and not other normal readable memory. It
does this by holding the mmap read lock while doing the access and checking
the flags of the VMA.
Unfortunately that is not safe. If the read of the shadow stack sigframe
hits a page fault, the fault handler will try to recursively grab another
mmap read lock. This normally works ok, but if a writer on another CPU is
also waiting, the second read lock could fail and cause a deadlock.
Fix this by not holding mmap lock during the read access to userspace.
Instead use mmap_lock_speculate_...() to watch for changes between dropping
mmap lock and the userspace access. Retry if anything grabbed an mmap write
lock in between and could have changed the VMA.
These mmap_lock_speculate_...() helpers use mm::mm_lock_seq, which is only
available when PER_VMA_LOCK is configured. So make X86_USER_SHADOW_STACK
depend on it. On x86, PER_VMA_LOCK is a default configuration for SMP
kernels. So drop support for the other configs under the assumption that
the !SMP shadow stack user base does not exist.
Currently there is a check that skips the lookup work when the SSP can be
assumed to be on a shadow stack. While reorganizing the function, remove
the optimization to make the tricky code flows more common, such that
issues like this cannot escape detection for so long.
Fixes: 7fad2a432c ("x86/shstk: Check that signal frame is shadow stack mem")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
syzbot reports that it's hitting the below condition for exiting an
io_wq context:
WARN_ON_ONCE(!test_bit(IO_WQ_BIT_EXIT, &wq->state))
in io_wq_put_and_exit(), which can be triggered with memory allocation
fault injection. Ensure that the io_wq is marked as exiting to silence
this warning trigger.
Reported-by: syzbot+79a4cc863a8db58cd92b@syzkaller.appspotmail.com
Fixes: 7880174e1e ("io_uring/tctx: clean up __io_uring_add_tctx_node() error handling")
Reviewed-by: Clément Léger <cleger@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As with the idling code before it, the error exit path should check for
a NULL tctx->io_wq before calling io_wq_put_and_exit().
Fixes: 7880174e1e ("io_uring/tctx: clean up __io_uring_add_tctx_node() error handling")
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Clément Léger <cleger@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
nouveau_gem_pushbuf_reloc_apply() validates each relocation with
if (r->reloc_bo_offset + 4 > nvbo->bo.base.size)
but reloc_bo_offset is __u32 (uapi/drm/nouveau_drm.h) and the integer
literal 4 promotes to unsigned int, so the addition is performed in 32
bits and wraps before the comparison against the size_t bo size.
Cast to u64 so the addition happens in 64-bit arithmetic.
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Reported-by: Anthropic
Cc: stable <stable@kernel.org>
Assisted-by: gkh_clanker_t1000
Fixes: a1606a9596 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Add Fixes: tag. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The logfile contains a lot of useful information about the tests being
run. Add it to the stored failure directory when the test fails.
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Link: https://patch.msgid.link/20260420142315.7bbc3624@fedora
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>