Commit Graph

1447055 Commits

Author SHA1 Message Date
David Sterba
4d95b9efd7 btrfs: handle unexpected free-space-tree key types
Replace the conditional assertions with proper error handling and
transaction abort if we find an unexpected key type in the free space
tree.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:02:02 +02:00
Filipe Manana
999757231c btrfs: fix missing last_unlink_trans update when removing a directory
When removing a directory we are not updating its last_unlink_trans field,
which can result in incorrect fsync behaviour in case some one fsyncs the
directory after it was removed because it's holding a file descriptor on
it.

Example scenario:

   mkdir /mnt/dir1
   mkdir /mnt/dir1/dir2
   mkdir /mnt/dir3

   sync -f /mnt

   # Do some change to the directory and fsync it.
   chmod 700 /mnt/dir1
   xfs_io -c fsync /mnt/dir1

   # Move dir2 out of dir1 so that dir1 becomes empty.
   mv /mnt/dir1/dir2 /mnt/dir3/

   open fd on /mnt/dir1
   call rmdir(2) on path "/mnt/dir1"
   fsync fd

   <trigger power failure>

When attempting to mount the filesystem, the log replay will fail with
an -EIO error and dmesg/syslog has the following:

   [445771.626482] BTRFS info (device dm-0): first mount of filesystem 0368bbea-6c5e-44b5-b409-09abe496e650
   [445771.626486] BTRFS info (device dm-0): using crc32c checksum algorithm
   [445771.627912] BTRFS info (device dm-0): start tree-log replay
   [445771.628335] page: refcount:2 mapcount:0 mapping:0000000061443ddc index:0x1d00 pfn:0x7072a5
   [445771.629453] memcg:ffff89f400351b00
   [445771.629892] aops:btree_aops [btrfs] ino:1
   [445771.630737] flags: 0x17fffc00000402a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1ffff)
   [445771.632359] raw: 017fffc00000402a fffff47284d950c8 fffff472907b7c08 ffff89f458e412b8
   [445771.633713] raw: 0000000000001d00 ffff89f6c51d1a90 00000002ffffffff ffff89f400351b00
   [445771.635029] page dumped because: eb page dump
   [445771.635825] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=10 ino=258, invalid nlink: has 2 expect no more than 1 for dir
   [445771.638088] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14878 owner 5
   [445771.638091] BTRFS info (device dm-0): refs 4 lock_owner 0 current 3581087
   [445771.638094] 	item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
   [445771.638097] 		inode generation 3 transid 9 size 16 nbytes 16384
   [445771.638098] 		block group 0 mode 40755 links 1 uid 0 gid 0
   [445771.638100] 		rdev 0 sequence 2 flags 0x0
   [445771.638102] 		atime 1775744884.0
   [445771.660056] 		ctime 1775744885.645502983
   [445771.660058] 		mtime 1775744885.645502983
   [445771.660060] 		otime 1775744884.0
   [445771.660062] 	item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
   [445771.660064] 		index 0 name_len 2
   [445771.660066] 	item 2 key (256 DIR_ITEM 1843588421) itemoff 16077 itemsize 34
   [445771.660068] 		location key (259 1 0) type 2
   [445771.660070] 		transid 9 data_len 0 name_len 4
   [445771.660075] 	item 3 key (256 DIR_ITEM 2363071922) itemoff 16043 itemsize 34
   [445771.660076] 		location key (257 1 0) type 2
   [445771.660077] 		transid 9 data_len 0 name_len 4
   [445771.660078] 	item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34
   [445771.660079] 		location key (257 1 0) type 2
   [445771.660080] 		transid 9 data_len 0 name_len 4
   [445771.660081] 	item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34
   [445771.660082] 		location key (259 1 0) type 2
   [445771.660083] 		transid 9 data_len 0 name_len 4
   [445771.660084] 	item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160
   [445771.660086] 		inode generation 9 transid 9 size 8 nbytes 0
   [445771.660087] 		block group 0 mode 40777 links 1 uid 0 gid 0
   [445771.660088] 		rdev 0 sequence 2 flags 0x0
   [445771.660089] 		atime 1775744885.641174097
   [445771.660090] 		ctime 1775744885.645502983
   [445771.660091] 		mtime 1775744885.645502983
   [445771.660105] 		otime 1775744885.641174097
   [445771.660106] 	item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14
   [445771.660107] 		index 2 name_len 4
   [445771.660108] 	item 8 key (257 DIR_ITEM 2676584006) itemoff 15767 itemsize 34
   [445771.660109] 		location key (258 1 0) type 2
   [445771.660110] 		transid 9 data_len 0 name_len 4
   [445771.660111] 	item 9 key (257 DIR_INDEX 2) itemoff 15733 itemsize 34
   [445771.660112] 		location key (258 1 0) type 2
   [445771.660113] 		transid 9 data_len 0 name_len 4
   [445771.660114] 	item 10 key (258 INODE_ITEM 0) itemoff 15573 itemsize 160
   [445771.660115] 		inode generation 9 transid 10 size 0 nbytes 0
   [445771.660116] 		block group 0 mode 40755 links 2 uid 0 gid 0
   [445771.660117] 		rdev 0 sequence 0 flags 0x0
   [445771.660118] 		atime 1775744885.645502983
   [445771.660119] 		ctime 1775744885.645502983
   [445771.660120] 		mtime 1775744885.645502983
   [445771.660121] 		otime 1775744885.645502983
   [445771.660122] 	item 11 key (258 INODE_REF 257) itemoff 15559 itemsize 14
   [445771.660123] 		index 2 name_len 4
   [445771.660124] 	item 12 key (258 INODE_REF 259) itemoff 15545 itemsize 14
   [445771.660125] 		index 2 name_len 4
   [445771.660126] 	item 13 key (259 INODE_ITEM 0) itemoff 15385 itemsize 160
   [445771.660127] 		inode generation 9 transid 10 size 8 nbytes 0
   [445771.660128] 		block group 0 mode 40755 links 1 uid 0 gid 0
   [445771.660129] 		rdev 0 sequence 1 flags 0x0
   [445771.660130] 		atime 1775744885.645502983
   [445771.660130] 		ctime 1775744885.645502983
   [445771.660131] 		mtime 1775744885.645502983
   [445771.660132] 		otime 1775744885.645502983
   [445771.660133] 	item 14 key (259 INODE_REF 256) itemoff 15371 itemsize 14
   [445771.660134] 		index 3 name_len 4
   [445771.660135] 	item 15 key (259 DIR_ITEM 2676584006) itemoff 15337 itemsize 34
   [445771.660136] 		location key (258 1 0) type 2
   [445771.660137] 		transid 10 data_len 0 name_len 4
   [445771.660138] 	item 16 key (259 DIR_INDEX 2) itemoff 15303 itemsize 34
   [445771.660139] 		location key (258 1 0) type 2
   [445771.660140] 		transid 10 data_len 0 name_len 4
   [445771.660144] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected
   [445771.661650] ------------[ cut here ]------------
   [445771.662358] WARNING: fs/btrfs/disk-io.c:326 at btree_csum_one_bio+0x217/0x230 [btrfs], CPU#8: mount/3581087
   [445771.663588] Modules linked in: btrfs f2fs xfs (...)
   [445771.671229] CPU: 8 UID: 0 PID: 3581087 Comm: mount Tainted: G        W           7.0.0-rc6-btrfs-next-230+ #2 PREEMPT(full)
   [445771.672575] Tainted: [W]=WARN
   [445771.672987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
   [445771.674460] RIP: 0010:btree_csum_one_bio+0x217/0x230 [btrfs]
   [445771.675222] Code: 89 44 24 (...)
   [445771.677364] RSP: 0018:ffffd23882247660 EFLAGS: 00010246
   [445771.678029] RAX: 0000000000000000 RBX: ffff89f6c51d1a90 RCX: 0000000000000000
   [445771.678975] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff89f406020000
   [445771.679983] RBP: ffff89f821204000 R08: 0000000000000000 R09: 00000000ffefffff
   [445771.680905] R10: ffffd23882247448 R11: 0000000000000003 R12: ffffd23882247668
   [445771.681978] R13: ffff89f458e40fc0 R14: ffff89f737f4f500 R15: ffff89f737f4f500
   [445771.682912] FS:  00007f0447a98840(0000) GS:ffff89fb9771d000(0000) knlGS:0000000000000000
   [445771.684393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [445771.685230] CR2: 00007f0447bf1330 CR3: 000000017cb02002 CR4: 0000000000370ef0
   [445771.686273] Call Trace:
   [445771.686646]  <TASK>
   [445771.686969]  btrfs_submit_bbio+0x83f/0x860 [btrfs]
   [445771.687750]  ? write_one_eb+0x28f/0x340 [btrfs]
   [445771.688428]  btree_writepages+0x2e3/0x550 [btrfs]
   [445771.689180]  ? kmem_cache_alloc_noprof+0x12a/0x490
   [445771.689963]  ? alloc_extent_state+0x19/0x120 [btrfs]
   [445771.690801]  ? kmem_cache_free+0x135/0x380
   [445771.691328]  ? preempt_count_add+0x69/0xa0
   [445771.691831]  ? set_extent_bit+0x252/0x8e0 [btrfs]
   [445771.692468]  ? xas_load+0x9/0xc0
   [445771.692873]  ? xas_find+0x14d/0x1a0
   [445771.693304]  do_writepages+0xc6/0x160
   [445771.693756]  filemap_writeback+0xb8/0xe0
   [445771.694274]  btrfs_write_marked_extents+0x61/0x170 [btrfs]
   [445771.694999]  btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs]
   [445771.695818]  btrfs_commit_transaction+0x5c8/0xd10 [btrfs]
   [445771.696530]  ? kmem_cache_free+0x135/0x380
   [445771.697120]  ? release_extent_buffer+0x34/0x160 [btrfs]
   [445771.697786]  btrfs_recover_log_trees+0x7be/0x7e0 [btrfs]
   [445771.698525]  ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
   [445771.699206]  open_ctree+0x11e5/0x1810 [btrfs]
   [445771.699776]  btrfs_get_tree.cold+0xb/0x162 [btrfs]
   [445771.700463]  ? fscontext_read+0x165/0x180
   [445771.701146]  ? rw_verify_area+0x50/0x180
   [445771.701866]  vfs_get_tree+0x25/0xd0
   [445771.702491]  vfs_cmd_create+0x59/0xe0
   [445771.703125]  __do_sys_fsconfig+0x303/0x610
   [445771.703603]  do_syscall_64+0xe9/0xf20
   [445771.703974]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
   [445771.704700] RIP: 0033:0x7f0447cbd4aa
   [445771.705108] Code: 73 01 c3 (...)
   [445771.707263] RSP: 002b:00007ffc4e528318 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
   [445771.708107] RAX: ffffffffffffffda RBX: 00005561585d8c20 RCX: 00007f0447cbd4aa
   [445771.708931] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
   [445771.709744] RBP: 00005561585d9120 R08: 0000000000000000 R09: 0000000000000000
   [445771.710674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
   [445771.711477] R13: 00007f0447e4f580 R14: 00007f0447e5126c R15: 00007f0447e36a23
   [445771.712277]  </TASK>
   [445771.712541] ---[ end trace 0000000000000000 ]---
   [445771.713382] BTRFS error (device dm-0): error while writing out transaction: -5
   [445771.714679] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
   [445771.715562] BTRFS error (device dm-0 state A): Transaction aborted (error -5)
   [445771.716459] BTRFS: error (device dm-0 state A) in cleanup_transaction:2068: errno=-5 IO failure
   [445771.717936] BTRFS error (device dm-0 state EA): failed to recover log trees with error: -5
   [445771.719681] BTRFS error (device dm-0 state EA): open_ctree failed: -5

The problem is that such a fsync should have result in a fallback to a
transaction commit, but that did not happen because through the
btrfs_rmdir() we never update the directory's last_unlink_trans field.
Any inode that had a link removed must have its last_unlink_trans updated
to the ID of transaction used for the operation, otherwise fsync and log
replay will not work correctly.

btrfs_rmdir() calls btrfs_unlink_inode() and through that call chain we
never call btrfs_record_unlink_dir() in order to update last_unlink_trans.
However btrfs_unlink(), which is used for unlinking regular files, calls
btrfs_record_unlink_dir() and then calls btrfs_unlink_inode(). So fix
this by moving the call to btrfs_record_unlink_dir() from btrfs_unlink()
to btrfs_unlink_inode().

A test case for fstests will follow soon.

Reported-by: Slava0135 <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAAJYhww5ov62Hm+n+tmhcL-e_4cBobg+OWogKjOJxVUXivC=MQ@mail.gmail.com/
CC: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:01:48 +02:00
Mark Harmstone
44366af740 btrfs: don't clobber errors in add_remap_tree_entries()
In add_remap_tree_entries(), we only process a certain number of entries
at a time, meaning we may need to loop.

But because we weren't checking the return value of btrfs_insert_empty_items()
within the loop, this meant that if the last iteration of the loop
succeeded but a previous iteration failed, we were erroneously returning
0.

Fix this by breaking the loop early if btrfs_insert_empty_items() fails.

Fixes: b56f35560b ("btrfs: handle setting up relocation of block group with remap-tree")
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:01:43 +02:00
Qu Wenruo
41e706c07e btrfs: enable shutdown ioctl for non-experimental builds
Although commit 304076527c ("btrfs: move shutdown and remove_bdev
callbacks out of experimental features") tries to move both shutdown and
remove_bdev out of experimental features, that commit has only addressed
the super block operation callback, the ioctl one is left untouched.

Fix that missing aspect by also moving shutdown ioctl out of
experimental features.

Since we're here, also add unknown flag detection to reject any
unsupported shutdown flags.

Fixes: 304076527c ("btrfs: move shutdown and remove_bdev callbacks out of experimental features")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:01:31 +02:00
Qu Wenruo
a86a283430 btrfs: apply first key check for readahead when possible
Currently for tree block readahead we never pass a
btrfs_tree_parent_check with @has_first_key set.

Without @has_first_key set, btrfs will skip the following extra
checks:

- Header generation check
  This is a minor one.

- Empty leaf/node checks
  This is more serious, for certain trees like the csum tree, they are
  allowed to be empty, thus an empty leaf can pass the tree checker.
  But if there is a parent node for such an empty leaf, it indicates
  corruption.

  Without @has_first_key set, we can no longer detect such a problem.

  In fact there is already a fuzzed image report that a corrupted csum
  leaf which has zero nritems but still has a parent node can trigger
  a BUG_ON() during csum deletion.

However there are only two call sites of btrfs_readahead_tree_block():

- Inside relocate_tree_blocks()
  At this call site we are trying to grab the first key of the tree
  block, thus we are not able to pass a @first_key parameter.

- Inside btrfs_readahead_node_child()
  This is the more common call site, where we have the parent node and
  want to readahead the child tree blocks.

  In this case we can easily grab the node key and pass it for checks.

Add a new parameter @first_key to btrfs_readahead_tree_block() and pass
the node key to it inside btrfs_readahead_node_child().

This should plug the gap in empty leaf detection during readahead.

Link: https://lore.kernel.org/linux-btrfs/20260409071255.3358044-1-gality369@gmail.com/
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:01:24 +02:00
Mark Harmstone
73db0fad67 btrfs: abort transaction in do_remap_reloc_trans() on failure
If one of the calls made by do_remap_reloc_trans() fails, we can leave
the remap tree in an inconsistent state. Abort the transaction if this
happens, to prevent the corrupt state from reaching the disk.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:00:52 +02:00
Mark Harmstone
9b8824533d btrfs: fix bytes_may_use leak in do_remap_reloc_trans()
If the call to btrfs_reserve_extent() in do_remap_reloc_trans() returns
a smaller extent than we asked for, currently we're not undoing the
bytes_may_use change that we made. Fix this by calling
btrfs_space_info_update_bytes_may_use() again for the difference.

Fixes: fd6594b144 ("btrfs: replace identity remaps with actual remaps when doing relocations")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:00:39 +02:00
Mark Harmstone
68a135013b btrfs: fix bytes_may_use leak in move_existing_remap()
If the call to btrfs_reserve_extent() in move_existing_remap() returns a
smaller extent than we asked for, currently we're not undoing the
bytes_may_use change that we made. Fix this by calling
btrfs_space_info_update_bytes_may_use() again for the difference.

Fixes: bbea42dfb9 ("btrfs: move existing remaps before relocating block group")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-21 04:00:32 +02:00
Linus Torvalds
b4e07588e7 tracing: tell git to ignore the generated 'undefsyms_base.c' file
This odd file was added to automatically figure out tool-generated
symbols.

Honestly, it *should* have been just a real honest-to-goodness regular
file in git, instead of having strange code to generate it in the
Makefile, but that is not how that silly thing works.  So now we need to
ignore it explicitly.

Fixes: 1211907ac0 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-04-20 17:25:56 -07:00
Linus Torvalds
f154634e42 linux_kselftest-next-7.1-next-fixes
Fixes regressions in non-bash shells and busybox support, and reverts
 a commit that regression in build and installation when one or more
 tests fail to build. Fixes duplicated test number reporting introduced
 in ktap support patch.
 
 - selftests: Fix duplicated test number reporting
 - selftests: Fix runner.sh for non-bash shells
 - selftests: Fix runner.sh busybox support
 - selftests: Deescalate error reporting
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmnmPx8ACgkQCwJExA0N
 QxwZ7g/+L4ZZDp3RyuauvCV4zhG5GQFudtvAkcLOSwBVnRbiLdPmdOXXz7IkW7DN
 U/WQx3pDOYmvtr2QdNvXch3HOdk1vUfNViU5yPNqC4jVZEMON2N7oGr2Eq+WVhi+
 gl63pRYk9ISh+5vOlzQY9UX1sLOxlME1foMJdHQEZHhgbNxlc7s/NfpqAnRC7a4l
 SFuzL/PJl9kYiMUFeYLB9kwrelvoLrzItMVz7/m56dgNVuEmbNDESBXGwJQneH6l
 SWOXPC96gu2cajluNfyhOqarkuGVD8x6J+2vWBwrDnSiyMLyealAOHnK5JGR17hW
 NErJDpqpdlIue5/h/XFnZ+4o43J8uEiUxmP7UiPAmreBllajeNz4xZPuz+i2vLH2
 O9dzzj/SV9War5txaFdqHXpbZE2zYOfhA07Xg6VdjcB0LTWaSOPqiIPr9UTwvT6o
 T7vYkvE+w4rjXwTFEscHkZ5jXrvAiWMrgiK4BuzXWy03/BvOF6LiMf0NCELvKvZG
 ZubLCJ1N/2EXgt+MX9dRmxq+7ZXCGu53TU5GeX1u/vT5lqsaPwoXT4RylZek5hwx
 DfKjEOU22TOQXAV01z1sPJvNwPZa84Hejzf6c6v2xobY/vaf4XxgyXAIAJJGOfpj
 25nEdecvjdX62kFfY/QrX7akpr7IreonpssmQVuGt3jilpq4uLg=
 =1k+f
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-7.1-next-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fix regressions in non-bash shells and busybox support, and revert a
  commit that regressed in build and installation when one or more tests
  fail to build.

  Fix duplicated test number reporting introduced in ktap support patch"

* tag 'linux_kselftest-next-7.1-next-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: Fix duplicated test number reporting
  selftests: Fix runner.sh for non-bash shells
  selftests: Fix runner.sh busybox support
  selftests: Deescalate error reporting
2026-04-20 17:19:30 -07:00
Linus Torvalds
13f24586a2 arm64 updates for 7.1 (second round):
Core features:
 
  - Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
    DVMSync acknowledgement. The fix consists of sending IPIs on TLB
    maintenance to those CPUs running in user space with SME enabled
 
  - Include kernel-hwcap.h in list of generated files (missed in a recent
    commit generating the KERNEL_HWCAP_* macros)
 
 CCA:
 
  - Fix RSI_INCOMPLETE error check in arm-cca-guest
 
 MPAM:
 
  - Fix an unmount->remount problem with the CDP emulation, uninitialised
    variable and checker warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnmQZcACgkQa9axLQDI
 XvGouA//SXlo7hQyM41rkgRru9oqftrGg0y6nxz4Z089kv50cm3Jlf/nUuti6vah
 BMBLCGXA1iOQrGIVmuvtCxDRrfYZWpfKGuT9A0gmEoMqrGIpWl9gfBQG+uR+YrQX
 4kp5DLqB85WrJIPiy7HUV6GQoCbFuMrRJwxl89IdWZSobaei3SczTmnttwyJtxG5
 /BMitl024TYdiOPNo8bhiML1wIJCaTHvH4IrtCHPyUHEAtsHSMy00y0OrSKBtA/9
 ZHZRpY7Po/jnL7YUs1AfYwsaSXjkvqXN0K1Tdavzm75k6lpJmbM3VsZabG/CEuvK
 PCOGV++is4Y/A+7aQsCwXKeVnY3b6AC4sextytNq0g3GZ7I+Ht9O6nbsp5ZmyXzB
 HRiFxmFS1pSQOMX9f1neKi3vxDMTy1tKPeccTTzL8dNnxTvUBXnoWfPoJh3cpbjm
 Dbhe1kksiEn01WWFacGtkIPDa9c+Bkd2T+8wrsk85Z+u3Z0JPM5PfOn6v3X9YlKl
 K7W8fhvlDL1wP+iyWcMT5zdo+xzHY4ZxuyWbi9a4RhKc6lFHVVG2mpUuPwSsh2ma
 NnxkDouriuoADHBir89U71N483HSnNfSjhlVSFYD2LFCre5KOZM4KYZ2vwWb8Sy4
 79q+BlVRUTQ5O6XjePoSPjUW4APPNviHJsF4E4IiqHkd9O5lMZU=
 =LNY2
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Catalin Marinas:
 "The main 'feature' is a workaround for C1-Pro erratum 4193714
  requiring IPIs during TLB maintenance if a process is running in user
  space with SME enabled.

  The hardware acknowledges the DVMSync messages before completing
  in-flight SME accesses, with security implications. The workaround
  makes use of the mm_cpumask() to track the cores that need
  interrupting (arm64 hasn't used this mask before).

  The rest are fixes for MPAM, CCA and generated header that turned up
  during the merging window or shortly before.

  Summary:

  Core features:

   - Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
     DVMSync acknowledgement. The fix consists of sending IPIs on TLB
     maintenance to those CPUs running in user space with SME enabled

   - Include kernel-hwcap.h in list of generated files (missed in a
     recent commit generating the KERNEL_HWCAP_* macros)

  CCA:

   - Fix RSI_INCOMPLETE error check in arm-cca-guest

  MPAM:

   - Fix an unmount->remount problem with the CDP emulation,
     uninitialised variable and checker warnings"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
  arm_mpam: resctrl: Fix the check for no monitor components found
  arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
  virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
  arm64/hwcap: Include kernel-hwcap.h in list of generated files
  arm64: errata: Work around early CME DVMSync acknowledgement
  arm64: cputype: Add C1-Pro definitions
  arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
  arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
2026-04-20 16:46:22 -07:00
Linus Torvalds
ce9e93383a sh updates for v7.1
- sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
 - sh: Remove CONFIG_VSYSCALL reference from UAPI
 - sh: Fix typo in SPDX license ID lines
 - sh: Include <linux/io.h> in dac.h
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmnmHWgACgkQdCY7N/W1
 +RNVSQ/8DCMVL6n3HWJ0hllF5q5GDjk2CrpRRvcexw9B/ewOkgdykxnlFU90xSy3
 cHO+Y5ppeQuTLvnfYajCRUoR06OvBppAa3Ch1mWUYSDk/Ajs18gWnOYvvc+9isl9
 Pc25RjUa1E0pqayL+1XVdihk/moWM4A25bM8ND/Gqc4A6TyfRzESKvrz+jwrm5KL
 xDJBpD/tEKDqBnkiPE0Y/W3IjiZUG5ZpDuNIpkIW5JDWwlbT+4Xd7pcMQfor9JTM
 UCO4ZHDuhyf7vuMptFx/x6h2D1ssfDS7+5uGBRFIMpXJsXtbKSrePwF9dp/C8jia
 7XZVvmchtALZyUfyE/Z/UxaywtG8KLbPATMAlkb4veXJoqpKZXGNhtSE4M3P3h35
 CFfaGeSZEZjNYYT7TUjjZfv3EgTOeuC5I2wsJ+0ZaGMa/r/lIXwo7t6GwEApXXMN
 xGGG8/pS0Amyg2oWcf0xH0UGFeBt4AjO16NdC6z/5WRL42kqUn3V3KUGi2CWPx9L
 oHMnTQVTXvvhB7Lml58BVauLq3NHAPSuOvc2B/aelhv2sheET9PWVt/FeKacmC4N
 NyiNGFSlgonXUvMmh5YJwE1IwvQiziTF3BiHaCkuU4qS0lBpfxMsDGvmDusBlCjF
 MMTjlhDowpWYEbswHZmHhi4w2/ATTemrEoaY6m9jZNvZRwOK01k=
 =jFKW
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:
 "Two patches from Thomas Zimmermann, one by Tim Bird and one by Thomas
  Weißschuh.

  The first patch by Thomas Zimmermann adds a missing include in dac.h
  for SH-3 which became necessary after 243ce64b2b ("backlight: Do not
  include <linux/fb.h> in header file") which made __raw_readb() and
  __raw_writeb() inaccessible in dac.h.

  Thomas' second patch drops CONFIG_FIRMWARE_EDID for SH as it depends
  on X86 or EFI_GENERIC_STUB which are not defined on SH for obvious
  reasons.

  The patch by Tim Bird fixes just a small typo in two SPDX ID lines
  which he stumbled over by accident.

  And, least but not last, the patch by Thomas Weißschuh removes the
  CONFIG_VSYSCALL reference from UAPI. This was necessary as the
  definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
  avoid a default gate VMA to be created. However that default gate VMA
  was removed entirely in commit a6c19dfe39 (arm64,ia64,ppc,s390,
  sh,tile,um,x86,mm: remove default gate area)"

* tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
  sh: Remove CONFIG_VSYSCALL reference from UAPI
  sh: Fix typo in SPDX license ID lines
  sh: Include <linux/io.h> in dac.h
2026-04-20 16:41:19 -07:00
Linus Torvalds
065c4e67cc Mostly cleanups and small things, notably:
- musl libc compatibility
  - vDSO installation fix
  - TLB sync race fix for recent SMP support
  - build fix for 32-bit with Clang 20/21
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmnmKS4ACgkQ10qiO8sP
 aAD2+w//dOOblgUYgQJUXIxHpS7Gcb3Tm+a7ujC23q/kWf/pc8milCSf+zoxzUXL
 23Vwh4Gt4KrHKp8lG1gU3xZqV0qwhXNi5HO2hMpB0ioIVpX3TcrUhFbp/Oirvhgi
 3PvnvsFtUlW82DFgewB98tefXZSAlG/pg+RjQ3weHfEo+xQbjYc+kR8o59tN8LNR
 Ea4rrxyjsr3KN2yBNaFpDkMchudP6XWgKByAZBxZ2FofC3zuVRCyF8ThDfQl/3/W
 muSqX+2iuKjGpmxV0XWt72hYOhNYjBtDY7f4EPe6sbUy+PU6SjD9h/s7VTyVHgZR
 3Sii9AQLLJNYPoglExMfmWfeUnJCUJNNTLUze+ZtnhURZQYTvyJRzVmKj6fDPjK2
 jGEKXanfZCK9Cfgy2f2xbQxCxhAVwz6QT0XaQO2dZBXa0anzG+2HM0Zn8MNa9jbU
 +Lm11k1jd1QBifr+5zeni98KHt2mf77blCny8TraODgLNgWUVi5kMkPF4bZgD4Qj
 udMU9lOkTD08R89hG/Le9TsB+NIpPauyNxDHUpC/VDterFdZqFvmOFT6afTo/4RZ
 nXNVdL1tn+7O7v0bLdbyhXwj2her1GDbe6HZ5eTNqmjcOthcgI3gF2stDfFhEbNb
 /wMHnpGPncMeEI8YWtWOFA4FA5T32+LafLCKhuRJdaw0+f/NMOo=
 =oovZ
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull uml updates from Johannes Berg:
 "Mostly cleanups and small things, notably:

   - musl libc compatibility

   - vDSO installation fix

   - TLB sync race fix for recent SMP support

   - build fix for 32-bit with Clang 20/21"

* tag 'uml-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Disable GCOV_PROFILE_ALL on 32-bit UML with Clang 20/21
  um: drivers: call kernel_strrchr() explicitly in cow_user.c
  um: Replace strncpy() with strnlen()+memcpy_and_pad() in strncpy_chunk_from_user()
  x86/um: fix vDSO installation
  um: Remove CONFIG_FRAME_WARN from x86_64_defconfig
  um: Fix pte_read() and pte_exec() for kernel mappings
  um: Fix potential race condition in TLB sync
  um: time-travel: clean up kernel-doc warnings
  um: avoid struct sigcontext redefinition with musl
  um: fix address-of CMSG_DATA() rvalue in stub
2026-04-20 16:36:46 -07:00
Linus Torvalds
b66cb4f156 printk changes for 7.1
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmnmEggbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTIsMiwyAAoJEFKgDEdIgJTyjQwP/j2a9EzdI4+DfGrmA56m
 /heo44ObpFJppYaWEyAN7xX8Xpm7ErokLjZxVxhgQY7hGU5WLx1CmslnJbfWYkWy
 r7q92Kd/QIDmsvwHzlE0xdaX3rp8AXp3O2iIhcDfv9FRe+UultrESBpw9JbRYXXQ
 SLAFU1iOFMxyuzvhW/2l/B+PQDm0uoRMpMWTsLXo2JtNOsIGewNyV/7dhOumm+RD
 /0lgVA9jMAQA33j4Hkr38REe8lYH7aGl66x1mZhDg+kYb7w7ndKW/QN21OJiP+2D
 s4moi2/VLmC/UcxoDAOQTArKYkrYc2nzLMJRMaIun8jcNfCHEqaQAW2MTzSDAC78
 +atdlWrfIxykORA31lTtjh4o5vg43sQPjFZCJr3RxNLfxFy15NULhT75QFrxe9sA
 W3gs4Rz+LeynoQbJNCn8hgK0xcpKLtzC4dMY7dfQo6uyq2YnJauEvioJKbUYyoDh
 3l3uYfnzHcfRUb+yNtIkNCiYXwrpTAnSnifCbWO3smWYxhCdAT14Rval/fxtTjSe
 sTphd7m32U56WpWX0lYQbY001nMzso+gN/eQSK20IzrhvghwYUqvCKNm+fIM9tjR
 nQBxka+B7XhO27hNV4QIT8ZQRUkabQQAseEFhLQI2Ptgw3eV0s85gMP60Lg4gpZ0
 7OGA/VM+xrLqXvsFeDDWccSg
 =O6QR
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Fix printk ring buffer initialization and sanity checks

 - Workaround printf kunit test compilation with gcc < 12.1

 - Add IPv6 address printf format tests

 - Misc code and documentation cleanup

* tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING
  lib/vsprintf: use bool for local decode variable
  lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()
  printk: ringbuffer: fix errors in comments
  printk_ringbuffer: Add sanity check for 0-size data
  printk_ringbuffer: Fix get_data() size sanity check
  printf: add IPv6 address format tests
  printk: Fix _DESCS_COUNT type for 64-bit systems
2026-04-20 15:42:18 -07:00
Linus Torvalds
ccbc9fdb32 Fix timer stalls caused by incorrect handling of
the dev->next_event_forced flag.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnl1/4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g/ww/7BI4CyQUJLSCpYMjvkj+87Rrfd5u6FGqt
 jz0dGeQpY0LvRGSqASwICe+1r0zwHF+xDsUfJvA13mRaPM6D6bEU+JE6ffK8B6T9
 EIyYwEwQ2a7DrdIu9+FCXTwqXDUoGLFsguD50b4qupQKFcDlwgZbg4UAWi/ptau9
 Ww+5T/+sfw/SMR9EwXBSKH79N0gOOGpDNfGtpDv+0X0qPQvo9QGAxMfIUgMf7ZaA
 y55agXi5iOdM+mAIrE69WLinBzrBvXHWNr66/SaMadQs93I1hU54sLpir4ft1yCs
 WnDtTRWG11Y0HBHUqqgbnN8BR/2VIFDVe9BtRDoDUD70iEJ8TGqJjOvF8v7C00MK
 ets0zNel9Rqbz9wjrjTekPYUHfC/t9qqzV77c0TdU1IR6FArf/OT9Ge34AVr60EX
 a5s4aX7ECLjwuTwgQPLXsSedOD0eQndf/VYdEQ86fTUfyyujVg2NCxbFEfDr3eho
 SbjcNv1UQ1WY/7miJzYaiA2aVNtwuX25YNI+t3f3pX/1tGqmx9oB1tNzJqgGfuN9
 3/Rx3uP0kH+gpbw1lKAugFJazOEHDLJG8LBgF0PYbmlVdIGn57IuBlQL5GDLe77O
 G+sFUhrLNpkrIJVhNWODkM+K/z9vvKzENiRgG4hB0oAbQEUuqTA6ppayKWWs8KDb
 9fdgLSkdjtI=
 =GPvm
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix timer stalls caused by incorrect handling of the
  dev->next_event_forced flag"

* tag 'timers-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Add missing resets of the next_event_forced flag
2026-04-20 15:30:08 -07:00
Keenan Dong
3bfdc63936 rtmutex: Use waiter::task instead of current in remove_waiter()
remove_waiter() is used by the slowlock paths, but it is also used for
proxy-lock rollback in rt_mutex_start_proxy_lock() when invoked from
futex_requeue().

In the latter case waiter::task is not current, but remove_waiter()
operates on current for the dequeue operation. That results in several
problems:

  1) the rbtree dequeue happens without waiter::task::pi_lock being held

  2) the waiter task's pi_blocked_on state is not cleared, which leaves a
     dangling pointer primed for UAF around.

  3) rt_mutex_adjust_prio_chain() operates on the wrong top priority waiter
     task

Use waiter::task instead of current in all related operations in
remove_waiter() to cure those problems.

[ tglx: Fixup rt_mutex_adjust_prio_chain(), add a comment and amend the
  	changelog ]

Fixes: 8161239a8b ("rtmutex: Simplify PI algorithm and make highest prio task get lock")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
2026-04-21 00:22:31 +02:00
Linus Torvalds
65e9974ae2 Remove the unused ARCH_SYSCALL_WORK_{ENTER,EXIT} flags.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmnl1nYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1htAg/9EUdjaabxTyjgC5EzaGbOJ6dpK+sNnxLm
 hGljfVLxKX0ymPrPTC9dVl5+kIg6tETwtCwXOSR6TWy6ZlcZ/lB8gaddqfWaJQ2z
 FprjCgRGPcd2+UgiI/Z/fGp5vLePXe3x2KRmhHZEns9zxpZQ3L1gRhesNcbkrlXl
 rrKAWF3IndoKi0kC838uczOkZWYa/7wEaKCVTYMZ9Cw/MR7EgjoeT6+2Eg0idXNv
 EGwNA0TXFiJmqIU7J0eCWp779hFKyaiPUkloIsg4TQnL2bZMma1yP6DCf2+6o6zP
 xfONLGBakNuRmEcJXp08nBhuDLmCELFFsnEU08qzdpMNmrOCsm96RVo5ALnNZLJT
 oByxga7XU4QCnKLxPr3VF7sTUQ90/Yde+R061lkbzoKyQIffcYjcQ3qPLmi0T3+Y
 JLPBbzg8EFNd0o8Z5y6CR5k2yDfZXbde59Ani6cOjKQz7tY3UoF0+q+nLFKy59tj
 jp8qPgVRYbCAzDIkBprb+dyQHHZoPPy7waYDoHgtOk2kZRGvJHZyJkhAuIj3kMtG
 tp3uf3VpyAfKieR/q137wu6a0dnIiycftBmaB+OOJVmAi43t5qWezH0WwPtCLeqI
 sEd/bEBuawasIXHzxmfQsumTspPAbWuTSM7I4RT+Xpa6V1vYxfRTgC+pPOt8kQ/5
 DgeJC8NvwY4=
 =NYzQ
 -----END PGP SIGNATURE-----

Merge tag 'core-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull entry cleanup from Ingo Molnar:
 "Remove the unused ARCH_SYSCALL_WORK_{ENTER,EXIT} flags"

* tag 'core-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Kill ARCH_SYSCALL_WORK_{ENTER,EXIT}
2026-04-20 15:07:28 -07:00
Sean Christopherson
dfd2a8b07c KVM: selftests: Replace "paddr" with "gpa" throughout
Replace all variations of "paddr" variables in KVM selftests with "gpa",
with the exception of the ELF structures, as those fields are not specific
to guest virtual addresses, to complete the conversion from vm_paddr_t to
gpa_t.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
abc374191d KVM: selftests: Replace "u64 nested_paddr" with "gpa_t l2_gpa"
In x86's nested TDP APIs, use the appropriate gpa_t typedef and rename
variables from nested_paddr to l2_gpa to match KVM x86's nomenclature.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
df079910f9 KVM: selftests: Replace "u64 gpa" with "gpa_t" throughout
Use gpa_t instead of u64 for obvious declarations of GPA variables.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
014dfb7b9b KVM: selftests: Replace "vaddr" with "gva" throughout
Replace all variations of "vaddr" variables in KVM selftests with "gva",
with the exception of the ELF structures, as those fields are not specific
to guest virtual addresses, to complete the conversion from vm_vaddr_t to
gva_t.

Opportunistically use gva_t instead of u64 for relevant variables, and
fixup indentation as appropriate.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
a662c4e038 KVM: selftests: Clarify that arm64's inject_uer() takes a host PA, not a guest PA
Rename inject_uer()'s @paddr to @hpa to make it more obvious that it
injects an error using a host PA, not a guest PA.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
4babae4ca1 KVM: selftests: Rename translate_to_host_paddr() => translate_hva_to_hpa()
Rename arm64's translate_to_host_paddr() to translate_hva_to_hpa() and
update variable names to match, as using "vaddr" and "paddr" terminology
is super confusing due to selftests using those exact names for *guest*
addresses.

Opportunisitically drop superfluous local page_addr and paddr variables.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
3fd995905b KVM: selftests: Rename vm_vaddr_populate_bitmap() => vm_populate_gva_bitmap()
Now that KVM selftests use gva_t instead of vm_vaddr_t, rename the helper
for populating the initial GVA bitmap to drop the defunct terminology and
use "vm" for the scope.

Opportunistically fixup the declaration of the API, which has been broken
since day 1.  The flaw went unnoticed because the sole caller is defined
after the weak version, i.e. can see the prototype without a previous
declaration.

No functional change intended.

Fixes: e8b9a055fa ("KVM: arm64: selftests: Align VA space allocator with TTBR0")
Link: https://patch.msgid.link/20260420212004.3938325-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
48321f609a KVM: selftests: Rename vm_vaddr_unused_gap() => vm_unused_gva_gap()
Now that KVM selftests use gva_t instead of vm_vaddr_t, rename the API
for finding an unused range of virtual memory to drop the defunct
terminology and use "vm" for the scope.

Opportunistically clean up the function comment to drop superfluous
and redundant information.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
Sean Christopherson
85819fa0e3 KVM: selftests: Drop "vaddr_" from APIs that allocate memory for a given VM
Now that KVM selftests use gva_t instead of vm_vaddr_t, drop "vaddr_" from
the core memory allocation APIs as the information is extraneous and does
more harm than good.  E.g. the APIs don't _just_ allocate virtual memory,
they allocate backing physical memory and install mappings in the guest
page tables.  And as proven by kmalloc() and malloc(), developers generally
expect that allocations come with a working virtual address.

Opportunistically clean up the function comment for vm_alloc(), and drop
the misleading and superfluous comments for its wrappers.

No functional change intended.

Link: https://patch.msgid.link/20260420212004.3938325-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
David Matlack
6ec982b5a2 KVM: selftests: Use u8 instead of uint8_t
Use u8 instead of uint8_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint8_t/u8/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
David Matlack
2540ebd603 KVM: selftests: Use s16 instead of int16_t
Use s16 instead of int16_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int16_t/s16/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
David Matlack
19d0914920 KVM: selftests: Use u16 instead of uint16_t
Use u16 instead of uint16_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint16_t/u16/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:17 -07:00
David Matlack
7b60918768 KVM: selftests: Use s32 instead of int32_t
Use s32 instead of int32_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int32_t/s32/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
0c3a877469 KVM: selftests: Use u32 instead of uint32_t
Use u32 instead of uint32_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint32_t/u32/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
286e8903ae KVM: selftests: Use s64 instead of int64_t
Use s64 instead of int64_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/int64_t/s64/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://patch.msgid.link/20260420212004.3938325-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
26f8453288 KVM: selftests: Use u64 instead of uint64_t
Use u64 instead of uint64_t to make the KVM selftests code more concise
and more similar to the kernel (since selftests are primarily developed
by kernel developers).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/uint64_t/u64/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

Include <linux/types.h> in include/kvm_util_types.h, iinclude/test_util.h,
and include/x86/pmu.h to pick up the tools-defined u64.  Arguably, all
headers (especially kvm_util_types.h) should have already been including
stdint.h to get uint64_t from the libc headers, but the missing dependency
only rears its head once KVM uses u64 instead of uint64_t.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
[sean: rename pread_uint64() => pread_u64, expand on types.h include]
Link: https://patch.msgid.link/20260420212004.3938325-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
6d3494255a KVM: selftests: Use gpa_t for GPAs in Hyper-V selftests
Fix various Hyper-V selftests to use gpa_t for variables that contain
guest physical addresses, rather than gva_t.  In practice, the bugs are
benign as both gva_t and gpa_t are u64 typedefs, i.e. gpa_t and gva_t are
interchangeable from a functional perspective, the code is just confusing.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
[sean: call out that both are u64 typedefs]
Link: https://patch.msgid.link/20260420212004.3938325-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
97dcda3fdc KVM: selftests: Use gpa_t instead of vm_paddr_t
Replace all occurrences of vm_paddr_t with gpa_t to align with KVM code
and with the conversion helpers (e.g. addr_hva2gpa()).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/vm_paddr_/gpa_/g'

Then by manually adjusting whitespace to make checkpatch.pl happy.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
[sean: drop bogus changelog blurb about renaming functions]
Link: https://patch.msgid.link/20260420212004.3938325-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
David Matlack
5567fc9dcd KVM: selftests: Use gva_t instead of vm_vaddr_t
Replace all occurrences of vm_vaddr_t with gva_t to align with KVM code
and with the conversion helpers (e.g. addr_gva2hva()).

This commit was generated with the following command:

  git ls-files tools/testing/selftests/kvm | xargs sed -i 's/vm_vaddr_/gva_/g'

Then by manually adjusting whitespace to make checkpatch.pl happy, and
dropping renames of functions that allocate memory within a given VM.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
[sean: drop renames of allocator APIs]
Link: https://patch.msgid.link/20260420212004.3938325-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-04-20 14:54:16 -07:00
Fernando Fernandez Mancera
711987ba28 netfilter: nfnetlink_osf: fix potential NULL dereference in ttl check
The nf_osf_ttl() function accessed skb->dev to perform a local interface
address lookup without verifying that the device pointer was valid.

Additionally, the implementation utilized an in_dev_for_each_ifa_rcu
loop to match the packet source address against local interface
addresses. It assumed that packets from the same subnet should not see a
decrement on the initial TTL. A packet might appear it is from the same
subnet but it actually isn't especially in modern environments with
containers and virtual switching.

Remove the device dereference and interface loop. Replace the logic with
a switch statement that evaluates the TTL according to the ttl_check.

Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Kito Xu (veritas501) <hxzene@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/20260414074556.2512750-1-hxzene@gmail.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:45:44 +02:00
Fernando Fernandez Mancera
f5ca450087 netfilter: nfnetlink_osf: fix out-of-bounds read on option matching
In nf_osf_match(), the nf_osf_hdr_ctx structure is initialized once
and passed by reference to nf_osf_match_one() for each fingerprint
checked. During TCP option parsing, nf_osf_match_one() advances the
shared ctx->optp pointer.

If a fingerprint perfectly matches, the function returns early without
restoring ctx->optp to its initial state. If the user has configured
NF_OSF_LOGLEVEL_ALL, the loop continues to the next fingerprint.
However, because ctx->optp was not restored, the next call to
nf_osf_match_one() starts parsing from the end of the options buffer.
This causes subsequent matches to read garbage data and fail
immediately, making it impossible to log more than one match or logging
incorrect matches.

Instead of using a shared ctx->optp pointer, pass the context as a
constant pointer and use a local pointer (optp) for TCP option
traversal. This makes nf_osf_match_one() strictly stateless from the
caller's perspective, ensuring every fingerprint check starts at the
correct option offset.

Fixes: 1a6a0951fc ("netfilter: nfnetlink_osf: add missing fmatch check")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:45:43 +02:00
Yingnan Zhang
67bf42cae4 ipvs: fix MTU check for GSO packets in tunnel mode
Currently, IPVS skips MTU checks for GSO packets by excluding them with
the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
mode encapsulates GSO packets with IPIP headers.

The issue manifests in two ways:

1. MTU violation after encapsulation:
   When a GSO packet passes through IPVS tunnel mode, the original MTU
   check is bypassed. After adding the IPIP tunnel header, the packet
   size may exceed the outgoing interface MTU, leading to unexpected
   fragmentation at the IP layer.

2. Fragmentation with problematic IP IDs:
   When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
   is fragmented after encapsulation, each segment gets a sequentially
   incremented IP ID (0, 1, 2, ...). This happens because:

   a) The GSO packet bypasses MTU check and gets encapsulated
   b) At __ip_finish_output, the oversized GSO packet is split into
      separate SKBs (one per segment), with IP IDs incrementing
   c) Each SKB is then fragmented again based on the actual MTU

   This sequential IP ID allocation differs from the expected behavior
   and can cause issues with fragment reassembly and packet tracking.

Fix this by properly validating GSO packets using
skb_gso_validate_network_len(). This function correctly validates
whether the GSO segments will fit within the MTU after segmentation. If
validation fails, send an ICMP Fragmentation Needed message to enable
proper PMTU discovery.

Fixes: 4cdd34084d ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
Signed-off-by: Yingnan Zhang <342144303@qq.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:45:43 +02:00
Pablo Neira Ayuso
6eda0d771f netfilter: nat: use kfree_rcu to release ops
Florian Westphal says:

"Historically this is not an issue, even for normal base hooks: the data
path doesn't use the original nf_hook_ops that are used to register the
callbacks.

However, in v5.14 I added the ability to dump the active netfilter
hooks from userspace.

This code will peek back into the nf_hook_ops that are available
at the tail of the pointer-array blob used by the datapath.

The nat hooks are special, because they are called indirectly from
the central nat dispatcher hook. They are currently invisible to
the nfnl hook dump subsystem though.

But once that changes the nat ops structures have to be deferred too."

Update nf_nat_register_fn() to deal with partial exposition of the hooks
from error path which can be also an issue for nfnetlink_hook.

Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:45:41 +02:00
Pablo Neira Ayuso
b6fe26f86a netfilter: xtables: restrict several matches to inet family
This is a partial revert of:

  commit ab4f21e6fb ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")

to allow ipv4 and ipv6 only.

- xt_mac
- xt_owner
- xt_physdev

These extensions are not used by ebtables in userspace.

Moreover, xt_realm is only for ipv4, since dst->tclassid is ipv4
specific.

Fixes: ab4f21e6fb ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:27:52 +02:00
Florian Westphal
6e7066bdb4 netfilter: conntrack: remove sprintf usage
Replace it with scnprintf, the buffer sizes are expected to be large enough
to hold the result, no need for snprintf+overflow check.

Increase buffer size in mangle_content_len() while at it.

BUG: KASAN: stack-out-of-bounds in vsnprintf+0xea5/0x1270
Write of size 1 at addr [..]
 vsnprintf+0xea5/0x1270
 sprintf+0xb1/0xe0
 mangle_content_len+0x1ac/0x280
 nf_nat_sdp_session+0x1cc/0x240
 process_sdp+0x8f8/0xb80
 process_invite_request+0x108/0x2b0
 process_sip_msg+0x5da/0xf50
 sip_help_tcp+0x45e/0x780
 nf_confirm+0x34d/0x990
 [..]

Fixes: 9fafcd7b20 ("[NETFILTER]: nf_conntrack/nf_nat: add SIP helper port")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:27:46 +02:00
Xiang Mei
2195574dc6 netfilter: nfnetlink_osf: fix divide-by-zero in OSF_WSS_MODULO
nf_osf_match_one() computes ctx->window % f->wss.val in the
OSF_WSS_MODULO branch with no guard for f->wss.val == 0. A
CAP_NET_ADMIN user can add such a fingerprint via nfnetlink; a
subsequent matching TCP SYN divides by zero and panics the kernel.

Reject the bogus fingerprint in nfnl_osf_add_callback() above the
per-option for-loop. f->wss is per-fingerprint, not per-option, so
the check must run regardless of f->opt_num (including 0). Also
reject wss.wc >= OSF_WSS_MAX; nf_osf_match_one() already treats that
as "should not happen".

Crash:
 Oops: divide error: 0000 [#1] SMP KASAN NOPTI
 RIP: 0010:nf_osf_match_one (net/netfilter/nfnetlink_osf.c:98)
 Call Trace:
 <IRQ>
  nf_osf_match (net/netfilter/nfnetlink_osf.c:220)
  xt_osf_match_packet (net/netfilter/xt_osf.c:32)
  ipt_do_table (net/ipv4/netfilter/ip_tables.c:348)
  nf_hook_slow (net/netfilter/core.c:622)
  ip_local_deliver (net/ipv4/ip_input.c:265)
  ip_rcv (include/linux/skbuff.h:1162)
  __netif_receive_skb_one_core (net/core/dev.c:6181)
  process_backlog (net/core/dev.c:6642)
  __napi_poll (net/core/dev.c:7710)
  net_rx_action (net/core/dev.c:7945)
  handle_softirqs (kernel/softirq.c:622)

Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:27:41 +02:00
Pablo Neira Ayuso
b336fdbb71 netfilter: nft_osf: restrict it to ipv4
This expression only supports for ipv4, restrict it.

Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20 23:27:36 +02:00
Jens Axboe
42a702aaed io_uring: fix iowq_limits data race in tctx node addition
__io_uring_add_tctx_node() reads ctx->int_flags and
ctx->iowq_limits[0..1] without holding ctx->uring_lock, while
io_register_iowq_max_workers() writes these same fields under the lock.

Mostly an application problem if you try and make these race, but let's
silence KCSAN by just grabbing the ->uring_lock around the operation.
This is a slow path operation anyway, and ->uring_lock will be grabbed
by submission right after anyway.

Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-20 14:57:21 -06:00
Rick Edgecombe
9874b2917b x86/shstk: Prevent deadlock during shstk sigreturn
During sigreturn the shadow stack signal frame is popped. The kernel does
this by reading the shadow stack using normal read accesses. When it can't
assume the memory is shadow stack, it takes extra steps to makes sure it is
reading actual shadow stack memory and not other normal readable memory. It
does this by holding the mmap read lock while doing the access and checking
the flags of the VMA.

Unfortunately that is not safe. If the read of the shadow stack sigframe
hits a page fault, the fault handler will try to recursively grab another
mmap read lock. This normally works ok, but if a writer on another CPU is
also waiting, the second read lock could fail and cause a deadlock.

Fix this by not holding mmap lock during the read access to userspace.

Instead use mmap_lock_speculate_...() to watch for changes between dropping
mmap lock and the userspace access. Retry if anything grabbed an mmap write
lock in between and could have changed the VMA.

These mmap_lock_speculate_...() helpers use mm::mm_lock_seq, which is only
available when PER_VMA_LOCK is configured. So make X86_USER_SHADOW_STACK
depend on it. On x86, PER_VMA_LOCK is a default configuration for SMP
kernels. So drop support for the other configs under the assumption that
the !SMP shadow stack user base does not exist.

Currently there is a check that skips the lookup work when the SSP can be
assumed to be on a shadow stack. While reorganizing the function, remove
the optimization to make the tricky code flows more common, such that
issues like this cannot escape detection for so long.

Fixes: 7fad2a432c ("x86/shstk: Check that signal frame is shadow stack mem")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
2026-04-20 22:54:24 +02:00
Jens Axboe
41859843f2 io_uring/tctx: mark io_wq as exiting before error path teardown
syzbot reports that it's hitting the below condition for exiting an
io_wq context:

WARN_ON_ONCE(!test_bit(IO_WQ_BIT_EXIT, &wq->state))

in io_wq_put_and_exit(), which can be triggered with memory allocation
fault injection. Ensure that the io_wq is marked as exiting to silence
this warning trigger.

Reported-by: syzbot+79a4cc863a8db58cd92b@syzkaller.appspotmail.com
Fixes: 7880174e1e ("io_uring/tctx: clean up __io_uring_add_tctx_node() error handling")
Reviewed-by: Clément Léger <cleger@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-20 14:47:37 -06:00
Jens Axboe
ee5417fd02 io_uring/tctx: check for setup tctx->io_wq before teardown
As with the idling code before it, the error exit path should check for
a NULL tctx->io_wq before calling io_wq_put_and_exit().

Fixes: 7880174e1e ("io_uring/tctx: clean up __io_uring_add_tctx_node() error handling")
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Clément Léger <cleger@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-20 14:47:29 -06:00
Greg Kroah-Hartman
2fc87d37be drm/nouveau: fix u32 overflow in pushbuf reloc bounds check
nouveau_gem_pushbuf_reloc_apply() validates each relocation with

    if (r->reloc_bo_offset + 4 > nvbo->bo.base.size)

but reloc_bo_offset is __u32 (uapi/drm/nouveau_drm.h) and the integer
literal 4 promotes to unsigned int, so the addition is performed in 32
bits and wraps before the comparison against the size_t bo size.

Cast to u64 so the addition happens in 64-bit arithmetic.

Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Reported-by: Anthropic
Cc: stable <stable@kernel.org>
Assisted-by: gkh_clanker_t1000
Fixes: a1606a9596 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Add Fixes: tag. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-04-20 21:23:14 +02:00
Steven Rostedt
932cdaf3e2 ktest: Add logfile to failure directory
The logfile contains a lot of useful information about the tests being
run. Add it to the stored failure directory when the test fails.

Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Link: https://patch.msgid.link/20260420142315.7bbc3624@fedora
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-20 15:23:13 -04:00