mirror of
https://github.com/torvalds/linux.git
synced 2026-05-12 16:18:45 +02:00
master
788 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1d51b370a0 |
More robust data integrity checking and some fixes.
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmnflLsACgkQNqiEXrVA jGSafRAAjUgUKo2wEVRcpuvrLzxBnPtYcPzcShcO1MrFkhU5tQkGMPmtV8fe0eQn iCjnWFFl8SAom6OQSz/QWEpekXSv/uJ+3Sb0sB6tBlv3LwamRGGqNe3BD3DgdVPP E6hIhffj7mx/7bxsp3s3m6JSoXte6k01dMBYVfLf8ZT7TCHe84bmTnjGd71N4iS0 v4DY3BJJW9RicNlIPloHug2ghMeZ+JA9laGVGkX3bC4uF6ZsjpVmpFyhq7K9V5oV mYbqtpGC6cOGS37C2Ap96ajaHjD5SNYAMxj1kpjJHFVbltcGKD5mx1h5nsPH+gNA G4B+C+aZsAILTl8mzTguqprKY6DKY3BgsS/IekKrSWKGbWzJTKFa8leMJiMv4Sfr FShHSzr0WB0NsZ0XSbSM48Bn2/AR52m8MXRbptn8//wxXea9FtRhE85JRITkNWTV Ek/wnFB6khZfkwd60O9mV9NpWxzgdDGFHdWuMQcudBp4qZtU6nygWDI2/m+GBbhk MXBQ0pO8jOpfMklI09x+o/dM71BTwCPUeIDUgDfYCLyttpj7UlroEQllL08QyqHf guCKNKkk43+OhLw6fsQqXCoQ9jMwrmV37VPjIylBvUvDnpjZAG57LdU2B+Hs7Swr UUuS3Q/1zoD7xunmiqeRq2h6jdp7/BkaVhmnduZvdfRsiZ2CX0s= =uYSq -----END PGP SIGNATURE----- Merge tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "More robust data integrity checking and some fixes" * tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy: jfs: avoid -Wtautological-constant-out-of-range-compare warning again JFS: always load filesystem UUID during mount jfs: hold LOG_LOCK on umount to avoid null-ptr-deref jfs: Set the lbmDone flag at the end of lbmIODone jfs: fix corrupted list in dbUpdatePMap jfs: add dmapctl integrity check to prevent invalid operations jfs: add dtpage integrity check to prevent index/pointer overflows jfs: add dtroot integrity check to prevent index out-of-bounds |
||
|
|
dad98c5b2a |
jfs: avoid -Wtautological-constant-out-of-range-compare warning again
The comparison of an __s8 value against DTPAGEMAXSLOT is still trivially
true, causing a harmless (default disabled) warning with clang:
fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
4419 | p->header.freelist >= DTPAGEMAXSLOT)) {
| ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
I previously worked around two of these in commit
|
||
|
|
679330e4a7 |
JFS: always load filesystem UUID during mount
The filesystem UUID was only being loaded into super_block sb when an external journal device was in use. When mounting without an external journal, the UUID remained unset, which prevented the computation of a filesystem ID (fsid), which could be confirmed via `stat -f -c "%i"` and thus user space could not use fanotify correctly. A missing filesystem ID causes fanotify to return ENODEV when marking the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO, and FAN_MOVED_FROM. As a result, applications relying on fanotify could not monitor these events on JFS filesystems without an external journal. Moved the UUID initialization so it is always performed during mount, ensuring the superblock UUID is consistently available. Signed-off-by: João Paredes <joaommp@yahoo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
ca5848ae87 |
jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
write_special_inodes() function iterate through the log->sb_list and access the sbi fields, which can be set to NULL concurrently by umount. Fix concurrency issue by holding LOG_LOCK and checking for NULL. Reported-by: syzbot+e14b1036481911ae4d77@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e14b1036481911ae4d77 Signed-off-by: Helen Koike <koike@igalia.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
b15e431063 |
jfs: Set the lbmDone flag at the end of lbmIODone
In lbmRead(), the I/O event waited for by wait_event() finishes before it goes to sleep, and the lbmIODone() prematurely sets the flag to lbmDONE, thus ending the wait. This causes wait_event() to return before lbmREAD is cleared (because lbmDONE was set first), the premature return of wait_event() leads to the release of lbuf before lbmIODone() returns, thus triggering the use-after-free vulnerability reported in [1]. Moving the operation of setting the lbmDONE flag to after clearing lbmREAD in lbmIODone() avoids the use-after-free vulnerability reported in [1]. [1] BUG: KASAN: slab-use-after-free in rt_spin_lock+0x88/0x3e0 kernel/locking/spinlock_rt.c:56 Call Trace: blk_update_request+0x57e/0xe60 block/blk-mq.c:1007 blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1169 blk_complete_reqs block/blk-mq.c:1244 [inline] blk_done_softirq+0x10a/0x160 block/blk-mq.c:1249 Allocated by task 6101: lbmLogInit fs/jfs/jfs_logmgr.c:1821 [inline] lmLogInit+0x3d0/0x19e0 fs/jfs/jfs_logmgr.c:1269 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline] lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532 Freed by task 6101: kfree+0x1bd/0x900 mm/slub.c:6876 lbmLogShutdown fs/jfs/jfs_logmgr.c:1864 [inline] lmLogInit+0x1137/0x19e0 fs/jfs/jfs_logmgr.c:1415 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline] lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532 Reported-by: syzbot+1d38eedcb25a3b5686a7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1d38eedcb25a3b5686a7 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
3c778ec882 |
jfs: fix corrupted list in dbUpdatePMap
This patch resolves the "list_add corruption. next is NULL" Oops reported by syzkaller in dbUpdatePMap(). The root cause is uninitialized synclist nodes in struct metapage and struct TxBlock, plus improper list node removal using list_del() (which leaves nodes in an invalid state). This fixes the following Oops reported by syzkaller. list_add corruption. next is NULL. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:28! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 122 Comm: jfsCommit Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_add_valid_or_report+0xc3/0x130 lib/list_debug.c:27 Code: 4c 89 f2 48 89 d9 e8 0c 88 a4 fc 90 0f 0b 48 c7 c7 20 de 3d 8b e8 fd 87 a4 fc 90 0f 0b 48 c7 c7 c0 de 3d 8b e8 ee 87 a4 fc 90 <0f> 0b 48 89 df e8 13 c3 7d fd 42 80 7c 2d 00 00 74 08 4c 89 e7 e8 RSP: 0018:ffffc9000395fa20 EFLAGS: 00010246 RAX: 0000000000000022 RBX: 0000000000000000 RCX: 270c5dfadb559700 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000000f0000 R08: 0000000000000000 R09: 0000000000000000 R10: dffffc0000000000 R11: fffff5200072bee9 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000004 R15: 1ffff92000632266 FS: 0000000000000000(0000) GS:ffff888126ef9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056341fdb86c0 CR3: 0000000040a18000 CR4: 00000000003526f0 Call Trace: <TASK> __list_add_valid include/linux/list.h:96 [inline] __list_add include/linux/list.h:158 [inline] list_add include/linux/list.h:177 [inline] dbUpdatePMap+0x7e4/0xeb0 fs/jfs/jfs_dmap.c:577 txAllocPMap+0x57d/0x6b0 fs/jfs/jfs_txnmgr.c:2426 txUpdateMap+0x81e/0x9c0 fs/jfs/jfs_txnmgr.c:2364 txLazyCommit fs/jfs/jfs_txnmgr.c:2665 [inline] jfs_lazycommit+0x3f1/0xa10 fs/jfs/jfs_txnmgr.c:2734 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Reported-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4d0a0feb49c5138cac46 Tested-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
cce219b203 |
jfs: add dmapctl integrity check to prevent invalid operations
Add check_dmapctl() to validate dmapctl structure integrity, focusing on preventing invalid operations caused by on-disk corruption. Key checks: - nleafs bounded by [0, LPERCTL] (maximum leaf nodes per dmapctl). - l2nleafs bounded by [0, L2LPERCTL] and consistent with nleafs (nleafs must be 2^l2nleafs). - leafidx must be exactly CTLLEAFIND (expected leaf index position). - height bounded by [0, L2LPERCTL >> 1] (valid tree height range). - budmin validity: NOFREE only if nleafs=0; otherwise >= BUDMIN. - Leaf nodes fit within stree array (leafidx + nleafs <= CTLTREESIZE). - Leaf node values are either non-negative or NOFREE. Invoked in dbAllocAG(), dbFindCtl(), dbAdjCtl() and dbExtendFS() when accessing dmapctl pages, catching corruption early before dmap operations trigger invalid memory access or logic errors. This fixes the following UBSAN warning. [58245.668090][T14017] ------------[ cut here ]------------ [58245.668103][T14017] UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2641:11 [58245.668119][T14017] shift exponent 110 is too large for 32-bit type 'int' [58245.668137][T14017] CPU: 0 UID: 0 PID: 14017 Comm: 4c1966e88c28fa9 Tainted: G E 6.18.0-rc4-00253-g21ce5d4ba045-dirty #124 PREEMPT_{RT,(full)} [58245.668174][T14017] Tainted: [E]=UNSIGNED_MODULE [58245.668176][T14017] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [58245.668184][T14017] Call Trace: [58245.668200][T14017] <TASK> [58245.668208][T14017] dump_stack_lvl+0x189/0x250 [58245.668288][T14017] ? __pfx_dump_stack_lvl+0x10/0x10 [58245.668301][T14017] ? __pfx__printk+0x10/0x10 [58245.668315][T14017] ? lock_metapage+0x303/0x400 [jfs] [58245.668406][T14017] ubsan_epilogue+0xa/0x40 [58245.668422][T14017] __ubsan_handle_shift_out_of_bounds+0x386/0x410 [58245.668462][T14017] dbSplit+0x1f8/0x200 [jfs] [58245.668543][T14017] dbAdjCtl+0x34c/0xa20 [jfs] [58245.668628][T14017] dbAllocNear+0x2ee/0x3d0 [jfs] [58245.668710][T14017] dbAlloc+0x933/0xba0 [jfs] [58245.668797][T14017] ea_write+0x374/0xdd0 [jfs] [58245.668888][T14017] ? __pfx_ea_write+0x10/0x10 [jfs] [58245.668966][T14017] ? __jfs_setxattr+0x76e/0x1120 [jfs] [58245.669046][T14017] __jfs_setxattr+0xa01/0x1120 [jfs] [58245.669135][T14017] ? __pfx___jfs_setxattr+0x10/0x10 [jfs] [58245.669216][T14017] ? mutex_lock_nested+0x154/0x1d0 [58245.669252][T14017] ? __jfs_xattr_set+0xb9/0x170 [jfs] [58245.669333][T14017] __jfs_xattr_set+0xda/0x170 [jfs] [58245.669430][T14017] ? __pfx___jfs_xattr_set+0x10/0x10 [jfs] [58245.669509][T14017] ? xattr_full_name+0x6f/0x90 [58245.669546][T14017] ? jfs_xattr_set+0x33/0x60 [jfs] [58245.669636][T14017] ? __pfx_jfs_xattr_set+0x10/0x10 [jfs] [58245.669726][T14017] __vfs_setxattr+0x43c/0x480 [58245.669743][T14017] __vfs_setxattr_noperm+0x12d/0x660 [58245.669756][T14017] vfs_setxattr+0x16b/0x2f0 [58245.669768][T14017] ? __pfx_vfs_setxattr+0x10/0x10 [58245.669782][T14017] filename_setxattr+0x274/0x600 [58245.669795][T14017] ? __pfx_filename_setxattr+0x10/0x10 [58245.669806][T14017] ? getname_flags+0x1e5/0x540 [58245.669829][T14017] path_setxattrat+0x364/0x3a0 [58245.669840][T14017] ? __pfx_path_setxattrat+0x10/0x10 [58245.669859][T14017] ? __se_sys_chdir+0x1b9/0x280 [58245.669876][T14017] __x64_sys_lsetxattr+0xbf/0xe0 [58245.669888][T14017] do_syscall_64+0xfa/0xfa0 [58245.669901][T14017] ? lockdep_hardirqs_on+0x9c/0x150 [58245.669913][T14017] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [58245.669927][T14017] ? exc_page_fault+0xab/0x100 [58245.669937][T14017] entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+4c1966e88c28fa96e053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c1966e88c28fa96e053 Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
119e448bb5 |
jfs: add dtpage integrity check to prevent index/pointer overflows
Add check_dtpage() to validate dtpage_t integrity, focusing on preventing index/pointer overflows from on-disk corruption. Key checks: - maxslot must be exactly DTPAGEMAXSLOT (128) as defined for dtpage slot array. - freecnt bounded by [0, DTPAGEMAXSLOT-1] (slot[0] reserved for header). - freelist validity: -1 when freecnt=0; 1~DTPAGEMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1). - stblindex bounds: must be within range that avoids overlapping with stbl itself (stblindex < DTPAGEMAXSLOT - stblsize). - nextindex bounded by stbl size (stblsize << L2DTSLOTSIZE). stbl entries validity: within 1~DTPAGEMAXSLOT-1, no duplicates(excluding invalid entries marked as -1). Invoked when loading dtpage (in BT_GETPAGE macro context) to catch corruption early before directory operations trigger out-of-bounds access. Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
c83abc766a |
jfs: add dtroot integrity check to prevent index out-of-bounds
Add check_dtroot() to validate dtroot_t integrity, focusing on preventing index/pointer overflows from on-disk corruption. Key checks: - freecnt bounded by [0, DTROOTMAXSLOT-1] (slot[0] reserved for header). - freelist validity: -1 when freecnt=0; 1~DTROOTMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1). - stbl bounds: nextindex within stbl array size; entries within 0~8, no duplicates (excluding idx=0). Invoked in copy_from_dinode() when loading directory inodes, catching corruption early before directory operations trigger out-of-bounds access. This fixes the following UBSAN warning. [ 101.832754][ T5960] ------------[ cut here ]------------ [ 101.832762][ T5960] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3713:8 [ 101.832792][ T5960] index -1 is out of range for type 'struct dtslot[128]' [ 101.832807][ T5960] CPU: 2 UID: 0 PID: 5960 Comm: 5f7f0caf9979e9d Tainted: G E 6.18.0-rc4-00250-g2603eb907f03 #119 PREEMPT_{RT,(full [ 101.832817][ T5960] Tainted: [E]=UNSIGNED_MODULE [ 101.832819][ T5960] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 101.832823][ T5960] Call Trace: [ 101.832833][ T5960] <TASK> [ 101.832838][ T5960] dump_stack_lvl+0x189/0x250 [ 101.832909][ T5960] ? __pfx_dump_stack_lvl+0x10/0x10 [ 101.832925][ T5960] ? __pfx__printk+0x10/0x10 [ 101.832934][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.832959][ T5960] ubsan_epilogue+0xa/0x40 [ 101.832966][ T5960] __ubsan_handle_out_of_bounds+0xe9/0xf0 [ 101.833007][ T5960] dtInsertEntry+0x936/0x1430 [jfs] [ 101.833094][ T5960] dtSplitPage+0x2c8b/0x3ed0 [jfs] [ 101.833177][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833193][ T5960] dtInsert+0x109b/0x6000 [jfs] [ 101.833283][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.833296][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833307][ T5960] ? rt_spin_unlock+0x161/0x200 [ 101.833315][ T5960] ? __pfx_dtInsert+0x10/0x10 [jfs] [ 101.833391][ T5960] ? txLock+0xaf9/0x1cb0 [jfs] [ 101.833477][ T5960] ? dtInitRoot+0x22a/0x670 [jfs] [ 101.833556][ T5960] jfs_mkdir+0x6ec/0xa70 [jfs] [ 101.833636][ T5960] ? __pfx_jfs_mkdir+0x10/0x10 [jfs] [ 101.833721][ T5960] ? generic_permission+0x2e5/0x690 [ 101.833760][ T5960] ? bpf_lsm_inode_mkdir+0x9/0x20 [ 101.833776][ T5960] vfs_mkdir+0x306/0x510 [ 101.833786][ T5960] do_mkdirat+0x247/0x590 [ 101.833795][ T5960] ? __pfx_do_mkdirat+0x10/0x10 [ 101.833804][ T5960] ? getname_flags+0x1e5/0x540 [ 101.833815][ T5960] __x64_sys_mkdir+0x6c/0x80 [ 101.833823][ T5960] do_syscall_64+0xfa/0xfa0 [ 101.833832][ T5960] ? lockdep_hardirqs_on+0x9c/0x150 [ 101.833840][ T5960] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 101.833847][ T5960] ? exc_page_fault+0xab/0x100 [ 101.833856][ T5960] entry_SYSCALL_64_after_hwframe+0x77/0x7f Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
0b2600f81c
|
treewide: change inode->i_ino from unsigned long to u64
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
|
|
bf4afc53b7 |
Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
69050f8d6d |
treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org> |
||
|
|
178574549e |
Just a handful of minor jfs fixes
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmmM9+oACgkQNqiEXrVA jGT44Q/+OShL5wf+qdOehctrQ62ESQ9Xc4OyFHc73sef7FwfdMi3WN4MPwyNdvQf IIPyD0DbCob2XDtPoSuUGWUzJL/68u/0B9Lwi8e/f1cNQgUGSWuNmsff0gGeyzqU IfuGsHbIyiOt45n9GVU0G34BHOyI37ajWJv4cTFEIyP7wLvDpOcoWaxPFALcTFms PLG4PxGvW2SIzwEJnMyElGxFEX27RoUScIsiVRmpxjVS8551xwkPsbpwqeF9YLkI KtYHSVCtH+VN0nz07uEmvO3nzNV9vEkAI52YoI72a8Fb7ZJnxgTXLxW5Hs1cyQJL 7ZHCqmz8M1/URA9m0AY8+rGtIpm1g89njyvL1vbHpCB48/DW2VMdP4+GZCf7onsx QtI7yy+I7E3myZNDsQ3+wqAGJY8F6Jetmy833IHxzUHyjkycTmOiISP/KlfY5B3s TiwWwjTR6rzOQBGErLvOATQTKcDWWlsHcN7BDZ6QkEr7fjsZYEfIlSugMD7hxPP5 xsTn4eDjeDjAt4WUOF0BzLq5wDaAObo1aUgjjJ85E3E35e1QttsHPoBlbzjBsmFI HU8ufs7crhxPp+z+8KCSZ8Zp1geP4ZvYlLCk5OwVJ+0K6c4YhsHiQWPT8rC812YG 08TnpadTEX4s8K1QPDo8Ha/K5wqnGD6Y6RY0rnw9wWpUyDr40dU= =qxI/ -----END PGP SIGNATURE----- Merge tag 'jfs-7.0' of github.com:kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "Just a handful of minor jfs fixes" * tag 'jfs-7.0' of github.com:kleikamp/linux-shaggy: jfs: avoid -Wtautological-constant-out-of-range-compare warning jfs: Add missing set_freezable() for freezable kthread jfs: nlink overflow in jfs_rename |
||
|
|
9e355113f0 |
vfs-7.0-rc1.misc
Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc
ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u
SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo=
=JwZz
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains a mix of VFS cleanups, performance improvements, API
fixes, documentation, and a deprecation notice.
Scalability and performance:
- Rework pid allocation to only take pidmap_lock once instead of
twice during alloc_pid(), improving thread creation/teardown
throughput by 10-16% depending on false-sharing luck. Pad the
namespace refcount to reduce false-sharing
- Track file lock presence via a flag in ->i_opflags instead of
reading ->i_flctx, avoiding false-sharing with ->i_readcount on
open/close hot paths. Measured 4-16% improvement on 24-core
open-in-a-loop benchmarks
- Use a consume fence in locks_inode_context() to match the
store-release/load-consume idiom, eliminating a hardware fence on
some architectures
- Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
false-sharing
- Remove a redundant DCACHE_MANAGED_DENTRY check in
__follow_mount_rcu() that never fires since the caller already
verifies it, eliminating a 100% mispredicted branch
- Fix a 100% mispredicted likely() in devcgroup_inode_permission()
that became wrong after a prior code reorder
Bug fixes and correctness:
- Make insert_inode_locked() wait for inode destruction instead of
skipping, fixing a corner case where two matching inodes could
exist in the hash
- Move f_mode initialization before file_ref_init() in alloc_file()
to respect the SLAB_TYPESAFE_BY_RCU ordering contract
- Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
no buffers attached, preventing a null pointer dereference when
AS_RELEASE_ALWAYS is set but no release_folio op exists
- Fix select restart_block to store end_time as timespec64, avoiding
truncation of tv_sec on 32-bit architectures
- Make dump_inode() use get_kernel_nofault() to safely access inode
and superblock fields, matching the dump_mapping() pattern
API modernization:
- Make posix_acl_to_xattr() allocate the buffer internally since
every single caller was doing it anyway. Reduces boilerplate and
unnecessary error checking across ~15 filesystems
- Replace deprecated simple_strtoul() with kstrtoul() for the
ihash_entries, dhash_entries, mhash_entries, and mphash_entries
boot parameters, adding proper error handling
- Convert chardev code to use guard(mutex) and __free(kfree) cleanup
patterns
- Replace min_t() with min() or umin() in VFS code to avoid silently
truncating unsigned long to unsigned int
- Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
already check the flag
Deprecation:
- Begin deprecating legacy BSD process accounting (acct(2)). The
interface has numerous footguns and better alternatives exist
(eBPF)
Documentation:
- Fix and complete kernel-doc for struct export_operations, removing
duplicated documentation between ReST and source
- Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()
Testing:
- Add a kunit test for initramfs cpio handling of entries with
filesize > PATH_MAX
Misc:
- Add missing <linux/init_task.h> include in fs_struct.c"
* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
posix_acl: make posix_acl_to_xattr() alloc the buffer
fs: make insert_inode_locked() wait for inode destruction
initramfs_test: kunit test for cpio.filesize > PATH_MAX
fs: improve dump_inode() to safely access inode fields
fs: add <linux/init_task.h> for 'init_fs'
docs: exportfs: Use source code struct documentation
fs: move initializing f_mode before file_ref_init()
exportfs: Complete kernel-doc for struct export_operations
exportfs: Mark struct export_operations functions at kernel-doc
exportfs: Fix kernel-doc output for get_name()
acct(2): begin the deprecation of legacy BSD process accounting
device_cgroup: remove branch hint after code refactor
VFS: fix __start_dirop() kernel-doc warnings
fs: Describe @isnew parameter in ilookup5_nowait()
fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
select: store end_time as timespec64 in restart block
chardev: Switch to guard(mutex) and __free(kfree)
namespace: Replace simple_strtoul with kstrtoul to parse boot params
dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
...
|
||
|
|
7833570dae |
jfs: avoid -Wtautological-constant-out-of-range-compare warning
A recent change for the range check started triggering a clang warning:
fs/jfs/jfs_dtree.c:2906:31: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
2906 | if (stbl[i] < 0 || stbl[i] >= DTPAGEMAXSLOT) {
| ~~~~~~~ ^ ~~~~~~~~~~~~~
fs/jfs/jfs_dtree.c:3111:30: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
3111 | if (stbl[0] < 0 || stbl[0] >= DTPAGEMAXSLOT) {
| ~~~~~~~ ^ ~~~~~~~~~~~~~
Both the old and the new check were useless, but the previous version
apparently did not lead to the warning.
Remove the extraneous range check for simplicity.
Fixes:
|
||
|
|
6cbfdf8947
|
posix_acl: make posix_acl_to_xattr() alloc the buffer
Without exception all caller do that. So move the allocation into the helper. This reduces boilerplate and removes unnecessary error checking. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
|
|
7dd596bb35
|
jfs: add setlease file operation
Add the setlease file_operation to jfs_file_operations and jfs_dir_operations, pointing to generic_setlease. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-12-ea4dec9b67fa@kernel.org Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
|
|
eb0cfcf265 |
jfs: Add missing set_freezable() for freezable kthread
The jfsIOWait() thread calls try_to_freeze() but lacks set_freezable(), causing it to remain non-freezable by default. This prevents proper freezing during system suspend. Add set_freezable() to make the thread freezable as intended. Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
9218dc26fd |
jfs: nlink overflow in jfs_rename
If nlink is maximal for a directory (-1) and inside that directory you perform a rename for some child directory (not moving from the parent), then the nlink of the first directory is first incremented and later decremented. Normally this is fine, but when nlink = -1 this causes a wrap around to 0, and then drop_nlink issues a warning. After applying the patch syzbot no longer issues any warnings. I also ran some basic fs tests to look for any regressions. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Reported-by: syzbot+9131ddfd7870623b719f@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=9131ddfd7870623b719f Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
9368f0f941 |
vfs-6.19-rc1.inode
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc
omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC
RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM=
=zo/J
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"Features:
- Hide inode->i_state behind accessors. Open-coded accesses prevent
asserting they are done correctly. One obvious aspect is locking,
but significantly more can be checked. For example it can be
detected when the code is clearing flags which are already missing,
or is setting flags when it is illegal (e.g., I_FREEING when
->i_count > 0)
- Provide accessors for ->i_state, converts all filesystems using
coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
compile
- Rework I_NEW handling to operate without fences, simplifying the
code after the accessor infrastructure is in place
Cleanups:
- Move wait_on_inode() from writeback.h to fs.h
- Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
for clarity
- Cosmetic fixes to LRU handling
- Push list presence check into inode_io_list_del()
- Touch up predicts in __d_lookup_rcu()
- ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
- Assert on ->i_count in iput_final()
- Assert ->i_lock held in __iget()
Fixes:
- Add missing fences to I_NEW handling"
* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
dcache: touch up predicts in __d_lookup_rcu()
fs: push list presence check into inode_io_list_del()
fs: cosmetic fixes to lru handling
fs: rework I_NEW handling to operate without fences
fs: make plain ->i_state access fail to compile
xfs: use the new ->i_state accessors
nilfs2: use the new ->i_state accessors
overlayfs: use the new ->i_state accessors
gfs2: use the new ->i_state accessors
f2fs: use the new ->i_state accessors
smb: use the new ->i_state accessors
ceph: use the new ->i_state accessors
btrfs: use the new ->i_state accessors
Manual conversion to use ->i_state accessors of all places not covered by coccinelle
Coccinelle-based conversion to use ->i_state accessors
fs: provide accessors for ->i_state
fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
fs: move wait_on_inode() from writeback.h to fs.h
fs: add missing fences to I_NEW handling
ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
...
|
||
|
|
a6773e6932
|
jfs: Rename _inline to avoid conflict with clang's '-fms-extensions'
Building fs/jfs with clang and '-fms-extensions' errors with:
In file included from fs/jfs/jfs_unicode.c:8:
fs/jfs/jfs_incore.h:86:13: error: type name does not allow function specifier to be specified
86 | unchar _inline[128];
| ^
fs/jfs/jfs_incore.h:86:20: error: expected member name or ';' after declaration specifiers
86 | unchar _inline[128];
| ~~~~~~~~~~~~~~^
'-fms-extensions' in clang enables several other Microsoft specific
keywords such as _inline [1], presumably for compatibility with MSVC, as
Microsoft's documentation [2] mentions:
For compatibility with previous versions, _inline and _forceinline are
synonyms for __inline and __forceinline, respectively
Rename the _inline array in 'struct jfs_inode_info' to _inline_sym to
avoid this conflict, which is not a large workaround as this member is
only ever referred to via the i_inline macro.
Link:
|
||
|
|
b4dbfd8653
|
Coccinelle-based conversion to use ->i_state accessors
All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 | flag2) @@ expression inode, flags; @@ - inode->i_state |= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
|
|
5cb08b62fb |
A few fixes and cleanups for JFS.
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmjdREQACgkQNqiEXrVA jGRaew/+JTYtPNiWV/1T1jrYBwhnm+66nDasC4bM/JlyCSe6x2Q7sLWNq8KnYV8c KaVOQk/4QgZx1uvV64LUXJHlU7ayunDPpYo/s9/wjUA4HumFgU4eg1q/OP2BzdFT QxlXsXXk1Ay/A4tTcI6JHXJJwhux1dZf5APOQ7314OocdK5twCeKJ7uyQjQ5F10E Zdi/fNGWlsgrf5iabBRow7UfkiK8X0rwOM+ngl3YsZM12ZGkuTawpj/cWVtr7xom XJr8d5XeLAAzfC5jYaKHbzaQyoPo3LN1uNRIfNDB6Jsa9dsYuZdbtIpToVsrTLX8 jGRdJyS9wzeY4J6uZtKuHtzppJ0f35RIJk1ebeMsk3ezofuZEX2AMR2wLuTwLWcV bABu6FodwNa9ag2FVZzGK3v7OcOkCIAqySR2+564pjClEHGeXzSFYXz4hNWHMyRc 3L0qw5spjrMR5pQjPLQlraPpmFhwbErrBHam48TM1bz0JUBlXkt3tkiMQQ0uhM+z /F7fms8uPbI+3UVyMuvYo+qT9yiTmMdlNsk1wf6QkD4FEmBfOSoETOh+u4bytMnQ REW4oNP5QxZ6iOIkBwNvN04Vdkjbqiul5bLzujFbbgbRd1B89p/+iZVLawcOnbfm wEtp+uD5fulqVF8XeXdJgaSg1i9yDyK4nQonVmtKbFEDJWq0+v0= =QpFA -----END PGP SIGNATURE----- Merge tag 'jfs-6.18' of github.com:kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "A few fixes and cleanups for JFS" * tag 'jfs-6.18' of github.com:kleikamp/linux-shaggy: jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant JFS: Remove redundant 0 value initialization JFS: Remove unnecessary parentheses jfs: fix uninitialized waitqueue in transaction manager jfs: Verify inode mode when loading from disk |
||
|
|
cafc667982 |
jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant
Replace hardcoded value 127 with DTPAGEMAXSLOT constant in boundary checks within jfs_readdir() and dtReadFirst(). This improves code maintainability and ensures consistency with the defined maximum slot value. Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
e551cc21bb |
JFS: Remove redundant 0 value initialization
The jfs_log struct is already zeroed by kzalloc(). It's redundant to initialize dummy_log->base to 0. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
69f7321ce7 |
JFS: Remove unnecessary parentheses
When using &, it's unnecessary to have parentheses afterward. Remove redundant parentheses to enhance readability. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
300b072df7 |
jfs: fix uninitialized waitqueue in transaction manager
The transaction manager initialization in txInit() was not properly initializing TxBlock[0].waitor waitqueue, causing a crash when txEnd(0) is called on read-only filesystems. When a filesystem is mounted read-only, txBegin() returns tid=0 to indicate no transaction. However, txEnd(0) still gets called and tries to access TxBlock[0].waitor via tid_to_tblock(0), but this waitqueue was never initialized because the initialization loop started at index 1 instead of 0. This causes a 'non-static key' lockdep warning and system crash: INFO: trying to register non-static key in txEnd Fix by ensuring all transaction blocks including TxBlock[0] have their waitqueues properly initialized during txInit(). Reported-by: syzbot+c4f3462d8b2ad7977bea@syzkaller.appspotmail.com Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
7a5aa54fba |
jfs: Verify inode mode when loading from disk
The inode mode loaded from corrupted disk can be invalid. Do like what
commit
|
||
|
|
fb49a4425c |
treewide: remove MIGRATEPAGE_SUCCESS
At this point MIGRATEPAGE_SUCCESS is misnamed for all folio users, and now that we remove MIGRATEPAGE_UNMAP, it's really the only "success" return value that the code uses and expects. Let's just get rid of MIGRATEPAGE_SUCCESS completely and just use "0" for success. Link: https://lkml.kernel.org/r/20250811143949.1117439-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> [mm] Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> [jfs] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Byungchul Park <byungchul@sk.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Eugenio Pé rez <eperezma@redhat.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Lance Yang <lance.yang@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
440e6d7e14 |
Fixes and cleanups for JFS filesystem
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmiLd+AACgkQNqiEXrVA jGTZjxAAjJGErKMS4XwC10Cpyy7en97xQF8qHnGWr0nIss7dvdN0zw8oUBCZFuso uhtWddAcEwuWK+VcNj1TdEf3tSyxbjGkMA9Gh7amv7eI7YVNrFz7OZBSWMsd2DPP ZJzPfExDIGEygWP5KDPd3qrCIA+hMwwJmpCY0sO7zZbrH8aftOUJQe5Gi/a+43f4 n+vTE1d/0h0PK/Vvy1Ceiql7V0HdJ7SWOfBfTTTQivdzdk8UFqgDoC27WJ1DpfQR XTTEIoqizSeSV5YXDqwF6d7QQ3gROvzQlk0J7r3CUJ4gYHF0hDEFmHStYnkz8ZKB Z+KMMEnd2Mt/NUFBI6+tYnMzAVj5fq4XLTfe7GqM7ZL+z2JRH+kAT7SGQJ0fabPA FlNIA7fpLd1c0uuZghgS1rdY2uLYJEY2gFumcn9T0fbMsE10hSbfrkbldXCvkMpN z1pLHj0yvG1yXUOXjgvdkpJEGpdp63Dq7qNq5yAbXCURzH08nAQQ2Fx9vX86WNta 8vkBrx/6X2uIE7Cd4WOSj3IYb4LY9BSBc9fScVd7D5kkwN0qAHTJNSdMUO0vpvjx +wwYMhfIVAreacrmSVsCAh+Zaf7zg/VwS+5GZ+trBx5/KNeLVI5f/QbOXcDhHWm3 Y/2oQ8TAZgcPoyIROmZ/bQD4T+v48IESNrE54yt5WUazAeIxqYo= =jTbL -----END PGP SIGNATURE----- Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "Fixes and cleanups for JFS filesystem" * tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy: jfs: fix metapage reference count leak in dbAllocCtl jfs: stop using write_cache_pages jfs: truncate good inode pages when hard link is 0 jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage() jfs: Regular file corruption check jfs: upper bound check of tree index in dbAllocAG |
||
|
|
856db37592 |
jfs: fix metapage reference count leak in dbAllocCtl
In dbAllocCtl(), read_metapage() increases the reference count of the
metapage. However, when dp->tree.budmin < 0, the function returns -EIO
without calling release_metapage() to decrease the reference count,
leading to a memory leak.
Add release_metapage(mp) before the error return to properly manage
the metapage reference count and prevent the leak.
Fixes:
|
||
|
|
57fcb7d930 |
vfs-6.17-rc1.fileattr
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4= =geAl -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file |
||
|
|
7031769e10 |
vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO pcYE40CacaekD8rFWwYUyszpgmv6ewc= =wCwp -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in |
||
|
|
7879d7aff0 |
vfs-6.17-rc1.misc
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
=CHha
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add ext4 IOCB_DONTCACHE support
This refactors the address_space_operations write_begin() and
write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
to the filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
and advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to
bypass the legacy write_begin/write_end interface in favor of
directly calling write_iter() with a constructed synchronous kiocb.
Another i915 change replaces a manual write loop with
kernel_write() during GEM shmem object creation.
Cleanups:
- don't duplicate vfs_open() in kernel_file_open()
- proc_fd_getattr(): don't bother with S_ISDIR() check
- fs/ecryptfs: replace snprintf with sysfs_emit in show function
- vfs: Remove unnecessary list_for_each_entry_safe() from
evict_inodes()
- filelock: add new locks_wake_up_waiter() helper
- fs: Remove three arguments from block_write_end()
- VFS: change old_dir and new_dir in struct renamedata to dentrys
- netfs: Remove unused declaration netfs_queue_write_request()
Fixes:
- eventpoll: Fix semi-unbounded recursion
- eventpoll: fix sphinx documentation build warning
- fs/read_write: Fix spelling typo
- fs: annotate data race between poll_schedule_timeout() and
pollwake()
- fs/pipe: set FMODE_NOWAIT in create_pipe_files()
- docs/vfs: update references to i_mutex to i_rwsem
- fs/buffer: remove comment about hard sectorsize
- fs/buffer: remove the min and max limit checks in __getblk_slow()
- fs/libfs: don't assume blocksize <= PAGE_SIZE in
generic_check_addressable
- fs_context: fix parameter name in infofc() macro
- fs: Prevent file descriptor table allocations exceeding INT_MAX"
* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
netfs: Remove unused declaration netfs_queue_write_request()
eventpoll: fix sphinx documentation build warning
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
eventpoll: Fix semi-unbounded recursion
vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
fs/buffer: remove the min and max limit checks in __getblk_slow()
fs: Prevent file descriptor table allocations exceeding INT_MAX
fs: Remove three arguments from block_write_end()
fs/ecryptfs: replace snprintf with sysfs_emit in show function
fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
docs/vfs: update references to i_mutex to i_rwsem
fs/buffer: remove comment about hard sectorsize
fs_context: fix parameter name in infofc() macro
VFS: change old_dir and new_dir in struct renamedata to dentrys
proc_fd_getattr(): don't bother with S_ISDIR() check
...
|
||
|
|
e9d8e2bf23
|
fs: change write_begin/write_end interface to take struct kiocb *
Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
|
|
b89798e79c |
jfs: stop using write_cache_pages
Stop using the obsolete write_cache_pages and use writeback_iter directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
2d91b3765c |
jfs: truncate good inode pages when hard link is 0
The fileset value of the inode copy from the disk by the reproducer is AGGR_RESERVED_I. When executing evict, its hard link number is 0, so its inode pages are not truncated. This causes the bugon to be triggered when executing clear_inode() because nrpages is greater than 0. Reported-by: syzbot+6e516bb515d93230bc7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e516bb515d93230bc7b Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
1014354cd8 |
jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()
Replace legacy XT_GETPAGE macro with an inline function that returns a xtpage_t pointer and update all instances of XT_GETPAGE in jfs_xtree.c Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Simplified xt_getpage by removing size and rc arguments. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
2d04df8116 |
jfs: Regular file corruption check
The reproducer builds a corrupted file on disk with a negative i_size value. Add a check when opening this file to avoid subsequent operation failures. Reported-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=630f6d40b3ccabc8e96e Tested-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
c214006856 |
jfs: upper bound check of tree index in dbAllocAG
When computing the tree index in dbAllocAG, we never check if we are out of bounds realative to the size of the stree. This could happen in a scenario where the filesystem metadata are corrupted. Reported-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cffd18309153948f3c3e Tested-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
ca115d7e75
|
tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.
Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
||
|
|
951ea2f484
|
fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
Since commit
|
||
|
|
05fb0e6664 |
new helper: set_default_d_op()
... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
00c010e130 |
- The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox
simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - The 3 patch series "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - The 2 patch series "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - The 8 patch series "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - The 2 patch series "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - The 12 patch series "Always call constructor for kernel page tables" from Kevin Brodsky "ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables". This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - The 9 patch series "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - The 3 patch series "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - The 6 patch series "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - The 3 patch series ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - The 2 patch series "vmscan: enforce mems_effective during demotion" from Gregory Price "changes reclaim to respect cpuset.mems_effective during demotion when possible". because "presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated." "This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently." - The 2 patch series ""Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - The 4 patch series "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - The 2 patch series "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - The 4 patch series "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - The 6 patch series "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - The 4 patch series "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - The 6 patch series "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is "yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents." - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - The 4 patch series "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x Qqb/UGMEpilyre1PayQqOnct3aSL9Ao= =tYYm -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ... |
||
|
|
906d7ce3b5 |
jfs: implement migrate_folio for jfs_metapage_aops
Add the missing migrate_folio operation to jfs_metapage_aops to fix warnings during memory compaction. These warnings were introduced by commit |
||
|
|
5dff41a863 |
jfs: fix array-index-out-of-bounds read in add_missing_indices
stbl is s8 but it must contain offsets into slot which can go from 0 to 127. Added a bound check for that error and return -EIO if the check fails. Also make jfs_readdir return with error if add_missing_indices returns with an error. Reported-by: syzbot+b974bd41515f770c608b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com./bug?extid=b974bd41515f770c608b Signed-off-by: Aditya Dutt <duttaditya18@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
a4685408ff |
jfs: Fix null-ptr-deref in jfs_ioc_trim
[ Syzkaller Report ] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000087: 0000 [#1 KASAN: null-ptr-deref in range [0x0000000000000438-0x000000000000043f] CPU: 2 UID: 0 PID: 10614 Comm: syz-executor.0 Not tainted 6.13.0-rc6-gfbfd64d25c7a-dirty #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Sched_ext: serialise (enabled+all), task: runnable_at=-30ms RIP: 0010:jfs_ioc_trim+0x34b/0x8f0 Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6 RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206 RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001 RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000 R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438 FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body+0x61/0xb0 ? die_addr+0xb1/0xe0 ? exc_general_protection+0x333/0x510 ? asm_exc_general_protection+0x26/0x30 ? jfs_ioc_trim+0x34b/0x8f0 jfs_ioctl+0x3c8/0x4f0 ? __pfx_jfs_ioctl+0x10/0x10 ? __pfx_jfs_ioctl+0x10/0x10 __se_sys_ioctl+0x269/0x350 ? __pfx___se_sys_ioctl+0x10/0x10 ? do_syscall_64+0xfb/0x210 do_syscall_64+0xee/0x210 ? syscall_exit_to_user_mode+0x1e0/0x330 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe51f4903ad Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d RSP: 002b:00007fe5202250c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fe51f5cbf80 RCX: 00007fe51f4903ad RDX: 0000000020000680 RSI: 00000000c0185879 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe520225640 R13: 000000000000000e R14: 00007fe51f44fca0 R15: 00007fe52021d000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:jfs_ioc_trim+0x34b/0x8f0 Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6 RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206 RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001 RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000 R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438 FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Kernel panic - not syncing: Fatal exception [ Analysis ] We believe that we have found a concurrency bug in the `fs/jfs` module that results in a null pointer dereference. There is a closely related issue which has been fixed: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234 ... but, unfortunately, the accepted patch appears to still be susceptible to a null pointer dereference under some interleavings. To trigger the bug, we think that `JFS_SBI(ipbmap->i_sb)->bmap` is set to NULL in `dbFreeBits` and then dereferenced in `jfs_ioc_trim`. This bug manifests quite rarely under normal circumstances, but is triggereable from a syz-program. Reported-and-tested-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Reported-and-tested-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Signed-off-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> |
||
|
|
37bfb464dd |
jfs: validate AG parameters in dbMount() to prevent crashes
Validate db_agheight, db_agwidth, and db_agstart in dbMount to catch
corrupted metadata early and avoid undefined behavior in dbAllocAG.
Limits are derived from L2LPERCTL, LPERCTL/MAXAG, and CTLTREESIZE:
- agheight: 0 to L2LPERCTL/2 (0 to 5) ensures shift
(L2LPERCTL - 2*agheight) >= 0.
- agwidth: 1 to min(LPERCTL/MAXAG, 2^(L2LPERCTL - 2*agheight))
ensures agperlev >= 1.
- Ranges: 1-8 (agheight 0-3), 1-4 (agheight 4), 1 (agheight 5).
- LPERCTL/MAXAG = 1024/128 = 8 limits leaves per AG;
2^(10 - 2*agheight) prevents division to 0.
- agstart: 0 to CTLTREESIZE-1 - agwidth*(MAXAG-1) keeps ti within
stree (size 1365).
- Ranges: 0-1237 (agwidth 1), 0-348 (agwidth 8).
UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:1400:9
shift exponent -335544310 is negative
CPU: 0 UID: 0 PID: 5822 Comm: syz-executor130 Not tainted 6.14.0-rc5-syzkaller #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
dbAllocAG+0x1087/0x10b0 fs/jfs/jfs_dmap.c:1400
dbDiscardAG+0x352/0xa20 fs/jfs/jfs_dmap.c:1613
jfs_ioc_trim+0x45a/0x6b0 fs/jfs/jfs_discard.c:105
jfs_ioctl+0x2cd/0x3e0 fs/jfs/ioctl.c:131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
f79adee883 |
Various bug fixes and cleanups for JFS
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmfkKcAACgkQNqiEXrVA jGSE+g//ce4sps0QtKuVVQehW7+dp5OxGpNPwuqySbQ6p59QotyyFVnlqGV/927R NdnxcgK8Hd50XHWIsZbfFZQMuTM9c9ZzkvAna1w3xw+jXtc0S1Ym9zZY47Bq4cX2 D5A9bP5LONdyTgSGlA1P8e+NAfLnrMoO9kdef8EZC9iXpfY8RQj9sgF0TAmeQshU oMxcXoUWJj1zB9QzaikYzPCkr3AMWatR1CtWkLMs/IHTmEF8e3XhZOFhSfVlzZqO aN2YkxJG+E85Na8c7erPNFHuRtUzGu3v9J673aKfuWjpZEbYaFhSAoViESCQfeeu o/ArJurb6wNzYYNTTa1KG4083jWJZMoK582J/LHhpiUqikS5ovFkmcRYEEqWHHAW /ORLcqO+FX/hdbqomcmlSFJjMtaMiKIoM3KiXgAIjhrbh5vsOX+k60XfyJnCbakn IPEv1OyVX//V+UvK9dXAU7FLn94Xs9s1OehFYrAbns+zznp2bcfZHP3e6BYmPsyn FlRI8kc9K5+tbIrOdksAmopH00Ao9DjzGQUmh38ozOgKofLpJYYnwb7nV0pMPz3g ervm/9b+JUamtqwypY5YgKHV0K/BU9EenDkHKhhceAOGtV/Er4iIGJhL/Du97jqG M1Ow6Cr5mD/DF0DYQ1xfTWpiW+VaPvOmlIBjBU/kaKuoIzbg3Rg= =YeWl -----END PGP SIGNATURE----- Merge tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy Pull jfs updates from David Kleikamp: "Various bug fixes and cleanups for JFS" * tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy: jfs: add index corruption check to DT_GETPAGE() fs/jfs: consolidate sanity checking in dbMount jfs: add sanity check for agwidth in dbMount jfs: Prevent copying of nlink with value 0 from disk inode fs/jfs: Prevent integer overflow in AG size calculation fs/jfs: cast inactags to s64 to prevent potential overflow jfs: Fix uninit-value access of imap allocated in the diMount() function jfs: fix slab-out-of-bounds read in ea_get() jfs: add check read-only before truncation in jfs_truncate_nolock() jfs: add check read-only before txBeginAnon() call jfs: reject on-disk inodes of an unsupported type jfs: Remove reference to bh->b_page jfs: Delete a couple tabs in jfs_reconfigure() |
||
|
|
a8dfb21689 |
jfs: add index corruption check to DT_GETPAGE()
If the file system is corrupted, the header.stblindex variable
may become greater than 127. Because of this, an array access out
of bounds may occur:
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3096:10
index 237 is out of range for type 'struct dtslot[128]'
CPU: 0 UID: 0 PID: 5822 Comm: syz-executor740 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
dtReadFirst+0x622/0xc50 fs/jfs/jfs_dtree.c:3096
dtReadNext fs/jfs/jfs_dtree.c:3147 [inline]
jfs_readdir+0x9aa/0x3c50 fs/jfs/jfs_dtree.c:2862
wrap_directory_iterator+0x91/0xd0 fs/readdir.c:65
iterate_dir+0x571/0x800 fs/readdir.c:108
__do_sys_getdents64 fs/readdir.c:403 [inline]
__se_sys_getdents64+0x1e2/0x4b0 fs/readdir.c:389
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
---[ end trace ]---
Add a stblindex check for corruption.
Reported-by: syzbot <syzbot+9120834fc227768625ba@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=9120834fc227768625ba
Fixes:
|