linux

mirror of https://github.com/torvalds/linux.git synced 2026-07-27 17:47:41 +02:00

Author	SHA1	Message	Date
Eric Biggers	6fe4e4b825	fscrypt: Avoid dynamic allocation in fscrypt_get_devices() When a blk_crypto_key starts being used or is evicted, fs/crypto/ calls fscrypt_get_devices() to get the filesystem's list of block devices, then iterates over them and calls blk_crypto_config_supported(), blk_crypto_start_using_key(), or blk_crypto_evict_key() on each one. Currently, the block device pointers are placed in a dynamically allocated array. This dynamic allocation is problematic because: - It can fail, especially at the fscrypt_destroy_inline_crypt_key() call site when it's invoked for inode eviction under direct reclaim. - fscrypt_destroy_inline_crypt_key() doesn't handle the failure. It just zeroizes and frees the blk_crypto_key without calling blk_crypto_evict_key(). That causes a use-after-free. For now, let's fix this in the straightforward and easily-backportable way by switching to an on-stack array. Currently the fscrypt multi-device functionality is used only by f2fs, which has a hardcoded limit of 8 block devices. An on-stack array works fine for that. (Of course, this solution won't scale up to large number of block devices. For that we'd need a different solution, like moving the block device iteration into the filesystem. Or in the case of btrfs, which will only support blk-crypto-fallback, we should make it just call blk-crypto-fallback directly, so the block devices won't be needed.) Fixes: `22e9947a4b` ("fscrypt: stop holding extra request_queue references") Cc: stable@vger.kernel.org Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260713023708.9245-1-ebiggers%40kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260719055602.78828-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2026-07-20 10:39:24 -07:00
Sunmin Jeong	4275b59673	f2fs: fix to round down start offset of fallocate for pin file Currently, the length of fallocate for pin file is section-aligned to keep allocated sections from being selected as victims of GC. However, for the case that the start offset of fallocate is not aligned in section, the allocated sections can't be fully utilized. It's because a new section is allocated by f2fs_allocate_pinning_section() after using blks_per_sec blocks regardless of the start offset. As a result, several unexpected dirty segments may be created, including blocks assigned to the pinned file. To address this issue, let's round down the start offset of fallocate to the length of section. The reproducing scenario is as below chunk=$(((2<<20)+4096)) # 2MB + 4KB touch test f2fs_io pinfile set test f2fs_io fallocate 0 0 $chunk test f2fs_io fallocate 0 $chunk $chunk test f2fs_io fallocate 0 $((chunk2)) $chunk test f2fs_io fiemap 0 $((chunk3)) test Fiemap: offset = 0 len = 12288 logical addr. physical addr. length flags 0 0000000000000000 000000068c600000 0000000000400000 00001088 1 0000000000400000 000000003d400000 0000000000001000 00001088 2 0000000000401000 00000003eb200000 0000000000200000 00001088 3 0000000000601000 00000005e4200000 0000000000001000 00001088 4 0000000000602000 0000000605400000 0000000000200000 00001089 Cc: stable@vger.kernel.org Fixes: `f5a53edcf0` ("f2fs: support aligned pinned file") Reviewed-by: Yunji Kang <yunji0.kang@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:55:27 +00:00
Keshav Verma	5ef5bc304f	f2fs: fix listxattr handling of corrupted xattr entries Validate the xattr entry before reading its fields in f2fs_listxattr(). Return -EFSCORRUPTED when the entry is outside the valid xattr storage area instead of returning a successful partial result. Fixes: `688078e7f3` ("f2fs: fix to avoid memory leakage in f2fs_listxattr") Cc: stable@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Keshav Verma <iganschel@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:55:27 +00:00
Wenjie Qi	34636c6dcd	f2fs: skip direct I/O iostat context when disabled F2FS iostat is optional and is disabled by default. Direct I/O still allocates and binds a bio_iostat_ctx, updates the submit timestamp, and replaces bi_end_io for every DIO bio even when sbi->iostat_enable is false. The byte accounting calls do not need an extra guard because f2fs_update_iostat() already checks sbi->iostat_enable. Only skip the DIO bio context setup when iostat is disabled. If iostat is enabled through sysfs before submission, the existing context allocation and latency accounting path is still used. QEMU benchmark on a 1GiB F2FS virtio-blk image, with iostat_enable=0, 4KiB O_DIRECT I/O over a 64MiB file, 50000 iterations per run: baseline patched direct_read median 65264.50 ns 55470.95 ns direct_read recheck 65553.75 ns 55470.95 ns direct_write median 68054.62 ns 56309.44 ns direct_write recheck 66873.51 ns 56309.44 ns Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Chao Yu	70210492be	f2fs: remove unneeded f2fs_is_compressed_page() We have checked f2fs_is_compressed_page() before f2fs_compress_write_end_io(), so we don't need to check the status again, remove it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Chao Yu	8b938ae6f0	f2fs: avoid unnecessary fscrypt_finalize_bounce_page() fscrypt_finalize_bounce_page() should be called only if we use fs layer crypto, let's avoid unnecessary fscrypt_finalize_bounce_page() in error path of f2fs_write_compressed_pages(). BTW, fscrypt_finalize_bounce_page() will check mapping of bounced page before retrieving original page, so, previously it won't cause any issue w/ fscrypt_finalize_bounce_page(), but still we'd better avoid coupling w/ any logic inside fscrypt_finalize_bounce_page(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Chao Yu	cf716276b0	f2fs: avoid unnecessary sanity check on ckpt_valid_blocks The calculation of sec->ckpt_valid_blocks are the same in both set_ckpt_valid_blocks() and sanity_check_valid_blocks(), so it doesn't necessary to call sanity_check_valid_blocks() right after set_ckpt_valid_blocks(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Chao Yu	d27e443102	f2fs: misc cleanup in f2fs_record_stop_reason() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Chao Yu	98fd20b9cf	f2fs: fix wrong description in printed log This patch fixes wrong description in printed log: "SSA and SIT" -> "SIT and SSA" Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:38 +00:00
Bryam Vargas	378acf3cf1	f2fs: bound i_inline_xattr_size for non-inline-xattr inodes When the flexible_inline_xattr feature is enabled, do_read_inode() loads the on-disk i_inline_xattr_size unconditionally: if (f2fs_sb_has_flexible_inline_xattr(sbi)) fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size); but sanity_check_inode() only range-checks it when the inode also has the FI_INLINE_XATTR flag set. An inode that carries an inline dentry or inline data but not FI_INLINE_XATTR -- the normal layout for an inline directory -- therefore keeps a fully attacker-controlled i_inline_xattr_size from a crafted image. get_inline_xattr_addrs() returns that value with no flag gating, so it feeds the inode geometry: MAX_INLINE_DATA() = 4 * (CUR_ADDRS_PER_INODE - i_inline_xattr_size - 1) NR_INLINE_DENTRY() = MAX_INLINE_DATA() * BITS_PER_BYTE / (...) addrs_per_page() = CUR_ADDRS_PER_INODE - i_inline_xattr_size A large i_inline_xattr_size drives MAX_INLINE_DATA() and NR_INLINE_DENTRY() negative, so make_dentry_ptr_inline() sets d->max (int) to a negative value. The inline directory walk then compares an unsigned long bit_pos against that negative d->max, which is promoted to a huge unsigned bound, and reads far past the inline area: while (bit_pos < d->max) /* fs/f2fs/dir.c */ ... test_bit_le(bit_pos, d->bitmap) / d->dentry[bit_pos] ... Mounting a crafted image and reading such a directory triggers an out-of-bounds read in f2fs_fill_dentries(); the same underflow also corrupts ADDRS_PER_INODE for regular files. Validate i_inline_xattr_size against MAX_INLINE_XATTR_SIZE whenever the flexible_inline_xattr feature is enabled -- i.e. whenever the value is loaded from disk and consumed -- and keep the lower MIN_INLINE_XATTR_SIZE bound gated on inodes that actually carry an inline xattr, so legitimate inodes with i_inline_xattr_size == 0 are still accepted. Cc: stable@vger.kernel.org Fixes: `6afc662e68` ("f2fs: support flexible inline xattr size") Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Zhang Cen	c4810ada31	f2fs: validate ACL entry sizes in f2fs_acl_from_disk() f2fs_acl_count() only validates the aggregate ACL xattr length. A malformed ACL can still place ACL_USER or ACL_GROUP in a slot that only contains struct f2fs_acl_entry_short bytes, and f2fs_acl_from_disk() then reads entry->e_id before verifying that a full entry fits. Require a short entry before reading e_tag and e_perm, and require a full entry before reading e_id for ACL_USER and ACL_GROUP. Return -EFSCORRUPTED from these new truncated-entry checks, while keeping the pre-existing -EINVAL paths unchanged. Validation reproduced this kernel report: KASAN slab-out-of-bounds in __f2fs_get_acl+0x6fb/0x7e0 RIP: 0033:0x7f4b835ea7aa The buggy address belongs to the object at ffff888114589960 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes to the right of allocated 8-byte region [ffff888114589960, ffff888114589968) Read of size 4 Call trace: dump_stack_lvl+0x66/0xa0 (?:?) print_report+0xce/0x630 (?:?) __f2fs_get_acl+0x6fb/0x7e0 (fs/f2fs/acl.c:169) srso_alias_return_thunk+0x5/0xfbef5 (?:?) __virt_addr_valid+0x224/0x430 (?:?) kasan_report+0xe0/0x110 (?:?) __f2fs_get_acl+0x5/0x7e0 (fs/f2fs/acl.c:169) __get_acl+0x281/0x380 (?:?) vfs_get_acl+0x10b/0x190 (?:?) do_get_acl+0x2a/0x410 (?:?) do_get_acl+0x9/0x410 (?:?) do_getxattr+0xe8/0x260 (?:?) filename_getxattr+0xd1/0x140 (?:?) do_getname+0x2d/0x2d0 (?:?) path_getxattrat+0x16c/0x200 (?:?) lock_release+0xc8/0x290 (?:?) cgroup_update_frozen+0x9d/0x320 (?:?) lockdep_hardirqs_on_prepare+0xea/0x1a0 (?:?) trace_hardirqs_on+0x1a/0x170 (?:?) _raw_spin_unlock_irq+0x28/0x50 (?:?) do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87) entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?) Cc: stable@kernel.org Fixes: `af48b85b8c` ("f2fs: add xattr and acl functionalities") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Zhaoyang Huang	ccaba78582	Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block" This reverts commit `9609dd7047`. The kernel panics are keeping to be reported especially when the f2fs partition get almost full. By investigation, we find that the reason is one f2fs page got freed to buddy without being deleted from LRU and the root cause is the race happened in [2] which is enrolled by this commit. There are 3 race processes in this scenario, please find below for their main activities. The changed code in move_data_block() lets the GC path evict the tail-end folio from the page cache through folio_end_dropbehind(). Once folio_unmap_invalidate() removes the folio from mapping->i_pages, the page-cache references for all pages in the folio are dropped. The folio is then kept alive only by temporary external references, which allows a later split to operate on a folio whose subpages are no longer protected by page-cache references. After the page-cache references are gone, split_folio_to_order() can split the big folio into individual pages and put the resulting subpages back on the LRU. For tail pages beyond EOF, split removes them from the page cache and drops their page-cache references. A tail page can then remain on the LRU with PG_lru set while holding only the split caller's temporary reference. When free_folio_and_swap_cache() drops that final reference, the page enters the final folio_put() release path. In parallel, folio_isolate_lru() can observe the same tail page with a non-zero refcount and PG_lru set. It clears PG_lru before taking its own reference. If this races with the final folio_put() from the split path, __folio_put() sees PG_lru already cleared and skips lruvec_del_folio(). The page is then freed back to the allocator while its lru links are still present in the LRU list. A later LRU operation on a neighboring page detects the stale link and reports list corruption. [1] [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88) [ 22.486130] ------------[ cut here ]------------ [ 22.486134] kernel BUG at lib/list_debug.c:67! [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT) [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154 [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154 [ 22.488539] sp : ffffffc08006b830 [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000 [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0 [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122 [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060 [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058 [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003 [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00 [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010 [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d [ 22.488647] Call trace: [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P) [ 22.488661] __folio_put+0x2bc/0x434 [ 22.488670] folio_put+0x28/0x58 [ 22.488678] do_garbage_collect+0x1a34/0x2584 [ 22.488689] f2fs_gc+0x230/0x9b4 [ 22.488697] f2fs_fallocate+0xb90/0xdf4 [ 22.488706] vfs_fallocate+0x1b4/0x2bc [ 22.488716] __arm64_sys_fallocate+0x44/0x78 [ 22.488725] invoke_syscall+0x58/0xe4 [ 22.488732] do_el0_svc+0x48/0xdc [ 22.488739] el0_svc+0x3c/0x98 [ 22.488747] el0t_64_sync_handler+0x20/0x130 [ 22.488754] el0t_64_sync+0x1c4/0x1c8 [2] CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru) F: pagecache refs = n F: extra refs = GC + split F: PG_lru set move_data_block() folio = f2fs_grab_cache_folio(F) ... __folio_set_dropbehind(F) folio_unlock(F) folio_end_dropbehind(F) folio_unmap_invalidate(F) __filemap_remove_folio(F) folio_put_refs(F, n) folio_put(F) split_folio_to_order(F) folio_ref_freeze(F, 1) ... lru_add_split_folio(T) list_add_tail(&T->lru, &F->lru) folio_set_lru(T) __filemap_remove_folio(T) folio_put_refs(T, 1) /* T refcount == 1, PageLRU set / folio_isolate_lru(T) folio_test_clear_lru(T) free_folio_and_swap_cache(T) folio_put(T) / refcount: 1 -> 0 / __folio_put(T) __page_cache_release(T) folio_test_lru(T) == false / skip lruvec_del_folio(T) */ free_frozen_pages(T) folio_get(T) lruvec_del_folio(T) later: list_del(adjacent->lru) next == &T->lru next->prev == LIST_POISON / PCP freelist BUG Cc: stable@vger.kernel.org Fixes: `9609dd7047` ("f2fs: remove non-uptodate folio from the page cache in move_data_block") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Bart Van Assche	bdc7cfd780	f2fs: Split f2fs_write_end_io() Prepare for running most of the write completion work asynchronously. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Bart Van Assche	7ba36a9ea8	f2fs: Rename f2fs_post_read_wq into f2fs_wq Rename f2fs_post_read_wq into f2fs_wq. Create it unconditionally. Prepare for using this workqueue for completing write bios. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Bart Van Assche	41b7928813	f2fs: Prepare for supporting delayed bio completion Use bio frontpadding to allocate memory for a work_struct when allocating a bio. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Wenjie Qi	242d30bfc0	f2fs: reject setattr size changes on large folio files F2FS large folios are only enabled for immutable non-compressed files. Writable open and writable mmap reject such mappings, but truncate(2) through f2fs_setattr() misses the same guard. If FS_IMMUTABLE_FL is cleared while the inode is still cached, the mapping can keep large-folio support and ATTR_SIZE can change i_size. Reject size changes in that state. Cc: stable@kernel.org Fixes: `05e65c14ea` ("f2fs: support large folio for immutable non-compressed case") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Samuel Moelius	90e02a8e1b	f2fs: validate dentry name length before lookup compares it The f2fs dentry lookup path can use the on-disk name length before checking that the name fits in the dentry filename area. A corrupted dentry can then make lookup read beyond the filename slots. The bounds check needs to happen before any comparison that consumes the name length from disk. Reject dentries with invalid name lengths before comparing their names. Assisted-by: Codex:gpt-5.5-cyber-preview Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:37 +00:00
Samuel Moelius	cfcd0e49a1	f2fs: validate inline dentry name lengths before conversion Inline dentry conversion copies names out of the inline dentry area before checking that each recorded name length fits in the available filename slots. A corrupted image can therefore make the conversion path read past the inline filename storage while building the regular dentry block. Validate each inline dentry name length against the inline filename area before copying it. Assisted-by: Codex:gpt-5.5-cyber-preview Signed-off-by: Samuel Moelius <samuel.moelius@trailofbits.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Mikhail Lobanov	a41075acde	f2fs: read COW data with the original inode during atomic write When updating an atomic-write file, f2fs_write_begin() may read the previously written data back from the COW inode: prepare_atomic_write_begin() locates the block in the COW inode and sets use_cow, and the read bio is then built with the COW inode: f2fs_submit_page_read(use_cow ? F2FS_I(inode)->cow_inode : inode, ...); and f2fs_grab_read_bio() decides whether to schedule fs-layer decryption (STEP_DECRYPT) for the bio based on that inode via fscrypt_inode_uses_fs_layer_crypto(). However, the folio being filled belongs to the original inode (folio->mapping->host == inode), and the data stored in the COW block was encrypted (or left as plaintext) using the original inode's context, not the COW inode's -- see f2fs_encrypt_one_page(), which keys off fio->page->mapping->host. fscrypt_decrypt_pagecache_blocks() likewise operates on folio->mapping->host. The COW inode is created as a tmpfile in the parent directory and inherits its encryption policy from there. With test_dummy_encryption the newly created COW inode gets the dummy policy and becomes encrypted, while a pre-existing regular file -- created before the policy applied, e.g. already present in the on-disk image -- stays unencrypted. The read path then sets STEP_DECRYPT based on the encrypted COW inode and calls fscrypt_decrypt_pagecache_blocks() on a folio whose host (the unencrypted original inode) has a NULL ->i_crypt_info, dereferencing it: Oops: general protection fault, probably for non-canonical address ... KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] RIP: 0010:fscrypt_decrypt_pagecache_blocks+0xa0/0x310 Workqueue: f2fs_post_read_wq f2fs_post_read_work Call Trace: fscrypt_decrypt_bio+0x1eb/0x340 f2fs_post_read_work+0xba/0x140 process_one_work+0x91c/0x1a40 worker_thread+0x677/0xe90 kthread+0x2bc/0x3a0 The COW inode is only needed to locate the on-disk block, and that block address is already resolved into @blkaddr by prepare_atomic_write_begin() via __find_data_block(cow_inode, ...); f2fs_submit_page_read() then reads from that physical @blkaddr directly, so the inode argument only selects the post-read crypto context, not which block is fetched. Reading with @inode therefore returns the same (latest, not-yet-committed) COW data, while making both the fs-layer decryption decision and the inline crypto path use the correct (original inode's) key. With the COW inode no longer used at the read site, the use_cow flag has no remaining consumer; drop it from f2fs_write_begin() and prepare_atomic_write_begin(). Fixes: `591fc34e1f` ("f2fs: use cow inode data when updating atomic write") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Wenjie Qi	222bc257a1	f2fs: skip inode folio lookup for cached overwrite prepare_write_begin() first gets the inode folio and builds a dnode, then checks the read extent cache. For an ordinary overwrite of a non-inline and non-compressed file, an extent-cache hit already gives the data block address and the following path does not need to allocate or update any node state. Check the read extent cache before fetching the inode folio for that narrow case. Keep the existing paths for inline data, compressed files, and writes that may extend past EOF, where the helper may need inline conversion, compression preparation, or block reservation. This avoids a node-folio lookup in the buffered overwrite fast path when the mapping is already cached. In a QEMU/KASAN x86_64 VM, using a small buffered overwrite workload on an existing 1MiB file, median time improved as follows: 64-byte overwrites: 1724.93 ns/write -> 1560.24 ns/write 256-byte overwrites: 1713.38 ns/write -> 1577.85 ns/write Function profiling of 20k 64-byte overwrites showed f2fs_get_inode_folio() calls drop from 20004 to 4. Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Wenjie Qi	6d874b65aa	f2fs: keep atomic write retry from zeroing original data A partial atomic write reserves a block in the COW inode before reading the original data page for the untouched bytes in that page. If that read fails, write_begin returns an error but leaves the COW inode entry as NEW_ADDR. A retry of the same partial write then finds the COW entry, treats it as existing COW data, and f2fs_write_begin() zeroes the whole folio because blkaddr is NEW_ADDR. If the retry is committed, the bytes outside the retried write range are committed as zeroes instead of preserving the original file contents. Only use the COW inode as the read source when it already has a real data block. If the COW entry is still NEW_ADDR, treat it as a reservation to reuse: keep reading the old data from the original inode and avoid reserving or accounting the same atomic block again. Cc: stable@kernel.org Fixes: `3db1de0e58` ("f2fs: change the current atomic write way") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Wenjie Qi	846c499a65	f2fs: validate orphan inode entry count f2fs_recover_orphan_inodes() trusts the orphan block entry_count when replaying orphan inodes from the checkpoint pack. A corrupted entry_count larger than F2FS_ORPHANS_PER_BLOCK makes the recovery loop read past the ino[] array and interpret footer or following data as inode numbers. On a crafted image, mounting an unpatched kernel can drive orphan recovery into f2fs_bug_on() and panic the kernel. Validate entry_count before consuming entries so corrupted checkpoint data fails the mount with -EFSCORRUPTED and requests fsck instead. Set ERROR_INCONSISTENT_ORPHAN as well, so the corruption reason can be recorded in the superblock s_errors[] field. This gives fsck a persistent hint even though mount-time orphan recovery failure may leave no chance to persist SBI_NEED_FSCK through a checkpoint. Cc: stable@kernel.org Fixes: `127e670abf` ("f2fs: add checkpoint operations") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Wenjie Qi	28ebb922b9	f2fs: honor per-I/O write streams for direct writes io_uring can pass a per-I/O write stream through kiocb->ki_write_stream, and block direct I/O propagates that value to bio->bi_write_stream. F2FS added FDP stream mapping for DATA writes, but its direct write submit hook always rewrites bio->bi_write_stream from the inode write hint and F2FS temperature. As a result, a direct write with an explicit io_uring write_stream is submitted to the F2FS-selected stream instead of the user-requested stream. Validate an explicit write stream before starting F2FS direct I/O, pass the kiocb through the iomap private pointer, and preserve the per-I/O stream in the direct write bio. When no per-I/O stream is supplied, keep using the existing F2FS temperature-to-stream mapping. Fixes: 42f7a7a50a33 ("f2fs: map data writes to FDP streams") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Chao Yu	8712353ed8	f2fs: fix to do sanity check on f2fs_get_node_folio_ra() kernel BUG at fs/f2fs/file.c:845! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:f2fs_do_truncate_blocks+0x1115/0x1140 fs/f2fs/file.c:845 Code: fc fc 90 0f 0b e8 8b 9d 9a fd 90 0f 0b e8 83 9d 9a fd 48 89 df 48 c7 c6 60 d1 1a 8c e8 54 f1 fc fc 90 0f 0b e8 6c 9d 9a fd 90 <0f> 0b e8 64 9d 9a fd 90 0f 0b 90 e9 93 fd ff ff e8 56 9d 9a fd 90 RSP: 0018:ffffc9000e4474c0 EFLAGS: 00010283 RAX: ffffffff842b1d34 RBX: 0000000000000003 RCX: 0000000000100000 RDX: ffffc9000f03a000 RSI: 0000000000035503 RDI: 0000000000035504 RBP: ffffc9000e447608 R08: ffff8880123b0000 R09: 0000000000000002 R10: 00000000fffffffe R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000000 R14: 1ffff92001c88ea0 R15: 00000000ffff039c FS: 00007f7e02ee36c0(0000) GS:ffff88808c887000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff0305c4000 CR3: 0000000012d4c000 CR4: 0000000000352ef0 Call Trace: <TASK> f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:882 f2fs_truncate+0x471/0x7c0 fs/f2fs/file.c:940 f2fs_evict_inode+0xa3f/0x1ac0 fs/f2fs/inode.c:907 evict+0x61e/0xb10 fs/inode.c:841 f2fs_fill_super+0x5f43/0x78f0 fs/f2fs/super.c:5224 get_tree_bdev_flags+0x431/0x4f0 fs/super.c:1694 vfs_get_tree+0x92/0x2a0 fs/super.c:1754 fc_mount fs/namespace.c:1193 [inline] do_new_mount_fc fs/namespace.c:3758 [inline] do_new_mount+0x341/0xd30 fs/namespace.c:3834 do_mount fs/namespace.c:4167 [inline] __do_sys_mount fs/namespace.c:4383 [inline] __se_sys_mount+0x31d/0x420 fs/namespace.c:4360 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f count = ADDRS_PER_PAGE(dn.node_folio, inode); count -= dn.ofs_in_node; f2fs_bug_on(sbi, count < 0); The fuzz test will trigger above bug_on in f2fs. The root cause should be: in the corrupted inode, there is a direct node which has the same ino and nid in its footer, so in f2fs_do_truncate_blocks(), after f2fs_get_dnode_of_data() finds such dnode: 1) ADDRS_PER_PAGE(dn.node_folio, inode) will return 923 2) once dn.ofs_in_node points to addr[923, 1017] Then it will trigger the system panic. Let's introduce NODE_TYPE_NON_IXNODE to indicate current node should not be an inode or xattr node, and then use it in below path to detect inconsistent node chain in inode mapping table: - f2fs_do_truncate_blocks - f2fs_get_dnode_of_data - f2fs_get_node_folio_ra - __get_node_folio - f2fs_sanity_check_node_footer - case NODE_TYPE_NON_IXNODE -> check whether it is inode\|xnode Cc: stable@kernel.org Reported-by: syzbot+2488d8d751b27f7ce268@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69fa3697.170a0220.59368.0018.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:36 +00:00
Chao Yu	7967d563bc	Revert: "f2fs: check in-memory sit version bitmap" Commit `ae27d62e6b` ("f2fs: check in-memory sit version bitmap") added a mirror for sit version bitmap, it expects to detect in-memory corruption, however we never got any reports from the check points for almost decade, let's remove the code, it can help to save memories. Cc: wallentx <william.allentx@gmail.com> Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Chao Yu	a2b251259b	Revert: "f2fs: check in-memory block bitmap" Commit `355e78913c` ("f2fs: check in-memory block bitmap") added a mirror for valid block bitmap, it expects to detect in-memory corruption, however we never got any reports from the check points for almost decade, let's remove the code, it can help to save memories. Cc: wallentx <william.allentx@gmail.com> Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Wenjie Qi	484c84ecc1	f2fs: avoid false shutdown fserror reports F2FS records image errors and checkpoint-stop reasons through the same s_error_work worker. The ordinary f2fs_handle_error() path only updates s_errors, but the worker still calls fserror_report_shutdown() unconditionally after committing the superblock. As a result, a metadata corruption report can be followed by a synthetic FAN_FS_ERROR event with ESHUTDOWN and an invalid superblock file handle, even though no stop reason was recorded. Track whether save_stop_reason() actually changed the stop_reason array and only report the shutdown fserror for that case. Pure s_errors updates still commit the superblock, but no longer generate a false shutdown event. Fixes: 50faed607d32 ("f2fs: support to report fserror") Cc: stable@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Wenjie Qi	5073c66a96	f2fs: validate compress cache inode only when enabled F2FS_COMPRESS_INO() uses NM_I(sbi)->max_nid as the synthetic inode number for the compressed page cache inode. That inode only exists when the compress_cache mount option is enabled. When compress_cache is disabled, max_nid is outside the valid inode range. A corrupted directory entry that points to ino == max_nid should therefore be rejected by f2fs_check_nid_range(). However, is_meta_ino() currently treats F2FS_COMPRESS_INO() as a meta inode unconditionally, so f2fs_iget() bypasses do_read_inode() and its nid range check, and instantiates a fake internal inode instead. Gate the compressed cache inode case on COMPRESS_CACHE, matching f2fs_init_compress_inode(). With compress_cache disabled, ino == max_nid now follows the normal inode path and is rejected as an out-of-range nid. Cc: stable@kernel.org Fixes: `6ce19aff0b` ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Wenjie Qi	fcb05c26c2	f2fs: pass correct iostat type for single node writes f2fs_write_single_node_folio() takes an io_type argument, but still passes FS_GC_NODE_IO to __write_node_folio() unconditionally. This was harmless while the helper was only used by f2fs_move_node_folio(), whose caller passes FS_GC_NODE_IO. However, commit `fe9b8b30b9` ("f2fs: fix inline data not being written to disk in writeback path") made f2fs_inline_data_fiemap() call the helper with FS_NODE_IO for FIEMAP_FLAG_SYNC. Honor the caller supplied io_type so inline-data FIEMAP sync writeback is accounted as normal node IO instead of GC node IO, while the GC path continues to pass FS_GC_NODE_IO explicitly. Cc: stable@kernel.org Fixes: `fe9b8b30b9` ("f2fs: fix inline data not being written to disk in writeback path") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Wenjie Qi	74c8d2ec95	f2fs: fix missing read bio submission on large folio error f2fs_read_data_large_folio() can keep a read bio across multiple readahead folios. If a later folio hits an error before any of its blocks are added to the bio, folio_in_bio is false and the current error path returns immediately after ending that folio. This can leave the bio accumulated for earlier folios unsubmitted. Those folios then never receive read completion, and readers can wait indefinitely on the locked folios. Route errors through the common out path so any pending bio is submitted before returning. Stop consuming more readahead folios once an error is seen, and only wait on and clear the current folio when it was actually added to the bio. Cc: stable@kernel.org Fixes: `a5d8b9d94e` ("f2fs: fix to unlock folio in f2fs_read_data_large_folio()") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Chao Yu	e0288584ba	f2fs: atomic: fix UAF issue on f2fs_inode_info.atomic_inode - ioctl(F2FS_IOC_GARBAGE_COLLECT_RANGE) - shrink - f2fs_gc - gc_data_segment - ra_data_block(cow_inode) - mapping = F2FS_I(inode)->atomic_inode->i_mapping : f2fs_is_cow_file(cow_inode) is true - f2fs_evict_inode(atomic_inode) - clear_inode_flag(fi->cow_inode, FI_COW_FILE) - F2FS_I(fi->cow_inode)->atomic_inode = NULL ... - truncate_inode_pages_final(atomic_inode) - f2fs_grab_cache_folio(mapping) : create folio in atomic_inode->mapping - clear_inode(atomic_inode) - BUG_ON(atomic_inode->i_data.nrpages) We need to add a reference on fi->atomic_inode before using its mapping field during garbage collection, otherwise, it will cause UAF issue. Cc: stable@kernel.org Cc: Daeho Jeong <daehojeong@google.com> Cc: Sunmin Jeong <s_min.jeong@samsung.com> Fixes: `3db1de0e58` ("f2fs: change the current atomic write way") Fixes: `f18d007693` ("f2fs: use meta inode for GC of COW file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:35 +00:00
Chao Yu	8b4468ec02	f2fs: fix potential deadlock in gc_merge path of f2fs_balance_fs() When we mount device w/ gc_merge mount option, we may suffer below potential deadlock: Kworker GC trehad Truncator - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page - folio_start_writeback --- set writeback flag on folio - f2fs_outplace_write_data : cached folio in internal bio cache - f2fs_balance_fs - wake_up(gc_thread) : wake up gc thread to run foreground GC - finish_wait(fggc_wq) : wait on the waitqueue --- wait on GC thread to finish the work - truncate_inode_pages_range - __filemap_get_folio(, FGP_LOCK) --- lock folio - truncate_inode_partial_folio - folio_wait_writeback --- wait on writeback being cleared - do_garbage_collect - move_data_page - f2fs_get_lock_data_folio - lock on folio --- blocked on folio's lock In order to avoid such deadlock, let's call below functions to commit cached bios in GC_MERGE path of f2fs_balance_fs() as the same as we did in NOGC_MERGE path. - f2fs_submit_merged_write(sbi, DATA); - f2fs_submit_all_merged_ipu_writes(sbi); Cc: stable@kernel.org Fixes: `351df4b201` ("f2fs: add segment operations") Cc: Ruipeng Qi <ruipengqi3@gmail.com> Reported: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Chao Yu <chaseyu@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
Chao Yu	47a60629ca	f2fs: add logs in f2fs_disable_checkpoint() In order to troubleshoot in which step we may block on during mount w/ checkpoint_disable mount option. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
liujinbao1	caac757a3d	f2fs: add iostat latency tracking for direct IO F2FS did not collect iostat latency for direct IO reads and writes, hook iomap_dio_ops.submit_io to bind an iostat context and record the submission timestamp. Replace bi_end_io with f2fs_dio_end_bio() to collect IO latency on completion before calling back to the original iomap_dio_bio_end_io(), to add iostat latency tracking support for F2FS DIO. Signed-off-by: shengyong1 <shengyong1@xiaomi.com> Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
Daeho Jeong	5dfb768326	f2fs: optimize representative type determination in GC In large section mode, do_garbage_collect() previously determined the section's representative type by looking only at the first segment of the section. However, if data was fsynced into an area previously used as a node section, and this area is recovered during roll-forward recovery after sudden power off (SPO), GC would incorrectly assume the section's type based on an empty or obsolete first segment. This caused the recovered data segment to be misunderstood as being stuck inside a node section, triggering false inconsistency panics (Inconsistent segment type in SSA and SIT) and subsequent mount failures. This patch optimizes do_garbage_collect() to determine the section's representative type by identifying the first segment that actually contains valid blocks (valid_blocks > 0) during the main GC loop. This eliminates false alarms from empty/obsolete leading segments while maintaining strict section-level type consistency checks for genuine corruption. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
liujinbao1	f6b2456603	f2fs: Add trace_f2fs_fault_report Add trace_f2fs_fault_report to trigger reporting upon f2fs_bug_on, need_fsck, stop_checkpoint, and handle_eio. Since f2fs_bug_on and need_fsck can be triggered in hundreds of scenarios, define set_sbi_flag as a macro to help capture the effective fault function and line number. Signed-off-by: shengyong1 <shengyong1@xiaomi.com> Signed-off-by: liujinbao1 <liujinbao1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
Cen Zhang	fb645a976f	f2fs: annotate lockless NAT counter reads nat_cnt[] is updated while callers hold nat_tree_lock, but F2FS samples the counters locklessly in f2fs_available_free_memory(), excess_dirty_nats(), and excess_cached_nats(). Those helpers only steer cache reclaim and background sync heuristics; they do not control NAT entry lifetime or checkpoint correctness. Document the intent with data_race(READ_ONCE()) and a short comment instead of adding locking to the balance path. Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
Cen Zhang	b952837f73	f2fs: annotate lockless last_time[] accesses f2fs stores mount-wide activity timestamps in sbi->last_time[] and samples them from background discard, GC, and balance paths without a dedicated lock. The timestamps are used as best-effort heuristics to decide whether background work should run now or sleep a bit longer. The current helpers use plain loads and stores, so KCSAN can report races between frequent foreground updates and background readers. Exact freshness is not required here, but the intentional lockless accesses should be marked explicitly. Use WRITE_ONCE() in f2fs_update_time() and READ_ONCE() in f2fs_time_over() and f2fs_time_to_wait(). This preserves the existing heuristic behavior and avoids adding locking to hot paths. Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-06-22 19:52:34 +00:00
Ruipeng Qi	dd31148707	f2fs: fix potential deadlock in f2fs_balance_fs() When the f2fs filesystem space is nearly exhausted, we encounter deadlock issues as below: INFO: task A:1890 blocked for more than 120 seconds. Tainted: G O 6.12.41-g3fe07ddf05ab #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:A state:D stack:0 pid:1890 tgid:1626 ppid:1153 flags:0x00000204 Call trace: __switch_to+0xf4/0x158 __schedule+0x27c/0x908 schedule+0x3c/0x118 io_schedule+0x44/0x68 folio_wait_bit_common+0x174/0x370 folio_wait_bit+0x20/0x38 folio_wait_writeback+0x54/0xc8 truncate_inode_partial_folio+0x70/0x1e0 truncate_inode_pages_range+0x1b0/0x450 truncate_pagecache+0x54/0x88 f2fs_file_write_iter+0x3e8/0xb80 do_iter_readv_writev+0xf0/0x1e0 vfs_writev+0x138/0x2c8 do_writev+0x88/0x130 __arm64_sys_writev+0x28/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x30/0xf8 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x190/0x198 INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds. Tainted: G O 6.12.41-g3fe07ddf05ab #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:11 state:D stack:0 pid:2680853 tgid:2680853 ppid:2 flags:0x00000208 Workqueue: writeback wb_workfn (flush-254:0) Call trace: __switch_to+0xf4/0x158 __schedule+0x27c/0x908 schedule+0x3c/0x118 io_schedule+0x44/0x68 folio_wait_bit_common+0x174/0x370 __filemap_get_folio+0x214/0x348 pagecache_get_page+0x20/0x70 f2fs_get_read_data_page+0x150/0x3e8 f2fs_get_lock_data_page+0x2c/0x160 move_data_page+0x50/0x478 do_garbage_collect+0xd38/0x1528 f2fs_gc+0x240/0x7e0 f2fs_balance_fs+0x1a0/0x208 f2fs_write_single_data_page+0x6e4/0x730 f2fs_write_cache_pages+0x378/0x9b0 f2fs_write_data_pages+0x2e4/0x388 do_writepages+0x8c/0x2c8 __writeback_single_inode+0x4c/0x498 writeback_sb_inodes+0x234/0x4a8 __writeback_inodes_wb+0x58/0x118 wb_writeback+0x2f8/0x3c0 wb_workfn+0x2c4/0x508 process_one_work+0x180/0x408 worker_thread+0x258/0x368 kthread+0x118/0x128 ret_from_fork+0x10/0x200 INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds. Tainted: G O 6.12.41-g3fe07ddf05ab #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:8 state:D stack:0 pid:2641297 tgid:2641297 ppid:2 flags:0x00000208 Workqueue: writeback wb_workfn (flush-254:0) Call trace: __switch_to+0xf4/0x158 __schedule+0x27c/0x908 rt_mutex_schedule+0x30/0x60 __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8 rwbase_write_lock+0x24c/0x378 down_write+0x1c/0x30 f2fs_balance_fs+0x184/0x208 f2fs_write_inode+0xf4/0x328 __writeback_single_inode+0x370/0x498 writeback_sb_inodes+0x234/0x4a8 __writeback_inodes_wb+0x58/0x118 wb_writeback+0x2f8/0x3c0 wb_workfn+0x2c4/0x508 process_one_work+0x180/0x408 worker_thread+0x258/0x368 kthread+0x118/0x128 ret_from_fork+0x10/0x20 INFO: task B:1902 blocked for more than 120 seconds. Tainted: G O 6.12.41-g3fe07ddf05ab #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:B state:D stack:0 pid:1902 tgid:1626 ppid:1153 flags:0x0000020c Call trace: __switch_to+0xf4/0x158 __schedule+0x27c/0x908 rt_mutex_schedule+0x30/0x60 __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8 rwbase_write_lock+0x24c/0x378 down_write+0x1c/0x30 f2fs_balance_fs+0x184/0x208 f2fs_map_blocks+0x94c/0x1110 f2fs_file_write_iter+0x228/0xb80 do_iter_readv_writev+0xf0/0x1e0 vfs_writev+0x138/0x2c8 do_writev+0x88/0x130 __arm64_sys_writev+0x28/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x30/0xf8 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x190/0x198 INFO: task sync:2769849 blocked for more than 120 seconds. Tainted: G O 6.12.41-g3fe07ddf05ab #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:sync state:D stack:0 pid:2769849 tgid:2769849 ppid:736 flags:0x0000020c Call trace: __switch_to+0xf4/0x158 __schedule+0x27c/0x908 schedule+0x3c/0x118 wb_wait_for_completion+0xb0/0xe8 sync_inodes_sb+0xc8/0x2b0 sync_inodes_one_sb+0x24/0x38 iterate_supers+0xa8/0x138 ksys_sync+0x54/0xc8 __arm64_sys_sync+0x18/0x30 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x30/0xf8 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x190/0x198 The root cause is a potential deadlock between the following tasks: kworker/u8:11 Thread A - f2fs_write_single_data_page - f2fs_do_write_data_page - folio_start_writeback(X) - f2fs_outplace_write_data - bio_add_folio(X) - folio_unlock(X) - truncate_inode_pages_range - __filemap_get_folio(X, FGP_LOCK) - truncate_inode_partial_folio(X) - folio_wait_writeback(X) - f2fs_balance_fs - f2fs_gc - do_garbage_collect - move_data_page - f2fs_get_lock_data_page - __filemap_get_folio(X, FGP_LOCK) Both threads try to access folio X. Thread A holds the lock but waits for writeback, while kworker waits for the lock. This causes a deadlock. Other threads also enter D state, waiting for locks such as gc_lock and writepages. OPU/IPU DATA folio are all affected by this issue. To avoid such potential deadlocks, always commit these cached folios before triggering f2fs_gc() in f2fs_balance_fs(). Suggested-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-05-22 03:49:06 +00:00
Yongpeng Yang	1f70ddb28a	f2fs: fix incorrect FI_NO_EXTENT handling in __destroy_extent_node() When __destroy_extent_node() sets the inode flag FI_NO_EXTENT, it does not reset the length of the largest extent to 0 and update the inode folio. Since modifications to the extent tree are disallowed afterward, the cached largest extent may become stale. This can trigger the following error in xfstests generic/388: F2FS-fs (dm-0): sanity_check_extent_cache: inode (ino=1761) extent info [220057, 57, 6] is incorrect, run fsck to fix In the f2fs_drop_inode path, __destroy_extent_node() does not need to guarantee that et->node_cnt is 0, because concurrency with writeback is expected in this path, and writeback may update the extent cache. This patch reverts commit `ed78aeebef` ("f2fs: fix node_cnt race between extent node destroy and writeback"), and remove the unnecessary zero check of et->node_cnt. Fixes: `ed78aeebef` ("f2fs: fix node_cnt race between extent node destroy and writeback") Cc: stable@vger.kernel.org Reported-by: Chao Yu <chao@kernel.org> Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-05-22 03:49:06 +00:00
Chao Yu	ccc6436abb	f2fs: support to report fserror This patch supports to report fserror, it provides another way to let userspace to monitor filesystem level error. In addition, it exports /sys/fs/f2fs/features/fserror once f2fs kernel module start to support the new feature, then generic/791 of fstests can notice the feature, and verify validation of fserror report. Cc: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-05-22 03:49:06 +00:00
Wenjie Qi	e6c8140bd0	f2fs: map data writes to FDP streams From: Wenjie Qi <qiwenjie@xiaomi.com> F2FS already classifies DATA writes using its existing hot, warm and cold temperature policy, but it only passes that intent down as a write hint. That hint alone is not sufficient for NVMe FDP placement, because the current NVMe command path consumes `bio->bi_write_stream` rather than `bio->bi_write_hint` when selecting a placement ID. When the target block device exposes write streams, map the existing F2FS DATA temperature classes onto stream IDs and set `bio->bi_write_stream` for both buffered and direct writes. If the device exposes no write streams, keep the current behavior by leaving the stream unset. The stream mapping is evaluated against the target block device of each bio, so the existing per-device fallback behavior stays unchanged for multi-device filesystems. Existing blkzoned restrictions also remain in place. The mapping is intentionally small and deterministic: - 1 stream: hot, warm and cold all use stream 1 - 2 streams: hot/warm use 1, cold uses 2 - 3+ streams: hot uses 1, warm uses 2, cold uses 3 Signed-off-by: Wenjie Qi <qwjhust@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-05-22 03:49:06 +00:00
Linus Torvalds	d46dd0d883	f2fs-for-7.1-rc1 In this round, the changes primarily focus on resolving race conditions, memory safety issues (UAF), and improving the robustness of garbage collection (GC), and folio management. Enhancement: - add page-order information for large folio reads in iostat - add defrag_blocks sysfs node Bug fix: - fix uninitialized kobject put in f2fs_init_sysfs() - disallow setting an extension to both cold and hot - fix node_cnt race between extent node destroy and writeback - fix to preserve previous reserve_{blocks,node} value when remount - fix to freeze GC and discard threads quickly - fix false alarm of lockdep on cp_global_sem lock - fix data loss caused by incorrect use of nat_entry flag - fix to skip empty sections in f2fs_get_victim - fix inline data not being written to disk in writeback path - fix fsck inconsistency caused by FGGC of node block - fix fsck inconsistency caused by incorrect nat_entry flag usage - call f2fs_handle_critical_error() to set cp_error flag - fix fiemap boundary handling when read extent cache is incomplete - fix use-after-free of sbi in f2fs_compress_write_end_io() - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io() - fix incorrect file address mapping when inline inode is unwritten - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled - fix to avoid memory leak in f2fs_rename() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmnn7+kACgkQQBSofoJI UNIknA//ScYLuOhOmJJNBfmkEoUe5es04YRRq1OOBAvOCGw+Z/qg9unel9Qpneqg 0xQ35rLKL6q7Y592ZOgWyipFTGhDBEbdJNP6eI9avBURoj9sFjDhFlmkVuUhjsns IgOSVgWSWqijWZOcBQbJGEm+N/W81Ktee1RUIDkcti66/uYIS+roTLDLbIyEhvkT DhsmUnYwoMy9cB5ag9rZuSWvEa8TI7UbelH78Oi/TqRYJu6ax+D99s6PzOFBH1EE FwNGoEMn3r1+2gqPVzDmtrz7A/cYtHVigaUT9d8/n2yygZhGaQ8whd0QoIlikgcW 9n7Ymo3sns/yLEJURFqkB6Q5yFcZ30jRJZJb5CMNeqtuHQFoLjtcpEWqiQKGzzKY uUATMoG7F3QSn8AOVt6GaxnpvNb/NiVZ1Fsvt1Cgq8hUjxf1v2AhHZnvcK0EDAqa PvEYSriB56Qtnt1UfbNqydxSiviDDjtaHDprFIvAyEavDCs2F7gzrHEW7IHzG2XR Io9hnaBNUJs065zU8qWHyetIZCjPySnPOkZ42eaMEsDMhDtlC3WDOB3ZkmFnh9u2 2K/SaIpQInGyP2LGLzNB/khWhDcZ4aGciCd7b5Ul9WkrfZTzrN9XI/F2w7dr0R6q tE6xJThraGk7NjO67xUq/M2KnVAHN5gTPRY9OmEboEdTO+6pC5w= =0oeQ -----END PGP SIGNATURE----- Merge tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, the changes primarily focus on resolving race conditions, memory safety issues (UAF), and improving the robustness of garbage collection (GC), and folio management. Enhancements: - add page-order information for large folio reads in iostat - add defrag_blocks sysfs node Bug fixes: - fix uninitialized kobject put in f2fs_init_sysfs() - disallow setting an extension to both cold and hot - fix node_cnt race between extent node destroy and writeback - preserve previous reserve_{blocks,node} value when remount - freeze GC and discard threads quickly - fix false alarm of lockdep on cp_global_sem lock - fix data loss caused by incorrect use of nat_entry flag - skip empty sections in f2fs_get_victim - fix inline data not being written to disk in writeback path - fix fsck inconsistency caused by FGGC of node block - fix fsck inconsistency caused by incorrect nat_entry flag usage - call f2fs_handle_critical_error() to set cp_error flag - fix fiemap boundary handling when read extent cache is incomplete - fix use-after-free of sbi in f2fs_compress_write_end_io() - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io() - fix incorrect file address mapping when inline inode is unwritten - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled - avoid memory leak in f2fs_rename()" * tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits) f2fs: add page-order information for large folio reads in iostat f2fs: do not support mmap write for large folio f2fs: fix uninitialized kobject put in f2fs_init_sysfs() f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show() f2fs: disallow setting an extension to both cold and hot f2fs: fix node_cnt race between extent node destroy and writeback f2fs: allow empty mount string for Opt_usr\|grp\|projjquota f2fs: fix to preserve previous reserve_{blocks,node} value when remount f2fs: invalidate block device page cache on umount f2fs: fix to freeze GC and discard threads quickly f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer f2fs: fix false alarm of lockdep on cp_global_sem lock f2fs: fix data loss caused by incorrect use of nat_entry flag f2fs: fix to skip empty sections in f2fs_get_victim f2fs: fix inline data not being written to disk in writeback path f2fs: fix fsck inconsistency caused by FGGC of node block f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally f2fs: refactor node footer flag setting related code f2fs: refactor f2fs_move_node_folio function ...	2026-04-21 14:50:04 -07:00
Daniel Lee	cb8ff3ead9	f2fs: add page-order information for large folio reads in iostat Track read folio counts by order in F2FS iostat sysfs and tracepoints. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-04-18 22:44:42 +00:00
Linus Torvalds	334fbe734e	mm.git review status for linus..mm-stable Everything: Total patches: 368 Reviews/patch: 1.56 Reviewed rate: 74% Excluding DAMON: Total patches: 316 Reviews/patch: 1.77 Reviewed rate: 81% Excluding DAMON and zram: Total patches: 306 Reviews/patch: 1.81 Reviewed rate: 82% Excluding DAMON, zram and maple_tree: Total patches: 276 Reviews/patch: 2.01 Reviewed rate: 91% Significant patch series in this merge: - The 30 patch series "maple_tree: Replace big node with maple copy" from Liam Howlett is mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - The 12 patch series "mm, swap: swap table phase III: remove swap_map" from Kairui Song offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush Yadav adds file seal preservation to LUO's memfd code. - The 2 patch series "mm: zswap: add per-memcg stat for incompressible pages" from Jiayuan Chen adds additional userspace stats reportng to zswap. - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike Rapoport implements some cleanups for our handling of ZERO_PAGE() and zero_pfn. - The 2 patch series "mm/kmemleak: Improve scan_should_stop() implementation" from Zhongqiu Han provides an robustness improvement and some cleanups in the kmemleak code. - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang "improves the khugepaged scan logic and reduces CPU consumption by prioritizing scanning tasks that access memory frequently". - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies Kexec Handover by "transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel" - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's tracepointing. - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation. - The 2 patch series "Fix KASAN support for KHO restored vmalloc regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO restores a vmalloc area. - The 4 patch series "mm: Remove stray references to pagevec" from Tal Zussman provides several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago. - The 17 patch series "mm: Eliminate fake head pages from vmemmap optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page. - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters" from SeongJae Park improves two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used. - The 3 patch series "mm/damon: strictly respect min_nr_regions" from SeongJae Park improves DAMON usability by extending the treatment of the min_nr_regions user-settable parameter. - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil Babka is a proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ennsed. - The 16 patch series "mm: cleanups around unmapping / zapping" from David Hildenbrand implements "a bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions". - The 6 patch series "support batched checking of the young flag for MGLRU" from Baolin Wang supports batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64. - The 5 patch series "memcg: obj stock and slab stat caching cleanups" from Johannes Weiner provides memcg cleanup and robustness improvements. - The 5 patch series "Allow order zero pages in page reporting" from Yuvraj Sakshith enhances page_reporting's free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is cleanup work following from the recent conversion of the VMA flags to a bitmap. - The 10 patch series "mm/damon: add optional debugging-purpose sanity checks" from SeongJae Park adds some more developer-facing debug checks into DAMON core. - The 2 patch series "mm/damon: test and document power-of-2 min_region_sz requirement" from SeongJae Park adds an additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling. - The 3 patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time overflow issue in DAMON core. - The 7 patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation" from SeongJae Park is a "batch of misc/minor improvements and fixups" for DAMON. - The 4 patch series "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - The 6 patch series "zram: recompression cleanups and tweaks" from Sergey Senozhatsky provides "a somewhat random mix of fixups, recompression cleanups and improvements" in the zram code. - The 11 patch series "mm/damon: support multiple goal-based quota tuning algorithms" from SeongJae Park extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select. - The 4 patch series "mm: thp: reduce unnecessary start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged. - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes provides some cleanups and slight fixes in the mremap, mmap and vma code. - The 5 patch series "mm/damon: support addr_unit on default monitoring targets for modules" from SeongJae Park extends the use of DAMON core's addr_unit tunable. - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites" from Nico Pache provides cleanups in the khugepaged and is a base for Nico's planned khugepaged mTHP support. - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups" from David Hildenbrand implements code movement and cleanups in the memhotplug and sparsemem code. - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some memhotplug Kconfig support. - The 6 patch series "change young flag check functions to return bool" from Baolin Wang is "a cleanup patchset to change all young flag check functions to return bool". - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues" from Josh Law and SeongJae Park fixes a few potential DAMON bugs. - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code" from "converts a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it". Mainly in the vma code. - The 21 patch series "mm: expand mmap_prepare functionality and usage" from Lorenzo Stoakes "expands the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time". Cleanups, documentation, extension of mmap_prepare into filesystem drivers. - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from Lorenzo Stoakes simplifies and cleans up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ= =1Q7o -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...	2026-04-15 12:59:16 -07:00
Jaegeuk Kim	1583a7ded0	f2fs: do not support mmap write for large folio Let's check mmap writes onto the large folio, since we don't support writing large folios. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-04-15 16:36:44 +00:00
Linus Torvalds	9932f00bf4	fscrypt updates for 7.1 - Various cleanups for the interface between fs/crypto/ and filesystems, from Christoph Hellwig - Simplify and optimize the implementation of v1 key derivation by using the AES library instead of the crypto_skcipher API -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCadV8xhQcZWJpZ2dlcnNA a2VybmVsLm9yZwAKCRDzXCl4vpKOK0wSAPsGg/zd0bMiF9dcKKVESAVIePSKFbvx 5e1speATaXTSVAEAnkLLL1PZJJq9HOKpQY8Wkqpzy8Kmt8x53VeI4YW0fQc= =Rb7e -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: - Various cleanups for the interface between fs/crypto/ and filesystems, from Christoph Hellwig - Simplify and optimize the implementation of v1 key derivation by using the AES library instead of the crypto_skcipher API * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: use AES library for v1 key derivation ext4: use a byte granularity cursor in ext4_mpage_readpages fscrypt: pass a real sector_t to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range fscrypt: pass a byte offset to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_set_bio_crypt_ctx fscrypt: pass a byte offset to fscrypt_mergeable_bio fscrypt: pass a byte offset to fscrypt_generate_dun fscrypt: move fscrypt_set_bio_crypt_ctx_bh to buffer.c ext4, fscrypt: merge fscrypt_mergeable_bio_bh into io_submit_need_new_bio ext4: factor out a io_submit_need_new_bio helper ext4: open code fscrypt_set_bio_crypt_ctx_bh ext4: initialize the write hint in io_submit_init_bio	2026-04-13 17:29:12 -07:00
Guangshuo Li	b635f2ecdb	f2fs: fix uninitialized kobject put in f2fs_init_sysfs() In f2fs_init_sysfs(), all failure paths after kset_register() jump to put_kobject, which unconditionally releases both f2fs_tune and f2fs_feat. If kobject_init_and_add(&f2fs_feat, ...) fails, f2fs_tune has not been initialized yet, so calling kobject_put(&f2fs_tune) is invalid. Fix this by splitting the unwind path so each error path only releases objects that were successfully initialized. Fixes: `a907f3a68e` ("f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-04-13 22:53:17 +00:00
Yongpeng Yang	5909bedbed	f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show() In f2fs_sbi_show(), the extension_list, extension_count and hot_ext_count are read without holding sbi->sb_lock. If a concurrent sysfs store modifies the extension list via f2fs_update_extension_list(), the show path may read inconsistent count and array contents, potentially leading to out-of-bounds access or displaying stale data. Fix this by holding sb_lock around the entire extension list read and format operation. Fixes: `b6a06cbbb5` ("f2fs: support hot file extension") Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-04-13 22:53:00 +00:00
Yongpeng Yang	b8b902fd57	f2fs: disallow setting an extension to both cold and hot An extension should not exist in both the cold and hot extension lists simultaneously. When adding a hot extension, check whether it already exists in the cold list, and vice versa. Reject the operation with -EINVAL if a conflict is found. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2026-04-13 22:52:57 +00:00

1 2 3 4 5 ...

4916 Commits