312 Commits

Author SHA1 Message Date
Linus Torvalds
a62fe21079 Merge tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Implement FALLOC_FL_ALLOCATE_RANGE to add support for preallocating
   clusters without zeroing, helping to reduce file fragmentation

 - Add a unified block readahead helper for FAT chain conversion, bitmap
   allocation, and directory entry lookups

 - Optimize exfat_chain_cont_cluster() by caching buffer heads to
   minimize mark_buffer_dirty() and mirroring overhead during
   NO_FAT_CHAIN to FAT_CHAIN conversion

 - Switch to truncate_inode_pages_final() in evict_inode() to prevent
   BUG_ON caused by shadow entries during reclaim

 - Fix a 32-bit truncation bug in directory entry calculations by
   ensuring proper bitwise coercion

 - Fix sb->s_maxbytes calculation to correctly reflect the maximum
   possible volume size for a given cluster size, resolving xfstests
   generic/213

 - Introduce exfat_cluster_walk() helper to traverse FAT chains by a
   specified step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN
   modes

 - Introduce exfat_chain_advance() helper to advance an exfat_chain
   structure, updating both the current cluster and remaining size

 - Remove dead assignments and fix Smatch warnings

* tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: use exfat_chain_advance helper
  exfat: introduce exfat_chain_advance helper
  exfat: remove NULL cache pointer case in exfat_ent_get
  exfat: use exfat_cluster_walk helper
  exfat: introduce exfat_cluster_walk helper
  exfat: fix incorrect directory checksum after rename to shorter name
  exfat: fix s_maxbytes
  exfat: fix passing zero to ERR_PTR() in exfat_mkdir()
  exfat: fix error handling for FAT table operations
  exfat: optimize exfat_chain_cont_cluster with cached buffer heads
  exfat: drop redundant sec parameter from exfat_mirror_bh
  exfat: use readahead helper in exfat_get_dentry
  exfat: use readahead helper in exfat_allocate_bitmap
  exfat: add block readahead in exfat_chain_cont_cluster
  exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE support
  exfat: Fix bitwise operation having different size
  exfat: Drop dead assignment of num_clusters
  exfat: use truncate_inode_pages_final() at evict_inode()
2026-04-13 16:57:31 -07:00
Chi Zhiling
08cf4a8181 exfat: use exfat_chain_advance helper
Replace open-coded cluster chain walking logic with exfat_chain_advance()
across exfat_readdir, exfat_find_dir_entry, exfat_count_dir_entries,
exfat_search_empty_slot and exfat_check_dir_empty.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:10 +09:00
Chi Zhiling
227468fc82 exfat: introduce exfat_chain_advance helper
Introduce exfat_chain_advance() to walk an exfat_chain structure by a
given step, updating both ->dir and ->size fields atomically. This
helper handles both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes with
proper boundary checking.

Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:08 +09:00
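The behaviour described for exfat_chain_advance() in 227468fc82 can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the kernel implementation: the `Chain` class, the dict-based FAT, the boolean return value and the constant values are all chosen for illustration only.

```python
from dataclasses import dataclass

ALLOC_FAT_CHAIN = 0x01      # illustrative flag: clusters are linked through the FAT
ALLOC_NO_FAT_CHAIN = 0x03   # illustrative flag: clusters are physically contiguous
EOF_CLUSTER = 0xFFFFFFFF    # illustrative end-of-chain marker


@dataclass
class Chain:
    dir: int     # current cluster
    size: int    # clusters remaining, counted from `dir`
    flags: int   # ALLOC_FAT_CHAIN or ALLOC_NO_FAT_CHAIN


def chain_advance(chain, fat, step):
    """Advance `chain` by `step` clusters.

    Returns False and leaves `chain` untouched if the step would run
    past the end of the chain; otherwise updates dir and size together.
    """
    if step > chain.size:
        return False
    cluster = chain.dir
    if chain.flags == ALLOC_NO_FAT_CHAIN:
        cluster += step                      # contiguous: plain arithmetic
    else:
        for _ in range(step):                # linked: follow the FAT
            if cluster == EOF_CLUSTER:
                return False
            cluster = fat.get(cluster, EOF_CLUSTER)
    chain.dir = cluster
    chain.size -= step
    return True
```

Because both fields are written only after the walk succeeds, a caller never observes a chain whose cluster and remaining size disagree.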
Chi Zhiling
f764c5897f exfat: remove NULL cache pointer case in exfat_ent_get
Since exfat_get_next_cluster has been updated, no callers pass a NULL
pointer to exfat_ent_get, so remove the handling logic for this case.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:07 +09:00
Chi Zhiling
6f2cbe45c6 exfat: use exfat_cluster_walk helper
Replace the custom exfat_walk_fat_chain() function and open-coded
FAT chain walking logic with the exfat_cluster_walk() helper across
exfat_find_location, __exfat_get_dentry_set, and exfat_map_cluster.

Suggested-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:05 +09:00
Chi Zhiling
f5e5177fd7 exfat: introduce exfat_cluster_walk helper
Introduce exfat_cluster_walk() to walk the FAT chain by a given step,
handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes. Also
redefine exfat_get_next_cluster as a thin wrapper around it for
backward compatibility.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:04 +09:00
Chi Zhiling
ff37797bad exfat: fix incorrect directory checksum after rename to shorter name
When renaming a file in-place to a shorter name, exfat_remove_entries
marks excess entries as DELETED, but es->num_entries is not updated
accordingly. As a result, exfat_update_dir_chksum iterates over the
deleted entries and computes an incorrect checksum.

This does not lead to persistent corruption because mark_inode_dirty()
is called afterward, and __exfat_write_inode later recomputes the
checksum using the correct num_entries value.

Fix by setting es->num_entries = num_entries in exfat_init_ext_entry.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03 22:41:02 +09:00
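To see why a stale num_entries produces a wrong checksum in ff37797bad, here is a Python sketch of the entry-set checksum as defined by the exFAT specification (a 16-bit rotate-right-and-add over every byte of the set, skipping the checksum field itself). The `make_entry` helper and the zero-filled entries are illustrative assumptions; real entries carry names, sizes and timestamps.

```python
DENTRY_SIZE = 32  # every exFAT directory entry is 32 bytes


def entry_set_checksum(entries):
    """Compute the 16-bit entry-set checksum over a list of 32-byte entries."""
    checksum = 0
    for index, entry in enumerate(entries):
        for offset, byte in enumerate(entry):
            # Bytes 2 and 3 of the first (file) entry store the checksum itself.
            if index == 0 and offset in (2, 3):
                continue
            rotated = ((checksum << 15) | (checksum >> 1)) & 0xFFFF
            checksum = (rotated + byte) & 0xFFFF
    return checksum


def make_entry(entry_type):
    """Hypothetical helper: an entry with only its type byte set."""
    return bytes([entry_type]) + bytes(DENTRY_SIZE - 1)


# File, stream and one name entry stay in use; the last name entry was
# marked deleted (in-use bit cleared) by the rename to a shorter name.
entries = [make_entry(0x85), make_entry(0xC0), make_entry(0xC1), make_entry(0x41)]
over_live_entries = entry_set_checksum(entries[:3])  # num_entries updated to 3
over_stale_count = entry_set_checksum(entries)       # num_entries left at 4
```

The checksum is a running function of every byte it visits, so including the deleted entry changes the result in general, which is the mismatch the commit describes.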
David Timber
4129a3a275 exfat: fix s_maxbytes
With fallocate support, xfstests generic/213 fails with

   QA output created by 213
   We should get: fallocate: No space left on device
   Strangely, xfs_io sometimes says "Success" when something went wrong
  -fallocate: No space left on device
  +fallocate: File too large

because sb->s_maxbytes is set to the volume size.

To be in line with other non-extent-based filesystems, set it to the maximum
volume size possible with the cluster size of the volume.

Signed-off-by: David Timber <dxdt@dev.snart.me>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-31 23:01:07 +09:00
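The error-code change in 4129a3a275 follows from the order of checks: the VFS rejects a request that ends beyond s_maxbytes with EFBIG before the filesystem has a chance to report ENOSPC. The Python sketch below models that ordering. The cluster-count bound of 2^32 - 11 comes from the exFAT specification; using it as the multiplier, and the `classify_fallocate` helper itself, are assumptions for illustration rather than the driver's exact code.

```python
MAX_NUM_CLUSTERS = 0xFFFFFFF5   # 2**32 - 11, exFAT limit on the cluster count (assumed bound)
MAX_LFS_FILESIZE = 2**63 - 1    # VFS file-size ceiling on 64-bit kernels


def max_bytes_for_cluster_size(cluster_size):
    """Largest volume size expressible with the given cluster size."""
    return min(MAX_LFS_FILESIZE, MAX_NUM_CLUSTERS * cluster_size)


def classify_fallocate(request_end, s_maxbytes, free_bytes):
    """Model the check order: size limit first, then free space."""
    if request_end > s_maxbytes:
        return "EFBIG"    # "File too large"
    if request_end > free_bytes:
        return "ENOSPC"   # "No space left on device"
    return "OK"


volume = 1 << 30          # 1 GiB volume, entirely free
request = 2 << 30         # ask for 2 GiB

before = classify_fallocate(request, volume, volume)
after = classify_fallocate(request, max_bytes_for_cluster_size(4096), volume)
```

With s_maxbytes equal to the volume size the oversized request is classified as EFBIG; with the larger limit it reaches the free-space check and yields the ENOSPC that generic/213 expects.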
Jan Kara
5f36c9ca33 fs: Rename generic_file_fsync() to simple_fsync()
The implementation is now really basic, so rename generic_file_fsync() to
simple_fsync() and __generic_file_fsync() to simple_fsync_noflush().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-56-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26 15:03:28 +01:00
Jan Kara
2cbfeb4c8a exfat: Drop pointless invalidate_inode_buffers() call
EXFAT never calls mark_buffer_dirty_inode() and thus
invalidate_inode_buffers() never has anything to evict. Drop the
pointless call.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260326095354.16340-49-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26 15:03:27 +01:00
Yang Wen
73f0125800 exfat: fix passing zero to ERR_PTR() in exfat_mkdir()
Detected by Smatch.

namei.c:890 exfat_mkdir() warn:
	passing zero to 'ERR_PTR'

Signed-off-by: Yang Wen <anmuxixixi@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-26 20:22:26 +09:00
Chi Zhiling
d1d75eaf01 exfat: fix error handling for FAT table operations
Fix three error handling issues in FAT table operations:

1. Fix exfat_update_bh() to properly return errors from sync_dirty_buffer
2. Fix exfat_end_bh() to properly return errors from exfat_update_bh()
   and exfat_mirror_bh()
3. Fix ignored return values from exfat_chain_cont_cluster() in inode.c
   and namei.c

These fixes ensure that FAT table write errors are properly propagated
to the caller instead of being silently ignored.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:09:35 +09:00
Chi Zhiling
636bd62299 exfat: optimize exfat_chain_cont_cluster with cached buffer heads
When converting files from NO_FAT_CHAIN to FAT_CHAIN format, profiling
reveals significant time spent in mark_buffer_dirty() and exfat_mirror_bh()
operations. This overhead occurs because each FAT entry modification
triggers a full block dirty marking and mirroring operation.

For consecutive clusters that reside in the same block, optimize by caching
the buffer head and performing dirty marking only once at the end of the
block's modifications.

Performance improvements for converting a 30GB file:

| Cluster Size | Before Patch | After Patch | Speedup |
|--------------|--------------|-------------|---------|
| 512 bytes    | 4.243s       | 1.866s      | 2.27x   |
| 4KB          | 0.863s       | 0.236s      | 3.66x   |
| 32KB         | 0.069s       | 0.034s      | 2.03x   |
| 256KB        | 0.012s       | 0.006s      | 2.00x   |

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:09:32 +09:00
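The saving described in 636bd62299 comes from how many FAT entries share one block. The Python model below counts "mark dirty and mirror" operations with and without caching the current block. It is a rough sketch: FAT entries are 4 bytes each, but the assumption that the FAT starts at entry 0 of block 0, and the function names, are illustrative only.

```python
FAT_ENTRY_SIZE = 4  # each exFAT FAT entry is a 32-bit value


def dirty_marks_per_entry(count):
    """Without caching, every modified entry dirties and mirrors its block."""
    return count


def dirty_marks_cached(first_cluster, count, block_size):
    """With caching, a block is dirtied once, when the walk leaves it."""
    entries_per_block = block_size // FAT_ENTRY_SIZE
    marks = 0
    cached_block = None
    for cluster in range(first_cluster, first_cluster + count):
        block = cluster // entries_per_block
        if block != cached_block:
            if cached_block is not None:
                marks += 1          # flush the block we are leaving
            cached_block = block
    if cached_block is not None:
        marks += 1                  # flush the final block
    return marks
```

For 1024 consecutive clusters and 512-byte blocks, the cached variant performs 8 flushes instead of 1024, which is the kind of reduction the table above reflects.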
Chi Zhiling
63193eb445 exfat: drop redundant sec parameter from exfat_mirror_bh
The sector offset can be obtained from bh->b_blocknr, so drop the
redundant sec parameter from exfat_mirror_bh(). Also clean up the
function to use exfat_update_bh() helper.

No functional changes.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:09:30 +09:00
Chi Zhiling
7094b09ea7 exfat: use readahead helper in exfat_get_dentry
Replace the custom exfat_dir_readahead() function with the unified
exfat_blk_readahead() helper in exfat_get_dentry(). This removes
the duplicate readahead implementation and uses the common interface,
also reducing code complexity.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:09:28 +09:00
Chi Zhiling
a299900144 exfat: use readahead helper in exfat_allocate_bitmap
Use the newly added exfat_blk_readahead() helper in exfat_allocate_bitmap()
to simplify the code. This eliminates the duplicate inline readahead logic
and uses the unified readahead interface.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:09:00 +09:00
Chi Zhiling
6ed88c9491 exfat: add block readahead in exfat_chain_cont_cluster
When a file cannot allocate contiguous clusters, exfat converts the file
from NO_FAT_CHAIN to FAT_CHAIN format. For large files, this conversion
process can take a significant amount of time.

Add simple readahead to read all the FAT blocks in advance, as these
blocks are consecutive, significantly improving the conversion performance.

Test in an empty exfat filesystem:
  dd if=/dev/zero of=/mnt/file bs=1M count=30k
  dd if=/dev/zero of=/mnt/file2 bs=1M count=1
  time cat /mnt/file2 >> /mnt/file

| cluster size | before patch | after patch |
| ------------ | ------------ | ----------- |
| 512          | 47.667s      | 4.316s      |
| 4k           | 6.436s       | 0.541s      |
| 32k          | 0.758s       | 0.071s      |
| 256k         | 0.117s       | 0.011s      |

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05 21:08:21 +09:00
David Timber
bf1797960c exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE support
Currently, the Linux (ex)FAT drivers do not employ any cluster
allocation strategy to keep fragmentation at bay. As a result, when
multiple processes are competing for new clusters to expand files in
exfat filesystem on Linux simultaneously, the files end up heavily
fragmented. HDDs are most impacted, but this could also have some
negative impact on various forms of flash memory depending on the
type of underlying technology.

For instance, modern digital cameras produce multiple media files for a
single video stream. If the application does not take the fragmentation
issue into account or the system is under memory pressure, the kernel
ends up allocating clusters in said files in an interleaved manner.

Demo script:

	for (( i = 0; i < 4; i += 1 ));
	do
	    dd if=/dev/urandom iflag=fullblock bs=1M count=64 of=frag-$i &
	done
	for (( i = 0; i < 4; i += 1 ));
	do
	    wait
	done

	filefrag frag-*

Result - Linux kernel native exfat, async mount:
	780 extents found
	740 extents found
	809 extents found
	712 extents found

Result - Linux kernel native exfat, sync mount:
	1852 extents found
	1836 extents found
	1846 extents found
	1881 extents found

Result - Windows XP:
	3 extents found
	3 extents found
	3 extents found
	2 extents found

The Windows kernel, on the other hand, regardless of the underlying storage
interface or the medium, seems to space out clusters for each file.
A similar strategy has to be employed by Linux FAT filesystems for
efficient utilisation of the storage backend.

In the meantime, userspace applications like rsync may
use fallocate to combat this issue.

This patch may introduce a regression-like behaviour to some niche
filesystem-agnostic applications that use fallocate and proceed to
non-sequentially write to the file. Examples:

 - libtorrent's use of posix_fallocate() and the first fragment from a
   peer is near the end of the file
 - "Download accelerators" that do partial content requests (HTTP 206)
   in multiple threads writing to the same file

The delay incurred in such use cases is documented in WinAPI. Patches
that add the ioctl equivalents to the WinAPI function
SetFileValidData() and `fsutil file queryvaliddata ...` will follow.

Signed-off-by: David Timber <dxdt@dev.snart.me>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04 19:29:27 +09:00
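From user space, the preallocation that bf1797960c enables can be requested before writing a file. The Python sketch below uses `os.posix_fallocate()`, which on Linux with glibc is normally implemented with a plain fallocate() call, the mode that FALLOC_FL_ALLOCATE_RANGE names. The `preallocate` helper and the file name are hypothetical, and on filesystems without fallocate support the C library may fall back to writing zeros instead.

```python
import os
import tempfile


def preallocate(path, size):
    """Reserve `size` bytes for `path` up front and return the resulting file size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Ask the filesystem to allocate the whole range in one request,
        # so later writes do not compete cluster by cluster with other writers.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "frag-0")
        print(preallocate(target, 64 * 1024 * 1024))
```

Running the demo script above with each file preallocated this way, then comparing `filefrag` output, is one way to check the effect on a given mount.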
Philipp Hahn
3dce5bb82c exfat: Fix bitwise operation having different size
cpos has type loff_t (long long), while s_blocksize has type u32. The
inversion will happen on u32, the coercion to s64 happens afterwards and
will do zero left-padding, resulting in the upper bits getting masked out.

Cast s_blocksize to loff_t before negating it.

Found by static code analysis using Klocwork.

Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04 19:23:59 +09:00
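The truncation described in 3dce5bb82c can be reproduced with explicit bit masks. Python integers are unbounded, so the sketch below emulates 32-bit and 64-bit inversion by masking. The commit message does not show the affected expression, so the `cpos & ~(blocksize - 1)` form used here is an assumption, as are the function names.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def align_down_u32_inversion(cpos, blocksize):
    """Invert in 32 bits, then widen: the upper 32 bits of the mask are zero."""
    mask32 = ~(blocksize - 1) & U32_MASK
    return cpos & mask32          # also clears bits 32..63 of cpos


def align_down_u64_inversion(cpos, blocksize):
    """Widen first, then invert: the upper 32 bits of the mask are ones."""
    mask64 = ~(blocksize - 1) & U64_MASK
    return cpos & mask64


small = 1000                      # below 4 GiB: both variants agree
large = (5 << 32) + 1000          # above 4 GiB: the 32-bit mask loses the high part
```

For `small` both functions return 512. For `large` the 32-bit variant also returns 512, silently dropping the `5 << 32` component, while the 64-bit variant keeps it.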
Philipp Hahn
81440a740d exfat: Drop dead assignment of num_clusters
num_clusters is not used anywhere afterwards. Remove assignment.

Found by static code analysis using Klocwork.

Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04 19:22:41 +09:00
Yang Wen
4637b4cdd7 exfat: use truncate_inode_pages_final() at evict_inode()
Currently, exfat uses truncate_inode_pages() in exfat_evict_inode().
However, truncate_inode_pages() does not mark the mapping as exiting,
so reclaim may still install shadow entries for the mapping until
the inode teardown completes.

In older kernels like Linux 5.10, if shadow entries are present
at that point, clear_inode() can hit

    BUG_ON(inode->i_data.nrexceptional);

To align with VFS eviction semantics and prevent this situation,
switch to truncate_inode_pages_final() in ->evict_inode().

Other filesystems were updated to use truncate_inode_pages_final()
in ->evict_inode() by commit 91b0abe36a ("mm + fs: store shadow
entries in page cache").

Signed-off-by: Yang Wen <anmuxixixi@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-26 18:33:50 +09:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that for a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
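The sed substitution quoted in bf4afc53b7 can be tried on individual lines with an equivalent Python regular expression. This is only a way to experiment with the pattern; the `drop_default_gfp` name and the sample source lines are made up for illustration.

```python
import re

# Same shape as the sed expression: capture "alloc_obj" or "alloc_objs" plus
# the argument list up to the final ", GFP_KERNEL)", then close the call.
PATTERN = re.compile(r"(alloc_objs*\(.*), GFP_KERNEL\)")


def drop_default_gfp(line):
    """Remove a trailing GFP_KERNEL argument from an alloc_obj-style call."""
    return PATTERN.sub(r"\1)", line, count=1)


samples = [
    "ptr = kzalloc_obj(*ptr, GFP_KERNEL);",
    "buf = kmalloc_objs(struct foo, n, GFP_KERNEL);",
    "p = kmalloc_obj(*p, GFP_ATOMIC);",      # different flag: left alone
]
converted = [drop_default_gfp(s) for s in samples]
```

Like the original script, this only matches calls that fit on one line, which is why the multi-line cases were handled separately.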
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
William Hansen-Baird
c1f5740667 exfat: add blank line after declarations
Add a blank line after variable declarations in fatent.c and file.c.
This improves readability and makes code style more consistent
across the exfat subsystem.

Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:51 +09:00
William Hansen-Baird
5e37a4577f exfat: remove unnecessary else after return statement
Else-branch is unnecessary after return statement in if-branch.
Remove to enhance readability and reduce indentation.

Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:51 +09:00
Chi Zhiling
9276e330c1 exfat: support multi-cluster for exfat_get_cluster
This patch introduces a count parameter to exfat_get_cluster, which
serves as an input parameter for the caller to specify the desired
number of clusters, and as an output parameter to store the length
of consecutive clusters.

This patch can improve read performance by reducing the number of
get_block calls in sequential read scenarios, especially with small
cluster sizes.

According to my test data, the performance improvement is
approximately 10% when reading a FAT_CHAIN file with a 512-byte
cluster size.

454 MB/s -> 511 MB/s

Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:51 +09:00
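The in/out `count` parameter described in 9276e330c1 can be pictured as "how many clusters, up to the number requested, are physically consecutive from here". The Python sketch below models that idea on a dict-based FAT. The `contiguous_run` name, the dict representation and the end-of-chain value are assumptions for illustration, not the driver's interface.

```python
EOF_CLUSTER = 0xFFFFFFFF  # illustrative end-of-chain marker


def contiguous_run(fat, start, want):
    """Return how many clusters, at most `want`, are consecutive from `start`."""
    count = 0
    cluster = start
    while count < want and cluster != EOF_CLUSTER:
        count += 1
        nxt = fat.get(cluster, EOF_CLUSTER)
        if nxt != cluster + 1:
            break                 # chain jumps elsewhere or ends: run is over
        cluster = nxt
    return count


# Clusters 10, 11, 12 are adjacent; the chain then jumps to 20 and ends.
fat = {10: 11, 11: 12, 12: 20, 20: EOF_CLUSTER}
run = contiguous_run(fat, 10, 8)
```

A caller that learns the run length in one call can map the whole run at once instead of asking for each cluster separately, which is where the reduction in get_block calls comes from.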
Chi Zhiling
9fb696a10a exfat: return the start of next cache in exfat_cache_lookup
Change exfat_cache_lookup to return the cluster number of the last
cluster before the next cache (i.e., the end of the current cache range)
or the given 'end' if there is no next cache. This allows the caller to
know whether the next cluster after the current cache is cached.

The function signature is changed to accept an 'end' parameter, which
is the upper bound of the search range. The function now stops early
if it finds a cache that starts within the current cache's tail, meaning
caches are contiguous. The return value is the cluster number at which
the next cache starts (minus one) or the original 'end' if no next cache
is found.

The new behavior is illustrated as follows:

cache:  [ccccccc-------ccccccccc]
search: [..................]
return:               ^

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:51 +09:00
Chi Zhiling
8c6bdce0e9 exfat: tweak cluster cache to support zero offset
The current cache mechanism does not support reading clusters starting
from a file offset of zero. This patch enables that feature in
preparation for subsequent reads of contiguous clusters from offset zero.

1. support finding clusters with zero offset.
2. allow clusters with zero offset to be cached.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:50 +09:00
Chi Zhiling
256694b22d exfat: support multi-cluster for exfat_map_cluster
This patch introduces a parameter 'count' to support fetching multiple
clusters in exfat_map_cluster. The returned 'count' indicates the number
of consecutive clusters, or 0 when the input cluster offset is past EOF.

'count' is also an input parameter that lets the caller specify the
required number of clusters.

Only NO_FAT_CHAIN files enable multi-cluster fetching in this patch.

After this patch, the time proportion of exfat_get_block has decreased.
The performance data is as follows:

Cluster size: 512 bytes
Sequential read of a 30GB NO_FAT_CHAIN file:
2.4GB/s -> 2.5 GB/s
proportion of exfat_get_block:
10.8% -> 0.02%

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:50 +09:00
Chi Zhiling
88a936b7a9 exfat: remove handling of non-file types in exfat_map_cluster
Yuezhang said: "exfat_map_cluster() is only used for files. The code
in this 'else' block is never executed and can be cleaned up."

Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:50 +09:00
Chi Zhiling
6d0b7f873b exfat: reuse cache to improve exfat_get_cluster
Since exfat_ent_get supports cache buffer head, we can use this option to
reduce sb_bread calls when fetching consecutive entries.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:50 +09:00
Chi Zhiling
afb6ffa33d exfat: reduce the number of parameters for exfat_get_cluster()
Remove parameter 'fclus' and 'allow_eof':

- The fclus parameter is changed to a local variable as it is not
  needed to be returned.

- The passed allow_eof parameter was always 1, remove it and the
  associated error handling.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:49 +09:00
Chi Zhiling
5dc72a5181 exfat: remove the unreachable warning for cache miss cases
The cache_id remains unchanged on a cache miss; its value is always
exactly what was set by cache_init. Therefore, checking this value
again is meaningless.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:49 +09:00
Chi Zhiling
2e21557d54 exfat: remove the check for infinite cluster chain loop
The infinite cluster chain loop check does not work because the
loop terminates when fclus reaches the parameter cluster,
and the parameter cluster value is never greater than
ei->valid_size.

The following relationship holds:
'fclus' < 'cluster' ≤ ei->valid_size ≤ sb->num_clusters

The check would only be triggered if a cluster number greater than
sb->num_clusters is passed, but no caller currently does this.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:49 +09:00
Chi Zhiling
5e205c484b exfat: improve exfat_find_last_cluster
Since exfat_ent_get supports a cached buffer head, let's apply it to
exfat_find_last_cluster.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:48 +09:00
Chi Zhiling
06805f4c57 exfat: improve exfat_count_num_clusters
Since exfat_ent_get supports a cached buffer head, let's apply it to
exfat_count_num_clusters.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:48 +09:00
Chi Zhiling
5cf0288f0d exfat: support reuse buffer head for exfat_ent_get
This patch is part 2 of the cached buffer head support for exfat_ent_get.
It introduces an argument for exfat_ent_get and makes sure this
routine releases the buffer head refcount on any error return.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:48 +09:00
Chi Zhiling
967288e9a6 exfat: add cache option for __exfat_ent_get
When multiple entries are obtained consecutively, these entries are mostly
stored adjacent to each other. This patch introduces a "last" parameter to
cache the last opened buffer head and reuse it when possible, which
reduces the number of sb_bread() calls.

When the passed parameter "last" is NULL, the cache option is
disabled and the behavior is unchanged.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:47 +09:00
Yuling Dong
0914882bdd exfat: reduce unnecessary writes during mmap write
During mmap write, exfat_page_mkwrite() currently extends
valid_size to the end of the VMA range. For a large mapping,
this can push valid_size far beyond the page that actually
triggered the fault, resulting in unnecessary writes.

valid_size only needs to extend to the end of the page
being written.

Signed-off-by: Yuling Dong <yuling-dong@qq.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:47 +09:00
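The difference made by 0914882bdd is a matter of which offset valid_size is extended to. The Python sketch below contrasts the two choices for a single page fault. It is a simplified model: the function names are hypothetical, a 4096-byte page is assumed, and any clamping to the file size that the driver performs is ignored.

```python
PAGE_SIZE = 4096  # assumed page size


def valid_size_after_fault_old(valid_size, vma_end):
    """Previous behaviour: extend to the end of the mapped range."""
    return max(valid_size, vma_end)


def valid_size_after_fault_new(valid_size, fault_offset):
    """New behaviour: extend only to the end of the page being written."""
    page_end = (fault_offset // PAGE_SIZE + 1) * PAGE_SIZE
    return max(valid_size, page_end)


mapping_end = 1 << 30      # a 1 GiB mapping starting at file offset 0
fault_at = 5000            # first write lands in the second page

old = valid_size_after_fault_old(0, mapping_end)
new = valid_size_after_fault_new(0, fault_at)
```

In this example the old rule moves valid_size to 1 GiB after one small write, while the new rule moves it to 8192 bytes, so far less of the file has to be written out as valid data.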
Haotian Zhang
8ffe56b104 exfat: improve error code handling in exfat_find_empty_entry()
Change the type of 'ret' from unsigned int to int in
exfat_find_empty_entry(). Although the implicit type conversion
(int -> unsigned int -> int) does not cause actual bugs in
practice, using int directly is more appropriate for storing
error codes returned by exfat_alloc_cluster().

This improves code clarity and consistency with standard error
handling practices.

Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12 21:21:47 +09:00
Jeff Layton
b8ca026675 exfat: add setlease file operation
Add the setlease file_operation to exfat_file_operations and
exfat_dir_operations, pointing to generic_setlease.  A future patch
will change the default behavior to reject lease attempts with -EINVAL
when there is no setlease file operation defined. Add generic_setlease
to retain the ability to set leases on this filesystem.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-7-ea4dec9b67fa@kernel.org
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:55:46 +01:00
Yuezhang Mo
51fc7b4ce1 exfat: fix remount failure in different process environments
The kernel test robot reported that the exFAT remount operation
failed. The failure occurred because the process's umask differed
between mount and remount, causing fs_fmask and fs_dmask to
change.

Potentially, both gid and uid may also be changed. Therefore, when
initializing fs_context for remount, inherit these mount options
from the options used during mount.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:17 +09:00
Namjae Jeon
d70a5804c5 exfat: fix divide-by-zero in exfat_allocate_bitmap
The variable max_ra_count can be 0 in exfat_allocate_bitmap(),
which causes a divide-by-zero error in the subsequent modulo operation
(i % max_ra_count), leading to a system crash.
When max_ra_count is 0, it means that readahead is not used. This patch
loads the bitmap without readahead.

Fixes: 9fd688678d ("exfat: optimize allocation bitmap loading time")
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
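The guard described in d70a5804c5 amounts to checking the divisor before the modulo. A minimal Python sketch of that pattern follows; the `readahead_points` function is hypothetical and only models where readahead batches would start, not how the bitmap is actually loaded.

```python
def readahead_points(num_blocks, max_ra_count):
    """Return the block indices at which a readahead batch would be issued."""
    if max_ra_count == 0:
        # Readahead is not in use: skip the modulo entirely and let the
        # caller read the blocks one by one.
        return []
    return [i for i in range(num_blocks) if i % max_ra_count == 0]
```

With `max_ra_count` equal to 4 and ten blocks, batches start at blocks 0, 4 and 8; with `max_ra_count` equal to 0 no batch is scheduled and no division takes place.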
Namjae Jeon
866cba3675 exfat: validate the cluster bitmap bits of directory
Syzbot triggered this issue by testing an image whose root cluster
bitmap bit was not set. After a file is accessed through the root
directory via exfat_lookup, a subsequent mkdir can allocate the root
cluster for the new directory, which can cause the root cluster to be
zeroed out and the same entry to be allocated in the same cluster.
This patch addresses the issue by using exfat_test_bitmap to validate
the cluster bits of the root directory and of other directories. The
first cluster bit of the root directory should never be unset except
when storage is corrupted. This bit is set to allow operations after
mount.

Reported-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com
Tested-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Yuezhang Mo
4e163c39dd exfat: zero out post-EOF page cache on file extension
xfstests generic/363 was failing due to unzeroed post-EOF page
cache that allowed mmap writes beyond EOF to become visible
after file extension.

For example, in the following xfs_io sequence, 0x22 should not be
written to the file but would become visible after the extension:

  xfs_io -f -t -c "pwrite -S 0x11 0 8" \
    -c "mmap 0 4096" \
    -c "mwrite -S 0x22 32 32" \
    -c "munmap" \
    -c "pwrite -S 0x33 512 32" \
    $testfile

This violates the expected behavior where writes beyond EOF via
mmap should not persist after the file is extended. Instead, the
extended region should contain zeros.

Fix this by using truncate_pagecache() to truncate the page cache
after the current EOF when extending the file.
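
The effect can be modelled in userspace with a single cached page. This
is only an illustrative sketch: struct fake_file, fake_mwrite() and
fake_pwrite() are hypothetical names, and the model zeroes the stale
range in place, whereas the actual fix drops the cached pages with
truncate_pagecache().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096

struct fake_file {
	unsigned char page[FAKE_PAGE_SIZE];	/* single cached page */
	size_t size;				/* current EOF */
};

/* mmap-style store: lands in the cached page even when it is past EOF. */
void fake_mwrite(struct fake_file *f, size_t off, unsigned char val, size_t len)
{
	assert(off + len <= FAKE_PAGE_SIZE);
	memset(f->page + off, val, len);
}

/* Extending write: discard stale post-EOF cache contents, then copy data. */
void fake_pwrite(struct fake_file *f, size_t off, unsigned char val, size_t len)
{
	assert(off + len <= FAKE_PAGE_SIZE);
	if (off + len > f->size) {
		/* Everything after the old EOF must read back as zeros. */
		memset(f->page + f->size, 0, FAKE_PAGE_SIZE - f->size);
		f->size = off + len;
	}
	memset(f->page + off, val, len);
}
```

Replaying the xfs_io sequence against this model (pwrite 0x11 at 0,
mwrite 0x22 at 32, pwrite 0x33 at 512) leaves bytes 32..63 zeroed.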

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Shuhao Fu
9aee8de970 exfat: fix refcount leak in exfat_find
Fix refcount leaks in `exfat_find` related to `exfat_get_dentry_set`.

`exfat_get_dentry_set` increases the reference count of `es->bh` on
success. Therefore, `exfat_put_dentry_set` must be called after a
successful `exfat_get_dentry_set` to keep the refcount consistent. This
patch relocates two checks to avoid possible leaks.
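
The get/put pairing can be sketched in userspace as follows. All names
here (`fake_bh`, `fake_dentry_set`, `fake_find` and the two helpers) are
hypothetical stand-ins, and the placement of the check is one possible
arrangement, not necessarily the one the patch chose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a buffer head and a dentry set. */
struct fake_bh { int refcount; };
struct fake_dentry_set { struct fake_bh *bh; };

/* Takes a reference on success, like the get described above. */
int fake_get_dentry_set(struct fake_dentry_set *es, struct fake_bh *bh)
{
	if (!bh)
		return -1;
	bh->refcount++;
	es->bh = bh;
	return 0;
}

/* Drops the reference taken by a successful get. */
void fake_put_dentry_set(struct fake_dentry_set *es)
{
	assert(es->bh && es->bh->refcount > 0);
	es->bh->refcount--;
	es->bh = NULL;
}

/*
 * Validation runs after a successful get, so the error path must still
 * reach the put before returning.
 */
int fake_find(struct fake_bh *bh, long valid_size, long size)
{
	struct fake_dentry_set es = { NULL };
	int err = 0;

	if (fake_get_dentry_set(&es, bh))
		return -1;	/* nothing was taken, nothing to release */

	if (valid_size > size)
		err = -2;	/* invalid entry: falls through to the put */

	fake_put_dentry_set(&es);
	return err;
}
```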

Fixes: 82ebecdc74 ("exfat: fix improper check of dentry.stream.valid_size")
Fixes: 13940cef95 ("exfat: add a check for invalid data size")
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03 10:00:16 +09:00
Linus Torvalds
e7c375b181 vfs-6.18-rc7.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaRtBJwAKCRCRxhvAZXjc
 ou5CAQCJb5y2ULKklblICU1wR7Nr15WvTW7VVOcv44RJ22S3NgEAy4DLDBFBw8zC
 8e7Hp8gxbjsq8ZJmU088aobFcqbZOwk=
 =TAnu
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix uninitialized variable in statmount_string()

 - Fix hostfs mounting when passing host root during boot

 - Fix dynamic lookup to fail on cell lookup failure

 - Fix missing file type when reading bfs inodes from disk

 - Enforce checking of sb_min_blocksize() calls and update all callers
   accordingly

 - Restore write access before closing files opened by open_exec() in
   binfmt_misc

 - Always freeze efivarfs during suspend/hibernate cycles

 - Fix statmount()'s and listmount()'s grab_requested_mnt_ns() helper to
   actually allow mount namespace file descriptor in addition to mount
   namespace ids

 - Fix tmpfs remount when noswap is specified

 - Switch Landlock to iput_not_last() to remove false-positives from
   might_sleep() annotations in iput()

 - Remove dead node_to_mnt_ns() code

 - Ensure that per-queue kobjects are successfully created

* tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  landlock: fix splats from iput() after it started calling might_sleep()
  fs: add iput_not_last()
  shmem: fix tmpfs reconfiguration (remount) when noswap is set
  fs/namespace: correctly handle errors returned by grab_requested_mnt_ns
  power: always freeze efivarfs
  binfmt_misc: restore write access before closing files opened by open_exec()
  block: add __must_check attribute to sb_min_blocksize()
  virtio-fs: fix incorrect check for fsvq->kobj
  xfs: check the return value of sb_min_blocksize() in xfs_fs_fill_super
  isofs: check the return value of sb_min_blocksize() in isofs_fill_super
  exfat: check return value of sb_min_blocksize in exfat_read_boot_sector
  vfat: fix missing sb_min_blocksize() return value checks
  mnt: Remove dead code which might prevent from building
  bfs: Reconstruct file type when loading from disk
  afs: Fix dynamic lookup to fail on cell lookup failure
  hostfs: Fix only passing host root in boot stage with new mount
  fs: Fix uninitialized 'offp' in statmount_string()
2025-11-17 09:11:27 -08:00
Yongpeng Yang
f2c1f63163 exfat: check return value of sb_min_blocksize in exfat_read_boot_sector
sb_min_blocksize() may return 0. Check its return value to avoid
accessing the filesystem super block when sb->s_blocksize is 0.
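
A rough userspace sketch of the pattern, assuming GCC or Clang for the
warn_unused_result attribute. struct fake_sb, fake_min_blocksize() and
fake_read_boot_sector() are hypothetical stand-ins, and the validity
rule inside fake_min_blocksize() is an invented simplification, not the
kernel's logic.

```c
#include <assert.h>

struct fake_sb { unsigned int blocksize; };

/*
 * Hypothetical stand-in for a "set minimum block size" helper.
 * Returns the block size chosen, or 0 if it cannot be applied.
 * The attribute makes the compiler warn when the result is ignored.
 */
__attribute__((warn_unused_result))
unsigned int fake_min_blocksize(struct fake_sb *sb, unsigned int size,
				unsigned int dev_block)
{
	unsigned int want = size > dev_block ? size : dev_block;

	/* Assumed rule for this sketch: nonzero power of two, at most 64 KiB. */
	if (want == 0 || (want & (want - 1)) || want > 65536)
		return 0;
	sb->blocksize = want;
	return want;
}

int fake_read_boot_sector(struct fake_sb *sb, unsigned int dev_block)
{
	if (!fake_min_blocksize(sb, 512, dev_block))
		return -1;	/* bail out before using sb->blocksize */

	assert(sb->blocksize != 0);
	return 0;
}
```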

Cc: stable@vger.kernel.org # v6.15
Fixes: 719c1e1829 ("exfat: add super block operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Link: https://patch.msgid.link/20251104125009.2111925-3-yangyongpeng.storage@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-05 14:00:16 +01:00
Jeongjun Park
2d8636119b exfat: fix out-of-bounds in exfat_nls_to_ucs2()
Since the len argument passed from exfat_ioctl_set_volume_label()
to exfat_nls_to_utf16() is 1 too large, an out-of-bounds read
occurs when dereferencing p_cstring in exfat_nls_to_ucs2() later.

And because of the NLS_NAME_OVERLEN macro, another error occurs when
creating a file with a period at the end using utf8 and other iocharsets.

To avoid this, remove the code that uses the NLS_NAME_OVERLEN macro
and set the len argument to the length of the label string, capped
at FSLABEL_MAX - 1.
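
The length rule can be sketched in userspace like this. FAKE_LABEL_MAX
is an assumed stand-in for FSLABEL_MAX and bounded_label_len() is a
hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_LABEL_MAX 256	/* assumed stand-in for FSLABEL_MAX */

/*
 * Length of the label string, capped at FAKE_LABEL_MAX - 1 so the
 * converter is never told to read past the last usable byte.
 */
size_t bounded_label_len(const char *label)
{
	size_t len = 0;

	assert(label != NULL);
	while (len < FAKE_LABEL_MAX - 1 && label[len] != '\0')
		len++;
	return len;
}
```

Passing this value rather than the buffer size keeps the later
per-character reads inside the label, even when the buffer is full and
carries no terminator.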

Reported-by: syzbot+98cc76a76de46b3714d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=98cc76a76de46b3714d4
Fixes: d01579d590 ("exfat: Add support for FS_IOC_{GET,SET}FSLABEL")
Suggested-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-10-15 17:53:20 +09:00