Commit Graph

20 Commits

Author SHA1 Message Date
Ferry Meng
c7c707cbaa erofs: avoid some unnecessary #ifdefs
They can either be removed or replaced with IS_ENABLED().

Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-03 11:25:55 +08:00
Gao Xiang
7cef3c8341 erofs: separate plain and compressed filesystems formally
The EROFS on-disk format uses a tiny, plain metadata design that
prioritizes performance and minimizes complex inconsistencies against
common writable disk filesystems (almost all serious metadata
inconsistency cannot happen in well-designed immutable filesystems like
EROFS). EROFS deliberately avoids artificial design flaws to eliminate
serious security risks from untrusted remote sources by design,
although human-made implementation bugs can still happen sometimes.

Currently, there is no strict check to prevent compressed inodes,
especially LZ4-compressed inodes, from being read in plain filesystems.

Starting with erofs-utils 1.0 and Linux 5.3, LZ4_0PADDING sb feature
is automatically enabled for LZ4-compressed EROFS images to support
in-place decompression. Furthermore, since Linux 5.4 LTS is no longer
supported, we no longer need to handle ancient LZ4-compressed EROFS
images generated by erofs-utils prior to 1.0.

To formally distinguish different filesystem types for improved
security:

 - Use the presence of LZ4_0PADDING or a non-zero
   `dsb->u1.lz4_max_distance` as a marker for compressed filesystems
   containing LZ4-compressed inodes only;

 - For other algorithms, use `dsb->u1.available_compr_algs` bitmap.

Note: LZ4_0PADDING has been supported since Linux 5.4 (the first formal
kernel version), so exposing it via sysfs is no longer necessary and is
now deprecated (but remain it for five more years until 2031):

  `dsb->u1` has been strictly non-zero for all EROFS images containing
  compressed inodes starting with erofs-utils v1.3 and it is actually
  a much better marker for compressed filesystems.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-03 11:05:57 +08:00
Gao Xiang
cc831ab336 erofs: tidy up synchronous decompression
- Get rid of `sbi->opt.max_sync_decompress_pages` since it's fixed as
   3 all the time;

 - Add Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES in bytes instead of in pages,
   since for non-4K pages, 3-page limitation makes no sense;

 - Move `sync_decompress` to sbi to avoid unexpected remount impact;

 - Fold z_erofs_is_sync_decompress() into its caller;

 - Better description of sysfs entry `sync_decompress`.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-01-23 00:05:11 +08:00
Chao Yu
df0ce6cefa erofs: support to readahead dirent blocks in erofs_readdir()
This patch supports to readahead more blocks in erofs_readdir(), it can
enhance readdir performance in large direcotry.

readdir test in a large directory which contains 12000 sub-files.

		files_per_second
Before:		926385.54
After:		2380435.562

Meanwhile, let's introduces a new sysfs entry to control readahead
bytes to provide more flexible policy for readahead of readdir().
- location: /sys/fs/erofs/<disk>/dir_ra_bytes
- default value: 16384
- disable readahead: set the value to 0

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250721021352.2495371-1-chao@kernel.org
[ Gao Xiang: minor styling adjustment. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-07-24 19:44:08 +08:00
Bo Liu (OpenAnolis)
414091322c erofs: implement metadata compression
Thanks to the meta buffer infrastructure, metadata-compressed inodes are
just read from the metabox inode instead of the blockdevice (or backing
file) inode.

The same is true for shared extended attributes.

When metadata compression is enabled, inode numbers are divided from
on-disk NIDs because of non-LTS 32-bit application compatibility.

Co-developed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Bo Liu (OpenAnolis) <liubo03@inspur.com>
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250722003229.2121752-1-hsiangkao@linux.alibaba.com
2025-07-24 19:43:31 +08:00
Bo Liu
b4a29efc51 erofs: support DEFLATE decompression by using Intel QAT
This patch introduces the use of the Intel QAT to offload EROFS data
decompression, aiming to improve the decompression performance.

A 285MiB dataset is used with the following command to create EROFS
images with different cluster sizes:
     $ mkfs.erofs -zdeflate,level=9 -C{4096,16384,65536,131072,262144}

Fio is used to test the following read patterns:
     $ fio -filename=testfile -bs=4k -rw=read -name=job1
     $ fio -filename=testfile -bs=4k -rw=randread -name=job1
     $ fio -filename=testfile -bs=4k -rw=randread --io_size=14m -name=job1

Here are some performance numbers for reference:

Processors: Intel(R) Xeon(R) 6766E (144 cores)
Memory:     512 GiB

|-----------------------------------------------------------------------------|
|           | Cluster size | sequential read | randread  | small randread(5%) |
|-----------|--------------|-----------------|-----------|--------------------|
| Intel QAT |    4096      |    538  MiB/s   | 112 MiB/s |     20.76 MiB/s    |
| Intel QAT |    16384     |    699  MiB/s   | 158 MiB/s |     21.02 MiB/s    |
| Intel QAT |    65536     |    917  MiB/s   | 278 MiB/s |     20.90 MiB/s    |
| Intel QAT |    131072    |    1056 MiB/s   | 351 MiB/s |     23.36 MiB/s    |
| Intel QAT |    262144    |    1145 MiB/s   | 431 MiB/s |     26.66 MiB/s    |
| deflate   |    4096      |    499  MiB/s   | 108 MiB/s |     21.50 MiB/s    |
| deflate   |    16384     |    422  MiB/s   | 125 MiB/s |     18.94 MiB/s    |
| deflate   |    65536     |    452  MiB/s   | 159 MiB/s |     13.02 MiB/s    |
| deflate   |    131072    |    452  MiB/s   | 177 MiB/s |     11.44 MiB/s    |
| deflate   |    262144    |    466  MiB/s   | 194 MiB/s |     10.60 MiB/s    |

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250522094931.28956-1-liubo03@inspur.com
[ Gao Xiang: refine the commit message. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-25 15:27:40 +08:00
Gao Xiang
17a2a72df3 erofs: clean up erofs_{init,exit}_sysfs()
Get rid of useless `goto`s.  No logic changes.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250522084953.412096-1-hsiangkao@linux.alibaba.com
2025-05-23 07:51:14 +08:00
Gao Xiang
2e1473d519 erofs: implement 48-bit block addressing for unencoded inodes
It adapts the on-disk changes from the previous commit.  It also
supports EROFS_NULL_ADDR (all 1's) for EROFS_INODE_FLAT_PLAIN inodes
to indicate 0-filled inodes, as it's common for composefs use cases.
As a result, EROFS_INODE_CHUNK_BASED is no longer needed.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250310095459.2620647-5-hsiangkao@linux.alibaba.com
2025-03-17 01:25:26 +08:00
Chunhai Guo
db80b98305 erofs: add sysfs node to drop internal caches
Add a sysfs node to drop compression-related caches, currently used to
drop in-memory pclusters and cached compressed folios.

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241113041148.749129-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-11-18 18:50:13 +08:00
Gao Xiang
59aadaa7eb erofs: clean up erofs_register_sysfs()
After commit 684b290abc ("erofs: add support for
FS_IOC_GETFSSYSFSPATH"), `sb->s_sysfs_name` is now valid.

Just use it to get rid of duplicated logic.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240828095232.571946-1-hsiangkao@linux.alibaba.com
2024-09-10 00:46:34 +08:00
Thomas Weißschuh
339bc4d3cd erofs: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230209-kobj_type-erofs-v1-1-078c945e2c4b@weissschuh.net
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-02-15 08:11:26 +08:00
Jingbo Xu
39bfcb8138 erofs: fix use-after-free of fsid and domain_id string
When erofs instance is remounted with fsid or domain_id mount option
specified, the original fsid and domain_id string pointer in sbi->opt
is directly overridden with the fsid and domain_id string in the new
fs_context, without freeing the original fsid and domain_id string.
What's worse, when the new fsid and domain_id string is transferred to
sbi, they are not reset to NULL in fs_context, and thus they are freed
when remount finishes, while sbi is still referring to these strings.

Reconfiguration for fsid and domain_id seems unusual. Thus clarify this
restriction explicitly and dump a warning when users are attempting to
do this.

Besides, to fix the use-after-free issue, move fsid and domain_id from
erofs_mount_opts to outside.

Fixes: c6be2bd0a5 ("erofs: register fscache volume")
Fixes: 8b7adf1dff ("erofs: introduce fscache-based domain")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221021023153.1330-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-10 09:53:20 +08:00
Gao Xiang
5c2a64252c erofs: introduce partial-referenced pclusters
Due to deduplication for compressed data, pclusters can be partially
referenced with their prefixes.

Together with the user-space implementation, it enables EROFS
variable-length global compressed data deduplication with rolling
hash.

Link: https://lore.kernel.org/r/20220923014915.4362-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-26 23:55:43 +08:00
Yue Hu
b15b2e307c erofs: support on-disk compressed fragments data
Introduce on-disk compressed fragments data feature.

This approach adds a new field called `h_fragmentoff' in the per-file
compression header to indicate the fragment offset of each tail pcluster
or the whole file in the special packed inode.

Similar to ztailpacking, it will also find and record the 'headlcn'
of the tail pcluster when initializing per-inode zmap for making
follow-on requests more easy.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/YzHKxcFTlHGgXeH9@B-P7TQMD6M-0146.local
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-26 23:55:39 +08:00
Jia Zhu
2ef1644141 erofs: introduce 'domain_id' mount option
Introduce 'domain_id' mount option to enable shared domain sementics.
In which case, the related cookie is shared if two mountpoints in the
same domain have the same data blob. Users could specify the name of
domain by this mount option.

Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220918043456.147-7-zhujia.zj@bytedance.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-20 08:01:54 +08:00
Jeffle Xu
9c0cc9c729 erofs: add 'fsid' mount option
Introduce 'fsid' mount option to enable on-demand read sementics, in
which case, erofs will be mounted from data blobs. Users could specify
the name of primary data blob by this mount option.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220425122143.56815-22-jefflexu@linux.alibaba.com
Acked-by: Chao Yu <chao@kernel.org>
Tested-by: Zichen Tian <tianzichen@kuaishou.com>
Tested-by: Jia Zhu <zhujia.zj@bytedance.com>
Tested-by: Yan Song <yansong.ys@antgroup.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-18 00:11:21 +08:00
Dongliang Mu
a942da24ab fs: erofs: add sanity check for kobject in erofs_unregister_sysfs
Syzkaller hit 'WARNING: kobject bug in erofs_unregister_sysfs'. This bug
is triggered by injecting fault in kobject_init_and_add of
erofs_unregister_sysfs.

Fix this by adding sanity check for kobject in erofs_unregister_sysfs

Note that I've tested the patch and the crash does not occur any more.

Link: https://lore.kernel.org/r/20220315132814.12332-1-dzm91@hust.edu.cn
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Fixes: 168e9a7620 ("erofs: add sysfs interface")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-03-17 00:09:02 +08:00
Yue Hu
ab92184ff8 erofs: add on-disk compressed tail-packing inline support
Introduces erofs compressed tail-packing inline support.

This approach adds a new field called `h_idata_size' in the
per-file compression header to indicate the encoded size of
each tail-packing pcluster.

At runtime, it will find the start logical offset of the tail
pcluster when initializing per-inode zmap and record such
extent (headlcn, idataoff) information to the in-memory inode.
Therefore, follow-on requests can directly recognize if one
pcluster is a tail-packing inline pcluster or not.

Link: https://lore.kernel.org/r/20211228054604.114518-6-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31 00:51:10 +08:00
Huang Jianan
40452ffca3 erofs: add sysfs node to control sync decompression strategy
Although readpage is a synchronous path, there will be no additional
kworker scheduling overhead in non-atomic contexts together with
dm-verity.

Let's add a sysfs node to disable sync decompression as an option.

Link: https://lore.kernel.org/r/20211206143552.8384-1-huangjianan@oppo.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08 09:42:18 +08:00
Huang Jianan
168e9a7620 erofs: add sysfs interface
Add sysfs interface to configure erofs related parameters later.

Link: https://lore.kernel.org/r/20211201145436.4357-1-huangjianan@oppo.com
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08 09:40:37 +08:00