- hfsplus: fix generic/642 failure
- hfsplus: rework logic of map nodes creation in xattr b-tree
- hfsplus: fix logic of alloc/free b-tree node
- hfsplus: fix error processing issue in hfs_bmap_free()
- hfsplus: fix potential race conditions in b-tree functionality
- hfsplus: extract hidden directory search into a helper function
- hfsplus: fix held lock freed on hfsplus_fill_super()
- hfsplus: fix generic/523 test-case failure
- hfsplus: validate b-tree node 0 bitmap at mount time
- hfsplus: refactor b-tree map page access and add node-type validation
- hfsplus: fix to update ctime after rename
- hfsplus: fix generic/533 test-case failure
- hfsplus: set ctime after setxattr and removexattr
- hfsplus: fix uninit-value by validating catalog record size
- hfsplus: fix potential Allocation File corruption after fsync
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCadlj7wAKCRAhI4xTh21N
nRzsAQDhjGNvUe9QJUDk7D/GjMxWUs+8jrGw9Ruu8GhvRIePzwEAtkZoJRwNQzwa
9XEyenrKqxLTaqaPtRChicbmdKw7eQQ=
=TUrR
-----END PGP SIGNATURE-----
Merge tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfsplus updates from Viacheslav Dubeyko:
"This contains several fixes of syzbot reported issues and HFS+ fixes
of xfstests failures.
- Fix a syzbot reported issue of a KMSAN uninit-value in
hfsplus_strcasecmp().
The root cause was that hfs_brec_read() doesn't validate that the
on-disk record size matches the expected size for the record type
being read. The fix introduced hfsplus_brec_read_cat() wrapper that
validates the record size based on the type field and returns -EIO
if size doesn't match (Deepanshu Kartikey)
- Fix a syzbot reported issue of processing corrupted HFS+ images
where the b-tree allocation bitmap indicates that the header node
(Node 0) is free. Node 0 must always be allocated. Violating this
invariant leads to allocator corruption, which cascades into kernel
panics or undefined behavior.
Prevent trusting a corrupted allocator state by adding a validation
check during hfs_btree_open(). If corruption is detected, print a
warning identifying the specific corrupted tree and force the
filesystem to mount read-only (SB_RDONLY).
This prevents kernel panics from corrupted images while enabling
data recovery (Shardul Bankar)
- Fix a potential deadlock in hfsplus_fill_super().
hfsplus_fill_super() calls hfs_find_init() to initialize a search
structure, which acquires tree->tree_lock. If the subsequent call
to hfsplus_cat_build_key() fails, the function jumps to the
out_put_root error label without releasing the lock.
Fix this by adding the missing hfs_find_exit(&fd) call before
jumping to the out_put_root error label. This ensures that
tree->tree_lock is properly released on the error path (Zilin Guan)
- Update a files ctime after rename in hfsplus_rename() (Yangtao Li)
The rest of the patches introduce the HFS+ fixes for the case of
generic/348, generic/728, generic/533, generic/523, and generic/642
test-cases of xfstests suite"
* tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfsplus: fix generic/642 failure
hfsplus: rework logic of map nodes creation in xattr b-tree
hfsplus: fix logic of alloc/free b-tree node
hfsplus: fix error processing issue in hfs_bmap_free()
hfsplus: fix potential race conditions in b-tree functionality
hfsplus: extract hidden directory search into a helper function
hfsplus: fix held lock freed on hfsplus_fill_super()
hfsplus: fix generic/523 test-case failure
hfsplus: validate b-tree node 0 bitmap at mount time
hfsplus: refactor b-tree map page access and add node-type validation
hfsplus: fix to update ctime after rename
hfsplus: fix generic/533 test-case failure
hfsplus: set ctime after setxattr and removexattr
hfsplus: fix uninit-value by validating catalog record size
hfsplus: fix potential Allocation File corruption after fsync
The HFS_BNODE_DELETED flag is checked in hfs_bnode_put()
under locked tree->hash_lock. This patch adds locking
for the case of setting the HFS_BNODE_DELETED flag in
hfs_bnode_unlink() with the goal to avoid potential
race conditions.
The hfs_btree_write() method should be called under
tree->tree_lock. This patch reworks logic by adding
locking the tree->tree_lock for the calls of
hfs_btree_write() in hfsplus_cat_write_inode() and
hfsplus_system_write_inode().
This patch adds also the lockdep_assert_held() in
hfs_bmap_reserve(), hfs_bmap_alloc(), and hfs_bmap_free().
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260403230556.614171-2-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
In hfsplus_fill_super(), the process of looking up the hidden directory
involves initializing a catalog search, building a search key, reading
the b-tree record, and releasing the search data.
Currently, this logic is open-coded directly within the main superblock
initialization routine. This makes hfsplus_fill_super() quite lengthy
and its error handling paths less straightforward.
Extract the hidden directory search sequence into a new helper function,
hfsplus_get_hidden_dir_entry(). This improves overall code readability,
cleanly encapsulates the hfs_find_data lifecycle, and simplifies the
error exits in hfsplus_fill_super().
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
hfsplus_fill_super() calls hfs_find_init() to initialize a search
structure, which acquires tree->tree_lock. If the subsequent call to
hfsplus_cat_build_key() fails, the function jumps to the out_put_root
error label without releasing the lock. The later cleanup path then
frees the tree data structure with the lock still held, triggering a
held lock freed warning.
Fix this by adding the missing hfs_find_exit(&fd) call before jumping
to the out_put_root error label. This ensures that tree->tree_lock is
properly released on the error path.
The bug was originally detected on v6.13-rc1 using an experimental
static analysis tool we are developing, and we have verified that the
issue persists in the latest mainline kernel. The tool is specifically
designed to detect memory management issues. It is currently under active
development and not yet publicly available.
We confirmed the bug by runtime testing under QEMU with x86_64 defconfig,
lockdep enabled, and CONFIG_HFSPLUS_FS=y. To trigger the error path, we
used GDB to dynamically shrink the max_unistr_len parameter to 1 before
hfsplus_asc2uni() is called. This forces hfsplus_asc2uni() to naturally
return -ENAMETOOLONG, which propagates to hfsplus_cat_build_key() and
exercises the faulty error path. The following warning was observed
during mount:
=========================
WARNING: held lock freed!
7.0.0-rc3-00016-gb4f0dd314b39 #4 Not tainted
-------------------------
mount/174 is freeing memory ffff888103f92000-ffff888103f92fff, with a lock still held there!
ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
2 locks held by mount/174:
#0: ffff888103f960e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.constprop.0+0x167/0xa40
#1: ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
stack backtrace:
CPU: 2 UID: 0 PID: 174 Comm: mount Not tainted 7.0.0-rc3-00016-gb4f0dd314b39 #4 PREEMPT(lazy)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x82/0xd0
debug_check_no_locks_freed+0x13a/0x180
kfree+0x16b/0x510
? hfsplus_fill_super+0xcb4/0x18a0
hfsplus_fill_super+0xcb4/0x18a0
? __pfx_hfsplus_fill_super+0x10/0x10
? srso_return_thunk+0x5/0x5f
? bdev_open+0x65f/0xc30
? srso_return_thunk+0x5/0x5f
? pointer+0x4ce/0xbf0
? trace_contention_end+0x11c/0x150
? __pfx_pointer+0x10/0x10
? srso_return_thunk+0x5/0x5f
? bdev_open+0x79b/0xc30
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? vsnprintf+0x6da/0x1270
? srso_return_thunk+0x5/0x5f
? __mutex_unlock_slowpath+0x157/0x740
? __pfx_vsnprintf+0x10/0x10
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? mark_held_locks+0x49/0x80
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? irqentry_exit+0x17b/0x5e0
? trace_irq_disable.constprop.0+0x116/0x150
? __pfx_hfsplus_fill_super+0x10/0x10
? __pfx_hfsplus_fill_super+0x10/0x10
get_tree_bdev_flags+0x302/0x580
? __pfx_get_tree_bdev_flags+0x10/0x10
? vfs_parse_fs_qstr+0x129/0x1a0
? __pfx_vfs_parse_fs_qstr+0x3/0x10
vfs_get_tree+0x89/0x320
fc_mount+0x10/0x1d0
path_mount+0x5c5/0x21c0
? __pfx_path_mount+0x10/0x10
? trace_irq_enable.constprop.0+0x116/0x150
? trace_irq_enable.constprop.0+0x116/0x150
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? kmem_cache_free+0x307/0x540
? user_path_at+0x51/0x60
? __x64_sys_mount+0x212/0x280
? srso_return_thunk+0x5/0x5f
__x64_sys_mount+0x212/0x280
? __pfx___x64_sys_mount+0x10/0x10
? srso_return_thunk+0x5/0x5f
? trace_irq_enable.constprop.0+0x116/0x150
? srso_return_thunk+0x5/0x5f
do_syscall_64+0x111/0x680
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ffacad55eae
Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 8
RSP: 002b:00007fff1ab55718 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffacad55eae
RDX: 000055740c64e5b0 RSI: 000055740c64e630 RDI: 000055740c651ab0
RBP: 000055740c64e380 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055740c64e5b0 R14: 000055740c651ab0 R15: 000055740c64e380
</TASK>
After applying this patch, the warning no longer appears.
Fixes: 89ac9b4d3d ("hfsplus: fix longname handling")
CC: stable@vger.kernel.org
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Syzbot reported a KMSAN uninit-value issue in hfsplus_strcasecmp(). The
root cause is that hfs_brec_read() doesn't validate that the on-disk
record size matches the expected size for the record type being read.
When mounting a corrupted filesystem, hfs_brec_read() may read less data
than expected. For example, when reading a catalog thread record, the
debug output showed:
HFSPLUS_BREC_READ: rec_len=520, fd->entrylength=26
HFSPLUS_BREC_READ: WARNING - entrylength (26) < rec_len (520) - PARTIAL READ!
hfs_brec_read() only validates that entrylength is not greater than the
buffer size, but doesn't check if it's less than expected. It successfully
reads 26 bytes into a 520-byte structure and returns success, leaving 494
bytes uninitialized.
This uninitialized data in tmp.thread.nodeName then gets copied by
hfsplus_cat_build_key_uni() and used by hfsplus_strcasecmp(), triggering
the KMSAN warning when the uninitialized bytes are used as array indices
in case_fold().
Fix by introducing hfsplus_brec_read_cat() wrapper that:
1. Calls hfs_brec_read() to read the data
2. Validates the record size based on the type field:
- Fixed size for folder and file records
- Variable size for thread records (depends on string length)
3. Returns -EIO if size doesn't match expected
For thread records, check against HFSPLUS_MIN_THREAD_SZ before reading
nodeName.length to avoid reading uninitialized data at call sites that
don't zero-initialize the entry structure.
Also initialize the tmp variable in hfsplus_find_cat() as defensive
programming to ensure no uninitialized data even if validation is
bypassed.
Reported-by: syzbot+d80abb5b890d39261e72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d80abb5b890d39261e72
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: syzbot+d80abb5b890d39261e72@syzkaller.appspotmail.com
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Viacheslav Dubeyko <slava@dubeyko.com>
Suggested-by: Charalampos Mitrodimas <charmitro@posteo.net>
Link: https://lore.kernel.org/all/20260120051114.1281285-1-kartikey406@gmail.com/ [v1]
Link: https://lore.kernel.org/all/20260121063109.1830263-1-kartikey406@gmail.com/ [v2]
Link: https://lore.kernel.org/all/20260212014233.2422046-1-kartikey406@gmail.com/ [v3]
Link: https://lore.kernel.org/all/20260214002100.436125-1-kartikey406@gmail.com/T/ [v4]
Link: https://lore.kernel.org/all/20260221061626.15853-1-kartikey406@gmail.com/T/ [v5]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20260307010302.41547-1-kartikey406@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
On 32-bit architectures, unsigned long is only 32 bits wide, which
causes 64-bit inode numbers to be silently truncated. Several
filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that
exceed 32 bits, and this truncation can lead to inode number collisions
and other subtle bugs on 32-bit systems.
Change the type of inode->i_ino from unsigned long to u64 to ensure that
inode numbers are always represented as 64-bit values regardless of
architecture. Update all format specifiers treewide from %lu/%lx to
%llu/%llx to match the new type, along with corresponding local variable
types.
This is the bulk treewide conversion. Earlier patches in this series
handled trace events separately to allow trace field reordering for
better struct packing on 32-bit.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The generic/348 test-case has revealed the issue of
HFS+ volume corruption after simulated power failure:
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0-rc4+ #8 SMP PREEMPT_DYNAMIC Thu May 1 16:43:22 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/348 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see xfstests-dev/results//generic/348.full for details)
The fsck tool complains about Allocation File (block bitmap)
corruption as a result of such event. The generic/348 creates
a symlink, fsync its parent directory, power fail and mount
again the filesystem. Currently, HFS+ logic has several flags
HFSPLUS_I_CAT_DIRTY, HFSPLUS_I_EXT_DIRTY, HFSPLUS_I_ATTR_DIRTY,
HFSPLUS_I_ALLOC_DIRTY. If inode operation modified the Catalog
File, Extents Overflow File, Attributes File, or Allocation
File, then inode is marked as dirty and one of the mentioned
flags has been set. When hfsplus_file_fsync() has been called,
then this set of flags is checked and dirty b-tree or/and
block bitmap is flushed. However, block bitmap can be modified
during file's content allocation. It means that if we call
hfsplus_file_fsync() for directory, then we never flush
the modified Allocation File in such case because such inode
cannot receive HFSPLUS_I_ALLOC_DIRTY flag. Moreover, this
inode-centric model is not good at all because Catalog File,
Extents Overflow File, Attributes File, and Allocation File
represent the whole state of file system metadata. This
inode-centric policy is the main reason of the issue.
This patch saves the whole approach of using HFSPLUS_I_CAT_DIRTY,
HFSPLUS_I_EXT_DIRTY, HFSPLUS_I_ATTR_DIRTY, and
HFSPLUS_I_ALLOC_DIRTY flags. But Catalog File, Extents Overflow
File, Attributes File, and Allocation File have associated
inodes. And namely these inodes become the mechanism of
checking the dirty state of metadata. The hfsplus_file_fsync()
method checks the dirtiness of file system metadata by
testing HFSPLUS_I_CAT_DIRTY, HFSPLUS_I_EXT_DIRTY,
HFSPLUS_I_ATTR_DIRTY, and HFSPLUS_I_ALLOC_DIRTY flags of
Catalog File's, Extents Overflow File's, Attributes File's, or
Allocation File's inodes. As a result, even if we call
hfsplus_file_fsync() for parent folder, then dirty Allocation File
will be flushed anyway.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20260220220152.152721-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
The recent commit "hfsplus: ensure sb->s_fs_info is always cleaned up"
[1] introduced a custom ->kill_sb() handler (hfsplus_kill_super) that
cleans up the s_fs_info structure (including the NLS table) on
superblock destruction.
However, the error handling path in hfsplus_fill_super() still calls
unload_nls() before returning an error. Since the VFS layer calls
->kill_sb() when fill_super fails, this results in unload_nls() being
called twice for the same sbi->nls pointer: once in hfsplus_fill_super()
and again in hfsplus_kill_super() (via delayed_free).
Remove the explicit unload_nls() call from the error path in
hfsplus_fill_super() to rely solely on the cleanup in ->kill_sb().
[1] https://lore.kernel.org/r/20251201222843.82310-3-mehdi.benhadjkhelifa@gmail.com/
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20260203043806.GF3183987@ZenIV/
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Link: https://lore.kernel.org/r/20260204170440.1337261-1-shardul.b@mpiricsoftware.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
When hfsplus was converted to the new mount api a bug was introduced by
changing the allocation pattern of sb->s_fs_info. If setup_bdev_super()
fails after a new superblock has been allocated by sget_fc(), but before
hfsplus_fill_super() takes ownership of the filesystem-specific s_fs_info
data it was leaked.
Fix this by freeing sb->s_fs_info in hfsplus_kill_super().
Cc: stable@vger.kernel.org
Fixes: 432f7c78cb ("hfsplus: convert hfsplus to use the new mount api")
Reported-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20251201222843.82310-3-mehdi.benhadjkhelifa@gmail.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
- hfs/hfsplus: move on-disk layout declarations into hfs_common.h
- hfsplus: fix volume corruption issue for generic/101
- hfsplus: introduce KUnit tests for HFS+ string operations
- hfs: introduce KUnit tests for HFS string operations
- hfsplus: fix volume corruption issue for generic/073
- hfsplus: Verify inode mode when loading from disk
- hfsplus: fix volume corruption issue for generic/070
- hfs/hfsplus: prevent getting negative values of offset/length
- hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
- hfs: fix potential use after free in hfs_correct_next_unused_CNID()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaSnmHAAKCRAhI4xTh21N
nWt0AQDQ4hDGj4VkHNzWWGfh6GL+RhSwKgEzf897tJlUZDewogD/TE9bZnzOKjOw
YhWPXHEH4xy9+QaDXRgXk2DnWS+YKwg=
=mAL6
-----END PGP SIGNATURE-----
Merge tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Several fixes for syzbot reported issues, HFS/HFS+ fixes of xfstests
failures, Kunit-based unit-tests introduction, and code cleanup:
- Dan Carpenter fixed a potential use-after-free issue in
hfs_correct_next_unused_CNID() method. Tetsuo Handa has made nice
fix of syzbot reported issue related to incorrect inode->i_mode
management if volume has been corrupted somehow. Yang Chenzhi has
made really good fix of potential race condition in
__hfs_bnode_create() method for HFS+ file system.
- Several fixes to xfstests failures. Particularly, generic/070,
generic/073, and generic/101 test-cases finish successfully for the
case of HFS+ file system right now.
- HFS and HFS+ drivers share multiple structures of on-disk layout
declarations. Some structures are used without any change. However,
we had two independent declarations of the same structures in HFS
and HFS+ drivers.
The on-disk layout declarations have been moved into
include/linux/hfs_common.h with the goal to exclude the
declarations duplication and to keep the HFS/HFS+ on-disk layout
declarations in one place.
Also, this patch prepares the basis for creating a hfslib that can
aggregate common functionality without necessity to duplicate the
same code in HFS and HFS+ drivers.
- HFS/HFS+ really need unit-tests because of multiple xfstests
failures. The first two patches introduce Kunit-based unit-tests
for the case string operations in HFS/HFS+ file system drivers"
* tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs/hfsplus: move on-disk layout declarations into hfs_common.h
hfsplus: fix volume corruption issue for generic/101
hfsplus: introduce KUnit tests for HFS+ string operations
hfs: introduce KUnit tests for HFS string operations
hfsplus: fix volume corruption issue for generic/073
hfsplus: Verify inode mode when loading from disk
hfsplus: fix volume corruption issue for generic/070
hfs/hfsplus: prevent getting negative values of offset/length
hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
hfs: fix potential use after free in hfs_correct_next_unused_CNID()
The xfstests' test-case generic/101 leaves HFS+ volume
in corrupted state:
sudo ./check generic/101
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/101 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/101.full for details)
Ran: generic/101
Failures: generic/101
Failed 1 of 1 tests
sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
Invalid volume free block count
(It should be 2614350 instead of 2614382)
Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.
This test executes such steps: "Test that if we truncate a file
to a smaller size, then truncate it to its original size or
a larger size, then fsyncing it and a power failure happens,
the file will have the range [first_truncate_size, last_size[ with
all bytes having a value of 0x00 if we read it the next time
the filesystem is mounted.".
HFS+ keeps volume's free block count in the superblock.
However, hfsplus_file_fsync() doesn't store superblock's
content. As a result, superblock contains not correct
value of free blocks if a power failure happens.
This patch adds functionality of saving superblock's
content during hfsplus_file_fsync() call.
sudo ./check generic/101
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc3+ #96 SMP PREEMPT_DYNAMIC Wed Nov 19 12:47:37 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/101 32s ... 30s
Ran: generic/101
Passed all 1 tests
sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled appears to be OK.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251119223219.1824434-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Currently, HFS/HFS+ has very obsolete and inconvenient
debug output subsystem. Also, the code is duplicated
in HFS and HFS+ driver. This patch introduces
linux/hfs_common.h for gathering common declarations,
inline functions, and common short methods. Currently,
this file contains only hfs_dbg() function that
employs pr_debug() with the goal to print a debug-level
messages conditionally.
So, now, it is possible to enable the debug output
by means of:
echo 'file extent.c +p' > /proc/dynamic_debug/control
echo 'func hfsplus_evict_inode +p' > /proc/dynamic_debug/control
And debug output looks like this:
hfs: pid 5831:fs/hfs/catalog.c:228 hfs_cat_delete(): delete_cat: 00,48
hfs: pid 5831:fs/hfs/extent.c:484 hfs_file_truncate(): truncate: 48, 409600 -> 0
hfs: pid 5831:fs/hfs/extent.c:212 hfs_dump_extent():
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 78:4
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 0:0
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 0:0
v4
Debug messages have been reworked and information about
new HFS/HFS+ shared declarations file has been added
to MAINTAINERS file.
v5
Yangtao Li suggested to clean up debug output and
fix several typos.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
If Catalog File contains corrupted record for the case of
hidden directory's type, regard it as I/O error instead of
Invalid argument.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250805165905.3390154-1-frank.li@vivo.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
- hfs: fix general protection fault in hfs_find_init()
- hfs: fix slab-out-of-bounds in hfs_bnode_read()
- hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
- hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
- hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
- hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
- hfsplus: remove mutex_lock check in hfsplus_free_extents
- hfs: make splice write available again
- hfsplus: make splice write available again
- hfs: fix not erasing deleted b-tree node issue
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaIQQ0wAKCRAhI4xTh21N
nW3yAQDMhJcNyjP1j2dhNRq8l2PO6jDJqLhxAYGKwWMwv1GTvQD5AaOUSeMQbmcs
hNkMtjzb7OlfBLUthvrWlaCfLKWCmAk=
=dI94
-----END PGP SIGNATURE-----
Merge tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Johannes Thumshirn has made nice cleanup in hfsplus_submit_bio().
Tetsuo Handa has fixed the syzbot reported issue in
hfsplus_create_attributes_file() for the case of corruption the
Attributes File's metadata.
Yangtao Li has fixed the syzbot reported issue by removing the
uneccessary WARN_ON() in hfsplus_free_extents().
Other fixes:
- restore generic/001 successful execution by erasing deleted b-tree
nodes
- eliminate slab-out-of-bounds issue in hfs_bnode_read() and
hfsplus_bnode_read() by checking correctness of offset and length
when accessing b-tree node contents
- eliminate slab-out-of-bounds read in hfsplus_uni2asc() if the
b-tree node record has corrupted length of a name that could be
bigger than HFSPLUS_MAX_STRLEN
- eliminate general protection fault in hfs_find_init() for the case
of initial b-tree object creation"
* tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs: fix general protection fault in hfs_find_init()
hfs: fix slab-out-of-bounds in hfs_bnode_read()
hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
hfsplus: remove mutex_lock check in hfsplus_free_extents
hfs: make splice write available again
hfsplus: make splice write available again
hfs: fix not erasing deleted b-tree node issue
hfsplus_submit_bio() called by hfsplus_sync_fs() uses bdev_virt_rw() which
in turn uses submit_bio_wait() to submit the BIO.
But submit_bio_wait() already sets the REQ_SYNC flag on the BIO so there
is no need for setting the flag in hfsplus_sync_fs() when calling
hfsplus_submit_bio().
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/20250710063553.4805-1-johannes.thumshirn@wdc.com
Link: https://lore.kernel.org/r/20250710063553.4805-1-johannes.thumshirn@wdc.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
... to be used instead of manually assigning to ->s_d_op.
All in-tree filesystem converted (and field itself is renamed,
so any out-of-tree ones in need of conversion will be caught
by compiler).
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the hfsplus filesystem to use the new mount API.
Tested by comparing random mount & remount options before and after
the change.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/20240916172735.866916-6-sandeen@redhat.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
->d_hash() and ->d_compare() use those, so we need to delay freeing
them.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The current hfsplus_put_super first calls hfs_btree_close on
sbi->ext_tree, then invokes iput on sbi->hidden_dir, resulting in an
use-after-free issue in hfsplus_release_folio.
As shown in hfsplus_fill_super, the error handling code also calls iput
before hfs_btree_close.
To fix this error, we move all iput calls before hfsplus_btree_close.
Note that this patch is tested on Syzbot.
Link: https://lkml.kernel.org/r/20230226124948.3175736-1-mudongliangabcd@gmail.com
Reported-by: syzbot+57e3e98f7e3b80f64d56@syzkaller.appspotmail.com
Tested-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for
variables that represent request flags. Combine the last two
hfsplus_submit_bio() arguments into a single argument.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-55-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no point in allocating memory for a synchronous flush.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The HFS+ Access Control Lists have not worked at all for the past five
years, and nobody seems to have noticed. Besides, POSIX draft ACLs are
not compatible with MacOS. Drop the feature entirely.
Link: http://lkml.kernel.org/r/20180714190608.wtnmmtjqeyladkut@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot is reporting NULL pointer dereference at mount_fs() [1]. This is
because hfsplus_fill_super() is by error returning 0 when
hfsplus_fill_super() detected invalid filesystem image, and mount_bdev()
is returning NULL because dget(s->s_root) == NULL if s->s_root == NULL,
and mount_fs() is accessing root->d_sb because IS_ERR(root) == false if
root == NULL. Fix this by returning -EINVAL when hfsplus_fill_super()
detected invalid filesystem image.
[1] https://syzkaller.appspot.com/bug?id=21acb6850cecbc960c927229e597158cf35f33d0
Link: http://lkml.kernel.org/r/d83ce31a-874c-dd5b-f790-41405983a5be@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+01ffaf5d9568dd1609f7@syzkaller.appspotmail.com>
Reviewed-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When creating a file inside a directory that has the setgid flag set, give
the new file the group ID of the parent, and also the setgid flag if it is
a directory itself.
Link: http://lkml.kernel.org/r/20171204192705.GA6101@debian.home
Signed-off-by: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Firstly by applying the following with coccinelle's spatch:
@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)
to effect the conversion to sb_rdonly(sb), then by applying:
@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)
@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)
to remove left over excess bracketage and finally by applying:
@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)
to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.
Signed-off-by: David Howells <dhowells@redhat.com>
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Separate the op from the rq_flag_bits and have gfs2
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We need to protect the list of hfsplus_readdir_data against parallel
insertions (in readdir) and removals (in release). Add a spinlock
for that. Note that it has nothing to do with protection of
hfsplus_readdir_data->key - we have an exclusion between hfsplus_readdir()
and hfsplus_delete_cat() on directory lock and between several
hfsplus_readdir() for the same struct file on ->f_pos_lock. The spinlock
is strictly for list changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.
This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it. c files
which need access to more backing-dev details now include
backing-dev.h directly. This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.
v2: fs/fat build failure fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Longname is not correctly handled by hfsplus driver. If an attempt to
create a longname(>255) file/directory is made, it succeeds by creating a
file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key. Thus
leaving the volume in an inconsistent state. This patch fixes this issue.
Although lookup is always called first to create a negative entry, so just
doing a check in lookup would probably fix this issue. I choose to
propagate error to other iops as well.
Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
hfsplus_cat_build_key, to avoid unncessary branching.
Thanks a lot.
TEST:
------
dir="TEST_DIR"
cdir=`pwd`
name255="_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_1234"
name256="${name255}5"
mkdir $dir
cd $dir
touch $name255
rm -f $name255
touch $name256
ls -la
cd $cdir
rm -rf $dir
RESULT:
-------
[sougata@ultrabook tmp]$ cdir=`pwd`
[sougata@ultrabook tmp]$
name255="_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_1234"
[sougata@ultrabook tmp]$ name256="${name255}5"
[sougata@ultrabook tmp]$
[sougata@ultrabook tmp]$ mkdir $dir
[sougata@ultrabook tmp]$ cd $dir
[sougata@ultrabook TEST_DIR]$ touch $name255
[sougata@ultrabook TEST_DIR]$ rm -f $name255
[sougata@ultrabook TEST_DIR]$ touch $name256
[sougata@ultrabook TEST_DIR]$ ls -la
ls: cannot access
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
No such file or directory
total 0
drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
drwxrwxrwx 1 root root 6 Feb 20 19:56 ..
-????????? ? ? ? ? ?
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
[sougata@ultrabook TEST_DIR]$ cd $cdir
[sougata@ultrabook tmp]$ rm -rf $dir
rm: cannot remove `TEST_DIR': Directory not empty
-ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
and incorrect keys, leaving the FS in an inconsistent state. This patch
fixes this issue.
Signed-off-by: Sougata Santra <sougata@tuxera.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
+165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
DrPKlXwIgva7zq5/S0Vr
=4s1Z
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
ext4: fix premature freeing of partial clusters split across leaf blocks
ext4: remove unneeded test of ret variable
ext4: fix comment typo
ext4: make ext4_block_zero_page_range static
ext4: atomically set inode->i_flags in ext4_set_inode_flags()
ext4: optimize Hurd tests when reading/writing inodes
ext4: kill i_version support for Hurd-castrated file systems
ext4: each filesystem creates and uses its own mb_cache
fs/mbcache.c: doucple the locking of local from global data
fs/mbcache.c: change block and index hash chain to hlist_bl_node
ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
ext4: refactor ext4_fallocate code
ext4: Update inode i_size after the preallocation
ext4: fix partial cluster handling for bigalloc file systems
ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
ext4: only call sync_filesystm() when remounting read-only
fs: push sync_filesystem() down to the file system's remount_fs()
jbd2: improve error messages for inconsistent journal heads
jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
jbd2: minimize region locked by j_list_lock in journal_get_create_access()
...
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement functionality of creation AttributesFile metadata file on HFS+
volume in the case of absence of it.
It makes trying to open AttributesFile's B-tree during mount of HFS+
volume. If HFS+ volume hasn't AttributesFile then a pointer on
AttributesFile's B-tree keeps as NULL. Thereby, when it is discovered
absence of AttributesFile on HFS+ volume in the begin of xattr creation
operation then AttributesFile will be created.
The creation of AttributesFile will have success in the case of
availability (2 * clump) free blocks on HFS+ volume. Otherwise,
creation operation is ended with error (-ENOSPC).
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>