linux

Author	SHA1	Message	Date
Christoph Hellwig	86de848403	xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent Accessing if_bytes without the ilock is racy. Remove the initial if_bytes == 0 check in xfs_reflink_end_cow_extent and let xfs_iext_lookup_extent fail for this case after we've taken the ilock. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:20:06 +05:30
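The row above removes a check-then-lock race: a field shared with writers (if_bytes) was read before the lock protecting it (the ilock) was held. A minimal sketch of the fixed pattern, assuming a pthread mutex stands in for the ilock; `struct fork_sketch` and `fork_lookup_first()` are hypothetical names, not XFS code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode fork guarded by the ilock. */
struct fork_sketch {
	pthread_mutex_t lock;   /* plays the role of the ilock */
	size_t if_bytes;        /* bytes used by the in-core extent list */
	long extents[16];       /* start offsets of cached extents */
};

/*
 * Racy shape (what the commit removes): if_bytes is read before the
 * lock is taken, so a concurrent writer can change it in between:
 *
 *	if (f->if_bytes == 0)   <- unlocked read
 *		return false;
 *	pthread_mutex_lock(&f->lock);
 *
 * Fixed shape: take the lock first and let the lookup itself fail
 * on an empty fork.
 */
static bool fork_lookup_first(struct fork_sketch *f, long *out)
{
	bool found = false;

	pthread_mutex_lock(&f->lock);
	if (f->if_bytes >= sizeof(f->extents[0])) {  /* checked under the lock */
		*out = f->extents[0];
		found = true;
	}
	pthread_mutex_unlock(&f->lock);
	return found;
}
```

The design point is that the emptiness test and the lookup observe the same locked state, so there is no window for the fork to change between them.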
Christoph Hellwig	99fb6b7ad1	xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Defer the extent counter size upgrade until we know we're going to modify the extent mapping. This also defers dirtying the transaction and will allow us to safely back out later in the function in subsequent changes. Fixes: `4f86bb4b66` ("xfs: Conditionally upgrade existing inodes to use large extent counters") Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:20:06 +05:30
Christoph Hellwig	cc3c92e7e7	xfs: xfs_quota_unreserve_blkres can't fail Unreserving quotas can't fail due to quota limits, and we'll notice a shut down file system a bit later in all the callers anyway. Return void and remove the error checking and propagation in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:15:03 +05:30
Christoph Hellwig	f7b9ee7845	xfs: consolidate the xfs_quota_reserve_blkres definitions xfs_trans_reserve_quota_nblks is already stubbed out if quota support is disabled, no need for an extra xfs_quota_reserve_blkres stub. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:15:03 +05:30
Christoph Hellwig	67a841f9d7	xfs: clean up buffer allocation in xlog_do_recovery_pass Merge the initial xlog_alloc_buffer calls, and pass the variable designating the length that is initialized to 1 above instead of passing the open coded 1 directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:10:17 +05:30
Christoph Hellwig	45cf976008	xfs: fix log recovery buffer allocation for the legacy h_size fixup Commit `a70f9fe52d` ("xfs: detect and handle invalid iclog size set by mkfs") added a fixup for incorrect h_size values used for the initial umount record in old xfsprogs versions. Later commit `0c771b99d6` ("xfs: clean up calculation of LR header blocks") cleaned up the log recovery buffer calculation, but stopped using the fixed-up h_size value to size the log recovery buffer, which can lead to an out-of-bounds access when the incorrect h_size does not come from the old mkfs tool, but from a fuzzer. Fix this by open coding xlog_logrec_hblks and taking the fixed h_size into account for this calculation. Fixes: `0c771b99d6` ("xfs: clean up calculation of LR header blocks") Reported-by: Sam Sun <samsun1006219@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-05-03 11:10:17 +05:30
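The bug class in the row above is a buffer sized from one h_size value but later filled according to another. A minimal sketch, assuming each record header block describes 32 KiB of record data (`SKETCH_HEADER_CYCLE_SIZE` is our assumed stand-in for the kernel's header cycle size); `logrec_hblks()` echoes the name in the row, but this body is illustrative and does not model the legacy fixup itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each log record header block describes 32 KiB of data. */
#define SKETCH_HEADER_CYCLE_SIZE 32768u

/* Header blocks needed for a log record whose header says h_size bytes. */
static uint32_t logrec_hblks(uint32_t h_size)
{
	if (h_size <= SKETCH_HEADER_CYCLE_SIZE)
		return 1;
	return (h_size + SKETCH_HEADER_CYCLE_SIZE - 1) / SKETCH_HEADER_CYCLE_SIZE;
}

/*
 * A header buffer of buf_hblks blocks is only safe if it was sized from
 * the same h_size that later reads use.  Sizing it from one value and
 * reading with another value that needs more header blocks is the
 * out-of-bounds access described in the row.
 */
static bool header_buf_fits(uint32_t buf_hblks, uint32_t read_h_size)
{
	return logrec_hblks(read_h_size) <= buf_hblks;
}
```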
Chandan Babu R	0370f9bb49	xfs: last round of cleanups for 6.10 Here are the reviewed cleanups at the head of the fsverity series. Apparently there's other work that could use some of these things, so let's try to get it in for 6.10. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'xfs-cleanups-6.10_2024-05-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeF xfs: last round of cleanups for 6.10 Here are the reviewed cleanups at the head of the fsverity series. Apparently there's other work that could use some of these things, so let's try to get it in for 6.10. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'xfs-cleanups-6.10_2024-05-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: widen flags argument to the xfs_iflags_* helpers xfs: minor cleanups of xfs_attr3_rmt_blocks xfs: create a helper to compute the blockcount of a max sized remote value xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c	2024-05-03 11:05:39 +05:30
Darrick J. Wong	1a3f1afb25	xfs: widen flags argument to the xfs_iflags_* helpers xfs_inode.i_flags is an unsigned long, so make these helpers take that as the flags argument instead of unsigned short. This is needed for the next patch. While we're at it, remove the iflags variable from xfs_iget_cache_miss because we no longer need it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>	2024-05-02 07:48:37 -07:00
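Why the narrower parameter type matters: converting an unsigned long to unsigned short silently drops every bit above bit 15. A minimal sketch with hypothetical names (`struct inode_sketch`, `SKETCH_IFLAG_HIGH`, the two test helpers); any locking the real xfs_iflags_* helpers do is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode whose flags word is an unsigned long, like xfs_inode.i_flags. */
struct inode_sketch {
	unsigned long i_flags;
};

/* Hypothetical flag that lives above bit 15. */
#define SKETCH_IFLAG_HIGH (1UL << 20)

/* Old-style helper: the narrower parameter type truncates high bits. */
static bool iflags_test_short(const struct inode_sketch *ip, unsigned short flags)
{
	return (ip->i_flags & flags) != 0;
}

/* Widened helper: the parameter matches the type of i_flags. */
static bool iflags_test_long(const struct inode_sketch *ip, unsigned long flags)
{
	return (ip->i_flags & flags) != 0;
}
```

With `ip.i_flags = SKETCH_IFLAG_HIGH`, passing that flag through the `unsigned short` helper reduces it modulo 65536 to 0, so the test reports the bit as clear even though it is set; the widened helper reports it correctly.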
Darrick J. Wong	3791a05329	xfs: minor cleanups of xfs_attr3_rmt_blocks Clean up the type signature of this function since we don't have negative attr lengths or block counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-05-02 07:48:37 -07:00
Darrick J. Wong	204a26aa1d	xfs: create a helper to compute the blockcount of a max sized remote value Create a helper function to compute the number of fsblocks needed to store a maximally-sized extended attribute value. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-05-02 07:48:36 -07:00
Darrick J. Wong	a5714b67ca	xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Turn this into a properly typechecked function, and actually use the correct blocksize for extended attributes. The function cannot be static inline because xfsprogs userspace uses it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-05-02 07:48:36 -07:00
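The two rows above turn XFS_ATTR3_RMT_BUF_SPACE into a typechecked function and add a helper for the block count of a maximally sized remote value. A sketch under stated assumptions: the 56-byte per-block header (`SKETCH_RMT_HDR_SIZE`) is assumed, the 64 KiB maximum is the Linux VFS limit on xattr value size, and all function names are hypothetical. Every quantity is an `unsigned int`, since none of them can be negative.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_XATTR_SIZE_MAX 65536u  /* Linux VFS limit on an xattr value */
#define SKETCH_RMT_HDR_SIZE   56u     /* assumed per-block remote header size */

/* Typechecked replacement for a buffer-space macro: usable bytes per block. */
static unsigned int attr_rmt_buf_space(bool has_crc, unsigned int blocksize)
{
	return has_crc ? blocksize - SKETCH_RMT_HDR_SIZE : blocksize;
}

/* Blocks needed to hold attrlen bytes of remote value (rounded up). */
static unsigned int attr_rmt_blocks(bool has_crc, unsigned int blocksize,
				    unsigned int attrlen)
{
	unsigned int space = attr_rmt_buf_space(has_crc, blocksize);

	assert(space > 0);
	return (attrlen + space - 1) / space;
}

/* Blocks needed for a maximally sized remote value. */
static unsigned int attr_rmt_max_blocks(bool has_crc, unsigned int blocksize)
{
	return attr_rmt_blocks(has_crc, blocksize, SKETCH_XATTR_SIZE_MAX);
}
```

As a function, a caller passing the wrong kind of argument gets a compiler diagnostic instead of a silently expanded macro; with 4096-byte blocks the sketch needs 16 blocks without headers and 17 with them.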
Darrick J. Wong	a86f8671d0	xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c In the next few patches we're going to refactor the attr remote code so that we can support headerless remote xattr values for storing merkle tree blocks. For now, let's change the code to use unsigned int to describe quantities of bytes and blocks that cannot be negative. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-05-02 07:48:35 -07:00
Christoph Hellwig	21255afdd7	xfs: do not allocate the entire delalloc extent in xfs_bmapi_write Trying to convert the entire delalloc extent is a good decision for regular writeback, as it leads to larger contiguous on-disk extents, but for other callers of xfs_bmapi_write it is rather questionable, as it forces them to loop creating new transactions just in case there is no contiguous extent large enough to cover the whole delalloc reservation. Change xfs_bmapi_write to only allocate the passed-in range instead, while the writeback path through xfs_bmapi_convert_delalloc and xfs_bmapi_allocate still always converts the full extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	d69bee6a35	xfs: fix xfs_bmap_add_extent_delay_real for partial conversions xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent and converts them to a real extent. It is written to deal with any potential overlap of the to-be-converted range with the delalloc extent, but it turns out that currently only converting the entire extent, or a part starting at the beginning, is actually exercised, as the only caller always tries to convert the entire delalloc extent, and either succeeds or at least progresses partially from the start. If it only converts a tiny part of a delalloc extent, the indirect block calculation for the new delalloc extent (da_new) might be equivalent to that of the existing delalloc extent (da_old). If this extent conversion now requires allocating an indirect block that gets accounted into da_new, it trips the assert that da_new must be smaller than or equal to da_old unless we split the extent. Except for the assert, that case is actually handled by just trying to allocate more space, as is already done for the split case (which currently can't be reached at all), so just reusing it should be fine. Except that without dipping into the reserved block pool, that would make it a bit too easy to trigger a fs shutdown due to ENOSPC. So in addition to adjusting the assert, also dip into the reserved block pool. Note that I could only reproduce the assert with a change to only convert the actually asked range instead of the full delalloc extent from xfs_bmapi_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	a8bb258f70	xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate Both callers of xfs_bmapi_allocate already initialize bma->prev, don't redo that in xfs_bmapi_allocate. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	2a9b99d45b	xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate xfs_bmapi_allocate currently overwrites offset and len when converting delayed allocations, and duplicates the length cap done for non-delalloc allocations. Move all that logic into the callers to avoid duplication and to make the calling conventions more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	9d06960341	xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct xfs_filblks_t type. Use it in xfs_bmapi_write and slightly adjust the comment documenting the potential pitfall to take this into account. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
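The kernel's min_t() casts both operands to the named type before comparing. A short sketch of why that matters, assuming `filblks_sketch_t` stands in for the 64-bit xfs_filblks_t and the helper names are hypothetical: if the comparison is done in a 32-bit type, a large length is truncated before the minimum is taken.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t filblks_sketch_t;  /* stands in for xfs_filblks_t */

/* Typed minimum in the spirit of min_t(xfs_filblks_t, a, b). */
static inline filblks_sketch_t filblks_min(filblks_sketch_t a, filblks_sketch_t b)
{
	return a < b ? a : b;
}

/* Pitfall: comparing in a narrower type truncates the operands first. */
static inline uint32_t narrow_min(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}
```

For a length of 2^33 blocks capped at 2097152, `filblks_min()` returns the cap, while `narrow_min()` sees the length as 0 after truncation and returns 0.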
Christoph Hellwig	04c609e6e5	xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block allocated by xfs_bmapi_allocate. Lift it into xfs_bmapi_allocate as we should assert the same for xfs_bmapi_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	b11ed354c9	xfs: remove the unused tmp_logflags variable in xfs_bmapi_allocate tmp_logflags is initialized to 0 and then ORed into bma->logflags, which isn't actually doing anything. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:19 +05:30
Christoph Hellwig	6773da870a	xfs: fix error returns from xfs_bmapi_write xfs_bmapi_write can return 0 without actually returning a mapping in mval in two different cases: 1) when there is absolutely no space available to do an allocation 2) when converting delalloc space, and the allocation is so small that it only covers parts of the delalloc extent before the range requested by the caller Callers at best can handle one of these cases, but in many cases can't cope with either one. Switch xfs_bmapi_write to always return a mapping or return an error code instead. For case 1) above ENOSPC is the obvious choice which is very much what the callers expect anyway. For case 2) there is no really good error code, so pick a funky one from the SysV streams portfolio. This fixes the reproducer here: https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/ which uses reserved blocks to create file systems that are gravely out of space and thus cause at least xfs_alloc_file_space to hang and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc. Note that this patch does not actually make any caller but xfs_alloc_file_space deal intelligently with case 2) above. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: 刘通 <lyutoon@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-30 09:45:18 +05:30
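The new calling convention in the row above, "return a mapping or an error, never 0 with nothing mapped", can be sketched as follows. Assumptions: the delalloc extent starts at `dstart` at or before the requested offset and is converted from its start; `bmapi_write_sketch()` and `struct map_sketch` are hypothetical; and `SKETCH_EPARTIAL` is a made-up placeholder, since the row only says case 2) uses an errno from the SysV STREAMS set.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mapping handed back to the caller. */
struct map_sketch {
	long startoff;
	long blockcount;
};

/* Made-up placeholder for the case 2) error; not the kernel's value. */
#define SKETCH_EPARTIAL 10000

/*
 * Returns 0 only with a non-empty mapping in *map, otherwise a negative
 * error.  free_blocks is what the allocator can hand out, starting at
 * dstart; [off, off + len) is the range the caller asked for.
 */
static int bmapi_write_sketch(long free_blocks, long dstart,
			      long off, long len, struct map_sketch *map)
{
	long alloc_end, want_end = off + len;

	assert(dstart <= off && len > 0);
	if (free_blocks <= 0)
		return -ENOSPC;              /* case 1: no space at all */

	alloc_end = dstart + free_blocks;
	if (alloc_end <= off)
		return -SKETCH_EPARTIAL;     /* case 2: allocated only before the range */

	map->startoff = off;
	map->blockcount = (alloc_end < want_end ? alloc_end : want_end) - off;
	return 0;
}
```

A caller can then loop while the return value is 0, advancing `off` by `map.blockcount`, and handle each error explicitly instead of spinning on an empty mapping.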
Zhang Yi	5ce5674187	xfs: convert delayed extents to unwritten when zeroing post eof blocks The current clone operation could be non-atomic if the destination range of a file is beyond EOF; the user could get a file with corrupted (zeroed) data on a crash. The problem is about preallocations. If you write some data into a file: [A...B) and XFS decides to preallocate some post-eof blocks, then it can create a delayed allocation reservation: [A.........D) The writeback path tries to convert delayed extents to real ones by allocating blocks. If there isn't enough contiguous free space, we can end up with two extents, the first real and the second still delalloc: [A....C)[C.D) After that, both the in-memory and the on-disk file sizes are still B. If we clone into the range [E...F) from another file: [A....C)[C.D) [E...F) then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc extent, its pagecache will be zeroed and both the in-memory and on-disk size will be updated to D after flushing but before cloning. This is wrong, because the user can see the size change and read the zeroes while the clone operation is ongoing. We need to keep the in-memory and on-disk sizes as they were before the clone operation starts, so instead of writing zeroes through the page cache for delayed ranges beyond EOF, we convert these ranges to unwritten and invalidate any cached data over that range beyond EOF. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-29 17:23:11 +05:30
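A toy model of the fix described in the row above: delalloc extents past EOF become unwritten extents (which read back as zeroes) instead of being zeroed through the page cache, so the file size does not move before the clone starts. This is a sketch under simplifying assumptions: extents are a plain array, no delalloc extent straddles EOF (real code must handle partial ranges), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum ext_state { EXT_REAL, EXT_DELALLOC, EXT_UNWRITTEN };

struct ext_sketch {
	long start, end;        /* [start, end) in blocks */
	enum ext_state state;
};

/*
 * Convert every delalloc extent that lies entirely beyond isize to
 * unwritten.  isize is deliberately not touched: that is the point of
 * the fix.  Returns the number of extents converted.
 */
static size_t convert_posteof_delalloc(struct ext_sketch *ext, size_t n, long isize)
{
	size_t converted = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].state == EXT_DELALLOC && ext[i].start >= isize) {
			ext[i].state = EXT_UNWRITTEN;   /* allocated, reads as zeroes */
			converted++;
		}
	}
	return converted;
}
```

In the row's example, with EOF at B and extents [A, C) real and [C, D) delalloc, only [C, D) is converted and the size stays at B.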
Zhang Yi	2e08371a83	xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset xfs_bmapi_convert_delalloc() only attempts to allocate the entire delalloc extent and may require multiple invocations to allocate the target offset, so xfs_convert_blocks() adds a loop to do this job and we call it in the writeback path, but xfs_convert_blocks() isn't a common helper. Let's do it in xfs_bmapi_convert_delalloc() and drop xfs_convert_blocks(), preparing for converting post-EOF delalloc blocks in the buffered write begin path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-29 17:23:11 +05:30
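The loop the row above moves into xfs_bmapi_convert_delalloc() can be modeled as: convert the delalloc extent from its start, one contiguous chunk per allocation, until the target offset is covered. A minimal sketch, assuming each allocation finds at most `max_contig` contiguous blocks; `convert_until_covered()` is a hypothetical name that only counts allocation calls.

```c
#include <assert.h>

/*
 * Delalloc extent [dstart, dend) is converted from its start; each
 * allocation yields at most max_contig blocks.  Keep going until the
 * block at target is covered.  Returns the number of allocation calls.
 */
static int convert_until_covered(long dstart, long dend, long target, long max_contig)
{
	long done = dstart;   /* converted prefix is [dstart, done) */
	int calls = 0;

	assert(dstart <= target && target < dend && max_contig > 0);
	while (done <= target) {
		long left = dend - done;
		long chunk = left < max_contig ? left : max_contig;

		done += chunk;    /* one allocation: the next contiguous piece */
		calls++;
	}
	return calls;
}
```

With 10-block chunks, covering offset 25 of a 100-block reservation takes three allocations, which is why a single call per conversion was not enough for callers that need a specific offset.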
Zhang Yi	fc8d0ba0ff	xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional Allow callers to pass a NULL seq argument if they don't care about the fork sequence number. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-29 17:23:11 +05:30
Zhang Yi	bb712842a8	xfs: match lock mode in xfs_buffered_write_iomap_begin() Commit `1aa91d9c99` ("xfs: Add async buffered write support") replaced xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the writing inode, and a new variable lockmode is used to indicate the lock mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still better to use this variable instead of using XFS_ILOCK_EXCL directly when unlocking the inode. Fixes: `1aa91d9c99` ("xfs: Add async buffered write support") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-29 17:23:11 +05:30
Christoph Hellwig	e58ac1770d	xfs: refactor dir format helpers Add a new enum and a xfs_dir2_format helper that returns it to allow the code to switch on the format of a directory in a single operation and switch all users of xfs_dir2_isblock and xfs_dir2_isleaf to it. This also removes the explicit xfs_iread_extents call in a few of the call sites given that xfs_bmap_last_offset already takes care of it underneath. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-26 11:21:46 +05:30
Christoph Hellwig	dfe5febe2b	xfs: factor out a xfs_dir_replace_args helper Add a helper to switch between the different directory formats for replacing a directory entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-26 11:19:04 +05:30
Christoph Hellwig	3866e6e669	xfs: factor out a xfs_dir_removename_args helper Add a helper to switch between the different directory formats for removing a directory entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-26 11:19:04 +05:30
Christoph Hellwig	4d893a4051	xfs: factor out a xfs_dir_createname_args helper Add a helper to switch between the different directory formats for creating a directory entry and to handle the XFS_DA_OP_JUSTCHECK flag based on the passed in ino number field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-26 11:19:03 +05:30
Christoph Hellwig	14ee22fef4	xfs: factor out a xfs_dir_lookup_args helper Add a helper to switch between the different directory formats for lookup and to handle the -EEXIST return for a successful lookup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-26 11:19:03 +05:30
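The directory helpers in the rows above share one shape: compute the directory format once, switch on it, and normalize the return value. A sketch under stated assumptions: the enum and per-format lookups are hypothetical stand-ins (the `found` argument simulates the lookup result), and, following the row above, a per-format lookup reports a found entry as -EEXIST, which the helper turns into success.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical directory formats, mirroring the enum the rows describe. */
enum dir_format_sketch { DIR_FMT_SF, DIR_FMT_BLOCK, DIR_FMT_LEAF, DIR_FMT_NODE };

/* Hypothetical per-format lookups: a found entry is reported as -EEXIST. */
static int sf_lookup(int found)    { return found ? -EEXIST : -ENOENT; }
static int block_lookup(int found) { return found ? -EEXIST : -ENOENT; }
static int leaf_lookup(int found)  { return found ? -EEXIST : -ENOENT; }
static int node_lookup(int found)  { return found ? -EEXIST : -ENOENT; }

/* One switch on the format, with -EEXIST translated to success. */
static int dir_lookup_args_sketch(enum dir_format_sketch fmt, int found)
{
	int error;

	switch (fmt) {
	case DIR_FMT_SF:    error = sf_lookup(found); break;
	case DIR_FMT_BLOCK: error = block_lookup(found); break;
	case DIR_FMT_LEAF:  error = leaf_lookup(found); break;
	case DIR_FMT_NODE:  error = node_lookup(found); break;
	default:            return -EINVAL;
	}
	return error == -EEXIST ? 0 : error;
}
```

Centralizing the switch means callers no longer chain separate "is it block? is it leaf?" tests, and the odd -EEXIST convention stays inside one helper.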
Jiapeng Chong	08e012a62d	xfs: Remove unused function xrep_dir_self_parent The function is defined in dir_repair.c but not called anywhere, so delete the unused function. fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-04-24 12:34:44 +05:30
Chandan Babu R	4b0bf86c17	xfs: minor fixes to online repair [v13.4 9/9] Here are some miscellaneous bug fixes for the online repair code. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'repair-fixes-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: minor fixes to online repair Here are some miscellaneous bug fixes for the online repair code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-fixes-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: invalidate dentries for a file before moving it to the orphanage xfs: exchange-range for repairs is no longer dynamic xfs: fix iunlock calls in xrep_adoption_trans_alloc xfs: drop the scrub file's iolock when transaction allocation fails	2024-04-24 12:31:13 +05:30
Chandan Babu R	b878dbbe2a	xfs: reduce iget overhead in scrub [v13.4 8/9] This patchset looks to reduce iget overhead in two ways: First, a previous patch conditionally set DONTCACHE on inodes during xchk_irele on the grounds that we knew better at irele time if an inode should be dropped. Unfortunately, over time that patch morphed into a call to d_mark_dontcache, which resulted in inodes being dropped even if they were referenced by the dcache. This actually caused more recycle overhead than if we'd simply called xfs_iget to set DONTCACHE only on misses. The second patch reduces the cost of untrusted iget for a vectored scrub call by having the scrubv code maintain a separate refcount to the inode so that the cache will always hit. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'reduce-scrub-iget-overhead-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: reduce iget overhead in scrub This patchset looks to reduce iget overhead in two ways: First, a previous patch conditionally set DONTCACHE on inodes during xchk_irele on the grounds that we knew better at irele time if an inode should be dropped. Unfortunately, over time that patch morphed into a call to d_mark_dontcache, which resulted in inodes being dropped even if they were referenced by the dcache. This actually caused more recycle overhead than if we'd simply called xfs_iget to set DONTCACHE only on misses. The second patch reduces the cost of untrusted iget for a vectored scrub call by having the scrubv code maintain a separate refcount to the inode so that the cache will always hit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'reduce-scrub-iget-overhead-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: only iget the file once when doing vectored scrub-by-handle xfs: use dontcache for grabbing inodes during scrub	2024-04-24 12:27:33 +05:30
Chandan Babu R	496baa2cb9	xfs: vectorize scrub kernel calls [v13.4 7/9] Create a vectorized version of the metadata scrub and repair ioctl, and adapt xfs_scrub to use that. This mitigates the impact of system call overhead on xfs_scrub runtime. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'vectorized-scrub-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: vectorize scrub kernel calls Create a vectorized version of the metadata scrub and repair ioctl, and adapt xfs_scrub to use that. This mitigates the impact of system call overhead on xfs_scrub runtime. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'vectorized-scrub-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: introduce vectored scrub mode xfs: move xfs_ioc_scrub_metadata to scrub.c xfs: reduce the rate of cond_resched calls inside scrub	2024-04-24 12:22:07 +05:30
Chandan Babu R	f7cea94646	xfs: detect and correct directory tree problems [v13.4 6/9] Historically, checking the tree-ness of the directory tree structure has not been complete. Cycles of subdirectories break the tree properties, as do subdirectories with multiple parents. It's easy enough for DFS to detect problems as long as one of the participants is reachable from the root, but this technique cannot find unconnected cycles. Directory parent pointers change that, because we can discover all of these problems from a simple walk from a subdirectory towards the root. For each child we start with, if the walk terminates without reaching the root, we know the path is disconnected and ought to be attached to the lost and found. If we find ourselves, we know this is a cycle and can delete an incoming edge. If we find multiple paths to the root, we know to delete an incoming edge. Even better, once we've finished walking paths, we've identified the good ones and know which other path(s) to remove. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'scrub-directory-tree-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: detect and correct directory tree problems Historically, checking the tree-ness of the directory tree structure has not been complete. Cycles of subdirectories break the tree properties, as do subdirectories with multiple parents. It's easy enough for DFS to detect problems as long as one of the participants is reachable from the root, but this technique cannot find unconnected cycles. Directory parent pointers change that, because we can discover all of these problems from a simple walk from a subdirectory towards the root. For each child we start with, if the walk terminates without reaching the root, we know the path is disconnected and ought to be attached to the lost and found. If we find ourselves, we know this is a cycle and can delete an incoming edge. If we find multiple paths to the root, we know to delete an incoming edge. Even better, once we've finished walking paths, we've identified the good ones and know which other path(s) to remove. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'scrub-directory-tree-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: fix corruptions in the directory tree xfs: report directory tree corruption in the health information xfs: invalidate dirloop scrub path data when concurrent updates happen xfs: teach online scrub to find directory tree structure problems	2024-04-24 12:17:51 +05:30
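The parent-pointer walk described in the row above can be sketched as follows. Assumptions: each directory has exactly one parent pointer stored in an array (so the multiple-parents case is not modeled), the root is its own parent, and `walk_to_root()` is a hypothetical name. A step bound of n catches walks that fall into a cycle which does not include the starting directory.

```c
#include <assert.h>

#define NO_PARENT (-1)

enum path_result { PATH_OK, PATH_DISCONNECTED, PATH_CYCLE };

/*
 * parent[i] is the single parent pointer of directory i (n directories).
 * Walk from start toward root:
 *  - reaching root                       -> connected;
 *  - coming back to start                -> start sits on a cycle;
 *  - no parent, or looping without ever
 *    seeing start (a cycle further up)   -> disconnected from root.
 */
static enum path_result walk_to_root(const int *parent, int n, int root, int start)
{
	int cur = start;

	for (int steps = 0; steps <= n; steps++) {
		if (cur == root)
			return PATH_OK;
		cur = parent[cur];
		if (cur == NO_PARENT)
			return PATH_DISCONNECTED;
		if (cur == start)
			return PATH_CYCLE;
	}
	return PATH_DISCONNECTED;   /* stuck in a cycle that excludes start */
}
```

For example, with `parent = {0, 0, 1, 4, 3, 3, NO_PARENT}` and root 0, directory 2 is connected, 3 and 4 form a cycle, and 5 and 6 are disconnected, even though a DFS from the root would never visit the 3/4 cycle at all.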
Chandan Babu R	1da824b0bf	xfs: online repair for parent pointers [v13.4 5/9] This series implements online repair for directory parent pointer metadata. The checking half is fairly straightforward -- for each outgoing directory link (forward or backwards), grab the inode at the other end, and confirm that there's a corresponding link. If we can't grab an inode or lock it, we'll save that link for a slower loop that cycles all the locks, confirms the continued existence of the link, and rechecks the link if it's actually still there. Repairs are a bit more involved -- for directories, we walk the entire filesystem to rebuild the dirents from parent pointer information. Parent pointer repairs do the same walk but rebuild the pptrs from the dirent information, but with the added twist that it duplicates all the xattrs so that it can use the atomic extent swapping code to commit the repairs atomically. This introduces an added twist to the xattr repair code -- we use dirent hooks to detect a colliding update to the pptr data while we're not holding the ILOCKs; if one is detected, we restart the xattr salvaging process but this time hold all the ILOCKs until the end of the scan. For offline repair, the phase6 directory connectivity scan generates an index of all the expected parent pointers in the filesystem. Then it walks each file and compares the parent pointers attached to that file against the index generated, and resyncs the results as necessary. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Merge tag 'repair-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: online repair for parent pointers This series implements online repair for directory parent pointer metadata. The checking half is fairly straightforward -- for each outgoing directory link (forward or backwards), grab the inode at the other end, and confirm that there's a corresponding link. If we can't grab an inode or lock it, we'll save that link for a slower loop that cycles all the locks, confirms the continued existence of the link, and rechecks the link if it's actually still there. Repairs are a bit more involved -- for directories, we walk the entire filesystem to rebuild the dirents from parent pointer information. Parent pointer repairs do the same walk but rebuild the pptrs from the dirent information, but with the added twist that it duplicates all the xattrs so that it can use the atomic extent swapping code to commit the repairs atomically. This introduces an added twist to the xattr repair code -- we use dirent hooks to detect a colliding update to the pptr data while we're not holding the ILOCKs; if one is detected, we restart the xattr salvaging process but this time hold all the ILOCKs until the end of the scan. For offline repair, the phase6 directory connectivity scan generates an index of all the expected parent pointers in the filesystem. Then it walks each file and compares the parent pointers attached to that file against the index generated, and resyncs the results as necessary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: inode repair should ensure there's an attr fork to store parent pointers xfs: repair link count of nondirectories after rebuilding parent pointers xfs: adapt the orphanage code to handle parent pointers xfs: actually rebuild the parent pointer xattrs xfs: add a per-leaf block callback to xchk_xattr_walk xfs: split xfs_bmap_add_attrfork into two pieces xfs: remove pointless unlocked assertion xfs: implement live updates for parent pointer repairs xfs: repair directory parent pointers by scanning for dirents xfs: replay unlocked parent pointer updates that accrue during xattr repair xfs: implement live updates for directory repairs xfs: repair directories by scanning directory parent pointers xfs: add raw parent pointer apis to support repair xfs: salvage parent pointers when rebuilding xattr structures xfs: make the reserved block permission flag explicit in xfs_attr_set xfs: remove some boilerplate from xfs_attr_set	2024-04-24 12:13:09 +05:30
Chandan Babu R	0d2dd382a7	xfs: scrubbing for parent pointers [v13.4 4/9] Teach online fsck to use parent pointers to assist in checking directories, parent pointers, extended attributes, and link counts. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZih1NgAKCRBKO3ySh0YR pkk7AQDd/xKCVQDq14c1A9MjzhpTUn9dWeHoU6KUV1xWvtDaygD6A1KBzirREi9O ieKg3OGPcg41sl4bnz7lsvRGxgp/mgY= =XU5l -----END PGP SIGNATURE----- Merge tag 'scrub-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: scrubbing for parent pointers Teach online fsck to use parent pointers to assist in checking directories, parent pointers, extended attributes, and link counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'scrub-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: check parent pointer xattrs when scrubbing xfs: walk directory parent pointers to determine backref count xfs: deferred scrub of parent pointers xfs: scrub parent pointers xfs: deferred scrub of dirents xfs: check dirents have parent pointers xfs: revert commit `44af6c7e59`	2024-04-24 12:06:51 +05:30
Chandan Babu R	47d83c1946	xfs: Parent Pointers [v13.4 3/9] This is the latest version of the parent pointer attributes for xfs. The goal of this patch set is to add a parent pointer attribute to each inode. The attribute name contains the parent inode, generation, and directory offset, while the attribute value contains the file name. This feature will enable future optimizations for online scrub, shrink, nfs handles, verity, or any other feature that could make use of quickly deriving an inode's path from the mount point. Directory parent pointers are stored as namespaced extended attributes of a file. Because parent pointers are an indivisible tuple of (dirent_name, parent_ino, parent_gen), we cannot use the usual attr name lookup functions to find a parent pointer. This is solvable by introducing a new lookup mode that checks both the name and the value of the xattr. Therefore, introduce this new name-value lookup mode that's gated on the XFS_ATTR_PARENT namespace. This requires the introduction of new opcodes for the extended attribute update log intent items, which actually means that parent pointers (itself an INCOMPAT feature) do not depend on the LOGGED_XATTRS log incompat feature bit. To reduce collisions on the dirent names of parent pointers, introduce a new attr hash mode that is the dir2 namehash of the dirent name xor'd with the parent inode number. At this point, Allison has moved on to other things, so I've merged her patchset into djwong-dev for merging. Updates since v12 [djwong]: Rebase on 6.9-rc and update the online fsck design document. Redesign the ondisk format to use the name-value lookups to get us back to the point where the attr is (dirent_name -> parent_ino/gen). Updates since v11 [djwong]: Rebase on 6.4-rc and make some tweaks and bugfixes to enable the repair prototypes. Merge with djwong-dev and make online repair actually work. 
Updates since v10 [djwong]: Merge in the ondisk format changes to get rid of the diroffset conflicts with the parent pointer repair code, rebase the entire series with the attr vlookup changes first, and merge all the other random fixes. Updates since v9: Reordered patches 2 and 3 to be 6 and 7 xfs: Add xfs_verify_pptr moved parent pointer validators to xfs_parent xfs: Add parent pointer ioctl Extra validation checks for fs id added missing release for the inode use GFP_KERNEL flags for malloc/realloc reworked ioctl to use pptr listenty and flex array NEW xfs: don't remove the attr fork when parent pointers are enabled NEW directory lookups should return diroffsets too NEW xfs: move/add parent pointer validators to xfs_parent Updates since v8: xfs: parent pointer attribute creation Fix xfs_parent_init to release log assist on alloc fail Add slab cache for xfs_parent_defer Fix xfs_create to release after unlock Add xfs_parent_start and xfs_parent_finish wrappers removed unused xfs_parent_name_irec and xfs_init_parent_name_irec xfs: add parent attributes to link Start/finish wrapper updates Fix xfs_link to disallow reservationless quotas xfs: add parent attributes to symlink Fix xfs_symlink to release after unlock Start/finish wrapper updates xfs: remove parent pointers in unlink Start/finish wrapper updates Add missing parent free xfs: Add parent pointers to rename Start/finish wrapper updates Fix rename to only grab logged xattr once Fix xfs_rename to disallow reservationless quotas Fix double unlock on dqattach fail Move parent frees to out_release_wip xfs: Add parent pointers to xfs_cross_rename Hoist parent pointers into rename Questions comments and feedback appreciated! Thanks all! Allison This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZih1NgAKCRBKO3ySh0YR pj2UAQDJwGaKslDfXmMY0JMS/GTrEwD7IFNQ/cE2fIkfoUNDeAEAs1tPS0lST4MM YUC+1DVSkIl+07sV9Xf0q4zMvgJtOQc= =oOSQ -----END PGP SIGNATURE----- Merge tag 'pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: Parent Pointers This is the latest version of the parent pointer attributes for xfs. The goal of this patch set is to add a parent pointer attribute to each inode. The attribute name contains the parent inode, generation, and directory offset, while the attribute value contains the file name. This feature will enable future optimizations for online scrub, shrink, nfs handles, verity, or any other feature that could make use of quickly deriving an inode's path from the mount point. Directory parent pointers are stored as namespaced extended attributes of a file. Because parent pointers are an indivisible tuple of (dirent_name, parent_ino, parent_gen), we cannot use the usual attr name lookup functions to find a parent pointer. This is solvable by introducing a new lookup mode that checks both the name and the value of the xattr. Therefore, introduce this new name-value lookup mode that's gated on the XFS_ATTR_PARENT namespace. This requires the introduction of new opcodes for the extended attribute update log intent items, which actually means that parent pointers (itself an INCOMPAT feature) do not depend on the LOGGED_XATTRS log incompat feature bit. To reduce collisions on the dirent names of parent pointers, introduce a new attr hash mode that is the dir2 namehash of the dirent name xor'd with the parent inode number. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: enable parent pointers xfs: drop compatibility minimum log size computations for reflink xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res xfs: add a incompat feature bit for parent pointers xfs: don't remove the attr fork when parent pointers are enabled xfs: add parent pointer ioctls xfs: split out handle management helpers a bit xfs: move handle ioctl code to xfs_handle.c xfs: pass the attr value to put_listent when possible xfs: don't return XFS_ATTR_PARENT attributes via listxattr xfs: Add parent pointers to xfs_cross_rename xfs: Add parent pointers to rename xfs: remove parent pointers in unlink xfs: add parent attributes to symlink xfs: add parent attributes to link xfs: parent pointer attribute creation xfs: create a hashname function for parent pointers xfs: extend transaction reservations for parent attributes xfs: add parent pointer validator functions xfs: Expose init_xattrs in xfs_create_tmpfile xfs: record inode generation in xattr update log intent items xfs: create attr log item opcodes and formats for parent pointers xfs: refactor xfs_is_using_logged_xattrs checks in attr item recovery xfs: allow xattr matching on name and value for parent pointers xfs: define parent pointer ondisk extended attribute format xfs: add parent pointer support to attribute code xfs: create a separate hashname function for extended attributes xfs: move xfs_attr_defer_add to xfs_attr_item.c xfs: check the flags earlier in xfs_attr_match xfs: rearrange xfs_attr_match parameters	2024-04-24 11:54:37 +05:30
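The parent pointer hash and name-value lookup described in this merge can be illustrated with a small userspace sketch. It assumes the classic XFS dirent name hash (the rotate-and-xor scheme of xfs_da_hashname, reproduced here from memory, so treat it as an assumption), and it assumes the 64-bit parent inode number is folded to 32 bits by xoring its two halves. The names pptr_attr, pptr_hash, and pptr_match are hypothetical and do not describe the on-disk format. The match helper shows why a name-only lookup is not enough: two hardlinks with the same dirent name under different parents differ only in the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t x, unsigned int s)
{
	return (x << s) | (x >> (32 - s));
}

/* Dirent name hash, four bytes at a time (assumed to match the classic XFS hash). */
static uint32_t da_hashname(const uint8_t *name, int namelen)
{
	uint32_t hash = 0;

	for (; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	/* Mix in the 0-3 trailing bytes. */
	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}

/* Parent pointer hash: dirent name hash xor'd with the parent inode
 * number, folded to 32 bits (the folding is an assumption of this sketch). */
static uint32_t pptr_hash(const char *name, uint64_t parent_ino)
{
	uint32_t h = da_hashname((const uint8_t *)name, (int)strlen(name));

	return h ^ (uint32_t)(parent_ino >> 32) ^ (uint32_t)parent_ino;
}

/* Hypothetical in-memory parent pointer: name = dirent name,
 * value = (parent inode, parent generation). */
struct pptr_attr {
	const char *name;
	uint64_t parent_ino;
	uint32_t parent_gen;
};

/* Name-value match: both the dirent name and the (ino, gen) value must agree. */
static bool pptr_match(const struct pptr_attr *a, const struct pptr_attr *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       a->parent_ino == b->parent_ino &&
	       a->parent_gen == b->parent_gen;
}
```

Mixing the parent inode into the hash spreads identically named hardlinks across different hash values, so the value comparison in the name-value match is rarely needed to break ties.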
Chandan Babu R	d7d02f750a	xfs: improve extended attribute validation [v13.4 2/9] Prior to introducing parent pointer extended attributes, let's spend some time cleaning up the attr code and strengthening the validation that it performs on attrs coming in from the disk. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZih1NgAKCRBKO3ySh0YR pojvAQD4AF7OO0MFZ8XcLqOAC8IJv/O690+cg7xEtnk5U8NviwD/Yakjfvqo4lLD N0py4DXg/OIY08gF2BZpLUOiZdSTAAg= =l8aH -----END PGP SIGNATURE----- Merge tag 'improve-attr-validation-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: improve extended attribute validation Prior to introducing parent pointer extended attributes, let's spend some time cleaning up the attr code and strengthening the validation that it performs on attrs coming in from the disk. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'improve-attr-validation-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: enforce one namespace per attribute xfs: refactor name/value iovec validation in xlog_recover_attri_commit_pass2 xfs: refactor name/length checks in xfs_attri_validate xfs: use local variables for name and value length in _attri_commit_pass2 xfs: always set args->value in xfs_attri_item_recover xfs: validate recovered name buffers when recovering xattr items xfs: use helpers to extract xattr op from opflags xfs: restructure xfs_attr_complete_op a bit xfs: check shortform attr entry flags specifically xfs: fix missing check for invalid attr flags xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2 xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf	2024-04-24 11:50:04 +05:30
Chandan Babu R	1321890a1b	xfs: shrink struct xfs_da_args [v13.4 1/9] Let's clean out some unused flags and fields from struct xfs_da_args. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZih1NgAKCRBKO3ySh0YR pr16AQDIj2NMtagLvPz/rSFcJPZ4VsGVor0zGQDK08WI79l3dAD/boFN5VpMLGnE zYE0nXT25OAxo5i97kyQ1WvKyq9uFQM= =A3+W -----END PGP SIGNATURE----- Merge tag 'shrink-dirattr-args-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC xfs: shrink struct xfs_da_args Let's clean out some unused flags and fields from struct xfs_da_args. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'shrink-dirattr-args-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: rearrange xfs_da_args a bit to use less space xfs: make attr removal an explicit operation xfs: remove xfs_da_args.attr_flags xfs: remove XFS_DA_OP_NOTIME xfs: remove XFS_DA_OP_REMOVE	2024-04-24 11:14:36 +05:30
Darrick J. Wong	5e1c7d0b29	xfs: invalidate dentries for a file before moving it to the orphanage Invalidate the cached dentries that point to the file that we're moving to lost+found before we actually move it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:19 -07:00
Darrick J. Wong	6d335233fe	xfs: exchange-range for repairs is no longer dynamic The atomic file exchange-range functionality is now a permanent filesystem feature instead of a dynamic log-incompat feature. It cannot be turned on at runtime, so we no longer need the XCHK_FSGATES flags and whatnot that supported it. Remove the flag and the enable function, and move the xfs_has_exchange_range checks to the start of the repair functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:19 -07:00
Darrick J. Wong	b44bfc0695	xfs: fix iunlock calls in xrep_adoption_trans_alloc If the transaction allocation in xrep_adoption_trans_alloc fails, we should drop only the locks that we took. In this case this is ILOCK_EXCL of both the orphanage and the file being repaired. Dropping any IOLOCK here is incorrect. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:19 -07:00
Darrick J. Wong	4ad350ac58	xfs: only iget the file once when doing vectored scrub-by-handle If a program wants us to perform a scrub on a file handle and the fd passed to ioctl() is not the file referenced in the handle, iget the file once and pass it into the scrub code. This amortizes the untrusted iget lookup over /all/ the scrubbers mentioned in the scrubv call. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime on account of avoiding repeated inobt lookups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:18 -07:00
Darrick J. Wong	c77b37584c	xfs: introduce vectored scrub mode Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored mode. The caller specifies the principal metadata object that they want to scrub (allocation group, inode, etc.) once, followed by an array of scrub types they want called on that object. The kernel runs the scrub operations and writes the output flags and errno code to the corresponding array element. A new pseudo scrub type BARRIER is introduced to force the kernel to return to userspace if any corruptions have been found when scrubbing the previous scrub types in the array. This enables userspace to schedule, for example, the sequence: 1. data fork 2. barrier 3. directory If the data fork scrub is clean, then the kernel will perform the directory scrub. If not, the barrier in 2 will exit back to userspace. The alternative would have been an interface where userspace passes a pointer to an empty buffer, and the kernel formats that with xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome was. With that the kernel would have to communicate that the buffer needed to have been at least X size, even though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough. Compared to that, this design keeps all the dependency policy and ordering logic in userspace where it already resides instead of duplicating it in the kernel. The downside of that is that it needs the barrier logic. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime due to fewer transitions across the system call boundary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:18 -07:00
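The barrier semantics of the vectored scrub mode can be modeled in a few lines. This is a hypothetical userspace sketch, not the real XFS ioctl ABI: the enum, struct, and function names are invented for illustration, the real checkers are replaced by a simulated corruption mask, and reporting -ECANCELED on the barrier element is this sketch's own convention. Each element gets its output flags and errno-style result written back, and a BARRIER returns early if any earlier element reported corruption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative scrub types only; not the real XFS_SCRUB_TYPE_* values. */
enum scrub_type { SCRUB_DATA_FORK, SCRUB_DIRECTORY, SCRUB_BARRIER };

#define OFLAG_CORRUPT 0x1u

struct scrub_vec {
	enum scrub_type type; /* input: what to check */
	uint32_t oflags;      /* output: result flags */
	int ret;              /* output: 0 or negative errno */
	int ran;              /* output: set if this element was processed */
};

/* Stand-in for a real checker: reports corruption if this type's bit is
 * set in corrupt_mask (simulation only). */
static uint32_t simulate_scrub(enum scrub_type t, unsigned int corrupt_mask)
{
	return (corrupt_mask & (1u << t)) ? OFLAG_CORRUPT : 0;
}

/* Process the vector in order, writing results into each element.
 * A BARRIER stops processing if any earlier element found corruption. */
static void run_scrubv(struct scrub_vec *v, size_t n, unsigned int corrupt_mask)
{
	uint32_t seen = 0;

	for (size_t i = 0; i < n; i++) {
		v[i].ran = 1;
		if (v[i].type == SCRUB_BARRIER) {
			if (seen & OFLAG_CORRUPT) {
				v[i].ret = -ECANCELED; /* this sketch's convention */
				return;                /* back to userspace */
			}
			continue;
		}
		v[i].oflags = simulate_scrub(v[i].type, corrupt_mask);
		v[i].ret = 0;
		seen |= v[i].oflags;
	}
}
```

The ordering policy (data fork, then barrier, then directory) stays entirely in the caller's array; the kernel-side loop only needs to know how to honor a barrier.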
Darrick J. Wong	6691753752	xfs: drop the scrub file's iolock when transaction allocation fails If the transaction allocation in the !orphanage_available case of xrep_nlinks_repair_inode fails, we need to drop the IOLOCK of the file being scrubbed before exiting. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:18 -07:00
Darrick J. Wong	b27ce0da60	xfs: use dontcache for grabbing inodes during scrub Back when I wrote commit `a03297a0ca`, I had thought that we'd be doing users a favor by only marking inodes dontcache at the end of a scrub operation, and only if there's only one reference to that inode. This was more or less true back when I_DONTCACHE was an XFS iflag and the only thing it did was change the outcome of xfs_fs_drop_inode to 1. Note: If there are dentries pointing to the inode when scrub finishes, the inode will have positive i_count and stay around in cache until dentry reclaim. But now we have d_mark_dontcache, which causes the inode and the dentries attached to it all to be marked I_DONTCACHE, which means that we drop the dentries ASAP, which drops the inode ASAP. This is bad if scrub found problems with the inode, because now it can be scheduled for inactivation, which can cause inodegc to trip on it and shut down the filesystem. Even if the inode isn't bad, this is still suboptimal because phases 3-7 each initiate inode scans. Dropping the inode immediately during phase 3 is silly because phase 5 will reload it and drop it immediately, etc. It's fine to mark the inodes dontcache, but if there have been accesses to the file that set up dentries, we should keep them. I validated this by setting up ftrace to capture xfs_iget_recycle* tracepoints and running xfs/285 for 30 seconds. With current djwong-wtf I saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per 30 seconds. Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the effect of setting I_DONTCACHE for cache misses. Remove the d_mark_dontcache call that can happen in xchk_irele. Fixes: `a03297a0ca` ("xfs: manage inode DONTCACHE status at irele time") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:18 -07:00
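The "dontcache only on cache misses" behavior can be illustrated with a toy inode cache. This is a minimal sketch under stated assumptions: sim_inode, sim_iget, and sim_irele are hypothetical names, eviction is reduced to "leave the cache when the last reference drops and the dontcache mark is set", and none of this models the real VFS or XFS inode cache. It shows that an inode already cached by a normal user access keeps its state when scrub grabs it, while an inode that scrub instantiates itself is dropped as soon as scrub lets go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Hypothetical cached inode; dontcache means "evict on last release". */
struct sim_inode {
	unsigned long ino;
	bool in_use;
	bool dontcache;
	int refcount;
};

static struct sim_inode cache[CACHE_SLOTS];

/* Look up ino, instantiating it on a miss. want_dontcache only affects a
 * newly instantiated (cache-miss) inode; a cache hit keeps its flags. */
static struct sim_inode *sim_iget(unsigned long ino, bool want_dontcache)
{
	struct sim_inode *free_slot = NULL;

	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].in_use && cache[i].ino == ino) {
			cache[i].refcount++;
			return &cache[i]; /* hit: leave dontcache alone */
		}
		if (!cache[i].in_use && !free_slot)
			free_slot = &cache[i];
	}
	if (!free_slot)
		return NULL; /* sketch: no eviction of busy slots */
	*free_slot = (struct sim_inode){ .ino = ino, .in_use = true,
					 .dontcache = want_dontcache,
					 .refcount = 1 };
	return free_slot;
}

/* Drop a reference; a dontcache inode leaves the cache at refcount 0. */
static void sim_irele(struct sim_inode *ip)
{
	assert(ip->refcount > 0);
	if (--ip->refcount == 0 && ip->dontcache)
		ip->in_use = false;
}
```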
Darrick J. Wong	3f31406aef	xfs: fix corruptions in the directory tree Repair corruptions in the directory tree itself. Cycles are broken by removing an incoming parent->child link. Multiply-owned directories are fixed by pruning the extra parent -> child links. Disconnected subtrees are reconnected to the lost and found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:17 -07:00
Darrick J. Wong	be7cf174e9	xfs: move xfs_ioc_scrub_metadata to scrub.c Move the scrub ioctl handler to scrub.c to keep the code together and to reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:17 -07:00
Darrick J. Wong	37056912d5	xfs: report directory tree corruption in the health information Report directories that are the source of corruption in the directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:17 -07:00
Darrick J. Wong	271557de7c	xfs: reduce the rate of cond_resched calls inside scrub We really don't want to call cond_resched every single time we go through a loop in scrub -- there may be billions of records, and probing into the scheduler itself has overhead. Reduce this overhead by only calling cond_resched 10x per second; and add a counter so that we only check jiffies once every 1000 records or so. Surprisingly, this reduces scrub-only fstests runtime by about 2%. I used the bmapinflate xfs_db command to produce a billion-extent file and this stupid gadget reduced the scrub runtime by about 4%. From a stupid microbenchmark of calling these things 1 billion times, I estimate that cond_resched costs about 5.5ns per call; jiffies costs about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per call. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-04-23 16:55:17 -07:00
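The two-level rate limiting described in this commit (read the clock only every 1000 records, yield at most 10 times per second) can be modeled in userspace. This is a sketch, assuming a simulated millisecond clock stands in for jiffies and that the caller performs the actual yield (cond_resched in the kernel) whenever the helper returns true; the names sim_clock, relax_state, relax_init, and relax_should_yield are hypothetical. The simulated clock also counts its reads so the effect of the record counter is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RELAX_CHECK_INTERVAL 1000u /* records between clock reads */
#define RELAX_YIELDS_PER_SEC 10u
#define RELAX_PERIOD_MS (1000u / RELAX_YIELDS_PER_SEC)

/* Simulated monotonic clock standing in for jiffies; counts its reads. */
struct sim_clock {
	uint64_t now_ms;
	unsigned long reads;
};

static uint64_t sim_clock_read(struct sim_clock *c)
{
	c->reads++;
	return c->now_ms;
}

struct relax_state {
	unsigned int budget;    /* records left before the next clock read */
	uint64_t next_yield_ms; /* earliest time a yield is allowed */
};

static void relax_init(struct relax_state *r, struct sim_clock *c)
{
	r->budget = RELAX_CHECK_INTERVAL;
	r->next_yield_ms = sim_clock_read(c) + RELAX_PERIOD_MS;
}

/* Called once per record; returns true when the caller should yield. */
static bool relax_should_yield(struct relax_state *r, struct sim_clock *c)
{
	assert(r->budget > 0);
	if (--r->budget > 0)
		return false; /* common path: no clock read */
	r->budget = RELAX_CHECK_INTERVAL;
	uint64_t now = sim_clock_read(c);
	if (now < r->next_yield_ms)
		return false;
	r->next_yield_ms = now + RELAX_PERIOD_MS;
	return true;
}
```

With this split, the per-record cost is a decrement and a branch; the clock is consulted once per 1000 records, and the scheduler is entered at most once per 100 ms of simulated time.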

1266002 Commits