linux/fs
Filipe Manana 5afd80c393 btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range
commit f0bfa76a11 upstream.

When doing a direct IO write against a file range that either has
preallocated extents in that range or has regular extents and the file
has the NOCOW attribute set, the write fails with -ENOSPC when all of
the following conditions are met:

1) There are no data blocks groups with enough free space matching
   the size of the write;

2) There's not enough unallocated space for allocating a new data block
   group;

3) The extents in the target file range are not shared, neither through
   snapshots nor through reflinks.

This is wrong because a NOCOW write can be done in such case, and in fact
it's possible to do it using a buffered IO write, since when failing to
allocate data space, the buffered IO path checks if a NOCOW write is
possible.

The failure in direct IO write path comes from the fact that early on,
at btrfs_dio_iomap_begin(), we try to allocate data space for the write
and if it that fails we return the error and stop - we never check if we
can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check
if we can do a NOCOW write into the range, or a subset of the range, and
then release the previously reserved data space.

Fix this by doing the data reservation only if needed, when we must COW,
at btrfs_get_blocks_direct_write() instead of doing it at
btrfs_dio_iomap_begin(). This also simplifies a bit the logic and removes
the inneficiency of doing unnecessary data reservations.

The following example test script reproduces the problem:

  $ cat dio-nocow-enospc.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  # Use a small fixed size (1G) filesystem so that it's quick to fill
  # it up.
  # Make sure the mixed block groups feature is not enabled because we
  # later want to not have more space available for allocating data
  # extents but still have enough metadata space free for the file writes.
  mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV
  mount $DEV $MNT

  # Create our test file with the NOCOW attribute set.
  touch $MNT/foobar
  chattr +C $MNT/foobar

  # Now fill in all unallocated space with data for our test file.
  # This will allocate a data block group that will be full and leave
  # no (or a very small amount of) unallocated space in the device, so
  # that it will not be possible to allocate a new block group later.
  echo
  echo "Creating test file with initial data..."
  xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar

  # Now try a direct IO write against file range [0, 10M[.
  # This should succeed since this is a NOCOW file and an extent for the
  # range was previously allocated.
  echo
  echo "Trying direct IO write over allocated space..."
  xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar

  umount $MNT

When running the test:

  $ ./dio-nocow-enospc.sh
  (...)

  Creating test file with initial data...
  wrote 943718400/943718400 bytes at offset 0
  900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)

  Trying direct IO write over allocated space...
  pwrite: No space left on device

A test case for fstests will follow, testing both this direct IO write
scenario as well as the buffered IO write scenario to make it less likely
to get future regressions on the buffered IO case.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:12:46 +01:00
..
9p Revert "fs/9p: search open fids first" 2022-02-08 18:34:04 +01:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
afs afs: Fix mmap 2021-12-22 09:32:45 +01:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range 2022-03-08 19:12:46 +01:00
cachefiles cachefiles: Change %p in format strings to something else 2021-08-27 13:34:02 +01:00
ceph ceph: put the requests/sessions when it fails to alloc memory 2022-02-01 17:27:14 +01:00
cifs cifs: fix confusing unneeded warning message on smb2.1 and earlier 2022-03-08 19:12:41 +01:00
coda
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-03-02 11:48:02 +01:00
cramfs
crypto fscrypt: allow 256-bit master keys with AES-256-XTS 2021-11-18 19:16:11 +01:00
debugfs debugfs: lockdown: Allow reading debugfs files that are not world readable 2022-01-27 11:03:55 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:27:01 +01:00
dlm fs: dlm: filter user dlm messages for kernel locks 2022-01-27 11:04:23 +01:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs
efs
erofs erofs: fix deadlock when shrink erofs slab 2021-12-01 09:04:50 +01:00
exfat exfat: fix i_blocks for files truncated over 4 GiB 2022-03-08 19:12:32 +01:00
exportfs
ext2 ext2: fix sleeping in atomic bugs on error 2021-09-22 13:05:23 +02:00
ext4 ext4: fast commit may miss file actions 2022-03-08 19:12:32 +01:00
f2fs f2fs: fix to check available space of CP area correctly in update_ckpt_flags() 2022-01-27 11:05:30 +01:00
fat linux-kselftest-kunit-5.15-rc1 2021-09-02 12:32:12 -07:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: Pass correct lend value to filemap_write_and_wait_range() 2022-01-27 11:05:08 +01:00
gfs2 gfs2: Fix gfs2_release for non-writers regression 2022-02-16 12:56:18 +01:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-15 10:13:49 -07:00
hfsplus hfsplus: report create_date to kstat.btime 2021-07-01 11:06:06 -07:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs hpfs: use iomap_fiemap to implement ->fiemap 2021-07-27 11:00:36 +02:00
hugetlbfs hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() 2022-03-08 19:12:38 +01:00
iomap iomap: Fix inline extent handling in iomap_readpage 2021-12-01 09:04:44 +01:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-11-12 15:05:50 +01:00
jbd2 ext4: fast commit may not fallback for ineligible commit 2022-03-08 19:12:32 +01:00
jffs2 jffs2: GC deadlock reading a page that is used in jffs2_write_begin() 2022-01-27 11:04:49 +01:00
jfs JFS: fix memleak in jfs_mount 2021-11-18 19:16:48 +01:00
kernfs kernfs: don't create a negative dentry if inactive node exists 2021-10-04 10:27:18 +02:00
ksmbd ksmbd: don't align last entry offset in smb2 query directory 2022-02-23 12:03:18 +01:00
lockd lockd: fix failure to cleanup client locks 2022-02-05 12:38:57 +01:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: fix parameter of cleanup() 2021-12-29 12:28:59 +01:00
nfs NFS: Do not report writeback errors in nfs_getattr() 2022-02-23 12:03:15 +01:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd nfsd: fix crash on COPY_NOTIFY with special stateid 2022-03-08 19:12:36 +01:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
nls
notify fanotify: Fix stale file descriptor in copy_event_to_user() 2022-02-05 12:38:59 +01:00
ntfs Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:30:04 -07:00
ntfs3 Fixed xfstests generic/016 generic/021 generic/022 generic/041 generic/274 generic/423, 2021-10-15 09:58:11 -04:00
ocfs2 ocfs2: fix a deadlock when commit trans 2022-02-01 17:27:05 +01:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2022-01-20 09:13:13 +01:00
overlayfs ovl: fix NULL pointer dereference in copy up warning 2022-02-05 12:38:59 +01:00
proc fs/proc: task_mmu.c: don't read mapcount for migration entry 2022-02-23 12:03:02 +01:00
pstore pstore/blk: Use "%lu" to format unsigned long 2021-11-25 09:48:42 +01:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: make dquot_quota_sync return errors from ->sync_fs 2022-02-23 12:03:06 +01:00
ramfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
reiserfs Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-14 10:57:12 +01:00
squashfs squashfs: use bvec_virt 2021-08-16 10:50:32 -06:00
sysfs sysfs: Allow deferred execution of iomem_get_mapping() 2021-08-06 13:05:28 +02:00
sysv mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
tracefs tracefs: Set the group ownership in apply_options() not parse_options() 2022-03-02 11:48:05 +01:00
ubifs ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers 2022-01-27 11:05:07 +01:00
udf udf: Fix NULL ptr deref when converting from inline format 2022-02-01 17:27:00 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate 2022-01-11 15:35:16 +01:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-22 09:32:48 +01:00
aio.c aio: Fix incorrect usage of eventfd_signal_allowed() 2021-12-14 10:57:22 +01:00
anon_inodes.c
attr.c fs: handle circular mappings correctly 2021-11-25 09:48:46 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf_fdpic.c binfmt: remove in-tree usage of MAP_DENYWRITE 2021-09-03 18:42:01 +02:00
binfmt_elf.c elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings 2021-10-03 14:02:58 -07:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c
binfmt_script.c
buffer.c mm: fs: invalidate bh_lrus for only cold path 2021-09-24 16:13:35 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: fix memleak in dump_vma_snapshot() 2021-09-08 11:50:27 -07:00
d_path.c d_path: make 'prepend()' fill up the buffer exactly on overflow 2021-09-02 10:07:29 -07:00
dax.c New code for 5.15: 2021-08-31 11:13:35 -07:00
dcache.c
direct-io.c
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c ARM development updates for 5.15: 2021-09-09 13:25:49 -07:00
exec.c signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) 2021-11-25 09:49:06 +01:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file_table.c
file.c fget: clarify and improve __fget_files() implementation 2022-01-16 09:12:42 +01:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-20 09:13:14 +01:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fsopen.c
init.c
inode.c fs: export an inode_update_time helper 2021-11-25 09:49:08 +01:00
internal.h block: move fs/block_dev.c to block/bdev.c 2021-09-07 08:39:40 -06:00
io_uring.c io_uring: fix no lock protection for ctx->cq_extra 2022-03-08 19:12:33 +01:00
io-wq.c io-wq: drop wqe lock before creating new worker 2021-12-22 09:32:51 +01:00
io-wq.h io-wq: provide a way to limit max number of workers 2021-08-29 07:55:55 -06:00
ioctl.c New code for 5.15: 2021-08-31 11:06:32 -07:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
locks.c Revert "memcg: enable accounting for file lock caches" 2021-09-07 11:21:48 -07:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
mbcache.c
mount.h
mpage.c
namei.c fsnotify: invalidate dcache before IN_DELETE event 2022-02-01 17:27:15 +01:00
namespace.c fs/mount_setattr: always cleanup mount_kattr 2022-01-05 12:42:39 +01:00
no-block.c
nsfs.c
open.c mm, thp: fix incorrect unmap behavior for private pages 2021-11-18 19:17:17 +01:00
pipe.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
pnode.c
pnode.h
posix_acl.c ovl: enable RCU'd ->get_acl() 2021-08-18 22:08:24 +02:00
proc_namespace.c
read_write.c fs: clean up after mandatory file locking support removal 2021-08-24 07:52:45 -04:00
readdir.c
remap_range.c fs: remove mandatory file locking support 2021-08-23 06:15:36 -04:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-29 10:58:25 +01:00
seq_file.c seq_file: disallow extremely large seq buffer allocations 2021-07-19 17:18:48 -07:00
signalfd.c signalfd: use wake_up_pollfree() 2021-12-14 10:57:15 +01:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-02-23 12:03:05 +01:00
sync.c
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-18 20:22:02 -10:00
utimes.c
xattr.c