Commit Graph

32 Commits

Author SHA1 Message Date
Linus Torvalds
997f9640c9 fsverity updates for 7.0
fsverity cleanups, speedup, and memory usage optimization from
 Christoph Hellwig:
 
 - Move some logic into common code
 
 - Fix btrfs to reject truncates of fsverity files
 
 - Improve the readahead implementation
 
 - Store each inode's fsverity_info in a hash table instead of using a
   pointer in the filesystem-specific part of the inode.
 
   This optimizes for memory usage in the usual case where most files
   don't have fsverity enabled.
 
 - Look up the fsverity_info fewer times during verification, to
   amortize the hash table overhead
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaY0nZhQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK/AVAP9wSLEYsG3dqnNIHjIvLeK+9NC3Ni4d
 m+fvT1JfuideOwEA9r2EfztusLU5iyqWJlHyxekibXItUDgYGltaYb7eXAU=
 =a+To
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity updates from Eric Biggers:
 "fsverity cleanups, speedup, and memory usage optimization from
  Christoph Hellwig:

   - Move some logic into common code

   - Fix btrfs to reject truncates of fsverity files

   - Improve the readahead implementation

   - Store each inode's fsverity_info in a hash table instead of using a
     pointer in the filesystem-specific part of the inode.

     This optimizes for memory usage in the usual case where most files
     don't have fsverity enabled.

   - Look up the fsverity_info fewer times during verification, to
     amortize the hash table overhead"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: remove inode from fsverity_verification_ctx
  fsverity: use a hashtable to find the fsverity_info
  btrfs: consolidate fsverity_info lookup
  f2fs: consolidate fsverity_info lookup
  ext4: consolidate fsverity_info lookup
  fs: consolidate fsverity_info lookup in buffer.c
  fsverity: push out fsverity_info lookup
  fsverity: deconstify the inode pointer in struct fsverity_info
  fsverity: kick off hash readahead at data I/O submission time
  ext4: move ->read_folio and ->readahead to readpage.c
  readahead: push invalidate_lock out of page_cache_ra_unbounded
  fsverity: don't issue readahead for non-ENOENT errors from __filemap_get_folio
  fsverity: start consolidating pagecache code
  fsverity: pass struct file to ->write_merkle_tree_block
  f2fs: don't build the fsverity work handler for !CONFIG_FS_VERITY
  ext4: don't build the fsverity work handler for !CONFIG_FS_VERITY
  fs,fsverity: clear out fsverity_info from common code
  fs,fsverity: reject size changes on fsverity files in setattr_prepare
2026-02-12 10:41:34 -08:00
Christoph Hellwig
f77f281b61 fsverity: use a hashtable to find the fsverity_info
Use the kernel's resizable hash table (rhashtable) to find the
fsverity_info.  This way file systems that want to support fsverity don't
have to bloat every inode in the system with an extra pointer.  The
trade-off is that looking up the fsverity_info is a bit more expensive
now, but the main operations are still dominated by I/O and hashing
overhead.

The rhashtable implementations requires no external synchronization, and
the _fast versions of the APIs provide the RCU critical sections required
by the implementation.  Because struct fsverity_info is only removed on
inode eviction and does not contain a reference count, there is no need
for an extended critical section to grab a reference or validate the
object state.  The file open path uses rhashtable_lookup_get_insert_fast,
which can either find an existing object for the hash key or insert a
new one in a single atomic operation, so that concurrent opens never
instantiate duplicate fsverity_info structure.  FS_IOC_ENABLE_VERITY must
already be synchronized by a combination of i_rwsem and file system flags
and uses rhashtable_lookup_insert_fast, which errors out on an existing
object for the hash key as an additional safety check.

Because insertion into the hash table now happens before S_VERITY is set,
fsverity just becomes a barrier and a flag check and doesn't have to look
up the fsverity_info at all, so there is only a single lookup per
->read_folio or ->readahead invocation.  For btrfs there is an additional
one for each bio completion, while for ext4 and f2fs the fsverity_info
is stored in the per-I/O context and reused for the completion workqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://lore.kernel.org/r/20260202060754.270269-12-hch@lst.de
[EB: folded in fix for missing fsverity_free_info()]
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-04 11:31:54 -08:00
Christoph Hellwig
f1a6cf44b3 fsverity: kick off hash readahead at data I/O submission time
Currently all reads of the fsverity hashes are kicked off from the data
I/O completion handler, leading to needlessly dependent I/O.  This is
worked around a bit by performing readahead on the level 0 nodes, but
still fairly ineffective.

Switch to a model where the ->read_folio and ->readahead methods instead
kick off explicit readahead of the fsverity hashed so they are usually
available at I/O completion time.

For 64k sequential reads on my test VM this improves read performance
from 2.4GB/s - 2.6GB/s to 3.5GB/s - 3.9GB/s.  The improvements for
random reads are likely to be even bigger.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Link: https://lore.kernel.org/r/20260202060754.270269-5-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02 17:15:26 -08:00
Christoph Hellwig
821ddd25fb fsverity: start consolidating pagecache code
ext4 and f2fs are largely using the same code to read a page full
of Merkle tree blocks from the page cache, and the upcoming xfs
fsverity support would add another copy.

Move the ext4 code to fs/verity/ and use it in f2fs as well.  For f2fs
this removes the previous f2fs-specific error injection, but otherwise
the behavior remains unchanged.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://lore.kernel.org/r/20260128152630.627409-7-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-29 09:39:41 -08:00
Christoph Hellwig
ac09a30900 fsverity: pass struct file to ->write_merkle_tree_block
This will make an iomap implementation of the method easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Link: https://lore.kernel.org/r/20260128152630.627409-6-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-29 09:39:41 -08:00
Li Chen
16d43b9748 ext4: mark fs-verity enable fast-commit ineligible
Fast commits only log operations that have dedicated replay support.
Enabling fs-verity builds a Merkle tree and updates inode and orphan
state in ways that are not described by the fast commit replay tags.
In practice these operations are rare and usually followed by further
updates, but mixing them into a fast commit makes the overall
semantics harder to reason about and risks replay gaps if new call
sites appear.

Teach ext4 to mark the filesystem fast-commit ineligible when
ext4_end_enable_verity() starts its journal transaction.
This forces that transaction to fall back to a full commit, ensuring
that the fs-verity enable changes are captured by the normal journal
rather than partially encoded in fast commit TLVs.
This change should not affect common workloads but makes fs-verity
enable safer and easier to reason about under fast commit.

Testing:
1. prepare:
    dd if=/dev/zero of=/root/fc_verity.img bs=1M count=0 seek=128
    mkfs.ext4 -O fast_commit,verity -F /root/fc_verity.img
    mkdir -p /mnt/fc_verity && mount -t ext4 -o loop /root/fc_verity.img /mnt/fc_verity
2. Enabled fs-verity on a file and verified reason accounting:
    echo "data" > /mnt/fc_verity/verityfile
    /root/enable_verity /mnt/fc_verity/verityfile
    sync
    tail -n 1 /proc/fs/ext4/loop0/fc_info
    "fs-verity enable":     1
3. Enabled fs-verity on a second file, fsynced it, and checked that the
   ineligible commit counter is updated too:
    echo "data2" > /mnt/fc_verity/verityfile2
    /root/enable_verity /mnt/fc_verity/verityfile2
    /root/fsync_file /mnt/fc_verity/verityfile2
    sync
    /proc/fs/ext4/loop0/fc_info shows "fs-verity enable" incremented and
    fc stats ineligible increased accordingly.

Signed-off-by: Li Chen <me@linux.beauty>
Link: https://patch.msgid.link/20251211115146.897420-3-me@linux.beauty
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-01-19 19:26:35 -05:00
Baokun Li
125d1f6a5a ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion
No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-10-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-28 22:35:27 -05:00
Eric Biggers
c9fff804b5
ext4: move verity info pointer to fs-specific part of inode
Move the fsverity_info pointer into the filesystem-specific part of the
inode by adding the field ext4_inode_info::i_verity_info and configuring
fsverity_operations::inode_info_offs accordingly.

This is a prerequisite for a later commit that removes
inode::i_verity_info, saving memory and improving cache efficiency on
filesystems that don't support fsverity.

Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/20250810075706.172910-10-ebiggers@kernel.org
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21 13:58:08 +02:00
Matthew Wilcox (Oracle)
1da86618bd
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page
of a folio (support for working on an entire folio can come later).
Removes a lot of folio->page->folio conversions.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:33:21 +02:00
Matthew Wilcox (Oracle)
a225800f32
fs: Convert aops->write_end to take a folio
Most callers have a folio, and most implementations operate on a folio,
so remove the conversion from folio->page->folio to fit through this
interface.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:32:02 +02:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Matthew Wilcox
e9ebecf266 ext4: Use a folio in ext4_read_merkle_tree_page
This is an implementation of fsverity_operations read_merkle_tree_page,
so it must still return the precise page asked for, but we can use the
folio API to reduce the number of conversions between folios & pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-30-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
b23fb76278 ext4: Convert pagecache_read() to use a folio
Use the folio API and support folios of arbitrary sizes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-29-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Eric Biggers
72ea15f0dd fsverity: pass pos and size to ->write_merkle_tree_block
fsverity_operations::write_merkle_tree_block is passed the index of the
block to write and the log base 2 of the block size.  However, all
implementations of it use these parameters only to calculate the
position and the size of the block, in bytes.

Therefore, make ->write_merkle_tree_block take 'pos' and 'size'
parameters instead of 'index' and 'log_blocksize'.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20221214224304.145712-5-ebiggers@kernel.org
2023-01-01 15:46:48 -08:00
Alexander Potapenko
956510c0c7 fs: ext4: initialize fsdata in pagecache_write()
When aops->write_begin() does not initialize fsdata, KMSAN reports
an error passing the latter to aops->write_end().

Fix this by unconditionally initializing fsdata.

Cc: Eric Biggers <ebiggers@kernel.org>
Fixes: c93d8f8858 ("ext4: add basic fs-verity support")
Reported-by: syzbot+9767be679ef5016b6082@syzkaller.appspotmail.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221121112134.407362-1-glider@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-12-08 21:49:25 -05:00
Linus Torvalds
5e714bf171 - Alistair Popple has a series which addresses a race which causes page
refcounting errors in ZONE_DEVICE pages.
 
 - Peter Xu fixes some userfaultfd test harness instability.
 
 - Various other patches in MM, mainly fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0j6igAKCRDdBJ7gKXxA
 jnGxAP99bV39ZtOsoY4OHdZlWU16BUjKuf/cb3bZlC2G849vEwD+OKlij86SG20j
 MGJQ6TfULJ8f1dnQDd6wvDfl3FMl7Qc=
 =tbdp
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

 - fix userfaultfd test harness instability (Peter Xu)

 - various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...
2022-10-14 12:28:43 -07:00
Matthew Wilcox (Oracle)
4fa0e3ff21 ext4,f2fs: fix readahead of verity data
The recent change of page_cache_ra_unbounded() arguments was buggy in the
two callers, causing us to readahead the wrong pages.  Move the definition
of ractl down to after the index is set correctly.  This affected
performance on configurations that use fs-verity.

Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org
Fixes: 73bb49da50 ("mm/readahead: make page_cache_ra_unbounded take a readahead_control")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jintao Yin <nicememory@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:48 -07:00
Ye Bin
7ff5fddadd ext4: factor out ext4_free_ext_path()
Factor out ext4_free_ext_path() to free extent path. As after previous patch
'ext4_ext_drop_refs()' is only used in 'extents.c', so make it static.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220924021211.3831551-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Matthew Wilcox (Oracle)
1b0aa4449c ext4: Call aops write_begin() and write_end() directly
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08 14:45:56 -04:00
Linus Torvalds
9f67672a81 New features for ext4 this cycle include support for encrypted
casefold, ensure that deleted file names are cleared in directory
 blocks by zeroing directory entries when they are unlinked or moved as
 part of a hash tree node split.  We also improve the block allocator's
 performance on a freshly mounted file system by prefetching block
 bitmaps.
 
 There are also the usual cleanups and bug fixes, including fixing a
 page cache invalidation race when there is mixed buffered and direct
 I/O and the block size is less than page size, and allow the dax flag
 to be set and cleared on inline directories.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmCLei4ACgkQ8vlZVpUN
 gaPZkgf/VH08xjMf3VthC+BpvVmChQXfV4yjigHbO2pmPyYWZhyJzkEGCQD8u2eB
 b7ShW+B1NCifcTU34xAkKHwEtakzzEv3WIMrT1oZNWrpfo8tt850EkwQggaGGDpd
 /HnP1/wLtziJ5hE6DwutmX7qB4VFghVj898MjDrEPSOBqItOjWps9mn/JWL7SHyI
 Dqzhf5XZTYPaXWuJmSmKw3q8O70JDHnZe/rRWlfX1jLI5KDtqp71Nw1B+gszUB66
 IUdncyZKvInsyjYhkbCQ8U6WFih82MrbKeuGYDp/RFvg5eMELEYkwT9j0ofuDHq8
 zn62sAlbOXv1DiqkPDHKVm9GkHx8/g==
 =UpnH
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "New features for ext4 this cycle include support for encrypted
  casefold, ensure that deleted file names are cleared in directory
  blocks by zeroing directory entries when they are unlinked or moved as
  part of a hash tree node split. We also improve the block allocator's
  performance on a freshly mounted file system by prefetching block
  bitmaps.

  There are also the usual cleanups and bug fixes, including fixing a
  page cache invalidation race when there is mixed buffered and direct
  I/O and the block size is less than page size, and allow the dax flag
  to be set and cleared on inline directories"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  ext4: wipe ext4_dir_entry2 upon file deletion
  ext4: Fix occasional generic/418 failure
  fs: fix reporting supported extra file attributes for statx()
  ext4: allow the dax flag to be set and cleared on inline directories
  ext4: fix debug format string warning
  ext4: fix trailing whitespace
  ext4: fix various seppling typos
  ext4: fix error return code in ext4_fc_perform_commit()
  ext4: annotate data race in jbd2_journal_dirty_metadata()
  ext4: annotate data race in start_this_handle()
  ext4: fix ext4_error_err save negative errno into superblock
  ext4: fix error code in ext4_commit_super
  ext4: always panic when errors=panic is specified
  ext4: delete redundant uptodate check for buffer
  ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
  ext4: make prefetch_block_bitmaps default
  ext4: add proc files to monitor new structures
  ext4: improve cr 0 / cr 1 group scanning
  ext4: add MB_NUM_ORDERS macro
  ext4: add mballoc stats proc file
  ...
2021-04-30 15:35:30 -07:00
Matthew Wilcox (Oracle)
fcd9ae4f7f mm/filemap: Pass the file_ra_state in the ractl
For readahead_expand(), we need to modify the file ra_state, so pass it
down by adding it to the ractl.  We have to do this because it's not always
the same as f_ra in the struct file that is already being passed.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/20210407201857.3582797-2-willy@infradead.org/
Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 09:25:00 +01:00
Chaitanya Kulkarni
bd256fda92 ext4: use memcpy_to_page() in pagecache_write()
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210207190425.38107-7-chaitanya.kulkarni@wdc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25 10:19:48 -04:00
Chaitanya Kulkarni
4d93874b9e ext4: use memcpy_from_page() in pagecache_read()
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210207190425.38107-6-chaitanya.kulkarni@wdc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-25 10:19:48 -04:00
Eric Biggers
f053cf7aa6 ext4: fix error handling in ext4_end_enable_verity()
ext4 didn't properly clean up if verity failed to be enabled on a file:

- It left verity metadata (pages past EOF) in the page cache, which
  would be exposed to userspace if the file was later extended.

- It didn't truncate the verity metadata at all (either from cache or
  from disk) if an error occurred while setting the verity bit.

Fix these bugs by adding a call to truncate_inode_pages() and ensuring
that we truncate the verity metadata (both from cache and from disk) in
all error paths.  Also rework the code to cleanly separate the success
path from the error paths, which makes it much easier to understand.

Reported-by: Yunlei He <heyunlei@hihonor.com>
Fixes: c93d8f8858 ("ext4: add basic fs-verity support")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210302200420.137977-2-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-03-11 10:38:50 -05:00
Matthew Wilcox (Oracle)
73bb49da50 mm/readahead: make page_cache_ra_unbounded take a readahead_control
Define it in the callers instead of in page_cache_ra_unbounded().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
Linus Torvalds
3be20b6fc1 This is the second round of ext4 commits for 5.8 merge window. It
includes the per-inode DAX support, which was dependant on the DAX
 infrastructure which came in via the XFS tree, and a number of
 regression and bug fixes; most notably the "BUG: using
 smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
 by syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7mgCcACgkQ8vlZVpUN
 gaPftwf8C4w/7SG+CYLdwg0d2u9TKk77yDuWaioFHOcMSjZvG4TCSgtMhZxQnyty
 9t4yqacILx12pCj/mZnrZp5BOSn9O2ZbuDoXNKNrFXU0BF+CsbnhvJvrrh1j/MUa
 PPtcqyGFdOLSDvHSD9xPVT76juwh79aR8vB7qnQXaEO5wcLodZWoqBEFSKCl6Bo8
 hjXs1EXidusKsoarQxW6mEITmnhU2S2fuCVDgVcoM/LmKwzbgqvlWrentq9u8qLH
 W+XbjWgUtCM1byeDZWqe5FYyyJ8x+dTv7H5an3KR92EN6hKo5AOvzA0I41pZscq/
 bJ9p2THDxJQX4rJBevGAS5mZ6hTkRw==
 =z6eO
 -----END PGP SIGNATURE-----

Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull more ext4 updates from Ted Ts'o:
 "This is the second round of ext4 commits for 5.8 merge window [1].

  It includes the per-inode DAX support, which was dependant on the DAX
  infrastructure which came in via the XFS tree, and a number of
  regression and bug fixes; most notably the "BUG: using
  smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
  by syzkaller"

[1] The pull request actually came in 15 minutes after I had tagged the
    rc1 release. Tssk, tssk, late..   - Linus

* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
  ext4: support xattr gnu.* namespace for the Hurd
  ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
  ext4: avoid utf8_strncasecmp() with unstable name
  ext4: stop overwrite the errcode in ext4_setup_super
  ext4: fix partial cluster initialization when splitting extent
  ext4: avoid race conditions when remounting with options that change dax
  Documentation/dax: Update DAX enablement for ext4
  fs/ext4: Introduce DAX inode flag
  fs/ext4: Remove jflag variable
  fs/ext4: Make DAX mount option a tri-state
  fs/ext4: Only change S_DAX on inode load
  fs/ext4: Update ext4_should_use_dax()
  fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
  fs/ext4: Disallow verity if inode is DAX
  fs/ext4: Narrow scope of DAX check in setflags
2020-06-15 09:32:10 -07:00
Matthew Wilcox (Oracle)
2c684234d3 mm: add page_cache_readahead_unbounded
ext4 and f2fs have duplicated the guts of the readahead code so they can
read past i_size.  Instead, separate out the guts of the readahead code
so they can call it directly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:06 -07:00
Ira Weiny
b383a73f2b fs/ext4: Introduce DAX inode flag
Add a flag ([EXT4|FS]_DAX_FL) to preserve FS_XFLAG_DAX in the ext4
inode.

Set the flag to be user visible and changeable.  Set the flag to be
inherited.  Allow applications to change the flag at any time except if
it conflicts with the set of mutually exclusive flags (Currently VERITY,
ENCRYPT, JOURNAL_DATA).

Furthermore, restrict setting any of the exclusive flags if DAX is set.

While conceptually possible, we do not allow setting EXT4_DAX_FL while
at the same time clearing exclusion flags (or vice versa) for 2 reasons:

	1) The DAX flag does not take effect immediately which
	   introduces quite a bit of complexity
	2) There is no clear use case for being this flexible

Finally, on regular files, flag the inode to not be cached to facilitate
changing S_DAX on the next creation of the inode.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-9-ira.weiny@intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28 22:09:47 -04:00
Ira Weiny
043546e46d fs/ext4: Only change S_DAX on inode load
To prevent complications with in memory inodes we only set S_DAX on
inode load.  FS_XFLAG_DAX can be changed at any time and S_DAX will
change after inode eviction and reload.

Add init bool to ext4_set_inode_flags() to indicate if the inode is
being newly initialized.

Assert that S_DAX is not set on an inode which is just being loaded.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-6-ira.weiny@intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28 22:09:47 -04:00
Ira Weiny
6c0d077ff8 fs/ext4: Disallow verity if inode is DAX
Verity and DAX are incompatible.  Changing the DAX mode due to a verity
flag change is wrong without a corresponding address_space_operations
update.

Make the 2 options mutually exclusive by returning an error if DAX was
set first.

(Setting DAX is already disabled if Verity is set first.)

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Link: https://lore.kernel.org/r/20200528150003.828793-3-ira.weiny@intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-28 22:09:47 -04:00
Eric Biggers
fd39073dba fs-verity: implement readahead of Merkle tree pages
When fs-verity verifies data pages, currently it reads each Merkle tree
page synchronously using read_mapping_page().

Therefore, when the Merkle tree pages aren't already cached, fs-verity
causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
that the Merkle tree uses SHA-256 and 4 KiB blocks).  This results in
more I/O requests and performance loss than is strictly necessary.

Therefore, implement readahead of the Merkle tree pages.

For simplicity, we take advantage of the fact that the kernel already
does readahead of the file's *data*, just like it does for any other
file.  Due to this, we don't really need a separate readahead state
(struct file_ra_state) just for the Merkle tree, but rather we just need
to piggy-back on the existing data readahead requests.

We also only really need to bother with the first level of the Merkle
tree, since the usual fan-out factor is 128, so normally over 99% of
Merkle tree I/O requests are for the first level.

Therefore, make fsverity_verify_bio() enable readahead of the first
Merkle tree level, for up to 1/4 the number of pages in the bio, when it
sees that the REQ_RAHEAD flag is set on the bio.  The readahead size is
then passed down to ->read_merkle_tree_page() for the filesystem to
(optionally) implement if it sees that the requested page is uncached.

While we're at it, also make build_merkle_tree_level() set the Merkle
tree readahead size, since it's easy to do there.

However, for now don't set the readahead size in fsverity_verify_page(),
since currently it's only used to verify holes on ext4 and f2fs, and it
would need parameters added to know how much to read ahead.

This patch significantly improves fs-verity sequential read performance.
Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:

    On an ARM64 phone (using sha256-ce):
        Before: 217 MB/s
        After: 263 MB/s
        (compare to sha256sum of non-verity file: 357 MB/s)

    In an x86_64 VM (using sha256-avx2):
        Before: 173 MB/s
        After: 215 MB/s
        (compare to sha256sum of non-verity file: 223 MB/s)

Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-14 13:27:32 -08:00
Eric Biggers
c93d8f8858 ext4: add basic fs-verity support
Add most of fs-verity support to ext4.  fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files.  It uses a dm-verity like mechanism at the file
level: a Merkle tree is used to verify any block in the file in
log(filesize) time.  It is implemented mainly by helper functions in
fs/verity/.  See Documentation/filesystems/fsverity.rst for the full
documentation.

This commit adds all of ext4 fs-verity support except for the actual
data verification, including:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
  inode and reading/writing the verity metadata.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
  writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size.  This approach works because (a) verity files are readonly, and
(b) pages fully beyond i_size aren't visible to userspace but can be
read/written internally by ext4 with only some relatively small changes
to ext4.  This approach avoids having to depend on the EA_INODE feature
and on rearchitecturing ext4's xattr support to support paging
multi-gigabyte xattrs into memory, and to support encrypting xattrs.
Note that the verity metadata *must* be encrypted when the file is,
since it contains hashes of the plaintext data.

This patch incorporates work by Theodore Ts'o and Chandan Rajendra.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-12 19:33:50 -07:00