Commit Graph

232 Commits

Author SHA1 Message Date
Keith Busch
33eded2931 dm: provide helper to set stacked limits
There are multiple device mappers that set up their stacking limits
exactly the same for the logical, physical and minimum IO queue limits.
Provide a helper for it.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-03-27 22:19:17 +01:00
Keith Busch
cbc1532d2b dm-integrity: always set the io hints
Don't depend on the defaults to be what is desired if the integrity
device was set up with 512b sector size. Always set the queue limits to
be at least what the device mapper wants.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-03-27 22:19:08 +01:00
Keith Busch
6ebf3b6c6f dm-integrity: fix mismatched queue limits
A user can integritysetup a device with a backing device using a 4k
logical block size, but request the dm device use 1k or 2k. This
mismatch creates an inconsistency such that the dm device would report
limits for IO that it can't actually execute. Fix this by using the
backing device's limits if they are larger.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-03-27 22:18:56 +01:00
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Mikulas Patocka
118ba36e44 dm-integrity: fix recalculation in bitmap mode
There's a logic quirk in the handling of suspend in the bitmap mode:

This is the sequence of calls if we are reloading a dm-integrity table:
* dm_integrity_ctr reads a superblock with the flag SB_FLAG_DIRTY_BITMAP
  set.
* dm_integrity_postsuspend initializes a journal and clears the flag
  SB_FLAG_DIRTY_BITMAP.
* dm_integrity_resume sees the superblock with SB_FLAG_DIRTY_BITMAP set -
  thus it interprets the journal as if it were a bitmap.

This quirk causes recalculation problem if the user increases the size of
the device in the bitmap mode.

Fix this by reading a fresh copy on the superblock in
dm_integrity_resume. This commit also fixes another logic quirk - the
branch that sets bitmap bits if the device was extended should only be
executed if the flag SB_FLAG_DIRTY_BITMAP is set.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
Fixes: 468dfca38b ("dm integrity: add a bitmap mode")
Cc: stable@vger.kernel.org
2026-01-19 16:16:53 +01:00
Marco Crivellari
d488086867 dm: add WQ_PERCPU to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

   commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 13:16:00 +01:00
Mikulas Patocka
c698b7f417 dm-integrity: fix a typo in the code for write/discard race
If we send a write followed by a discard, it may be possible that the
discarded data end up being overwritten by the previous write from the
journal. The code tries to prevent that, but there was a typo in this
logic that made it not being activated as it should be.

Note that if we end up here the second time (when discard_retried is
true), it means that the write bio is actually racing with the discard
bio, and in this situation it is not specified which of them should win.

Cc: stable@vger.kernel.org
Fixes: 31843edab7 ("dm integrity: improve discard in journal mode")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 13:15:02 +01:00
Linus Torvalds
7dbec0bbc3 dm docs: fix typos
dm, dm-ima, dm-bufio, dm-vdo, dm-raid: small refactoring
 
 dm-error: mark it with DM_TARGET_PASSES_INTEGRITY
 
 dm-pcache: a new target for read/write caching on persistent memory
 
 dm-request-based: fix NULL pointer dereference and quiesce_depth out of sync
 
 dm-linear: optimize REQ_PREFLUSH
 
 dm-vdo: return error on corrupted metadata
 
 dm-integrity: support asynchronous hash interface
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCaN/T8RQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbdPWAP9JEpnq09RzwneB/FdCE2WsjInsAaas
 eYqKrkgoatVzOwEAoqEzYb9IUrKMZbFPdWzEA1aXUBCH+UAKhm+G9WG5Vw0=
 =dCXi
 -----END PGP SIGNATURE-----

Merge tag 'for-6.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - a new dm-pcache target for read/write caching on persistent memory

 - fix typos in docs

 - misc small refactoring

 - mark dm-error with DM_TARGET_PASSES_INTEGRITY

 - dm-request-based: fix NULL pointer dereference and quiesce_depth out of sync

 - dm-linear: optimize REQ_PREFLUSH

 - dm-vdo: return error on corrupted metadata

 - dm-integrity: support asynchronous hash interface

* tag 'for-6.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm raid: use proper md_ro_state enumerators
  dm-integrity: prefer synchronous hash interface
  dm-integrity: enable asynchronous hash interface
  dm-integrity: rename internal_hash
  dm-integrity: add the "offset" argument
  dm-integrity: allocate the recalculate buffer with kmalloc
  dm-integrity: introduce integrity_kmap and integrity_kunmap
  dm-integrity: replace bvec_kmap_local with kmap_local_page
  dm-integrity: use internal variable for digestsize
  dm vdo: return error on corrupted metadata in start_restoring_volume functions
  dm vdo: Update code to use mem_is_zero
  dm: optimize REQ_PREFLUSH with data when using the linear target
  dm-pcache: use int type to store negative error codes
  dm: fix "writen"->"written"
  dm-pcache: cleanup: fix coding style report by checkpatch.pl
  dm-pcache: remove ctrl_lock for pcache_cache_segment
  dm: fix NULL pointer dereference in __dm_suspend()
  dm: fix queue start/stop imbalance under suspend/load/resume races
  dm-pcache: add persistent cache target in device-mapper
  dm error: mark as DM_TARGET_PASSES_INTEGRITY
  ...
2025-10-03 18:48:02 -07:00
Mikulas Patocka
1cd83fb790 dm-integrity: prefer synchronous hash interface
The previous patch preferred async interface for the purpose of testing.
However, the synchronous interface is faster, so it should be preferred.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 14:51:04 +02:00
Mikulas Patocka
5076d4599c dm-integrity: enable asynchronous hash interface
This commit enables the asynchronous hash interface in dm-integrity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 14:51:04 +02:00
Mikulas Patocka
e8a052ee1f dm-integrity: rename internal_hash
Rename "internal_hash" to "internal_shash" and introduce a boolean value
"internal_hash".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 14:51:04 +02:00
Mikulas Patocka
e3de0a3640 dm-integrity: add the "offset" argument
Make sure that the "data" argument passed to integrity_sector_checksum is
always page-aligned and add an "offset" argument that specifies the
offset from the start of the page. This will enable us to use the
asynchronous hash interface later.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 14:51:04 +02:00
Mikulas Patocka
e7151e225c dm-integrity: allocate the recalculate buffer with kmalloc
Allocate the recalculate buffer with kmalloc rather than vmalloc. This
will be needed later, for the simplification of the asynchronous hash
interface.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 14:50:56 +02:00
Mikulas Patocka
4b9197ed60 dm-integrity: introduce integrity_kmap and integrity_kunmap
This abstraction will be used later, for the asynchronous hash interface.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 10:49:45 +02:00
Mikulas Patocka
d916106564 dm-integrity: replace bvec_kmap_local with kmap_local_page
Replace bvec_kmap_local with kmap_local_page - it will be needed for the
upcoming patches that make kmap_local_page optional, depending on whether
asynchronous hash interface is used or not.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 10:49:45 +02:00
Mikulas Patocka
e4cc9deca3 dm-integrity: use internal variable for digestsize
Instead of calling digestsize() each time the digestsize for
the internal hash is needed, store the digestsize in a new
field internal_hash_digestsize within struct dm_integrity_c
once and use this value when needed.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-23 10:49:45 +02:00
Mikulas Patocka
77b8e6fbf9 dm-integrity: limit MAX_TAG_SIZE to 255
MAX_TAG_SIZE was 0x1a8 and it may be truncated in the "bi->metadata_size
= ic->tag_size" assignment. We need to limit it to 255.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-09-08 15:57:04 +02:00
Anuj Gupta
c6603b1d65
block: rename tuple_size field in blk_integrity to metadata_size
The tuple_size field in blk_integrity currently represents the total
size of metadata associated with each data interval. To make the meaning
more explicit, rename tuple_size to metadata_size. This is a purely
mechanical rename with no functional changes.

Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/20250630090548.3317-2-anuj20.g@samsung.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 14:00:14 +02:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Linus Torvalds
6f59de9bc0 for-6.16/block-20250523
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X
 MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q
 IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy
 fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO
 xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW
 6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH
 3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R
 Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn
 i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X
 chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8
 75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb
 Y6NDJw5+kQ==
 =1PNK
 -----END PGP SIGNATURE-----

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezeing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Christoph Hellwig
bd4e709b32 dm-integrity: use bio_add_virt_nofail
Convert the __bio_add_page(..., virt_to_page(), ...) pattern to the
bio_add_virt_nofail helper implementing it, and do the same for the
similar pattern using bio_add_page for adding the first segment after
a bio allocation as that can't fail either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250507120451.4000627-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07 07:31:07 -06:00
Mikulas Patocka
0a533c3e42 dm-integrity: fix a warning on invalid table line
If we use the 'B' mode and we have an invalit table line,
cancel_delayed_work_sync would trigger a warning. This commit avoids the
warning.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2025-04-23 13:09:15 +02:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Linus Torvalds
5014bebee0 - dm-crypt: switch to using the crc32 library
- dm-verity, dm-integrity, dm-crypt: documentation improvement
 
 - dm-vdo fixes
 
 - dm-stripe: enable inline crypto passthrough
 
 - dm-integrity: set ti->error on memory allocation failure
 
 - dm-bufio: remove unused return value
 
 - dm-verity: do forward error correction on metadata I/O errors
 
 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH
 
 - dm cache: prevent BUG_ON by blocking retries on failed device resumes
 
 - dm cache: support shrinking the origin device
 
 - dm: restrict dm device size to 2^63-512 bytes
 
 - dm-delay: support zoned devices
 
 - dm-verity: support block number limits for different ioprio classes
 
 - dm-integrity: fix non-constant-time tag verification (security bug)
 
 - dm-verity, dm-ebs: fix prefetch-vs-suspend race
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZ+u7shQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbZ0JAQDVhbl77u9jjPWjxJvFodMAqw+KPXGC
 MNzkyzG0lu7oPAEA33vt5pHQtr7F3SJj/sDBuZ+rb5bvUtgxeGqpJOQpTAk=
 =tj00
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - dm-crypt: switch to using the crc32 library

 - dm-verity, dm-integrity, dm-crypt: documentation improvement

 - dm-vdo fixes

 - dm-stripe: enable inline crypto passthrough

 - dm-integrity: set ti->error on memory allocation failure

 - dm-bufio: remove unused return value

 - dm-verity: do forward error correction on metadata I/O errors

 - dm: fix unconditional IO throttle caused by REQ_PREFLUSH

 - dm cache: prevent BUG_ON by blocking retries on failed device resumes

 - dm cache: support shrinking the origin device

 - dm: restrict dm device size to 2^63-512 bytes

 - dm-delay: support zoned devices

 - dm-verity: support block number limits for different ioprio classes

 - dm-integrity: fix non-constant-time tag verification (security bug)

 - dm-verity, dm-ebs: fix prefetch-vs-suspend race

* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
  dm-ebs: fix prefetch-vs-suspend race
  dm-verity: fix prefetch-vs-suspend race
  dm-integrity: fix non-constant-time tag verification
  dm-verity: support block number limits for different ioprio classes
  dm-delay: support zoned devices
  dm: restrict dm device size to 2^63-512 bytes
  dm cache: support shrinking the origin device
  dm cache: prevent BUG_ON by blocking retries on failed device resumes
  dm vdo indexer: reorder uds_request to reduce padding
  dm: fix unconditional IO throttle caused by REQ_PREFLUSH
  dm vdo: rework processing of loaded refcount byte arrays
  dm vdo: remove remaining ring references
  dm-verity: do forward error correction on metadata I/O errors
  dm-bufio: remove unused return value
  dm-integrity: set ti->error on memory allocation failure
  dm: Enable inline crypto passthrough for striped target
  dm vdo slab-depot: read refcount blocks in large chunks at load time
  dm vdo vio-pool: allow variable-sized metadata vios
  dm vdo vio-pool: support pools with multiple data blocks per vio
  dm vdo vio-pool: add a pool pointer to pooled_vio
  ...
2025-04-02 21:27:59 -07:00
Jo Van Bulck
8bde1033f9 dm-integrity: fix non-constant-time tag verification
When using dm-integrity in standalone mode with a keyed hmac algorithm,
integrity tags are calculated and verified internally.

Using plain memcmp to compare the stored and computed tags may leak the
position of the first byte mismatch through side-channel analysis,
allowing to brute-force expected tags in linear time (e.g., by counting
single-stepping interrupts in confidential virtual machine environments).

Co-developed-by: Luca Wilke <work@luca-wilke.com>
Signed-off-by: Luca Wilke <work@luca-wilke.com>
Signed-off-by: Jo Van Bulck <jo.vanbulck@cs.kuleuven.be>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2025-03-28 18:25:42 +01:00
Linus Torvalds
9b960d8cd6 for-6.15/block-20250322
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfe8BkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvTqD/4pOeGi/QfLyocn4TcJcidRGZAvBxecTVuM
 upeyr+dCyCi9Wk+EJKeAFooGe15upzxDxKj06HhCixaLx4etDK78uGV4FMM1Z4oa
 2dtchz1Zd0HyBPgQIUY8OuOgbS7tstMS/KdvL+gr5IjfapeTF+54WVLCD8eVyvO/
 vUIppgJBhrqy2qui4xF2lw4t2COt+/PqinGQuYALn4V4Po9NWA7lSh3ZI4F/byj1
 v68jXyt2fqCAyxwkzRDv4GxhN8c6W+TPJpzivrEAuSkLacovESKztinOrafrBnLR
 zdyO4n0V0yGOXbAcxRbADVA4HUsqhLl4JRnvE5P5zIaD7rkE0UqggF7vrSeCvVA1
 hsi1BhkAMNimKX7CZMnT3dJpxRQj1eDJxpwUAusLHWjMyQbNFhV7WAtthMtVJon8
 lAS4e5+xzjqKhF15GpVg5Lzy8SAwdqgNXwwq2zbM8OaPKG0FpajG8DXAqqcj4fpy
 WXnwg72KZDmRcSNJhVZK6B9xSAwIMXPgH4ClCMP9/xlw8EDpM38MDmzrs35TAVtI
 HGE3Qv9CjFjVj/OG3el+bTGIQJFVgYEVPV5TYfNCpKoxpj5cLn5OQY5u6MJawtgK
 HeDgKv3jw3lHatDALMVfwJqqVlUht0R6SIxtP9WHV+CcFrqN1LJKmdhDQbm7b4XK
 EbbawIsdxw==
 =Ci5m
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - Fixes for integrity handling

 - NVMe pull request via Keith:
      - Secure concatenation for TCP transport (Hannes)
      - Multipath sysfs visibility (Nilay)
      - Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
      - Correct use of 64-bit BARs for pci-epf target (Niklas)
      - Socket fix for selinux when used in containers (Peijie)

 - MD pull request via Yu:
      - fix recovery can preempt resync (Li Nan)
      - fix md-bitmap IO limit (Su Yue)
      - fix raid10 discard with REQ_NOWAIT (Xiao Ni)
      - fix raid1 memory leak (Zheng Qixing)
      - fix mddev uaf (Yu Kuai)
      - fix raid1,raid10 IO flags (Yu Kuai)
      - some refactor and cleanup (Yu Kuai)

 - Series cleaning up and fixing bugs in the bad block handling code

 - Improve support for write failure simulation in null_blk

 - Various lock ordering fixes

 - Fixes for locking for debugfs attributes

 - Various ublk related fixes and improvements

 - Cleanups for blk-rq-qos wait handling

 - blk-throttle fixes

 - Fixes for loop dio and sync handling

 - Fixes and cleanups for the auto-PI code

 - Block side support for hardware encryption keys in blk-crypto

 - Various cleanups and fixes

* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
  nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
  nvme-tcp: fix selinux denied when calling sock_sendmsg
  nvmet: pci-epf: Always configure BAR0 as 64-bit
  nvmet: Remove duplicate uuid_copy
  nvme: zns: Simplify nvme_zone_parse_entry()
  nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
  nvmet-fc: Remove unused functions
  nvme-pci: remove stale comment
  nvme-fc: Utilise min3() to simplify queue count calculation
  nvme-multipath: Add visibility for queue-depth io-policy
  nvme-multipath: Add visibility for numa io-policy
  nvme-multipath: Add visibility for round-robin io-policy
  nvmet: add tls_concat and tls_key debugfs entries
  nvmet-tcp: support secure channel concatenation
  nvmet: Add 'sq' argument to alloc_ctrl_args
  nvme-fabrics: reset admin connection for secure concatenation
  nvme-tcp: request secure channel concatenation
  nvme-keyring: add nvme_tls_psk_refresh()
  nvme: add nvme_auth_derive_tls_psk()
  nvme: add nvme_auth_generate_digest()
  ...
2025-03-26 18:08:55 -07:00
Christoph Hellwig
105ca2a2c2 block: split struct bio_integrity_payload
Many of the fields in struct bio_integrity_payload are only needed for
the default integrity buffer in the block layer, and the variable
sized array at the end of the structure makes it very hard to embed
into caller allocated structures.

Reduce struct bio_integrity_payload to the minimal structure needed in
common code and create two separate containing structures for the
automatically generated payload and the caller allocated payload.
The latter is a simple wrapper for struct bio_integrity_payload and
the bvecs, while the former contains the additional fields moved out
of struct bio_integrity_payload.

Always use a dedicated mempool for automatic integrity metadata
instead of depending on bio_set that is submitter controlled and thus
often doesn't have the mempool initialized and stop using mempools for
the submitter buffers as they aren't in the NOIO I/O submission path
where we need to guarantee forward progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20250225154449.422989-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-03 11:17:52 -07:00
Mikulas Patocka
00204ae3d6 dm-integrity: set ti->error on memory allocation failure
The dm-integrity target didn't set the error string when memory
allocation failed. This patch fixes it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2025-02-24 11:42:09 +01:00
Milan Broz
c19525b5fb dm-integrity: Do not emit journal configuration in DM table for Inline mode
The Inline mode does not use a journal; it makes no sense to print
journal information in DM table. Print it only if the journal is used.

The same applies to interleave_sectors (unused for Inline mode).

Also, add comments for arg_count, as the current calculation
is quite obscure.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-02-17 11:33:41 +01:00
Milan Broz
7fb39882b2 dm-integrity: Avoid divide by zero in table status in Inline mode
In Inline mode, the journal is unused, and journal_sectors is zero.

Calculating the journal watermark requires dividing by journal_sectors,
which should be done only if the journal is configured.

Otherwise, a simple table query (dmsetup table) can cause OOPS.

This bug did not show on some systems, perhaps only due to
compiler optimization.

On my 32-bit testing machine, this reliably crashes with the following:

 : Oops: divide error: 0000 [#1] PREEMPT SMP
 : CPU: 0 UID: 0 PID: 2450 Comm: dmsetup Not tainted 6.14.0-rc2+ #959
 : EIP: dm_integrity_status+0x2f8/0xab0 [dm_integrity]
 ...

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: fb0987682c ("dm-integrity: introduce the Inline mode")
Cc: stable@vger.kernel.org # 6.11+
2025-02-17 11:33:07 +01:00
Linus Torvalds
e477dba544 - Misc VDO fixes
- Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()
 
 - Dm-delay: Improve kernel documentation
 
 - Dm-crypt: Allow to specify the integrity key size as an option
 
 - Dm-bufio: Remove pointless NULL check
 
 - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR; use
   __assign_bit
 
 - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix smatch
   warning
 
 - Dm-integrity: Support recalculation in the 'I' mode
 
 - Revert "dm: requeue IO if mapping table not yet available"
 
 - Dm-crypt: Small refactoring to make the code more readable
 
 - Dm-cache: Remove pointless error check
 
 - Dm: Fix spelling errors
 
 - Dm-verity: Restart or panic on an I/O error if restart or panic was
   requested
 
 - Dm-verity: Fallback to platform keyring also if key in trusted keyring
   is rejected
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRnH8MwLyZDhyYfesYTAyx9YGnhbQUCZvapzRQcbXBhdG9ja2FA
 cmVkaGF0LmNvbQAKCRATAyx9YGnhbdKAAP4gHNU7aRmwTPcmvytEqBO4Pcz4eGB/
 tytj2+o1orph3AD/YD2X75YHOrdNKTLq+N0ecetAt0yDVUnJAUtKiOnx6Q8=
 =0f9T
 -----END PGP SIGNATURE-----

Merge tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mikulas Patocka:

 - Misc VDO fixes

 - Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio()

 - Dm-delay: Improve kernel documentation

 - Dm-crypt: Allow to specify the integrity key size as an option

 - Dm-bufio: Remove pointless NULL check

 - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR;
   use __assign_bit

 - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix
   smatch warning

 - Dm-integrity: Support recalculation in the 'I' mode

 - Revert "dm: requeue IO if mapping table not yet available"

 - Dm-crypt: Small refactoring to make the code more readable

 - Dm-cache: Remove pointless error check

 - Dm: Fix spelling errors

 - Dm-verity: Restart or panic on an I/O error if restart or panic was
   requested

 - Dm-verity: Fallback to platform keyring also if key in trusted
   keyring is rejected

* tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm verity: fallback to platform keyring also if key in trusted keyring is rejected
  dm-verity: restart or panic on an I/O error
  dm: fix spelling errors
  dm-cache: remove pointless error check
  dm vdo: handle unaligned discards correctly
  dm vdo indexer: Convert comma to semicolon
  dm-crypt: Use common error handling code in crypt_set_keyring_key()
  dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key()
  Revert "dm: requeue IO if mapping table not yet available"
  dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
  dm-integrity: support recalculation in the 'I' mode
  dm integrity: Convert comma to semicolon
  dm integrity: fix gcc 5 warning
  dm: Make use of __assign_bit() API
  dm integrity: Remove extra unlikely helper
  dm: Convert to use ERR_CAST()
  dm bufio: Remove NULL check of list_entry()
  dm-crypt: Allow to specify the integrity key size as option
  dm: Remove unused declaration and empty definition "dm_zone_map_bio"
  dm delay: enhance kernel documentation
  ...
2024-09-27 09:12:51 -07:00
Eric Biggers
9c2010bccc dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac()
sb_mac() verifies that the superblock + MAC don't exceed 512 bytes.
Because the superblock is currently 64 bytes, this really verifies
mac_size <= 448.  This confuses smatch into thinking that mac_size may
be as large as 448, which is inconsistent with the later code that
assumes the MAC fits in a buffer of size HASH_MAX_DIGESTSIZE (64).

In fact mac_size <= HASH_MAX_DIGESTSIZE is guaranteed by the crypto API,
as that is the whole point of HASH_MAX_DIGESTSIZE.  But, let's be
defensive and explicitly check for this.  This suppresses the false
positive smatch warning.  It does not fix an actual bug.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202409061401.44rtN1bh-lkp@intel.com/
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-11 14:04:41 +02:00
Mikulas Patocka
f8e1ca92e3 dm-integrity: fix a race condition when accessing recalc_sector
There's a race condition when accessing the variable
ic->sb->recalc_sector. The function integrity_recalc writes to this
variable when it makes some progress and the function
dm_integrity_map_continue may read this variable concurrently.

One problem is that on 32-bit architectures the 64-bit variable is not
read and written atomically - it may be possible to read garbage if read
races with write.

Another problem is that memory accesses to this variable are not guarded
with memory barriers.

This commit fixes the race - it moves reading ic->sb->recalc_sector to an
earlier place where we hold &ic->endio_wait.lock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2024-09-06 12:38:16 +02:00
Mikulas Patocka
90da77987d dm-integrity: support recalculation in the 'I' mode
In the kernel 6.11, dm-integrity was enhanced with an inline ('I') mode.
This mode uses devices with non-power-of-2 sector size. The extra
metadata after each sector are used to hold the integrity hash.

This commit enhances the inline mode, so that there is automatic
recalculation of the integrity hashes when the 'reclaculate' parameter is
used. It allows us to activate the device instantly, and the
recalculation is done on background.

If the device is deactivated while recalculation is in progress, it will
remember the point where it stopped and it will continue from this point
when activated again.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-06 12:34:05 +02:00
Chen Ni
a7722b82c2 dm integrity: Convert comma to semicolon
Replace comma between expressions with semicolons.

Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.

Found by inspection.
No functional change intended.
Compile tested only.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-06 12:33:42 +02:00
Mikulas Patocka
a8fa6483b4 dm integrity: fix gcc 5 warning
This commit fixes gcc 5 warning "logical not is only applied to the left
hand side of comparison"

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: fb0987682c ("dm-integrity: introduce the Inline mode")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-03 12:08:20 +02:00
Hongbo Li
35c9f09b56 dm integrity: Remove extra unlikely helper
In IS_ERR, the unlikely is used for the input parameter,
so these is no need to use it again outside.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-02 11:41:11 +02:00
Linus Torvalds
4477b39c32 minmax: add a few more MIN_T/MAX_T users
Commit 3a7e02c040 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.

The complexity of those macros stems from two issues:

 (a) trying to use them in situations that require a C constant
     expression (in static initializers and for array sizes)

 (b) the type sanity checking

and MIN_T/MAX_T avoids both of these issues.

Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.

But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.

However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.

This does exactly that.

Which in turn will then allow for much simpler implementations of
min_t()/max_t().  All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.

We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.

Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28 13:41:14 -07:00
Mikulas Patocka
fb0987682c dm-integrity: introduce the Inline mode
This commit introduces a new 'I' mode for dm-integrity.

The 'I' mode may be selected if the underlying device has non-power-of-2
sector size. In this mode, dm-integrity will store integrity data
directly in device's sectors and it will not use journal.

This mode improves performance and reduces flash wear because there would
be no journal writes.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-07-19 12:06:44 +02:00
Christoph Hellwig
0a94a469a4 dm: stop using blk_limits_io_{min,opt}
Remove use of the blk_limits_io_{min,opt} and assign the values directly
to the queue_limits structure.  For the io_opt this is a completely
mechanical change, for io_min it removes flooring the limit to the
physical and logical block size in the particular caller.  But as
blk_validate_limits will do the same later when actually applying the
limits, there still is no change in overall behavior.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-07-10 13:10:06 +02:00
Christoph Hellwig
c6e56cf6b2 block: move integrity information into queue_limits
Move the integrity information into the queue limits so that it can be
set atomically with other queue limits, and that the sysfs changes to
the read_verify and write_generate flags are properly synchronized.
This also allows to provide a more useful helper to stack the integrity
fields, although it still is separate from the main stacking function
as not all stackable devices want to inherit the integrity settings.
Even with that it greatly simplifies the code in md and dm.

Note that the integrity field is moved as-is into the queue limits.
While there are good arguments for removing the separate blk_integrity
structure, this would cause a lot of churn and might better be done at a
later time if desired.  However the integrity field in the queue_limits
structure is now unconditional so that various ifdefs can be avoided or
replaced with IS_ENABLED().  Given that tiny size of it that seems like
a worthwhile trade off.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14 10:20:07 -06:00
Christoph Hellwig
63e649594a dm-integrity: use the nop integrity profile
Use the block layer built-in nop profile instead of duplicating it.

Tested by:

$ dd if=/dev/urandom of=key.bin bs=512 count=1

$ cryptsetup luksFormat -q --type luks2 --integrity hmac-sha256 \
 	--integrity-no-wipe /dev/nvme0n1 key.bin
$ cryptsetup luksOpen /dev/nvme0n1 luks-integrity --key-file key.bin

and then doing mkfs.xfs and simple I/O on the mount file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240613084839.1044015-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14 10:20:06 -06:00
Mikulas Patocka
69381cf88a dm-integrity: set discard_granularity to logical block size
dm-integrity could set discard_granularity lower than the logical block
size. This could result in failures when sending discard requests to
dm-integrity.

This fix is needed for kernels prior to 6.10.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Eric Wheeler <linux-integrity@lists.ewheeler.net>
Cc: stable@vger.kernel.org # <= 6.9
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-05-20 15:49:22 -04:00
Arnd Bergmann
8e91c23423 dm integrity: fix out-of-range warning
Depending on the value of CONFIG_HZ, clang complains about a pointless
comparison:

drivers/md/dm-integrity.c:4085:12: error: result of comparison of
                        constant 42949672950 with expression of type
                        'unsigned int' is always false
                        [-Werror,-Wtautological-constant-out-of-range-compare]
                        if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {

As the check remains useful for other configurations, shut up the
warning by adding a second type cast to uint64_t.

Fixes: 468dfca38b ("dm integrity: add a bitmap mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-29 09:48:07 -04:00
Mikulas Patocka
b4d78cfeb3 dm-integrity: align the outgoing bio in integrity_recheck
It is possible to set up dm-integrity with smaller sector size than
the logical sector size of the underlying device. In this situation,
dm-integrity guarantees that the outgoing bios have the same alignment as
incoming bios (so, if you create a filesystem with 4k block size,
dm-integrity would send 4k-aligned bios to the underlying device).

This guarantee was broken when integrity_recheck was implemented.
integrity_recheck sends bio that is aligned to ic->sectors_per_block. So
if we set up integrity with 512-byte sector size on a device with logical
block size 4k, we would be sending unaligned bio. This triggered a bug in
one of our internal tests.

This commit fixes it by determining the actual alignment of the
incoming bio and then makes sure that the outgoing bio in
integrity_recheck has the same alignment.

Fixes: c88f5e553f ("dm-integrity: recheck the integrity tag after a failure")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-21 13:19:10 -04:00
Mikulas Patocka
55e565c42d dm-integrity: fix a memory leak when rechecking the data
Memory for the "checksums" pointer will leak if the data is rechecked
after checksum failure (because the associated kfree won't happen due
to 'goto skip_io').

Fix this by freeing the checksums memory before recheck, and just use
the "checksum_onstack" memory for storing checksum during recheck.

Fixes: c88f5e553f ("dm-integrity: recheck the integrity tag after a failure")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-19 11:51:37 -04:00