Linux kernel source tree
Coly Li 09bdafb89a bcache: avoid oversize memory allocation by small stripe_size
[ Upstream commit baf8fb7e0e ]

Arrays bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
used for dirty data writeback; their sizes are determined by the backing
device capacity and the stripe size. A larger backing device capacity or
a smaller stripe size makes these two arrays occupy more dynamic memory.

Currently bcache->stripe_size is directly inherited from
queue->limits.io_opt of the underlying storage device. For normal hard
drives, limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1<<31 sectors), which has worked fine for 10+ years.
But for devices that do declare a value for queue->limits.io_opt, a small
stripe_size (compared to 1TB) leads to oversize memory allocations for
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, especially
as hard drive capacities have grown much larger in the recent decade.

For example, for a raid5 array assembled from three 20TB hard drives, the
raid device capacity is 40TB with a typical 512KB limits.io_opt. According
to the calculation in the bcache code, these two arrays will occupy 400MB
of dynamic memory. Even worse, Andrea Tomassetti reports that a 4KB
limits.io_opt is declared on a new 2TB hard drive, so these two arrays
request 2GB and 512MB of dynamic memory from kzalloc(). As a result, the
bcache device always fails to initialize on his system.

To avoid the oversize memory allocation, bcache->stripe_size should not
be directly inherited from queue->limits.io_opt of the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as the minimal bcache stripe
size and sets the bcache device's stripe size according to the
limits.io_opt value declared by the underlying storage device:
- If the declared limits.io_opt >= BCH_MIN_STRIPE_SZ, the bcache device
  sets its stripe size directly to this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, the bcache device
  sets its stripe size to the smallest multiple of limits.io_opt that is
  equal to or larger than BCH_MIN_STRIPE_SZ.

As a result, the stripe size of a bcache device will always be >= 4MB.
For a 40TB raid5 device with 512KB limits.io_opt, the memory occupied by
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with 4KB limits.io_opt, the memory
occupied by these two arrays will be 2.5MB in total.

Such an amount of memory allocated for bcache->stripe_sectors_dirty and
bcache->full_dirty_stripes is reasonable for most storage devices.

Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:01:56 +01:00
arch x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM 2023-12-20 17:01:55 +01:00
block blk-cgroup: bypass blkcg_deactivate_policy after destroying 2023-12-20 17:01:55 +01:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 17:19:42 +00:00
Documentation tee: optee: Fix supplicant based device enumeration 2023-12-13 18:45:11 +01:00
drivers bcache: avoid oversize memory allocation by small stripe_size 2023-12-20 17:01:56 +01:00
fs ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE 2023-12-20 17:01:53 +01:00
include PCI/ASPM: Add pci_enable_link_state_locked() 2023-12-20 17:01:53 +01:00
init proc: sysctl: prevent aliased sysctls from getting passed to init 2023-11-28 17:19:57 +00:00
io_uring io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation 2023-12-20 17:01:52 +01:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
lib cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
LICENSES
mm mm/memory_hotplug: fix error handling in add_memory_resource() 2023-12-13 18:45:25 +01:00
net rxrpc: Fix some minor issues with bundle tracing 2023-12-20 17:01:55 +01:00
rust rust: docs: fix logo replacement 2023-10-19 16:40:00 +02:00
samples samples/bpf: syscall_tp_user: Fix array out-of-bound access 2023-11-28 17:19:48 +00:00
scripts sign-file: Fix incorrect return values check 2023-12-20 17:01:49 +01:00
security cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
sound ALSA: hda/tas2781: reset the amp before component_add 2023-12-20 17:01:53 +01:00
tools selftests/mm: cow: print ksft header before printing anything else 2023-12-20 17:01:55 +01:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues 2023-10-24 09:52:16 -10:00
.rustfmt.toml
COPYING
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild
Kconfig
MAINTAINERS Char/Misc driver fixes for 6.6-final 2023-10-28 07:51:27 -10:00
Makefile Linux 6.6.7 2023-12-13 18:45:36 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.