Linux kernel source tree
Go to file
Darrick J. Wong c7d7fe001b vfs: remove lockdep bogosity in __sb_start_write
[ Upstream commit 22843291ef ]

__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs.  The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.

However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation.  Neither __sb_start_write nor its callers handle
this at all.  For example:

sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.

Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level.  *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.

Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom.  This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.

We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.

NOTE: Commit f4b554af99 in 2015 created the current weird logic (which
used to exist in a different form in commit 5accdf82ba from 2012) in
__sb_start_write.  XFS solved this whole problem in the late 2.6 era by
creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
grab intwrite freeze protection, thus making lockdep's solution
unnecessary.  The commit claims that Dave Chinner explained that the
trylock hack + comment could be removed, but nobody ever did.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:27:20 +01:00
arch arm64: psci: Avoid printing in cpu_psci_cpu_die() 2020-11-24 13:27:19 +01:00
block blk-cgroup: Pre-allocate tree node on blkg_conf_prep 2020-11-10 12:35:59 +01:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: algif_skcipher - EBUSY on aio should be an error 2020-10-29 09:55:01 +01:00
Documentation powerpc/64s: flush L1D after user accesses 2020-11-22 10:02:26 +01:00
drivers ACPI: button: Add DMI quirk for Medion Akoya E2228T 2020-11-24 13:27:19 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:21:29 +01:00
fs vfs: remove lockdep bogosity in __sb_start_write 2020-11-24 13:27:20 +01:00
include random32: make prandom_u32() output unpredictable 2020-11-18 19:18:52 +01:00
init printk: reduce LOG_BUF_SHIFT range for H8300 2020-11-05 11:08:41 +01:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:18:40 +02:00
kernel reboot: fix overflow parsing reboot cpu number 2020-11-18 19:18:52 +01:00
lib random32: make prandom_u32() output unpredictable 2020-11-18 19:18:52 +01:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm page_frag: Recover from memory pressure 2020-11-24 13:27:18 +01:00
net net/ncsi: Fix netlink registration 2020-11-24 13:27:19 +01:00
samples misc: vop: add round_up(x,4) for vring_size to avoid kernel panic 2020-10-30 10:38:29 +01:00
scripts scripts/setlocalversion: make git describe output more reliable 2020-11-05 11:08:31 +01:00
security selinux: Fix error return code in sel_ib_pkey_sid_slow() 2020-11-18 19:18:50 +01:00
sound ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link() 2020-11-18 19:18:42 +01:00
tools selftests: kvm: Fix the segment descriptor layout to match the actual layout 2020-11-24 13:27:19 +01:00
usr initramfs: restore default compression behavior 2020-04-13 10:44:59 +02:00
virt KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch 2020-10-01 13:14:54 +02:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Documentation/llvm: add documentation on building w/ Clang/LLVM 2020-09-26 18:01:31 +02:00
Makefile Linux 4.19.159 2020-11-22 10:02:27 +01:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.