Linux kernel source tree
Go to file
Xianting Tian cebefe4f6f mm/filemap.c: clear page error before actual read
[ Upstream commit faffdfa04f ]

Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability.  Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:

   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page).  So It
caused the page can't be set to uptodate in function
end_buffer_async_read:

   if(page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }

Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page).  When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.

But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:

1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
   1) set loopX status to Lo_bound;
   ==>systemd-udevd read loopX dev<==
   2) read loopX dev(page has no error)
   3) mount succeed

As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:

   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping->a_ops->readpage(filp, page);

here, mapping->a_ops->readpage() is blkdev_readpage.  In latest kernel,
some function name changed, the call trace as below:

   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. mutipath errors.
             * Pg_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page*/
            error=mapping->a_ops->readpage(flip, page);

We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.

This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount.  Without the patch, the call trace as
below when performing cramfs mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, the call trace as below when performing mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); <== new add
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.

Signed-off-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <yubin@h3c.com>
Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:41 +02:00
arch KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones 2020-10-01 13:14:38 +02:00
block block: ensure bdi->io_pages is always initialized 2020-09-12 13:40:22 +02:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() 2020-07-09 09:37:10 +02:00
Documentation kbuild: support LLVM=1 to switch the default tools to Clang/LLVM 2020-09-26 18:01:32 +02:00
drivers PCI: pciehp: Fix MSI interrupt race 2020-10-01 13:14:41 +02:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:21:29 +01:00
fs NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() 2020-10-01 13:14:41 +02:00
include NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() 2020-10-01 13:14:41 +02:00
init printk: queue wake_up_klogd irq_work only if per-CPU areas are ready 2020-07-22 09:32:13 +02:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:18:40 +02:00
kernel tracing: Use address-of operator on section symbols 2020-10-01 13:14:38 +02:00
lib lib/string.c: implement stpcpy 2020-10-01 13:14:25 +02:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm mm/filemap.c: clear page error before actual read 2020-10-01 13:14:41 +02:00
net svcrdma: Fix leak of transport addresses 2020-10-01 13:14:40 +02:00
samples samples: bpf: Fix build error 2020-06-03 08:19:31 +02:00
scripts checkpatch: fix the usage of capture group ( ... ) 2020-09-09 19:04:32 +02:00
security selinux: sel_avc_get_stat_idx should increase position index 2020-10-01 13:14:33 +02:00
sound ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor 2020-10-01 13:14:41 +02:00
tools tools: gpio-hammer: Avoid potential overflow in main 2020-10-01 13:14:39 +02:00
usr initramfs: restore default compression behavior 2020-04-13 10:44:59 +02:00
virt KVM: fix overflow of zero page refcount with ksm running 2020-10-01 13:14:32 +02:00
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Documentation/llvm: add documentation on building w/ Clang/LLVM 2020-09-26 18:01:31 +02:00
Makefile Linux 4.19.148 2020-09-26 18:01:33 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.