Commit Graph

71170 Commits

Author SHA1 Message Date
Tao Huang
8a8168e343 Merge remote branch 'android12-5.10-2021-08' of https://android.googlesource.com/kernel/common
* android12-5.10-2021-08: (429 commits)
  ANDROID: Update symbol list for mtk
  ANDROID: scheduler: export task_sched_runtime
  FROMLIST: mm: slub: fix slub_debug disabling for list of slabs
  FROMLIST: mm/madvise: add MADV_WILLNEED to process_madvise()
  ANDROID: Update the exynos symbol list
  FROMGIT: firmware: arm_scmi: Free mailbox channels if probe fails
  ANDROID: GKI: gki_defconfig: Enable CONFIG_NFC
  ANDROID: sched: Make uclamp changes depend on CAP_SYS_NICE
  ANDROID: GKI: update xiaomi symbol list and ABI XML
  ANDROID: ABI: update generic symbol list
  ANDROID: scsi: ufs: Enable CONFIG_SCSI_UFS_HPB
  ANDROID: scsi: ufs: Make CONFIG_SCSI_UFS_HPB compatible with the GKI
  UPSTREAM: arm64: vdso: Avoid ISB after reading from cntvct_el0
  ANDROID: GKI: Disable X86_MCE drivers
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: ABI: update allowed list for exynos
  FROMGIT: sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
  FROMGIT: sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
  FROMGIT: sched/deadline: Fix reset_on_fork reporting of DL tasks
  BACKPORT: FROMGIT: sched: Fix UCLAMP_FLAG_IDLE setting
  ...

Change-Id: I5e0600bb4ccd0333366b016b42332e1e79e56b61

Conflicts:
	drivers/usb/gadget/configfs.c
	include/linux/usb/gadget.h
2021-08-24 20:07:38 +08:00
Linus Torvalds
3de34cc5ea UPSTREAM: pipe: make pipe writes always wake up readers
Since commit 1b6b26ae70 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.

In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already.  Doing
extraneous wakeups will only cause potential thundering herd problems.

However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it".  Even if there was no edge in sight.

Quoting Sandeep Patil:
 "The commit 1b6b26ae70 ('pipe: fix and clarify pipe write wakeup
  logic') changed pipe write logic to wakeup readers only if the pipe
  was empty at the time of write. However, there are libraries that
  relied upon the older behavior for notification scheme similar to
  what's described in [1]

  One such library 'realm-core'[2] is used by numerous Android
  applications. The library uses a similar notification mechanism as GNU
  Make but it never drains the pipe until it is full. When Android moved
  to v5.10 kernel, all applications using this library stopped working.

  The library has since been fixed[3] but it will be a while before all
  applications incorporate the updated library"

Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong.  Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.

So add the extraneous wakeup, to approximate the old behavior.

[ I say "approximate", because the exact old behavior was to do a wakeup
  not for each write(), but for each pipe buffer chunk that was filled
  in. The behavior introduced by this change is not that - this is just
  "every write will cause a wakeup, whether necessary or not", which
  seems to be sufficient for the broken library use. ]

It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.

See commit f467a6a664 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".

Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/

Bug: 193851993
Reported-by: Sandeep Patil <sspatil@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3a34b13a88)
Signed-off-by: Sandeep Patil <sspatil@android.com>
Change-Id: Idcf3e8faa31bff47ada4b815237a355e0757b964
2021-08-03 21:20:32 +00:00
Sandeep Patil
6b7e007164 ANDROID: Revert "ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data"
This reverts commit
76879a1964 ("ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data")
to replace that with the bug fix that landed upstream at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a34b13a88caeb2800ab44a4918f230041b37dd9

Bug: 193851993
Test: Build and boot cuttlefish.

Signed-off-by: Sandeep Patil <sspatil@android.com>
Change-Id: Ic4f5e2cc516b4ea68ae7d63225d1529217990431
2021-08-03 21:20:13 +00:00
Kalesh Singh
36fbb55631 FROMGIT: procfs: prevent unpriveleged processes accessing fdinfo dir
The file permissions on the fdinfo dir from were changed from
S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added
for opening the fdinfo files [1].  However, the ptrace permission check
was not added to the directory, allowing anyone to get the open FD numbers
by reading the fdinfo directory.

Add the missing ptrace permission check for opening the fdinfo directory.

[1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com

Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com
Fixes: 7bc3fa0172 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Brown <broonie@kernel.org>

Bug: 151772539
Change-Id: I274b30aa0a5ce8412eae7161d31c6ee955035da9
(cherry picked from commit fc73829fa54b0c7af32d6da7c972eb3390957da4
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master)
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2021-07-29 15:10:21 -04:00
Jaegeuk Kim
fc2d64ec5d FROMGIT: f2fs: don't sleep while grabing nat_tree_lock
This tries to fix priority inversion in the below condition resulting in
long checkpoint delay.

f2fs_get_node_info()
 - nat_tree_lock
  -> sleep to grab journal_rwsem by contention

                                     checkpoint
                                     - waiting for nat_tree_lock

In order to let checkpoint go, let's release nat_tree_lock, if there's a
journal_rwsem contention.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Bug: 191987855
(cherry picked from commit 2eeb0dce72
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I97ac4f9d3bde399ab4f17f5b3a6e949ae9b79f0f
Signed-off-by: Daeho Jeong <daehojeong@google.com>
2021-07-29 01:29:53 +00:00
Sandeep Patil
76879a1964 ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data
commit '1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic")'
change `pipe_write()` wakeup logic to wakeup readers only if the pipe
was empty.

This meant that applications that are not draining the pipe
before each write were exposed to unexpected timeouts / hangs in
epoll_wait() waiting for data in a pipe using EPOLLIN | EPOLLET flags.

This behaviour can be easily tested with android12-5.4 kernel where
the test that uses pipes for notifications in this way works while it
fails 100% with android12-5.10.

This change restores the old behavior to wakeup all pipe_readers if any
new data is written to the pipe.

Bug: 193851993
Bug: 193846582

Change-Id: If0c5a844091ccf16d5236bd072326325d4d5447a
Signed-off-by: Sandeep Patil <sspatil@google.com>
2021-07-27 17:35:49 +00:00
Jaegeuk Kim
e33cf9dd43 FROMGIT: f2fs: let's keep writing IOs on SBI_NEED_FSCK
SBI_NEED_FSCK is an indicator that fsck.f2fs needs to be triggered, so it
is not fully critical to stop any IO writes. So, let's allow to write data
instead of reporting EIO forever given SBI_NEED_FSCK, but do keep OPU.

Bug: 193659742
Fixes: 9557727876 ("f2fs: drop inplace IO if fs status is abnormal")
Cc: <stable@kernel.org> # v5.13+
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 1ffc8f5f77
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I9585358c1cee864064d9d210ee643f49ff1e6749
2021-07-22 23:53:01 +00:00
Matthias Maennich
d0a88ae479 ANDROID: Enable GKI Dr. No Enforcement
This effectively locks down OWNERS approval to a small group to guard
the code base against unintentional breakages.

Bug: 194314089
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Ifd1ea97639a622320ea83f901f6451e2e52b38d4
2021-07-21 20:51:47 +01:00
Daeho Jeong
11cec52238 FROMGIT: f2fs: add sysfs nodes to get GC info for each GC mode
Added gc_reclaimed_segments and gc_segment_mode sysfs nodes.
1) "gc_reclaimed_segments" shows how many segments have been
reclaimed by GC during a specific GC mode.
2) "gc_segment_mode" is used to control for which gc mode
the "gc_reclaimed_segments" node shows.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Bug: 182708936
(cherry picked from commit 07c6b5933e
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: Ie8c2ccf5d36ce2f388c98b77e22848f3ff6645c3
Signed-off-by: Daeho Jeong <daehojeong@google.com>
2021-07-16 00:21:51 +00:00
Prasad Sodagudi
9b136eab76 ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region
Some of the platforms might be still expecting dedicated memory region
for ramoops node. So add logic to detect the start and size of the
ramoops memory region by looking up reserved memory region with
of_reserved_mem_lookup() when platform_get_resource() failed.

Bug: 191636717
Change-Id: Idc479b45fb3f637f7235efd6eabac62059d5e92b
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2021-07-15 18:27:15 +00:00
Kalesh Singh
1f0c32a667 UPSTREAM: procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and
/proc/<pid>/task/<tid>/fdinfo/<FD>.

The inode numbers can be used to uniquely identify DMA buffers in user
space and avoids a dependency on /proc/<pid>/fd/* when accounting
per-process DMA buffer sizes.

Link: https://lkml.kernel.org/r/20210308170651.919148-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3845f256a8)

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Id07beac3edcc95c0b42805e24e5486965acbb46e
2021-07-12 22:38:22 +00:00
Kalesh Singh
0c8c125f57 UPSTREAM: procfs: allow reading fdinfo with PTRACE_MODE_READ
Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers.  In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting.  Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to a
DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and
/proc/<pid>/fdinfo -- both are only readable by the process owner, as
follows:

  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.

Accessing other processes' fdinfo requires root privileges.  This limits
the use of the interface to debugging environments and is not suitable for
production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7bc3fa0172)

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I842b689670f731138592f45c7124ef446d9aa59a
2021-07-12 22:38:16 +00:00
Kalesh Singh
2e0476a465 Revert "FROMLIST: procfs: Allow reading fdinfo with PTRACE_MODE_READ"
Revert submission 1578844

Reason for revert: Will be replaced by upstream version
Reverted Changes:
Ic9c551998:FROMLIST: BACKPORT: procfs/dmabuf: Add inode numbe...
I41407760c:FROMLIST: procfs: Allow reading fdinfo with PTRACE...

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Iede4e9a65f87dad1ca0c6ecdeb42de47af4b37c8
2021-07-12 22:38:09 +00:00
Kalesh Singh
5ded961aa2 Revert "FROMLIST: BACKPORT: procfs/dmabuf: Add inode number to /..."
Revert submission 1578844

Reason for revert: Will be replaced by upstream version
Reverted Changes:
Ic9c551998:FROMLIST: BACKPORT: procfs/dmabuf: Add inode numbe...
I41407760c:FROMLIST: procfs: Allow reading fdinfo with PTRACE...

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If02a6dc9a525193f286a138791de49085cd91972
2021-07-12 22:38:01 +00:00
Jaegeuk Kim
3ee5565017 UPSTREAM: f2fs: initialize page->private when using for our internal use
We need to guarantee it's initially zero. Otherwise, it'll hurt entire flag
operations.

Bug: 192173168
Fixes: b763f3bedc ("f2fs: restructure f2fs page.private layout")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
(cherry picked from commit c9ebd3df43 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master)
Change-Id: I02211d4bd2b1cb526e5fbcd55673043e3082d011
2021-07-12 21:01:34 +00:00
Kuan-Ying Lee
a7a3b31d58 ANDROID: syscall_check: add vendor hook for open syscall
Through this vendor hook, we can get the timing to check
current running task for the validation of its credential
and open operation.

Bug: 191291287

Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Change-Id: Ia644ceb02dbc230ee1d25cad3630c2c3f908e41a
2021-07-09 13:48:44 +00:00
Isaac J. Manjarres
bd2ca0ba5b FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region
The reserved memory region for ramoops is assumed to be at a fixed
and known location when read from the devicetree. This is not desirable
in environments where it is preferred for the region to be dynamically
allocated at runtime, as opposed to it being fixed at compile time.

Change the logic for detecting the start and size of the ramoops
memory region by looking up the reserved memory region instead of
using platform_get_resource(), which assumes that the location
of the memory is known ahead of time.

Bug: 191636717
Link: https://lore.kernel.org/patchwork/patch/1451704/
Change-Id: I24066de9f4fe1f1575cb1bbb1687c37a2b1938a4
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2021-07-08 18:16:31 +00:00
Tao Huang
582f79266f Merge remote branch 'android12-5.10' of https://android.googlesource.com/kernel/common
* android12-5.10: (2274 commits)
  FROMGIT: mm: slub: move sysfs slab alloc/free interfaces to debugfs
  ANDROID: gki - CONFIG_NET_SCH_FQ=y
  ANDROID: GKI: Kconfig.gki: Add GKI_HIDDEN_ETHERNET_CONFIGS
  FROMLIST: media: Kconfig: Fix DVB_CORE can't be selected as module
  ANDROID: Update ABI and symbol list
  Revert "net: usb: cdc_ncm: don't spew notifications"
  ANDROID: Fips 140: move fips symbols entirely in own list
  ANDROID: core of xt_IDLETIMER send_nl_msg support
  ANDROID: start to re-add xt_IDLETIMER send_nl_msg support
  ANDROID: add fips140.ko symbols to module ABI
  ANDROID: inject correct HMAC digest into fips140.ko at build time
  ANDROID: crypto: fips140 - perform load time integrity check
  FROMLIST: crypto: shash - stop comparing function pointers to avoid breaking CFI
  ANDROID: arm64: module: preserve RELA sections for FIPS140 integrity selfcheck
  ANDROID: arm64: simd: omit capability check in may_use_simd()
  ANDROID: kbuild: lto: permit the use of .a archives in LTO modules
  ANDROID: arm64: only permit certain alternatives in the FIPS140 module
  ANDROID: crypto: lib/aes - add vendor hooks for AES library routines
  ANDROID: crypto: lib/sha256 - add vendor hook for sha256() routine
  UPSTREAM: KVM: arm64: Mark the host stage-2 memory pools static
  ...

Conflicts:
	drivers/mmc/core/mmc_ops.c
	drivers/usb/gadget/function/f_uac1.c
	drivers/usb/gadget/function/f_uac2.c
	drivers/usb/gadget/function/f_uvc.c
2021-06-25 11:32:04 +08:00
Jon Lin
cc770c4e7a ubifs: Recovery for cases of unclean reboot
After power lost, spinand may work in a unkonw state and result in
bit flip, including:
1.Write to cache invalid and dirty cache data write to page's array
which result in node CRC error for some pages.
2.One page write fail but the next page write success result in
empty space corruption.

Change-Id: I212c237202b32de0217efc8dd5a4e84174953a3f
Signed-off-by: Jon Lin <jon.lin@rock-chips.com>
2021-06-24 20:25:10 +08:00
Jaegeuk Kim
8102df91f2 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.10.y' into android12-5.10
* aosp/upstream-f2fs-stable-linux-5.10.y:
  Revert "f2fs: avoid attaching SB_ACTIVE flag during mount/remount"
  f2fs: remove false alarm on iget failure during GC
  f2fs: enable extent cache for compression files in read-only
  f2fs: fix to avoid adding tab before doc section
  f2fs: introduce f2fs_casefolded_name slab cache
  f2fs: swap: support migrating swapfile in aligned write mode
  f2fs: swap: remove dead codes
  f2fs: compress: add compress_inode to cache compressed blocks
  f2fs: clean up /sys/fs/f2fs/<disk>/features
  f2fs: add pin_file in feature list
  f2fs: Advertise encrypted casefolding in sysfs
  f2fs: Show casefolding support only when supported
  f2fs: support RO feature
  f2fs: logging neatening

Bug: 186107892
Bug: 190759634
Bug: 190517210
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I8eb93d8a43304b98166676da52a9c2434b15b942
2021-06-23 19:54:41 -07:00
Jaegeuk Kim
bb51a33182 Revert "f2fs: avoid attaching SB_ACTIVE flag during mount/remount"
This reverts commit 42bbf0bcc2.
2021-06-23 01:42:05 -07:00
Jaegeuk Kim
c81ac64da1 f2fs: remove false alarm on iget failure during GC
This patch removes setting SBI_NEED_FSCK when GC gets an error on f2fs_iget,
since f2fs_iget can give ENOMEM and others by race condition.
If we set this critical fsck flag, we'll get EIO during fsync via the below
code path.

In f2fs_inplace_write_data(),

	if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) {
		err = -EIO;
		goto drop_bio;
	}

Fixes: 9557727876 ("f2fs: drop inplace IO if fs status is abnormal")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:41:57 -07:00
Daeho Jeong
cdeff03989 f2fs: enable extent cache for compression files in read-only
Let's allow extent cache for RO partition.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-21 07:24:26 -07:00
Chao Yu
44e0be85eb f2fs: introduce f2fs_casefolded_name slab cache
Add a slab cache: "f2fs_casefolded_name" for memory allocation
of casefold name.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-21 07:24:24 -07:00
Greg Kroah-Hartman
9e08e97ec6 Merge 5.10.43 into android12-5.10
Changes in 5.10.43
	btrfs: tree-checker: do not error out if extent ref hash doesn't match
	net: usb: cdc_ncm: don't spew notifications
	hwmon: (dell-smm-hwmon) Fix index values
	hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
	netfilter: conntrack: unregister ipv4 sockopts on error unwind
	efi/fdt: fix panic when no valid fdt found
	efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
	efi/libstub: prevent read overflow in find_file_option()
	efi: cper: fix snprintf() use in cper_dimm_err_location()
	vfio/pci: Fix error return code in vfio_ecap_init()
	vfio/pci: zap_vma_ptes() needs MMU
	samples: vfio-mdev: fix error handing in mdpy_fb_probe()
	vfio/platform: fix module_put call in error flow
	ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
	HID: logitech-hidpp: initialize level variable
	HID: pidff: fix error return code in hid_pidff_init()
	HID: i2c-hid: fix format string mismatch
	devlink: Correct VIRTUAL port to not have phys_port attributes
	net/sched: act_ct: Offload connections with commit action
	net/sched: act_ct: Fix ct template allocation for zone 0
	mptcp: always parse mptcp options for MPC reqsk
	nvme-rdma: fix in-casule data send for chained sgls
	ACPICA: Clean up context mutex during object deletion
	perf probe: Fix NULL pointer dereference in convert_variable_location()
	net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
	net: sock: fix in-kernel mark setting
	net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
	net/tls: Fix use-after-free after the TLS device goes down and up
	net/mlx5e: Fix incompatible casting
	net/mlx5: Check firmware sync reset requested is set before trying to abort it
	net/mlx5e: Check for needed capability for cvlan matching
	net/mlx5: DR, Create multi-destination flow table with level less than 64
	nvmet: fix freeing unallocated p2pmem
	netfilter: nft_ct: skip expectations for confirmed conntrack
	netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
	drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
	bpf: Simplify cases in bpf_base_func_proto
	bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
	ieee802154: fix error return code in ieee802154_add_iface()
	ieee802154: fix error return code in ieee802154_llsec_getparams()
	igb: add correct exception tracing for XDP
	ixgbevf: add correct exception tracing for XDP
	cxgb4: fix regression with HASH tc prio value update
	ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
	ice: Fix allowing VF to request more/less queues via virtchnl
	ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
	ice: handle the VF VSI rebuild failure
	ice: report supported and advertised autoneg using PHY capabilities
	ice: Allow all LLDP packets from PF to Tx
	i2c: qcom-geni: Add shutdown callback for i2c
	cxgb4: avoid link re-train during TC-MQPRIO configuration
	i40e: optimize for XDP_REDIRECT in xsk path
	i40e: add correct exception tracing for XDP
	ice: simplify ice_run_xdp
	ice: optimize for XDP_REDIRECT in xsk path
	ice: add correct exception tracing for XDP
	ixgbe: optimize for XDP_REDIRECT in xsk path
	ixgbe: add correct exception tracing for XDP
	arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
	optee: use export_uuid() to copy client UUID
	bus: ti-sysc: Fix am335x resume hang for usb otg module
	arm64: dts: ls1028a: fix memory node
	arm64: dts: zii-ultra: fix 12V_MAIN voltage
	arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
	ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
	ARM: dts: imx7d-pico: Fix the 'tuning-step' property
	ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
	bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
	tipc: add extack messages for bearer/media failure
	tipc: fix unique bearer names sanity check
	serial: stm32: fix threaded interrupt handling
	riscv: vdso: fix and clean-up Makefile
	io_uring: fix link timeout refs
	io_uring: use better types for cflags
	drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
	Bluetooth: fix the erroneous flush_work() order
	Bluetooth: use correct lock to prevent UAF of hdev object
	wireguard: do not use -O3
	wireguard: peer: allocate in kmem_cache
	wireguard: use synchronize_net rather than synchronize_rcu
	wireguard: selftests: remove old conntrack kconfig value
	wireguard: selftests: make sure rp_filter is disabled on vethc
	wireguard: allowedips: initialize list head in selftest
	wireguard: allowedips: remove nodes in O(1)
	wireguard: allowedips: allocate nodes in kmem_cache
	wireguard: allowedips: free empty intermediate nodes when removing single node
	net: caif: added cfserl_release function
	net: caif: add proper error handling
	net: caif: fix memory leak in caif_device_notify
	net: caif: fix memory leak in cfusbl_device_notify
	HID: i2c-hid: Skip ELAN power-on command after reset
	HID: magicmouse: fix NULL-deref on disconnect
	HID: multitouch: require Finger field to mark Win8 reports as MT
	gfs2: fix scheduling while atomic bug in glocks
	ALSA: timer: Fix master timer notification
	ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
	ALSA: hda: update the power_state during the direct-complete
	ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
	ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
	ext4: fix memory leak in ext4_fill_super
	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
	ext4: fix fast commit alignment issues
	ext4: fix memory leak in ext4_mb_init_backend on error path.
	ext4: fix accessing uninit percpu counter variable with fast_commit
	usb: dwc2: Fix build in periphal-only mode
	pid: take a reference when initializing `cad_pid`
	ocfs2: fix data corruption by fallocate
	mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
	mm/page_alloc: fix counting of free pages after take off from buddy
	x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
	x86/sev: Check SME/SEV support in CPUID first
	nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
	drm/amdgpu: Don't query CE and UE errors
	drm/amdgpu: make sure we unpin the UVD BO
	x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
	powerpc/kprobes: Fix validation of prefixed instructions across page boundary
	btrfs: mark ordered extent and inode with error if we fail to finish
	btrfs: fix error handling in btrfs_del_csums
	btrfs: return errors from btrfs_del_csums in cleanup_ref_head
	btrfs: fixup error handling in fixup_inode_link_counts
	btrfs: abort in rename_exchange if we fail to insert the second ref
	btrfs: fix deadlock when cloning inline extents and low on available space
	mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
	drm/msm/dpu: always use mdp device to scale bandwidth
	btrfs: fix unmountable seed device after fstrim
	KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
	KVM: arm64: Fix debug register indexing
	x86/kvm: Teardown PV features on boot CPU as well
	x86/kvm: Disable kvmclock on all CPUs on shutdown
	x86/kvm: Disable all PV features on crash
	lib/lz4: explicitly support in-place decompression
	i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
	netfilter: nf_tables: missing error reporting for not selected expressions
	xen-netback: take a reference to the RX task thread
	neighbour: allow NUD_NOARP entries to be forced GCed
	Linux 5.10.43

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d7ec0878193e4e454076809b7fb71fcc4e3d810
2021-06-12 14:48:14 +02:00
Anand Jain
fe910d20e2 btrfs: fix unmountable seed device after fstrim
commit 5e753a817b upstream.

The following test case reproduces an issue of wrongly freeing in-use
blocks on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

Create a seed device and add a sprout device to it:

  $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
  $ btrfstune -S 1 /dev/loop0
  $ mount /dev/loop0 /btrfs
  $ btrfs dev add -f /dev/loop1 /btrfs
  BTRFS info (device loop0): relocating block group 290455552 flags system
  BTRFS info (device loop0): relocating block group 1048576 flags system
  BTRFS info (device loop0): disk added /dev/loop1
  $ umount /btrfs

Mount the sprout device and run fstrim:

  $ mount /dev/loop1 /btrfs
  $ fstrim /btrfs
  $ umount /btrfs

Now try to mount the seed device, and it fails:

  $ mount /dev/loop0 /btrfs
  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device:

 $ dmesg -kt | tail
 <snip>
 BTRFS error (device loop0): bad tree block start, want 5292032 have 0
 BTRFS warning (device loop0): couldn't read-tree root
 BTRFS error (device loop0): open_ctree failed

>From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880:

  $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
  <snip>
  item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
  	block group used 114688 chunk_objectid 256 flags METADATA
  <snip>

>From the dump-tree of the sprout device (taken before the fstrim).
fstrim used block-group 5242880 to find the related free space to free:

  $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
  <snip>
  item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
  	block group used 32768 chunk_objectid 256 flags METADATA
  <snip>

BPF kernel tracing the fstrim command finds the missing block 5292032
within the range of the discarded blocks as below:

  kprobe:btrfs_discard_extent {
  	printf("freeing start %llu end %llu num_bytes %llu:\n",
  		arg1, arg1+arg2, arg2);
  }

  freeing start 5259264 end 5406720 num_bytes 147456
  <snip>

Fix this by avoiding the discard command to the readonly seed device.

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Filipe Manana
baa6763123 btrfs: fix deadlock when cloning inline extents and low on available space
commit 76a6d5cd74 upstream.

There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
0df50d47d1 btrfs: abort in rename_exchange if we fail to insert the second ref
commit dc09ef3562 upstream.

Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
48568f3944 btrfs: fixup error handling in fixup_inode_link_counts
commit 011b28acf9 upstream.

This function has the following pattern

	while (1) {
		ret = whatever();
		if (ret)
			goto out;
	}
	ret = 0
out:
	return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

	ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
466d83fdbb btrfs: return errors from btrfs_del_csums in cleanup_ref_head
commit 856bd270dc upstream.

We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail.  We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
5a89982fa2 btrfs: fix error handling in btrfs_del_csums
commit b86652be7c upstream.

Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent.  This occurred because the pattern
in btrfs_del_csums was

	while (1) {
		ret = btrfs_search_slot();
		if (ret < 0)
			break;
	}
	ret = 0;
out:
	btrfs_free_path(path);
	return ret;

If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out.  Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the

	ret = 0;
out:

pattern and simply allow break to have the proper error reporting.  With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:27 +02:00
Josef Bacik
b547a16b24 btrfs: mark ordered extent and inode with error if we fail to finish
commit d61bec08b9 upstream.

While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.

It turns out the abort came from finish ordered io while trying to write
out the free space cache.  It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.

Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.

With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:27 +02:00
Junxiao Bi
c8d5faee46 ocfs2: fix data corruption by fallocate
commit 6bba4471f0 upstream.

When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not yet
updated to match the new size, if writeback is kicked in, it will invoke
ocfs2_writepage()->block_write_full_page() where the pages out of inode
size will be dropped.  That will cause file corruption.  Fix this by
zero out eof blocks when extending the inode size.

Running the following command with qemu-image 4.2.1 can get a corrupted
coverted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out
of inode size, then extend the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Ritesh Harjani
3b713aafa7 ext4: fix accessing uninit percpu counter variable with fast_commit
commit b45f189a19 upstream.

When running generic/527 with fast_commit configuration, the following
issue is seen on Power.  With fast_commit, during ext4_fc_replay()
(which can be called from ext4_fill_super()), if inode eviction
happens then it can access an uninitialized percpu counter variable.

This patch adds the check before accessing the counters in
ext4_free_inode() path.

[  321.165371] run fstests generic/527 at 2021-04-29 08:38:43
[  323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none.
[  323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000
[  323.619767] Faulting instruction address: 0xc000000000bae78c
cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0]
    pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100
    lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90
    pid   = 5593, comm = mount
	ext4_free_inode+0x780/0xb90
	ext4_evict_inode+0xa8c/0xc60
	evict+0xfc/0x1e0
	ext4_fc_replay+0xc50/0x20f0
	do_one_pass+0xfe0/0x1350
	jbd2_journal_recover+0x184/0x2e0
	jbd2_journal_load+0x1c0/0x4a0
	ext4_fill_super+0x2458/0x4200
	mount_bdev+0x1dc/0x290
	ext4_mount+0x28/0x40
	legacy_get_tree+0x4c/0xa0
	vfs_get_tree+0x4c/0x120
	path_mount+0xcf8/0xd70
	do_mount+0x80/0xd0
	sys_mount+0x3fc/0x490
	system_call_exception+0x384/0x3d0
	system_call_common+0xec/0x278

Cc: stable@kernel.org
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.1619692410.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Phillip Potter
2050c6e5b1 ext4: fix memory leak in ext4_mb_init_backend on error path.
commit a8867f4e38 upstream.

Fix a memory leak discovered by syzbot when a file system is corrupted
with an illegally large s_log_groups_per_flex.

Reported-by: syzbot+aa12d6106ea4ca1b6aae@syzkaller.appspotmail.com
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20210412073837.1686-1-phil@philpotter.co.uk
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Harshad Shirwadkar
fb86acc623 ext4: fix fast commit alignment issues
commit a7ba36bc94 upstream.

Fast commit recovery data on disk may not be aligned. So, when the
recovery code reads it, this patch makes sure that fast commit info
found on-disk is first memcpy-ed into an aligned variable before
accessing it. As a consequence of it, we also remove some macros that
could resulted in unaligned accesses.

Cc: stable@kernel.org
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Ye Bin
d3b668b96a ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
commit 082cd4ec24 upstream.

We got follow bug_on when run fsstress with injecting IO fault:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

This patch's modification is according to Jan Kara's suggestion in:
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
"I see. Now I understand your patch. Honestly, seeing how fragile is trying
to fix extent tree after split has failed in the middle, I would probably
go even further and make sure we fix the tree properly in case of ENOSPC
and EDQUOT (those are easily user triggerable).  Anything else indicates a
HW problem or fs corruption so I'd rather leave the extent tree as is and
don't try to fix it (which also means we will not create overlapping
extents)."

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Alexey Makhalov
01d349a481 ext4: fix memory leak in ext4_fill_super
commit afd09b617d upstream.

Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733ce9 ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec11 ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:26 +02:00
Bob Peterson
d11e5b96ef gfs2: fix scheduling while atomic bug in glocks
commit 20265d9a67 upstream.

Before this patch, in the unlikely event that gfs2_glock_dq encountered
a withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.

This patch unlocks the lockref spin_lock before waiting for recovery.

Fixes: 601ef0d52e ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:25 +02:00
Pavel Begunkov
ec72cb50c1 io_uring: use better types for cflags
[ Upstream commit 8c3f9cd160 ]

__io_cqring_fill_event() takes cflags as long to squeeze it into u32 in
an CQE, awhile all users pass int or unsigned. Replace it with unsigned
int and store it as u32 in struct io_completion to match CQE.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:23 +02:00
Pavel Begunkov
0b2a990e5d io_uring: fix link timeout refs
[ Upstream commit a298232ee6 ]

WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28
Call Trace:
 __refcount_sub_and_test include/linux/refcount.h:283 [inline]
 __refcount_dec_and_test include/linux/refcount.h:315 [inline]
 refcount_dec_and_test include/linux/refcount.h:333 [inline]
 io_put_req fs/io_uring.c:2140 [inline]
 io_queue_linked_timeout fs/io_uring.c:6300 [inline]
 __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354
 io_submit_sqe fs/io_uring.c:6534 [inline]
 io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660
 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline]
 __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182

io_link_timeout_fn() should put only one reference of the linked timeout
request, however in case of racing with the master request's completion
first io_req_complete() puts one and then io_put_req_deferred() is
called.

Cc: stable@vger.kernel.org # 5.12+
Fixes: 9ae1f8dd37 ("io_uring: fix inconsistent lock state")
Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:23 +02:00
Josef Bacik
1d62b7ac83 btrfs: tree-checker: do not error out if extent ref hash doesn't match
commit 1119a72e22 upstream.

The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:12 +02:00
Huang Jianan
06229c49eb UPSTREAM: erofs: support adjust lz4 history window size
lz4 uses LZ4_DISTANCE_MAX to record history preservation. When
using rolling decompression, a block with a higher compression
ratio will cause a larger memory allocation (up to 64k). It may
cause a large resource burden in extreme cases on devices with
small memory and a large number of concurrent IOs. So appropriately
reducing this value can improve performance.

Decreasing this value will reduce the compression ratio (except
when input_size <LZ4_DISTANCE_MAX). But considering that erofs
currently only supports 4k output, reducing this value will not
significantly reduce the compression benefits.

The maximum value of LZ4_DISTANCE_MAX defined by lz4 is 64k, and
we can only reduce this value. For the old kernel, it just can't
reduce the memory allocation during rolling decompression without
affecting the decompression result.

Link: https://lore.kernel.org/r/20210329012308.28743-3-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
[ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 190585249
Change-Id: Ia1dbe56677a0bd2a388ae6b484f4c4d40170bf1c
(cherry picked from commit 5d50538fc5)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:02:00 +00:00
Huang Jianan
0a24d25f08 UPSTREAM: erofs: use sync decompression for atomic contexts only
Sync decompression was introduced to get rid of additional kworker
scheduling overhead. But there is no such overhead in non-atomic
contexts. Therefore, it should be better to turn off sync decompression
to avoid the current thread waiting in z_erofs_runqueue.

Link: https://lore.kernel.org/r/20210317035448.13921-3-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 190585249
Change-Id: I7e03053d690b9cda4fe78c0ed0e2fbb0ba188d38
(cherry picked from commit 30048cdac4)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:01:48 +00:00
Huang Jianan
0ca4eafb39 UPSTREAM: erofs: use workqueue decompression for atomic contexts only
z_erofs_decompressqueue_endio may not be executed in the atomic
context, for example, when dm-verity is turned on. In this scenario,
data can be decompressed directly to get rid of additional kworker
scheduling overhead.

Link: https://lore.kernel.org/r/20210317035448.13921-2-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 190585249
Change-Id: I369268e398ce8b16362ace2e6a03a5061caebd90
(cherry picked from commit 648f2de053)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:01:33 +00:00
Huang Jianan
5a44e4bc13 UPSTREAM: erofs: avoid memory allocation failure during rolling decompression
Currently, err would be treated as io error. Therefore, it'd be
better to ensure memory allocation during rolling decompression
to avoid such io error.

In the long term, we might consider adding another !Uptodate case
for such case.

Link: https://lore.kernel.org/r/20210316031515.90954-1-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 190585249
Change-Id: I8bab4ab3e0d26b698fc0054517065872eafa5d15
(cherry picked from commit b4892fa3e7)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:01:19 +00:00
Gao Xiang
4ae1c8a4d0 UPSTREAM: erofs: force inplace I/O under low memory scenario
Try to forcely switch to inplace I/O under low memory scenario in
order to avoid direct memory reclaim due to cached page allocation.

Link: https://lore.kernel.org/r/20201209123717.12430-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>

Bug: 190585249
Change-Id: I877e97f493a74c94e94c3671d2cd8a102ad5ed54
(cherry picked from commit 1825c8d7ce)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:01:08 +00:00
Gao Xiang
46a00756cb UPSTREAM: erofs: insert to managed cache after adding to pcl
Previously, it could be some concern to call add_to_page_cache_lru()
with page->mapping == Z_EROFS_MAPPING_STAGING (!= NULL).

In contrast, page->private is used instead now, so partially revert
commit 5ddcee1f3a ("erofs: get rid of __stagingpage_alloc helper")
with some adaption for simplicity.

Link: https://lore.kernel.org/r/20201208095834.3133565-2-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>

Bug: 190585249
Change-Id: Ia67c77a85f1e22bce2b1c0f238639fbb615d613a
(cherry picked from commit bf225074ff)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:00:56 +00:00
Gao Xiang
b1c757466f UPSTREAM: erofs: get rid of magical Z_EROFS_MAPPING_STAGING
Previously, we played around with magical page->mapping for short-lived
temporary pages since we need to identify different types of pages in
the same pcluster but both invalidated and short-lived temporary pages
can have page->mapping == NULL. It was considered as safe because that
temporary pages are all non-LRU / non-movable pages.

This patch tends to use specific page->private to identify short-lived
pages instead so it won't rely on page->mapping anymore. Details are
described in "compress.h" as well.

Link: https://lore.kernel.org/r/20201208095834.3133565-1-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>

Bug: 190585249
Change-Id: I0beb89854846dc5fe9bf167bc9ce311659003dc3
(cherry picked from commit 6aaa7b0664)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-06-09 12:00:47 +00:00
Cliff Chen
24dace7355 ANDROID: fuse: fix deadlock for reply of FUSE_CANONICAL_PATH
There is a deadlock when the reply of FUSE_CANONICAL_PATH from user-
space client, because the kern_path function will issue a new request
and wait the respond from client which has been in wait state. The ba-
cktrace is like this:

<6>[  518.977731] ntfs-3g         S    0  2138      1 0x04000000
<4>[  518.977745] Call trace:
<4>[  518.977757]  __switch_to+0x130/0x13c
<4>[  518.977767]  __schedule+0x740/0x964
<4>[  518.977777]  schedule+0x70/0x90
<4>[  518.977794]  __fuse_request_send+0x1a0/0x340
<4>[  518.977808]  fuse_simple_request+0x178/0x1c8
<4>[  518.977818]  fuse_lookup_name+0xfc/0x220
<4>[  518.977829]  fuse_lookup+0x48/0x134
<4>[  518.977842]  __lookup_slow+0xc8/0x154
<4>[  518.977853]  walk_component+0x1c0/0x728
<4>[  518.977863]  path_lookupat+0xa8/0x208
<4>[  518.977875]  filename_lookup+0x8c/0x190
<4>[  518.977887]  kern_path+0x30/0x3c
<4>[  518.977901]  fuse_dev_do_write+0x79c/0x114c
<4>[  518.977914]  fuse_dev_write+0x60/0x84
<4>[  518.977928]  do_iter_readv_writev+0x11c/0x158
<4>[  518.977941]  do_iter_write+0x7c/0x1b8
<4>[  518.977953]  vfs_writev+0x84/0xe8
<4>[  518.977966]  do_writev+0x78/0x114
<4>[  518.977979]  __arm64_sys_writev+0x1c/0x24
<4>[  518.977992]  el0_svc_common+0x98/0x160
<4>[  518.978005]  el0_svc_handler+0x5c/0x64
<4>[  518.978015]  el0_svc+0x8/0xc

Fixes: fa199896a3 ("ANDROID: fuse: Add support for d_canonical_path")
Signed-off-by: Cliff Chen <cliff.chen@rock-chips.com>
Change-Id: I13487e5c956c4537c2554a44208d6664653ef4f1
2021-06-08 09:38:35 +08:00