Add the vendor hook to freezer.c, because of some special cases related to our feature, we do not want the process to be frozen immediately, so we add the hook at __refrigerator to make sure we can go to our own freeze logic when the process is about to be frozen.
Bug: 187458531
Signed-off-by: heshuai1 <heshuai1@xiaomi.com>
Change-Id: Iea42fd9604d6b33ccd6502425416f0dd28eecebb
Add android_rvh_find_new_ilb to select a next ilb cpu for vendors.
Bug: 190228983
Change-Id: Iba1a0cd9cdc22dcf628dd33f8d838fe513a4818f
Signed-off-by: Choonghoon Park <choong.park@samsung.com>
Add ANDROID_OEM_DATA to struct rq, which is used to implement oem's
scheduler tuning.
Bug: 188899490
Change-Id: I1904b4fd83effc4b309bfb98811e9718398504f4
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
With the introduction of per-cpu wakeup devices that can be used in
preference to the broadcast timer, print the name of such devices when
they are available.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-6-will@kernel.org
(cherry picked from commit 245a057fee tip/tip.git timers/core)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 185092876
Change-Id: I39736cb43702430b722382c802603fdc4188a5c4
When configuring the broadcast timer on entry to and exit from deep idle
states, prefer a per-CPU wakeup timer if one exists.
On entry to idle, stop the tick device and transfer the next event into
the oneshot wakeup device, which will serve as the wakeup from idle. To
avoid the overhead of additional hardware accesses on exit from idle,
leave the timer armed and treat the inevitable interrupt as a (possibly
spurious) tick event.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-5-will@kernel.org
(cherry picked from commit ea5c7f1b9a tip/tip.git timers/core)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 185092876
Change-Id: I62a49231e213285f95e9f0cf6a07633984930b56
Some SoCs have two per-cpu timer implementations where the timer with the
higher rating stops in deep idle (i.e. suffers from CLOCK_EVT_FEAT_C3STOP)
but is otherwise preferable to the timer with the lower rating. In such a
design, selecting the higher rated devices relies on a global broadcast
timer and IPIs to wake up from deep idle states.
To avoid the reliance on a global broadcast timer and also to reduce the
overhead associated with the IPI wakeups, extend
tick_install_broadcast_device() to manage per-cpu wakeup timers separately
from the broadcast device.
For now, these timers remain unused.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-4-will@kernel.org
(cherry picked from commit c94a8537df tip/tip.git timers/core)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 185092876
Change-Id: I2d2b1bc6333d004846270d3e58dec0dca89a89d1
In preparation for adding support for per-cpu wakeup timers, split
_tick_broadcast_oneshot_control() into a helper function which deals
only with the broadcast timer management across idle transitions.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-3-will@kernel.org
(cherry picked from commit e5007c288e tip/tip.git timers/core)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 185092876
Change-Id: I14c5456ec1af0b8f73c85e9571f171ea1c3c564b
tick-broadcast.o is only built if CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
so remove the redundant #ifdef guards around the definition of
tick_receive_broadcast().
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-2-will@kernel.org
(cherry picked from commit c2d4fee3f6 tip/tip.git timers/core)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 185092876
Change-Id: Ifb18a53bfad8d228f6f76c9b74c9bca8de27759a
Add vendor hook to determine if the memory of a process that received
the SIGKILL can be reaped.
Bug: 189803002
Change-Id: Ie6802b9bf93ddffb0ceef615d7cca40c23219e57
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
This reverts commit a7af91adc7 ("ANDROID: mm: oom_kill: reap memory of
a task that receives SIGKILL") as this functionality is moved to vendor
hook based approach.
Bug: 189803002
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Change-Id: Ica0d6df22fb81bf430e9b4c7d12b36ab7d44dab8
specific wake flag for android vendor
Bug: 189858948
Signed-off-by: Namkyu Kim <namkyu78.kim@samsung.com>
Change-Id: Idc23c1c47f7d83b298c0b2560859f1ce2761fd85
Changes in 5.10.42
ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
ALSA: hda/realtek: Headphone volume is controlled by Front mixer
ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
ALSA: usb-audio: scarlett2: Improve driver startup messages
cifs: set server->cipher_type to AES-128-CCM for SMB3.0
NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
iommu/vt-d: Fix sysfs leak in alloc_iommu()
perf intel-pt: Fix sample instruction bytes
perf intel-pt: Fix transaction abort handling
perf scripts python: exported-sql-viewer.py: Fix copy to clipboard from Top Calls by elapsed Time report
perf scripts python: exported-sql-viewer.py: Fix Array TypeError
perf scripts python: exported-sql-viewer.py: Fix warning display
proc: Check /proc/$pid/attr/ writes against file opener
net: hso: fix control-request directions
net/sched: fq_pie: re-factor fix for fq_pie endless loop
net/sched: fq_pie: fix OOB access in the traffic path
netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
mac80211: assure all fragments are encrypted
mac80211: prevent mixed key and fragment cache attacks
mac80211: properly handle A-MSDUs that start with an RFC 1042 header
cfg80211: mitigate A-MSDU aggregation attacks
mac80211: drop A-MSDUs on old ciphers
mac80211: add fragment cache to sta_info
mac80211: check defrag PN against current frame
mac80211: prevent attacks on TKIP/WEP as well
mac80211: do not accept/forward invalid EAPOL frames
mac80211: extend protection against mixed key and fragment cache attacks
ath10k: add CCMP PN replay protection for fragmented frames for PCIe
ath10k: drop fragments with multicast DA for PCIe
ath10k: drop fragments with multicast DA for SDIO
ath10k: drop MPDU which has discard flag set by firmware for SDIO
ath10k: Fix TKIP Michael MIC verification for PCIe
ath10k: Validate first subframe of A-MSDU before processing the list
ath11k: Clear the fragment cache during key install
dm snapshot: properly fix a crash when an origin has no snapshots
drm/amd/pm: correct MGpuFanBoost setting
drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
selftests/gpio: Use TEST_GEN_PROGS_EXTENDED
selftests/gpio: Move include of lib.mk up
selftests/gpio: Fix build when source tree is read only
kgdb: fix gcc-11 warnings harder
Documentation: seccomp: Fix user notification documentation
seccomp: Refactor notification handler to prepare for new semantics
serial: core: fix suspicious security_locked_down() call
misc/uss720: fix memory leak in uss720_probe
thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
KVM: X86: Fix vCPU preempted state from guest's point of view
KVM: arm64: Prevent mixed-width VM creation
mei: request autosuspend after sending rx flow control
staging: iio: cdc: ad7746: avoid overwrite of num_channels
iio: gyro: fxas21002c: balance runtime power in error path
iio: dac: ad5770r: Put fwnode in error case during ->probe()
iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.
iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
iio: adc: ad7923: Fix undersized rx buffer.
iio: adc: ad7793: Add missing error code in ad7793_setup()
iio: adc: ad7192: Avoid disabling a clock that was never enabled.
iio: adc: ad7192: handle regulator voltage error first
serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
serial: 8250_dw: Add device HID for new AMD UART controller
serial: 8250_pci: Add support for new HPE serial device
serial: 8250_pci: handle FL_NOIRQ board flag
USB: trancevibrator: fix control-request direction
Revert "irqbypass: do not start cons/prod when failed connect"
USB: usbfs: Don't WARN about excessively large memory allocations
drivers: base: Fix device link removal
serial: tegra: Fix a mask operation that is always true
serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
USB: serial: ti_usb_3410_5052: add startech.com device id
USB: serial: option: add Telit LE910-S1 compositions 0x7010, 0x7011
USB: serial: ftdi_sio: add IDs for IDS GmbH Products
USB: serial: pl2303: add device id for ADLINK ND-6530 GC
thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID
usb: dwc3: gadget: Properly track pending and queued SG
usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
usb: typec: mux: Fix matching with typec_altmode_desc
net: usb: fix memory leak in smsc75xx_bind
Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails
fs/nfs: Use fatal_signal_pending instead of signal_pending
NFS: fix an incorrect limit in filelayout_decode_layout()
NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
drm/meson: fix shutdown crash when component not probed
net/mlx5e: reset XPS on error flow if netdev isn't registered yet
net/mlx5e: Fix multipath lag activation
net/mlx5e: Fix error path of updating netdev queues
{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
net/mlx5e: Fix nullptr in add_vlan_push_action()
net/mlx5: Set reformat action when needed for termination rules
net/mlx5e: Fix null deref accessing lag dev
net/mlx4: Fix EEPROM dump support
net/mlx5: Set term table as an unmanaged flow table
SUNRPC in case of backlog, hand free slots directly to waiting task
Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"
tipc: wait and exit until all work queues are done
tipc: skb_linearize the head skb when reassembling msgs
spi: spi-fsl-dspi: Fix a resource leak in an error handling path
netfilter: flowtable: Remove redundant hw refresh bit
net: dsa: mt7530: fix VLAN traffic leaks
net: dsa: fix a crash if ->get_sset_count() fails
net: dsa: sja1105: update existing VLANs from the bridge VLAN list
net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic
net: dsa: sja1105: error out on unsupported PHY mode
net: dsa: sja1105: add error handling in sja1105_setup()
net: dsa: sja1105: call dsa_unregister_switch when allocating memory fails
net: dsa: sja1105: fix VL lookup command packing for P/Q/R/S
i2c: s3c2410: fix possible NULL pointer deref on read message after write
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
i2c: i801: Don't generate an interrupt on bus reset
i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
afs: Fix the nlink handling of dir-over-dir rename
perf jevents: Fix getting maximum number of fds
nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
mptcp: avoid error message on infinite mapping
mptcp: drop unconditional pr_warn on bad opt
mptcp: fix data stream corruption
platform/x86: hp_accel: Avoid invoking _INI to speed up resume
gpio: cadence: Add missing MODULE_DEVICE_TABLE
Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions"
Revert "media: usb: gspca: add a missed check for goto_low_power"
Revert "ALSA: sb: fix a missing check of snd_ctl_add"
Revert "serial: max310x: pass return value of spi_register_driver"
serial: max310x: unregister uart driver in case of failure and abort
Revert "net: fujitsu: fix a potential NULL pointer dereference"
net: fujitsu: fix potential null-ptr-deref
Revert "net/smc: fix a NULL pointer dereference"
net/smc: properly handle workqueue allocation failure
Revert "net: caif: replace BUG_ON with recovery code"
net: caif: remove BUG_ON(dev == NULL) in caif_xmit
Revert "char: hpet: fix a missing check of ioremap"
char: hpet: add checks after calling ioremap
Revert "ALSA: gus: add a check of the status of snd_ctl_add"
Revert "ALSA: usx2y: Fix potential NULL pointer dereference"
Revert "isdn: mISDNinfineon: fix potential NULL pointer dereference"
isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io
Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()"
ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()
Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc"
isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info
Revert "dmaengine: qcom_hidma: Check for driver register failure"
dmaengine: qcom_hidma: comment platform_driver_register call
Revert "libertas: add checks for the return value of sysfs_create_group"
libertas: register sysfs groups properly
Revert "ASoC: cs43130: fix a NULL pointer dereference"
ASoC: cs43130: handle errors in cs43130_probe() properly
Revert "media: dvb: Add check on sp8870_readreg"
media: dvb: Add check on sp8870_readreg return
Revert "media: gspca: mt9m111: Check write_bridge for timeout"
media: gspca: mt9m111: Check write_bridge for timeout
Revert "media: gspca: Check the return value of write_bridge for timeout"
media: gspca: properly check for errors in po1030_probe()
Revert "net: liquidio: fix a NULL pointer dereference"
net: liquidio: Add missing null pointer checks
Revert "brcmfmac: add a check for the status of usb_register"
brcmfmac: properly check for bus register errors
btrfs: return whole extents in fiemap
scsi: ufs: ufs-mediatek: Fix power down spec violation
scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
openrisc: Define memory barrier mb
scsi: pm80xx: Fix drives missing during rmmod/insmod loop
btrfs: release path before starting transaction when cloning inline extent
btrfs: do not BUG_ON in link_to_fixup_dir
platform/x86: hp-wireless: add AMD's hardware id to the supported list
platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
SMB3: incorrect file id in requests compounded with open
drm/amd/display: Disconnect non-DP with no EDID
drm/amd/amdgpu: fix refcount leak
drm/amdgpu: Fix a use-after-free
drm/amd/amdgpu: fix a potential deadlock in gpu reset
drm/amdgpu: stop touching sched.ready in the backend
platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
block: fix a race between del_gendisk and BLKRRPART
linux/bits.h: fix compilation error with GENMASK
net: netcp: Fix an error message
net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
interconnect: qcom: bcm-voter: add a missing of_node_put()
interconnect: qcom: Add missing MODULE_DEVICE_TABLE
ASoC: cs42l42: Regmap must use_single_read/write
net: stmmac: Fix MAC WoL not working if PHY does not support WoL
net: ipa: memory region array is variable size
vfio-ccw: Check initialized flag in cp_init()
spi: Assume GPIO CS active high in ACPI case
net: really orphan skbs tied to closing sk
net: packetmmap: fix only tx timestamp on request
net: fec: fix the potential memory leak in fec_enet_init()
chelsio/chtls: unlock on error in chtls_pt_recvmsg()
net: mdio: thunder: Fix a double free issue in the .remove function
net: mdio: octeon: Fix some double free issues
cxgb4/ch_ktls: Clear resources when pf4 device is removed
openvswitch: meter: fix race when getting now_ms.
tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
net: sched: fix packet stuck problem for lockless qdisc
net: sched: fix tx action rescheduling issue during deactivation
net: sched: fix tx action reschedule issue with stopped queue
net: hso: check for allocation failure in hso_create_bulk_serial_device()
net: bnx2: Fix error return code in bnx2_init_board()
bnxt_en: Include new P5 HV definition in VF check.
bnxt_en: Fix context memory setup for 64K page size.
mld: fix panic in mld_newpack()
net/smc: remove device from smcd_dev_list after failed device_add()
gve: Check TX QPL was actually assigned
gve: Update mgmt_msix_idx if num_ntfy changes
gve: Add NULL pointer checks when freeing irqs.
gve: Upgrade memory barrier in poll routine
gve: Correct SKB queue index validation.
iommu/virtio: Add missing MODULE_DEVICE_TABLE
net: hns3: fix incorrect resp_msg issue
net: hns3: put off calling register_netdev() until client initialize complete
iommu/vt-d: Use user privilege for RID2PASID translation
cxgb4: avoid accessing registers when clearing filters
staging: emxx_udc: fix loop in _nbu2ss_nuke()
ASoC: cs35l33: fix an error code in probe()
bpf, offload: Reorder offload callback 'prepare' in verifier
bpf: Set mac_len in bpf_skb_change_head
ixgbe: fix large MTU request from VF
ASoC: qcom: lpass-cpu: Use optional clk APIs
scsi: libsas: Use _safe() loop in sas_resume_port()
net: lantiq: fix memory corruption in RX ring
ipv6: record frag_max_size in atomic fragments in input path
ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
net: ethernet: mtk_eth_soc: Fix packet statistics support for MT7628/88
sch_dsmark: fix a NULL deref in qdisc_reset()
net: hsr: fix mac_len checks
MIPS: alchemy: xxs1500: add gpio-au1000.h header file
MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
net: zero-initialize tc skb extension on allocation
net: mvpp2: add buffer header handling in RX
i915: fix build warning in intel_dp_get_link_status()
samples/bpf: Consider frame size in tx_only of xdpsock sample
net: hns3: check the return of skb_checksum_help()
bpftool: Add sock_release help info for cgroup attach/prog load command
SUNRPC: More fixes for backlog congestion
Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
net: hso: bail out on interrupt URB allocation failure
scripts/clang-tools: switch explicitly to Python 3
neighbour: Prevent Race condition in neighbour subsytem
usb: core: reduce power-on-good delay time of root hub
Linux 5.10.42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05d98d1355a080e0951b4b2ae77f0a9ccb6dfc5d
task_cs(p) is protected by RCU, so ensure that we have entered an RCU
read-side critical section before accessing it in guarantee_online_cpus().
This issue was introduced by 4045a05f88 ("BACKPORT: FROMLIST: cpuset:
Honour task_cpu_possible_mask() in guarantee_online_cpus()") and spotted
during upstream review.
Reported-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210521162524.22cwmrao3df7m4jb@e107158-lin.cambridge.arm.com
Fixes: 4045a05f88 ("BACKPORT: FROMLIST: cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Bug: 178507149
Change-Id: Ia8b8b89b5fcf72eefe9c2667951e24f315176ed5
Signed-off-by: Will Deacon <willdeacon@google.com>
[ Upstream commit ceb11679d9 ]
Commit 4976b718c3 ("bpf: Introduce pseudo_btf_id") switched the
order of resolve_pseudo_ldimm(), in which some pseudo instructions
are rewritten. Thus those rewritten instructions cannot be passed
to driver via 'prepare' offload callback.
Reorder the 'prepare' offload callback to fix it.
Fixes: 4976b718c3 ("bpf: Introduce pseudo_btf_id")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210520085834.15023-1-simon.horman@netronome.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ddc4739169 upstream.
This refactors the user notification code to have a do / while loop around
the completion condition. This has a small change in semantic, in that
previously we ignored addfd calls upon wakeup if the notification had been
responded to, but instead with the new change we check for an outstanding
addfd calls prior to returning to userspace.
Rodrigo Campos also identified a bug that can result in addfd causing
an early return, when the supervisor didn't actually handle the
syscall [1].
[1]: https://lore.kernel.org/lkml/20210413160151.3301-1-rodrigo@kinvolk.io/
Fixes: 7cf97b1254 ("seccomp: Introduce addfd ioctl to seccomp user notifier")
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Rodrigo Campos <rodrigo@kinvolk.io>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210517193908.3113-3-sargun@sargun.me
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_SCHEDSTATS is not set, the build breaks because
DEFINE_EVENT_SCHEDSTAT evaluates to DEFINE_EVENT_NOP, which only defines
trace_<name>, not __tracepoint_<name>, __traceiter_<name>, and
_SCK__tp_func_<name> like DEFINE_EVENT.
Gate these exports on CONFIG_SCHEDSTATS so all of the exported symbols
are defined.
Change-Id: I38056ee1446e6c149686ce1905c2ba6e4ea5e59e
Fixes: a6bb1af39d ("ANDROID: vendor_hooks: Export the tracepoints sched_stat_iowait, sched_stat_blocked, sched_stat_wait to let modules probe them")
Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2724257445?check_suite_focus=true
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.
Although we take care to prevent explicit hot-unplug of all 32-bit
capable CPUs on such a system, this is required when suspending on some
SoCs where the firmware mandates that the suspend/resume operation is
handled by CPU 0, which may not be capable of running 32-bit tasks.
Consequently, there is a window on the resume path where no 32-bit
capable CPUs are available for scheduling and waking up a 32-bit task
will result in a scheduler BUG() due to failure of select_fallback_rq():
| kernel BUG at kernel/sched/core.c:2858!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
| ...
| Call trace:
| select_fallback_rq+0x4b0/0x4e4
| try_to_wake_up.llvm.4388853297126348405+0x460/0x5b0
| default_wake_function+0x1c/0x30
| autoremove_wake_function+0x1c/0x60
| __wake_up_common.llvm.11763074518265335900+0x100/0x1b8
| __wake_up+0x78/0xc4
| ep_poll_callback+0x20c/0x3fc
Prevent wakeups of unschedulable frozen tasks in ttwu() and instead
defer the wakeup to __thaw_tasks(), which runs only once all the
secondary CPUs are back online.
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-arch/20210525151432.16875-17-will@kernel.org/
Bug: 186372082
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5a0531b48d537a79e1926289b5a87edcd7dd78ad
Occasionally it is necessary to see if a task is either frozen or
sleeping in the PF_FREEZER_SKIP state. In preparation for adding
additional users of this check, introduce a frozen_or_skipped() helper
function and convert the hung task detector over to using it.
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-arch/20210525151432.16875-16-will@kernel.org/
Bug: 186372082
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I138ffe2fae5a2da96df6f30d50d3a8a0dc61724c
Get task info about scheduling delay, iowait, and block time.
Bug: 189415303
Change-Id: Ib6b548f8a78de5b26d555e9a89e3cc79ea2d1024
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmCw0WEACgkQONu9yGCS
aT5K6RAAphIxwnUhvm1gBe+lzNOsp6ZURXMIT6hANhUdCU21Tw6agLGRELOJ+YNQ
agLdWb3auH1ufGV0wUtUJYbLa3lYF6uuU53BZb2i9iJ+1X2igzGwtVN/lEdcZs4G
R6hD8W5Rxiam5K7KAgYZLpSA5bS3ETrfsbJ2kddIGGSH1BHnwrManXOan1U9mY99
HDf+ksPCF2iA8Zqqq5Hx2g9Nuf0x0vyZ+6cob2QdJVq5ZnZXwamC498zmi/vGkGj
fPSxjaBMR9kDMQhUmvgSmAieM0UrrsPIkOxsWWCz/Lo4qhTG8+ccyHmplnsvvsyz
R3LEdq0YK3vMTypi7RfdxaEeB1a5d8cTV4JYZBs/eOU45lBVKZ+IKp1KJjTqtshy
Oj7LnNtONUfPNfXki+AgW7zGTPUJqK3hxW5j87Qg0MKe1i7CrAxmKhDcWX23acYG
5jBlUGX8vrYycdjvGCC7m/+T1NptVi/9UbcTi22au8hSwtKokn6AdTifTNfjst7H
4UMslyD5EA1Js17eObk/04kB0iMp9RSIEtc8DOV3cHZWAu3gK6pKe+RBL4uZ/K7Q
Wr8+gqlGyjg89NMjwoyXYaCTRkEwDcSSjKXzbfX1Pqjbujc/I7MLiLlPoen4Pe2i
v4aftEYPa4SHznutCQmLqRghFUfsxkRJKYVoo+SvKbhK7dE92O4=
=KKY2
-----END PGP SIGNATURE-----
Merge 5.10.41 into android12-5.10
Changes in 5.10.41
bpf: Wrap aux data inside bpf_sanitize_info container
bpf: Fix mask direction swap upon off reg sign change
bpf: No need to simulate speculative domain for immediates
context_tracking: Move guest exit context tracking to separate helpers
context_tracking: Move guest exit vtime accounting to separate helpers
KVM: x86: Defer vtime accounting 'til after IRQ handling
perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder
perf unwind: Set userdata for all __report_module() paths
NFC: nci: fix memory leak in nci_allocate_device
Linux 5.10.41
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie9f14d4b9960fb923eb01303517012fe6274d5ef
commit a703619127 upstream.
In 801c6058d1 ("bpf: Fix leakage of uninitialized bpf stack under
speculation") we replaced masking logic with direct loads of immediates
if the register is a known constant. Given in this case we do not apply
any masking, there is also no reason for the operation to be truncated
under the speculative domain.
Therefore, there is also zero reason for the verifier to branch-off and
simulate this case, it only needs to do it for unknown but bounded scalars.
As a side-effect, this also enables few test cases that were previously
rejected due to simulation under zero truncation.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb01a1bba5 upstream.
Masking direction as indicated via mask_to_left is considered to be
calculated once and then used to derive pointer limits. Thus, this
needs to be placed into bpf_sanitize_info instead so we can pass it
to sanitize_ptr_alu() call after the pointer move. Piotr noticed a
corner case where the off reg causes masking direction change which
then results in an incorrect final aux->alu_limit.
Fixes: 7fedb63a83 ("bpf: Tighten speculative pointer arithmetic mask")
Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d0220f686 upstream.
Add a container structure struct bpf_sanitize_info which holds
the current aux info, and update call-sites to sanitize_ptr_alu()
to pass it in. This is needed for passing in additional state
later on.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.40
firmware: arm_scpi: Prevent the ternary sign expansion bug
openrisc: Fix a memory leak
tee: amdtee: unload TA only when its refcount becomes 0
RDMA/siw: Properly check send and receive CQ pointers
RDMA/siw: Release xarray entry
RDMA/core: Prevent divide-by-zero error triggered by the user
RDMA/rxe: Clear all QP fields if creation failed
scsi: ufs: core: Increase the usable queue depth
scsi: qedf: Add pointer checks in qedf_update_link_speed()
scsi: qla2xxx: Fix error return code in qla82xx_write_flash_dword()
RDMA/mlx5: Recover from fatal event in dual port mode
RDMA/core: Don't access cm_id after its destruction
nvmet: remove unused ctrl->cqs
nvmet: fix memory leak in nvmet_alloc_ctrl()
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvme-tcp: rerun io_work if req_list is not empty
nvme-fc: clear q_live at beginning of association teardown
platform/mellanox: mlxbf-tmfifo: Fix a memory barrier issue
platform/x86: intel_int0002_vgpio: Only call enable_irq_wake() when using s2idle
platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
RDMA/mlx5: Fix query DCT via DEVX
RDMA/uverbs: Fix a NULL vs IS_ERR() bug
tools/testing/selftests/exec: fix link error
powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
nvmet: seset ns->file when open fails
perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
locking/lockdep: Correct calling tracepoints
locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
powerpc: Fix early setup to make early_ioremap() work
btrfs: avoid RCU stalls while running delayed iputs
cifs: fix memory leak in smb2_copychunk_range
misc: eeprom: at24: check suspend status before disable regulator
ALSA: dice: fix stream format for TC Electronic Konnekt Live at high sampling transfer frequency
ALSA: intel8x0: Don't update period unless prepared
ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field
ALSA: line6: Fix racy initialization of LINE6 MIDI
ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26
ALSA: firewire-lib: fix calculation for size of IR context payload
ALSA: usb-audio: Validate MS endpoint descriptors
ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro
ALSA: hda: fixup headset for ASUS GU502 laptop
Revert "ALSA: sb8: add a check for request_region"
ALSA: firewire-lib: fix check for the size of isochronous packet payload
ALSA: hda/realtek: reset eapd coeff to default value for alc287
ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293
ALSA: hda/realtek: Fix silent headphone output on ASUS UX430UA
ALSA: hda/realtek: Add fixup for HP OMEN laptop
ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx
uio_hv_generic: Fix a memory leak in error handling paths
Revert "rapidio: fix a NULL pointer dereference when create_workqueue() fails"
rapidio: handle create_workqueue() failure
Revert "serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference"
nvme-tcp: fix possible use-after-completion
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
drm/amdgpu: update gc golden setting for Navi12
drm/amdgpu: update sdma golden setting for Navi12
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
mmc: sdhci-pci-gli: increase 1.8V regulator wait
xen-pciback: redo VF placement in the virtual topology
xen-pciback: reconfigure also from backend watch handler
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
dm snapshot: fix crash with transient storage and zero chunk size
kcsan: Fix debugfs initcall return type
Revert "video: hgafb: fix potential NULL pointer dereference"
Revert "net: stmicro: fix a missing check of clk_prepare"
Revert "leds: lp5523: fix a missing check of return value of lp55xx_read"
Revert "hwmon: (lm80) fix a missing check of bus read in lm80 probe"
Revert "video: imsttfb: fix potential NULL pointer dereferences"
Revert "ecryptfs: replace BUG_ON with error handling code"
Revert "scsi: ufs: fix a missing check of devm_reset_control_get"
Revert "gdrom: fix a memory leak bug"
cdrom: gdrom: deallocate struct gdrom_unit fields in remove_gdrom
cdrom: gdrom: initialize global variable at init time
Revert "media: rcar_drif: fix a memory disclosure"
Revert "rtlwifi: fix a potential NULL pointer dereference"
Revert "qlcnic: Avoid potential NULL pointer dereference"
Revert "niu: fix missing checks of niu_pci_eeprom_read"
ethernet: sun: niu: fix missing checks of niu_pci_eeprom_read()
net: stmicro: handle clk_prepare() failure during init
scsi: ufs: handle cleanup correctly on devm_reset_control_get error
net: rtlwifi: properly check for alloc_workqueue() failure
ics932s401: fix broken handling of errors when word reading fails
leds: lp5523: check return value of lp5xx_read and jump to cleanup code
qlcnic: Add null check after calling netdev_alloc_skb
video: hgafb: fix potential NULL pointer dereference
vgacon: Record video mode changes with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vt: Fix character height handling with VT_RESIZEX
tty: vt: always invoke vc->vc_sw->con_resize callback
drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7
openrisc: mm/init.c: remove unused memblock_region variable in map_ram()
x86/Xen: swap NX determination and GDT setup on BSP
nvme-multipath: fix double initialization of ANA state
rtc: pcf85063: fallback to parent of_node
x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
nvmet: use new ana_log_size instead the old one
video: hgafb: correctly handle card detect failure during probe
Bluetooth: SMP: Fail if remote and local public keys are identical
Linux 5.10.40
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4523cf43d1da6bea507e4027bd83bc491a574f41
commit 976aac5f88 upstream.
clang with CONFIG_LTO_CLANG points out that an initcall function should
return an 'int' due to the changes made to the initcall macros in commit
3578ad11f3 ("init: lto: fix PREL32 relocations"):
kernel/kcsan/debugfs.c:274:15: error: returning 'void' from a function with incompatible result type 'int'
late_initcall(kcsan_debugfs_init);
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
include/linux/init.h:292:46: note: expanded from macro 'late_initcall'
#define late_initcall(fn) __define_initcall(fn, 7)
Fixes: e36299efe7 ("kcsan, debugfs: Move debugfs file creation out of early init")
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3a010c4932 ]
When a interruptible mutex locker is interrupted by a signal
without acquiring this lock and removed from the wait queue.
if the mutex isn't contended enough to have a waiter
put into the wait queue again, the setting of the WAITER
bit will force mutex locker to go into the slowpath to
acquire the lock every time, so if the wait queue is empty,
the WAITER bit need to be clear.
Fixes: 040a0a3710 ("mutex: Add support for wound/wait style locks")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbb5afad10 ]
Suppose we have 2 threads, the group-leader L and a sub-theread T,
both parked in ptrace_stop(). Debugger tries to resume both threads
and does
ptrace(PTRACE_CONT, T);
ptrace(PTRACE_CONT, L);
If the sub-thread T execs in between, the 2nd PTRACE_CONT doesn not
resume the old leader L, it resumes the post-exec thread T which was
actually now stopped in PTHREAD_EVENT_EXEC. In this case the
PTHREAD_EVENT_EXEC event is lost, and the tracer can't know that the
tracee changed its pid.
This patch makes ptrace() fail in this case until debugger does wait()
and consumes PTHREAD_EVENT_EXEC which reports old_pid. This affects all
ptrace requests except the "asynchronous" PTRACE_INTERRUPT/KILL.
The patch doesn't add the new PTRACE_ option to not complicate the API,
and I _hope_ this won't cause any noticeable regression:
- If debugger uses PTRACE_O_TRACEEXEC and the thread did an exec
and the tracer does a ptrace request without having consumed
the exec event, it's 100% sure that the thread the ptracer
thinks it is targeting does not exist anymore, or isn't the
same as the one it thinks it is targeting.
- To some degree this patch adds nothing new. In the scenario
above ptrace(L) can fail with -ESRCH if it is called after the
execing sub-thread wakes the leader up and before it "steals"
the leader's pid.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <errno.h>
#include <pthread.h>
#include <assert.h>
void *tf(void *arg)
{
execve("/usr/bin/true", NULL, NULL);
assert(0);
return NULL;
}
int main(void)
{
int leader = fork();
if (!leader) {
kill(getpid(), SIGSTOP);
pthread_t th;
pthread_create(&th, NULL, tf, NULL);
for (;;)
pause();
return 0;
}
waitpid(leader, NULL, WSTOPPED);
ptrace(PTRACE_SEIZE, leader, 0,
PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXEC);
waitpid(leader, NULL, 0);
ptrace(PTRACE_CONT, leader, 0,0);
waitpid(leader, NULL, 0);
int status, thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != leader);
assert(status == 0x80137f);
ptrace(PTRACE_CONT, thread, 0,0);
/*
* waitid() because waitpid(leader, &status, WNOWAIT) does not
* report status. Why ????
*
* Why WEXITED? because we have another kernel problem connected
* to mt-exec.
*/
siginfo_t info;
assert(waitid(P_PID, leader, &info, WSTOPPED|WEXITED|WNOWAIT) == 0);
assert(info.si_pid == leader && info.si_status == 0x0405);
/* OK, it sleeps in ptrace(PTRACE_EVENT_EXEC == 0x04) */
assert(ptrace(PTRACE_CONT, leader, 0,0) == -1);
assert(errno == ESRCH);
assert(leader == waitpid(leader, &status, WNOHANG));
assert(status == 0x04057f);
assert(ptrace(PTRACE_CONT, leader, 0,0) == 0);
return 0;
}
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Simon Marchi <simon.marchi@efficios.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pedro Alves <palves@redhat.com>
Acked-by: Simon Marchi <simon.marchi@efficios.com>
Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Migrating a task to a CPU which is concurrently being taken offline can
cause the migration to fail silently, with the task left running on the
old CPU. This is usually not the end of the world, but when forcefully
migrating a 32-bit task during execve() from a 64-bit task, it is
imperative that we do not attempt to return to userspace on a
64-bit-only CPU.
Take the CPU hotplug lock for read while forcefully migrating a 32-bit
task on execve() so that the migration cannot fail.
Bug: 187917024
Change-Id: I6eaf2a564fe3ad73c03f0a6029aade09c707330f
Signed-off-by: Will Deacon <willdeacon@google.com>
We encountered a system hang issue while doing the tests. The callstack
is as following
schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
When we started investigating, we found race between
kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync. The race's
result could be simply reproduced as a kthread_mod_delayed_work with
a following kthread_flush_work call.
Thing is we release kthread_mod_delayed_work kspin_lock in
__kthread_cancel_work so it opens a race window for
kthread_cancel_delayed_work_sync to change the canceling count used to
prevent dwork from being requeued before calling kthread_flush_work.
However, we don't check the canceling count after returning from
__kthread_cancel_work and then insert the dwork to the worker. It
results the following kthread_flush_work inserts flush work to dwork's
tail which is at worker's dealyed_work_list. Therefore, flush work will
never get moved to the worker's work_list to be executed. Finally,
kthread_cancel_delayed_work_sync will NOT be able to get completed and
wait forever. The code sequence diagram is as following
Thread A Thread B
kthread_mod_delayed_work
spin_lock
__kthread_cancel_work
canceling = 1
spin_unlock
kthread_cancel_delayed_work_sync
spin_lock
kthread_cancel_work
canceling = 2
spin_unlock
del_timer_sync
spin_lock
canceling = 1 // canceling count gets update in ThreadB before
queue_delayed_work // dwork is put into the woker’s dealyed_work_list
without checking the canceling count
spin_unlock
kthread_flush_work
spin_lock
Insert flush work // at the tail of the
dwork which is at
the worker’s
dealyed_work_list
spin_unlock
wait_for_completion // Thread B stuck here as
flush work will never
get executed
The canceling count could change in __kthread_cancel_work as the spinlock
get released and regained in between, let's check the count again before
we queue the delayed work to avoid the race.
Bug: 185578292
Link: https://lore.kernel.org/lkml/20210513065458.941403-1-liumartin@google.com
Fixes: 37be45d49d ("kthread: allow to cancel kthread work")
Tested-by: David Chao <davidchao@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Change-Id: I7bf68c66cc80cf7aa27a4712238444281826234b
[ Upstream commit 95b079d821 ]
Fix the type of index from unsigned int to int since find_slots() might
return -1.
Fixes: 26a7e09478 ("swiotlb: refactor swiotlb_tbl_map_single")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a01572e21f)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I352263b74ff89334c1e3fd6f1b70e1c08b7f2b47
This reverts commit 3d24408745.
Bring back the commit in 5.10.36 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iafa9c51e4cb9e42f4149c4e9889b822288a96ad2
This reverts commit b4ae4430ab.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1dd3e49ebca722219c15faaa4f895d992530a3ed
This reverts commit 2201384121.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I14d47f0fc68b35b08323b26eb045b5cdda845cf0
This reverts commit 0fb49e91d4.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7c52f223a6ca7b8900ccdfc529afe1c21b2b11ef
This reverts commit 28a2f5f10f.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibebda0f5aeecc6ba2fe45902870b783e3e9cbb0f
This reverts commit cb27079661.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I14e92ad7064f5d3f4334d17bdb78444dff45cab3
This reverts commit 78957dcb2c.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3c4e4f15d932dfe10b4a406cda6a49948a130f58
This reverts commit 17ba7dfe20.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ied3e936060722b44f374cec6645bcdf5a1c1ba47
This reverts commit d9d0c09e0a.
Bring back the commit in 5.10.35 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1fcf27286f9c21fb87f9dd05b0f861b37350d780
Add vendor hook for hung task detect, so we can decide which
threads need to check, avoiding false alarms.
Bug: 188684133
Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org>
Change-Id: I5d7dfeb071cbfda8121134c38a458202aaa3a8c6
Changes in 5.10.38
KEYS: trusted: Fix memory leak on object td
tpm: fix error return code in tpm2_get_cc_attrs_tbl()
tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
tpm, tpm_tis: Reserve locality in tpm_tis_resume()
KVM: x86/mmu: Remove the defunct update_pte() paging hook
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
ACPI: PM: Add ACPI ID of Alder Lake Fan
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
kvm: Cap halt polling at kvm->max_halt_poll_ns
ath11k: fix thermal temperature read
fs: dlm: fix debugfs dump
fs: dlm: add errno handling to check callback
fs: dlm: check on minimum msglen size
fs: dlm: flush swork on shutdown
tipc: convert dest node's address to network order
ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
net: stmmac: Set FIFO sizes for ipq806x
ASoC: rsnd: core: Check convert rate in rsnd_hw_params
Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
i2c: bail out early when RDWR parameters are wrong
ALSA: hdsp: don't disable if not enabled
ALSA: hdspm: don't disable if not enabled
ALSA: rme9652: don't disable if not enabled
ALSA: bebob: enable to deliver MIDI messages for multiple ports
Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
Bluetooth: initialize skb_queue_head at l2cap_chan_create()
net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
net: bridge: when suppression is enabled exclude RARP packets
Bluetooth: check for zapped sk before connecting
selftests/powerpc: Fix L1D flushing tests for Power10
powerpc/32: Statically initialise first emergency context
net: hns3: remediate a potential overflow risk of bd_num_list
net: hns3: add handling for xmit skb with recursive fraglist
ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
ice: handle increasing Tx or Rx ring sizes
Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
i2c: Add I2C_AQ_NO_REP_START adapter quirk
MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
coresight: Do not scan for graph if none is present
IB/hfi1: Correct oversized ring allocation
mac80211: clear the beacon's CRC after channel switch
pinctrl: samsung: use 'int' for register masks in Exynos
rtw88: 8822c: add LC calibration for RTL8822C
mt76: mt7615: support loading EEPROM for MT7613BE
mt76: mt76x0: disable GTK offloading
mt76: mt7915: fix txpower init for TSSI off chips
fuse: invalidate attrs when page writeback completes
virtiofs: fix userns
cuse: prevent clone
iwlwifi: pcie: make cfg vs. trans_cfg more robust
powerpc/mm: Add cond_resched() while removing hpte mappings
ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
Revert "iommu/amd: Fix performance counter initialization"
iommu/amd: Remove performance counter pre-initialization test
drm/amd/display: Force vsync flip when reconfiguring MPCC
selftests: Set CC to clang in lib.mk if LLVM is set
kconfig: nconf: stop endless search loops
ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
powerpc/smp: Set numa node before updating mask
ASoC: rt286: Generalize support for ALC3263 codec
ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
samples/bpf: Fix broken tracex1 due to kprobe argument change
powerpc/pseries: Stop calling printk in rtas_stop_self()
drm/amd/display: fixed divide by zero kernel crash during dsc enablement
drm/amd/display: add handling for hdcp2 rx id list validation
drm/amdgpu: Add mem sync flag for IB allocated by SA
mt76: mt7615: fix entering driver-own state on mt7663
crypto: ccp: Free SEV device if SEV init fails
wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
powerpc/iommu: Annotate nested lock for lockdep
iavf: remove duplicate free resources calls
net: ethernet: mtk_eth_soc: fix RX VLAN offload
selftests: mlxsw: Increase the tolerance of backlog buildup
selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
kbuild: generate Module.symvers only when vmlinux exists
bnxt_en: Add PCI IDs for Hyper-V VF devices.
ia64: module: fix symbolizer crash on fdescr
watchdog: rename __touch_watchdog() to a better descriptive name
watchdog: explicitly update timestamp when reporting softlockup
watchdog/softlockup: remove logic that tried to prevent repeated reports
watchdog: fix barriers when printing backtraces from all CPUs
ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
f2fs: move ioctl interface definitions to separated file
f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
f2fs: fix to allow migrating fully valid segment
f2fs: fix panic during f2fs_resize_fs()
f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
PCI: Release OF node in pci_scan_device()'s error path
ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
f2fs: fix to align to section for fallocate() on pinned file
f2fs: fix to update last i_size if fallocate partially succeeds
PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
PCI: endpoint: Add helper API to get the 'next' unreserved BAR
PCI: endpoint: Make *_free_bar() to return error codes on failure
PCI: endpoint: Fix NULL pointer dereference for ->get_features()
f2fs: fix to avoid touching checkpointed data in get_victim()
f2fs: fix to cover __allocate_new_section() with curseg_lock
f2fs: Fix a hungtask problem in atomic write
f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
NFS: Deal correctly with attribute generation counter overflow
PCI: endpoint: Fix missing destroy_workqueue()
pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
NFSv4.2 fix handling of sr_eof in SEEK's reply
SUNRPC: Move fault injection call sites
SUNRPC: Remove trace_xprt_transmit_queued
SUNRPC: Handle major timeout in xprt_adjust_timeout()
thermal/drivers/tsens: Fix missing put_device error
NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
nfsd: ensure new clients break delegations
rtc: fsl-ftm-alarm: add MODULE_TABLE()
dmaengine: idxd: Fix potential null dereference on pointer status
dmaengine: idxd: fix dma device lifetime
dmaengine: idxd: fix cdev setup and free device lifetime issues
SUNRPC: fix ternary sign expansion bug in tracing
pwm: atmel: Fix duty cycle calculation in .get_state()
xprtrdma: Avoid Receive Queue wrapping
xprtrdma: Fix cwnd update ordering
xprtrdma: rpcrdma_mr_pop() already does list_del_init()
swiotlb: Fix the type of index
ceph: fix inode leak on getattr error in __fh_to_dentry
scsi: qla2xxx: Prevent PRLI in target mode
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Narrow down fast path in system suspend path
rtc: ds1307: Fix wday settings for rx8130
net: hns3: fix incorrect configuration for igu_egu_hw_err
net: hns3: initialize the message content in hclge_get_link_mode()
net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
net: hns3: fix for vxlan gpe tx checksum bug
net: hns3: use netif_tx_disable to stop the transmit queue
net: hns3: disable phy loopback setting in hclge_mac_start_phy
sctp: do asoc update earlier in sctp_sf_do_dupcook_a
RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
sunrpc: Fix misplaced barrier in call_decode
libbpf: Fix signed overflow in ringbuf_process_ring
block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
block/rnbd-clt: Check the return value of the function rtrs_clt_query
ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
netfilter: xt_SECMARK: add new revision to fix structure layout
xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/radeon: Avoid power table parsing memory leaks
arm64: entry: factor irq triage logic into macros
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
ksm: fix potential missing rmap_item for stable_node
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
ethtool: fix missing NLM_F_MULTI flag when dumping
net: fix nla_strcmp to handle more then one trailing null character
smc: disallow TCP_ULP in smc_setsockopt()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
netfilter: nftables: Fix a memleak from userdata error path in new objects
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251x: fix resume from sleep before interface was brought up
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
sched: Fix out-of-bound access in uclamp
sched/fair: Fix unfairness caused by missing load decay
fs/proc/generic.c: fix incorrect pde_is_permanent check
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
netfilter: nftables: avoid overflows in nft_hash_buckets()
i40e: fix broken XDP support
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
mptcp: fix splat when closing unaccepted socket
f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
ARC: entry: fix off-by-one error in syscall number validation
ARC: mm: PAE: use 40-bit physical page mask
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
powerpc/64s: Fix crashes when toggling stf barrier
powerpc/64s: Fix crashes when toggling entry flush barrier
hfsplus: prevent corruption in shrinking truncate
squashfs: fix divide error in calculate_skip()
userfaultfd: release page in error path to avoid BUG_ON
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
blk-iocost: fix weight updates of inner active iocgs
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
btrfs: fix race leading to unpersisted data and metadata on fsync
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/i915: Avoid div-by-zero on gen2
kvm: exit halt polling on need_resched() as well
KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
drm/msm/dp: initialize audio_comp when audio starts
KVM: x86: Cancel pvclock_gtod_work on module removal
KVM: x86: Prevent deadlock against tk_core.seq
dax: Add an enum for specifying dax wakup mode
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Wake up all waiters after invalidating dax entry
xen/unpopulated-alloc: consolidate pgmap manipulation
xen/unpopulated-alloc: fix error return code in fill_list()
perf tools: Fix dynamic libbpf link
usb: dwc3: gadget: Free gadget structure only after freeing endpoints
iio: light: gp2ap002: Fix rumtime PM imbalance on error
iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
usb: fotg210-hcd: Fix an error message
hwmon: (occ) Fix poll rate limiting
usb: musb: Fix an error message
ACPI: scan: Fix a memory leak in an error handling path
kyber: fix out of bounds access when preempted
nvmet: add lba to sect conversion helpers
nvmet: fix inline bio check for bdev-ns
nvmet-rdma: Fix NULL deref when SEND is completed with error
f2fs: compress: fix to free compress page correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to assign cc.cluster_idx correctly
nbd: Fix NULL pointer in flush_workqueue
blk-mq: plug request for shared sbitmap
blk-mq: Swap two calls in blk_mq_exit_queue()
usb: dwc3: omap: improve extcon initialization
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
usb: xhci: Increase timeout for HC halt
usb: dwc2: Fix gadget DMA unmap direction
usb: core: hub: fix race condition about TRSMRCY of resume
usb: dwc3: gadget: Enable suspend events
usb: dwc3: gadget: Return success always for kick transfer in ep queue
usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
usb: typec: ucsi: Put fwnode in any case during ->probe()
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
xhci: Do not use GFP_KERNEL in (potentially) atomic context
xhci: Add reset resume quirk for AMD xhci controller.
iio: gyro: mpu3050: Fix reported temperature value
iio: tsl2583: Fix division by a zero lux_val
cdc-wdm: untangle a circular dependency between callback and softint
xen/gntdev: fix gntdev_mmap() error exit path
KVM: x86: Emulate RDPID only if RDTSCP is supported
KVM: x86: Move RDPID emulation intercept to its own enum
KVM: nVMX: Always make an attempt to map eVMCS after migration
KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
KVM: VMX: Disable preemption when probing user return MSRs
Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
iommu/vt-d: Remove WO permissions on second-level paging entries
mm: fix struct page layout on 32-bit systems
MIPS: Reinstate platform `__div64_32' handler
MIPS: Avoid DIVU in `__div64_32' is result would be zero
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
ARM: 9012/1: move device tree mapping out of linear region
ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
usb: typec: tcpm: Fix error while calculating PPS out values
kobject_uevent: remove warning in init_uevent_argv()
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
drm/i915/overlay: Fix active retire callback alignment
drm/i915: Fix crash in auto_retire
clk: exynos7: Mark aclk_fsys1_200 as critical
media: rkvdec: Remove of_match_ptr()
i2c: mediatek: Fix send master code at more than 1MHz
dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
dt-bindings: serial: 8250: Remove duplicated compatible strings
debugfs: Make debugfs_allow RO after init
ext4: fix debug format string warning
nvme: do not try to reconfigure APST when the controller is not live
ASoC: rsnd: check all BUSIF status when error
Linux 5.10.38
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65
[ Upstream commit 3c9c797534 ]
It used to be true that we can have system RAM (IORESOURCE_SYSTEM_RAM |
IORESOURCE_BUSY) only on the first level in the resource tree. However,
this is no longer holds for driver-managed system RAM (i.e., added via
dax/kmem and virtio-mem), which gets added on lower levels, for example,
inside device containers.
IORESOURCE_SYSTEM_RAM is defined as IORESOURCE_MEM | IORESOURCE_SYSRAM and
just a special type of IORESOURCE_MEM.
The function walk_mem_res() only considers the first level and is used in
arch/x86/mm/ioremap.c:__ioremap_check_mem() only. We currently fail to
identify System RAM added by dax/kmem and virtio-mem as
"IORES_MAP_SYSTEM_RAM", for example, allowing for remapping of such
"normal RAM" in __ioremap_caller().
Let's find all IORESOURCE_MEM | IORESOURCE_BUSY resources, making the
function behave similar to walk_system_ram_res().
Link: https://lkml.kernel.org/r/20210325115326.7826-3-david@redhat.com
Fixes: ebf71552bb ("virtio-mem: Add parent resource for all added "System RAM"")
Fixes: c221c0b030 ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97f61c8f44 ]
Patch series "kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree", v2.
Playing with kdump+virtio-mem I noticed that kexec_file_load() does not
consider System RAM added via dax/kmem and virtio-mem when preparing the
elf header for kdump. Looking into the details, the logic used in
walk_system_ram_res() and walk_mem_res() seems to be outdated.
walk_system_ram_range() already does the right thing, let's change
walk_system_ram_res() and walk_mem_res(), and clean up.
Loading a kdump kernel via "kexec -p -s" ... will result in the kdump
kernel to also dump dax/kmem and virtio-mem added System RAM now.
Note: kexec-tools on x86-64 also have to be updated to consider this
memory in the kexec_load() case when processing /proc/iomem.
This patch (of 3):
It used to be true that we can have system RAM (IORESOURCE_SYSTEM_RAM |
IORESOURCE_BUSY) only on the first level in the resource tree. However,
this is no longer holds for driver-managed system RAM (i.e., added via
dax/kmem and virtio-mem), which gets added on lower levels, for example,
inside device containers.
We have two users of walk_system_ram_res(), which currently only
consideres the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
locate_mem_hole_callback(), so even after this change, we won't be
placing kexec images onto dax/kmem and virtio-mem added memory. No
change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
not adding relevant ranges to the crash elf header, resulting in them
not getting dumped via kdump.
This change fixes loading a crashkernel via kexec_file_load() and
including dax/kmem and virtio-mem added System RAM in the crashdump on
x86-64. Note that e.g,, arm64 relies on memblock data and, therefore,
always considers all added System RAM already.
Let's find all IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY resources, making
the function behave like walk_system_ram_range().
Link: https://lkml.kernel.org/r/20210325115326.7826-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210325115326.7826-2-david@redhat.com
Fixes: ebf71552bb ("virtio-mem: Add parent resource for all added "System RAM"")
Fixes: c221c0b030 ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31d82c2c78 ]
When vzalloc() returns NULL to sha_regions, no error return code of
kexec_calculate_store_digests() is assigned. To fix this bug, ret is
assigned with -ENOMEM in this case.
Link: https://lkml.kernel.org/r/20210309083904.24321-1-baijiaju1990@gmail.com
Fixes: a43cac0d9d ("kexec: split kexec_file syscall code to kexec_file.c")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0258bdfaff ]
This fixes an issue where old load on a cfs_rq is not properly decayed,
resulting in strange behavior where fairness can decrease drastically.
Real workloads with equally weighted control groups have ended up
getting a respective 99% and 1%(!!) of cpu time.
When an idle task is attached to a cfs_rq by attaching a pid to a cgroup,
the old load of the task is attached to the new cfs_rq and sched_entity by
attach_entity_cfs_rq. If the task is then moved to another cpu (and
therefore cfs_rq) before being enqueued/woken up, the load will be moved
to cfs_rq->removed from the sched_entity. Such a move will happen when
enforcing a cpuset on the task (eg. via a cgroup) that force it to move.
The load will however not be removed from the task_group itself, making
it look like there is a constant load on that cfs_rq. This causes the
vruntime of tasks on other sibling cfs_rq's to increase faster than they
are supposed to; causing severe fairness issues. If no other task is
started on the given cfs_rq, and due to the cpuset it would not happen,
this load would never be properly unloaded. With this patch the load
will be properly removed inside update_blocked_averages. This also
applies to tasks moved to the fair scheduling class and moved to another
cpu, and this path will also fix that. For fork, the entity is queued
right away, so this problem does not affect that.
This applies to cases where the new process is the first in the cfs_rq,
issue introduced 3d30544f02 ("sched/fair: Apply more PELT fixes"), and
when there has previously been load on the cgroup but the cgroup was
removed from the leaflist due to having null PELT load, indroduced
in 039ae8bcf7 ("sched/fair: Fix O(nr_cgroups) in the load balancing
path").
For a simple cgroup hierarchy (as seen below) with two equally weighted
groups, that in theory should get 50/50 of cpu time each, it often leads
to a load of 60/40 or 70/30.
parent/
cg-1/
cpu.weight: 100
cpuset.cpus: 1
cg-2/
cpu.weight: 100
cpuset.cpus: 1
If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2
equally weighted, they should still get a 50/50 balance of cpu time.
This however sometimes results in a balance of 10/90 or 1/99(!!) between
the task groups.
$ ps u -C stress
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18568 1.1 0.0 3684 100 pts/12 R+ 13:36 0:00 stress --cpu 1
root 18580 99.3 0.0 3684 100 pts/12 R+ 13:36 0:09 stress --cpu 1
parent/
cg-1/
cpu.weight: 100
sub-group/
cpu.weight: 1
cpuset.cpus: 1
cg-2/
cpu.weight: 100
sub-group/
cpu.weight: 10000
cpuset.cpus: 1
This can be reproduced by attaching an idle process to a cgroup and
moving it to a given cpuset before it wakes up. The issue is evident in
many (if not most) container runtimes, and has been reproduced
with both crun and runc (and therefore docker and all its "derivatives"),
and with both cgroup v1 and v2.
Fixes: 3d30544f02 ("sched/fair: Apply more PELT fixes")
Fixes: 039ae8bcf7 ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210501141950.23622-2-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d2f8909a5 ]
Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.
For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
correct indexes are in range [0,19], hence leading to an out of bound
memory access.
Clamp the bucket id to fix the issue.
Fixes: 69842cba9a ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20210430151412.160913-1-qperret@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95b079d821 ]
Fix the type of index from unsigned int to int since find_slots() might
return -1.
Fixes: 26a7e09478 ("swiotlb: refactor swiotlb_tbl_map_single")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Claire Chang <tientzu@chromium.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f113bf760 ]
Any parallel softlockup reports are skipped when one CPU is already
printing backtraces from all CPUs.
The exclusive rights are synchronized using one bit in
soft_lockup_nmi_warn. There is also one memory barrier that does not make
much sense.
Use two barriers on the right location to prevent mixing two reports.
[pmladek@suse.com: use bit lock operations to prevent multiple soft-lockup reports]
Link: https://lkml.kernel.org/r/YFSVsLGVWMXTvlbk@alley
Link: https://lkml.kernel.org/r/20210311122130.6788-6-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bc503cb4a ]
The softlockup detector does some gymnastic with the variable
soft_watchdog_warn. It was added by the commit 58687acba5
("lockup_detector: Combine nmi_watchdog and softlockup detector").
The purpose is not completely clear. There are the following clues. They
describe the situation how it looked after the above mentioned commit:
1. The variable was checked with a comment "only warn once".
2. The variable was set when softlockup was reported. It was cleared
only when the CPU was not longer in the softlockup state.
3. watchdog_touch_ts was not explicitly updated when the softlockup
was reported. Without this variable, the report would normally
be printed again during every following watchdog_timer_fn()
invocation.
The logic has got even more tangled up by the commit ed235875e2
("kernel/watchdog.c: print traces for all cpus on lockup detection").
After this commit, soft_watchdog_warn is set only when
softlockup_all_cpu_backtrace is enabled. But multiple reports from all
CPUs are prevented by a new variable soft_lockup_nmi_warn.
Conclusion:
The variable probably never worked as intended. In each case, it has not
worked last many years because the softlockup was reported repeatedly
after the full period defined by watchdog_thresh.
The reason is that watchdog gets touched in many known slow paths, for
example, in printk_stack_address(). This code is called also when
printing the softlockup report. It means that the watchdog timestamp gets
updated after each report.
Solution:
Simply remove the logic. People want the periodic report anyway.
Link: https://lkml.kernel.org/r/20210311122130.6788-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>