linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Mayank Rungta	077ba03600	watchdog/hardlockup: improve buddy system detection timeliness Currently, the buddy system only performs checks every 3rd sample. With a 4-second interval. If a check window is missed, the next check occurs 12 seconds later, potentially delaying hard lockup detection for up to 24 seconds. Modify the buddy system to perform checks at every interval (4s). Introduce a missed-interrupt threshold to maintain the existing grace period while reducing the detection window to 8-12 seconds. Best and worst case detection scenarios: Before (12s check window): - Best case: Lockup occurs after first check but just before heartbeat interval. Detected in ~8s (8s till next check). - Worst case: Lockup occurs just after a check. Detected in ~24s (missed check + 12s till next check + 12s logic). After (4s check window with threshold of 3): - Best case: Lockup occurs just before a check. Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd). - Worst case: Lockup occurs just after a check. Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd). Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erainan <eranian@google.com> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-27 21:19:47 -07:00
Mayank Rungta	746bb7fa91	watchdog: update saved interrupts during check Currently, arch_touch_nmi_watchdog() causes an early return that skips updating hrtimer_interrupts_saved. This leads to stale comparisons and delayed lockup detection. I found this issue because in our system the serial console is fairly chatty. For example, the 8250 console driver frequently calls touch_nmi_watchdog() via console_write(). If a CPU locks up after a timer interrupt but before next watchdog check, we see the following sequence: * watchdog_hardlockup_check() saves counter (e.g., 1000) * Timer runs and updates the counter (1001) * touch_nmi_watchdog() is called * CPU locks up * 10s pass: check() notices touch, returns early, skips update * 10s pass: check() saves counter (1001) * 10s pass: check() finally detects lockup This delays detection to 30 seconds. With this fix, we detect the lockup in 20 seconds. Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erainan <eranian@google.com> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-27 21:19:46 -07:00
Mayank Rungta	3e811cae32	watchdog: return early in watchdog_hardlockup_check() Patch series "watchdog/hardlockup: Improvements to hardlockup", v2. This series addresses limitations in the hardlockup detector implementations and updates the documentation to reflect actual behavior and recent changes. The changes are structured as follows: Refactoring (Patch 1) ===================== Patch 1 refactors watchdog_hardlockup_check() to return early if no lockup is detected. This reduces the indentation level of the main logic block, serving as a clean base for the subsequent changes. Hardlockup Detection Improvements (Patches 2 & 4) ================================================= The hardlockup detector logic relies on updating saved interrupt counts to determine if the CPU is making progress. Patch 1 ensures that the saved interrupt count is updated unconditionally before checking the "touched" flag. This prevents stale comparisons which can delay detection. This is a logic fix that ensures the detector remains accurate even when the watchdog is frequently touched. Patch 3 improves the Buddy detector's timeliness. The current checking interval (every 3rd sample) causes high variability in detection time (up to 24s). This patch changes the Buddy detector to check at every hrtimer interval (4s) with a missed-interrupt threshold of 3, narrowing the detection window to a consistent 8-12 second range. Documentation Updates (Patches 3 & 5) ===================================== The current documentation does not fully capture the variable nature of detection latency or the details of the Buddy system. Patch 3 removes the strict "10 seconds" definition of a hardlockup, which was misleading given the periodic nature of the detector. It adds a "Detection Overhead" section to the admin guide, using "Best Case" and "Worst Case" scenarios to illustrate that detection time can vary significantly (e.g., ~6s to ~20s). Patch 5 adds a dedicated section for the Buddy detector, which was previously undocumented. It details the mechanism, the new timing logic, and known limitations. This patch (of 5): Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()` to return early when a hardlockup is not detected. This flattens the main logic block, reducing the indentation level and making the code easier to read and maintain. This refactoring serves as a preparation patch for future hardlockup changes. Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-1-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erainan <eranian@google.com> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-27 21:19:46 -07:00
Shengming Hu	cafe4074a7	watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() cpustat_tail indexes cpustat_util[], which is a NUM_SAMPLE_PERIODS-sized ring buffer. need_counting_irqs() currently wraps the index using NUM_HARDIRQ_REPORT, which only happens to match NUM_SAMPLE_PERIODS. Use NUM_SAMPLE_PERIODS for the wrap to keep the ring math correct even if the NUM_HARDIRQ_REPORT or NUM_SAMPLE_PERIODS changes. Link: https://lkml.kernel.org/r/tencent_7068189CB6D6689EB353F3D17BF5A5311A07@qq.com Fixes: `e9a9292e23` ("watchdog/softlockup: Report the most frequent interrupts") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Run <zhang.run@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-08 00:13:34 -08:00
Li RongQing	e700f5d156	watchdog: softlockup: panic when lockup duration exceeds N thresholds The softlockup_panic sysctl is currently a binary option: panic immediately or never panic on soft lockups. Panicking on any soft lockup, regardless of duration, can be overly aggressive for brief stalls that may be caused by legitimate operations. Conversely, never panicking may allow severe system hangs to persist undetected. Extend softlockup_panic to accept an integer threshold, allowing the kernel to panic only when the normalized lockup duration exceeds N watchdog threshold periods. This provides finer-grained control to distinguish between transient delays and persistent system failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic when duration >= 1 * threshold (20s default, original behavior) - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.) The original behavior is preserved for values 0 and 1, maintaining full backward compatibility while allowing systems to tolerate brief lockups while still catching severe, persistent hangs. [lirongqing@baidu.com: v2] Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:44:20 -08:00
Feng Tang	e561383a39	powerpc/watchdog: add support for hardlockup_sys_info sysctl Commit `a9af76a787` ("watchdog: add sys_info sysctls to dump sys info on system lockup") adds 'hardlock_sys_info' systcl knob for general kernel watchdog to control what kinds of system debug info to be dumped on hardlockup. Add similar support in powerpc watchdog code to make the sysctl knob more general, which also fixes a compiling warning in general watchdog code reported by 0day bot. Link: https://lkml.kernel.org/r/20251231080309.39642-1-feng.tang@linux.alibaba.com Fixes: `a9af76a787` ("watchdog: add sys_info sysctls to dump sys info on system lockup") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512030920.NFKtekA7-lkp@intel.com/ Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-14 22:16:22 -08:00
Linus Torvalds	509d3f4584	Significant patch series in this pull request: - The 6 patch series "panic: sys_info: Refactor and fix a potential issue" from Andy Shevchenko fixes a build issue and does some cleanup in ib/sys_info.c. - The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from David Laight enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions. - The 2 patch series "scripts/gdb/symbols: make BPF debug info available to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line numbers available to the GDB debugger. - The 4 patch series "Enable hung_task and lockup cases to dump system info on demand" from Feng Tang adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire. - The 6 patch series "lib/base64: add generic encoder/decoder, migrate users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations. - The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric Dumazet makes TCP a little faster. - The 9 patch series "liveupdate: Rework KHO for in-kernel users" from Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients. - The 13 patch series "kho: simplify state machine and enable dynamic updates" from Pasha Tatashin increases the flexibility of KEXEC Handover. Also preparation for LUO. - The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is a major new feature targeted at cloud environments. Quoting the [0/N]: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - The 3 patch series "kexec: reorganize kexec and kdump sysfs" from Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day. - The 2 patch series "kho: fixes for vmalloc restoration" from Mike Rapoport fixes a BUG which was being hit during KHO restoration of vmalloc() regions. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52 n0pVpQmG5P/KyOmpLztn96ejL4heKwQ= =JY96 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...	2025-12-06 14:01:20 -08:00
Linus Torvalds	ac20755937	Summary * Move jiffies converters out of kernel/sysctl.c Moved the jiffies converters into kernel/time/jiffies.c and replaced the pipe-max-size proc_handler converter with a macro based version. This is all part of the effort to relocate non-sysctl logic out of kernel/sysctl.c into more relevant subsystems. No functional changes. * Generalize proc handler converter creation Removed duplicated sysctl converter logic by consolidating it in macros. These are used inside sysctl core as well as in pipe.c and jiffies.c. Converter kernel and user space pointer args are now automatically const qualified for the convenience of the caller. No functional changes. * Miscellaneous Fixed kernel-doc format warnings, removed unnecessary __user qualifiers, and moved the nmi_watchdog sysctl into .rodata. * Testing This series was run through sysctl selftests/kunit test suite in x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks of testing. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmktuJMACgkQupfNUreW QU9l8Qv+Noh/wLTqBEmHCrQ8k19YCNlBHO6a10Q5bFiAiGdTAMCZ7oFzoAwAjv5y pLtzS75G89zP0O6wgkxTsmoDNi4MRJenOCyjyEDFYvrK+qSTm0CWs0sZCsHqX4Dg 7M+7PVK/EbMO5509J2ae6cYS9pjfwg3EBQZ978b/FATkuhRjxOIJhIv3ZoaFjme4 0q/xqHw+oms5CUL035BfqtkoskIiRT19DAvM/DEjc2ByaHCTGURv00XLvSDHaRer O0Z8nXaxOOCscLunZbC3UL+hC7tB0nPE+XSzm9ylBEM7bTxeZmtvx2G6ru0+873U Sp+BwpFhe0RmzBFlclkd7UPtvGlFAY2QgAfpSaiLsodkoX0mctquTgpy99LhxKej EEyjl9tPVrYoH4MG562bZPGrQHtV4vnR9DXYx56vYtY2Fyr1GZmWawQoMZsHe9AU cVw5HrfeKeHBhk9hi3ZvT9z96ns3YBmIHnYNDeMy+mF/i+cVu///GwgGuFqUqKag 3eWcTaPh =DXVD -----END PGP SIGNATURE----- Merge tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move jiffies converters out of kernel/sysctl.c Move the jiffies converters into kernel/time/jiffies.c and replace the pipe-max-size proc_handler converter with a macro based version. This is all part of the effort to relocate non-sysctl logic out of kernel/sysctl.c into more relevant subsystems. No functional changes. - Generalize proc handler converter creation Remove duplicated sysctl converter logic by consolidating it in macros. These are used inside sysctl core as well as in pipe.c and jiffies.c. Converter kernel and user space pointer args are now automatically const qualified for the convenience of the caller. No functional changes. - Miscellaneous Fix kernel-doc format warnings, remove unnecessary __user qualifiers, and move the nmi_watchdog sysctl into .rodata. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks of testing. * tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits) sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv sysctl: Create pipe-max-size converter using sysctl UINT macros sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c sysctl: Move jiffies converters to kernel/time/jiffies.c sysctl: Move UINT converter macros to sysctl header sysctl: Move INT converter macros to sysctl header sysctl: Allow custom converters from outside sysctl sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument sysctl: Create macro for user-to-kernel uint converter sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM sysctl: Create unsigned int converter using new macro sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM sysctl: Create integer converters with one macro sysctl: Create converter functions with two new macros sysctl: Discriminate between kernel and user converter params sysctl: Indicate the direction of operation with macro names sysctl: Remove superfluous __do_proc_* indirection sysctl: Remove superfluous tbl_data param from "dovec" functions sysctl: Replace void pointer with const pointer to ctl_table sysctl: fix kernel-doc format warning ...	2025-12-05 11:15:37 -08:00
Feng Tang	a9af76a787	watchdog: add sys_info sysctls to dump sys info on system lockup When soft/hard lockup happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all requested information will be printed out. (refer kernel/sys_info.c for more details). Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-20 14:03:43 -08:00
Tejun Heo	1dcb98bbb7	sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs With the buddy lockup detector, smp_processor_id() returns the detecting CPU, not the locked CPU, making scx_hardlockup()'s printouts confusing. Pass the locked CPU number from watchdog_hardlockup_check() as a parameter instead. Also add kerneldoc comments to handle_lockup(), scx_hardlockup(), and scx_rcu_cpu_stall() documenting their return value semantics. Suggested-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-11-14 11:11:08 -10:00
Tejun Heo	582f700e1b	sched_ext: Hook up hardlockup detector A poorly behaving BPF scheduler can trigger hard lockup. For example, on a large system with many tasks pinned to different subsets of CPUs, if the BPF scheduler puts all tasks in a single DSQ and lets all CPUs at it, the DSQ lock can be contended to the point where hardlockup triggers. Unfortunately, hardlockup can be the first signal out of such situations, thus requiring hardlockup handling. Hook scx_hardlockup() into the hardlockup detector to try kicking out the current scheduler in an attempt to recover the system to a good state. The handling strategy can delay watchdog taking its own action by one polling period; however, given that the only remediation for hardlockup is crash, this is likely an acceptable trade-off. v2: Add missing dummy scx_hardlockup() definition for !CONFIG_SCHED_CLASS_EXT (kernel test bot). Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com> Cc: Emil Tsalapatis <etsal@meta.com> Cc: Douglas Anderson <dianders@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-11-12 06:43:44 -10:00
Joel Granados	18c4e02884	watchdog: move nmi_watchdog sysctl into .rodata Move nmi_watchdog into the watchdog_sysctls array to prevent it from unnecessary modification. This move effectively moves it inside the .rodata section. Initially moved out into its own non-const array in commit `9ec272c586` ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails"), which made it writable only when watchdog_hardlockup_available was true. Moving it back to watchdog_sysctl keeps this behavior as writing to nmi_watchdog still fails when watchdog_hardlockup_available is false. Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>	2025-10-24 15:35:11 +02:00
Jinchao Wang	3d5f4f15b7	watchdog: skip checks when panic is in progress This issue was found when an EFI pstore was configured for kdump logging with the NMI hard lockup detector enabled. The efi-pstore write operation was slow, and with a large number of logs, the pstore dump callback within kmsg_dump() took a long time. This delay triggered the NMI watchdog, leading to a nested panic. The call flow demonstrates how the secondary panic caused an emergency_restart() to be triggered before the initial pstore operation could finish, leading to a failure to dump the logs: real panic() { kmsg_dump() { ... pstore_dump() { start_dump(); ... // long time operation triggers NMI watchdog nmi panic() { ... emergency_restart(); // pstore unfinished } ... finish_dump(); // never reached } } } Both watchdog_buddy_check_hardlockup() and watchdog_overflow_callback() may trigger during a panic. This can lead to recursive panic handling. Add panic_in_progress() checks so watchdog activity is skipped once a panic has begun. This prevents recursive panic and keeps the panic path more reliable. Link: https://lkml.kernel.org/r/20250825022947.1596226-10-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Doug Anderson <dianders@chromium.org> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kees Cook <kees@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Nam Cao <namcao@linutronix.de> Cc: oushixiong <oushixiong@kylinos.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:53 -07:00
ZhenguoYao	95f091274f	watchdog/softlockup: fix incorrect CPU utilization output during softlockup Since we use 16-bit precision, the raw data will undergo integer division, which may sometimes result in data loss. This can lead to slightly inaccurate CPU utilization calculations. Under normal circumstances, this isn't an issue. However, when CPU utilization reaches 100%, the calculated result might exceed 100%. For example, with raw data like the following: sample_period 400000134 new_stat 83648414036 old_stat 83247417494 sample_period=400000134/2^24=23 new_stat=83648414036/2^24=4985 old_stat=83247417494/2^24=4961 util=105% Below log will output： CPU#3 Utilization every 0s during lockup: #1: 0% system, 0% softirq, 105% hardirq, 0% idle #2: 0% system, 0% softirq, 105% hardirq, 0% idle #3: 0% system, 0% softirq, 100% hardirq, 0% idle #4: 0% system, 0% softirq, 105% hardirq, 0% idle #5: 0% system, 0% softirq, 105% hardirq, 0% idle To avoid confusion, we enforce a 100% display cap when calculations exceed this threshold. We also round to the nearest multiple of 16.8 milliseconds to improve the accuracy. [yaozhenguo1@gmail.com: make get_16bit_precision() more accurate, fix comment layout] Link: https://lkml.kernel.org/r/20250818081438.40540-1-yaozhenguo@jd.com Link: https://lkml.kernel.org/r/20250812082510.32291-1-yaozhenguo@jd.com Signed-off-by: ZhenguoYao <yaozhenguo1@gmail.com> Cc: Bitao Hu <yaoma@linux.alibaba.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:47 -07:00
ZhenguoYao	41f88ddfd4	watchdog/softlockup: fix wrong output when watchdog_thresh < 3 When watchdog_thresh is below 3, sample_period will be less than 1 second. So the following output will print when softlockup: CPU#3 Utilization every 0s during lockup Fix this by changing time unit from seconds to milliseconds. Link: https://lkml.kernel.org/r/20250812074132.27810-1-yaozhenguo@jd.com Signed-off-by: ZhenguoYao <yaozhenguo1@gmail.com> Cc: Bitao Hu <yaoma@linux.alibaba.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:47 -07:00
Max Kellermann	aaf05e96e9	kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count Patch series "sysfs: add counters for lockups and stalls", v2. Commits `9db89b4111` ("exit: Expose "oops_count" to sysfs") and `8b05aa2633` ("panic: Expose "warn_count" to sysfs") added counters for oopses and warnings to sysfs, and these two patches do the same for hard/soft lockups and RCU stalls. All of these counters are useful for monitoring tools to detect whether the machine is healthy. If the kernel has experienced a lockup or a stall, it's probably due to a kernel bug, and I'd like to detect that quickly and easily. There is currently no way to detect that, other than parsing dmesg. Or observing indirect effects: such as certain tasks not responding, but then I need to observe all tasks, and it may take a while until these effects become visible/measurable. I'd rather be able to detect the primary cause more quickly, possibly before everything falls apart. This patch (of 2): There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count and /sys/kernel/oops_count but there is no userspace-accessible counter for hard/soft lockups. Having this is useful for monitoring tools. Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Cc: Cc: Core Minyard <cminyard@mvista.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Joel Granados <joel.granados@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-21 10:48:22 -07:00
Luo Gengkun	7123dbbef8	watchdog: fix watchdog may detect false positive of softlockup When updating `watchdog_thresh`, there is a race condition between writing the new `watchdog_thresh` value and stopping the old watchdog timer. If the old timer triggers during this window, it may falsely detect a softlockup due to the old interval and the new `watchdog_thresh` value being used. The problem can be described as follow: # We asuume previous watchdog_thresh is 60, so the watchdog timer is # coming every 24s. echo 10 > /proc/sys/kernel/watchdog_thresh (User space) \| +------>+ update watchdog_thresh (We are in kernel now) \| \| # using old interval and new `watchdog_thresh` +------>+ watchdog hrtimer (irq context: detect softlockup) \| \| +-------+ \| \| + softlockup_stop_all To fix this problem, introduce a shadow variable for `watchdog_thresh`. The update to the actual `watchdog_thresh` is delayed until after the old timer is stopped, preventing false positives. The following testcase may help to understand this problem. --------------------------------------------- echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features echo -1 > /proc/sys/kernel/sched_rt_runtime_us echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime echo 60 > /proc/sys/kernel/watchdog_thresh taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" & echo 10 > /proc/sys/kernel/watchdog_thresh & --------------------------------------------- The test case above first removes the throttling restrictions for real-time tasks. It then sets watchdog_thresh to 60 and executes a real-time task ,a simple while(1) loop, on cpu3. Consequently, the final command gets blocked because the presence of this real-time thread prevents kworker:3 from being selected by the scheduler. This eventually triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating with inconsistent variable - using both the old interval and the updated watchdog_thresh simultaneously. [nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case] Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Joel Granados <joel.granados@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Nysal Jan K.A." <nysal@linux.ibm.com> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:54:10 -07:00
Linus Torvalds	a50b4fe095	A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8 GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8 OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0 iJbvLJeisXr3GA== =bYAX -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...	2025-03-25 10:54:15 -07:00
Li Huafei	d6834d9c99	watchdog/hardlockup/perf: Fix perf_event memory leak During stress-testing, we found a kmemleak report for perf_event: unreferenced object 0xff110001410a33e0 (size 1328): comm "kworker/4:11", pid 288, jiffies 4294916004 hex dump (first 32 bytes): b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de ...;...."....... f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff .3.A.....3.A.... backtrace (crc 24eb7b3a): [<00000000e211b653>] kmem_cache_alloc_node_noprof+0x269/0x2e0 [<000000009d0985fa>] perf_event_alloc+0x5f/0xcf0 [<00000000084ad4a2>] perf_event_create_kernel_counter+0x38/0x1b0 [<00000000fde96401>] hardlockup_detector_event_create+0x50/0xe0 [<0000000051183158>] watchdog_hardlockup_enable+0x17/0x70 [<00000000ac89727f>] softlockup_start_fn+0x15/0x40 ... Our stress test includes CPU online and offline cycles, and updating the watchdog configuration. After reading the code, I found that there may be a race between cleaning up perf_event after updating watchdog and disabling event when the CPU goes offline: CPU0 CPU1 CPU2 (update watchdog) (hotplug offline CPU1) ... _cpu_down(CPU1) cpus_read_lock() // waiting for cpu lock softlockup_start_all smp_call_on_cpu(CPU1) softlockup_start_fn ... watchdog_hardlockup_enable(CPU1) perf create E1 watchdog_ev[CPU1] = E1 cpus_read_unlock() cpus_write_lock() cpuhp_kick_ap_work(CPU1) cpuhp_thread_fun ... watchdog_hardlockup_disable(CPU1) watchdog_ev[CPU1] = NULL dead_event[CPU1] = E1 __lockup_detector_cleanup for each dead_events_mask release each dead_event /* * CPU1 has not been added to * dead_events_mask, then E1 * will not be released / CPU1 -> dead_events_mask cpumask_clear(&dead_events_mask) // dead_events_mask is cleared, E1 is leaked In this case, the leaked perf_event E1 matches the perf_event leak reported by kmemleak. Due to the low probability of problem recurrence (only reported once), I added some hack delays in the code: static void __lockup_detector_reconfigure(void) { ... watchdog_hardlockup_start(); cpus_read_unlock(); + mdelay(100); / * Must be called outside the cpus locked section to prevent * recursive locking in the perf code. ... } void watchdog_hardlockup_disable(unsigned int cpu) { ... perf_event_disable(event); this_cpu_write(watchdog_ev, NULL); this_cpu_write(dead_event, event); + mdelay(100); cpumask_set_cpu(smp_processor_id(), &dead_events_mask); atomic_dec(&watchdog_cpus); ... } void hardlockup_detector_perf_cleanup(void) { ... perf_event_release_kernel(event); per_cpu(dead_event, cpu) = NULL; } + mdelay(100); cpumask_clear(&dead_events_mask); } Then, simultaneously performing CPU on/off and switching watchdog, it is almost certain to reproduce this leak. The problem here is that releasing perf_event is not within the CPU hotplug read-write lock. Commit: `941154bd69` ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock") introduced deferred release to solve the deadlock caused by calling get_online_cpus() when releasing perf_event. Later, commit: `efe951d3de` ("perf/x86: Fix perf,x86,cpuhp deadlock") removed the get_online_cpus() call on the perf_event release path to solve another deadlock problem. Therefore, it is now possible to move the release of perf_event back into the CPU hotplug read-write lock, and release the event immediately after disabling it. Fixes: `941154bd69` ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241021193004.308303-1-lihuafei1@huawei.com	2025-03-06 12:05:33 +01:00
Nam Cao	d2254b0643	watchdog: Switch to use hrtimer_setup() hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/a5c62f2b5e1ea1cf4d32f37bc2d21a8eeab2f875.1738746821.git.namcao@linutronix.de	2025-02-18 10:32:33 +01:00
Joel Granados	1751f872cc	treewide: const qualify ctl_tables where applicable Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit `78eb4ea25c` ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>	2025-01-28 13:48:37 +01:00
Yunhui Cui	3c16fc0c91	watchdog: output this_cpu when printing hard LOCKUP When printing "Watchdog detected hard LOCKUP on cpu", also output the detecting CPU. It's more intuitive. Link: https://lkml.kernel.org/r/20241210095238.63444-1-cuiyunhui@bytedance.com Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Bitao Hu <yaoma@linux.alibaba.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Liu Song <liusong@linux.alibaba.com> Cc: Song Liu <song@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-01-12 20:21:05 -08:00
Linus Torvalds	f5f4745a7f	- The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code. - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[]. - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest. - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code. - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification. - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity. - Apart from that, singelton patches in many places - please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m Ii8igH0UBibIgva7MrCyJedDI1O23AA= =8BIU -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ...	2024-11-25 16:09:48 -08:00
Tejun Heo	e32c260195	sched_ext: Enable the ops breather and eject BPF scheduler on softlockup On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly behaving BPF scheduler can live-lock the system by making multiple CPUs bang on the same DSQ to the point where soft-lockup detection triggers before SCX's own watchdog can take action. It also seems possible that the machine can be live-locked enough to prevent scx_ops_helper, which is an RT task, from running in a timely manner. Implement scx_softlockup() which is called when three quarters of soft-lockup threshold has passed. The function immediately enables the ops breather and triggers an ops error to initiate ejection of the BPF scheduler. The previous and this patch combined enable the kernel to reliably recover the system from live-lock conditions that can be triggered by a poorly behaving BPF scheduler on Intel dual socket systems. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org>	2024-11-08 10:42:22 -10:00
Tio Zhang	8380101802	kernel/watchdog: always restore watchdog_softlockup(,hardlockup)_user_enabled after proc show Otherwise when watchdog_enabled becomes 0, watchdog_softlockup(,hardlockup)_user_enabled will changes to 0 after proc show. Steps to reproduce: step 1: # cat /proc/sys/kernel/watchdog 1 1 1 \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 1 \|----------------------------------\|-------------------------- step 2: # echo 0 > /proc/sys/kernel/watchdog \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 0 \|----------------------------------\|-------------------------- step 3: # cat /proc/sys/kernel/watchdog 0 0 0 \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 0 \|----------------------------------\|-------------------------- step 4: # echo 1 > /proc/sys/kernel/watchdog \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 0 \|----------------------------------\|-------------------------- step 5: # cat /proc/sys/kernel/*watchdog 0 0 0 If we dont do "step 3", do "step 4" right after "step 2", it will be \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 1 \|----------------------------------\|-------------------------- then everything works correctly. So this patch fix "step 3"'s value into \| name \| value \|----------------------------------\|-------------------------- \| watchdog_enabled \| 0 \|----------------------------------\|-------------------------- \| watchdog_hardlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_softlockup_user_enabled \| 1 \|----------------------------------\|-------------------------- \| watchdog_user_enabled \| 0 \|----------------------------------\|-------------------------- And still print 0 as before. Link: https://lkml.kernel.org/r/20240906094700.GA30052@didi-ThinkCentre-M930t-N000 Signed-off-by: Tio Zhang <tiozhang@didiglobal.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Ben Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:27 -08:00
Waiman Long	97cf8f5f93	watchdog: handle the ENODEV failure case of lockup_detector_delay_init() separately When watchdog_hardlockup_probe() is being called by lockup_detector_delay_init(), an error return of -ENODEV will happen for the arm64 arch when arch_perf_nmi_is_available() returns false. This means that NMI is not usable by the hard lockup detector and so has to be disabled. This can be considered a deficiency in that particular arm64 chip, but there is nothing we can do about it. That also means the following error will always be reported when the kernel boot up. watchdog: Delayed init of the lockup detector failed: -19 The word "failed" itself has a connotation that there is something wrong with the kernel which is not really the case here. Handle this special ENODEV case separately and explain the reason behind disabling hard lockup detector without causing anxiety for those users who read the above message and wonder about it. Link: https://lkml.kernel.org/r/20240802151621.617244-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Cc: Douglas Anderson <dianders@chromium.org> Cc: Joel Granados <j.granados@samsung.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 20:43:32 -07:00
Joel Granados	78eb4ea25c	sysctl: treewide: constify the ctl_table argument of proc_handlers const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer\|interval)_handler\|sched_(rt\|rr)_handler\|rds_tcp_skbuf_handler\|proc_sctp_do_(hmac_alg\|rto_min\|rto_max\|udp_port\|alpha_beta\|auth\|probe_interval)"; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void , size_t , loff_t ); @r4@ identifier func, ctl; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int , void , size_t , loff_t ); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void buffer, size_t lenp, loff_t ppos); ``` Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>	2024-07-24 20:59:29 +02:00
Linus Torvalds	eb6a9339ef	Mainly singleton patches, documented in their respective changelogs. Notable series include: - Some maintenance and performance work for ocfs2 in Heming Zhao's series "improve write IO performance when fragmentation is high". - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes exposed by fstests". - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean up kfifo.h". - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes for $lx_current and $lx_per_cpu". - After much discussion, a coding-style update from Barry Song explaining one reason why inline functions are preferred over macros. The series is "codingstyle: avoid unused parameters for a function-like macro". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkpLYQAKCRDdBJ7gKXxA jo9NAQDctSD3TMXqxqCHLaEpCaYTYzi6TGAVHjgkqGzOt7tYjAD/ZIzgcmRwthjP R7SSiSgZ7UnP9JRn16DQILmFeaoG1gs= =lYhr -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: "Mainly singleton patches, documented in their respective changelogs. Notable series include: - Some maintenance and performance work for ocfs2 in Heming Zhao's series "improve write IO performance when fragmentation is high". - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes exposed by fstests". - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean up kfifo.h". - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes for $lx_current and $lx_per_cpu". - After much discussion, a coding-style update from Barry Song explaining one reason why inline functions are preferred over macros. The series is "codingstyle: avoid unused parameters for a function-like macro"" * tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits) fs/proc: fix softlockup in __read_vmcore nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON() scripts: checkpatch: check unused parameters for function-like macro Documentation: coding-style: ask function-like macros to evaluate parameters nilfs2: use __field_struct() for a bitwise field selftests/kcmp: remove unused open mode nilfs2: remove calls to folio_set_error() and folio_clear_error() kernel/watchdog_perf.c: tidy up kerneldoc watchdog: allow nmi watchdog to use raw perf event watchdog: handle comma separated nmi_watchdog command line nilfs2: make superblock data array index computation sparse friendly squashfs: remove calls to set the folio error flag squashfs: convert squashfs_symlink_read_folio to use folio APIs scripts/gdb: fix detection of current CPU in KGDB scripts/gdb: make get_thread_info accept pointers scripts/gdb: fix parameter handling in $lx_per_cpu scripts/gdb: fix failing KGDB detection during probe kfifo: don't use "proxy" headers media: stih-cec: add missing io.h media: rc: add missing io.h ...	2024-05-19 14:02:03 -07:00
Linus Torvalds	91b6163be4	sysctl changes for v6.10-rc1 Summary * Removed sentinel elements from ctl_table structs in kernel/* Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. Removals for net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline through their respective subsystems making the next release the most likely place where the final series that removes the check for proc_name == NULL will land. This PR adds to removals already in arch/, drivers/ and fs/. * Adjusted ctl_table definitions and references to allow constification Adjustments: - Removing unused ctl_table function arguments - Moving non-const elements from ctl_table to ctl_table_header - Making ctl_table pointers const in ctl_table_root structure Making the static ctl_table structs const will increase safety by keeping the pointers to proc_handler functions in .rodata. Though no ctl_tables where made const in this PR, the ground work for making that possible has started with these changes sent by Thomas Weißschuh. Testing * These changes went into linux-next after v6.9-rc4; giving it a good month of testing. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmZFvBMACgkQupfNUreW QU/eGAv9EWeiXKxr3EVSMAsb9MWbJq7C99I/pd5hMf+qH4PgJpKDH7w/sb2e8h8+ unGiW83ikgrtph7OS4/xM3Y9r3Nvzd6C/OztqgMnNKeRFdMgP7wu9HaSNs05ordb CqJdhvL93quc5HxrGTS9sdLK/wLJWOHwuWMXhX4qS44JNxTdPV2q10Rb7DZyHZ6O C9qp61L2Q2CrnOBKIx8MoeCh20ynJQAo3b0pTN63ZYF4D0vqCcnYNNTPkge4ID8/ ULJoP5hAQY0vJ4g4fC4Gmooa5GECpm8MfZUf3SdgPyauqM/sm3dVdsLXAWD4Phcp TsG2a/5KMYwnLHrUGwDW7bFfEemRU88h0Iam56+SKMl1kMlEpWaLL9ApQXoHFayG e10izS+i/nlQiqYIHtuczCoTimT4/LGnonCLcdA//C3XzBT5MnOd7xsjuaQSpFWl /CV9SZa4ABwzX7u2jty8ik90iihLCFQyKj1d9m1mDVbgb6r3iUOxVuHBgMtY7MF7 eyaEmV7l =/rQW -----END PGP SIGNATURE----- Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove sentinel elements from ctl_table structs in kernel/* Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. Removals for net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline through their respective subsystems making the next release the most likely place where the final series that removes the check for proc_name == NULL will land. This adds to removals already in arch/, drivers/ and fs/. - Adjust ctl_table definitions and references to allow constification - Remove unused ctl_table function arguments - Move non-const elements from ctl_table to ctl_table_header - Make ctl_table pointers const in ctl_table_root structure Making the static ctl_table structs const will increase safety by keeping the pointers to proc_handler functions in .rodata. Though no ctl_tables where made const in this PR, the ground work for making that possible has started with these changes sent by Thomas Weißschuh. * tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: drop now unnecessary out-of-bounds check sysctl: move sysctl type to ctl_table_header sysctl: drop sysctl_is_perm_empty_ctl_table sysctl: treewide: constify argument ctl_table_root::permissions(table) sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) bpf: Remove the now superfluous sentinel elements from ctl_table array delayacct: Remove the now superfluous sentinel elements from ctl_table array kprobes: Remove the now superfluous sentinel elements from ctl_table array printk: Remove the now superfluous sentinel elements from ctl_table array scheduler: Remove the now superfluous sentinel elements from ctl_table array seccomp: Remove the now superfluous sentinel elements from ctl_table array timekeeping: Remove the now superfluous sentinel elements from ctl_table array ftrace: Remove the now superfluous sentinel elements from ctl_table array umh: Remove the now superfluous sentinel elements from ctl_table array kernel misc: Remove the now superfluous sentinel elements from ctl_table array	2024-05-17 17:31:24 -07:00
Song Liu	393fb313a2	watchdog: allow nmi watchdog to use raw perf event NMI watchdog permanently consumes one hardware counters per CPU on the system. For systems that use many hardware counters, this causes more aggressive time multiplexing of perf events. OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely used. Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog to use raw event. For example, on Intel CPUs, we can use "r300" to configure the watchdog to use ref-cycles event. If the raw event does not work, fall back to use "cycles". [akpm@linux-foundation.org: fix kerneldoc] Link: https://lkml.kernel.org/r/20240430060236.1878002-2-song@kernel.org Signed-off-by: Song Liu <song@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:29 -07:00
Song Liu	602ba77361	watchdog: handle comma separated nmi_watchdog command line Per the document, the kernel can accept comma separated command line like nmi_watchdog=nopanic,0. However, the code doesn't really handle it. Fix the kernel to handle it properly. Link: https://lkml.kernel.org/r/20240430060236.1878002-1-song@kernel.org Signed-off-by: Song Liu <song@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:28 -07:00
Joel Granados	11a921909f	kernel misc: Remove the now superfluous sentinel elements from ctl_table array This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove the sentinel from ctl_table arrays. Reduce by one the values used to compare the size of the adjusted arrays. Signed-off-by: Joel Granados <j.granados@samsung.com>	2024-04-24 09:43:53 +02:00
Bitao Hu	e9a9292e23	watchdog/softlockup: Report the most frequent interrupts When the watchdog determines that the current soft lockup is due to an interrupt storm based on CPU utilization, reporting the most frequent interrupts could be good enough for further troubleshooting. Below is an example of interrupt storm. The call tree does not provide useful information, but analyzing which interrupt caused the soft lockup by comparing the counts of interrupts during the lockup period allows to identify the culprit. [ 638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0] [ 638.870825] CPU#9 Utilization every 4s during lockup: [ 638.871194] #1: 0% system, 0% softirq, 100% hardirq, 0% idle [ 638.871652] #2: 0% system, 0% softirq, 100% hardirq, 0% idle [ 638.872107] #3: 0% system, 0% softirq, 100% hardirq, 0% idle [ 638.872563] #4: 0% system, 0% softirq, 100% hardirq, 0% idle [ 638.873018] #5: 0% system, 0% softirq, 100% hardirq, 0% idle [ 638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs: [ 638.873994] #1: 330945 irq#7 [ 638.874236] #2: 31 irq#82 [ 638.874493] #3: 10 irq#10 [ 638.874744] #4: 2 irq#89 [ 638.874992] #5: 1 irq#102 ... [ 638.875313] Call trace: [ 638.875315] __do_softirq+0xa8/0x364 Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liu Song <liusong@linux.alibaba.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20240411074134.30922-6-yaoma@linux.alibaba.com	2024-04-12 17:08:06 +02:00
Bitao Hu	d7037381d0	watchdog/softlockup: Low-overhead detection of interrupt storm The following softlockup is caused by interrupt storm, but it cannot be identified from the call tree. Because the call tree is just a snapshot and doesn't fully capture the behavior of the CPU during the soft lockup. watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921] ... Call trace: __do_softirq+0xa0/0x37c __irq_exit_rcu+0x108/0x140 irq_exit+0x14/0x20 __handle_domain_irq+0x84/0xe0 gic_handle_irq+0x80/0x108 el0_irq_naked+0x50/0x58 Therefore, it is necessary to report CPU utilization during the softlockup_threshold period (report once every sample_period, for a total of 5 reportings), like this: watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921] CPU#28 Utilization every 4s during lockup: #1: 0% system, 0% softirq, 100% hardirq, 0% idle #2: 0% system, 0% softirq, 100% hardirq, 0% idle #3: 0% system, 0% softirq, 100% hardirq, 0% idle #4: 0% system, 0% softirq, 100% hardirq, 0% idle #5: 0% system, 0% softirq, 100% hardirq, 0% idle ... This is helpful in determining whether an interrupt storm has occurred or in identifying the cause of the softlockup. The criteria for determination are as follows: a. If the hardirq utilization is high, then interrupt storm should be considered and the root cause cannot be determined from the call tree. b. If the softirq utilization is high, then the call might not necessarily point at the root cause. c. If the system utilization is high, then analyzing the root cause from the call tree is possible in most cases. The mechanism requires a considerable amount of global storage space when configured for the maximum number of CPUs. Therefore, adding a SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes" if the max number of CPUs is <= 128. Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com	2024-04-12 17:08:05 +02:00
Thomas Weißschuh	75060b6ead	watchdog/core: remove sysctl handlers from public header The functions are only used in the file where they are defined. Remove them from the header and make them static. Also guard proc_soft_watchdog with a #define-guard as it is not used otherwise. Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-03-12 13:09:23 -07:00
Douglas Anderson	55efe4abf9	watchdog: if panicking and we dumped everything, don't re-enable dumping If, as part of handling a hardlockup or softlockup, we've already dumped all CPUs and we're just about to panic, don't reenable dumping and give some other CPU a chance to hop in there and add some confusing logs right as the panic is happening. Link: https://lkml.kernel.org/r/20231220131534.4.Id3a9c7ec2d7d83e4080da6f8662ba2226b40543f@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-29 12:22:30 -08:00
Douglas Anderson	ee6bdb3f4b	watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved which is hard to read. The interleaving problem was especially bad with the "perf" hardlockup detector where the locked up CPU is always the same as the running CPU and we end up in show_regs(). show_regs() has no inherent serialization so we could mix together two crawls if two hardlockups happened at the same time (and if we didn't have `sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll fully serialize hardlockups when using the "perf" hardlockup detector. The interleaving problem was less bad with the "buddy" hardlockup detector. With "buddy" we always end up calling `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running one. trigger_single_cpu_backtrace() always at least serializes the individual stack crawls because it eventually uses printk_cpu_sync_get_irqsave(). Unfortunately the fact that trigger_single_cpu_backtrace() eventually calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we have to drop the "lock" before calling it and we can't fully serialize all printouts associated with a given hardlockup. However, we still do get the advantage of serializing the output of print_modules() and print_irqtrace_events(). Aside from serializing hardlockups from each other, this change also has the advantage of serializing hardlockups and softlockups from each other if they happen to happen at the same time since they are both using the same "lock". Even though nobody is expected to hang while holding the lock associated with printk_cpu_sync_get_irqsave(), out of an abundance of caution, we don't call printk_cpu_sync_get_irqsave() until after we print out about the hardlockup. This makes extra sure that, even if printk_cpu_sync_get_irqsave() somehow never runs we at least print that we saw the hardlockup. This is different than the choice made for softlockup because hardlockup is really our last resort. Link: https://lkml.kernel.org/r/20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-29 12:22:30 -08:00
Douglas Anderson	896260a6d6	watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() to serialize softlockup reporting. Alone this doesn't have any real advantage over the spinlock, but this will allow us to use the same function in a future change to also serialize hardlockup crawls. NOTE: for the most part this serialization is important because we often end up in the show_regs() path and that has no built-in serialization if there are multiple callers at once. However, even in the case where we end up in the dump_stack() path this still has some advantages because the stack will be guaranteed to be together in the logs with the lockup message with no interleaving. NOTE: the fact that printk_cpu_sync_get_irqsave() is allowed to be called multiple times on the same CPU is important here. Specifically we hold the "lock" while calling dump_stack() which also gets the same "lock". This is explicitly documented to be OK and means we don't need to introduce a variant of dump_stack() that doesn't grab the lock. Link: https://lkml.kernel.org/r/20231220131534.2.Ia5906525d440d8e8383cde31b7c61c2aadc8f907@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Li Zhe <lizhe.67@bytedance.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-29 12:22:30 -08:00
Douglas Anderson	6dcde5d5f2	watchdog/hardlockup: adopt softlockup logic avoiding double-dumps Patch series "watchdog: Better handling of concurrent lockups". When we get multiple lockups at roughly the same time, the output in the kernel logs can be very confusing since the reports about the lockups end up interleaved in the logs. There is some code in the kernel to try to handle this but it wasn't that complete. Li Zhe recently made this a bit better for softlockups (specifically for the case where `kernel.softlockup_all_cpu_backtrace` is not set) in commit `9d02330abd` ("softlockup: serialized softlockup's log"), but that only handled softlockup reports. Hardlockup reports still had similar issues. This series also has a small fix to avoid dumping all stacks a second time in the case of a panic. This is a bit unrelated to the interleaving fixes but it does also improve the clarity of lockup reports. This patch (of 4): The hardlockup detector and softlockup detector both have the ability to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace` and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have some logic to attempt to avoid interleaving printouts if two CPUs were trying to do dumps of all CPUs at the same time. However: - The hardlockup detector's logic still allowed interleaving some information. Specifically another CPU could print modules and dump the stack of the locked CPU at the same time we were dumping all CPUs. - In the case where `kernel.hardlockup_panic` was set in addition to `kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected hardlockups at the same time the second CPU could call panic() while the first was still dumping stacks. This was especially bad if the locked up CPU wasn't responding to the request for a backtrace since the function nmi_trigger_cpumask_backtrace() can wait up to 10 seconds. Let's resolve this by adopting the softlockup logic in the hardlockup handler. NOTES: - As part of this, one might think that we should make a helper function that both the hard and softlockup detectors call. This turns out not to be super trivial since it would have to be parameterized quite a bit since there are separate global variables controlling each lockup detector and they print log messages that are just different enough that it would be a pain. We probably don't want to change the messages that are printed without good reason to avoid throwing log parsers for a loop. - One might also think that it would be a good idea to have the hardlockup and softlockup detector use the same global variable to prevent interleaving. This would make sure that softlockups and hardlockups can't interleave each other. That _almost_ works but has a dangerous flaw if `kernel.hardlockup_panic` is not the same as `kernel.softlockup_panic` because we might skip a call to panic() if one type of lockup was detected at the same time as another. Link: https://lkml.kernel.org/r/20231220211640.2023645-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-29 12:22:30 -08:00
Li Zhe	9d02330abd	softlockup: serialized softlockup's log If multiple CPUs trigger softlockup at the same time with 'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear staggeredly in dmesg, which will affect the viewing of the logs for developer. Since the code path for outputting softlockup logs is not a kernel hotspot and the performance requirements for the code are not strict, locks are used to serialize the softlockup log output to improve the readability of the logs. Link: https://lkml.kernel.org/r/20231123084022.10302-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: John Ogness <john.ogness@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-10 17:21:44 -08:00
Krister Johansen	8b793bcda6	watchdog: move softlockup_panic back to early_param Setting softlockup_panic from do_sysctl_args() causes it to take effect later in boot. The lockup detector is enabled before SMP is brought online, but do_sysctl_args runs afterwards. If a user wants to set softlockup_panic on boot and have it trigger should a softlockup occur during onlining of the non-boot processors, they could do this prior to commit `f117955a22` ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases"). However, after this commit the value of softlockup_panic is set too late to be of help for this type of problem. Restore the prior behavior. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: `f117955a22` ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-11-01 12:10:02 -07:00
Douglas Anderson	1f38c86bb2	watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check() After commit `77c12fc959` ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started storing a `struct cpumask` on the stack in watchdog_hardlockup_check(). On systems with CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That triggers warnings with `CONFIG_FRAME_WARN` set to 1024. We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to use a CPU mask at all. Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid Fixes: `77c12fc959` ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:19:00 -07:00
Douglas Anderson	8d539b84f1	nmi_backtrace: allow excluding an arbitrary CPU The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case. Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask. Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change. [akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:19:00 -07:00
Petr Mladek	47f4cb4339	watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix. It will be set when it should be built. It will make it compatible with the other hardlockup detectors. Before, it is far from obvious that the SPARC64 variant is actually used: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y After, it is more clear: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:29 -07:00
Petr Mladek	a5fcc2367e	watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one. CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit `23637d477c` ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36. It was a preparation step for introducing the new generic perf hardlockup detector. The existing arch-specific variants did not support the to-be-created generic build configurations, sysctl interface, etc. This distinction was made explicit by the commit `4a7863cc2e` ("x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR") in v2.6.38. CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit `d314d74c69` ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used by three architectures, namely blackfin, mn10300, and sparc. The support for blackfin and mn10300 architectures has been completely dropped some time ago. And sparc is the only architecture with the historic NMI watchdog at the moment. And the old sparc implementation is really special. It is always built on sparc64. It used to be always enabled until the commit `7a5c8b57ce` ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added in v4.10-rc1. There are only few locations where the sparc64 NMI watchdog interacts with the generic hardlockup detectors code: + implements arch_touch_nmi_watchdog() which is called from the generic touch_nmi_watchdog() + implements watchdog_hardlockup_enable()/disable() to support /proc/sys/kernel/nmi_watchdog + is always preferred over other generic watchdogs, see CONFIG_HARDLOCKUP_DETECTOR + includes asm/nmi.h into linux/nmi.h because some sparc-specific functions are needed in sparc-specific code which includes only linux/nmi.h. The situation became more complicated after the commit `05a4a95279` ("kernel/watchdog: split up config options") and commit `2104180a53` ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1. They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc specific hardlockup detector. It was compatible with the perf one regarding the general boot, sysctl, and programming interfaces. HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of HAVE_NMI_WATCHDOG. It made some sense because all arch-specific detectors had some common requirements, namely: + implemented arch_touch_nmi_watchdog() + included asm/nmi.h into linux/nmi.h + defined the default value for /proc/sys/kernel/nmi_watchdog But it actually has made things pretty complicated when the generic buddy hardlockup detector was added. Before the generic perf detector was newer supported together with an arch-specific one. But the buddy detector could work on any SMP system. It means that an architecture could support both the arch-specific and buddy detector. As a result, there are few tricky dependencies. For example, CONFIG_HARDLOCKUP_DETECTOR depends on: ((HAVE_HARDLOCKUP_DETECTOR_PERF \|\| HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) \|\| HAVE_HARDLOCKUP_DETECTOR_ARCH The problem is that the very special sparc implementation is defined as: HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear without reading understanding the history. Make the logic less tricky and more self-explanatory by making HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to HAVE_HARDLOCKUP_DETECTOR_SPARC64. Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF, and HARDLOCKUP_DETECTOR_BUDDY may conflict only with HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR and it is not longer enabled when HAVE_NMI_WATCHDOG is set. Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:29 -07:00
Douglas Anderson	28168eca32	watchdog/hardlockup: move SMP barriers from common code to buddy code It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hardlockup detector, that the barriers belong in the buddy code. Let's move them and add clearer comments about why they're needed. Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:28 -07:00
Douglas Anderson	d3b62ace0f	watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called In the patch ("watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs"), we added a call from the common watchdog.c file into the buddy. That call could be done more cleanly. Specifically: 1. If we move the call into watchdog_hardlockup_kick() then it keeps watchdog_timer_fn() simpler. 2. We don't need to pass an "unsigned long" to the buddy for the timer count. In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") the count was changed to "atomic_t" which is backed by an int, so we should match types. Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:27 -07:00
Douglas Anderson	7a71d8e650	watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started using a cpumask to keep track of which CPUs to backtrace. When setting up this cpumask, it's better to use cpumask_copy() than to just copy the structure directly. Fix this. Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:27 -07:00
Douglas Anderson	2711e4adef	watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). Using this_cpu_ptr() works fine. Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:27 -07:00
Douglas Anderson	6426e8d1f2	watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one architecture, we have some special case code in the watchdog core to handle the fact that watchdog_hardlockup_probe() isn't implemented. Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the special case. As a side effect of doing this, code inspection tells us that we could fix a minor bug where the system won't properly realize that NMI watchdogs are disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then nothing will override the "weak" watchdog_hardlockup_probe() and we'll fallback to looking at CONFIG_HAVE_NMI_WATCHDOG. Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-06-19 16:25:26 -07:00

1 2 3 4 5 ...

262 Commits