- Fix printf format warning for bprintf
sunrpc uses a trace_printk() that triggers a printf warning during the
compile. Move the __printf() attribute around for when debugging is not
enabled the warning will go away.
- Remove redundant check for EVENT_FILE_FL_FREED in event_filter_write()
The FREED flag is checked in the call to event_file_file() and then
checked again right afterward, which is unneeded.
- Clean up event_file_file() and event_file_data() helpers
These helper functions played a different role in the past, but now with
eventfs, the READ_ONCE() isn't needed. Simplify the code a bit and also
add a warning to event_file_data() if the file or its data is not present.
- Remove updating file->private_data in tracing open
All access to the file private data is handled by the helper functions,
which do not use file->private_data. Stop updating it on open.
- Show ENUM names in function arguments via BTF in function tracing
When showing the function arguments when func-args option is set for
function tracing, if one of the arguments is found to be an enum, show the
name of the enum instead of its number.
- Add new trace_call__##name() API for tracepoints
Tracepoints are enabled via static_branch() blocks, where when not
enabled, there's only a nop that is in the code where the execution will
just skip over it. When tracing is enabled, the nop is converted to a
direct jump to the tracepoint code. Sometimes more calculations are
required to be performed to update the parameters of the tracepoint. In
this case, trace_##name##_enabled() is called which is a static_branch()
that gets enabled only when the tracepoint is enabled. This allows the
extra calculations to also be skipped by the nop:
if (trace_foo_enabled()) {
x = bar();
trace_foo(x);
}
Where the x=bar() is only performed when foo is enabled. The problem with
this approach is that there's now two static_branch() calls. One for
checking if the tracepoint is enabled, and then again to know if the
tracepoint should be called. The second one is redundant.
Introduce trace_call__foo() that will call the foo() tracepoint directly
without doing a static_branch():
if (trace_foo_enabled()) {
x = bar();
trace_call__foo();
}
- Update various locations to use the new trace_call__##name() API
- Move snapshot code out of trace.c
Cleaning up trace.c to not be a "dump all", move the snapshot code out of
it and into a new trace_snapshot.c file.
- Clean up some "%*.s" to "%*s"
- Allow boot kernel command line options to be called multiple times
Have options like:
ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo
Equal to:
ftrace_filter=foo,bar,zoo
- Fix ipi_raise event CPU field to be a CPU field
The ipi_raise target_cpus field is defined as a __bitmask(). There is now a
__cpumask() field definition. Update the field to use that.
- Have hist_field_name() use a snprintf() and not a series of strcat()
It's safer to use snprintf() that a series of strcat().
- Fix tracepoint regfunc balancing
A tracepoint can define a "reg" and "unreg" function that gets called
before the tracepoint is enabled, and after it is disabled respectively.
But on error, after the "reg" func is called and the tracepoint is not
enabled, the "unreg" function is not called to tear down what the "reg"
function performed.
- Fix output that shows what histograms are enabled
Event variables are displayed incorrectly in the histogram output.
Instead of "sched.sched_wakeup.$var", it is showing
"$sched.sched_wakeup.var" where the '$' is in the incorrect location.
- Some other simple cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaeCpvxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qt2WAP44m85BbAjBqJe4WR103eOXV+bREBta
dRoReKJOMe519gEAp0rK/HoCvHgHhIGe3gaGdIsNhnaxoFyNWMG/wokoLAY=
=Hg6+
-----END PGP SIGNATURE-----
Merge tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Fix printf format warning for bprintf
sunrpc uses a trace_printk() that triggers a printf warning during
the compile. Move the __printf() attribute around for when debugging
is not enabled the warning will go away
- Remove redundant check for EVENT_FILE_FL_FREED in
event_filter_write()
The FREED flag is checked in the call to event_file_file() and then
checked again right afterward, which is unneeded
- Clean up event_file_file() and event_file_data() helpers
These helper functions played a different role in the past, but now
with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit
and also add a warning to event_file_data() if the file or its data
is not present
- Remove updating file->private_data in tracing open
All access to the file private data is handled by the helper
functions, which do not use file->private_data. Stop updating it on
open
- Show ENUM names in function arguments via BTF in function tracing
When showing the function arguments when func-args option is set for
function tracing, if one of the arguments is found to be an enum,
show the name of the enum instead of its number
- Add new trace_call__##name() API for tracepoints
Tracepoints are enabled via static_branch() blocks, where when not
enabled, there's only a nop that is in the code where the execution
will just skip over it. When tracing is enabled, the nop is converted
to a direct jump to the tracepoint code. Sometimes more calculations
are required to be performed to update the parameters of the
tracepoint. In this case, trace_##name##_enabled() is called which is
a static_branch() that gets enabled only when the tracepoint is
enabled. This allows the extra calculations to also be skipped by the
nop:
if (trace_foo_enabled()) {
x = bar();
trace_foo(x);
}
Where the x=bar() is only performed when foo is enabled. The problem
with this approach is that there's now two static_branch() calls. One
for checking if the tracepoint is enabled, and then again to know if
the tracepoint should be called. The second one is redundant
Introduce trace_call__foo() that will call the foo() tracepoint
directly without doing a static_branch():
if (trace_foo_enabled()) {
x = bar();
trace_call__foo();
}
- Update various locations to use the new trace_call__##name() API
- Move snapshot code out of trace.c
Cleaning up trace.c to not be a "dump all", move the snapshot code
out of it and into a new trace_snapshot.c file
- Clean up some "%*.s" to "%*s"
- Allow boot kernel command line options to be called multiple times
Have options like:
ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo
Equal to:
ftrace_filter=foo,bar,zoo
- Fix ipi_raise event CPU field to be a CPU field
The ipi_raise target_cpus field is defined as a __bitmask(). There is
now a __cpumask() field definition. Update the field to use that
- Have hist_field_name() use a snprintf() and not a series of strcat()
It's safer to use snprintf() that a series of strcat()
- Fix tracepoint regfunc balancing
A tracepoint can define a "reg" and "unreg" function that gets called
before the tracepoint is enabled, and after it is disabled
respectively. But on error, after the "reg" func is called and the
tracepoint is not enabled, the "unreg" function is not called to tear
down what the "reg" function performed
- Fix output that shows what histograms are enabled
Event variables are displayed incorrectly in the histogram output
Instead of "sched.sched_wakeup.$var", it is showing
"$sched.sched_wakeup.var" where the '$' is in the incorrect location
- Some other simple cleanups
* tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits)
selftests/ftrace: Add test case for fully-qualified variable references
tracing: Fix fully-qualified variable reference printing in histograms
tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func()
tracing: Rebuild full_name on each hist_field_name() call
tracing: Report ipi_raise target CPUs as cpumask
tracing: Remove duplicate latency_fsnotify() stub
tracing: Preserve repeated trace_trigger boot parameters
tracing: Append repeated boot-time tracing parameters
tracing: Remove spurious default precision from show_event_trigger/filter formats
cpufreq: Use trace_call__##name() at guarded tracepoint call sites
tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined
tracing: Move snapshot code out of trace.c and into trace_snapshot.c
mm: damon: Use trace_call__##name() at guarded tracepoint call sites
btrfs: Use trace_call__##name() at guarded tracepoint call sites
spi: Use trace_call__##name() at guarded tracepoint call sites
i2c: Use trace_call__##name() at guarded tracepoint call sites
kernel: Use trace_call__##name() at guarded tracepoint call sites
tracepoint: Add trace_call__##name() API
tracing: trace_mmap.h: fix a kernel-doc warning
tracing: Pretty-print enum parameters in function arguments
...
When a caller enqueues a work item using schedule_delayed_work() the used
wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when no target CPU is specified). The same applies
to schedule_work() that is using system_wq and queue_work(), which again
makes use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
Continue the effort to refactor workqueue APIs, which began with the
introduction of new workqueues and a new alloc_workqueue() flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
and switch smp_call_on_cpu() to use system_percpu_wq because system_wq is
going away once the ongoing workqueue restructuring is done.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251110170332.319314-1-marco.crivellari@suse.com
Both smp_call_function() and smp_call_function_single() use per-CPU
call_single_data_t variable to hold the infamous CSD lock. However,
while smp_call_function() acquires the destination CPU's CSD lock,
smp_call_function_single() instead uses the source CPU's CSD lock.
(These are two separate sets of CSD locks, cfd_data and csd_data,
respectively.)
This otherwise inexplicable pair of choices is explained by their
respective queueing properties. If smp_call_function() where to
use the sending CPU's CSD lock, that would serialize the destination
CPUs' IPI handlers and result in long smp_call_function() latencies,
especially on systems with large numbers of CPUs. For its part, if
smp_call_function_single() were to use the (single) destination CPU's
CSD lock, this would similarly serialize in the case where many CPUs
are sending IPIs to a single "victim" CPU. Plus it would result in
higher levels of memory contention.
Except that if there is no NMI-based stack tracing on a weakly ordered
system where remote unsynchronized stack traces are especially unreliable,
the improved debugging beats the improved queueing. This improved queueing
only matters if a bunch of CPUs are calling smp_call_function_single()
concurrently for a single "victim" CPU, which is not the common case.
Therefore, make smp_call_function_single() use the destination CPU's
csd_data instance in kernels built with CONFIG_CSD_LOCK_WAIT_DEBUG=y
where csdlock_debug_enabled is also set. Otherwise, continue to use
the source CPU's csd_data.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/25c2eb97-77c8-49a5-80ac-efe78dea272c@paulmck-laptop
smp_call_function_single() and smp_call_function_many_cond() disable
preemption and cache the CPU number via get_cpu().
Use this cached value throughout the function instead of invoking
smp_processor_id() again.
[ tglx: Make the copy&pasta'ed change log match the patch ]
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Link: https://patch.msgid.link/20260323193630.640311-4-sshegde@linux.ibm.com
Add missing kernel-doc comments and rearrange the order of others to
prevent all kernel-doc warnings.
- add function Returns: sections or format existing comments as kernel-doc
- add missing function parameter comments
- use "/**" for smp_call_function_any() and on_each_cpu_cond_mask()
- correct the commented function name for on_each_cpu_cond_mask()
- use correct format for function short descriptions
- add all kernel-doc comments for smp_call_on_cpu()
- remove kernel-doc comments for raw_smp_processor_id() since there is
no prototype for it here (other than !SMP)
- in smp.h, rearrange some lines so that the kernel-doc comments for
smp_processor_id() are immediately before the macro (to prevent
kernel-doc warnings)
- remove "Returns" from smp_call_function() since it doesn't
return a value
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260310061726.1153764-1-rdunlap@infradead.org
When governors used during cpuidle try to find the most optimal idle state
for a CPU or a group of CPUs, they are known to quite often fail. One
reason for this is, that they are not taking into account whether there has
been an IPI scheduled for any of the CPUs that are affected by the selected
idle state.
To enable pending IPIs to be taken into account for cpuidle decisions,
introduce a new helper function, cpus_peek_for_pending_ipi().
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The smp_call_function_many() kerneldoc comment got out of sync with the
function definition (bool parameter "wait" is incorrectly described as a
bitmask in it), so fix it up by copying the "wait" description from the
smp_call_function() kerneldoc and add information regarding the handling
of the local CPU to it.
Fixes: 49b3bd213a ("smp: Fix all kernel-doc warnings")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Recently while revising RCU's cpu online checks, there was some discussion
around how IPIs synchronize with hotplug.
Add comments explaining how preemption disable creates mutual exclusion with
CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
atomically updates CPU masks and flushes IPIs with interrupts disabled, and
cannot proceed while any CPU (including the IPI sender) has preemption
disabled.
[ Apply peterz feedback. ]
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: rcu@vger.kernel.org
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Whenever work is enqueued for a remote CPU, smp_call_function_many_cond()
may need to wait for that work to be completed. However, if no work is
enqueued for a remote CPU, because the condition func() evaluated to false
for all CPUs, there is no need to wait.
Set run_remote only if work was enqueued on remote CPUs.
Document the difference between "work enqueued", and "CPU needs to be
woken up"
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://lore.kernel.org/all/20250703203019.11331ac3@fangorn
Defer check for local execution to the actual place where it is needed,
which removes the extra local variable.
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-5-yury.norov@gmail.com
smp_call_function_any() tries to make a local call as it's the cheapest
option, or switches to a CPU in the same node. If it's not possible, the
algorithm gives up and searches for any CPU, in a numerical order.
Instead, it can search for the best CPU based on NUMA locality, including
the 2nd nearest hop (a set of equidistant nodes), and higher.
sched_numa_find_nth_cpu() does exactly that, and also helps to drop most
of the housekeeping code.
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com
Allow runtime modification of the csd_lock_timeout and panic_on_ipistall
module parameters.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmeY+dsTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jH+TD/417AAMonxfZhg1Rv1U61HQlg6hm3n3
OMrwh1AL584Tv+g5UWxqwuBTmxdYNIRLtkuzuRKov7PwxPVgz7T7APXtGYE2D+k+
Kg8mVOXXvb6UUKenA0ICwy0vKgFBbfaClO8EY/5nMrEvvLz43yWe3cRRhfcvjNXq
sgSiG42caElvJrOBzpJ7wmPIlU5VScoSXx7gdOrLXNOC8Tc63O3Z1Z5/vlgZV48k
NhBBGDwIlkLyaFig876XcE/X5c7yx2b6U9iyvcA4k+DwWw/Ku/pq/srbVLFKvCKj
NU8ax6vGzBoaG2xD/NjF/+NFF86C6eSJW5x9eGrysbjKbwNA30TDYKTBC9P4N0Kv
WLJto6j/j9JMeftKKOWSRYA9FjrscxEOMzOJ71OrOXNhLTXNkSwSBqFnGdLsfar5
6B+TxRTaOqPUUo3NyNG6uNtv8TZ00p9kTEo+1SPDkBy5pROfCbIuoJjuUm+iqd6e
Udx37tcWS6DQFsxcFRr+JiJSBE8NbfIDaSlPonjTDf39sgWfIj1u+mdJ7eb+NNE7
YHJLhf6iRaSPRSsiaS7IN3tRlU5/aQVqwymmyOFX6fh+nyzTSk19A0jhVW7RAznE
ta08he9RDUvVjW03wzZANpoRSj4z07Amrz8Rbvfc1j7faVhEBPgFNjfwH/Ej2A9b
5ZoUL1B53EyDYQ==
=+f0b
-----END PGP SIGNATURE-----
Merge tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull CSD-lock update from Paul McKenney:
"Allow runtime modification of the csd_lock_timeout and
panic_on_ipistall module parameters"
* tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
locking/csd-lock: make CSD lock debug tunables writable in /sys
Currently the CSD lock tunables can only be set at boot time in the
kernel commandline, but the way these variables are used means there
is really no reason not to tune them at runtime through /sys.
Make the CSD lock debug tunables tunable through /sys.
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
In smp_call_function_many_cond(), the local cond_func() is evaluated
after triggering the remote CPU IPIs.
If cond_func() depends on loading shared state updated by other CPU's
IPI handlers func(), then triggering execution of remote CPUs IPI before
evaluating cond_func() may have unexpected consequences.
One example scenario is evaluating a jiffies delay in cond_func(), which
is updated by func() in the IPI handlers. This situation can prevent
execution of periodic cleanup code on the local CPU.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241203163558.3455535-1-mathieu.desnoyers@efficios.com
Currently, the CONFIG_CSD_LOCK_WAIT_DEBUG code uses sched_clock() to check
for excessive CSD-lock wait times. This works, but does not guarantee
monotonic timestamps on x86 due to the sched_clock() function's use of
the rdtsc instruction, which does not guarantee ordering. This means
that, given successive calls to sched_clock(), the second might return
an earlier time than the second, that is, time might seem to go backwards.
This can (and does!) result in false-positive CSD-lock wait complaints
claiming almost 2^64 nanoseconds of delay.
Therefore, switch from sched_clock() to ktime_get_mono_fast_ns(), which
does guarantee monotonic timestamps via the rdtsc_ordered() function,
which as the name implies, does guarantee ordered timestamps, at least
in the absence of calls from NMI handlers, which are not involved in
this code path.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
About 40% of all csd_lock warnings observed in our fleet appear to
be due to sched_clock() going backward in time (usually only a little
bit), resulting in ts0 being larger than ts2.
When the local CPU is at fault, we should print out a message reflecting
that, rather than trying to get the remote CPU's stack trace.
Signed-off-by: Rik van Riel <riel@surriel.com>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Currently, the CSD-lock diagnostics in CONFIG_CSD_LOCK_WAIT_DEBUG=y
kernels are emitted at five-second intervals. Although this has proven
to be a good time interval for the first diagnostic, if the target CPU
keeps interrupts disabled for way longer than five seconds, the ratio
of useful new information to pointless repetition increases considerably.
Therefore, back off the time period for repeated reports of the same
incident, increasing linearly with the number of reports and logarithmicly
with the number of online CPUs.
[ paulmck: Apply Dan Carpenter feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
If a CSD-lock stall goes on long enough, it will cause an RCU CPU
stall warning. This additional warning provides much additional
console-log traffic and little additional information. Therefore,
provide a new csd_lock_is_stuck() function that returns true if there
is an ongoing CSD-lock stall. This function will be used by the RCU
CPU stall warnings to provide a one-line indication of the stall when
this function returns true.
[ neeraj.upadhyay: Apply Rik van Riel feedback. ]
[ neeraj.upadhyay: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
The CSD-lock-hold diagnostics from CONFIG_CSD_LOCK_WAIT_DEBUG are
printed in nanoseconds as unsigned long longs, which is a bit obtuse for
human readers when timing bugs result in negative CSD-lock hold times.
Yes, there are some people to whom it is immediately obvious that
18446744073709551615 is really -1, but for the rest of us...
Therefore, print these numbers as signed long longs, making the negative
hold times immediately apparent.
Reported-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by
INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for
the debug check in __init_work() to work correctly.
But this lacks the counterpart to remove the tracked object from debug
objects again, which will cause a debug object warning once the stack is
freed.
Add the missing destroy_work_on_stack() invocation to cure that.
[ tglx: Massaged changelog ]
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com
Fixes the following two Coccinelle/coccicheck warnings reported by
string_choices.cocci:
opportunity for str_plural(num_cpus)
opportunity for str_plural(num_nodes)
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240508154225.309703-2-thorsten.blum@toblux.com
This series adds a kernel boot parameter that causes the kernel to
panic if one of the call_smp_function() APIs is stalled for more than
the specified duration. This is useful in deployments in which a clean
panic is preferable to an indefinite stall.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmU9plITHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jB91D/9OiOMtV03TXN2K+zGmJMjTFgLuVnug
OqG4mrCv7jTJ3k6fTpu7hih/BCmG1Mu7byyPV6BUSfKsYony7L4yTPFJsjK8lNmq
MHh847DErGieuURCDnsvqBVpYIRfXnvW9ptlf+BMCjbzz4FuUu1XhJTm+U2nab3i
BEIEMORxDCyghh7yluAVG6sULRXOqjv5VcypwOXUbavDf0JyJTns4QlXFD85yiBr
nvfHvLUrzu5EblA3m09lTnCaKrAlz5pwD7fWQq7bS3rz2gndR/DcVcZknQ68FMsj
Mcf0Zf45TzyWWfMcE8LCulhMlZ2GYsIm2YkIbgwlsOAndjBrV55rsgaVbSZDke37
QMHPGUZ7m7AjuDqpWZITrJjQHkWCtn/5tFSazHSlMwWg44pgOqvc3OgZn04tzn01
L/guq3yBIKBJiAjdgzdx8/H9S7cSH8TLEWFY2utEAMIjef9Xzi6qwQ4X5p4K9QYJ
Sm1hTTCbF9NeyMk0o2rokR58f13+9ewxE4RIYcwP9loo6nbBcTVN/fvdHN6jQywa
7UIui508AUf55zQgO+5z8LsmjyNiDmDIeE8CLeVDicllWN+ne7F//AXAuA5V8m+y
G0Mphn/lZpnddDSFlR/QFuMvGqtpIu2y8w93awq8g/VzirbkiiCia3iFcFtss4ip
n0Y50CCu/W70fw==
=ZD0S
-----END PGP SIGNATURE-----
Merge tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull CSD lock update from Paul McKenney:
"This adds a kernel boot parameter that causes the kernel to panic if
one of the call_smp_function() APIs is stalled for more than the
specified duration.
This is useful in deployments in which a clean panic is preferable to
an indefinite stall"
* tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
smp,csd: Throw an error if a CSD lock is stuck for too long
The CSD lock seems to get stuck in 2 "modes". When it gets stuck
temporarily, it usually gets released in a few seconds, and sometimes
up to one or two minutes.
If the CSD lock stays stuck for more than several minutes, it never
seems to get unstuck, and gradually more and more things in the system
end up also getting stuck.
In the latter case, we should just give up, so the system can dump out
a little more information about what went wrong, and, with panic_on_oops
and a kdump kernel loaded, dump a whole bunch more information about what
might have gone wrong. In addition, there is an smp.panic_on_ipistall
kernel boot parameter that by default retains the old behavior, but when
set enables the panic after the CSD lock has been stuck for more than
the specified number of milliseconds, as in 300,000 for five minutes.
[ paulmck: Apply Imran Khan feedback. ]
[ paulmck: Apply Leonardo Bras feedback. ]
Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
call_single_data_t is a size-aligned typedef of struct __call_single_data.
This alignment is desirable in order to have smp_call_function*() avoid
bouncing an extra cacheline in case of an unaligned csd, given this
would hurt performance.
Since the removal of struct request->csd in commit 660e802c76
("blk-mq: use percpu csd to remote complete instead of per-rq csd") there
are no current users of smp_call_function*() with unaligned csd.
Change every 'struct __call_single_data' function parameter to
'call_single_data_t', so we have warnings if any new code tries to
introduce an smp_call_function*() call with unaligned csd.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230831063129.335425-1-leobras@redhat.com
On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace.
Given enough NMIs, the destination CPU will spent much of its time
producing backtraces, thus further delaying that CPU's response to the
original CSD IPI. In the worst case, by the time destination CPU is
done producing all of these backtrace NMIs, the CSD wait timeout will
have elapsed so that the waiters resend their backtrace NMIs again,
further delaying forward progress.
Therefore, to avoid these delays, issue the backtrace NMI only from
the first waiter. The destination CPU's other waiters can make use of
backtrace obtained from the first waiter's NMI.
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If a waiter is waiting for CSD lock, its call stack will not change
between first and subsequent hang detection for the same CSD lock.
Therefore, do dump_stack only for first-time detection for a given waiter.
This avoids excessive logging on systems with hundreds of CPUs where
repetitive dump_stack from hundreds of CPUs would otherwise flood the
console.
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Add a tracepoint for when a CSD is queued to a remote CPU's
call_single_queue. This allows finding exactly which CPU queued a given CSD
when looking at a csd_function_{entry,exit} event, and also enables us to
accurately measure IPI delivery time with e.g. a synthetic event:
$ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\
/sys/kernel/tracing/events/smp/csd_queue_cpu/trigger
$ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\
/sys/kernel/tracing/synthetic_events
$ echo \
'hist:keys=common_cpu,csd.hex:'\
'time=common_timestamp.usecs-$ts:'\
'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\
/sys/kernel/tracing/events/smp/csd_function_entry/trigger
$ trace-cmd record -e 'synthetic:csd_latency' hackbench
$ trace-cmd report
<...>-467 [001] 21.824263: csd_queue_cpu: cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8
<...>-467 [001] 21.824280: ipi_send_cpu: cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0
<...>-489 [000] 21.824299: csd_function_entry: func=sched_ttwu_pending csd=0xffff8880076148b8
<...>-489 [000] 21.824320: csd_latency: dst_cpu=0, csd=18446612682193848504, time=36
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com
The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources
of IPIs targeting CPUs running latency-sensitive applications.
For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are
sufficient to find them and work on getting rid of them. In some setups
however, not *all* IPIs are to be suppressed, but long-running IPI
callbacks can still be problematic.
Add a pair of tracepoints to mark the start and end of processing a CSD IPI
callback, similar to what exists for softirq, workqueue or timer callbacks.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com
No point in keeping them around.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de
(Ab)use the trace_ipi_send_cpu*() family to trace all
smp_function_call*() invocations, not only those that result in an
actual IPI.
The queued entries log their callback function while the actual IPIs
are traced on generic_smp_call_function_single_interrupt().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Context
=======
The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.
While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.
This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which get cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.
Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.
Changes
=======
send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).
Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.
Do the same thing for the multi IPI case.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-8-vschneid@redhat.com
Accessing the call_single_queue hasn't involved a spinlock since 2014:
6897fc22ea ("kernel: use lockless list for smp_call_function_single")
The llist operations (namely cmpxchg() and xchg()) provide similar ordering
guarantees, update the comment to lessen confusion.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-7-vschneid@redhat.com
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307143558.294354-4-vschneid@redhat.com
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.
Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230307143558.294354-3-vschneid@redhat.com
It is currently possible to set the csdlock_debug_enabled static
branch, but not to reset it. This is an issue when several different
entities supply kernel boot parameters and also for kernels built with
CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y.
Therefore, make the csdlock_debug=0 kernel boot parameter turn off
debugging. Last one wins!
Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org
The diagnostics added by this commit were extremely useful in one instance:
a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")
However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.
Therefore, manually revert the following commit preparatory commit:
de7b09ef65 ("locking/csd_lock: Prepare more CSD lock debugging")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-3-paulmck@kernel.org
The diagnostics added by this commit were extremely useful in one instance:
a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")
However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.
Therefore, manually revert this commit, but leave a comment telling
people where to find these diagnostics.
[ paulmck: Apply Juergen Gross feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release. Thererfore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with this latter being the default.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
From Phil Auld:
drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
From me:
cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess
This series cleans that mess and adds new config FORCE_NR_CPUS that
allows to optimize cpumask subsystem if the number of CPUs is known
at compile-time.
From me:
lib: optimize find_bit() functions
Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.
From me:
lib/find: add find_nth_bit()
Adds find_nth_bit(), which is ~70 times faster than bitcounting with
for_each() loop:
for_each_set_bit(bit, mask, size)
if (n-- == 0)
return bit;
Also adds bitmap_weight_and() to let people replace this pattern:
tmp = bitmap_alloc(nbits);
bitmap_and(tmp, map1, map2, nbits);
weight = bitmap_weight(tmp, nbits);
bitmap_free(tmp);
with a single bitmap_weight_and() call.
From me:
cpumask: repair cpumask_check()
After switching cpumask to use nr_cpu_ids, cpumask_check() started
generating many false-positive warnings. This series fixes it.
From Valentin Schneider:
bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot()
Extends the API with one more function and applies it in sched/core.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof
vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow
7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD
5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6
hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc
/ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001
TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm
r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV
6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3
GR7sMQae
=Q/+F
-----END PGP SIGNATURE-----
Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)
- cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)
This series cleans that mess and adds new config FORCE_NR_CPUS that
allows to optimize cpumask subsystem if the number of CPUs is known
at compile-time.
- optimize find_bit() functions (me)
Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
macros.
- add find_nth_bit() (me)
Adds find_nth_bit(), which is ~70 times faster than bitcounting with
for_each() loop:
for_each_set_bit(bit, mask, size)
if (n-- == 0)
return bit;
Also adds bitmap_weight_and() to let people replace this pattern:
tmp = bitmap_alloc(nbits);
bitmap_and(tmp, map1, map2, nbits);
weight = bitmap_weight(tmp, nbits);
bitmap_free(tmp);
with a single bitmap_weight_and() call.
- repair cpumask_check() (me)
After switching cpumask to use nr_cpu_ids, cpumask_check() started
generating many false-positive warnings. This series fixes it.
- Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
Schneider)
Extends the API with one more function and applies it in sched/core.
* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
lib/test_cpumask: Add for_each_cpu_and(not) tests
cpumask: Introduce for_each_cpu_andnot()
lib/find_bit: Introduce find_next_andnot_bit()
cpumask: fix checking valid cpu range
lib/bitmap: add tests for for_each() loops
lib/find: optimize for_each() macros
lib/bitmap: introduce for_each_set_bit_wrap() macro
lib/find_bit: add find_next{,_and}_bit_wrap
cpumask: switch for_each_cpu{,_not} to use for_each_bit()
net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
cpumask: add cpumask_nth_{,and,andnot}
lib/bitmap: remove bitmap_ord_to_pos
lib/bitmap: add tests for find_nth_bit()
lib: add find_nth{,_and,_andnot}_bit()
lib/bitmap: add bitmap_weight_and()
lib/bitmap: don't call __bitmap_weight() in kernel code
tools: sync find_bit() implementation
lib/find_bit: optimize find_next_bit() functions
lib/find_bit: create find_first_zero_bit_le()
...
The size of cpumasks is hard-limited by compile-time parameter NR_CPUS,
but defined at boot-time when kernel parses ACPI/DT tables, and stored in
nr_cpu_ids. In many practical cases, number of CPUs for a target is known
at compile time, and can be provided with NR_CPUS.
In that case, compiler may be instructed to rely on NR_CPUS as on actual
number of CPUs, not an upper limit. It allows to optimize many cpumask
routines and significantly shrink size of the kernel image.
This patch adds FORCE_NR_CPUS option to teach the compiler to rely on
NR_CPUS and enable corresponding optimizations.
If FORCE_NR_CPUS=y, kernel will not set nr_cpu_ids at boot, but only check
that the actual number of possible CPUs is equal to NR_CPUS, and WARN if
that doesn't hold.
The new option is especially useful in embedded applications because
kernel configurations are unique for each SoC, the number of CPUs is
constant and known well, and memory limitations are typically harder.
For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB:
add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300)
Signed-off-by: Yury Norov <yury.norov@gmail.com>
In preparation to support compile-time nr_cpu_ids, add a setter for
the variable.
This is a no-op for all arches.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
SMP and NR_CPUS are independent options, hence nr_cpu_ids may be
declared even if NR_CPUS == 1, which is useless.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The trigger_all_cpu_backtrace() function attempts to send an NMI to the
target CPU, which usually provides much better stack traces than the
dump_cpu_task() function's approach of dumping that stack from some other
CPU. So much so that most calls to dump_cpu_task() only happen after
a call to trigger_all_cpu_backtrace() has failed. And the exception to
this rule really should attempt to use trigger_all_cpu_backtrace() first.
Therefore, move the trigger_all_cpu_backtrace() invocation into
dump_cpu_task().
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
The csdlock_debug kernel-boot parameter is parsed by the
early_param() function csdlock_debug(). If set, csdlock_debug()
invokes static_branch_enable() to enable csd_lock_wait feature, which
triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
CONFIG_SPARSEMEM_VMEMMAP=n.
With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
static_key_enable() and returns NULL, resulting in a NULL dereference
because mem_section is initialized only later in sparse_init().
This is also a problem for powerpc because early_param() functions
are invoked earlier than jump_label_init(), also resulting in
static_key_enable() failures. These failures cause the warning "static
key 'xxx' used before call to jump_label_init()".
Thus, early_param is too early for csd_lock_wait to run
static_branch_enable(), so changes it to __setup to fix these.
Fixes: 8d0968cc6b ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging")
Cc: stable@vger.kernel.org
Reported-by: Chen jingwen <chenjingwen6@huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>