Commit Graph

208 Commits

Author SHA1 Message Date
Linus Torvalds
cb30bf881c Merge tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Fix printf format warning for bprintf

   sunrpc uses a trace_printk() that triggers a printf warning during
   the compile. Move the __printf() attribute around so that when
   debugging is not enabled, the warning goes away

 - Remove redundant check for EVENT_FILE_FL_FREED in
   event_filter_write()

   The FREED flag is checked in the call to event_file_file() and then
   checked again right afterward, which is unneeded

 - Clean up event_file_file() and event_file_data() helpers

   These helper functions played a different role in the past, but now
   with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit
   and also add a warning to event_file_data() if the file or its data
   is not present

 - Remove updating file->private_data in tracing open

   All access to the file private data is handled by the helper
   functions, which do not use file->private_data. Stop updating it on
   open

 - Show ENUM names in function arguments via BTF in function tracing

   When the func-args option is set for function tracing, if one of the
   function arguments is found to be an enum, show the enum's name
   instead of its numeric value

 - Add new trace_call__##name() API for tracepoints

   Tracepoints are enabled via static_branch() blocks: when a tracepoint
   is not enabled, only a nop sits in the code, and execution simply
   skips over it. When tracing is enabled, the nop is converted to a
   direct jump to the tracepoint code. Sometimes extra calculations are
   required to produce the parameters of the tracepoint. In this case,
   trace_##name##_enabled() is called, which is a static_branch() that
   gets enabled only when the tracepoint is enabled. This allows the
   extra calculations to also be skipped by the nop:

	if (trace_foo_enabled()) {
		x = bar();
		trace_foo(x);
	}

   Here x = bar() is performed only when foo is enabled. The problem
   with this approach is that there are now two static_branch() calls:
   one to check whether the tracepoint is enabled, and then another to
   decide whether the tracepoint should be called. The second one is
   redundant

   Introduce trace_call__foo() that will call the foo() tracepoint
   directly without doing a static_branch():

	if (trace_foo_enabled()) {
		x = bar();
		trace_call__foo(x);
	}
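
   As a very rough userspace model of the difference (illustrative
   only; tp_foo_key and tp_foo_funcs are made-up stand-ins for the
   kernel's static key and registered-callback list):

	#include <stdbool.h>

	static bool tp_foo_key;               /* models the static branch */
	static void (*tp_foo_funcs[4])(int);  /* attached callbacks */

	static bool trace_foo_enabled(void) { return tp_foo_key; }

	static void trace_foo(int x)          /* classic tracepoint call */
	{
		if (!tp_foo_key)              /* second, redundant check */
			return;
		for (int i = 0; i < 4 && tp_foo_funcs[i]; i++)
			tp_foo_funcs[i](x);
	}

	static void trace_call__foo(int x)    /* new API: no re-check */
	{
		for (int i = 0; i < 4 && tp_foo_funcs[i]; i++)
			tp_foo_funcs[i](x);
	}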

 - Update various locations to use the new trace_call__##name() API

 - Move snapshot code out of trace.c

   To clean up trace.c so that it is no longer a "dump all", move the
   snapshot code out of it and into a new trace_snapshot.c file

 - Clean up some "%*.s" to "%*s"

 - Allow kernel boot command line options to be specified multiple times

   Make options like:

	ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo

   equal to:

	ftrace_filter=foo,bar,zoo

 - Fix ipi_raise event CPU field to be a CPU field

   The ipi_raise target_cpus field is defined as a __bitmask(). There is
   now a __cpumask() field definition. Update the field to use that

 - Have hist_field_name() use a snprintf() and not a series of strcat()

   It's safer to use snprintf() than a series of strcat()

 - Fix tracepoint regfunc balancing

   A tracepoint can define "reg" and "unreg" functions that get called
   before the tracepoint is enabled and after it is disabled,
   respectively. But on error, when the "reg" function has been called
   and the tracepoint then fails to be enabled, the "unreg" function is
   not called to tear down what the "reg" function set up

 - Fix output that shows what histograms are enabled

   Event variables are displayed incorrectly in the histogram output

   Instead of "sched.sched_wakeup.$var", it is showing
   "$sched.sched_wakeup.var" where the '$' is in the incorrect location

 - Some other simple cleanups

* tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits)
  selftests/ftrace: Add test case for fully-qualified variable references
  tracing: Fix fully-qualified variable reference printing in histograms
  tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func()
  tracing: Rebuild full_name on each hist_field_name() call
  tracing: Report ipi_raise target CPUs as cpumask
  tracing: Remove duplicate latency_fsnotify() stub
  tracing: Preserve repeated trace_trigger boot parameters
  tracing: Append repeated boot-time tracing parameters
  tracing: Remove spurious default precision from show_event_trigger/filter formats
  cpufreq: Use trace_call__##name() at guarded tracepoint call sites
  tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined
  tracing: Move snapshot code out of trace.c and into trace_snapshot.c
  mm: damon: Use trace_call__##name() at guarded tracepoint call sites
  btrfs: Use trace_call__##name() at guarded tracepoint call sites
  spi: Use trace_call__##name() at guarded tracepoint call sites
  i2c: Use trace_call__##name() at guarded tracepoint call sites
  kernel: Use trace_call__##name() at guarded tracepoint call sites
  tracepoint: Add trace_call__##name() API
  tracing: trace_mmap.h: fix a kernel-doc warning
  tracing: Pretty-print enum parameters in function arguments
  ...
2026-04-17 09:43:12 -07:00
Marco Crivellari
7eb28030f6 smp: Use system_percpu_wq instead of system_wq
When a caller enqueues a work item using schedule_delayed_work(), the wq
used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when no target CPU is specified). The same applies
to schedule_work(), which uses system_wq, and queue_work(), which again
makes use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

Continue the effort to refactor workqueue APIs, which began with the
introduction of new workqueues and a new alloc_workqueue() flag in:

  commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
  commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

and switch smp_call_on_cpu() to use system_percpu_wq because system_wq is
going away once the ongoing workqueue restructuring is done.
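
As a sketch of the kind of call-site change involved (smp_call_on_cpu()'s
queueing call; the exact context may differ from the patch):

	/* Before: the legacy per-CPU system_wq */
	queue_work_on(cpu, system_wq, &sscs.work);

	/* After: the explicitly named per-CPU workqueue */
	queue_work_on(cpu, system_percpu_wq, &sscs.work);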

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251110170332.319314-1-marco.crivellari@suse.com
2026-03-26 17:31:35 +01:00
Vineeth Pillai (Google)
7b1c2d7547 kernel: Use trace_call__##name() at guarded tracepoint call sites
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
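
Schematically, for a hypothetical tracepoint foo:

	/* Before: trace_foo() re-evaluates its own static branch */
	if (trace_foo_enabled()) {
		x = bar();
		trace_foo(x);
	}

	/* After: direct call, single static branch */
	if (trace_foo_enabled()) {
		x = bar();
		trace_call__foo(x);
	}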

Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Kisel <romank@linux.microsoft.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20260323160052.17528-3-vineeth@bitbyteword.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-26 08:28:49 -04:00
Paul E. McKenney
b0473dcd4b smp: Improve smp_call_function_single() CSD-lock diagnostics
Both smp_call_function() and smp_call_function_single() use per-CPU
call_single_data_t variable to hold the infamous CSD lock.  However,
while smp_call_function() acquires the destination CPU's CSD lock,
smp_call_function_single() instead uses the source CPU's CSD lock.
(These are two separate sets of CSD locks, cfd_data and csd_data,
respectively.)

This otherwise inexplicable pair of choices is explained by their
respective queueing properties.  If smp_call_function() were to
use the sending CPU's CSD lock, that would serialize the destination
CPUs' IPI handlers and result in long smp_call_function() latencies,
especially on systems with large numbers of CPUs.  For its part, if
smp_call_function_single() were to use the (single) destination CPU's
CSD lock, this would similarly serialize in the case where many CPUs
are sending IPIs to a single "victim" CPU.  Plus it would result in
higher levels of memory contention.

Except that if there is no NMI-based stack tracing on a weakly ordered
system where remote unsynchronized stack traces are especially unreliable,
the improved debugging beats the improved queueing.  This improved queueing
only matters if a bunch of CPUs are calling smp_call_function_single()
concurrently for a single "victim" CPU, which is not the common case.

Therefore, make smp_call_function_single() use the destination CPU's
csd_data instance in kernels built with CONFIG_CSD_LOCK_WAIT_DEBUG=y
where csdlock_debug_enabled is also set.  Otherwise, continue to use
the source CPU's csd_data.
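
A sketch of the resulting selection (illustrative, not the literal patch;
csdlock_debug_enabled stands in for the kernel's static key):

	call_single_data_t *csd;

	if (IS_ENABLED(CONFIG_CSD_LOCK_WAIT_DEBUG) && csdlock_debug_enabled)
		csd = &per_cpu(csd_data, cpu);      /* destination CPU */
	else
		csd = this_cpu_ptr(&csd_data);      /* source CPU */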

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/25c2eb97-77c8-49a5-80ac-efe78dea272c@paulmck-laptop
2026-03-25 20:11:30 +01:00
Shrikanth Hegde
ec39780d6a smp: Get this_cpu once in smp_call_function
smp_call_function_single() and smp_call_function_many_cond() disable
preemption and cache the CPU number via get_cpu().

Use this cached value throughout the function instead of invoking
smp_processor_id() again.
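
That is, roughly:

	int this_cpu = get_cpu();  /* disables preemption, returns CPU id */

	/* ... use this_cpu instead of calling smp_processor_id() ... */

	put_cpu();                 /* re-enables preemption */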

[ tglx: Make the copy&pasta'ed change log match the patch ]

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Link: https://patch.msgid.link/20260323193630.640311-4-sshegde@linux.ibm.com
2026-03-25 20:11:30 +01:00
Randy Dunlap
cc5623947f smp: Add missing kernel-doc comments
Add missing kernel-doc comments and rearrange the order of others to
prevent all kernel-doc warnings (a format sketch follows the list below).

 - add function Returns: sections or format existing comments as kernel-doc
 - add missing function parameter comments
 - use "/**" for smp_call_function_any() and on_each_cpu_cond_mask()
 - correct the commented function name for on_each_cpu_cond_mask()
 - use correct format for function short descriptions
 - add all kernel-doc comments for smp_call_on_cpu()
 - remove kernel-doc comments for raw_smp_processor_id() since there is
   no prototype for it here (other than !SMP)
 - in smp.h, rearrange some lines so that the kernel-doc comments for
   smp_processor_id() are immediately before the macro (to prevent
   kernel-doc warnings)
 - remove "Returns" from smp_call_function() since it doesn't
   return a value
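
As a minimal sketch of the kernel-doc format (the descriptive text here
is illustrative, not the patch's wording):

	/**
	 * smp_call_on_cpu - Run a function on a specific CPU
	 * @cpu: CPU to run the function on
	 * @func: function to run
	 * @par: parameter passed to @func
	 * @phys: whether @func must run with a physical address mapping
	 *
	 * Return: the value returned by @func, or a negative errno.
	 */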

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260310061726.1153764-1-rdunlap@infradead.org
2026-03-25 20:11:29 +01:00
Ulf Hansson
ccde652518 smp: Introduce a helper function to check for pending IPIs
When governors used during cpuidle try to find the optimal idle state
for a CPU or a group of CPUs, they are known to fail quite often. One
reason for this is that they do not take into account whether an IPI has
been scheduled for any of the CPUs that are affected by the selected
idle state.

To enable pending IPIs to be taken into account for cpuidle decisions,
introduce a new helper function, cpus_peek_for_pending_ipi().
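
A sketch of the intended governor-side usage (the helper's signature and
choose_shallower_state() are assumptions here, not taken from the patch):

	/* Assumed shape: takes the cpumask covered by the candidate state. */
	if (cpus_peek_for_pending_ipi(state_cpus))
		choose_shallower_state();  /* an IPI is already on its way */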

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-19 18:06:50 +01:00
Rafael J. Wysocki
ccf09357ff smp: Fix up and expand the smp_call_function_many() kerneldoc
The smp_call_function_many() kerneldoc comment got out of sync with the
function definition (bool parameter "wait" is incorrectly described as a
bitmask in it), so fix it up by copying the "wait" description from the
smp_call_function() kerneldoc and add information regarding the handling
of the local CPU to it.

Fixes: 49b3bd213a ("smp: Fix all kernel-doc warnings")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-09-18 22:21:28 +02:00
Roman Kisel
83e6384374 smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment
"boolean" is spelt as "blooean". Fix that.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250722161818.6139-1-romank@linux.microsoft.com
2025-08-02 14:24:50 +02:00
Linus Torvalds
909d2bb07d Merge tag 'stop-machine.2025.07.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull stop-machine documentation updates from Paul McKenney:

 - Improve kernel-doc function-header comments

 - Document preemption and stop_machine() mutual exclusion (Joel
   Fernandes)

* tag 'stop-machine.2025.07.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  smp: Document preemption and stop_machine() mutual exclusion
  stop_machine: Improve kernel-doc function-header comments
2025-07-29 16:14:07 -07:00
Joel Fernandes
cf4fc66746 smp: Document preemption and stop_machine() mutual exclusion
Recently while revising RCU's cpu online checks, there was some discussion
around how IPIs synchronize with hotplug.

Add comments explaining how preemption disable creates mutual exclusion with
CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
atomically updates CPU masks and flushes IPIs with interrupts disabled, and
cannot proceed while any CPU (including the IPI sender) has preemption
disabled.

[ Apply peterz feedback. ]

Cc: Andrea Righi <arighi@nvidia.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: rcu@vger.kernel.org
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2025-07-17 12:11:13 -07:00
Rik van Riel
946a728198 smp: Wait only if work was enqueued
Whenever work is enqueued for a remote CPU, smp_call_function_many_cond()
may need to wait for that work to be completed. However, if no work is
enqueued for a remote CPU, because the condition func() evaluated to false
for all CPUs, there is no need to wait.

Set run_remote only if work was enqueued on remote CPUs.

Document the difference between "work enqueued" and "CPU needs to be
woken up".

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://lore.kernel.org/all/20250703203019.11331ac3@fangorn
2025-07-06 11:57:39 +02:00
Yury Norov [NVIDIA]
e0e9506523 smp: Defer check for local execution in smp_call_function_many_cond()
Defer check for local execution to the actual place where it is needed,
which removes the extra local variable.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-5-yury.norov@gmail.com
2025-07-02 19:13:14 +02:00
Yury Norov [NVIDIA]
976e0e3103 smp: Use cpumask_any_but() in smp_call_function_many_cond()
smp_call_function_many_cond() opencodes cpumask_any_but().
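
That is, the open-coded loop collapses to:

	/* First CPU in mask other than this_cpu; >= nr_cpu_ids if none. */
	cpu = cpumask_any_but(mask, this_cpu);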

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-3-yury.norov@gmail.com
2025-06-26 23:46:34 +02:00
Yury Norov [NVIDIA]
5f295519b4 smp: Improve locality in smp_call_function_any()
smp_call_function_any() tries to make a local call, as that's the
cheapest option, or else switches to a CPU in the same node. If that's
not possible, the algorithm gives up and searches for any CPU, in
numerical order.

Instead, it can search for the best CPU based on NUMA locality, including
the 2nd nearest hop (a set of equidistant nodes), and higher.

sched_numa_find_nth_cpu() does exactly that, and also helps to drop most
of the housekeeping code.
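
A sketch of the call (argument order as I understand the helper; treat
it as an assumption):

	/* Nth (here: first) CPU in mask, searched in NUMA-hop order from node. */
	cpu = sched_numa_find_nth_cpu(mask, 0, node);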

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com
2025-06-26 23:46:34 +02:00
Linus Torvalds
b2b3379f4c Merge tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull CSD-lock update from Paul McKenney:
 "Allow runtime modification of the csd_lock_timeout and
  panic_on_ipistall module parameters"

* tag 'csd-lock.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  locking/csd-lock: make CSD lock debug tunables writable in /sys
2025-01-28 11:34:03 -08:00
Rik van Riel
0e4a19e2bd locking/csd-lock: make CSD lock debug tunables writable in /sys
Currently the CSD lock tunables can only be set at boot time in the
kernel commandline, but the way these variables are used means there
is really no reason not to tune them at runtime through /sys.

Make the CSD lock debug tunables tunable through /sys.
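
Mechanically this is just the module_param() permission bits; roughly:

	/* 0644 exposes the tunable read-write under
	 * /sys/module/smp/parameters/ (0444 would be read-only). */
	module_param(csd_lock_timeout, ulong, 0644);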

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2024-12-11 20:50:11 -08:00
Mathieu Desnoyers
63a48181fb smp/scf: Evaluate local cond_func() before IPI side-effects
In smp_call_function_many_cond(), the local cond_func() is evaluated
after triggering the remote CPU IPIs.

If cond_func() depends on loading shared state updated by other CPU's
IPI handlers func(), then triggering execution of remote CPUs IPI before
evaluating cond_func() may have unexpected consequences.

One example scenario is evaluating a jiffies delay in cond_func(), which
is updated by func() in the IPI handlers. This situation can prevent
execution of periodic cleanup code on the local CPU.
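
Conceptually, the fix reorders things like this (simplified; the IPI
helper name is illustrative):

	/* Evaluate the local condition before any remote func() can run. */
	bool run_local = cond_func(this_cpu, info);

	send_ipis_to_remote_cpus(mask);      /* illustrative helper */

	if (run_local)
		func(info);                  /* local callback */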

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241203163558.3455535-1-mathieu.desnoyers@efficios.com
2024-12-05 14:25:28 +01:00
Paul E. McKenney
9861f7f66f locking/csd-lock: Switch from sched_clock() to ktime_get_mono_fast_ns()
Currently, the CONFIG_CSD_LOCK_WAIT_DEBUG code uses sched_clock() to check
for excessive CSD-lock wait times.  This works, but does not guarantee
monotonic timestamps on x86 due to the sched_clock() function's use of
the rdtsc instruction, which does not guarantee ordering.  This means
that, given successive calls to sched_clock(), the second might return
an earlier time than the first, that is, time might seem to go backwards.
This can (and does!) result in false-positive CSD-lock wait complaints
claiming almost 2^64 nanoseconds of delay.

Therefore, switch from sched_clock() to ktime_get_mono_fast_ns(), which
does guarantee monotonic timestamps via the rdtsc_ordered() function,
which as the name implies, does guarantee ordered timestamps, at least
in the absence of calls from NMI handlers, which are not involved in
this code path.
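
Schematically, the wait-time measurement becomes:

	u64 ts0 = ktime_get_mono_fast_ns();
	/* ... spin waiting for the CSD lock to be released ... */
	u64 ts2 = ktime_get_mono_fast_ns();
	/* ts2 - ts0 can no longer appear hugely negative due to an
	 * unordered rdtsc. */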

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
2024-10-11 09:31:21 -07:00
Rik van Riel
9fbaa44114 smp: print only local CPU info when sched_clock goes backward
About 40% of all csd_lock warnings observed in our fleet appear to
be due to sched_clock() going backward in time (usually only a little
bit), resulting in ts0 being larger than ts2.

When the local CPU is at fault, we should print out a message reflecting
that, rather than trying to get the remote CPU's stack trace.

Signed-off-by: Rik van Riel <riel@surriel.com>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 00:06:48 +05:30
Paul E. McKenney
d40760d681 locking/csd-lock: Use backoff for repeated reports of same incident
Currently, the CSD-lock diagnostics in CONFIG_CSD_LOCK_WAIT_DEBUG=y
kernels are emitted at five-second intervals.  Although this has proven
to be a good time interval for the first diagnostic, if the target CPU
keeps interrupts disabled for way longer than five seconds, the ratio
of useful new information to pointless repetition increases considerably.

Therefore, back off the time period for repeated reports of the same
incident, increasing linearly with the number of reports and
logarithmically with the number of online CPUs.
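
As an illustrative formula (hypothetical names, not the literal code):

	/* Linear in the report count, logarithmic in the online CPU count. */
	u64 next_interval = base_interval * nr_reports *
			    (ilog2(num_online_cpus()) + 1);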

[ paulmck: Apply Dan Carpenter feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 00:06:48 +05:30
Paul E. McKenney
ac9d45544c locking/csd_lock: Provide an indication of ongoing CSD-lock stall
If a CSD-lock stall goes on long enough, it will cause an RCU CPU
stall warning.  This additional warning provides much additional
console-log traffic and little additional information.  Therefore,
provide a new csd_lock_is_stuck() function that returns true if there
is an ongoing CSD-lock stall.  This function will be used by the RCU
CPU stall warnings to provide a one-line indication of the stall when
this function returns true.

[ neeraj.upadhyay: Apply Rik van Riel feedback. ]
[ neeraj.upadhyay: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 00:05:39 +05:30
Paul E. McKenney
c1972c8dc9 locking/csd_lock: Print large numbers as negatives
The CSD-lock-hold diagnostics from CONFIG_CSD_LOCK_WAIT_DEBUG are
printed in nanoseconds as unsigned long longs, which is a bit obtuse for
human readers when timing bugs result in negative CSD-lock hold times.
Yes, there are some people to whom it is immediately obvious that
18446744073709551615 is really -1, but for the rest of us...

Therefore, print these numbers as signed long longs, making the negative
hold times immediately apparent.
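
That is, a format-string change of this flavor (illustrative message text):

	/* Signed, so a small negative delta prints as -1, not
	 * 18446744073709551615. */
	pr_alert("csd: lock held for %lld ns\n", (s64)(ts2 - ts0));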

Reported-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leonardo Bras <leobras@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-07-29 07:44:38 +05:30
Zqiang
77aeb1b685 smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()
For CONFIG_DEBUG_OBJECTS_WORK=y kernels, sscs.work defined by
INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() so
that the debug check in __init_work() works correctly.

But this lacks the counterpart to remove the tracked object from debug
objects again, which will cause a debug object warning once the stack is
freed.

Add the missing destroy_work_on_stack() invocation to cure that.
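
The fixed sequence then looks like (abridged sketch of smp_call_on_cpu()):

	INIT_WORK_ONSTACK(&sscs.work, smp_call_on_cpu_callback);
	queue_work_on(cpu, system_wq, &sscs.work);
	wait_for_completion(&sscs.done);
	destroy_work_on_stack(&sscs.work);   /* the missing counterpart */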

[ tglx: Massaged changelog ]

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com
2024-07-10 22:40:39 +02:00
Thorsten Blum
c4df15931c smp: Use str_plural() to fix Coccinelle warnings
Fixes the following two Coccinelle/coccicheck warnings reported by
string_choices.cocci:

	opportunity for str_plural(num_cpus)
	opportunity for str_plural(num_nodes)
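
str_plural(n) returns "s" unless n is 1, so the pattern becomes:

	pr_info("%u CPU%s\n", num_cpus, str_plural(num_cpus));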

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240508154225.309703-2-thorsten.blum@toblux.com
2024-06-17 15:17:44 +02:00
Linus Torvalds
9a0f53e0cf Merge tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull CSD lock update from Paul McKenney:
 "This adds a kernel boot parameter that causes the kernel to panic if
   one of the smp_call_function() APIs is stalled for more than the
  specified duration.

  This is useful in deployments in which a clean panic is preferable to
  an indefinite stall"

* tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  smp,csd: Throw an error if a CSD lock is stuck for too long
2023-10-30 17:56:53 -10:00
Thomas Gleixner
a940daa521 Merge branch 'linus' into smp/core
Pull in upstream to get the fixes so dependent changes can be applied.
2023-10-17 21:40:46 +02:00
Rik van Riel
94b3f0b5af smp,csd: Throw an error if a CSD lock is stuck for too long
The CSD lock seems to get stuck in 2 "modes". When it gets stuck
temporarily, it usually gets released in a few seconds, and sometimes
up to one or two minutes.

If the CSD lock stays stuck for more than several minutes, it never
seems to get unstuck, and gradually more and more things in the system
end up also getting stuck.

In the latter case, we should just give up, so the system can dump out
a little more information about what went wrong, and, with panic_on_oops
and a kdump kernel loaded, dump a whole bunch more information about what
might have gone wrong.  In addition, there is an smp.panic_on_ipistall
kernel boot parameter that by default retains the old behavior, but when
set enables the panic after the CSD lock has been stuck for more than
the specified number of milliseconds, as in 300,000 for five minutes.
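
For example, to panic after a five-minute CSD-lock stall:

	smp.panic_on_ipistall=300000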

[ paulmck: Apply Imran Khan feedback. ]
[ paulmck: Apply Leonardo Bras feedback. ]

Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
2023-10-16 16:06:37 -07:00
Leonardo Bras
d090ec0df8 smp: Change function signatures to use call_single_data_t
call_single_data_t is a size-aligned typedef of struct __call_single_data.

This alignment is desirable in order to have smp_call_function*() avoid
bouncing an extra cacheline in case of an unaligned csd, given this
would hurt performance.

Since the removal of struct request->csd in commit 660e802c76
("blk-mq: use percpu csd to remote complete instead of per-rq csd") there
are no current users of smp_call_function*() with unaligned csd.

Change every 'struct __call_single_data' function parameter to
'call_single_data_t', so we have warnings if any new code tries to
introduce an smp_call_function*() call with unaligned csd.
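
The change is of this shape (csd_lock() used as an example of the
pattern; treat the exact function as an assumption):

	/* before */ static void csd_lock(struct __call_single_data *csd);
	/* after  */ static void csd_lock(call_single_data_t *csd);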

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230831063129.335425-1-leobras@redhat.com
2023-09-13 14:59:24 +02:00
Imran Khan
0d3a00b370 smp: Reduce NMI traffic from CSD waiters to CSD destination
On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace.

Given enough NMIs, the destination CPU will spend much of its time
producing backtraces, thus further delaying that CPU's response to the
original CSD IPI.  In the worst case, by the time destination CPU is
done producing all of these backtrace NMIs, the CSD wait timeout will
have elapsed so that the waiters resend their backtrace NMIs again,
further delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from
the first waiter.  The destination CPU's other waiters can make use of
backtrace obtained from the first waiter's NMI.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-10 14:19:04 -07:00
Imran Khan
5bd00f6db0 smp: Reduce logging due to dump_stack of CSD waiters
If a waiter is waiting for a CSD lock, its call stack will not change
between the first and subsequent hang detections for the same CSD lock.
Therefore, do dump_stack() only for the first detection for a given waiter.

This avoids excessive logging on systems with hundreds of CPUs where
repetitive dump_stack from hundreds of CPUs would otherwise flood the
console.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-10 14:19:04 -07:00
Leonardo Bras
bf5a8c26ad trace,smp: Add tracepoints for scheduling remotely called functions
Add a tracepoint for when a CSD is queued to a remote CPU's
call_single_queue. This allows finding exactly which CPU queued a given CSD
when looking at a csd_function_{entry,exit} event, and also enables us to
accurately measure IPI delivery time with e.g. a synthetic event:

  $ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\
      /sys/kernel/tracing/events/smp/csd_queue_cpu/trigger
  $ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\
      /sys/kernel/tracing/synthetic_events
  $ echo \
  'hist:keys=common_cpu,csd.hex:'\
  'time=common_timestamp.usecs-$ts:'\
  'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\
      /sys/kernel/tracing/events/smp/csd_function_entry/trigger

  $ trace-cmd record -e 'synthetic:csd_latency' hackbench
  $ trace-cmd report
  <...>-467   [001]    21.824263: csd_queue_cpu:        cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8
  <...>-467   [001]    21.824280: ipi_send_cpu:         cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0
  <...>-489   [000]    21.824299: csd_function_entry:   func=sched_ttwu_pending csd=0xffff8880076148b8
  <...>-489   [000]    21.824320: csd_latency:          dst_cpu=0, csd=18446612682193848504, time=36

Suggested-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com
2023-06-16 22:08:09 +02:00
Leonardo Bras
949fa3f11c trace,smp: Add tracepoints around remotely called functions
The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources
of IPIs targeting CPUs running latency-sensitive applications.

For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are
sufficient to find them and work on getting rid of them. In some setups
however, not *all* IPIs are to be suppressed, but long-running IPI
callbacks can still be problematic.

Add a pair of tracepoints to mark the start and end of processing a CSD IPI
callback, similar to what exists for softirq, workqueue or timer callbacks.
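
So CSD callback execution is bracketed roughly as:

	trace_csd_function_entry(csd->func, csd);
	csd->func(csd->info);
	trace_csd_function_exit(csd->func, csd);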

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com
2023-06-16 22:08:09 +02:00
Thomas Gleixner
ba831b7b1a cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init
No point in keeping them around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de
2023-05-15 13:44:47 +02:00
Peter Zijlstra
5c3124975e trace,smp: Trace all smp_function_call*() invocations
(Ab)use the trace_ipi_send_cpu*() family to trace all
smp_function_call*() invocations, not only those that result in an
actual IPI.

The queued entries log their callback function while the actual IPIs
are traced on generic_smp_call_function_single_interrupt().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2023-03-24 11:01:30 +01:00
Peter Zijlstra
68e2d17c9e trace: Add trace_ipi_send_cpu()
Because copying cpumasks around when targeting a single CPU is a bit
daft...

Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230322103004.GA571242%40hirez.programming.kicks-ass.net
2023-03-24 11:01:29 +01:00
Valentin Schneider
68f4ff04db sched, smp: Trace smp callback causing an IPI
Context
=======

The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter
which so far has only been fed with NULL.

While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing
struct layout (meaning their callback func can be accessed without caring
about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function
attached to its struct. This means we need to check the type of a CSD
before eventually dereferencing its associated callback.

This isn't as trivial as it sounds: the CSD type is stored in
__call_single_node.u_flags, which gets cleared right before the callback is
executed via csd_unlock(). This implies checking the CSD type before it is
enqueued on the call_single_queue, as the target CPU's queue can be flushed
before we get to sending an IPI.

Furthermore, send_call_function_single_ipi() only has a CPU parameter, and
would need to have an additional argument to trickle down the invoked
function. This is somewhat silly, as the extra argument will always be
pushed down to the function even when nothing is being traced, which is
unnecessary overhead.

Changes
=======

send_call_function_single_ipi() is only used by smp.c, and is defined in
sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of
a CPU's idle task).

Split it into two parts: the scheduler bits remain in sched/core.c, and the
actual IPI emission is moved into smp.c. This lets us define an
__always_inline helper function that can take the related callback as
parameter without creating useless register pressure in the non-traced path
which only gains a (disabled) static branch.

Do the same thing for the multi IPI case.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-8-vschneid@redhat.com
2023-03-24 11:01:29 +01:00
Valentin Schneider
253a0fb4c6 smp: reword smp call IPI comment
Accessing the call_single_queue hasn't involved a spinlock since 2014:

  6897fc22ea ("kernel: use lockless list for smp_call_function_single")

The llist operations (namely cmpxchg() and xchg()) provide similar
ordering guarantees; update the comment to lessen confusion.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230307143558.294354-7-vschneid@redhat.com
2023-03-24 11:01:28 +01:00
Valentin Schneider
08407b5f61 smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
This simply wraps around the arch function and prepends it with a
tracepoint, similar to send_call_function_single_ipi().

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307143558.294354-4-vschneid@redhat.com
2023-03-24 11:01:27 +01:00
Valentin Schneider
cc9cb0a717 sched, smp: Trace IPIs sent via send_call_function_single_ipi()
send_call_function_single_ipi() is the thing that sends IPIs at the bottom
of smp_call_function*() via either generic_exec_single() or
smp_call_function_many_cond(). Give it an IPI-related tracepoint.

Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230307143558.294354-3-vschneid@redhat.com
2023-03-24 11:01:27 +01:00
Paul E. McKenney
203e435844 kernel/smp: Make csdlock_debug= resettable
It is currently possible to set the csdlock_debug_enabled static
branch, but not to reset it.  This is an issue when several different
entities supply kernel boot parameters and also for kernels built with
CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y.

Therefore, make the csdlock_debug=0 kernel boot parameter turn off
debugging.  Last one wins!

Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org
2023-03-24 11:01:26 +01:00
Paul E. McKenney
6366d062e7 locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
The diagnostics added by this commit were extremely useful in one instance:

a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")

However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.

Therefore, manually revert the following preparatory commit:

de7b09ef65 ("locking/csd_lock: Prepare more CSD lock debugging")

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-3-paulmck@kernel.org
2023-03-24 11:01:26 +01:00
Paul E. McKenney
1771257cb4 locking/csd_lock: Remove added data from CSD lock debugging
The diagnostics added by this commit were extremely useful in one instance:

a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")

However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.

Therefore, manually revert this commit, but leave a comment telling
people where to find these diagnostics.

[ paulmck: Apply Juergen Gross feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
2023-03-24 11:01:25 +01:00
Paul E. McKenney
c521986016 locking/csd_lock: Add Kconfig option for csd_debug default
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release.  Therefore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with the latter being the default.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
2023-03-24 11:01:25 +01:00
Linus Torvalds
d4013bc4d4 Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds a new config option,
   FORCE_NR_CPUS, that allows optimizing the cpumask subsystem when the
   number of CPUs is known at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with
   a for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() (Valentin Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
2022-10-10 12:49:34 -07:00
Yury Norov
6f9c07be9d lib/cpumask: add FORCE_NR_CPUS config option
The size of cpumasks is hard-limited by the compile-time parameter
NR_CPUS, but the actual number is defined at boot time when the kernel
parses ACPI/DT tables, and is stored in nr_cpu_ids. In many practical
cases, the number of CPUs for a target is known at compile time and can
be provided with NR_CPUS.

In that case, the compiler may be instructed to rely on NR_CPUS as the
actual number of CPUs, not just an upper limit. This allows optimizing
many cpumask routines and significantly shrinking the kernel image.

This patch adds FORCE_NR_CPUS option to teach the compiler to rely on
NR_CPUS and enable corresponding optimizations.

If FORCE_NR_CPUS=y, kernel will not set nr_cpu_ids at boot, but only check
that the actual number of possible CPUs is equal to NR_CPUS, and WARN if
that doesn't hold.

The new option is especially useful in embedded applications because
kernel configurations are unique for each SoC, the number of CPUs is
constant and well known, and memory limitations are typically stricter.

For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB:
  add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300)
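
The heart of the optimization is that nr_cpu_ids can collapse to a
compile-time constant; schematically:

	#ifdef CONFIG_FORCE_NR_CPUS
	#define nr_cpu_ids ((unsigned int)NR_CPUS)   /* constant-folded */
	#else
	extern unsigned int nr_cpu_ids;              /* set at boot */
	#endif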

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-09-20 16:11:44 -07:00
Yury Norov
38bef8e57f smp: add set_nr_cpu_ids()
In preparation to support compile-time nr_cpu_ids, add a setter for
the variable.

This is a no-op for all arches.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-09-19 17:51:53 -07:00
Yury Norov
53fc190cc6 smp: don't declare nr_cpu_ids if NR_CPUS == 1
SMP and NR_CPUS are independent options, hence nr_cpu_ids may be
declared even if NR_CPUS == 1, which is useless.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-09-19 17:51:53 -07:00
Zhen Lei
e73dfe3093 sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
The trigger_single_cpu_backtrace() function attempts to send an NMI to
the target CPU, which usually provides much better stack traces than the
dump_cpu_task() function's approach of dumping that stack from some other
CPU.  So much so that most calls to dump_cpu_task() only happen after
a call to trigger_single_cpu_backtrace() has failed.  And the exception to
this rule really should attempt to use trigger_single_cpu_backtrace() first.

Therefore, move the trigger_single_cpu_backtrace() invocation into
dump_cpu_task().
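
Sketch of the resulting flow (simplified):

	void dump_cpu_task(int cpu)
	{
		if (trigger_single_cpu_backtrace(cpu))
			return;          /* NMI backtrace succeeded */

		/* otherwise fall back to dumping the stack remotely ... */
	}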

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
2022-08-31 05:03:14 -07:00
Chen Zhongjin
9c9b26b0df locking/csd_lock: Change csdlock_debug from early_param to __setup
The csdlock_debug kernel-boot parameter is parsed by the
early_param() function csdlock_debug().  If set, csdlock_debug()
invokes static_branch_enable() to enable csd_lock_wait feature, which
triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
CONFIG_SPARSEMEM_VMEMMAP=n.

With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
static_key_enable() and returns NULL, resulting in a NULL dereference
because mem_section is initialized only later in sparse_init().

This is also a problem for powerpc because early_param() functions
are invoked earlier than jump_label_init(), also resulting in
static_key_enable() failures.  These failures cause the warning "static
key 'xxx' used before call to jump_label_init()".

Thus, early_param() is too early for csd_lock_wait to run
static_branch_enable(), so change it to __setup() to fix these failures.
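
The fix is thus a one-line registration change of this shape:

	/* before */ early_param("csdlock_debug", csdlock_debug);
	/* after  */ __setup("csdlock_debug=", csdlock_debug);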

Fixes: 8d0968cc6b ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging")
Cc: stable@vger.kernel.org
Reported-by: Chen jingwen <chenjingwen6@huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-07-19 11:40:00 -07:00