Commit Graph

404 Commits

Author SHA1 Message Date
Rafael J. Wysocki
d33e89d122 thermal: core: Suspend thermal zones later and resume them earlier
To avoid some undesirable interactions between thermal zone suspend
and resume with user space that is running when those operations are
carried out, move them closer to the suspend and resume of devices,
respectively, by updating dpm_prepare() to carry out thermal zone
suspend and dpm_complete() to start thermal zone resume (that will
continue asynchronously).

This also makes the code easier to follow by removing one, arguably
redundant, level of indirection represented by the thermal PM notifier.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/2036875.PYKUYFuaPT@rafael.j.wysocki
2026-04-08 12:30:31 +02:00
Rafael J. Wysocki
c4c6a86463 thermal: core: Allocate thermal_class statically
Define thermal_class as a static structure to simplify thermal_init()
and to simplify thermal class availability checks that will need to
be carried out during the suspend and resume of thermal zones after
subsequent changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/10831981.nUPlyArG6x@rafael.j.wysocki
2026-04-08 12:30:31 +02:00
Rafael J. Wysocki
26fd03effa thermal: core: Adjust thermal_wq allocation flags
The thermal workqueue doesn't need to be freezable or per-CPU, so drop
WQ_FREEZABLE and WQ_PERCPU from the flags when allocating it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Subject rewrite ]
Link: https://patch.msgid.link/3413335.44csPzL39Z@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-04-08 12:30:31 +02:00
Rafael J. Wysocki
323803ac63 thermal: core: Drop redundant check from thermal_zone_device_update()
Since __thermal_zone_device_update() checks if tz->state is
TZ_STATE_READY and bails out immediately otherwise, it is not
necessary to check the thermal_zone_is_present() return value in
thermal_zone_device_update().  Namely, tz->state is equal to
TZ_STATE_FLAG_INIT initially and that flag is only cleared in
thermal_zone_init_complete() after adding tz to the list of thermal
zones, and thermal_zone_exit() sets TZ_STATE_FLAG_EXIT in tz->state
while removing tz from that list.  Thus tz->state is not TZ_STATE_READY
when tz is not in the list and the check mentioned above is redundant.

Accordingly, drop the redundant thermal_zone_is_present() check from
thermal_zone_device_update() and drop the former altogether because it
has no more users.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3406806.aeNJFYEL58@rafael.j.wysocki
2026-04-08 12:30:31 +02:00
Rafael J. Wysocki
daae9c18fe thermal: core: Free thermal zone ID later during removal
The thermal zone removal ordering is different from the thermal zone
registration rollback path ordering and the former is arguably
problematic because freeing a thermal zone ID prematurely may cause
it to be used during the registration of another thermal zone which
may fail as a result.

Prevent that from occurring by changing the thermal zone removal
ordering to reflect the thermal zone registration rollback path
ordering.

Also more the ida_destroy() call from thermal_zone_device_unregister()
to thermal_release() for consistency.

Fixes: b31ef8285b ("thermal core: convert ID allocation to IDA")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5063934.GXAFRqVoOG@rafael.j.wysocki
2026-04-08 12:30:30 +02:00
Rafael J. Wysocki
41ff66baf8 thermal: core: Fix thermal zone governor cleanup issues
If thermal_zone_device_register_with_trips() fails after adding
a thermal governor to the thermal zone being registered, the
governor is not removed from it as appropriate which may lead to
a memory leak.

In turn, thermal_zone_device_unregister() calls thermal_set_governor()
without acquiring the thermal zone lock beforehand which may race with
a governor update via sysfs and may lead to a use-after-free in that
case.

Address these issues by adding two thermal_set_governor() calls, one to
thermal_release() to remove the governor from the given thermal zone,
and one to the thermal zone registration error path to cover failures
preceding the thermal zone device registration.

Fixes: e33df1d2f3 ("thermal: let governors have private data for each thermal zone")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5092923.31r3eYUQgx@rafael.j.wysocki
2026-04-08 12:30:30 +02:00
Daniel Lezcano
761fdf4c8c thermal/core: Remove pointless variable when registering a cooling device
The 'id' variable is set to store the ida_alloc() value which is
already stored into cdev->id. It is pointless to use it because
cdev->id can be used instead.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20260402084426.1360086-1-daniel.lezcano@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-04-04 15:05:18 +02:00
Rafael J. Wysocki
49c7db1f0b Merge back earlier thermal core updates for 7.1 2026-04-04 14:59:43 +02:00
Rafael J. Wysocki
9e07e3b818 thermal: core: Fix thermal zone device registration error path
If thermal_zone_device_register_with_trips() fails after registering
a thermal zone device, it needs to wait for the tz->removal completion
like thermal_zone_device_unregister(), in case user space has managed
to take a reference to the thermal zone device's kobject, in which case
thermal_release() may not be called by the error path itself and tz may
be freed prematurely.

Add the missing wait_for_completion() call to the thermal zone device
registration error path.

Fixes: 04e6ccfc93 ("thermal: core: Fix NULL pointer dereference in zone registration error path")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2849815.mvXUDI8C0e@rafael.j.wysocki
2026-04-02 11:37:35 +02:00
Rafael J. Wysocki
45b859b072 thermal: core: Address thermal zone removal races with resume
Since thermal_zone_pm_complete() and thermal_zone_device_resume()
re-initialize the poll_queue delayed work for the given thermal zone,
the cancel_delayed_work_sync() in thermal_zone_device_unregister()
may miss some already running work items and the thermal zone may
be freed prematurely [1].

There are two failing scenarios that both start with
running thermal_pm_notify_complete() right before invoking
thermal_zone_device_unregister() for one of the thermal zones.

In the first scenario, there is a work item already running for
the given thermal zone when thermal_pm_notify_complete() calls
thermal_zone_pm_complete() for that thermal zone and it continues to
run when thermal_zone_device_unregister() starts.  Since the poll_queue
delayed work has been re-initialized by thermal_pm_notify_complete(), the
running work item will be missed by the cancel_delayed_work_sync() in
thermal_zone_device_unregister() and if it continues to run past the
freeing of the thermal zone object, a use-after-free will occur.

In the second scenario, thermal_zone_device_resume() queued up by
thermal_pm_notify_complete() runs right after the thermal_zone_exit()
called by thermal_zone_device_unregister() has returned.  The poll_queue
delayed work is re-initialized by it before cancel_delayed_work_sync() is
called by thermal_zone_device_unregister(), so it may continue to run
after the freeing of the thermal zone object, which also leads to a
use-after-free.

Address the first failing scenario by ensuring that no thermal work
items will be running when thermal_pm_notify_complete() is called.
For this purpose, first move the cancel_delayed_work() call from
thermal_zone_pm_complete() to thermal_zone_pm_prepare() to prevent
new work from entering the workqueue going forward.  Next, switch
over to using a dedicated workqueue for thermal events and update
the code in thermal_pm_notify() to flush that workqueue after
thermal_pm_notify_prepare() has returned which will take care of
all leftover thermal work already on the workqueue (that leftover
work would do nothing useful anyway because all of the thermal zones
have been flagged as suspended).

The second failing scenario is addressed by adding a tz->state check
to thermal_zone_device_resume() to prevent it from re-initializing
the poll_queue delayed work if the thermal zone is going away.

Note that the above changes will also facilitate relocating the suspend
and resume of thermal zones closer to the suspend and resume of devices,
respectively.

Fixes: 5a5efdaffd ("thermal: core: Resume thermal zones asynchronously")
Reported-by: syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=3b3852c6031d0f30dfaf
Reported-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Closes: https://lore.kernel.org/linux-pm/20260324-thermal-core-uaf-init_delayed_work-v1-1-6611ae76a8a1@igalia.com/ [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Tested-by: Mauricio Faria de Oliveira <mfo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/6267615.lOV4Wx5bFT@rafael.j.wysocki
2026-03-31 13:54:03 +02:00
Thorsten Blum
5ddd020b8b thermal: core: Replace sprintf() in thermal_bind_cdev_to_trip()
Replace unbounded sprintf() with the safer snprintf(). While the
current code works correctly, snprintf() is safer and follows secure
coding best practices.

No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/20260223073245.321298-2-thorsten.blum@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-06 17:57:58 +01:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Thorsten Blum
1d97b8e3bf thermal: core: Use strnlen() in thermal_zone_device_register_with_trips()
Replace strlen() with the safer strnlen() and calculate the length of
the thermal zone name 'type' only once.

No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
[ rjw: Subject edits ]
Link: https://patch.msgid.link/20251216130943.40180-2-thorsten.blum@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-07 21:27:07 +01:00
Thorsten Blum
d113735421 thermal: core: Fix typo and indentation in comments
s/tmperature/temperature/ and adjust the indentation of the @ops
parameter description to improve readability.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251206174245.116391-2-thorsten.blum@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-12-15 12:47:39 +01:00
Linus Torvalds
d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Ahmad Fatoum
941a07cad2 thermal: core: allow user configuration of hardware protection action
In the general case, we don't know which of system shutdown or reboot is
the better action to take to protect hardware in an emergency situation. 
We thus allow the policy to come from the device-tree in the form of an
optional critical-action OF property, but so far there was no way for the
end user to configure this.

With recent addition of the hw_protection parameter, the user can now
choose a default action for the case, where the driver isn't fully sure
what's the better course of action.

Let's make use of this by passing HWPROT_ACT_DEFAULT in absence of the
critical-action OF property.

As HWPROT_ACT_DEFAULT is shutdown by default, this introduces no
functional change for users, unless they start using the new parameter.

Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-11-e1c09b090c0c@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 23:24:14 -07:00
Lucas De Marchi
ff63b62d5a thermal: core: Delay exposing sysfs interface
There's a race between initializing the governor and userspace accessing
the sysfs interface. From time to time the Intel graphics CI shows this
signature:

	<1>[] #PF: error_code(0x0000) - not-present page
	<6>[] PGD 0 P4D 0
	<4>[] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
	<4>[] CPU: 3 UID: 0 PID: 562 Comm: thermald Not tainted 6.14.0-rc4-CI_DRM_16208-g7e37396f86d8+ #1
	<4>[] Hardware name: Intel Corporation Twin Lake Client Platform/AlderLake-N LP5 RVP, BIOS TWLNFWI1.R00.5222.A01.2405290634 05/29/2024
	<4>[] RIP: 0010:policy_show+0x1a/0x40

thermald tries to read the policy file between the sysfs files being
created and the governor set by thermal_set_governor(), which causes the
NULL pointer dereference.

Similarly to the hwmon interface, delay exposing the sysfs files to when
the governor is already set.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13655
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20250307-thermal-sysfs-race-v1-1-8a3d4d4ac9c4@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12 21:24:33 +01:00
Rafael J. Wysocki
0ac66e512f thermal: core: Rename function argument related to trip crossing
Rename the 'crossed_up' function argument to 'upward', which is more
proper English and a better match for representing temperature change
direction, everywhere in the code.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2360961.ElGaqSPkdT@rjwysocki.net
[ rjw: Rebased ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-17 18:48:58 +01:00
Rafael J. Wysocki
43bac1026f thermal: core: Relocate thermal zone initialization routine
Move thermal_zone_device_init() along with thermal_zone_device_check()
closer to the callers of the former, where they fit better together.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1906685.CQOukoFCf9@rjwysocki.net
2024-10-24 17:15:07 +02:00
Rafael J. Wysocki
6d5537d40c thermal: core: Use trip lists for trip crossing detection
Modify the thermal core to use three lists of trip points:

 trips_high, containing trips with thresholds strictly above the current
 thermal zone temperature,

 trips_reached, containing trips with thresholds at or below the current
 zone temperature,

 trips_invalid, containing trips with temperature equal to
 THERMAL_ZONE_INVALID,

where the first two lists are always sorted by the current trip
threshold.

For each trip in trips_high, there is no mitigation under way and
the trip threshold is equal to its temperature.  In turn, for each
trip in trips_reached, there is mitigation under way and the trip
threshold is equal to its low temperature.  The trips in trips_invalid,
of course, need not be taken into consideration.

The idea is to make __thermal_zone_device_update() walk trips_high and
trips_reached instead of walking the entire table of trip points in a
thermal zone.  Usually, it will only need to walk a few entries in one
of the lists and check one entry in the other list, depending on the
direction of the zone temperature changes, because crossing many trips
by the zone temperature in one go between two consecutive temperature
checks should be unlikely (if it occurs often, the thermal zone
temperature should probably be checked more often either or there
are too many trips).

This also helps to eliminate one temporary trip list used for trip
crossing notification (only one temporary list is needed for this
purpose instead of two) and the remaining temporary list may be sorted
by the current trip threshold value, like the trips_reached list, so
the additional notify_temp field in struct thermal_trip_desc is not
necessary any more.

Moreover, since the trips_reached and trips_high lists are sorted,
the "low" and "high" values needed by thermal_zone_set_trips() can be
determined in a straightforward way by looking at one end of each list.

Of course, additional work is needed in some places in order to
maintain the ordering of the lists, but it is limited to situations
that should be rare, like updating a trip point temperature or
hysteresis, thermal zone initialization, or system resume.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2003443.usQuhbGJ8B@rjwysocki.net
[ rjw: Added a comment to thermal_zone_handle_trips() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24 17:15:07 +02:00
Rafael J. Wysocki
a44b5e39e4 thermal: core: Eliminate thermal_zone_trip_down()
Since thermal_zone_set_trip_temp() is now located in the same file
as thermal_trip_crossed(), it can invoke the latter directly without
using the thermal_zone_trip_down() wrapper that has no other users.

Update thermal_zone_set_trip_temp() accordingly and drop
thermal_zone_trip_down().

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1807510.VLH7GnMWUR@rjwysocki.net
2024-10-24 17:15:07 +02:00
Rafael J. Wysocki
e654a0c58d thermal: core: Relocate functions that update trip points
In preparation for subsequent changes, move two functions used
for updating trip points, thermal_zone_set_trip_temp() and
thermal_zone_set_trip_hyst(), to thermal_core.c.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/3248558.5fSG56mABF@rjwysocki.net
2024-10-24 17:15:07 +02:00
Rafael J. Wysocki
72fb849f77 thermal: core: Move some trip processing to thermal_trip_crossed()
Notice that some processing related to trip point crossing carried out
in handle_thermal_trip() and thermal_zone_set_trip_temp() may as well
be done in thermal_trip_crossed(), which allows code duplication to be
reduced, so change the code accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1982859.PYKUYFuaPT@rjwysocki.net
2024-10-24 17:15:07 +02:00
Rafael J. Wysocki
db0a46b600 thermal: core: Pass trip descriptor to thermal_trip_crossed()
In preparation for subsequent changes, modify thermal_trip_crossed()
to take a trip descriptor pointer instead of a pointer to struct
thermal_trip and propagate this change to thermal_zone_trip_down().

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/10547668.nUPlyArG6x@rjwysocki.net
2024-10-24 17:15:06 +02:00
Rafael J. Wysocki
e254ec292f thermal: core: Rearrange __thermal_zone_device_update()
In preparation for subsequent changes, move the invocations of
thermal_thresholds_handle() and thermal_zone_set_trips() in
__thermal_zone_device_update() after the processing of the
temporary trip lists.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/3323276.44csPzL39Z@rjwysocki.net
2024-10-24 17:15:06 +02:00
Rafael J. Wysocki
ca70d55ab0 thermal: core: Prepare for moving trips between sorted lists
Subsequently, trips will be moved between sorted lists in multiple
places, so replace add_trip_to_sorted_list() with an analogous
function, move_trip_to_sorted_list(), that will move a given trip
to a given sorted list.

To allow list_del() used in the new function to work, initialize the
list_node fields in trip descriptors where applicable so they are
always valid.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2960197.e9J7NaK4W3@rjwysocki.net
2024-10-24 17:15:06 +02:00
Rafael J. Wysocki
bd32eacd95 thermal: core: Rename trip list node in struct thermal_trip_desc
Since the list node field in struct thermal_trip_desc is going to be
used for purposes other than trip crossing notification, rename it
to list_node.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2201558.irdbgypaU6@rjwysocki.net
2024-10-24 17:15:06 +02:00
Rafael J. Wysocki
c12629f832 thermal: core: Build sorted lists instead of sorting them later
Since it is not expected that multiple trip points will be crossed
in one go very often (if this happens, there are too many trip points
in the given thermal zone or they are checked too rarely), quite likely
it is more efficient to build a sorted list of crossed trip points than
to put them on an unsorted list and sort it later.

Moreover, trip points are often sorted in ascending temperature order
during thermal zone registration, so building a sorted list out of
them is quite straightforward and relatively inexpensive.

Accordingly, make handle_thermal_trip() maintain list ordering when
adding trip points to the lists and get rid of separate list sorting
in __thermal_zone_device_update().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/4930656.GXAFRqVoOG@rjwysocki.net
2024-10-24 17:15:06 +02:00
Rafael J. Wysocki
dfa245f512 thermal: core: Manage thermal_governor_lock using a mutex guard
Switch over the thermal core to using a mutex guard for
thermal_governor_lock management.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3679429.R56niFO833@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-24 14:48:42 +02:00
Rafael J. Wysocki
af73d53e97 thermal: core: Separate thermal zone governor initialization
In preparation for a subsequent change that will switch over the thermal
core to using a mutex guard for managing thermal_governor_lock, move
the code running in thermal_zone_device_register_with_trips() under that
lock into a separate function called thermal_zone_init_governor().

While at it, drop a useless comment.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4408795.ejJDZkT8p0@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-24 14:48:42 +02:00
Rafael J. Wysocki
a5a98a786e thermal: core: Add and use cooling device guard
Add and use a special guard for cooling devices.

This allows quite a few error code paths to be simplified among
other things and brings in code size reduction for a good measure.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5837621.DvuYhMxLoT@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-24 14:48:23 +02:00
Rafael J. Wysocki
c690dce5dc thermal: core: Introduce thermal_instance_delete()
It is not necessary to walk the thermal_instances list in a trip
descriptor under a cooling device lock, so acquire that lock only
for deleting the given thermal instance from the list of thermal
instances in the given cdev.

Moreover, in analogy with the previous change that introduced
thermal_instance_add(), put the code deleting the given thermal
instance from the lists it is on into a separate new function
called thermal_instance_delete().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3275745.5fSG56mABF@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:57:04 +02:00
Rafael J. Wysocki
6d153f52cc thermal: core: Introduce thermal_instance_add()
To reduce the number of redundant result checks in
thermal_bind_cdev_to_trip() and make the code in it easier to
follow, move some of it to a new function called thermal_instance_add()
and make thermal_bind_cdev_to_trip() invoke that function.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3618899.iIbC2pHGDl@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:57 +02:00
Rafael J. Wysocki
33eab804d6 thermal: core: Call thermal_governor_update_tz() outside of cdev lock
Holding a cooling device lock under thermal_governor_update_tz() is not
necessary and it may cause lockdep to complain if any governor's
.update_tz() callback attempts to lock a cdev.

For this reason, move the thermal_governor_update_tz() calls in
thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() from
under the cdev lock.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7749552.EvYhyI6sBW@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:49 +02:00
Rafael J. Wysocki
d1c8aa2a5c thermal: core: Manage thermal_list_lock using a mutex guard
Switch over the thermal core to using a mutex guard for
thermal_list_lock management.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2010397.PYKUYFuaPT@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:40 +02:00
Rafael J. Wysocki
6f60ae7221 thermal: core: Separate code running under thermal_list_lock
To prepare for a subsequent change that will switch over the thermal
core to using a mutex guard for thermal_list_lock management, move the
code running under thermal_list_lock during the initialization and
unregistration of cooling devices into separate functions.

While at it, drop some comments that do not add value.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/10572828.nUPlyArG6x@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:33 +02:00
Rafael J. Wysocki
57f076664c thermal: core: Add and use a reverse thermal zone guard
Add a guard for unlocking a locked thermal zone temporarily and use it
in thermal_zone_pm_prepare().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3344086.aeNJFYEL58@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:27 +02:00
Rafael J. Wysocki
cba00d16a2 thermal: core: Add and use thermal zone guard
Add and use a guard for thermal zone locking.

This allows quite a few error code paths to be simplified among
other things and brings in a noticeable code size reduction for
a good measure.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1930069.tdWV9SEqCh@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-23 11:56:19 +02:00
Rafael J. Wysocki
17f76be51c thermal: core: Pass trip descriptors to trip bind/unbind functions
The code is somewhat cleaner if struct thermal_trip_desc pointers are
passed to thermal_bind_cdev_to_trip(), thermal_unbind_cdev_from_trip(),
and print_bind_err_msg() instead of struct thermal_trip pointers, so
modify it accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2246211.NgBsaNRSFp@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:08:11 +02:00
Rafael J. Wysocki
0dc23567c2 thermal: core: Move lists of thermal instances to trip descriptors
In almost all places where a thermal zone's list of thermal instances
is walked, there is a check to match a specific trip point and it is
walked in vain whenever there are no cooling devices associated with
the given trip.

To address this, store the lists of thermal instances in trip point
descriptors instead of storing them in thermal zones and adjust all
code using those lists accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5522726.Sb9uPGUboI@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:08:03 +02:00
Rafael J. Wysocki
ee879a5ea3 thermal: core: Drop need_update field from struct thermal_zone_device
After previous changes, the need_update field in struct thermal_zone_device
is only set and never read, so drop it.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2495061.jE0xQCEvom@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:07:54 +02:00
Rafael J. Wysocki
c4cd42ebd3 thermal: core: Update thermal zones after cooling device binding
If a new cooling device is registered and it is bound to at least one
trip point in a given thermal zone, that thermal zone needs to be
updated via __thermal_zone_device_update().

Instead of doing this with the help of the need_update atomic field in
struct thermal_zone_device, which is not particularly straightforward,
make __thermal_zone_cdev_bind() return a bool value indicating whether
or not the given thermal zone needs to be updated because a new cooling
device has been bound to it and update thermal_zone_cdev_bind() to
call __thermal_zone_device_update() when this value is "true".

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2226302.Icojqenx9y@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:07:47 +02:00
Rafael J. Wysocki
fa4f9c9679 thermal: core: Consolidate thermal zone locking in the exit path
In analogy with a previous change in the thermal zone initialization
path, to avoid acquiring the thermal zone lock and releasing it multiple
times back and forth unnecessarily, move all of the code running under
thermal_list_lock in thermal_zone_device_unregister() into a new
function called thermal_zone_exit() and make the latter acquire the
thermal zone lock only once and release it along with thermal_list_lock.

For this purpose, provide an "unlocked" variant of
thermal_zone_cdev_unbind() to be called by thermal_zone_exit() under the
thermal zone lock.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1963152.taCxCBeP46@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:07:40 +02:00
Rafael J. Wysocki
1dae3e70b4 thermal: core: Mark thermal zones as exiting before unregistration
In analogy with a previous change in the thermal zone registration code
path, to ensure that __thermal_zone_device_update() will return early
for thermal zones that are going away, introduce a thermal zone state
flag representing the "exit" state and set it while deleting the thermal
zone from thermal_tz_list.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4394176.ejJDZkT8p0@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:07:33 +02:00
Rafael J. Wysocki
d07700b474 thermal: core: Consolidate thermal zone locking during initialization
The part of thermal zone initialization carried out under
thermal_list_lock acquires the thermal zone lock and releases it
multiple times back and forth which is not really necessary.

Instead of doing this, make it acquire the thermal zone lock once after
acquiring thermal_list_lock and release it along with that lock.

For this purpose, move all of the code in question to
thermal_zone_init_complete() introduced previously and provide an
"unlocked" variant of thermal_zone_cdev_bind() to be invoked from
there.

Also notice that a thermal zone does not need to be added to
thermal_tz_list under its own lock, so make the new code acquire
the thermal zone lock after adding it to the list.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1920382.CQOukoFCf9@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Rebase on top of recent thermal core changes ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-22 12:07:15 +02:00
Rafael J. Wysocki
cdf771ab47 thermal: core: Fix race between zone registration and system suspend
If the registration of a thermal zone takes place at the time when
system suspend is started, thermal_pm_notify() can run before the new
thermal zone is added to thermal_tz_list and its "suspended" flag will
not be set.  Consequently, if __thermal_zone_device_update() is called
for that thermal zone, it will not return early as expected which may
cause some destructive interference with the system suspend or resume
flow to occur.

To avoid that, make thermal_zone_init_complete() introduced previously
set the "suspended" flag for new thermal zones if it runs during system
suspend or resume.

Fixes: 4e814173a8 ("thermal: core: Fix thermal zone suspend-resume synchronization")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8490245.NyiUUSuA9g@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:07:08 +02:00
Rafael J. Wysocki
7837fa8115 thermal: core: Mark thermal zones as initializing to start with
After thermal_zone_device_register_with_trips() has called
device_register() and it has registered the new thermal zone device
with the driver core, user space may access its sysfs attributes and,
among other things, it may enable the thermal zone before it is ready.

To address this, introduce a new thermal zone state flag for
initialization and set it before calling device_register() in
thermal_zone_device_register_with_trips().  This causes
__thermal_zone_device_update() to return early until the new flag
is cleared.

To clear it when the thermal zone is ready, introduce a new
function called thermal_zone_init_complete() that will also invoke
__thermal_zone_device_update() after clearing that flag (both under the
thernal zone lock) and make thermal_zone_device_register_with_trips()
call the new function instead of checking need_update and calling
thermal_zone_device_update() when it is set.

After this change, if user space enables the thermal zone prematurely,
__thermal_zone_device_update() will return early for it until
thermal_zone_init_complete() is called.  In turn, if the thermal zone
is not enabled by user space before thermal_zone_init_complete() is
called, the __thermal_zone_device_update() call in it will return early
because the thermal zone has not been enabled yet, but that function
will be invoked again by thermal_zone_device_set_mode() when the thermal
zone is enabled and it will not return early this time.

The checking of need_update is not necessary any more because the
__thermal_zone_device_update() calls potentially triggered by cooling
device binding take place before calling thermal_zone_init_complete(),
so they all will return early, which means that
thermal_zone_init_complete() must call __thermal_zone_device_update()
in case the thermal zone is enabled prematurely by user space.

Fixes: 203d3d4aa4 ("the generic thermal sysfs driver")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/9360231.CDJkKcVGEf@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:06:58 +02:00
Rafael J. Wysocki
26c9ab8090 thermal: core: Represent suspend-related thermal zone flags as bits
Instead of using two separate fields in struct thermal_zone_device for
representing flags related to thermal zone suspend, represent them
explicitly as bits in one u8 "state" field.

Subsequently, that field will be used for addressing race conditions
related to thermal zone initialization and exit.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7733910.EvYhyI6sBW@rjwysocki.net
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-22 12:06:52 +02:00