From 4fb352df14de4b5277f38a9874f7c19cf641ae4d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 5 Dec 2025 16:24:05 +0100 Subject: [PATCH 01/65] PM: sleep: Do not flag runtime PM workqueue as freezable Till now, the runtime PM workqueue has been flagged as freezable, so it does not process work items during system-wide PM transitions like system suspend and resume. The original reason to do that was to reduce the likelihood of runtime PM getting in the way of system-wide PM processing, but now it is mostly an optimization because (1) runtime suspend of devices is prevented by bumping up their runtime PM usage counters in device_prepare() and (2) device drivers are expected to disable runtime PM for the devices handled by them before they embark on system-wide PM activities that may change the state of the hardware or otherwise interfere with runtime PM. However, it prevents asynchronous runtime resume of devices from working during system-wide PM transitions, which is confusing because synchronous runtime resume is not prevented at the same time, and it also sometimes turns out to be problematic. For example, it has been reported that blk_queue_enter() may deadlock during a system suspend transition because of the pm_request_resume() usage in it [1]. It may also deadlock during a system resume transition in a similar way. That happens because the asynchronous runtime resume of the given device is not processed due to the freezing of the runtime PM workqueue. While it may be better to address this particular issue in the block layer, the very presence of it means that similar problems may be expected to occur elsewhere. For this reason, remove the WQ_FREEZABLE flag from the runtime PM workqueue and make device_suspend_late() use the generic variant of pm_runtime_disable() that will carry out runtime PM of the device synchronously if there is pending resume work for it. 
Also update the comment before the pm_runtime_disable() call in device_suspend_late(), to document the fact that the runtime PM should not be expected to work for the device until the end of device_resume_early(), and update the related documentation. This change may, even though it is not expected to, uncover some latent issues related to queuing up asynchronous runtime resume work items during system suspend or hibernation. However, they should be limited to the interference between runtime resume and system-wide PM callbacks in the cases when device drivers start to handle system-wide PM before disabling runtime PM as described above. Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/ Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson Link: https://patch.msgid.link/12794222.O9o76ZdvQC@rafael.j.wysocki --- Documentation/power/runtime_pm.rst | 7 +++---- drivers/base/power/main.c | 7 ++++--- kernel/power/main.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 455b9d135d85..a53ab09c37d5 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -712,10 +712,9 @@ out the following operations: * During system suspend pm_runtime_get_noresume() is called for every device right before executing the subsystem-level .prepare() callback for it and pm_runtime_barrier() is called for every device right before executing the - subsystem-level .suspend() callback for it. In addition to that the PM core - calls __pm_runtime_disable() with 'false' as the second argument for every - device right before executing the subsystem-level .suspend_late() callback - for it. + subsystem-level .suspend() callback for it. In addition to that, the PM + core disables runtime PM for every device right before executing the + subsystem-level .suspend_late() callback for it. 
* During system resume pm_runtime_enable() and pm_runtime_put() are called for every device right after executing the subsystem-level .resume_early() diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 97a8b4fcf471..189de5250f25 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1647,10 +1647,11 @@ static void device_suspend_late(struct device *dev, pm_message_t state, bool asy goto Complete; /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. + * After this point, any runtime PM operations targeting the device + * will fail until the corresponding pm_runtime_enable() call in + * device_resume_early(). */ - __pm_runtime_disable(dev, false); + pm_runtime_disable(dev); if (dev->power.syscore) goto Skip; diff --git a/kernel/power/main.c b/kernel/power/main.c index 03b2c5495c77..5f8c9e12eaec 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq); static int __init pm_start_workqueues(void) { - pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0); + pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0); if (!pm_wq) return -ENOMEM; From 6b401a5b2d2acf56ec902f96f6381982457ab339 Mon Sep 17 00:00:00 2001 From: Kaushlendra Kumar Date: Tue, 2 Dec 2025 10:10:12 +0530 Subject: [PATCH 02/65] cpupower: idle_monitor: fix incorrect value logged after stop MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cpuidle sysfs monitor printed the previous sample’s counter value in cpuidle_stop() instead of the freshly read one. The dprint line used previous_count[cpu][state] while current_count[cpu][state] had just been populated. This caused misleading debug output. Switch the logging to current_count so the post-interval snapshot matches the displayed value. 
Link: https://lore.kernel.org/r/20251202044012.3844790-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c b/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c index 8b42c2f0a5b0..4225eff9833d 100644 --- a/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c +++ b/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c @@ -70,7 +70,7 @@ static int cpuidle_stop(void) current_count[cpu][state] = cpuidle_state_time(cpu, state); dprint("CPU %d - State: %d - Val: %llu\n", - cpu, state, previous_count[cpu][state]); + cpu, state, current_count[cpu][state]); } } return 0; From 24858a84163c8d04827166b3bcaed80612bb62fc Mon Sep 17 00:00:00 2001 From: Kaushlendra Kumar Date: Wed, 26 Nov 2025 14:46:13 +0530 Subject: [PATCH 03/65] tools/cpupower: Fix inverted APERF capability check The capability check was inverted, causing the function to return error when APERF support is available and proceed when it is not. Negate the condition to return error only when APERF capability is absent. 
Link: https://lore.kernel.org/r/20251126091613.567480-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/cpufreq-info.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c index 7d3732f5f2f6..5fe01e516817 100644 --- a/tools/power/cpupower/utils/cpufreq-info.c +++ b/tools/power/cpupower/utils/cpufreq-info.c @@ -270,7 +270,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human) { unsigned long freq; - if (cpupower_cpu_info.caps & CPUPOWER_CAP_APERF) + if (!(cpupower_cpu_info.caps & CPUPOWER_CAP_APERF)) return -EINVAL; freq = cpufreq_get_freq_hardware(cpu); From 1b9aaf36b7b40235e5a529c15848c3d866362207 Mon Sep 17 00:00:00 2001 From: Kaushlendra Kumar Date: Thu, 27 Nov 2025 10:15:36 +0530 Subject: [PATCH 04/65] tools/cpupower: Use strcspn() to strip trailing newline Replace manual newline removal with strcspn() which is safer and cleaner. This avoids potential out-of-bounds access on empty strings and handles the case where no newline exists. 
Link: https://lore.kernel.org/r/20251127044536.715722-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar Signed-off-by: Shuah Khan --- tools/power/cpupower/lib/cpuidle.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/power/cpupower/lib/cpuidle.c b/tools/power/cpupower/lib/cpuidle.c index f2c1139adf71..6a881d93d2e9 100644 --- a/tools/power/cpupower/lib/cpuidle.c +++ b/tools/power/cpupower/lib/cpuidle.c @@ -193,8 +193,7 @@ static char *cpuidle_state_get_one_string(unsigned int cpu, if (result == NULL) return NULL; - if (result[strlen(result) - 1] == '\n') - result[strlen(result) - 1] = '\0'; + result[strcspn(result, "\n")] = '\0'; return result; } @@ -366,8 +365,7 @@ static char *sysfs_cpuidle_get_one_string(enum cpuidle_string which) if (result == NULL) return NULL; - if (result[strlen(result) - 1] == '\n') - result[strlen(result) - 1] = '\0'; + result[strcspn(result, "\n")] = '\0'; return result; } From f9bd3762cf1bd0c2465f2e6121b340883471d1bf Mon Sep 17 00:00:00 2001 From: Kaushlendra Kumar Date: Mon, 1 Dec 2025 17:47:45 +0530 Subject: [PATCH 05/65] tools/power cpupower: Reset errno before strtoull() cpuidle_state_get_one_value() never cleared errno before calling strtoull(), so a prior ERANGE caused every cpuidle counter read to return zero. Reset errno to 0 before the conversion so each sysfs read is evaluated independently. 
Link: https://lore.kernel.org/r/20251201121745.3776703-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar Signed-off-by: Shuah Khan --- tools/power/cpupower/lib/cpuidle.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/power/cpupower/lib/cpuidle.c b/tools/power/cpupower/lib/cpuidle.c index 6a881d93d2e9..2fcb343d8e75 100644 --- a/tools/power/cpupower/lib/cpuidle.c +++ b/tools/power/cpupower/lib/cpuidle.c @@ -150,6 +150,7 @@ unsigned long long cpuidle_state_get_one_value(unsigned int cpu, if (len == 0) return 0; + errno = 0; value = strtoull(linebuf, &endp, 0); if (endp == linebuf || errno == ERANGE) From ff72619e11348ab189e232c59515dd5c33780d7c Mon Sep 17 00:00:00 2001 From: Kaushlendra Kumar Date: Tue, 2 Dec 2025 12:24:03 +0530 Subject: [PATCH 06/65] tools/power cpupower: Show C0 in idle-info dump `cpupower idle-info -o` skipped C0 because the loop began at 1: before: states: C1 ... latency[002] residency[00002] C2 ... latency[010] residency[00020] C3 ... latency[133] residency[00600] after: states: C0 ... latency[000] residency[00000] C1 ... latency[002] residency[00002] C2 ... latency[010] residency[00020] C3 ... latency[133] residency[00600] Start iterating at index 0 so the idle report mirrors sysfs and includes C0 stats. 
Link: https://lore.kernel.org/r/20251202065403.1492807-1-kaushlendra.kumar@intel.com Signed-off-by: Kaushlendra Kumar Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/cpuidle-info.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/cpupower/utils/cpuidle-info.c b/tools/power/cpupower/utils/cpuidle-info.c index e0d17f0de3fe..81b4763a97d6 100644 --- a/tools/power/cpupower/utils/cpuidle-info.c +++ b/tools/power/cpupower/utils/cpuidle-info.c @@ -111,7 +111,7 @@ static void proc_cpuidle_cpu_output(unsigned int cpu) printf(_("max_cstate: C%u\n"), cstates-1); printf(_("maximum allowed latency: %lu usec\n"), max_allowed_cstate); printf(_("states:\t\n")); - for (cstate = 1; cstate < cstates; cstate++) { + for (cstate = 0; cstate < cstates; cstate++) { printf(_(" C%d: " "type[C%d] "), cstate, cstate); printf(_("promotion[--] demotion[--] ")); From 77cf053b041fe13d1fdd2e572e16ee7776ff687d Mon Sep 17 00:00:00 2001 From: Lifeng Zheng Date: Tue, 2 Dec 2025 15:27:26 +0800 Subject: [PATCH 07/65] cpufreq: Return -EOPNOTSUPP if no policy supports boost In cpufreq_boost_trigger_state(), if none of the policies support boost, policy_set_boost() will not be called and this function will return 0. But it is better to return an error to indicate that the platform doesn't support boost. Signed-off-by: Lifeng Zheng Acked-by: Viresh Kumar Reviewed-by: Jie Zhan [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20251202072727.1368285-2-zhenglifeng1@huawei.com Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/cpufreq.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4472bb1ec83c..8de9c94c097f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2803,7 +2803,7 @@ static int cpufreq_boost_trigger_state(int state) { struct cpufreq_policy *policy; unsigned long flags; - int ret = 0; + int ret = -EOPNOTSUPP; /* * Don't compare 'cpufreq_driver->boost_enabled' with 'state' here to @@ -2823,6 +2823,10 @@ static int cpufreq_boost_trigger_state(int state) if (ret) goto err_reset_state; } + + if (ret) + goto err_reset_state; + cpus_read_unlock(); return 0; From 78d83b293891c597cef773eb17d9cc02b386f21a Mon Sep 17 00:00:00 2001 From: Lifeng Zheng Date: Tue, 2 Dec 2025 15:27:27 +0800 Subject: [PATCH 08/65] cpufreq: cpufreq_boost_trigger_state() optimization Optimize the error handling code in cpufreq_boost_trigger_state(). Signed-off-by: Lifeng Zheng Acked-by: Viresh Kumar Reviewed-by: Jie Zhan [ rjw: Changelog edit ] Link: https://patch.msgid.link/20251202072727.1368285-3-zhenglifeng1@huawei.com Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/cpufreq.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 8de9c94c097f..50dde2980f1b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2820,19 +2820,14 @@ static int cpufreq_boost_trigger_state(int state) continue; ret = policy_set_boost(policy, state); - if (ret) - goto err_reset_state; + if (unlikely(ret)) + break; } - if (ret) - goto err_reset_state; - cpus_read_unlock(); - return 0; - -err_reset_state: - cpus_read_unlock(); + if (likely(!ret)) + return 0; write_lock_irqsave(&cpufreq_driver_lock, flags); cpufreq_driver->boost_enabled = !state; From 549a1be5cebb7079789e5821d8ad53140e181367 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Fri, 2 Jan 2026 13:49:14 +0100 Subject: [PATCH 09/65] OPP: of: Simplify with scoped for each OF child loop Use scoped for-each loop when iterating over device nodes to make code a bit simpler. Signed-off-by: Krzysztof Kozlowski Signed-off-by: Viresh Kumar --- drivers/opp/of.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 1e0d0adb18e1..a268c2b250c0 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -956,7 +956,6 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table, /* Initializes OPP tables based on new bindings */ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table) { - struct device_node *np; int ret, count = 0; struct dev_pm_opp *opp; @@ -971,13 +970,12 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table) } /* We have opp-table node now, iterate over it and add OPPs */ - for_each_available_child_of_node(opp_table->np, np) { + for_each_available_child_of_node_scoped(opp_table->np, np) { opp = _opp_add_static_v2(opp_table, dev, np); if (IS_ERR(opp)) { ret = PTR_ERR(opp); dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, ret); 
- of_node_put(np); goto remove_static_opp; } else if (opp) { count++; From 25ff69011ddf9ec73114382dc90040a4cad490b0 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Mon, 15 Dec 2025 13:12:29 +0200 Subject: [PATCH 10/65] intel_idle: Remove unused driver version constant The INTEL_IDLE_VERSION constant has not been updated since 2020 and serves no useful purpose. The driver version is implicitly defined by the kernel version, making this constant redundant. Remove the constant to eliminate potential confusion about version tracking. Signed-off-by: Artem Bityutskiy Reviewed-by: Andy Shevchenko Link: https://patch.msgid.link/20251215111229.132705-1-dedekind1@gmail.com Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 9ba83954c255..aa44b3c2cb2c 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -63,8 +63,6 @@ #include #include -#define INTEL_IDLE_VERSION "0.5.1" - static struct cpuidle_driver intel_idle_driver = { .name = "intel_idle", .owner = THIS_MODULE, @@ -2478,9 +2476,6 @@ static int __init intel_idle_init(void) return -ENODEV; } - pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n", - boot_cpu_data.x86_model); - intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device); if (!intel_idle_cpuidle_devices) return -ENOMEM; From a36dc37b56722bc114d5dd5657b884334031eb49 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Mon, 15 Dec 2025 13:13:00 +0200 Subject: [PATCH 11/65] intel_idle: Remove the 'preferred_cstates' parameter Remove the 'preferred_cstates' module parameter as it is not really useful. The parameter currently only affects Alder Lake, where it controls C1/C1E preference, with C1E being the default. The parameter does not support any other platform. For example, Meteor Lake has a similar C1/C1E limitation, but the parameter does not support Meteor Lake. 
This indicates that the parameter is not very useful. Generally, independent C1 and C1E are important for server platforms where low latency is key. However, they are not as important for client platforms, like Alder Lake, where C1E providing better energy savings is generally preferred. The parameter was originally introduced for Sapphire Rapids Xeon: da0e58c038e6 intel_idle: add 'preferred_cstates' module argument Later it was added to Alder Lake: d1cf8bbfed1ed ("intel_idle: Add AlderLake support") But it was removed from Sapphire Rapids when firmware fixed the C1/C1E limitation: 1548fac47a114 ("intel_idle: make SPR C1 and C1E be independent") So Alder Lake is the only platform left where this parameter has any effect. Remove this parameter to simplify the driver and reduce maintenance burden. Signed-off-by: Artem Bityutskiy Link: https://patch.msgid.link/20251215111300.132803-1-dedekind1@gmail.com Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 36 ------------------------------------ 1 file changed, 36 deletions(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index aa44b3c2cb2c..2d67a091ed3f 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -70,7 +70,6 @@ static struct cpuidle_driver intel_idle_driver = { /* intel_idle.max_cstate=0 disables driver */ static int max_cstate = CPUIDLE_STATE_MAX - 1; static unsigned int disabled_states_mask __read_mostly; -static unsigned int preferred_states_mask __read_mostly; static bool force_irq_on __read_mostly; static bool ibrs_off __read_mostly; @@ -2049,25 +2048,6 @@ static void __init skx_idle_state_table_update(void) } } -/** - * adl_idle_state_table_update - Adjust AlderLake idle states table. - */ -static void __init adl_idle_state_table_update(void) -{ - /* Check if user prefers C1 over C1E. 
*/ - if (preferred_states_mask & BIT(1) && !(preferred_states_mask & BIT(2))) { - cpuidle_state_table[0].flags &= ~CPUIDLE_FLAG_UNUSABLE; - cpuidle_state_table[1].flags |= CPUIDLE_FLAG_UNUSABLE; - - /* Disable C1E by clearing the "C1E promotion" bit. */ - c1e_promotion = C1E_PROMOTION_DISABLE; - return; - } - - /* Make sure C1E is enabled by default */ - c1e_promotion = C1E_PROMOTION_ENABLE; -} - /** * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table. */ @@ -2174,11 +2154,6 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv) case INTEL_EMERALDRAPIDS_X: spr_idle_state_table_update(); break; - case INTEL_ALDERLAKE: - case INTEL_ALDERLAKE_L: - case INTEL_ATOM_GRACEMONT: - adl_idle_state_table_update(); - break; case INTEL_ATOM_SILVERMONT: case INTEL_ATOM_AIRMONT: byt_cht_auto_demotion_disable(); @@ -2532,17 +2507,6 @@ module_param(max_cstate, int, 0444); */ module_param_named(states_off, disabled_states_mask, uint, 0444); MODULE_PARM_DESC(states_off, "Mask of disabled idle states"); -/* - * Some platforms come with mutually exclusive C-states, so that if one is - * enabled, the other C-states must not be used. Example: C1 and C1E on - * Sapphire Rapids platform. This parameter allows for selecting the - * preferred C-states among the groups of mutually exclusive C-states - the - * selected C-states will be registered, the other C-states from the mutually - * exclusive group won't be registered. If the platform has no mutually - * exclusive C-states, this parameter has no effect. - */ -module_param_named(preferred_cstates, preferred_states_mask, uint, 0444); -MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states"); /* * Debugging option that forces the driver to enter all C-states with * interrupts enabled. 
Does not apply to C-states with From ff24f314447a25164bac85cb310c382e289afdbe Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 16 Dec 2025 10:04:00 +0200 Subject: [PATCH 12/65] intel_idle: Initialize sysfs after cpuidle driver initialization Reorder initialization calls to initialize the internal driver data before sysfs: Was: intel_idle_sysfs_init(); intel_idle_cpuidle_driver_init(); Now: intel_idle_cpuidle_driver_init(); intel_idle_sysfs_init(); Follow the general principle that drivers should initialize internal state before registering external interfaces like sysfs, avoiding potential usage before full initialization. Signed-off-by: Artem Bityutskiy Link: https://patch.msgid.link/20251216080402.156988-2-dedekind1@gmail.com Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 2d67a091ed3f..f64463e00df7 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -2455,12 +2455,12 @@ static int __init intel_idle_init(void) if (!intel_idle_cpuidle_devices) return -ENOMEM; + intel_idle_cpuidle_driver_init(&intel_idle_driver); + retval = intel_idle_sysfs_init(); if (retval) pr_warn("failed to initialized sysfs"); - intel_idle_cpuidle_driver_init(&intel_idle_driver); - retval = cpuidle_register_driver(&intel_idle_driver); if (retval) { struct cpuidle_driver *drv = cpuidle_get_driver(); From 111f77a233484cf39a6317f4d0306387e9ffda7b Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 16 Dec 2025 10:04:01 +0200 Subject: [PATCH 13/65] intel_idle: Add cmdline option to adjust C-states table Add a new module parameter that allows adjusting the C-states table used by the driver. Currently, the C-states table is hardcoded in the driver based on the CPU model. The goal is to have good enough defaults for most users. 
However, C-state characteristics, such as exit latency and residency, can vary between different variants of the same CPU model and BIOS settings. Moreover, different platform usage models and user preferences may benefit from different C-state target_residency values. Provide a way for users to adjust the C-states table via a module parameter "table". The general format is: "state1:latency1:target_residency1,state2:latency2:target_residency2,..." In other words, represent each C-state by its name, exit latency (in microseconds), and target residency (in microseconds), separated by colons. Separate multiple C-states by commas. For example, suppose a CPU has 3 C-states with the following characteristics: C1: exit_latency=1, target_residency=2 C1E: exit_latency=10, target_residency=10 C6: exit_latency=100, target_residency=500 Users can specify a custom C-states table as follows: 1. intel_idle.table="C1:2:2,C1E:5:20,C6:150:600" Result: C1: exit_latency=2, target_residency=2 C1E: exit_latency=5, target_residency=20 C6: exit_latency=150, target_residency=600 2. intel_idle.table="C6::400" Result: C1: exit_latency=1, target_residency=2 (unchanged) C1E: exit_latency=10, target_residency=10 (unchanged) C6: exit_latency=100, target_residency=400 (only target_residency changed) Signed-off-by: Artem Bityutskiy Link: https://patch.msgid.link/20251216080402.156988-3-dedekind1@gmail.com Signed-off-by: Rafael J. 
Wysocki --- drivers/idle/intel_idle.c | 169 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 169 insertions(+) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index f64463e00df7..ab6b86ff9905 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -73,6 +73,10 @@ static unsigned int disabled_states_mask __read_mostly; static bool force_irq_on __read_mostly; static bool ibrs_off __read_mostly; +/* The maximum allowed length for the 'table' module parameter */ +#define MAX_CMDLINE_TABLE_LEN 256 +static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __read_mostly; + static struct cpuidle_device __percpu *intel_idle_cpuidle_devices; static unsigned long auto_demotion_disable_flags; @@ -104,6 +108,9 @@ static struct device *sysfs_root __initdata; static const struct idle_cpu *icpu __initdata; static struct cpuidle_state *cpuidle_state_table __initdata; +/* C-states data from the 'intel_idle.table' cmdline parameter */ +static struct cpuidle_state cmdline_states[CPUIDLE_STATE_MAX] __initdata; + static unsigned int mwait_substates __initdata; /* @@ -2393,6 +2400,149 @@ static void __init intel_idle_sysfs_uninit(void) put_device(sysfs_root); } + /** + * get_cmdline_field - Get the current field from a cmdline string. + * @args: The cmdline string to get the current field from. + * @field: Pointer to the current field upon return. + * @sep: The fields separator character. + * + * Examples: + * Input: args="C1:1:1,C1E:2:10", sep=':' + * Output: field="C1", return "1:1,C1E:2:10" + * Input: args="C1:1:1,C1E:2:10", sep=',' + * Output: field="C1:1:1", return "C1E:2:10" + * Input: args="::", sep=':' + * Output: field="", return ":" + * + * Return: The continuation of the cmdline string after the field or NULL. 
+ */ +static char *get_cmdline_field(char *args, char **field, char sep) +{ + unsigned int i; + + for (i = 0; args[i] && !isspace(args[i]); i++) { + if (args[i] == sep) + break; + } + + *field = args; + + if (args[i] != sep) + return NULL; + + args[i] = '\0'; + return args + i + 1; +} + +/** + * cmdline_table_adjust - Adjust the C-states table with data from cmdline. + * @drv: cpuidle driver (assumed to point to intel_idle_driver). + * + * Adjust the C-states table with data from the 'intel_idle.table' module + * parameter (if specified). + */ +static void __init cmdline_table_adjust(struct cpuidle_driver *drv) +{ + char *args = cmdline_table_str; + struct cpuidle_state *state; + int i; + + if (args[0] == '\0') + /* The 'intel_idle.table' module parameter was not specified */ + return; + + /* Create a copy of the C-states table */ + for (i = 0; i < drv->state_count; i++) + cmdline_states[i] = drv->states[i]; + + /* + * Adjust the C-states table copy with data from the 'intel_idle.table' + * module parameter. + */ + while (args) { + char *fields, *name, *val; + + /* + * Get the next C-state definition, which is expected to be + * '::'. Treat "empty" + * fields as unchanged. For example, + * '::' leaves the latency unchanged. 
+ */ + args = get_cmdline_field(args, &fields, ','); + + /* name */ + fields = get_cmdline_field(fields, &name, ':'); + if (!fields) + goto error; + + if (!strcmp(name, "POLL")) { + pr_err("Cannot adjust POLL\n"); + continue; + } + + /* Find the C-state by its name */ + state = NULL; + for (i = 0; i < drv->state_count; i++) { + if (!strcmp(name, drv->states[i].name)) { + state = &cmdline_states[i]; + break; + } + } + + if (!state) { + pr_err("C-state '%s' was not found\n", name); + continue; + } + + /* Latency */ + fields = get_cmdline_field(fields, &val, ':'); + if (!fields) + goto error; + + if (*val) { + if (kstrtouint(val, 0, &state->exit_latency)) + goto error; + } + + /* Target residency */ + fields = get_cmdline_field(fields, &val, ':'); + + if (*val) { + if (kstrtouint(val, 0, &state->target_residency)) + goto error; + } + + /* + * Allow for 3 more fields, but ignore them. Helps to make + * possible future extensions of the cmdline format backward + * compatible. + */ + for (i = 0; fields && i < 3; i++) { + fields = get_cmdline_field(fields, &val, ':'); + if (!fields) + break; + } + + if (fields) { + pr_err("Too many fields for C-state '%s'\n", state->name); + goto error; + } + + pr_info("C-state from cmdline: name=%s, latency=%u, residency=%u\n", + state->name, state->exit_latency, state->target_residency); + } + + /* Copy the adjusted C-states table back */ + for (i = 1; i < drv->state_count; i++) + drv->states[i] = cmdline_states[i]; + + pr_info("Adjusted C-states with data from 'intel_idle.table'\n"); + return; + +error: + pr_info("Failed to adjust C-states with data from 'intel_idle.table'\n"); +} + static int __init intel_idle_init(void) { const struct x86_cpu_id *id; @@ -2456,6 +2606,7 @@ static int __init intel_idle_init(void) return -ENOMEM; intel_idle_cpuidle_driver_init(&intel_idle_driver); + cmdline_table_adjust(&intel_idle_driver); retval = intel_idle_sysfs_init(); if (retval) @@ -2519,3 +2670,21 @@ module_param(force_irq_on, bool, 0444); */ 
module_param(ibrs_off, bool, 0444); MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle"); + +/* + * Define the C-states table from a user input string. Expected format is + * 'name:latency:residency', where: + * - name: The C-state name. + * - latency: The C-state exit latency in us. + * - residency: The C-state target residency in us. + * + * Multiple C-states can be defined by separating them with commas: + * 'name1:latency1:residency1,name2:latency2:residency2' + * + * Example: intel_idle.table=C1:1:1,C1E:5:10,C6:100:600 + * + * To leave latency or residency unchanged, use an empty field, for example: + * 'C1:1:1,C1E::10' - leaves C1E latency unchanged. + */ +module_param_string(table, cmdline_table_str, MAX_CMDLINE_TABLE_LEN, 0444); +MODULE_PARM_DESC(table, "Build the C-states table from a user input string"); From be6a150829b375c1b53d7ea5794ccc9edd2e0c9c Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 16 Dec 2025 10:04:02 +0200 Subject: [PATCH 14/65] intel_idle: Add C-states validation Add validation for C-states specified via the "table=" module parameter. Treat this module parameter as untrusted input and validate it thoroughly. Signed-off-by: Artem Bityutskiy Link: https://patch.msgid.link/20251216080402.156988-4-dedekind1@gmail.com Signed-off-by: Rafael J. 
Wysocki --- drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index ab6b86ff9905..f49c939d636f 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include @@ -75,6 +76,11 @@ static bool ibrs_off __read_mostly; /* The maximum allowed length for the 'table' module parameter */ #define MAX_CMDLINE_TABLE_LEN 256 +/* Maximum allowed C-state latency */ +#define MAX_CMDLINE_LATENCY_US (5 * USEC_PER_MSEC) +/* Maximum allowed C-state target residency */ +#define MAX_CMDLINE_RESIDENCY_US (100 * USEC_PER_MSEC) + static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __read_mostly; static struct cpuidle_device __percpu *intel_idle_cpuidle_devices; @@ -2434,6 +2440,41 @@ static char *get_cmdline_field(char *args, char **field, char sep) return args + i + 1; } +/** + * validate_cmdline_cstate - Validate a C-state from cmdline. + * @state: The C-state to validate. + * @prev_state: The previous C-state in the table or NULL. + * + * Return: 0 if the C-state is valid or -EINVAL otherwise. + */ +static int validate_cmdline_cstate(struct cpuidle_state *state, + struct cpuidle_state *prev_state) +{ + if (state->exit_latency == 0) + /* Exit latency 0 can only be used for the POLL state */ + return -EINVAL; + + if (state->exit_latency > MAX_CMDLINE_LATENCY_US) + return -EINVAL; + + if (state->target_residency > MAX_CMDLINE_RESIDENCY_US) + return -EINVAL; + + if (state->target_residency < state->exit_latency) + return -EINVAL; + + if (!prev_state) + return 0; + + if (state->exit_latency <= prev_state->exit_latency) + return -EINVAL; + + if (state->target_residency <= prev_state->target_residency) + return -EINVAL; + + return 0; +} + /** * cmdline_table_adjust - Adjust the C-states table with data from cmdline. * @drv: cpuidle driver (assumed to point to intel_idle_driver). 
@@ -2532,6 +2573,19 @@ static void __init cmdline_table_adjust(struct cpuidle_driver *drv) state->name, state->exit_latency, state->target_residency); } + /* Validate the adjusted C-states, start with index 1 to skip POLL */ + for (i = 1; i < drv->state_count; i++) { + struct cpuidle_state *prev_state; + + state = &cmdline_states[i]; + prev_state = &cmdline_states[i - 1]; + + if (validate_cmdline_cstate(state, prev_state)) { + pr_err("C-state '%s' validation failed\n", state->name); + goto error; + } + } + /* Copy the adjusted C-states table back */ for (i = 1; i < drv->state_count; i++) drv->states[i] = cmdline_states[i]; From 1ade6a4f7f09d5d6f6fc449e6bfa92b5e2d063c2 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 20:52:33 +0100 Subject: [PATCH 15/65] USB: core: Discard pm_runtime_put() return value To allow the return type of pm_runtime_put() to be changed to void in the future, modify usb_autopm_put_interface_async() to discard the return value of pm_runtime_put(). That value is merely used in a debug comment printed by the function in question and it is not a particularly useful piece of information because pm_runtime_put() does not guarantee that the device will be suspended even if it successfully queues up a work item to check whether or not the device can be suspended. Signed-off-by: Rafael J. 
Wysocki Acked-by: Alan Stern Acked-by: Greg Kroah-Hartman Link: https://patch.msgid.link/5058509.GXAFRqVoOG@rafael.j.wysocki --- drivers/usb/core/driver.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index d29edc7c616a..2f5958bc4f7f 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1810,13 +1810,11 @@ EXPORT_SYMBOL_GPL(usb_autopm_put_interface); void usb_autopm_put_interface_async(struct usb_interface *intf) { struct usb_device *udev = interface_to_usbdev(intf); - int status; usb_mark_last_busy(udev); - status = pm_runtime_put(&intf->dev); - dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", - __func__, atomic_read(&intf->dev.power.usage_count), - status); + pm_runtime_put(&intf->dev); + dev_vdbg(&intf->dev, "%s: cnt %d\n", + __func__, atomic_read(&intf->dev.power.usage_count)); } EXPORT_SYMBOL_GPL(usb_autopm_put_interface_async); From 88dcab0650fd31072ed07a0d26fce5bbbbd8e7a1 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 20:59:58 +0100 Subject: [PATCH 16/65] drm/imagination: Discard pm_runtime_put() return value The Imagination DRM driver defines pvr_power_put() to pass the return value of pm_runtime_put() to the caller, but then it never uses the return value of pvr_power_put(). Modify pvr_power_put() to discard the pm_runtime_put() return value and change its return type to void. No intentional functional impact. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Matt Coster Link: https://patch.msgid.link/8642685.T7Z3S40VBb@rafael.j.wysocki --- drivers/gpu/drm/imagination/pvr_power.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/imagination/pvr_power.h b/drivers/gpu/drm/imagination/pvr_power.h index b853d092242c..c34252bda078 100644 --- a/drivers/gpu/drm/imagination/pvr_power.h +++ b/drivers/gpu/drm/imagination/pvr_power.h @@ -30,12 +30,12 @@ pvr_power_get(struct pvr_device *pvr_dev) return pm_runtime_resume_and_get(drm_dev->dev); } -static __always_inline int +static __always_inline void pvr_power_put(struct pvr_device *pvr_dev) { struct drm_device *drm_dev = from_pvr_device(pvr_dev); - return pm_runtime_put(drm_dev->dev); + pm_runtime_put(drm_dev->dev); } int pvr_power_domains_init(struct pvr_device *pvr_dev); From 0cc7933cbec80900bdbe658b72e2ba99187fe628 Mon Sep 17 00:00:00 2001 From: Andreas Kemnade Date: Thu, 8 Jan 2026 09:26:12 +0100 Subject: [PATCH 17/65] cpufreq: omap: remove driver The omap-cpufreq driver is not used in the corresponding defconfigs. The pseudo platform device to use it was removed by commit cb6675d6a868 ("ARM: OMAP2+: Remove legacy PM init") 10 years ago. Checking if there is any need to reactivate it: For omap3, dra7 there is ti-cpufreq to create cpufreq-dt device For omap2/4/5 there is cpufreq-dt-plat to create cpufreq-dt device. For omap1 this driver cannot be selected at all. So no users, no need to reactivate the driver somehow. So remove it. Signed-off-by: Andreas Kemnade Acked-by: Kevin Hilman Link: https://patch.msgid.link/20260108-omap-cpufreq-removal-v1-1-8fe42f130f48@kemnade.info Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/Kconfig.arm | 5 - drivers/cpufreq/Makefile | 1 - drivers/cpufreq/omap-cpufreq.c | 195 --------------------------------- 3 files changed, 201 deletions(-) delete mode 100644 drivers/cpufreq/omap-cpufreq.c diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm index 9be0503df55a..4014bc9dd73a 100644 --- a/drivers/cpufreq/Kconfig.arm +++ b/drivers/cpufreq/Kconfig.arm @@ -141,11 +141,6 @@ config ARM_MEDIATEK_CPUFREQ_HW The driver implements the cpufreq interface for this HW engine. Say Y if you want to support CPUFreq HW. -config ARM_OMAP2PLUS_CPUFREQ - bool "TI OMAP2+" - depends on ARCH_OMAP2PLUS || COMPILE_TEST - default ARCH_OMAP2PLUS - config ARM_QCOM_CPUFREQ_NVMEM tristate "Qualcomm nvmem based CPUFreq" depends on ARCH_QCOM || COMPILE_TEST diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 681d687b5a18..385c9fcc65c6 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -69,7 +69,6 @@ obj-$(CONFIG_ARM_KIRKWOOD_CPUFREQ) += kirkwood-cpufreq.o obj-$(CONFIG_ARM_MEDIATEK_CPUFREQ) += mediatek-cpufreq.o obj-$(CONFIG_ARM_MEDIATEK_CPUFREQ_HW) += mediatek-cpufreq-hw.o obj-$(CONFIG_MACH_MVEBU_V7) += mvebu-cpufreq.o -obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ) += omap-cpufreq.o obj-$(CONFIG_ARM_PXA2xx_CPUFREQ) += pxa2xx-cpufreq.o obj-$(CONFIG_PXA3xx) += pxa3xx-cpufreq.o obj-$(CONFIG_ARM_QCOM_CPUFREQ_HW) += qcom-cpufreq-hw.o diff --git a/drivers/cpufreq/omap-cpufreq.c b/drivers/cpufreq/omap-cpufreq.c deleted file mode 100644 index bbb01d93b54b..000000000000 --- a/drivers/cpufreq/omap-cpufreq.c +++ /dev/null @@ -1,195 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * CPU frequency scaling for OMAP using OPP information - * - * Copyright (C) 2005 Nokia Corporation - * Written by Tony Lindgren - * - * Based on cpu-sa1110.c, Copyright (C) 2001 Russell King - * - * Copyright (C) 2007-2011 Texas Instruments, Inc. 
- * - OMAP3/4 support by Rajendra Nayak, Santosh Shilimkar - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* OPP tolerance in percentage */ -#define OPP_TOLERANCE 4 - -static struct cpufreq_frequency_table *freq_table; -static atomic_t freq_table_users = ATOMIC_INIT(0); -static struct device *mpu_dev; -static struct regulator *mpu_reg; - -static int omap_target(struct cpufreq_policy *policy, unsigned int index) -{ - int r, ret; - struct dev_pm_opp *opp; - unsigned long freq, volt = 0, volt_old = 0, tol = 0; - unsigned int old_freq, new_freq; - - old_freq = policy->cur; - new_freq = freq_table[index].frequency; - - freq = new_freq * 1000; - ret = clk_round_rate(policy->clk, freq); - if (ret < 0) { - dev_warn(mpu_dev, - "CPUfreq: Cannot find matching frequency for %lu\n", - freq); - return ret; - } - freq = ret; - - if (mpu_reg) { - opp = dev_pm_opp_find_freq_ceil(mpu_dev, &freq); - if (IS_ERR(opp)) { - dev_err(mpu_dev, "%s: unable to find MPU OPP for %d\n", - __func__, new_freq); - return -EINVAL; - } - volt = dev_pm_opp_get_voltage(opp); - dev_pm_opp_put(opp); - tol = volt * OPP_TOLERANCE / 100; - volt_old = regulator_get_voltage(mpu_reg); - } - - dev_dbg(mpu_dev, "cpufreq-omap: %u MHz, %ld mV --> %u MHz, %ld mV\n", - old_freq / 1000, volt_old ? volt_old / 1000 : -1, - new_freq / 1000, volt ? volt / 1000 : -1); - - /* scaling up? scale voltage before frequency */ - if (mpu_reg && (new_freq > old_freq)) { - r = regulator_set_voltage(mpu_reg, volt - tol, volt + tol); - if (r < 0) { - dev_warn(mpu_dev, "%s: unable to scale voltage up.\n", - __func__); - return r; - } - } - - ret = clk_set_rate(policy->clk, new_freq * 1000); - - /* scaling down? 
scale voltage after frequency */ - if (mpu_reg && (new_freq < old_freq)) { - r = regulator_set_voltage(mpu_reg, volt - tol, volt + tol); - if (r < 0) { - dev_warn(mpu_dev, "%s: unable to scale voltage down.\n", - __func__); - clk_set_rate(policy->clk, old_freq * 1000); - return r; - } - } - - return ret; -} - -static inline void freq_table_free(void) -{ - if (atomic_dec_and_test(&freq_table_users)) - dev_pm_opp_free_cpufreq_table(mpu_dev, &freq_table); -} - -static int omap_cpu_init(struct cpufreq_policy *policy) -{ - int result; - - policy->clk = clk_get(NULL, "cpufreq_ck"); - if (IS_ERR(policy->clk)) - return PTR_ERR(policy->clk); - - if (!freq_table) { - result = dev_pm_opp_init_cpufreq_table(mpu_dev, &freq_table); - if (result) { - dev_err(mpu_dev, - "%s: cpu%d: failed creating freq table[%d]\n", - __func__, policy->cpu, result); - clk_put(policy->clk); - return result; - } - } - - atomic_inc_return(&freq_table_users); - - /* FIXME: what's the actual transition time? */ - cpufreq_generic_init(policy, freq_table, 300 * 1000); - - return 0; -} - -static void omap_cpu_exit(struct cpufreq_policy *policy) -{ - freq_table_free(); - clk_put(policy->clk); -} - -static struct cpufreq_driver omap_driver = { - .flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK, - .verify = cpufreq_generic_frequency_table_verify, - .target_index = omap_target, - .get = cpufreq_generic_get, - .init = omap_cpu_init, - .exit = omap_cpu_exit, - .register_em = cpufreq_register_em_with_opp, - .name = "omap", -}; - -static int omap_cpufreq_probe(struct platform_device *pdev) -{ - mpu_dev = get_cpu_device(0); - if (!mpu_dev) { - pr_warn("%s: unable to get the MPU device\n", __func__); - return -EINVAL; - } - - mpu_reg = regulator_get(mpu_dev, "vcc"); - if (IS_ERR(mpu_reg)) { - pr_warn("%s: unable to get MPU regulator\n", __func__); - mpu_reg = NULL; - } else { - /* - * Ensure physical regulator is present. - * (e.g. could be dummy regulator.) 
- */ - if (regulator_get_voltage(mpu_reg) < 0) { - pr_warn("%s: physical regulator not present for MPU\n", - __func__); - regulator_put(mpu_reg); - mpu_reg = NULL; - } - } - - return cpufreq_register_driver(&omap_driver); -} - -static void omap_cpufreq_remove(struct platform_device *pdev) -{ - cpufreq_unregister_driver(&omap_driver); -} - -static struct platform_driver omap_cpufreq_platdrv = { - .driver = { - .name = "omap-cpufreq", - }, - .probe = omap_cpufreq_probe, - .remove = omap_cpufreq_remove, -}; -module_platform_driver(omap_cpufreq_platdrv); - -MODULE_DESCRIPTION("cpufreq driver for OMAP SoCs"); -MODULE_LICENSE("GPL"); From 80b49829ba1776d3593998293d457397e349b765 Mon Sep 17 00:00:00 2001 From: Andreas Kemnade Date: Thu, 8 Jan 2026 09:26:13 +0100 Subject: [PATCH 18/65] MAINTAINERS: remove omap-cpufreq Remove entry for omap-cpufreq, since it is removed. Signed-off-by: Andreas Kemnade Acked-by: Kevin Hilman Link: https://patch.msgid.link/20260108-omap-cpufreq-removal-v1-2-8fe42f130f48@kemnade.info Signed-off-by: Rafael J. Wysocki --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5b11839cba9d..2f950a4c9fac 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -19129,7 +19129,6 @@ M: Kevin Hilman L: linux-omap@vger.kernel.org S: Maintained F: arch/arm/*omap*/*pm* -F: drivers/cpufreq/omap-cpufreq.c OMAP POWERDOMAIN SOC ADAPTATION LAYER SUPPORT M: Paul Walmsley From c9f7b0e6b903a68780684c30773e3b591b10deaa Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:03:25 +0100 Subject: [PATCH 19/65] media: ccs: Discard pm_runtime_put() return value Passing the pm_runtime_put() return value to callers is not particularly useful. 
Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel is configured with CONFIG_PM unset. Accordingly, update ccs_post_streamoff() to simply discard the return value of pm_runtime_put() and always return success to the caller. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Acked-by: Sakari Ailus Link: https://patch.msgid.link/22966634.EfDdHjke4D@rafael.j.wysocki --- drivers/media/i2c/ccs/ccs-core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/media/i2c/ccs/ccs-core.c b/drivers/media/i2c/ccs/ccs-core.c index f8523140784c..0d7b922fd4c4 100644 --- a/drivers/media/i2c/ccs/ccs-core.c +++ b/drivers/media/i2c/ccs/ccs-core.c @@ -1974,7 +1974,9 @@ static int ccs_post_streamoff(struct v4l2_subdev *subdev) struct ccs_sensor *sensor = to_ccs_sensor(subdev); struct i2c_client *client = v4l2_get_subdevdata(&sensor->src->sd); - return pm_runtime_put(&client->dev); + pm_runtime_put(&client->dev); + + return 0; } static int ccs_enum_mbus_code(struct v4l2_subdev *subdev, From f52defa7b830abbba6b26df503ca42c0b2f20abe Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:07:46 +0100 Subject: [PATCH 20/65] watchdog: rz: Discard pm_runtime_put() return values Failing a watchdog stop due to pm_runtime_put() returning a negative value is not particularly useful. 
Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel is configured with CONFIG_PM unset. Accordingly, update rzg2l_wdt_stop() and rzv2h_wdt_stop() to simply discard the return value of pm_runtime_put(). This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Reviewed-by: Guenter Roeck Link: https://patch.msgid.link/3340071.5fSG56mABF@rafael.j.wysocki --- drivers/watchdog/rzg2l_wdt.c | 4 +--- drivers/watchdog/rzv2h_wdt.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c index 1c9aa366d0a0..509f9dffdacd 100644 --- a/drivers/watchdog/rzg2l_wdt.c +++ b/drivers/watchdog/rzg2l_wdt.c @@ -132,9 +132,7 @@ static int rzg2l_wdt_stop(struct watchdog_device *wdev) if (ret) return ret; - ret = pm_runtime_put(wdev->parent); - if (ret < 0) - return ret; + pm_runtime_put(wdev->parent); return 0; } diff --git a/drivers/watchdog/rzv2h_wdt.c b/drivers/watchdog/rzv2h_wdt.c index a694786837e1..f93647934db7 100644 --- a/drivers/watchdog/rzv2h_wdt.c +++ b/drivers/watchdog/rzv2h_wdt.c @@ -174,9 +174,7 @@ static int rzv2h_wdt_stop(struct watchdog_device *wdev) if (priv->of_data->wdtdcr) rzt2h_wdt_wdtdcr_count_stop(priv); - ret = pm_runtime_put(wdev->parent); - if (ret < 0) - return ret; + pm_runtime_put(wdev->parent); return 0; } From 7b8de72b4001a7e2071c69b6bcc95ac21ca01094 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:09:22 +0100 Subject: [PATCH 21/65] watchdog: rzv2h_wdt: Discard pm_runtime_put() return value Failing device probe due to pm_runtime_put() returning an error is not particularly useful. 
Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel is configured with CONFIG_PM unset. Accordingly, update rzt2h_wdt_wdtdcr_init() to simply discard the return value of pm_runtime_put() and return success to the caller after invoking that function. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Reviewed-by: Guenter Roeck Link: https://patch.msgid.link/1867890.VLH7GnMWUR@rafael.j.wysocki --- drivers/watchdog/rzv2h_wdt.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/watchdog/rzv2h_wdt.c b/drivers/watchdog/rzv2h_wdt.c index f93647934db7..3b6abb66a1da 100644 --- a/drivers/watchdog/rzv2h_wdt.c +++ b/drivers/watchdog/rzv2h_wdt.c @@ -268,9 +268,7 @@ static int rzt2h_wdt_wdtdcr_init(struct platform_device *pdev, rzt2h_wdt_wdtdcr_count_stop(priv); - ret = pm_runtime_put(&pdev->dev); - if (ret < 0) - return ret; + pm_runtime_put(&pdev->dev); return 0; } From d33976be6cecfe340a52b365ecf706a0c55d543d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:24:19 +0100 Subject: [PATCH 22/65] hwspinlock: omap: Discard pm_runtime_put() return value Failing driver probe due to pm_runtime_put() returning a negative value is not particularly useful. Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel has been configured with CONFIG_PM unset. 
Accordingly, update omap_hwspinlock_probe() to simply discard the return value of pm_runtime_put(). This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Andersson Link: https://patch.msgid.link/883243465.0ifERbkFSE@rafael.j.wysocki --- drivers/hwspinlock/omap_hwspinlock.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/hwspinlock/omap_hwspinlock.c b/drivers/hwspinlock/omap_hwspinlock.c index 27b47b8623c0..3a9a5678737b 100644 --- a/drivers/hwspinlock/omap_hwspinlock.c +++ b/drivers/hwspinlock/omap_hwspinlock.c @@ -101,9 +101,7 @@ static int omap_hwspinlock_probe(struct platform_device *pdev) * runtime PM will make sure the clock of this module is * enabled again iff at least one lock is requested */ - ret = pm_runtime_put(&pdev->dev); - if (ret < 0) - return ret; + pm_runtime_put(&pdev->dev); /* one of the four lsb's must be set, and nothing else */ if (hweight_long(i & 0xf) != 1 || i > 8) From 01eafccacc707da2db2a9eb4be56c9367e42323f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:25:57 +0100 Subject: [PATCH 23/65] coresight: Discard pm_runtime_put() return values Failing a debugfs write due to pm_runtime_put() returning a negative value is not particularly useful. Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel has been configured with CONFIG_PM unset, in which case debug_disable_func() in the coresight driver will always return an error. 
For this reason, update debug_disable_func() to simply discard the return value of pm_runtime_put(), change its return type to void, and propagate that change to debug_func_knob_write(). This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Acked-by: Suzuki K Poulose Link: https://patch.msgid.link/2058657.yKVeVyVuyW@rafael.j.wysocki --- drivers/hwtracing/coresight/coresight-cpu-debug.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c index 5f21366406aa..629614278e46 100644 --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c @@ -451,10 +451,10 @@ static int debug_enable_func(void) return ret; } -static int debug_disable_func(void) +static void debug_disable_func(void) { struct debug_drvdata *drvdata; - int cpu, ret, err = 0; + int cpu; /* * Disable debug power domains, records the error and keep @@ -466,12 +466,8 @@ static int debug_disable_func(void) if (!drvdata) continue; - ret = pm_runtime_put(drvdata->dev); - if (ret < 0) - err = ret; + pm_runtime_put(drvdata->dev); } - - return err; } static ssize_t debug_func_knob_write(struct file *f, @@ -492,7 +488,7 @@ static ssize_t debug_func_knob_write(struct file *f, if (val) ret = debug_enable_func(); else - ret = debug_disable_func(); + debug_disable_func(); if (ret) { pr_err("%s: unable to %s debug function: %d\n", From 6401e43479a809b7a5a930d76c363f4b5705ed00 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:27:44 +0100 Subject: [PATCH 24/65] platform/chrome: cros_hps_i2c: Discard pm_runtime_put() return value Passing pm_runtime_put() return value to the callers is not particularly useful. 
Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. It also happens when the kernel is configured with CONFIG_PM unset. Accordingly, update hps_release() to simply discard the return value of pm_runtime_put() and always return success to the caller. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Acked-by: Tzung-Bi Shih Link: https://patch.msgid.link/2302270.NgBsaNRSFp@rafael.j.wysocki --- drivers/platform/chrome/cros_hps_i2c.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/platform/chrome/cros_hps_i2c.c b/drivers/platform/chrome/cros_hps_i2c.c index 6b479cfe3f73..ac6498c593e3 100644 --- a/drivers/platform/chrome/cros_hps_i2c.c +++ b/drivers/platform/chrome/cros_hps_i2c.c @@ -46,7 +46,9 @@ static int hps_release(struct inode *inode, struct file *file) struct hps_drvdata, misc_device); struct device *dev = &hps->client->dev; - return pm_runtime_put(dev); + pm_runtime_put(dev); + + return 0; } static const struct file_operations hps_fops = { From bf91b35a46ceef08a1e64c54b0e611fcae531e7a Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:31:45 +0100 Subject: [PATCH 25/65] scsi: ufs: core: Discard pm_runtime_put() return values The ufshcd driver defines ufshcd_rpm_put() to return an int, but that return value is never used. It also passes the return value of pm_runtime_put() to the caller which is not very useful. 
Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. Modify ufshcd_rpm_put() to discard the pm_runtime_put() return value and change its return type to void. No intentional functional impact. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Reviewed-by: Bart Van Assche Reviewed-by: Martin K. Petersen Link: https://patch.msgid.link/2781685.BddDVKsqQX@rafael.j.wysocki --- drivers/ufs/core/ufshcd-priv.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h index 4259f499382f..27b18b0cc058 100644 --- a/drivers/ufs/core/ufshcd-priv.h +++ b/drivers/ufs/core/ufshcd-priv.h @@ -348,9 +348,9 @@ static inline int ufshcd_rpm_resume(struct ufs_hba *hba) return pm_runtime_resume(&hba->ufs_device_wlun->sdev_gendev); } -static inline int ufshcd_rpm_put(struct ufs_hba *hba) +static inline void ufshcd_rpm_put(struct ufs_hba *hba) { - return pm_runtime_put(&hba->ufs_device_wlun->sdev_gendev); + pm_runtime_put(&hba->ufs_device_wlun->sdev_gendev); } /** From fcbd7897b871e157ee5c595e950c8466d86c0cd5 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 5 Jan 2026 06:37:06 -0800 Subject: [PATCH 26/65] cpuidle: menu: Remove incorrect unlikely() annotation The unlikely() annotation on the early-return condition in menu_select() is incorrect on systems with only one idle state (e.g., ARM64 servers with a single ACPI LPI state). Branch profiling shows 100% misprediction on such systems since drv->state_count <= 1 is always true. On platforms where only state0 is available, this path is the common case, not an unlikely edge case. 
Remove the misleading annotation to let the branch predictor learn the actual behavior. Signed-off-by: Breno Leitao Link: https://patch.msgid.link/20260105-annotated_idle-v1-1-10ddf0771b58@debian.org Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/menu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 64d6f7a1c776..ef9c5a84643e 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -271,7 +271,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, data->bucket = BUCKETS - 1; } - if (unlikely(drv->state_count <= 1 || latency_req == 0) || + if (drv->state_count <= 1 || latency_req == 0 || ((data->next_timer_ns < drv->states[1].target_residency_ns || latency_req < drv->states[1].exit_latency_ns) && !dev->states_usage[0].disable)) { From fd0d2872dc53fe55f66842767e952457348b8d18 Mon Sep 17 00:00:00 2001 From: Christian Loehle Date: Tue, 6 Jan 2026 13:36:53 +0000 Subject: [PATCH 27/65] MAINTAINERS: Add myself as cpuidle reviewer I've been reviewing cpuidle changes, for governors in particular, for the last couple of years and will continue to do so. Signed-off-by: Christian Loehle Link: https://patch.msgid.link/71f63cb7-2d9b-49a3-9b04-a47e2edef5e0@arm.com Signed-off-by: Rafael J. Wysocki --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 765ad2daa218..ea1d4c85b865 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6554,6 +6554,7 @@ F: rust/kernel/cpu.rs CPU IDLE TIME MANAGEMENT FRAMEWORK M: "Rafael J. 
Wysocki" M: Daniel Lezcano +R: Christian Loehle L: linux-pm@vger.kernel.org S: Maintained B: https://bugzilla.kernel.org From 07e5e811f86dcd6f595c3bbd71cde294e8545889 Mon Sep 17 00:00:00 2001 From: Sumeet Pawnikar Date: Sun, 11 Jan 2026 19:42:36 +0530 Subject: [PATCH 28/65] powercap: Replace sprintf() with sysfs_emit() in sysfs show functions Replace all sprintf() calls with sysfs_emit() in sysfs show functions. sysfs_emit() is preferred over sprintf() for formatting sysfs output as it provides better bounds checking and prevents potential buffer overflows. Also, replace sprintf() with sysfs_emit() in show_constraint_name() and simplify the code by removing the redundant strlen() call since sysfs_emit() returns the length. Signed-off-by: Sumeet Pawnikar Link: https://patch.msgid.link/20260111141237.12340-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki --- drivers/powercap/powercap_sys.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index 1ff369880beb..f3b2ae635305 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -27,7 +27,7 @@ static ssize_t _attr##_show(struct device *dev, \ \ if (power_zone->ops->get_##_attr) { \ if (!power_zone->ops->get_##_attr(power_zone, &value)) \ - len = sprintf(buf, "%lld\n", value); \ + len = sysfs_emit(buf, "%lld\n", value); \ } \ \ return len; \ @@ -75,7 +75,7 @@ static ssize_t show_constraint_##_attr(struct device *dev, \ pconst = &power_zone->constraints[id]; \ if (pconst && pconst->ops && pconst->ops->get_##_attr) { \ if (!pconst->ops->get_##_attr(power_zone, id, &value)) \ - len = sprintf(buf, "%lld\n", value); \ + len = sysfs_emit(buf, "%lld\n", value); \ } \ \ return len; \ @@ -171,9 +171,8 @@ static ssize_t show_constraint_name(struct device *dev, if (pconst && pconst->ops && pconst->ops->get_name) { name = pconst->ops->get_name(power_zone, id); if (name) { - sprintf(buf, "%.*s\n", 
POWERCAP_CONSTRAINT_NAME_LEN - 1, - name); - len = strlen(buf); + len = sysfs_emit(buf, "%.*s\n", + POWERCAP_CONSTRAINT_NAME_LEN - 1, name); } } @@ -350,7 +349,7 @@ static ssize_t name_show(struct device *dev, { struct powercap_zone *power_zone = to_powercap_zone(dev); - return sprintf(buf, "%s\n", power_zone->name); + return sysfs_emit(buf, "%s\n", power_zone->name); } static DEVICE_ATTR_RO(name); @@ -438,7 +437,7 @@ static ssize_t enabled_show(struct device *dev, mode = false; } - return sprintf(buf, "%d\n", mode); + return sysfs_emit(buf, "%d\n", mode); } static ssize_t enabled_store(struct device *dev, From 54b3cd55a515c7c0fcfa0c1f0b10d62c11d64bcc Mon Sep 17 00:00:00 2001 From: Daniel Tang Date: Wed, 14 Jan 2026 21:01:52 -0500 Subject: [PATCH 29/65] powercap: intel_rapl: Add PL4 support for Ice Lake Microsoft Surface Pro 7 firmware throttles the processor upon boot/resume. Userspace needs to be able to restore the correct value. Link: https://github.com/linux-surface/linux-surface/issues/706 Signed-off-by: Daniel Tang Link: https://patch.msgid.link/6088605.ChMirdbgyp@daniel-desktop3 Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl_msr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c index 9a7e150b3536..a2bc0a9c1e10 100644 --- a/drivers/powercap/intel_rapl_msr.c +++ b/drivers/powercap/intel_rapl_msr.c @@ -162,6 +162,7 @@ static int rapl_msr_write_raw(int cpu, struct reg_action *ra) /* List of verified CPUs. */ static const struct x86_cpu_id pl4_support_ids[] = { + X86_MATCH_VFM(INTEL_ICELAKE_L, NULL), X86_MATCH_VFM(INTEL_TIGERLAKE_L, NULL), X86_MATCH_VFM(INTEL_ALDERLAKE, NULL), X86_MATCH_VFM(INTEL_ALDERLAKE_L, NULL), From e9df6eba060c6db2f7f3fd8666d1af0a369d6f7b Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Thu, 8 Jan 2026 16:05:37 +0100 Subject: [PATCH 30/65] genirq/chip: Change irq_chip_pm_put() return type to void The irq_chip_pm_put() return value is only used in __irq_do_set_handler() to trigger a WARN_ON() if it is negative, but doing so is not useful because irq_chip_pm_put() simply passes the pm_runtime_put() return value to its callers. Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. For this reason, modify irq_chip_pm_put() to discard the pm_runtime_put() return value, change its return type to void, and drop the WARN_ON() around the irq_chip_pm_put() invocation from __irq_do_set_handler(). Also update the irq_chip_pm_put() kerneldoc comment to be more accurate. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Thomas Gleixner Link: https://patch.msgid.link/5075294.31r3eYUQgx@rafael.j.wysocki --- include/linux/irq.h | 2 +- kernel/irq/chip.c | 22 +++++++++++----------- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index 4a9f1d7b08c3..ef0816fdc6f2 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -658,7 +658,7 @@ extern void handle_fasteoi_nmi(struct irq_desc *desc); extern int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg); extern int irq_chip_pm_get(struct irq_data *data); -extern int irq_chip_pm_put(struct irq_data *data); +extern void irq_chip_pm_put(struct irq_data *data); #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY extern void handle_fasteoi_ack_irq(struct irq_desc *desc); extern void handle_fasteoi_mask_irq(struct irq_desc *desc); diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index 678f094d261a..23f22f3d5207 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -974,7 +974,7 @@ __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle, irq_state_set_disabled(desc); if (is_chained) { desc->action = NULL; - WARN_ON(irq_chip_pm_put(irq_desc_get_irq_data(desc))); + irq_chip_pm_put(irq_desc_get_irq_data(desc)); } desc->depth = 1; } @@ -1530,20 +1530,20 @@ int irq_chip_pm_get(struct irq_data *data) } /** - * irq_chip_pm_put - Disable power for an IRQ chip + * irq_chip_pm_put - Drop a PM reference on an IRQ chip * @data: Pointer to interrupt specific data * - * Disable the power to the IRQ chip referenced by the interrupt data - * structure, belongs. Note that power will only be disabled, once this - * function has been called for all IRQs that have called irq_chip_pm_get(). + * Drop a power management reference, acquired via irq_chip_pm_get(), on the IRQ + * chip represented by the interrupt data structure. 
+ * + * Note that this will not disable power to the IRQ chip until this function + * has been called for all IRQs that have called irq_chip_pm_get() and it may + * not disable power at all (if user space prevents that, for example). */ -int irq_chip_pm_put(struct irq_data *data) +void irq_chip_pm_put(struct irq_data *data) { struct device *dev = irq_get_pm_device(data); - int retval = 0; - if (IS_ENABLED(CONFIG_PM) && dev) - retval = pm_runtime_put(dev); - - return (retval < 0) ? retval : 0; + if (dev) + pm_runtime_put(dev); } From 75e8635832a2e45d2a910c247eddd6b65d5ce6e1 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 8 Jan 2026 16:17:17 +0100 Subject: [PATCH 31/65] drm: Discard pm_runtime_put() return value Multiple DRM drivers use the pm_runtime_put() return value for printing debug or even error messages, and all of those messages are at least somewhat misleading. Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended, and there are many perfectly valid situations in which that can happen, like after writing "on" to a device's runtime PM "control" attribute in sysfs, for example. It also happens when the kernel has been configured with CONFIG_PM unset. For this reason, modify all of those drivers to simply discard the pm_runtime_put() return value, which is what they should be doing. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J.
Wysocki Acked-by: Dave Stevenson Acked-by: Liviu Dudau Link: https://patch.msgid.link/2256082.irdbgypaU6@rafael.j.wysocki --- drivers/gpu/drm/arm/malidp_crtc.c | 6 +----- drivers/gpu/drm/bridge/imx/imx8qm-ldb.c | 4 +--- drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c | 4 +--- drivers/gpu/drm/bridge/imx/imx8qxp-pixel-combiner.c | 5 +---- drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c | 5 +---- drivers/gpu/drm/imx/dc/dc-crtc.c | 12 +++--------- drivers/gpu/drm/vc4/vc4_hdmi.c | 5 +---- drivers/gpu/drm/vc4/vc4_vec.c | 12 ++---------- 8 files changed, 11 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/arm/malidp_crtc.c b/drivers/gpu/drm/arm/malidp_crtc.c index d72c22dcf685..e61cf362abdf 100644 --- a/drivers/gpu/drm/arm/malidp_crtc.c +++ b/drivers/gpu/drm/arm/malidp_crtc.c @@ -77,7 +77,6 @@ static void malidp_crtc_atomic_disable(struct drm_crtc *crtc, crtc); struct malidp_drm *malidp = crtc_to_malidp_device(crtc); struct malidp_hw_device *hwdev = malidp->dev; - int err; /* always disable planes on the CRTC that is being turned off */ drm_atomic_helper_disable_planes_on_crtc(old_state, false); @@ -87,10 +86,7 @@ static void malidp_crtc_atomic_disable(struct drm_crtc *crtc, clk_disable_unprepare(hwdev->pxlclk); - err = pm_runtime_put(crtc->dev->dev); - if (err < 0) { - DRM_DEBUG_DRIVER("Failed to disable runtime power management: %d\n", err); - } + pm_runtime_put(crtc->dev->dev); } static const struct gamma_curve_segment { diff --git a/drivers/gpu/drm/bridge/imx/imx8qm-ldb.c b/drivers/gpu/drm/bridge/imx/imx8qm-ldb.c index 47aa65938e6a..fc67e7ed653d 100644 --- a/drivers/gpu/drm/bridge/imx/imx8qm-ldb.c +++ b/drivers/gpu/drm/bridge/imx/imx8qm-ldb.c @@ -280,9 +280,7 @@ static void imx8qm_ldb_bridge_atomic_disable(struct drm_bridge *bridge, clk_disable_unprepare(imx8qm_ldb->clk_bypass); clk_disable_unprepare(imx8qm_ldb->clk_pixel); - ret = pm_runtime_put(dev); - if (ret < 0) - DRM_DEV_ERROR(dev, "failed to put runtime PM: %d\n", ret); + pm_runtime_put(dev); } static const 
u32 imx8qm_ldb_bus_output_fmts[] = { diff --git a/drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c b/drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c index 122502968927..d70f3c9b3925 100644 --- a/drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c +++ b/drivers/gpu/drm/bridge/imx/imx8qxp-ldb.c @@ -282,9 +282,7 @@ static void imx8qxp_ldb_bridge_atomic_disable(struct drm_bridge *bridge, if (is_split && companion) companion->funcs->atomic_disable(companion, state); - ret = pm_runtime_put(dev); - if (ret < 0) - DRM_DEV_ERROR(dev, "failed to put runtime PM: %d\n", ret); + pm_runtime_put(dev); } static const u32 imx8qxp_ldb_bus_output_fmts[] = { diff --git a/drivers/gpu/drm/bridge/imx/imx8qxp-pixel-combiner.c b/drivers/gpu/drm/bridge/imx/imx8qxp-pixel-combiner.c index 8517b1c953d4..8e64b5404561 100644 --- a/drivers/gpu/drm/bridge/imx/imx8qxp-pixel-combiner.c +++ b/drivers/gpu/drm/bridge/imx/imx8qxp-pixel-combiner.c @@ -181,11 +181,8 @@ static void imx8qxp_pc_bridge_atomic_disable(struct drm_bridge *bridge, { struct imx8qxp_pc_channel *ch = bridge->driver_private; struct imx8qxp_pc *pc = ch->pc; - int ret; - ret = pm_runtime_put(pc->dev); - if (ret < 0) - DRM_DEV_ERROR(pc->dev, "failed to put runtime PM: %d\n", ret); + pm_runtime_put(pc->dev); } static const u32 imx8qxp_pc_bus_output_fmts[] = { diff --git a/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c b/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c index 111310acab2c..82a2bba375ad 100644 --- a/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c +++ b/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c @@ -127,11 +127,8 @@ static void imx8qxp_pxl2dpi_bridge_atomic_disable(struct drm_bridge *bridge, struct drm_atomic_state *state) { struct imx8qxp_pxl2dpi *p2d = bridge->driver_private; - int ret; - ret = pm_runtime_put(p2d->dev); - if (ret < 0) - DRM_DEV_ERROR(p2d->dev, "failed to put runtime PM: %d\n", ret); + pm_runtime_put(p2d->dev); if (p2d->companion) p2d->companion->funcs->atomic_disable(p2d->companion, state); diff --git a/drivers/gpu/drm/imx/dc/dc-crtc.c 
b/drivers/gpu/drm/imx/dc/dc-crtc.c index 31d3a982deaf..608c610662dc 100644 --- a/drivers/gpu/drm/imx/dc/dc-crtc.c +++ b/drivers/gpu/drm/imx/dc/dc-crtc.c @@ -300,7 +300,7 @@ dc_crtc_atomic_disable(struct drm_crtc *crtc, struct drm_atomic_state *state) drm_atomic_get_new_crtc_state(state, crtc); struct dc_drm_device *dc_drm = to_dc_drm_device(crtc->dev); struct dc_crtc *dc_crtc = to_dc_crtc(crtc); - int idx, ret; + int idx; if (!drm_dev_enter(crtc->dev, &idx)) goto out; @@ -313,16 +313,10 @@ dc_crtc_atomic_disable(struct drm_crtc *crtc, struct drm_atomic_state *state) dc_fg_disable_clock(dc_crtc->fg); /* request pixel engine power-off as plane is off too */ - ret = pm_runtime_put(dc_drm->pe->dev); - if (ret) - dc_crtc_err(crtc, "failed to put DC pixel engine RPM: %d\n", - ret); + pm_runtime_put(dc_drm->pe->dev); /* request display engine power-off when CRTC is disabled */ - ret = pm_runtime_put(dc_crtc->de->dev); - if (ret < 0) - dc_crtc_err(crtc, "failed to put DC display engine RPM: %d\n", - ret); + pm_runtime_put(dc_crtc->de->dev); drm_dev_exit(idx); diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 1798d1156d10..4504e38ce844 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -848,7 +848,6 @@ static void vc4_hdmi_encoder_post_crtc_powerdown(struct drm_encoder *encoder, struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder); struct drm_device *drm = vc4_hdmi->connector.dev; unsigned long flags; - int ret; int idx; mutex_lock(&vc4_hdmi->mutex); @@ -867,9 +866,7 @@ static void vc4_hdmi_encoder_post_crtc_powerdown(struct drm_encoder *encoder, clk_disable_unprepare(vc4_hdmi->pixel_bvb_clock); clk_disable_unprepare(vc4_hdmi->pixel_clock); - ret = pm_runtime_put(&vc4_hdmi->pdev->dev); - if (ret < 0) - drm_err(drm, "Failed to release power domain: %d\n", ret); + pm_runtime_put(&vc4_hdmi->pdev->dev); drm_dev_exit(idx); diff --git a/drivers/gpu/drm/vc4/vc4_vec.c b/drivers/gpu/drm/vc4/vc4_vec.c index 
b84fad2a5b23..b0b271d93b27 100644 --- a/drivers/gpu/drm/vc4/vc4_vec.c +++ b/drivers/gpu/drm/vc4/vc4_vec.c @@ -542,7 +542,7 @@ static void vc4_vec_encoder_disable(struct drm_encoder *encoder, { struct drm_device *drm = encoder->dev; struct vc4_vec *vec = encoder_to_vc4_vec(encoder); - int idx, ret; + int idx; if (!drm_dev_enter(drm, &idx)) return; @@ -556,17 +556,9 @@ static void vc4_vec_encoder_disable(struct drm_encoder *encoder, clk_disable_unprepare(vec->clock); - ret = pm_runtime_put(&vec->pdev->dev); - if (ret < 0) { - drm_err(drm, "Failed to release power domain: %d\n", ret); - goto err_dev_exit; - } + pm_runtime_put(&vec->pdev->dev); drm_dev_exit(idx); - return; - - -err_dev_exit: - drm_dev_exit(idx); } static void vc4_vec_encoder_enable(struct drm_encoder *encoder, From 7799ba2160e4919913ecabca8a7fc1aa4c576fb4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jo=C3=A3o=20Marcos=20Costa?= Date: Tue, 13 Jan 2026 14:27:53 +0100 Subject: [PATCH 32/65] cpupower: make systemd unit installation optional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit cpupower currently installs a cpupower.service unit file into unitdir unconditionally, regardless of whether systemd is used by the host. Improve the installation procedure by making this systemd step optional via a 'SYSTEMD' build parameter that defaults to 'true' and can be set to 'false' to disable the installation of the systemd unit file. Since 'SYSTEMD' defaults to true, the current behavior is kept as the default.
Link: https://lore.kernel.org/r/20260113132753.1730020-2-joaomarcos.costa@bootlin.com Signed-off-by: João Marcos Costa Signed-off-by: Shuah Khan --- tools/power/cpupower/Makefile | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/tools/power/cpupower/Makefile b/tools/power/cpupower/Makefile index a1df9196dc45..969716dfe8de 100644 --- a/tools/power/cpupower/Makefile +++ b/tools/power/cpupower/Makefile @@ -315,7 +315,17 @@ endif $(INSTALL_DATA) lib/cpuidle.h $(DESTDIR)${includedir}/cpuidle.h $(INSTALL_DATA) lib/powercap.h $(DESTDIR)${includedir}/powercap.h -install-tools: $(OUTPUT)cpupower +# SYSTEMD=false disables installation of the systemd unit file +SYSTEMD ?= true + +install-systemd: + $(INSTALL) -d $(DESTDIR)${unitdir} + sed 's|___CDIR___|${confdir}|; s|___LDIR___|${libexecdir}|' cpupower.service.in > '$(DESTDIR)${unitdir}/cpupower.service' + $(SETPERM_DATA) '$(DESTDIR)${unitdir}/cpupower.service' + +INSTALL_SYSTEMD := $(if $(filter true,$(strip $(SYSTEMD))),install-systemd) + +install-tools: $(OUTPUT)cpupower $(INSTALL_SYSTEMD) $(INSTALL) -d $(DESTDIR)${bindir} $(INSTALL_PROGRAM) $(OUTPUT)cpupower $(DESTDIR)${bindir} $(INSTALL) -d $(DESTDIR)${bash_completion_dir} @@ -324,9 +334,6 @@ install-tools: $(OUTPUT)cpupower $(INSTALL_DATA) cpupower-service.conf '$(DESTDIR)${confdir}' $(INSTALL) -d $(DESTDIR)${libexecdir} $(INSTALL_PROGRAM) cpupower.sh '$(DESTDIR)${libexecdir}/cpupower' - $(INSTALL) -d $(DESTDIR)${unitdir} - sed 's|___CDIR___|${confdir}|; s|___LDIR___|${libexecdir}|' cpupower.service.in > '$(DESTDIR)${unitdir}/cpupower.service' - $(SETPERM_DATA) '$(DESTDIR)${unitdir}/cpupower.service' install-man: $(INSTALL_DATA) -D man/cpupower.1 $(DESTDIR)${mandir}/man1/cpupower.1 @@ -406,4 +413,4 @@ help: @echo ' uninstall - Remove previously installed files from the dir defined by "DESTDIR"' @echo ' cmdline or Makefile config block option (default: "")' -.PHONY: all utils libcpupower update-po create-gmo install-lib install-tools 
install-man install-gmo install uninstall clean help +.PHONY: all utils libcpupower update-po create-gmo install-lib install-systemd install-tools install-man install-gmo install uninstall clean help From 80606f4eb8d7484ab7f7d6f0fd30d71e6fbcf328 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 20 Jan 2026 16:26:14 +0100 Subject: [PATCH 33/65] cpuidle: governors: menu: Always check timers with tick stopped After commit 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases"), if the return value of get_typical_interval() multiplied by NSEC_PER_USEC is not greater than RESIDENCY_THRESHOLD_NS, the menu governor will skip computing the time till the closest timer. If that happens when the tick has been stopped already, the selected idle state may be too deep due to the subsequent check comparing predicted_ns with TICK_NSEC and causing its value to be replaced with the expected time till the closest timer, which is KTIME_MAX in that case. That will cause the deepest enabled idle state to be selected, but the time till the closest timer very well may be shorter than the target residency of that state, in which case a shallower state should be used. Address this by making menu_select() always compute the time till the closest timer when the tick has been stopped. Also move the predicted_ns check mentioned above into the branch in which the time till the closest timer is determined because it only needs to be done in that case. Fixes: 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases") Signed-off-by: Rafael J. 
Wysocki Reviewed-by: Christian Loehle Link: https://patch.msgid.link/5959091.DvuYhMxLoT@rafael.j.wysocki --- drivers/cpuidle/governors/menu.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index ef9c5a84643e..c6052055ba0f 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -239,7 +239,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, /* Find the shortest expected idle interval. */ predicted_ns = get_typical_interval(data) * NSEC_PER_USEC; - if (predicted_ns > RESIDENCY_THRESHOLD_NS) { + if (predicted_ns > RESIDENCY_THRESHOLD_NS || tick_nohz_tick_stopped()) { unsigned int timer_us; /* Determine the time till the closest timer. */ @@ -259,6 +259,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, RESOLUTION * DECAY * NSEC_PER_USEC); /* Use the lowest expected idle interval to pick the idle state. */ predicted_ns = min((u64)timer_us * NSEC_PER_USEC, predicted_ns); + /* + * If the tick is already stopped, the cost of possible short + * idle duration misprediction is much higher, because the CPU + * may be stuck in a shallow idle state for a long time as a + * result of it. In that case, say we might mispredict and use + * the known time till the closest timer event for the idle + * state selection. + */ + if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC) + predicted_ns = data->next_timer_ns; } else { /* * Because the next timer event is not going to be determined @@ -284,16 +294,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, return 0; } - /* - * If the tick is already stopped, the cost of possible short idle - * duration misprediction is much higher, because the CPU may be stuck - * in a shallow idle state for a long time as a result of it. 
In that - * case, say we might mispredict and use the known time till the closest - * timer event for the idle state selection. - */ - if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC) - predicted_ns = data->next_timer_ns; - /* * Find the idle state with the lowest power while satisfying * our constraints. From 4bd2221f231d798b01027367857d9ba2f24f6ea0 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 14 Jan 2026 20:44:04 +0100 Subject: [PATCH 34/65] cpuidle: governors: teo: Avoid selecting states with zero-size bins If the last two enabled idle states have the same target residency which is at least equal to TICK_NSEC, teo may select the next-to-last one even though the size of that state's bin is 0, which is confusing. Prevent that from happening by adding a target residency check to the relevant code path. Signed-off-by: Rafael J. Wysocki Reviewed-by: Christian Loehle [ rjw: Fixed a typo in the changelog ] Link: https://patch.msgid.link/3033265.e9J7NaK4W3@rafael.j.wysocki Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 81ac5fd58a1c..9820ef36a664 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -388,6 +388,15 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, while (min_idx < idx && drv->states[min_idx].target_residency_ns < TICK_NSEC) min_idx++; + + /* + * Avoid selecting a state with a lower index, but with + * the same target residency as the current candidate + * one. + */ + if (drv->states[min_idx].target_residency_ns == + drv->states[idx].target_residency_ns) + goto constraint; } /* @@ -410,6 +419,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, } } +constraint: /* * If there is a latency constraint, it may be necessary to select an * idle state shallower than the current candidate one. 
From 60836533b4c7b69e6cb815c87f089e39c2878acd Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 14 Jan 2026 20:44:53 +0100 Subject: [PATCH 35/65] cpuidle: governors: teo: Avoid fake intercepts produced by tick Tick wakeups can lead to fake intercepts that may skew idle state selection towards shallow states, so it is better to avoid counting them as intercepts. For this purpose, add a check causing teo_update() to only count tick wakeups as intercepts if intercepts within the tick period range are at least twice as frequent as any other events. Signed-off-by: Rafael J. Wysocki Reviewed-by: Christian Loehle Link: https://patch.msgid.link/3404606.44csPzL39Z@rafael.j.wysocki --- drivers/cpuidle/governors/teo.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 9820ef36a664..5434584af040 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -239,6 +239,17 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) cpu_data->state_bins[drv->state_count-1].hits += PULSE; return; } + /* + * If intercepts within the tick period range are not frequent + * enough, count this wakeup as a hit, since it is likely that + * the tick has woken up the CPU because an expected intercept + * was not there. Otherwise, one of the intercepts may have + * been incidentally preceded by the tick wakeup. + */ + if (3 * cpu_data->tick_intercepts < 2 * total) { + cpu_data->state_bins[idx_timer].hits += PULSE; + return; + } } /* From 475ca3470b3739150720f1b285646de38103e7b7 Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Wed, 14 Jan 2026 20:45:30 +0100 Subject: [PATCH 36/65] cpuidle: governors: teo: Refine tick_intercepts vs total events check Use 2/3 as the proportion coefficient in the check comparing cpu_data->tick_intercepts with cpu_data->total because it is close enough to the current one (5/8) and it allows a more straightforward interpretation (on average, intercepts within the tick period length are twice as frequent as other events). Signed-off-by: Rafael J. Wysocki Reviewed-by: Christian Loehle Link: https://patch.msgid.link/10793374.nUPlyArG6x@rafael.j.wysocki --- drivers/cpuidle/governors/teo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 5434584af040..750ab0678a77 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -485,7 +485,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * total wakeup events, do not stop the tick. */ if (drv->states[idx].target_residency_ns < TICK_NSEC && - cpu_data->tick_intercepts > cpu_data->total / 2 + cpu_data->total / 8) + 3 * cpu_data->tick_intercepts >= 2 * cpu_data->total) duration_ns = TICK_NSEC / 2; end: From 0b7277e02dabba2a9921a7f4761ae6e627e7297a Mon Sep 17 00:00:00 2001 From: Aleks Todorov Date: Fri, 23 Jan 2026 14:03:44 +0000 Subject: [PATCH 37/65] OPP: Return correct value in dev_pm_opp_get_level Commit 073d3d2ca7d4 ("OPP: Level zero is valid") modified the documentation for this function to indicate that errors should return a non-zero value to avoid colliding with the OPP level zero; however, it forgot to actually update the return value. No in-tree kernel code depends on the error value being 0.
Fixes: 073d3d2ca7d4 ("OPP: Level zero is valid") Signed-off-by: Aleks Todorov Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index dbebb8c829bc..ae43c656f108 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -241,7 +241,7 @@ unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp) { if (IS_ERR_OR_NULL(opp) || !opp->available) { pr_err("%s: Invalid parameters\n", __func__); - return 0; + return U32_MAX; } return opp->level; From 8c8b12a55614ea05953e8d695e700e6e1322a05d Mon Sep 17 00:00:00 2001 From: Alexandre Courbot Date: Fri, 28 Nov 2025 11:11:39 +0900 Subject: [PATCH 38/65] rust: cpufreq: always inline functions using build_assert with arguments `build_assert` relies on the compiler to optimize out its error path. Functions using it with its arguments must thus always be inlined, otherwise the error path of `build_assert` might not be optimized out, triggering a build error. Signed-off-by: Alexandre Courbot Reviewed-by: Daniel Almeida Signed-off-by: Viresh Kumar --- rust/kernel/cpufreq.rs | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs index f968fbd22890..0879a79485f8 100644 --- a/rust/kernel/cpufreq.rs +++ b/rust/kernel/cpufreq.rs @@ -1015,6 +1015,8 @@ impl Registration { ..pin_init::zeroed() }; + // Always inline to optimize out error path of `build_assert`. + #[inline(always)] const fn copy_name(name: &'static CStr) -> [c_char; CPUFREQ_NAME_LEN] { let src = name.to_bytes_with_nul(); let mut dst = [0; CPUFREQ_NAME_LEN]; From 9d84fd86d9ce26be72f1cf6839a9335005734d4f Mon Sep 17 00:00:00 2001 From: Alice Ryhl Date: Tue, 2 Dec 2025 19:37:35 +0000 Subject: [PATCH 39/65] rust: cpufreq: add __rust_helper to helpers This is needed to inline these helpers into Rust code. 
Signed-off-by: Alice Ryhl Reviewed-by: Boqun Feng Signed-off-by: Viresh Kumar --- rust/helpers/cpufreq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rust/helpers/cpufreq.c b/rust/helpers/cpufreq.c index 7c1343c4d65e..0e16aeef2b5a 100644 --- a/rust/helpers/cpufreq.c +++ b/rust/helpers/cpufreq.c @@ -3,7 +3,8 @@ #include #ifdef CONFIG_CPU_FREQ -void rust_helper_cpufreq_register_em_with_opp(struct cpufreq_policy *policy) +__rust_helper void +rust_helper_cpufreq_register_em_with_opp(struct cpufreq_policy *policy) { cpufreq_register_em_with_opp(policy); } From e79cc7b5eba255fc0534212d25ee6142213d5314 Mon Sep 17 00:00:00 2001 From: Luca Weiss Date: Wed, 10 Dec 2025 10:43:25 +0900 Subject: [PATCH 40/65] dt-bindings: cpufreq: qcom-hw: document Milos CPUFREQ Hardware Document the CPUFREQ Hardware on the Milos SoC. Acked-by: Rob Herring (Arm) Acked-by: Viresh Kumar Signed-off-by: Luca Weiss Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml index 2d42fc3d8ef8..22eeaef14f55 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml @@ -35,6 +35,7 @@ properties: - description: v2 of CPUFREQ HW (EPSS) items: - enum: + - qcom,milos-cpufreq-epss - qcom,qcs8300-cpufreq-epss - qcom,qdu1000-cpufreq-epss - qcom,sa8255p-cpufreq-epss @@ -169,6 +170,7 @@ allOf: compatible: contains: enum: + - qcom,milos-cpufreq-epss - qcom,qcs8300-cpufreq-epss - qcom,sc7280-cpufreq-epss - qcom,sm8250-cpufreq-epss From d6a6c58da38e4c4564e841faf3880769ff09936b Mon Sep 17 00:00:00 2001 From: Aaron Kling Date: Thu, 18 Dec 2025 15:39:52 -0600 Subject: [PATCH 41/65] cpufreq: Add Tegra186 and Tegra194 to cpufreq-dt-platdev blocklist These have platform specific drivers. 
Signed-off-by: Aaron Kling Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq-dt-platdev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index a1d11ecd1ac8..4348eba6eb91 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -147,6 +147,8 @@ static const struct of_device_id blocklist[] __initconst = { { .compatible = "nvidia,tegra30", }, { .compatible = "nvidia,tegra114", }, { .compatible = "nvidia,tegra124", }, + { .compatible = "nvidia,tegra186", }, + { .compatible = "nvidia,tegra194", }, { .compatible = "nvidia,tegra210", }, { .compatible = "nvidia,tegra234", }, From e05d9e5c8b754cc7d72acd896f5f7caf6b78a973 Mon Sep 17 00:00:00 2001 From: Tamir Duberstein Date: Mon, 22 Dec 2025 13:29:32 +0100 Subject: [PATCH 42/65] rust: cpufreq: replace `kernel::c_str!` with C-Strings C-String literals were added in Rust 1.77. Replace instances of `kernel::c_str!` with C-String literals where possible. Acked-by: Greg Kroah-Hartman Reviewed-by: Alice Ryhl Reviewed-by: Benno Lossin Signed-off-by: Tamir Duberstein Reviewed-by: Daniel Almeida Acked-by: Danilo Krummrich Signed-off-by: Viresh Kumar --- drivers/cpufreq/rcpufreq_dt.rs | 5 ++--- rust/kernel/cpufreq.rs | 3 +-- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/rcpufreq_dt.rs b/drivers/cpufreq/rcpufreq_dt.rs index 31e07f0279db..f17bf64c22e2 100644 --- a/drivers/cpufreq/rcpufreq_dt.rs +++ b/drivers/cpufreq/rcpufreq_dt.rs @@ -3,7 +3,6 @@ //! Rust based implementation of the cpufreq-dt driver. 
use kernel::{ - c_str, clk::Clk, cpu, cpufreq, cpumask::CpumaskVar, @@ -52,7 +51,7 @@ impl opp::ConfigOps for CPUFreqDTDriver {} #[vtable] impl cpufreq::Driver for CPUFreqDTDriver { - const NAME: &'static CStr = c_str!("cpufreq-dt"); + const NAME: &'static CStr = c"cpufreq-dt"; const FLAGS: u16 = cpufreq::flags::NEED_INITIAL_FREQ_CHECK | cpufreq::flags::IS_COOLING_DEV; const BOOST_ENABLED: bool = true; @@ -197,7 +196,7 @@ fn register_em(policy: &mut cpufreq::Policy) { OF_TABLE, MODULE_OF_TABLE, ::IdInfo, - [(of::DeviceId::new(c_str!("operating-points-v2")), ())] + [(of::DeviceId::new(c"operating-points-v2"), ())] ); impl platform::Driver for CPUFreqDTDriver { diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs index 0879a79485f8..76faa1ac8501 100644 --- a/rust/kernel/cpufreq.rs +++ b/rust/kernel/cpufreq.rs @@ -840,7 +840,6 @@ fn register_em(_policy: &mut Policy) { /// ``` /// use kernel::{ /// cpufreq, -/// c_str, /// device::{Core, Device}, /// macros::vtable, /// of, platform, @@ -853,7 +852,7 @@ fn register_em(_policy: &mut Policy) { /// /// #[vtable] /// impl cpufreq::Driver for SampleDriver { -/// const NAME: &'static CStr = c_str!("cpufreq-sample"); +/// const NAME: &'static CStr = c"cpufreq-sample"; /// const FLAGS: u16 = cpufreq::flags::NEED_INITIAL_FREQ_CHECK | cpufreq::flags::IS_COOLING_DEV; /// const BOOST_ENABLED: bool = true; /// From f9cadb3d56912a70571fdd95f426b757557c465b Mon Sep 17 00:00:00 2001 From: Jie Zhan Date: Tue, 23 Dec 2025 15:21:17 +0800 Subject: [PATCH 43/65] ACPI: CPPC: Factor out and export per-cpu cppc_perf_ctrs_in_pcc_cpu() Factor out cppc_perf_ctrs_in_pcc_cpu() for checking whether per-cpu CPC regs are defined in PCC channels, and export it out for further use. Reviewed-by: Lifeng Zheng Reviewed-by: Pierre Gondois Signed-off-by: Jie Zhan Acked-by: Rafael J. 
Wysocki (Intel) Signed-off-by: Viresh Kumar --- drivers/acpi/cppc_acpi.c | 48 ++++++++++++++++++++++------------------ include/acpi/cppc_acpi.h | 5 +++++ 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 3bdeeee3414e..ec4966aaa8d4 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -1422,6 +1422,32 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps) } EXPORT_SYMBOL_GPL(cppc_get_perf_caps); +/** + * cppc_perf_ctrs_in_pcc_cpu - Check if any perf counters of a CPU are in PCC. + * @cpu: CPU on which to check perf counters. + * + * Return: true if any of the counters are in PCC regions, false otherwise + */ +bool cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu) +{ + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu); + struct cpc_register_resource *ref_perf_reg; + + /* + * If reference perf register is not supported then we should use the + * nominal perf value + */ + ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF]; + if (!CPC_SUPPORTED(ref_perf_reg)) + ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF]; + + return CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) || + CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) || + CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]) || + CPC_IN_PCC(ref_perf_reg); +} +EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc_cpu); + /** * cppc_perf_ctrs_in_pcc - Check if any perf counters are in a PCC region. 
* @@ -1436,27 +1462,7 @@ bool cppc_perf_ctrs_in_pcc(void) int cpu; for_each_online_cpu(cpu) { - struct cpc_register_resource *ref_perf_reg; - struct cpc_desc *cpc_desc; - - cpc_desc = per_cpu(cpc_desc_ptr, cpu); - - if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) || - CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) || - CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME])) - return true; - - - ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF]; - - /* - * If reference perf register is not supported then we should - * use the nominal perf value - */ - if (!CPC_SUPPORTED(ref_perf_reg)) - ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF]; - - if (CPC_IN_PCC(ref_perf_reg)) + if (cppc_perf_ctrs_in_pcc_cpu(cpu)) return true; } diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h index 13fa81504844..4bcdcaf8bf2c 100644 --- a/include/acpi/cppc_acpi.h +++ b/include/acpi/cppc_acpi.h @@ -154,6 +154,7 @@ extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs); extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls); extern int cppc_set_enable(int cpu, bool enable); extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps); +extern bool cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu); extern bool cppc_perf_ctrs_in_pcc(void); extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf); extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq); @@ -204,6 +205,10 @@ static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps) { return -EOPNOTSUPP; } +static inline bool cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu) +{ + return false; +} static inline bool cppc_perf_ctrs_in_pcc(void) { return false; From 206b6612556398e717b1e293d96992d5ab2b8f32 Mon Sep 17 00:00:00 2001 From: Jie Zhan Date: Tue, 23 Dec 2025 15:21:18 +0800 Subject: [PATCH 44/65] cpufreq: CPPC: Factor out cppc_fie_kworker_init() Factor out the CPPC FIE kworker init in cppc_freq_invariance_init() because 
it's a standalone procedure for use when the CPC regs are in PCC channels. Reviewed-by: Lifeng Zheng Reviewed-by: Pierre Gondois Signed-off-by: Jie Zhan Signed-off-by: Viresh Kumar --- drivers/cpufreq/cppc_cpufreq.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 9eac77c4f294..947b4e2e1d4e 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -184,7 +184,7 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy) } } -static void __init cppc_freq_invariance_init(void) +static void cppc_fie_kworker_init(void) { struct sched_attr attr = { .size = sizeof(struct sched_attr), @@ -201,17 +201,6 @@ static void __init cppc_freq_invariance_init(void) }; int ret; - if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) { - fie_disabled = FIE_ENABLED; - if (cppc_perf_ctrs_in_pcc()) { - pr_info("FIE not enabled on systems with registers in PCC\n"); - fie_disabled = FIE_DISABLED; - } - } - - if (fie_disabled) - return; - kworker_fie = kthread_run_worker(0, "cppc_fie"); if (IS_ERR(kworker_fie)) { pr_warn("%s: failed to create kworker_fie: %ld\n", __func__, @@ -229,6 +218,22 @@ static void __init cppc_freq_invariance_init(void) } } +static void __init cppc_freq_invariance_init(void) +{ + if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) { + fie_disabled = FIE_ENABLED; + if (cppc_perf_ctrs_in_pcc()) { + pr_info("FIE not enabled on systems with registers in PCC\n"); + fie_disabled = FIE_DISABLED; + } + } + + if (fie_disabled) + return; + + cppc_fie_kworker_init(); +} + static void cppc_freq_invariance_exit(void) { if (fie_disabled) From 997c021abc6eb9cf7df39fa77fa5e666ad55e3a3 Mon Sep 17 00:00:00 2001 From: Jie Zhan Date: Tue, 23 Dec 2025 15:21:19 +0800 Subject: [PATCH 45/65] cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs Currently, the CPPC Frequency Invariance Engine (FIE) 
is invoked from the scheduler tick but defers the update of arch_freq_scale to a separate thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC. However, this deferred update mechanism is unnecessary and introduces extra overhead for non-PCC register spaces (e.g. System Memory or FFH), where accessing the regs won't sleep and can be safely performed from the tick context. Furthermore, with the CPPC FIE registered, it throws repeated warnings of "cppc_scale_freq_workfn: failed to read perf counters" on our platform with the CPC regs in System Memory and a power-down idle state enabled. That's because the remote CPU can be in a power-down idle state, and reading its perf counters returns 0. Moving the FIE handling back to the scheduler tick process makes the CPU handle its own perf counters, so it won't be idle and the issue would be inherently solved. To address the above issues, update arch_freq_scale directly in ticks for non-PCC regs and keep the deferred update mechanism for PCC regs. Reviewed-by: Lifeng Zheng Reviewed-by: Pierre Gondois Signed-off-by: Jie Zhan Signed-off-by: Viresh Kumar --- drivers/cpufreq/cppc_cpufreq.c | 77 +++++++++++++++++++++++----------- 1 file changed, 52 insertions(+), 25 deletions(-) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 947b4e2e1d4e..36e8a75a37f1 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -54,31 +54,24 @@ static int cppc_perf_from_fbctrs(struct cppc_perf_fb_ctrs *fb_ctrs_t0, struct cppc_perf_fb_ctrs *fb_ctrs_t1); /** - * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance - * @work: The work item. + * __cppc_scale_freq_tick - CPPC arch_freq_scale updater for frequency invariance + * @cppc_fi: per-cpu CPPC FIE data. 
* - * The CPPC driver register itself with the topology core to provide its own + * The CPPC driver registers itself with the topology core to provide its own * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which * gets called by the scheduler on every tick. * * Note that the arch specific counters have higher priority than CPPC counters, * if available, though the CPPC driver doesn't need to have any special * handling for that. - * - * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we - * reach here from hard-irq context), which then schedules a normal work item - * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable - * based on the counter updates since the last tick. */ -static void cppc_scale_freq_workfn(struct kthread_work *work) +static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi) { - struct cppc_freq_invariance *cppc_fi; struct cppc_perf_fb_ctrs fb_ctrs = {0}; struct cppc_cpudata *cpu_data; unsigned long local_freq_scale; u64 perf; - cppc_fi = container_of(work, struct cppc_freq_invariance, work); cpu_data = cppc_fi->cpu_data; if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) { @@ -102,6 +95,24 @@ static void cppc_scale_freq_workfn(struct kthread_work *work) per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale; } +static void cppc_scale_freq_tick(void) +{ + __cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id())); +} + +static struct scale_freq_data cppc_sftd = { + .source = SCALE_FREQ_SOURCE_CPPC, + .set_freq_scale = cppc_scale_freq_tick, +}; + +static void cppc_scale_freq_workfn(struct kthread_work *work) +{ + struct cppc_freq_invariance *cppc_fi; + + cppc_fi = container_of(work, struct cppc_freq_invariance, work); + __cppc_scale_freq_tick(cppc_fi); +} + static void cppc_irq_work(struct irq_work *irq_work) { struct cppc_freq_invariance *cppc_fi; @@ -110,7 +121,14 @@ static void cppc_irq_work(struct irq_work *irq_work) 
kthread_queue_work(kworker_fie, &cppc_fi->work); } -static void cppc_scale_freq_tick(void) +/* + * Reading perf counters may sleep if the CPC regs are in PCC. Thus, we + * schedule an irq work in scale_freq_tick (since we reach here from hard-irq + * context), which then schedules a normal work item cppc_scale_freq_workfn() + * that updates the per_cpu arch_freq_scale variable based on the counter + * updates since the last tick. + */ +static void cppc_scale_freq_tick_pcc(void) { struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id()); @@ -121,13 +139,14 @@ static void cppc_scale_freq_tick(void) irq_work_queue(&cppc_fi->irq_work); } -static struct scale_freq_data cppc_sftd = { +static struct scale_freq_data cppc_sftd_pcc = { .source = SCALE_FREQ_SOURCE_CPPC, - .set_freq_scale = cppc_scale_freq_tick, + .set_freq_scale = cppc_scale_freq_tick_pcc, }; static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) { + struct scale_freq_data *sftd = &cppc_sftd; struct cppc_freq_invariance *cppc_fi; int cpu, ret; @@ -138,8 +157,11 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) cppc_fi = &per_cpu(cppc_freq_inv, cpu); cppc_fi->cpu = cpu; cppc_fi->cpu_data = policy->driver_data; - kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn); - init_irq_work(&cppc_fi->irq_work, cppc_irq_work); + if (cppc_perf_ctrs_in_pcc_cpu(cpu)) { + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn); + init_irq_work(&cppc_fi->irq_work, cppc_irq_work); + sftd = &cppc_sftd_pcc; + } ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs); @@ -155,7 +177,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) } /* Register for freq-invariance */ - topology_set_scale_freq_source(&cppc_sftd, policy->cpus); + topology_set_scale_freq_source(sftd, policy->cpus); } /* @@ -178,6 +200,8 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy) topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, 
policy->related_cpus); for_each_cpu(cpu, policy->related_cpus) { + if (!cppc_perf_ctrs_in_pcc_cpu(cpu)) + continue; cppc_fi = &per_cpu(cppc_freq_inv, cpu); irq_work_sync(&cppc_fi->irq_work); kthread_cancel_work_sync(&cppc_fi->work); @@ -206,6 +230,7 @@ static void cppc_fie_kworker_init(void) pr_warn("%s: failed to create kworker_fie: %ld\n", __func__, PTR_ERR(kworker_fie)); fie_disabled = FIE_DISABLED; + kworker_fie = NULL; return; } @@ -215,20 +240,24 @@ static void cppc_fie_kworker_init(void) ret); kthread_destroy_worker(kworker_fie); fie_disabled = FIE_DISABLED; + kworker_fie = NULL; } } static void __init cppc_freq_invariance_init(void) { - if (fie_disabled != FIE_ENABLED && fie_disabled != FIE_DISABLED) { - fie_disabled = FIE_ENABLED; - if (cppc_perf_ctrs_in_pcc()) { + bool perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc(); + + if (fie_disabled == FIE_UNSET) { + if (perf_ctrs_in_pcc) { pr_info("FIE not enabled on systems with registers in PCC\n"); fie_disabled = FIE_DISABLED; + } else { + fie_disabled = FIE_ENABLED; } } - if (fie_disabled) + if (fie_disabled || !perf_ctrs_in_pcc) return; cppc_fie_kworker_init(); @@ -236,10 +265,8 @@ static void __init cppc_freq_invariance_init(void) static void cppc_freq_invariance_exit(void) { - if (fie_disabled) - return; - - kthread_destroy_worker(kworker_fie); + if (kworker_fie) + kthread_destroy_worker(kworker_fie); } #else From 11af6e102d31433e3084d6d6cdb2b2fe6c23d1a9 Mon Sep 17 00:00:00 2001 From: Yilin Chen <1479826151@qq.com> Date: Mon, 12 Jan 2026 16:00:47 +0800 Subject: [PATCH 46/65] rust: cpumask: rename methods of Cpumask for clarity and consistency Rename `as_ref` and `as_mut_ref` to `from_raw` and `from_raw_mut` to align with the established naming convention for constructing types from raw pointers in the kernel's Rust codebase. 
Signed-off-by: Yilin Chen <1479826151@qq.com> Reviewed-by: Gary Guo Reviewed-by: Alice Ryhl Signed-off-by: Viresh Kumar --- rust/kernel/cpumask.rs | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rust/kernel/cpumask.rs b/rust/kernel/cpumask.rs index c1d17826ae7b..44bb36636ee3 100644 --- a/rust/kernel/cpumask.rs +++ b/rust/kernel/cpumask.rs @@ -39,7 +39,7 @@ /// fn set_clear_cpu(ptr: *mut bindings::cpumask, set_cpu: CpuId, clear_cpu: CpuId) { /// // SAFETY: The `ptr` is valid for writing and remains valid for the lifetime of the /// // returned reference. -/// let mask = unsafe { Cpumask::as_mut_ref(ptr) }; +/// let mask = unsafe { Cpumask::from_raw_mut(ptr) }; /// /// mask.set(set_cpu); /// mask.clear(clear_cpu); @@ -49,13 +49,13 @@ pub struct Cpumask(Opaque); impl Cpumask { - /// Creates a mutable reference to an existing `struct cpumask` pointer. + /// Creates a mutable reference from an existing `struct cpumask` pointer. /// /// # Safety /// /// The caller must ensure that `ptr` is valid for writing and remains valid for the lifetime /// of the returned reference. - pub unsafe fn as_mut_ref<'a>(ptr: *mut bindings::cpumask) -> &'a mut Self { + pub unsafe fn from_raw_mut<'a>(ptr: *mut bindings::cpumask) -> &'a mut Self { // SAFETY: Guaranteed by the safety requirements of the function. // // INVARIANT: The caller ensures that `ptr` is valid for writing and remains valid for the @@ -63,13 +63,13 @@ pub unsafe fn as_mut_ref<'a>(ptr: *mut bindings::cpumask) -> &'a mut Self { unsafe { &mut *ptr.cast() } } - /// Creates a reference to an existing `struct cpumask` pointer. + /// Creates a reference from an existing `struct cpumask` pointer. /// /// # Safety /// /// The caller must ensure that `ptr` is valid for reading and remains valid for the lifetime /// of the returned reference. 
- pub unsafe fn as_ref<'a>(ptr: *const bindings::cpumask) -> &'a Self { + pub unsafe fn from_raw<'a>(ptr: *const bindings::cpumask) -> &'a Self { // SAFETY: Guaranteed by the safety requirements of the function. // // INVARIANT: The caller ensures that `ptr` is valid for reading and remains valid for the From 7b781899072c5701ef9538c365757ee9ab9c00bd Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Tue, 13 Jan 2026 16:25:35 +0100 Subject: [PATCH 47/65] cpufreq: dt-platdev: Block the driver from probing on more QC platforms Add a number of QC platforms to the blocklist; they all use the qcom-cpufreq-hw driver. Signed-off-by: Konrad Dybcio Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq-dt-platdev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index 4348eba6eb91..73b00c51f9e9 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -171,8 +171,11 @@ static const struct of_device_id blocklist[] __initconst = { { .compatible = "qcom,sdm845", }, { .compatible = "qcom,sdx75", }, { .compatible = "qcom,sm6115", }, + { .compatible = "qcom,sm6125", }, + { .compatible = "qcom,sm6150", }, { .compatible = "qcom,sm6350", }, { .compatible = "qcom,sm6375", }, + { .compatible = "qcom,sm7125", }, { .compatible = "qcom,sm7225", }, { .compatible = "qcom,sm7325", }, { .compatible = "qcom,sm8150", }, From 8c376f337a7e31c42949247e24eaad9a30d6c62c Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Tue, 13 Jan 2026 22:33:30 +0300 Subject: [PATCH 48/65] cpufreq: scmi: correct SCMI explanation SCMI stands for System Control and Management Interface, not System Control and Power Interface -- apparently, Sudeep Holla copied this line from his SCPI driver and then just forgot to update the acronym explanation...
:-) Fixes: 99d6bdf33877 ("cpufreq: add support for CPU DVFS based on SCMI message protocol") Signed-off-by: Sergey Shtylyov Reviewed-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/scmi-cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c index d2a110079f5f..e0e1756180b0 100644 --- a/drivers/cpufreq/scmi-cpufreq.c +++ b/drivers/cpufreq/scmi-cpufreq.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * System Control and Power Interface (SCMI) based CPUFreq Interface driver + * System Control and Management Interface (SCMI) based CPUFreq Interface driver * * Copyright (C) 2018-2021 ARM Ltd. * Sudeep Holla From 94dbce6c13cd7634f9bdb402248991c95a8c3d57 Mon Sep 17 00:00:00 2001 From: Juan Martinez Date: Fri, 16 Jan 2026 15:45:39 -0600 Subject: [PATCH 49/65] cpufreq/amd-pstate: Add comment explaining nominal_perf usage for performance policy Add comment explaining why nominal_perf is used for MinPerf when the CPU frequency policy is set to CPUFREQ_POLICY_PERFORMANCE, rather than using highest_perf or lowest_nonlinear_perf. Signed-off-by: Juan Martinez Signed-off-by: Viresh Kumar --- drivers/cpufreq/amd-pstate.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index c45bc98721d2..ec9f38b219de 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -636,6 +636,19 @@ static void amd_pstate_update_min_max_limit(struct cpufreq_policy *policy) WRITE_ONCE(cpudata->max_limit_freq, policy->max); if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) { + /* + * For performance policy, set MinPerf to nominal_perf rather than + * highest_perf or lowest_nonlinear_perf. + * + * Per commit 0c411b39e4f4c, using highest_perf was observed + * to cause frequency throttling on power-limited platforms, leading to + * performance regressions. 
Using lowest_nonlinear_perf would limit + * performance too much for HPC workloads requiring high frequency + * operation and minimal wakeup latency from idle states. + * + * nominal_perf therefore provides a balance by avoiding throttling + * while still maintaining enough performance for HPC workloads. + */ perf.min_limit_perf = min(perf.nominal_perf, perf.max_limit_perf); WRITE_ONCE(cpudata->min_limit_freq, min(cpudata->nominal_freq, cpudata->max_limit_freq)); } else { From 945fc28a06a1d30315ca416167754e10208024a5 Mon Sep 17 00:00:00 2001 From: Dhruva Gole Date: Tue, 20 Jan 2026 17:17:30 +0530 Subject: [PATCH 50/65] cpufreq: dt-platdev: Add ti,am62l3 to blocklist Add AM62L3 SoC to the dt-platdev blocklist to ensure proper handling of CPUFreq functionality. The AM62L3 will use its native TI CPUFreq driver implementation instead of the generic dt-platdev driver. This follows the same pattern as other TI SoCs like AM62A7, AM62D2, and AM62P5 which have been previously added to this blocklist. Reviewed-by: Kendall Willis Signed-off-by: Dhruva Gole Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq-dt-platdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index 73b00c51f9e9..4b0b6c521b36 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -196,6 +196,7 @@ static const struct of_device_id blocklist[] __initconst = { { .compatible = "ti,am625", }, { .compatible = "ti,am62a7", }, { .compatible = "ti,am62d2", }, + { .compatible = "ti,am62l3", }, { .compatible = "ti,am62p5", }, { .compatible = "qcom,ipq5332", }, From dea8bfea76e4bea9f727f777604d4053d7e9cd92 Mon Sep 17 00:00:00 2001 From: Dhruva Gole Date: Tue, 20 Jan 2026 17:17:31 +0530 Subject: [PATCH 51/65] cpufreq: ti-cpufreq: add support for AM62L3 SoC Add CPUFreq support for the AM62L3 SoC with the appropriate AM62L3 speed grade constants according to the datasheet [1]. 
This follows the same architecture-specific implementation pattern as other TI SoCs in the AM6x family. While at it, also sort instances where the SOC family names were not sorted alphabetically. [1] https://www.ti.com/lit/pdf/SPRSPA1 Signed-off-by: Dhruva Gole Reviewed-by: Kendall Willis Signed-off-by: Viresh Kumar --- drivers/cpufreq/ti-cpufreq.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index 6ee76f5fe9c5..3d1129aeed02 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -70,6 +70,12 @@ enum { #define AM62A7_SUPPORT_R_MPU_OPP BIT(1) #define AM62A7_SUPPORT_V_MPU_OPP BIT(2) +#define AM62L3_EFUSE_E_MPU_OPP 5 +#define AM62L3_EFUSE_O_MPU_OPP 15 + +#define AM62L3_SUPPORT_E_MPU_OPP BIT(0) +#define AM62L3_SUPPORT_O_MPU_OPP BIT(1) + #define AM62P5_EFUSE_O_MPU_OPP 15 #define AM62P5_EFUSE_S_MPU_OPP 19 #define AM62P5_EFUSE_T_MPU_OPP 20 @@ -213,6 +219,22 @@ static unsigned long am625_efuse_xlate(struct ti_cpufreq_data *opp_data, return calculated_efuse; } +static unsigned long am62l3_efuse_xlate(struct ti_cpufreq_data *opp_data, + unsigned long efuse) +{ + unsigned long calculated_efuse = AM62L3_SUPPORT_E_MPU_OPP; + + switch (efuse) { + case AM62L3_EFUSE_O_MPU_OPP: + calculated_efuse |= AM62L3_SUPPORT_O_MPU_OPP; + fallthrough; + case AM62L3_EFUSE_E_MPU_OPP: + calculated_efuse |= AM62L3_SUPPORT_E_MPU_OPP; + } + + return calculated_efuse; +} + static struct ti_cpufreq_soc_data am3x_soc_data = { .efuse_xlate = amx3_efuse_xlate, .efuse_fallback = AM33XX_800M_ARM_MPU_MAX_FREQ, @@ -313,8 +335,9 @@ static struct ti_cpufreq_soc_data am3517_soc_data = { static const struct soc_device_attribute k3_cpufreq_soc[] = { { .family = "AM62X", }, { .family = "AM62AX", }, - { .family = "AM62PX", }, { .family = "AM62DX", }, + { .family = "AM62LX", }, + { .family = "AM62PX", }, { /* sentinel */ } }; @@ -335,6 +358,14 @@ static struct 
ti_cpufreq_soc_data am62a7_soc_data = { .multi_regulator = false, }; +static struct ti_cpufreq_soc_data am62l3_soc_data = { + .efuse_xlate = am62l3_efuse_xlate, + .efuse_offset = 0x0, + .efuse_mask = 0x07c0, + .efuse_shift = 0x6, + .multi_regulator = false, +}; + static struct ti_cpufreq_soc_data am62p5_soc_data = { .efuse_xlate = am62p5_efuse_xlate, .efuse_offset = 0x0, @@ -463,6 +494,7 @@ static const struct of_device_id ti_cpufreq_of_match[] __maybe_unused = { { .compatible = "ti,am625", .data = &am625_soc_data, }, { .compatible = "ti,am62a7", .data = &am62a7_soc_data, }, { .compatible = "ti,am62d2", .data = &am62a7_soc_data, }, + { .compatible = "ti,am62l3", .data = &am62l3_soc_data, }, { .compatible = "ti,am62p5", .data = &am62p5_soc_data, }, /* legacy */ { .compatible = "ti,omap3430", .data = &omap34xx_soc_data, }, From 0b7fbf9333fa4699a53145bad8ce74ea986caa13 Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Wed, 21 Jan 2026 23:32:06 +0800 Subject: [PATCH 52/65] cpufreq: scmi: Fix device_node reference leak in scmi_cpu_domain_id() When calling of_parse_phandle_with_args(), the caller is responsible for calling of_node_put() to release the reference on the device node. scmi_cpu_domain_id() does not release that reference.
Fixes: e336baa4193e ("cpufreq: scmi: Prepare to move OF parsing of domain-id to cpufreq") Signed-off-by: Felix Gu Signed-off-by: Viresh Kumar --- drivers/cpufreq/scmi-cpufreq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c index e0e1756180b0..c7a3b038385b 100644 --- a/drivers/cpufreq/scmi-cpufreq.c +++ b/drivers/cpufreq/scmi-cpufreq.c @@ -101,6 +101,7 @@ static int scmi_cpu_domain_id(struct device *cpu_dev) return -EINVAL; } + of_node_put(domain_id.np); return domain_id.args[0]; } From 4a1cf5ed51b1b6049d7771d2e77789b99dafc8ae Mon Sep 17 00:00:00 2001 From: Sumit Gupta Date: Tue, 20 Jan 2026 20:26:15 +0530 Subject: [PATCH 53/65] cpufreq: CPPC: Add generic helpers for sysfs show/store Add generic helper functions for u64 sysfs attributes that follow the common pattern of calling CPPC get/set APIs: - cppc_cpufreq_sysfs_show_u64(): reads value and handles -EOPNOTSUPP - cppc_cpufreq_sysfs_store_u64(): parses input and calls set function Add CPPC_CPUFREQ_ATTR_RW_U64() macro to generate show/store functions using these helpers, reducing boilerplate for simple attributes. Convert auto_act_window and energy_performance_preference_val to use the new macro. No functional changes. Signed-off-by: Sumit Gupta Reviewed-by: Lifeng Zheng [ rjw: Retained empty code line after a conditional ] Link: https://patch.msgid.link/20260120145623.2959636-2-sumitg@nvidia.com Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/cppc_cpufreq.c | 72 +++++++++++++--------------------- 1 file changed, 27 insertions(+), 45 deletions(-) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 36e8a75a37f1..7e8042efedd1 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -863,14 +863,13 @@ static ssize_t store_auto_select(struct cpufreq_policy *policy, return count; } -static ssize_t show_auto_act_window(struct cpufreq_policy *policy, char *buf) +static ssize_t cppc_cpufreq_sysfs_show_u64(unsigned int cpu, + int (*get_func)(int, u64 *), + char *buf) { u64 val; - int ret; + int ret = get_func((int)cpu, &val); - ret = cppc_get_auto_act_window(policy->cpu, &val); - - /* show "" when this register is not supported by cpc */ if (ret == -EOPNOTSUPP) return sysfs_emit(buf, "\n"); @@ -880,42 +879,9 @@ static ssize_t show_auto_act_window(struct cpufreq_policy *policy, char *buf) return sysfs_emit(buf, "%llu\n", val); } -static ssize_t store_auto_act_window(struct cpufreq_policy *policy, - const char *buf, size_t count) -{ - u64 usec; - int ret; - - ret = kstrtou64(buf, 0, &usec); - if (ret) - return ret; - - ret = cppc_set_auto_act_window(policy->cpu, usec); - if (ret) - return ret; - - return count; -} - -static ssize_t show_energy_performance_preference_val(struct cpufreq_policy *policy, char *buf) -{ - u64 val; - int ret; - - ret = cppc_get_epp_perf(policy->cpu, &val); - - /* show "" when this register is not supported by cpc */ - if (ret == -EOPNOTSUPP) - return sysfs_emit(buf, "\n"); - - if (ret) - return ret; - - return sysfs_emit(buf, "%llu\n", val); -} - -static ssize_t store_energy_performance_preference_val(struct cpufreq_policy *policy, - const char *buf, size_t count) +static ssize_t cppc_cpufreq_sysfs_store_u64(unsigned int cpu, + int (*set_func)(int, u64), + const char *buf, size_t count) { u64 val; int ret; @@ -924,13 +890,29 @@ static ssize_t store_energy_performance_preference_val(struct cpufreq_policy 
*po if (ret) return ret; - ret = cppc_set_epp(policy->cpu, val); - if (ret) - return ret; + ret = set_func((int)cpu, val); - return count; + return ret ? ret : count; } +#define CPPC_CPUFREQ_ATTR_RW_U64(_name, _get_func, _set_func) \ +static ssize_t show_##_name(struct cpufreq_policy *policy, char *buf) \ +{ \ + return cppc_cpufreq_sysfs_show_u64(policy->cpu, _get_func, buf);\ +} \ +static ssize_t store_##_name(struct cpufreq_policy *policy, \ + const char *buf, size_t count) \ +{ \ + return cppc_cpufreq_sysfs_store_u64(policy->cpu, _set_func, \ + buf, count); \ +} + +CPPC_CPUFREQ_ATTR_RW_U64(auto_act_window, cppc_get_auto_act_window, + cppc_set_auto_act_window) + +CPPC_CPUFREQ_ATTR_RW_U64(energy_performance_preference_val, + cppc_get_epp_perf, cppc_set_epp) + cpufreq_freq_attr_ro(freqdomain_cpus); cpufreq_freq_attr_rw(auto_select); cpufreq_freq_attr_rw(auto_act_window); From 1081c1649da989ef9cbc01ffa99babc190df6077 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 26 Jan 2026 21:03:57 +0100 Subject: [PATCH 54/65] PM: hibernate: Drop NULL pointer checks before acomp_request_free() Since acomp_request_free() checks its argument against NULL, the NULL pointer checks before calling it added by commit 7966cf0ebe32 ("PM: hibernate: Fix crash when freeing invalid crypto compressor") are redundant, so drop them. No intentional functional impact. Signed-off-by: Rafael J.
Wysocki Link: https://patch.msgid.link/6233709.lOV4Wx5bFT@rafael.j.wysocki --- kernel/power/swap.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/power/swap.c b/kernel/power/swap.c index 8050e5182835..7e462957c9bf 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -902,8 +902,8 @@ static int save_compressed_image(struct swap_map_handle *handle, for (thr = 0; thr < nr_threads; thr++) { if (data[thr].thr) kthread_stop(data[thr].thr); - if (data[thr].cr) - acomp_request_free(data[thr].cr); + + acomp_request_free(data[thr].cr); if (!IS_ERR_OR_NULL(data[thr].cc)) crypto_free_acomp(data[thr].cc); @@ -1502,8 +1502,8 @@ static int load_compressed_image(struct swap_map_handle *handle, for (thr = 0; thr < nr_threads; thr++) { if (data[thr].thr) kthread_stop(data[thr].thr); - if (data[thr].cr) - acomp_request_free(data[thr].cr); + + acomp_request_free(data[thr].cr); if (!IS_ERR_OR_NULL(data[thr].cc)) crypto_free_acomp(data[thr].cc); From cc764d3bbd545d7d6f5f66ac678ffc522d75f0f9 Mon Sep 17 00:00:00 2001 From: Pengjie Zhang Date: Fri, 16 Jan 2026 17:46:23 +0800 Subject: [PATCH 55/65] cpufreq: userspace: make scaling_setspeed return the actual requested frequency According to the Linux kernel ABI documentation for 'scaling_setspeed': "It returns the last frequency requested by the governor (in kHz) or can be written to in order to set a new frequency for the policy." However, the current implementation of show_speed() returns 'policy->cur'. 'policy->cur' represents the frequency after the driver has resolved the request against the hardware frequency table and applied policy limits (min/max). This creates a discrepancy between the documentation/user expectation and the actual code behavior. For instance: 1. User writes a value to 'scaling_setspeed' that is not in the OPP table (e.g., user asks for A, driver rounds it to B). 2. User reads 'scaling_setspeed'. 3. Code returns B ('policy->cur'). 4. 
User expects A (the "frequency requested"), but gets B. This patch changes show_speed() to return 'userspace->setspeed', which stores the actual value last requested by the user. This restores the read/write symmetry of the attribute and aligns the code with the ABI description. The effective frequency can still be observed via 'scaling_cur_freq' or 'cpuinfo_cur_freq', preserving the distinction between "what was requested" (setspeed) and "what is effective" (cur_freq). Signed-off-by: Pengjie Zhang Acked-by: Viresh Kumar Acked-by: lihuisong@huawei.com Link: https://patch.msgid.link/20260116094623.2980031-1-zhangpengjie2@huawei.com Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq_userspace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq_userspace.c b/drivers/cpufreq/cpufreq_userspace.c index 77d62152cd38..4bd62e6c5c51 100644 --- a/drivers/cpufreq/cpufreq_userspace.c +++ b/drivers/cpufreq/cpufreq_userspace.c @@ -49,7 +49,9 @@ static int cpufreq_set(struct cpufreq_policy *policy, unsigned int freq) static ssize_t show_speed(struct cpufreq_policy *policy, char *buf) { - return sprintf(buf, "%u\n", policy->cur); + struct userspace_policy *userspace = policy->governor_data; + + return sprintf(buf, "%u\n", userspace->setspeed); } static int cpufreq_userspace_policy_init(struct cpufreq_policy *policy) From a554a25e66efea0b78fb3d24f4f19289e037c0dc Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 28 Jan 2026 17:05:27 +0100 Subject: [PATCH 56/65] cpufreq: ondemand: Simplify idle cputime granularity test cpufreq calls get_cpu_idle_time_us() just to know if idle cputime accounting has nanosecond granularity. Use the appropriate indicator instead to make that deduction. Signed-off-by: Frederic Weisbecker Link: https://patch.msgid.link/aXozx0PXutnm8ECX@localhost.localdomain Signed-off-by: Rafael J.
Wysocki --- drivers/cpufreq/cpufreq_ondemand.c | 7 +------ include/linux/tick.h | 2 ++ kernel/time/hrtimer.c | 2 +- kernel/time/tick-internal.h | 2 -- kernel/time/tick-sched.c | 8 +++++++- kernel/time/timer.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index a6ecc203f7b7..bb7db82930e4 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -334,17 +334,12 @@ static void od_free(struct policy_dbs_info *policy_dbs) static int od_init(struct dbs_data *dbs_data) { struct od_dbs_tuners *tuners; - u64 idle_time; - int cpu; tuners = kzalloc(sizeof(*tuners), GFP_KERNEL); if (!tuners) return -ENOMEM; - cpu = get_cpu(); - idle_time = get_cpu_idle_time_us(cpu, NULL); - put_cpu(); - if (idle_time != -1ULL) { + if (tick_nohz_is_active()) { /* Idle micro accounting is supported. Use finer thresholds */ dbs_data->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD; } else { diff --git a/include/linux/tick.h b/include/linux/tick.h index ac76ae9fa36d..738007d6f577 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -126,6 +126,7 @@ enum tick_dep_bits { #ifdef CONFIG_NO_HZ_COMMON extern bool tick_nohz_enabled; +extern bool tick_nohz_is_active(void); extern bool tick_nohz_tick_stopped(void); extern bool tick_nohz_tick_stopped_cpu(int cpu); extern void tick_nohz_idle_stop_tick(void); @@ -142,6 +143,7 @@ extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time); #else /* !CONFIG_NO_HZ_COMMON */ #define tick_nohz_enabled (0) +static inline bool tick_nohz_is_active(void) { return false; } static inline int tick_nohz_tick_stopped(void) { return 0; } static inline int tick_nohz_tick_stopped_cpu(int cpu) { return 0; } static inline void tick_nohz_idle_stop_tick(void) { } diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 0e4bc1ca15ff..1caf02a72ba8 100644 --- 
a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -943,7 +943,7 @@ void clock_was_set(unsigned int bases) cpumask_var_t mask; int cpu; - if (!hrtimer_hres_active(cpu_base) && !tick_nohz_active) + if (!hrtimer_hres_active(cpu_base) && !tick_nohz_is_active()) goto out_timerfd; if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) { diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 4e4f7bbe2a64..597d816d22e8 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -156,7 +156,6 @@ static inline void tick_nohz_init(void) { } #endif #ifdef CONFIG_NO_HZ_COMMON -extern unsigned long tick_nohz_active; extern void timers_update_nohz(void); extern u64 get_jiffies_update(unsigned long *basej); # ifdef CONFIG_SMP @@ -171,7 +170,6 @@ extern void timer_expire_remote(unsigned int cpu); # endif #else /* CONFIG_NO_HZ_COMMON */ static inline void timers_update_nohz(void) { } -#define tick_nohz_active (0) #endif DECLARE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 2f8a7923fa27..72e39c793117 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -693,7 +693,7 @@ void __init tick_nohz_init(void) * NO HZ enabled ? 
*/ bool tick_nohz_enabled __read_mostly = true; -unsigned long tick_nohz_active __read_mostly; +static unsigned long tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -704,6 +704,12 @@ static int __init setup_tick_nohz(char *str) __setup("nohz=", setup_tick_nohz); +bool tick_nohz_is_active(void) +{ + return tick_nohz_active; +} +EXPORT_SYMBOL_GPL(tick_nohz_is_active); + bool tick_nohz_tick_stopped(void) { struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 1f2364126894..7e1e3bde6b8b 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -281,7 +281,7 @@ DEFINE_STATIC_KEY_FALSE(timers_migration_enabled); static void timers_update_migration(void) { - if (sysctl_timer_migration && tick_nohz_active) + if (sysctl_timer_migration && tick_nohz_is_active()) static_branch_enable(&timers_migration_enabled); else static_branch_disable(&timers_migration_enabled); From f36de72673ad80c9931c0b411df0d6ef184f6c22 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 29 Jan 2026 21:49:12 +0100 Subject: [PATCH 57/65] cpuidle: governors: teo: Adjust the classification of wakeup events If differences between target residency values of adjacent idle states of a given CPU are relatively large, the corresponding idle state bins used by the teo governor are large too and the rule by which hits are distinguished from intercepts is inaccurate. Namely, by that rule, a wakeup event is classified as a hit if the sleep length (the time till the closest timer other than the tick) and the measured idle duration, adjusted for the entered idle state exit latency, fall into the same idle state bin. However, if that bin is large enough, the actual difference between the sleep length and the measured idle duration may be significant. It may in fact be significantly greater than the analogous difference for an event where the sleep length and the measured idle duration fall into different bins.
For this reason, amend the rule in question with a check that will only allow a wakeup event to be counted as a hit if the sleep length is less than the "raw" measured idle duration (which means that the wakeup appears to have occurred after the anticipated timer event). Otherwise, the event will be counted as an intercept. Also update the documentation part explaining the difference between "hits" and "intercepts" to take the above change into account. Signed-off-by: Rafael J. Wysocki Reviewed-by: Christian Loehle Link: https://patch.msgid.link/5093379.31r3eYUQgx@rafael.j.wysocki --- drivers/cpuidle/governors/teo.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 750ab0678a77..34b769b37a86 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -48,12 +48,11 @@ * in accordance with what happened last time. * * The "hits" metric reflects the relative frequency of situations in which the - * sleep length and the idle duration measured after CPU wakeup fall into the - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep - * length). In turn, the "intercepts" metric reflects the relative frequency of - * non-timer wakeup events for which the measured idle duration falls into a bin - * that corresponds to an idle state shallower than the one whose bin is fallen - * into by the sleep length (these events are also referred to as "intercepts" + * sleep length and the idle duration measured after CPU wakeup are close enough + * (that is, the CPU appears to wake up "on time" relative to the sleep length). + * In turn, the "intercepts" metric reflects the relative frequency of non-timer + * wakeup events for which the measured idle duration is significantly different + * from the sleep length (these events are also referred to as "intercepts" * below). 
* * The governor also counts "intercepts" with the measured idle duration below @@ -167,6 +166,7 @@ static void teo_decay(unsigned int *metric) */ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { + s64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus); int i, idx_timer = 0, idx_duration = 0; s64 target_residency_ns, measured_ns; @@ -182,8 +182,6 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) */ measured_ns = S64_MAX; } else { - s64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; - measured_ns = dev->last_residency_ns; /* * The delay between the wakeup and the first instruction @@ -253,12 +251,17 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) } /* - * If the measured idle duration falls into the same bin as the sleep - * length, this is a "hit", so update the "hits" metric for that bin. + * If the measured idle duration (adjusted for the entered state exit + * latency) falls into the same bin as the sleep length and the latter + * is less than the "raw" measured idle duration (so the wakeup appears + * to have occurred after the anticipated timer event), this is a "hit", + * so update the "hits" metric for that bin. + * * Otherwise, update the "intercepts" metric for the bin fallen into by * the measured idle duration. */ - if (idx_timer == idx_duration) { + if (idx_timer == idx_duration && + cpu_data->sleep_length_ns - measured_ns < lat_ns / 2) { cpu_data->state_bins[idx_timer].hits += PULSE; } else { cpu_data->state_bins[idx_duration].intercepts += PULSE; From a971f984b8455db0ef23910442029cdad53bc459 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 29 Jan 2026 21:51:11 +0100 Subject: [PATCH 58/65] cpuidle: governors: teo: Refine intercepts-based idle state lookup There are cases in which decisions made by the teo governor are arguably overly conservative. 
For instance, suppose that there are 4 idle states and the values of the intercepts metric for the first 3 of them are 400, 250, and 251, respectively. If the total sum computed in teo_update() is 1000, the governor will select idle state 1 (provided that all idle states are enabled and the scheduler tick has not been stopped) although arguably idle state 0 would be a better choice because the likelihood of getting an idle duration below the target residency of idle state 1 is greater than the likelihood of getting an idle duration between the target residency of idle state 1 and the target residency of idle state 2. To address this, refine the candidate idle state lookup based on intercepts to start at the state with the maximum intercepts metric, below the deepest enabled one, to avoid the cases in which the search may stop before reaching that state. Signed-off-by: Rafael J. Wysocki Reviewed-by: Christian Loehle [ rjw: Fixed typo "intercetps" in new comments (3 places) ] Link: https://patch.msgid.link/2417298.ElGaqSPkdT@rafael.j.wysocki Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 50 ++++++++++++++++++++++++++++----- 1 file changed, 43 insertions(+), 7 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 34b769b37a86..80f3ba942a06 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -74,12 +74,17 @@ * than the candidate one (it represents the cases in which the CPU was * likely woken up by a non-timer wakeup source). * + * Also find the idle state with the maximum intercepts metric (if there are + * multiple states with the maximum intercepts metric, choose the one with + * the highest index). + * * 2. If the second sum computed in step 1 is greater than a half of the sum of * both metrics for the candidate state bin and all subsequent bins (if any), * a shallower idle state is likely to be more suitable, so look for it. 
* * - Traverse the enabled idle states shallower than the candidate one in the - * descending order. + * descending order, starting at the state with the maximum intercepts + * metric found in step 1. * * - For each of them compute the sum of the "intercepts" metrics over all * of the idle states between it and the candidate one (including the @@ -308,8 +313,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, ktime_t delta_tick = TICK_NSEC / 2; unsigned int idx_intercept_sum = 0; unsigned int intercept_sum = 0; + unsigned int intercept_max = 0; unsigned int idx_hit_sum = 0; unsigned int hit_sum = 0; + int intercept_max_idx = -1; int constraint_idx = 0; int idx0 = 0, idx = -1; s64 duration_ns; @@ -340,17 +347,32 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (!dev->states_usage[0].disable) idx = 0; - /* Compute the sums of metrics for early wakeup pattern detection. */ + /* + * Compute the sums of metrics for early wakeup pattern detection and + * look for the state bin with the maximum intercepts metric below the + * deepest enabled one (if there are multiple states with the maximum + * intercepts metric, choose the one with the highest index). + */ for (i = 1; i < drv->state_count; i++) { struct teo_bin *prev_bin = &cpu_data->state_bins[i-1]; + unsigned int prev_intercepts = prev_bin->intercepts; struct cpuidle_state *s = &drv->states[i]; /* * Update the sums of idle state metrics for all of the states * shallower than the current one. */ - intercept_sum += prev_bin->intercepts; hit_sum += prev_bin->hits; + intercept_sum += prev_intercepts; + /* + * Check if this is the bin with the maximum number of + * intercepts so far and in that case update the index of + * the state with the maximum intercepts metric. 
+ */ + if (prev_intercepts >= intercept_max) { + intercept_max = prev_intercepts; + intercept_max_idx = i - 1; + } if (dev->states_usage[i].disable) continue; @@ -414,9 +436,22 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, } /* - * Look for the deepest idle state whose target residency had - * not exceeded the idle duration in over a half of the relevant - * cases in the past. + * If the minimum state index is greater than or equal to the + * index of the state with the maximum intercepts metric and + * the corresponding state is enabled, there is no need to look + * at the deeper states. + */ + if (min_idx >= intercept_max_idx && + !dev->states_usage[min_idx].disable) { + idx = min_idx; + goto constraint; + } + + /* + * Look for the deepest enabled idle state, at most as deep as + * the one with the maximum intercepts metric, whose target + * residency had not been greater than the idle duration in over + * a half of the relevant cases in the past. * * Take the possible duration limitation present if the tick * has been stopped already into account. @@ -428,7 +463,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, continue; idx = i; - if (2 * intercept_sum > idx_intercept_sum) + if (2 * intercept_sum > idx_intercept_sum && + i <= intercept_max_idx) break; } } From e79eec6ca1f5a3dbd804b73fd313b3fe455df4f3 Mon Sep 17 00:00:00 2001 From: Patrick Little Date: Wed, 28 Jan 2026 16:33:11 -0600 Subject: [PATCH 59/65] Documentation: Fix typos in energy model documentation Fix typos in documentation related to energy model management. Signed-off-by: Patrick Little Acked-by: Randy Dunlap [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260128-documentation-fix-grammar-v1-1-39238dc471f9@gmail.com Signed-off-by: Rafael J. 
Wysocki --- Documentation/power/energy-model.rst | 14 +++++++------- Documentation/scheduler/sched-energy.rst | 8 ++++---- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index cbdf7520aaa6..65133187f2ad 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -14,8 +14,8 @@ subsystems willing to use that information to make energy-aware decisions. The source of the information about the power consumed by devices can vary greatly from one platform to another. These power costs can be estimated using devicetree data in some cases. In others, the firmware will know better. -Alternatively, userspace might be best positioned. And so on. In order to avoid -each and every client subsystem to re-implement support for each and every +Alternatively, userspace might be best positioned. In order to avoid +having each and every client subsystem re-implement support for each and every possible source of information on its own, the EM framework intervenes as an abstraction layer which standardizes the format of power cost tables in the kernel, hence enabling to avoid redundant work. @@ -32,7 +32,7 @@ be found in the Intelligent Power Allocation in Documentation/driver-api/thermal/power_allocator.rst. Kernel subsystems might implement automatic detection to check whether EM registered devices have inconsistent scale (based on EM internal flag). -Important thing to keep in mind is that when the power values are expressed in +An important thing to keep in mind is that when the power values are expressed in an 'abstract scale' deriving real energy in micro-Joules would not be possible. The figure below depicts an example of drivers (Arm-specific here, but the @@ -82,7 +82,7 @@ using kref mechanism. The device driver which provided the new EM at runtime, should call EM API to free it safely when it's no longer needed. 
The EM framework will handle the clean-up when it's possible. -The kernel code which want to modify the EM values is protected from concurrent +The kernel code which wants to modify the EM values is protected from concurrent access using a mutex. Therefore, the device driver code must run in sleeping context when it tries to modify the EM. @@ -113,7 +113,7 @@ Registration of 'advanced' EM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The 'advanced' EM gets its name due to the fact that the driver is allowed -to provide more precised power model. It's not limited to some implemented math +to provide a more precise power model. It's not limited to some implemented math formula in the framework (like it is in 'simple' EM case). It can better reflect the real power measurements performed for each performance state. Thus, this registration method should be preferred in case considering EM static power @@ -172,7 +172,7 @@ Registration of 'simple' EM ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The 'simple' EM is registered using the framework helper function -cpufreq_register_em_with_opp(). It implements a power model which is tight to +cpufreq_register_em_with_opp(). It implements a power model which is tied to a math formula:: Power = C * V^2 * f @@ -251,7 +251,7 @@ It returns the 'struct em_perf_state' pointer which is an array of performance states in ascending order. This function must be called in the RCU read lock section (after the rcu_read_lock()). When the EM table is not needed anymore there is a need to -call rcu_real_unlock(). In this way the EM safely uses the RCU read section +call rcu_read_unlock(). In this way the EM safely uses the RCU read section and protects the users. It also allows the EM framework to manage the memory and free it. More details how to use it can be found in Section 3.2 in the example driver. 
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst index 70e2921ef725..4e47aaf103eb 100644 --- a/Documentation/scheduler/sched-energy.rst +++ b/Documentation/scheduler/sched-energy.rst @@ -244,7 +244,7 @@ Example 2. From these calculations, the Case 1 has the lowest total energy. So CPU 1 - is be the best candidate from an energy-efficiency standpoint. + is the best candidate from an energy-efficiency standpoint. Big CPUs are generally more power hungry than the little ones and are thus used mainly when a task doesn't fit the littles. However, little CPUs aren't always @@ -252,7 +252,7 @@ necessarily more energy-efficient than big CPUs. For some systems, the high OPPs of the little CPUs can be less energy-efficient than the lowest OPPs of the bigs, for example. So, if the little CPUs happen to have enough utilization at a specific point in time, a small task waking up at that moment could be better -of executing on the big side in order to save energy, even though it would fit +off executing on the big side in order to save energy, even though it would fit on the little side. And even in the case where all OPPs of the big CPUs are less energy-efficient @@ -285,7 +285,7 @@ much that can be done by the scheduler to save energy without severely harming throughput. In order to avoid hurting performance with EAS, CPUs are flagged as 'over-utilized' as soon as they are used at more than 80% of their compute capacity. As long as no CPUs are over-utilized in a root domain, load balancing -is disabled and EAS overridess the wake-up balancing code. EAS is likely to load +is disabled and EAS overrides the wake-up balancing code. EAS is likely to load the most energy efficient CPUs of the system more than the others if that can be done without harming throughput. So, the load-balancer is disabled to prevent it from breaking the energy-efficient task placement found by EAS. 
It is safe to @@ -385,7 +385,7 @@ Using EAS with any other governor than schedutil is not supported. 6.5 Scale-invariant utilization signals ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In order to make accurate prediction across CPUs and for all performance +In order to make accurate predictions across CPUs and for all performance states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can be obtained using the architecture-defined arch_scale{cpu,freq}_capacity() callbacks. From 1c7442d10b031ace1b7f4902af48bdca465ca25f Mon Sep 17 00:00:00 2001 From: Patrick Little Date: Wed, 28 Jan 2026 16:33:12 -0600 Subject: [PATCH 60/65] PM: EM: Documentation: Fix bug in example code snippet A semicolon was mistakenly placed at the end of two 'if' statements. If the example is copied as-is, each subsequent return is executed unconditionally, which is incorrect, and the rest of the function is never reached. Signed-off-by: Patrick Little Acked-by: Randy Dunlap [ rjw: Subject adjustment ] Link: https://patch.msgid.link/20260128-documentation-fix-grammar-v1-2-39238dc471f9@gmail.com Signed-off-by: Rafael J. Wysocki --- Documentation/power/energy-model.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index 65133187f2ad..0d4644d72767 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -308,12 +308,12 @@ EM framework:: 05 06 /* Use the 'foo' protocol to ceil the frequency */ 07 freq = foo_get_freq_ceil(dev, *KHz); - 08 if (freq < 0); + 08 if (freq < 0) 09 return freq; 10 11 /* Estimate the power cost for the dev at the relevant freq.
*/ 12 power = foo_estimate_power(dev, freq); - 13 if (power < 0); + 13 if (power < 0) 14 return power; 15 16 /* Return the values to the EM framework */ From 75ce02f4bc9a8b8350b6b1b01872467b0cc960cc Mon Sep 17 00:00:00 2001 From: Samuel Wu Date: Fri, 23 Jan 2026 17:21:29 -0800 Subject: [PATCH 61/65] PM: wakeup: Handle empty list in wakeup_sources_walk_start() In the case of an empty wakeup_sources list, wakeup_sources_walk_start() will return an invalid but non-NULL address. This also affects wrappers of the aforementioned function, like for_each_wakeup_source(). Update wakeup_sources_walk_start() to return NULL in case of an empty list. Fixes: b4941adb24c0 ("PM: wakeup: Add routine to help fetch wakeup source object.") Signed-off-by: Samuel Wu [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260124012133.2451708-2-wusamuel@google.com Signed-off-by: Rafael J. Wysocki --- drivers/base/power/wakeup.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index 1e1a0e7eeac5..e69033d16fba 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -275,9 +275,7 @@ EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock); */ struct wakeup_source *wakeup_sources_walk_start(void) { - struct list_head *ws_head = &wakeup_sources; - - return list_entry_rcu(ws_head->next, struct wakeup_source, entry); + return list_first_or_null_rcu(&wakeup_sources, struct wakeup_source, entry); } EXPORT_SYMBOL_GPL(wakeup_sources_walk_start); From 1fedbb589448bee9f20bb2ed9c850d1d2cf9963c Mon Sep 17 00:00:00 2001 From: Yaxiong Tian Date: Tue, 3 Feb 2026 10:48:52 +0800 Subject: [PATCH 62/65] cpufreq: intel_pstate: Enable asym capacity only when CPU SMT is not possible According to the description in the intel_pstate.rst documentation, Capacity-Aware Scheduling and Energy-Aware Scheduling are only supported on a hybrid processor without SMT. 
Previously, the driver used sched_smt_active() for this check, which is not a reliable condition because users can switch SMT on or off via /sys at any time. This could lead to incorrect driver settings in certain scenarios. For example, on a CPU that supports SMT, a user can disable SMT via the nosmt parameter to enable asym capacity, and then re-enable SMT via /sys. In such cases, some settings in the driver would no longer be correct. To address this issue, replace sched_smt_active() with cpu_smt_possible(), and only enable asym capacity when CPU SMT is not possible. Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") Signed-off-by: Yaxiong Tian [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260203024852.301066-1-tianyaxiong@kylinos.cn Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index ec4abe374573..1625ec2d0d06 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1161,7 +1161,7 @@ static void hybrid_init_cpu_capacity_scaling(bool refresh) * the capacity of SMT threads is not deterministic even approximately, * do not do that when SMT is in use.
*/ - if (hwp_is_hybrid && !sched_smt_active() && arch_enable_hybrid_capacity_scale()) { + if (hwp_is_hybrid && !cpu_smt_possible() && arch_enable_hybrid_capacity_scale()) { hybrid_refresh_cpu_capacity_scaling(); /* * Disabling ITMT causes sched domains to be rebuilt to disable asym From 3bd1cde3dffbb29764453201e19c17053557a520 Mon Sep 17 00:00:00 2001 From: Yaxiong Tian Date: Tue, 3 Feb 2026 17:35:01 +0800 Subject: [PATCH 63/65] cpufreq: Documentation: Update description of rate_limit_us default value Commit 37c6dccd6837 ("cpufreq: Remove LATENCY_MULTIPLIER") changed the computation in cpufreq_policy_transition_delay_us(), so the original description of the 2 ms default has become inaccurate. Therefore, update the description of the default value for rate_limit_us from 2 ms to 1 ms. Signed-off-by: Yaxiong Tian [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260203093501.1138721-1-tianyaxiong@kylinos.cn Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/cpufreq.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst index 738d7b4dc33a..dbe6d23a5d67 100644 --- a/Documentation/admin-guide/pm/cpufreq.rst +++ b/Documentation/admin-guide/pm/cpufreq.rst @@ -439,7 +439,7 @@ This governor exposes only one tunable: ``rate_limit_us`` Minimum time (in microseconds) that has to pass between two consecutive runs of governor computations (default: 1.5 times the scaling driver's - transition latency or 1ms if the driver does not provide a latency value). The purpose of this tunable is to reduce the scheduler context overhead of the governor which might be excessive without it.
From 5c9ecd8e6437cd55a38ea4f1e1d19cee8e226cb8 Mon Sep 17 00:00:00 2001 From: Gui-Dong Han Date: Tue, 3 Feb 2026 11:19:43 +0800 Subject: [PATCH 64/65] PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races dev_pm_clear_wake_irq() currently uses a dangerous pattern where dev->power.wakeirq is read and checked for NULL outside the lock. If two callers invoke this function concurrently, both might see a valid pointer and proceed. This could result in a double-free when the second caller acquires the lock and tries to release the same object. Address this by removing the lockless check of dev->power.wakeirq. Instead, acquire dev->power.lock immediately to ensure the check and the subsequent operations are atomic. If dev->power.wakeirq is NULL under the lock, simply unlock and return. This guarantees that concurrent calls cannot race to free the same object. Based on a quick scan of current users, I did not find an actual bug as drivers seem to rely on their own synchronization. However, since asynchronous usage patterns exist (e.g., in drivers/net/wireless/ti/wlcore), I believe a race is theoretically possible if the API is used less carefully in the future. This change hardens the API to be robust against such cases. Fixes: 4990d4fe327b ("PM / Wakeirq: Add automated device wake IRQ handling") Signed-off-by: Gui-Dong Han Link: https://patch.msgid.link/20260203031943.1924-1-hanguidong02@gmail.com Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/wakeirq.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c index 8aa28c08b289..c0809d18fc54 100644 --- a/drivers/base/power/wakeirq.c +++ b/drivers/base/power/wakeirq.c @@ -83,13 +83,16 @@ EXPORT_SYMBOL_GPL(dev_pm_set_wake_irq); */ void dev_pm_clear_wake_irq(struct device *dev) { - struct wake_irq *wirq = dev->power.wakeirq; + struct wake_irq *wirq; unsigned long flags; - if (!wirq) - return; - spin_lock_irqsave(&dev->power.lock, flags); + wirq = dev->power.wakeirq; + if (!wirq) { + spin_unlock_irqrestore(&dev->power.lock, flags); + return; + } + device_wakeup_detach_irq(dev); dev->power.wakeirq = NULL; spin_unlock_irqrestore(&dev->power.lock, flags); From 0491f3f9f664e7e0131eb4d2a8b19c49562e5c64 Mon Sep 17 00:00:00 2001 From: Xuewen Yan Date: Wed, 4 Feb 2026 13:25:09 +0100 Subject: [PATCH 65/65] PM: sleep: core: Avoid bit field races related to work_in_progress In all of the system suspend transition phases, the async processing of a device may be carried out in parallel with power.work_in_progress updates for the device's parent or suppliers and if it touches bit fields from the same group (for example, power.must_resume or power.wakeup_path), bit field corruption is possible. To avoid that, turn work_in_progress in struct dev_pm_info into a proper bool field and relocate it to save space. Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children") Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Signed-off-by: Xuewen Yan Closes: https://lore.kernel.org/linux-pm/20260203063459.12808-1-xuewen.yan@unisoc.com/ Cc: All applicable [ rjw: Added subject and changelog ] Link: https://patch.msgid.link/CAB8ipk_VX2VPm706Jwa1=8NSA7_btWL2ieXmBgHr2JcULEP76g@mail.gmail.com Signed-off-by: Rafael J. 
Wysocki --- include/linux/pm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/pm.h b/include/linux/pm.h index 98a899858ece..afcaaa37a812 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -681,10 +681,10 @@ struct dev_pm_info { struct list_head entry; struct completion completion; struct wakeup_source *wakeup; + bool work_in_progress; /* Owned by the PM core */ bool wakeup_path:1; bool syscore:1; bool no_pm_callbacks:1; /* Owned by the PM core */ - bool work_in_progress:1; /* Owned by the PM core */ bool smart_suspend:1; /* Owned by the PM core */ bool must_resume:1; /* Owned by the PM core */ bool may_skip_resume:1; /* Set by subsystems */