linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-13 08:39:31 +02:00

Author	SHA1	Message	Date
Linus Torvalds	d7c8087a9c	Power management updates for 7.1-rc1 - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) - Refine stopped tick handling in the menu cpuidle governor and rearrange stopped tick handling in the teo cpuidle governor (Rafael Wysocki) - Add Panther Lake C-states table to the intel_idle driver (Artem Bityutskiy) - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha) - Simplify cpuidle_register_device() with guard() (Huisong Li) - Use performance level if available to distinguish between rates in OPP debugfs (Manivannan Sadhasivam) - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar) - Return -ENODATA if the snapshot image is not loaded (Alberto Garcia) - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric Biggers) - Clean up and rearrange the intel_rapl power capping driver to make the respective interface drivers (TPMI, MSR, and MMOI) hold their own settings and primitives and consolidate PL4 and PMU support flags into rapl_defaults (Kuppuswamy Sathyanarayanan) - Correct kernel-doc function parameter names in the power capping core code (Randy Dunlap) - Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko) - Use _visible attribute to replace create/remove_sysfs_files() in devfreq (Pengjie Zhang) - Add Tegra114 support to activity monitor device in tegra30-devfreq as a preparation to upcoming EMC controller support (Svyatoslav Ryhel) - Fix mistakes in cpupower man pages, add the boost and epp options to the cpupower-frequency-info man page, and add the perf-bias option to the cpupower-info man page (Roberto Ricci) - Remove unnecessary extern declarations from getopt.h in arguments parsing functions in cpufreq-set, cpuidle-info, cpuidle-set, cpupower-info, and cpupower-set utilities (Kaushlendra Kumar) -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmnY9TISHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1G9gH/j5mEqfPpiwX6fQ/ZwOGdNOOPVA5w9j4 KPHSMwMD5lZkoaZfasp2vt27KY5SOoVVvRZ2DKkFJ3Jai4I3cUPZYypga2nre1ag tgzX4vOjcw2r40Eda6ezWl1h4mca/xJJBX7xH2+hn1JY+Y1in37g50CqMIjKh96z Uugkk6UZytL1XcF55PMhIUgDf6pDtRT5UOW9xOKOkUt8FVWTJ7ei3HaWyV5kDmVq b5eQ42+OH7y6sWNnoKczFd8fStvh6J/avoJurBEvcOQhMcjaIaB48G19+KjDg73E NjrVcgG20P2rltBvV2d0J1TKskZHkaP7XjIeWfkwjGZhee3FL7ssS/g= =fRCO -----END PGP SIGNATURE----- Merge tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Once again, cpufreq is the most active development area, mostly because of the new feature additions and documentation updates in the amd-pstate driver, but there are also changes in the cpufreq core related to boost support and other assorted updates elsewhere. Next up are power capping changes due to the major cleanup of the Intel RAPL driver. On the cpuidle front, a new C-states table for Intel Panther Lake is added to the intel_idle driver, the stopped tick handling in the menu and teo governors is updated, and there are a couple of cleanups. Apart from the above, support for Tegra114 is added to devfreq and there are assorted cleanups of that code, there are also two updates of the operating performance points (OPP) library, two minor updates related to hibernation, and cpupower utility man pages updates and cleanups. Specifics: - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) - Refine stopped tick handling in the menu cpuidle governor and rearrange stopped tick handling in the teo cpuidle governor (Rafael Wysocki) - Add Panther Lake C-states table to the intel_idle driver (Artem Bityutskiy) - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha) - Simplify cpuidle_register_device() with guard() (Huisong Li) - Use performance level if available to distinguish between rates in OPP debugfs (Manivannan Sadhasivam) - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar) - Return -ENODATA if the snapshot image is not loaded (Alberto Garcia) - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric Biggers) - Clean up and rearrange the intel_rapl power capping driver to make the respective interface drivers (TPMI, MSR, and MMOI) hold their own settings and primitives and consolidate PL4 and PMU support flags into rapl_defaults (Kuppuswamy Sathyanarayanan) - Correct kernel-doc function parameter names in the power capping core code (Randy Dunlap) - Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko) - Use _visible attribute to replace create/remove_sysfs_files() in devfreq (Pengjie Zhang) - Add Tegra114 support to activity monitor device in tegra30-devfreq as a preparation to upcoming EMC controller support (Svyatoslav Ryhel) - Fix mistakes in cpupower man pages, add the boost and epp options to the cpupower-frequency-info man page, and add the perf-bias option to the cpupower-info man page (Roberto Ricci) - Remove unnecessary extern declarations from getopt.h in arguments parsing functions in cpufreq-set, cpuidle-info, cpuidle-set, cpupower-info, and cpupower-set utilities (Kaushlendra Kumar)" * tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP cpupower: remove extern declarations in cmd functions cpuidle: Simplify cpuidle_register_device() with guard() PM / devfreq: tegra30-devfreq: add support for Tegra114 PM / devfreq: use _visible attribute to replace create/remove_sysfs_files() PM / devfreq: Remove unneeded casting for HZ_PER_KHZ MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer cpufreq: Pass the policy to cpufreq_driver->adjust_perf() cpufreq/amd-pstate: Pass the policy to amd_pstate_update() cpufreq/amd-pstate-ut: Add a unit test for raw EPP cpufreq/amd-pstate: Add support for raw EPP writes cpufreq/amd-pstate: Add support for platform profile class cpufreq/amd-pstate: add kernel command line to override dynamic epp cpufreq/amd-pstate: Add dynamic energy performance preference Documentation: amd-pstate: fix dead links in the reference section cpufreq/amd-pstate: Cache the max frequency in cpudata Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count} Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file amd-pstate-ut: Add a testcase to validate the visibility of driver attributes ...	2026-04-13 19:47:52 -07:00
Pierre Gondois	6e39ba4e5a	cpufreq: Add boost_freq_req QoS request The Power Management Quality of Service (PM QoS) allows to aggregate constraints from multiple entities. It is currently used to manage the min/max frequency of a given policy. Frequency constraints can come for instance from: - Thermal framework: acpi_thermal_cpufreq_init() - Firmware: _PPC objects: acpi_processor_ppc_init() - User: by setting policyX/scaling_[min\|max]_freq The minimum of the max frequency constraints is used to compute the resulting maximum allowed frequency. When enabling boost frequencies, the same frequency request object (policy->max_freq_req) as to handle requests from users is used. As a result, when setting: - scaling_max_freq - boost The last sysfs file used overwrites the request from the other sysfs file. To avoid this, create a per-policy boost_freq_req to save the boost constraints instead of overwriting the last scaling_max_freq constraint. policy_set_boost() calls the cpufreq set_boost callback. Update the newly added boost_freq_req request from there: - whenever boost is toggled - to cover all possible paths In the existing .set_boost() callbacks: - Don't update policy->max as this is done through the qos notifier cpufreq_notifier_max() which calls cpufreq_set_policy(). - Remove freq_qos_update_request() calls as the qos request is now done in policy_set_boost() and updates the new boost_freq_req $ ## Init state scaling_max_freq:1000000 cpuinfo_max_freq:1000000 $ echo 700000 > scaling_max_freq scaling_max_freq:700000 cpuinfo_max_freq:1000000 $ echo 1 > ../boost scaling_max_freq:1200000 cpuinfo_max_freq:1200000 $ echo 800000 > scaling_max_freq scaling_max_freq:800000 cpuinfo_max_freq:1200000 $ ## Final step: $ ## Without the patches: $ echo 0 > ../boost scaling_max_freq:1000000 cpuinfo_max_freq:1000000 $ ## With the patches: $ echo 0 > ../boost scaling_max_freq:800000 cpuinfo_max_freq:1000000 Note: cpufreq_frequency_table_cpuinfo() updates policy->min and max from: A. cpufreq_boost_set_sw() \-cpufreq_frequency_table_cpuinfo() B. cpufreq_policy_online() \-cpufreq_table_validate_and_sort() \-cpufreq_frequency_table_cpuinfo() Keep these updates as some drivers expect policy->min and max to be set through B. Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20260326204404.1401849-3-pierre.gondois@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-03-30 21:56:52 +02:00
Pengjie Zhang	8505bfb4e4	ACPI: CPPC: Move reference performance to capabilities Currently, the `Reference Performance` register is read every time the CPU frequency is sampled in `cppc_get_perf_ctrs()`. This function is on the hot path of the cppc_cpufreq driver. Reference Performance indicates the performance level that corresponds to the Reference Counter incrementing and is not expected to change dynamically during runtime (unlike the Delivered and Reference counters). Reading this register in the hot path incurs unnecessary overhead, particularly on platforms where CPC registers are located in the PCC (Platform Communication Channel) subspace. This patch moves `reference_perf` from the dynamic feedback counters structure (`cppc_perf_fb_ctrs`) to the static capabilities structure (`cppc_perf_caps`). Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com> [ rjw: Changelog adjustment ] Link: https://patch.msgid.link/20260213100935.19111-1-zhangpengjie2@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-03-05 20:00:19 +01:00
Sumit Gupta	13c45a2663	ACPI: CPPC: add APIs and sysfs interface for perf_limited Add sysfs interface to read/write the Performance Limited register. The Performance Limited register indicates to the OS that an unpredictable event (like thermal throttling) has limited processor performance. It contains two sticky bits set by the platform: - Bit 0 (Desired_Excursion): Set when delivered performance is constrained below desired performance. Not used when Autonomous Selection is enabled. - Bit 1 (Minimum_Excursion): Set when delivered performance is constrained below minimum performance. These bits remain set until OSPM explicitly clears them. The write operation accepts a bitmask of bits to clear: - Write 0x1 to clear bit 0 - Write 0x2 to clear bit 1 - Write 0x3 to clear both bits This enables users to detect if platform throttling impacted a workload. Users clear the register before execution, run the workload, then check afterward - if set, hardware throttling occurred during that time window. The interface is exposed as: /sys/devices/system/cpu/cpuX/cpufreq/perf_limited Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-7-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-02-27 20:50:42 +01:00
Sumit Gupta	ea3db45ae4	cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks Update MIN_PERF and MAX_PERF registers from policy->min and policy->max in the .target() and .fast_switch() callbacks. This allows controlling performance bounds via standard scaling_min_freq and scaling_max_freq sysfs interfaces. Similar to intel_cpufreq which updates HWP min/max limits in .target(), cppc_cpufreq now programs MIN_PERF/MAX_PERF along with DESIRED_PERF. Since MIN_PERF/MAX_PERF can be updated even when auto_sel is disabled, they are updated unconditionally. Also program MIN_PERF/MAX_PERF in store_auto_select() when enabling autonomous selection so the platform uses correct bounds immediately. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Link: https://patch.msgid.link/20260206142658.72583-6-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-02-27 20:50:42 +01:00
Sumit Gupta	24ad4c6c13	cpufreq: CPPC: Update cached perf_ctrls on sysfs write Update the cached perf_ctrls values when writing via sysfs to keep them in sync with hardware registers: - store_auto_select(): update perf_ctrls.auto_sel - store_energy_performance_preference_val(): update perf_ctrls.energy_perf This ensures consistent cached values after sysfs writes, which complements the cppc_get_perf() initialization during policy setup. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-5-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-02-27 20:50:42 +01:00
Sumit Gupta	658fa7b1c4	ACPI: CPPC: Add cppc_get_perf() API to read performance controls Add cppc_get_perf() function to read values of performance control registers including desired_perf, min_perf, max_perf, energy_perf, and auto_sel. This provides a read interface to complement the existing cppc_set_perf() write interface for performance control registers. Note that auto_sel is read by cppc_get_perf() but not written by cppc_set_perf() to avoid unintended mode changes during performance updates. It can be updated with existing dedicated cppc_set_auto_sel() API. Use cppc_get_perf() in cppc_cpufreq_get_cpu_data() to initialize perf_ctrls with current hardware register values during cpufreq policy initialization. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-2-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-02-27 20:50:41 +01:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Sumit Gupta	4a1cf5ed51	cpufreq: CPPC: Add generic helpers for sysfs show/store Add generic helper functions for u64 sysfs attributes that follow the common pattern of calling CPPC get/set APIs: - cppc_cpufreq_sysfs_show_u64(): reads value and handles -EOPNOTSUPP - cppc_cpufreq_sysfs_store_u64(): parses input and calls set function Add CPPC_CPUFREQ_ATTR_RW_U64() macro to generate show/store functions using these helpers, reducing boilerplate for simple attributes. Convert auto_act_window and energy_performance_preference_val to use the new macro. No functional changes. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> [ rjw: Retained empty code line after a conditional ] Link: https://patch.msgid.link/20260120145623.2959636-2-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2026-01-28 19:10:50 +01:00
Jie Zhan	997c021abc	cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs Currently, the CPPC Frequency Invariance Engine (FIE) is invoked from the scheduler tick but defers the update of arch_freq_scale to a separate thread because cppc_get_perf_ctrs() would sleep if the CPC regs are in PCC. However, this deferred update mechanism is unnecessary and introduces extra overhead for non-PCC register spaces (e.g. System Memory or FFH), where accessing the regs won't sleep and can be safely performed from the tick context. Furthermore, with the CPPC FIE registered, it throws repeated warnings of "cppc_scale_freq_workfn: failed to read perf counters" on our platform with the CPC regs in System Memory and a power-down idle state enabled. That's because the remote CPU can be in a power-down idle state, and reading its perf counters returns 0. Moving the FIE handling back to the scheduler tick process makes the CPU handle its own perf counters, so it won't be idle and the issue would be inherently solved. To address the above issues, update arch_freq_scale directly in ticks for non-PCC regs and keep the deferred update mechanism for PCC regs. Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2026-01-27 11:21:23 +05:30
Jie Zhan	206b661255	cpufreq: CPPC: Factor out cppc_fie_kworker_init() Factor out the CPPC FIE kworker init in cppc_freq_invariance_init() because it's a standalone procedure for use when the CPC regs are in PCC channels. Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2026-01-27 11:21:23 +05:30
Jie Zhan	1971b18785	cpufreq: CPPC: Don't warn if FIE init fails to read counters During the CPPC FIE initialization, reading perf counters on offline cpus should be expected to fail. Don't warn on this case. Also, change the error log level to debug since FIE is optional. Co-developed-by: Bowen Yu <yubowen8@huawei.com> Signed-off-by: Bowen Yu <yubowen8@huawei.com> # Changing loglevel to debug Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> [ Viresh: Added back the dropped comment. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2025-10-28 10:40:47 +05:30
Rafael J. Wysocki	c28a280bd4	ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value Instead of using CPUFREQ_ETERNAL for signaling an error condition in cppc_get_transition_latency(), change the return value type of that function to int and make it return a proper negative error code on failures. No intentional functional impact. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Qais Yousef <qyousef@layalina.io>	2025-10-01 13:57:13 +02:00
Rafael J. Wysocki	f965d111e6	cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay If cppc_get_transition_latency() returns CPUFREQ_ETERNAL to indicate a failure to retrieve the transition latency value from the platform firmware, the CPPC cpufreq driver will use that value (converted to microseconds) as the policy transition delay, but it is way too large for any practical use. Address this by making the driver use the cpufreq's default transition latency value (in microseconds) as the transition delay if CPUFREQ_ETERNAL is returned by cppc_get_transition_latency(). Fixes: `d4f3388afd` ("cpufreq / CPPC: Set platform specific transition_delay_us") Cc: 5.19+ <stable@vger.kernel.org> # 5.19 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Qais Yousef <qyousef@layalina.io>	2025-10-01 13:56:57 +02:00
Zihuan Zhang	c8dc2368b2	cpufreq: CPPC: Use scope-based cleanup helper Replace the manual cpufreq_cpu_put() with __free(put_cpufreq_policy) annotation for policy references. This reduces the risk of reference counting mistakes and aligns the code with the latest kernel style. No functional change intended. Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn> [ Viresh: Minor changes ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2025-08-29 11:33:57 +05:30
BowenYu	16d39e2bf9	cpufreq: Remove unused parameter in cppc_perf_from_fbctrs() Remove the unused parameter cppc_cpudata* cpu_data in cppc_perf_from_fbctrs(). Signed-off-by: BowenYu <yubowen8@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2025-08-11 12:24:51 +05:30
Prashant Malani	0a1416a49e	cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag AMU counters on certain CPPC-based platforms tend to yield inaccurate delivered performance measurements on systems that are idle/mostly idle. This results in an inaccurate frequency being stored by cpufreq in its policy structure when the CPU is brought online. [1] Consequently, if the userspace governor tries to set the frequency to a new value, there is a possibility that it would be the erroneous value stored earlier. In such a scenario, cpufreq would assume that the requested frequency has already been set and return early, resulting in the correct/new frequency request never making it to the hardware. Since the operating frequency is liable to this sort of inconsistency, mark the CPPC driver with CPUFREQ_NEED_UPDATE_LIMITS so that it is always invoked when a target frequency update is requested. Link: https://lore.kernel.org/linux-pm/20250619000925.415528-3-pmalani@google.com/ [1] Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Prashant Malani <pmalani@google.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250722055611.130574-2-pmalani@google.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-22 12:13:12 +02:00
Lifeng Zheng	c83a92df2f	cpufreq: CPPC: Remove forward declaration of cppc_cpufreq_register_em() cppc_cpufreq_register_em() is only used in populate_efficiency_class(). A forward declaration of it is not necessary. Move cppc_cpufreq_register_em() in front of populate_efficiency_class() and remove the forward declaration of cppc_cpufreq_register_em(). No functional change. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250526113057.3086513-4-zhenglifeng1@huawei.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 21:03:52 +02:00
Lifeng Zheng	3d5978ea6c	cpufreq: CPPC: Do not return a value from populate_efficiency_class() The return value of populate_efficiency_class() is never needed and the result of it doesn't affect the initialization of cppc_cpufreq. It makes more sense to change it into a void function. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250526113057.3086513-3-zhenglifeng1@huawei.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 21:03:52 +02:00
Lifeng Zheng	d80a756240	cpufreq: CPPC: Remove cpu_data_list After commit `a28b2bfc09` ("cppc_cpufreq: replace per-cpu data array with a list"), cpu_data can be got from policy->driver_data, so cpu_data_list is not actually needed and can be removed. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250526113057.3086513-2-zhenglifeng1@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-06-18 21:03:52 +02:00
Lifeng Zheng	922607a2b4	cpufreq: CPPC: Add support for autonomous selection Add sysfs interfaces for CPPC autonomous selection in the cppc_cpufreq driver. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Reviewed-by: Sumit Gupta <sumitg@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250507031941.2812701-1-zhenglifeng1@huawei.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-05-21 22:47:44 +02:00
Marc Zyngier	2b8e6b5888	cpufreq: cppc: Fix invalid return value in .get() callback Returning a negative error code in a function with an unsigned return type is a pretty bad idea. It is probably worse when the justification for the change is "our static analisys tool found it". Fixes: `cf7de25878` ("cppc_cpufreq: Fix possible null pointer dereference") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2025-04-16 13:37:44 +05:30
Viresh Kumar	a3f48fb2e5	cpufreq: cppc: Set policy->boost_supported With a later commit, the cpufreq core will call the ->set_boost() callback only if the policy supports boost frequency. The boost_supported flag is set by the cpufreq core if policy->freq_table is set and one or more boost frequencies are present. For other drivers, the flag must be set explicitly. With this, the local variable boost_supported isn't required anymore. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2025-02-07 09:45:15 +05:30
Lifeng Zheng	03d8b4e762	cpufreq: CPPC: Fix wrong max_freq in policy initialization In policy initialization, policy->max and policy->cpuinfo.max_freq are always set to the value calculated from caps->nominal_perf. This will cause the frequency stay on base frequency even if the policy is already boosted when a CPU is going online. Fix this by using policy->boost_enabled to determine which value should be set. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250117101457.1530653-4-zhenglifeng1@huawei.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-01-23 21:06:33 +01:00
Frederic Weisbecker	b04e317b52	treewide: Introduce kthread_run_worker[_on_cpu]() kthread_create() creates a kthread without running it yet. kthread_run() creates a kthread and runs it. On the other hand, kthread_create_worker() creates a kthread worker and runs it. This difference in behaviours is confusing. Also there is no way to create a kthread worker and affine it using kthread_bind_mask() or kthread_affine_preferred() before starting it. Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]() that behaves just like kthread_run(). kthread_create_worker[_on_cpu]() will now only create a kthread worker without starting it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>	2025-01-08 18:15:03 +01:00
Rafael J. Wysocki	baf4ae8038	ARM cpufreq updates for 6.13 - Add virtual cpufreq driver for guest kernels (David Dai). - Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan Can). - Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check" (Colin Ian King). - Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad Dybcio, and Nikunj Kela). -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmc6wgAACgkQ0rkcPK6B Ehz2HA/9FvaDMmi4q1yt2CvkypN4XiaerUZBE+wBMgLDiGIOzx3X1YpiZwfH7LRR V+E7+63ZYH0bxfErG3M75x8YKvARB5WaiP+f+YYIFGnBiNdbG8WdooAy+gViE+AX Wiq4FNV2IqaQXceq1TCFMiwN8tn1gZO/axsWDiEUB+bTn11noJBkNa4I6TaGDH4X IwTVss5VBcP4fORmkTSnA/Epw6mtFIQfHPO3m5SbgBiB6NVK3+//ZAnHCYB23H34 X5f0BcTw0IxkHbSuASg8ZMgqHfnmduV8g8dAhhIMkh2Zci145nmcSdcBZk1G32NC ffYznTTwEVRGMQ9ku6j9FXrqUw0Bb8GEOKSNMoO0Tc+e0UmAP/xKYICH3lbLEfXo 3DWBDNu7wNjTpZ5OJwkFsRspCVZVBDi/bnBH+XN2ELzLIPiSMN2R+I480G5uNfMt iLy9Vzqpw7H6Z2OZiN9scUDZ/pgIC7T3ZjJLwy6XsBeWjM3/AwNPx1LYM5UNFl2P MEQs0EOXl1jxMEFG4jmr8iDT5dhgX0LM5zVq0kK/dQCtLtz9BqcVr3NV3bU4pLrx fldHqaxZXnCFqtXR3RHSodGdj1B3H+KhgeXki7qxtXAAzdydNFDZxA0Kxm7DHG0i onwB+b+J4TrEIwdAODZsjh4XYjKl/fLaIinHCkabQhOXUlQsgcg= =72Nq -----END PGP SIGNATURE----- Merge tag 'cpufreq-arm-updates-6.13' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Merge ARM cpufreq updates for 6.13 from Viresh Kumar: "- Add virtual cpufreq driver for guest kernels (David Dai). - Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan Can). - Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check" (Colin Ian King). - Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad Dybcio, and Nikunj Kela)." * tag 'cpufreq-arm-updates-6.13' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: arm64: dts: qcom: sc8180x: Add a SoC-specific compatible to cpufreq-hw dt-bindings: cpufreq: cpufreq-qcom-hw: Add SC8180X compatible cpufreq: sun50i: add a100 cpufreq support cpufreq: mediatek-hw: Fix wrong return value in mtk_cpufreq_get_cpu_power() cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power() cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost() cpufreq: loongson3: Check for error code from devm_mutex_init() call cpufreq: scmi: Fix cleanup path when boost enablement fails cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost() cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw() Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check" dt-bindings: cpufreq: cpufreq-qcom-hw: Add SAR2130P compatible cpufreq: add virtual-cpufreq driver dt-bindings: cpufreq: add virtual cpufreq device cpufreq: loongson2: Unregister platform_driver on failure cpufreq: ti-cpufreq: Remove revision offsets in AM62 family cpufreq: ti-cpufreq: Allow backward compatibility for efuse syscon cppc_cpufreq: Remove HiSilicon CPPC workaround cppc_cpufreq: Use desired perf if feedback ctrs are 0 or unchanged dt-bindings: cpufreq: qcom-hw: document support for SA8255p	2024-11-19 21:35:14 +01:00
Jinjie Ruan	b51eb0874d	cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power() cppc_get_cpu_power() return 0 if the policy is NULL. Then in em_create_perf_table(), the later zero check for power is not valid as power is uninitialized. As Quentin pointed out, kernel energy model core check the return value of active_power() first, so if the callback failed it should tell the core. So return -EINVAL to fix it. Fixes: `a78e720756` ("cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-11-11 09:45:16 +05:30
Jinjie Ruan	be392aa80f	cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost() cppc_get_cpu_cost() return 0 if the policy is NULL. Then in em_compute_costs(), the later zero check for cost is not valid as cost is uninitialized. As Quentin pointed out, kernel energy model core check the return value of get_cost() first, so if the callback failed it should tell the core. Return -EINVAL to fix it. Fixes: `1a1374bb8c` ("cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/c4765377-7830-44c2-84fa-706b6e304e10@stanley.mountain/ Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-11-11 09:45:16 +05:30
Jinjie Ruan	1a1374bb8c	cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost() cpufreq_cpu_get_raw() may return NULL if the cpu is not in policy->cpus cpu mask and it will cause null pointer dereference, so check NULL for cppc_get_cpu_cost(). Fixes: `740fcdc2c2` ("cpufreq: CPPC: Register EM based on efficiency class information") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-10-30 14:38:40 +05:30
Jinjie Ruan	a78e720756	cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw() cpufreq_cpu_get_raw() may return NULL if the cpu is not in policy->cpus cpu mask and it will cause null pointer dereference. Fixes: `740fcdc2c2` ("cpufreq: CPPC: Register EM based on efficiency class information") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-10-30 08:48:36 +05:30
Jie Zhan	ea1829d4d4	cppc_cpufreq: Remove HiSilicon CPPC workaround Since commit `6c8d750f97` ("cpufreq / cppc: Work around for Hisilicon CPPC cpufreq"), we introduce a workround for HiSilicon platforms that do not support performance feedback counters, whereas they can get the actual frequency from the desired perf register. Later on, FIE is disabled in that workaround as well. Now the workround can be handled by the common code. Desired perf would be read and converted to frequency if feedback counters don't change. FIE would be disabled if the CPPC regs are in PCC region. Hence, the workaround is no longer needed and can be safely removed, in an effort to consolidate the driver procedure. Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Huisong Li <lihuisong@huawei.com> [ Viresh: Move fie_disabled withing CONFIG option to fix warning ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-10-03 11:09:44 +05:30
Jie Zhan	c471956319	cppc_cpufreq: Use desired perf if feedback ctrs are 0 or unchanged The CPPC performance feedback counters could be 0 or unchanged when the target cpu is in a low-power idle state, e.g. power-gated or clock-gated. When the counters are 0, cppc_cpufreq_get_rate() returns 0 KHz, which makes cpufreq_online() get a false error and fail to generate a cpufreq policy. When the counters are unchanged, the existing cppc_perf_from_fbctrs() returns a cached desired perf, but some platforms may update the real frequency back to the desired perf reg. For the above cases in cppc_cpufreq_get_rate(), get the latest desired perf from the CPPC reg to reflect the frequency because some platforms may update the actual frequency back there; if failed, use the cached desired perf. Fixes: `6a4fec4f6d` ("cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.") Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-10-03 11:07:53 +05:30
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Christian Loehle	4eb71e3b45	cpufreq/cppc: Use NSEC_PER_MSEC for deadline task Convert the cppc deadline task attributes to use the available definitions to make them more readable. No functional change. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20240813144348.1180344-4-christian.loehle@arm.com	2024-09-11 11:25:10 +02:00
Lizhe	b4b1ddc9df	cpufreq: Make cpufreq_driver->exit() return void The cpufreq core doesn't check the return type of the exit() callback and there is not much the core can do on failures at that point. Just drop the returned value and make it return void. Signed-off-by: Lizhe <sensor1010@163.com> [ Viresh: Reworked the patches to fix all missing changes together. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Kevin Hilman <khilman@baylibre.com> # omap	2024-07-09 08:45:30 +05:30
Riwen Lu	90e4ed6bb0	cpufreq/cppc: Don't compare desired_perf in target() There is a corner case where the desired_perf is exactly same as the old perf, but the actual current freq is not. This happens during S3 while the cpufreq governor is set to powersave. During cpufreq resume process, the booting CPU's new_freq obtained via .get() is the highest frequency, while the policy->cur and cpu->perf_ctrls.desired_perf are set to the lowest level (powersave governor). This causes the warning: "CPU frequency out of sync:", and the cpufreq core sets policy->cur to new_freq. Then the governor->limits() calls cppc_cpufreq_set_target() to configures the CPU frequency and returns directly because the desired_perf converted from target_freq is same as the cpu->perf_ctrls.desired_perf and both are the lowest_perf. Since target_freq and policy->cur have been already compared in __cpufreq_driver_target(), there's no need to compare them again here. Drop the comparison. Signed-off-by: Riwen Lu <luriwen@kylinos.cn> [ Viresh: Updated commit message / subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-06-13 15:12:57 +05:30
Aleksandr Mishin	cf7de25878	cppc_cpufreq: Fix possible null pointer dereference cppc_cpufreq_get_rate() and hisi_cppc_cpufreq_get_rate() can be called from different places with various parameters. So cpufreq_cpu_get() can return null as 'policy' in some circumstances. Fix this bug by adding null return check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `a28b2bfc09` ("cppc_cpufreq: replace per-cpu data array with a list") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2024-04-19 11:59:16 +05:30
Vincent Guittot	50b813b147	cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz\|khz_to_perf}() Move and rename cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf() to use them outside cppc_cpufreq in topology_init_cpu_capacity_cppc(). Modify the interface to use struct cppc_perf_caps caps instead of struct cppc_cpudata cpu_data as we only use the fields of cppc_perf_caps. cppc_cpufreq was converting the lowest and nominal freq from MHz to kHz before using them. We move this conversion inside cppc_perf_to_khz and cppc_khz_to_perf to make them generic and usable outside cppc_cpufreq. No functional change Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Pierre Gondois <pierre.gondois@arm.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/20231211104855.558096-6-vincent.guittot@linaro.org	2023-12-23 15:52:35 +01:00
Liao Chang	e613d8cff5	cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie The function cppc_freq_invariance_init() may failed to create kworker_fie, make it more robust by setting fie_disabled to FIE_DISBALED to prevent an invalid pointer dereference in kthread_destroy_worker(), which called from cppc_freq_invariance_exit(). Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2023-08-17 14:24:58 +05:30
Liao Chang	6a4fec4f6d	cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases. The cpufreq framework used to use the zero of return value to reflect the cppc_cpufreq_get_rate() had failed to get current frequecy and treat all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a negative integer in error case, so it is better to convert the value to zero as the return value of cppc_cpufreq_get_rate(). Signed-off-by: Liao Chang <liaochang1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2023-08-17 14:12:11 +05:30
Pierre Gondois	f5f94b9c8b	cpufreq: CPPC: Add u64 casts to avoid overflowing The fields of the _CPC object are unsigned 32-bits values. To avoid overflows while using _CPC's values, add 'u64' casts. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2022-12-27 08:27:14 +05:30
Jeremy Linton	ae2df912d1	ACPI: CPPC: Disable FIE if registers in PCC regions PCC regions utilize a mailbox to set/retrieve register values used by the CPPC code. This is fine as long as the operations are infrequent. With the FIE code enabled though the overhead can range from 2-11% of system CPU overhead (ex: as measured by top) on Arm based machines. So, before enabling FIE assure none of the registers used by cppc_get_perf_ctrs() are in the PCC region. Finally, add a module parameter which can override the PCC region detection at boot or module reload. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-09-24 18:43:46 +02:00
Perry Yuan	a2a9d18500	ACPI: CPPC: Add ACPI disabled check to acpi_cpc_valid() Make acpi_cpc_valid() check if ACPI is disabled, so that its callers don't need to check that separately. This will also cause the AMD pstate driver to refuse to load right away when ACPI is disabled. Also update the warning message in amd_pstate_init() to mention the ACPI disabled case for completeness. Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> [ rjw: Subject edits, new changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-08-25 13:55:17 +02:00
Pierre Gondois	da4363457f	cpufreq: CPPC: Fix unused-function warning Building the cppc_cpufreq driver with for arm64 with CONFIG_ENERGY_MODEL=n triggers the following warnings: drivers/cpufreq/cppc_cpufreq.c:550:12: error: ‘cppc_get_cpu_cost’ defined but not used [-Werror=unused-function] 550 \| static int cppc_get_cpu_cost(struct device cpu_dev, unsigned long KHz, \| ^~~~~~~~~~~~~~~~~ drivers/cpufreq/cppc_cpufreq.c:481:12: error: ‘cppc_get_cpu_power’ defined but not used [-Werror=unused-function] 481 \| static int cppc_get_cpu_power(struct device cpu_dev, \| ^~~~~~~~~~~~~~~~~~ Move the Energy Model related functions into specific guards. This allows to fix the warning and prevent doing extra work when the Energy Model is not present. Fixes: `740fcdc2c2` ("cpufreq: CPPC: Register EM based on efficiency class information") Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-30 15:33:42 +02:00
Zheng Bin	a3f083e04a	cpufreq: CPPC: Fix build error without CONFIG_ACPI_CPPC_CPUFREQ_FIE If CONFIG_ACPI_CPPC_CPUFREQ_FIE is not set, building fails: drivers/cpufreq/cppc_cpufreq.c: In function ‘populate_efficiency_class’: drivers/cpufreq/cppc_cpufreq.c:584:2: error: ‘cppc_cpufreq_driver’ undeclared (first use in this function); did you mean ‘cpufreq_driver’? cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em; ^~~~~~~~~~~~~~~~~~~ cpufreq_driver Make declare of cppc_cpufreq_driver out of CONFIG_ACPI_CPPC_CPUFREQ_FIE to fix this. Fixes: `740fcdc2c2` ("cpufreq: CPPC: Register EM based on efficiency class information") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-30 15:31:28 +02:00
Pierre Gondois	2d41dc2380	cpufreq: CPPC: Enable dvfs_possible_from_any_cpu The communication mean of the _CPC desired performance can be PCC, System Memory, System IO, or Functional Fixed Hardware (FFH). PCC, SystemMemory and SystemIo address spaces are available from any CPU. Thus, dvfs_possible_from_any_cpu should be enabled in such case. For FFH, let the FFH implementation do smp_call_function_*() calls. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-19 19:45:34 +02:00
Pierre Gondois	3cc30dd00a	cpufreq: CPPC: Enable fast_switch The communication mean of the _CPC desired performance can be PCC, System Memory, System IO, or Functional Fixed Hardware. commit `b7898fda5b` ("cpufreq: Support for fast frequency switching") fast_switching is 'for switching CPU frequencies from interrupt context'. Writes to SystemMemory and SystemIo are fast and suitable this. This is not the case for PCC and might not be the case for FFH. Enable fast_switching for the cppc_cpufreq driver in above cases. Add cppc_allow_fast_switch() to check the desired performance register address space and set fast_switching accordingly. Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-19 19:45:34 +02:00
Pierre Gondois	740fcdc2c2	cpufreq: CPPC: Register EM based on efficiency class information Performance states and energy consumption values are not advertised in ACPI. In the GicC structure of the MADT table, the "Processor Power Efficiency Class field" (called efficiency class from now) allows to describe the relative energy efficiency of CPUs. To leverage the EM and EAS, the CPPC driver creates a set of artificial performance states and registers them in the Energy Model (EM), such as: - Every 20 capacity unit, a performance state is created. - The energy cost of each performance state gradually increases. No power value is generated as only the cost is used in the EM. During task placement, a task can raise the frequency of its whole pd. This can make EAS place a task on a pd with CPUs that are individually less energy efficient. As cost values are artificial, and to place tasks on CPUs with the lower efficiency class, a gap in cost values is generated for adjacent efficiency classes. E.g.: - efficiency class = 0, capacity is in [0-1024], so cost values are in [0: 51] (one performance state every 20 capacity unit) - efficiency class = 1, capacity is in [0-1024], cost values are in [1gap+0: 1gap+51]. The value of the cost gap is chosen to absorb a the energy of 4 CPUs at their maximum capacity. This means that between: 1- a pd of 4 CPUs, each of them being used at almost their full capacity. Their efficiency class is N. 2- a CPU using almost none of its capacity. Its efficiency class is N+1 EAS will choose the first option. This patch also populates the (struct cpufreq_driver).register_em callback if the valid efficiency_class ACPI values are provided. Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-06 21:01:17 +02:00
Pierre Gondois	d3c3db41df	cpufreq: CPPC: Add per_cpu efficiency_class In ACPI, describing power efficiency of CPUs can be done through the following arm specific field: ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure', 'Processor Power Efficiency Class field': Describes the relative power efficiency of the associated pro- cessor. Lower efficiency class numbers are more efficient than higher ones (e.g. efficiency class 0 should be treated as more efficient than efficiency class 1). However, absolute values of this number have no meaning: 2 isn’t necessarily half as efficient as 1. The efficiency_class field is stored in the GicC structure of the ACPI MADT table and it's currently supported in Linux for arm64 only. Thus, this new functionality is introduced for arm64 only. To allow the cppc_cpufreq driver to know and preprocess the efficiency_class values of all the CPUs, add a per_cpu efficiency_class variable to store them. At least 2 different efficiency classes must be present, otherwise there is no use in creating an Energy Model. The efficiency_class values are squeezed in [0:#efficiency_class-1] while conserving the order. For instance, efficiency classes of: [111, 212, 250] will be mapped to: [0 (was 111), 1 (was 212), 2 (was 250)]. Each policy being independently registered in the driver, populating the per_cpu efficiency_class is done only once at the driver initialization. This prevents from having each policy re-searching the efficiency_class values of other CPUs. The EM will be registered in a following patch. The patch also exports acpi_cpu_get_madt_gicc() to fetch the GicC structure of the ACPI MADT table for each CPU. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-06 21:01:17 +02:00

1 2

98 Commits