Commit Graph

1435 Commits

Author SHA1 Message Date
Rafael J. Wysocki
8e1d6a9223 Merge back system-wide sleep material for v6.6. 2023-08-14 09:55:44 +02:00
Vlastimil Babka
df2f7cde73 PM: hibernate: fix resume_store() return value when hibernation not available
On a laptop with hibernation set up but not actively used, and with
secure boot and lockdown enabled kernel, 6.5-rc1 gets stuck on boot with
the following repeated messages:

  A start job is running for Resume from hibernation using device /dev/system/swap (24s / no limit)
  lockdown_is_locked_down: 25311154 callbacks suppressed
  Lockdown: systemd-hiberna: hibernation is restricted; see man kernel_lockdown.7
  ...

Checking the resume code leads to commit cc89c63e2f ("PM: hibernate:
move finding the resume device out of software_resume") which
inadvertently changed the return value from resume_store() to 0 when
!hibernation_available(). This apparently translates to userspace
write() returning 0 as in number of bytes written, and userspace looping
indefinitely in the attempt to write the intended value.

Fix this by returning the full number of bytes that were to be written,
as that's what was done before the commit.

Fixes: cc89c63e2f ("PM: hibernate: move finding the resume device out of software_resume")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-07 11:41:11 +02:00
Jiri Slaby
bcb48185ed tty: sysrq: switch sysrq handlers from int to u8
The passed parameter to sysrq handlers is a key (a character). So change
the type from 'int' to 'u8'. Let it specifically be 'u8' for two
reasons:
* unsigned: unsigned values come from the upper layers (devices) and the
  tty layer assumes unsigned on most places, and
* 8-bit: as that what's supposed to be one day in all the layers built
  on the top of tty. (Currently, we use mostly 'unsigned char' and
  somewhere still only 'char'. (But that also translates to the former
  thanks to -funsigned-char.))

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de> # DRM
Acked-by: WANG Xuerui <git@xen0n.name> # loongarch
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20230712081811.29004-3-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-25 19:21:03 +02:00
Brian Geffon
005e8dddd4 PM: hibernate: don't store zero pages in the image file
On ChromeOS we've observed a considerable number of in-use pages filled with
zeros. Today with hibernate it's entirely possible that saveable pages are just
zero filled. Since we're already copying pages word-by-word in do_copy_page it
becomes almost free to determine if a page was completely filled with zeros.

This change introduces a new bitmap which will track these zero pages. If a page
is zero it will not be included in the saved image, instead to track these zero
pages in the image file we will introduce a new flag which we will set on the
packed PFN list. When reading back in the image file we will detect these zero
page PFNs and rebuild the zero page bitmap.

When the image is being loaded through calls to write_next_page if we encounter
a zero page we will silently memset it to 0 and then continue on to the next
page. Given the implementation in snapshot_read_next/snapshot_write_next this
change  will be transparent to non-compressed/compressed and swsusp modes of
operation.

To provide some concrete numbers from simple ad-hoc testing, on a device which
was lightly in use we saw that:

PM: hibernation: Image created (964408 pages copied, 548304 zero pages)

Of the approximately 6.2GB of saveable pages 2.2GB (36%) were just zero filled
and could be tracked entirely within the packed PFN list. The savings would
obviously be much lower for lzo compressed images, but even in the case of
compression not copying pages across to the compression threads will still
speed things up. It's also possible that we would see better overall compression
ratios as larger regions of "real data" would improve the compressibility.

Finally, such an approach could dramatically improve swsusp performance
as each one of those zero pages requires a write syscall to reload, by
handling it as part of the packed PFN list we're able to fully avoid
that.

Signed-off-by: Brian Geffon <bgeffon@google.com>
[ rjw: Whitespace adjustments, removal of redundant parentheses ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-24 09:51:59 +02:00
Rafael J. Wysocki
d121758da6 Merge branches 'pm-sleep' and 'pm-qos'
Merge a PM QoS fix and a hibernation fix for 6.5-rc2.

 - Unbreak the /sys/power/resume interface after recent changes (Azat
   Khuzhin).

 - Allow PM_QOS_DEFAULT_VALUE to be used with frequency QoS (Chungkai
   Yang).

* pm-sleep:
  PM: hibernate: Fix writing maj:min to /sys/power/resume

* pm-qos:
  PM: QoS: Restore support for default value on frequency QoS
2023-07-14 19:13:21 +02:00
Chungkai Yang
3a8395b565 PM: QoS: Restore support for default value on frequency QoS
Commit 8d36694245 ("PM: QoS: Add check to make sure CPU freq is
non-negative") makes sure CPU freq is non-negative to avoid negative
value converting to unsigned data type. However, when the value is
PM_QOS_DEFAULT_VALUE, pm_qos_update_target specifically uses
c->default_value which is set to FREQ_QOS_MIN/MAX_DEFAULT_VALUE when
cpufreq_policy_alloc is executed, for this case handling.

Adding check for PM_QOS_DEFAULT_VALUE to let default setting work will
fix this problem.

Fixes: 8d36694245 ("PM: QoS: Add check to make sure CPU freq is non-negative")
Link: https://lore.kernel.org/lkml/20230626035144.19717-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/20230627071727.16646-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/CAJZ5v0gxNOWhC58PHeUhW_tgf6d1fGJVZ1x91zkDdht11yUv-A@mail.gmail.com/
Signed-off-by: Chungkai Yang <Chung-kai.Yang@mediatek.com>
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-11 20:09:57 +02:00
Azat Khuzhin
c9e4bf607d PM: hibernate: Fix writing maj:min to /sys/power/resume
resume_store() first calls lookup_bdev() and after tries to handle
maj:min, but it does not reset the error before, hence if you will write
maj:min you will get ENOENT:

    # echo 259:2 >| /sys/power/resume
    bash: echo: write error: No such file or directory

This also should fix hiberation via systemd, since it uses this way.

Fixes: 1e8c813b08 ("PM: hibernate: don't use early_lookup_bdev in resume_store")
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-11 19:58:08 +02:00
Linus Torvalds
6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Linus Torvalds
40e8e98f51 Power management updates for 6.5-rc1
- Introduce power capping core support for Intel TPMI (Topology Aware
    Register and PM Capsule Interface) and a TPMI interface driver for
    Intel RAPL (Zhang Rui, Dan Carpenter).
 
  - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
    driver (Zhang Rui).
 
  - Fix invalid initialization for pl4_supported field in the Intel RAPL
    power capping driver (Sumeet Pawnikar).
 
  - Clean up the intel_idle driver, make it work with VM guests that
    cannot use the MWAIT instruction and address the case in which the
    host may enter a deep idle state when the guest is idle (Arjan van
    de Ven).
 
  - Prevent cpufreq drivers that provide the ->adjust_perf() callback
    without a ->fast_switch() one which is used as a fallback from the
    former in some cases (Wyes Karny).
 
  - Fix some issues related to the AMD P-state cpufreq driver (Mario
    Limonciello, Wyes Karny).
 
  - Fix the energy_performance_preference attribute handling in the
    intel_pstate driver in passive mode (Tero Kristo).
 
  - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
    (Kai-Heng Feng).
 
  - Correct spelling mistake in a comment in the hibernation code (Wang
    Honghui).
 
  - Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
    build warning (Arnd Bergmann).
 
  - Restrict pm_pr_dbg() to system-wide power transitions and use it in
    a few additional places (Mario Limonciello).
 
  - Drop verification of in-params from genpd_add_device() and ensure
    that all of its callers will do it (Ulf Hansson).
 
  - Prevent possible integer overflows from occurring in
    genpd_parse_state() (Nikita Zhandarovich).
 
  - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
    somewhat (Christophe JAILLET).
 
  - Ensure that the Exynos PPMU driver is already loaded before the
    Exynos Bus driver starts probing so as to avoid a possible freeze
    loading of the kernel modules (Marek Szyprowski).
 
  - Fix variable deferencing before NULL check in the mtk-cci devfreq
    driver (Sukrut Bellary).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSZwPASHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxu44P/AouvVFDMt+eE76nPfNc10X1lswS0vrT
 X7LmylSDPyWiuz6p8MWGXB6T2nQ+3DbvSfVyBJ960ymkBnE/F9me8o3wB8eGbd6z
 ZvOD8+wVPXS4Cq8gUxy2zV1ul+o5IwwT20cYC6mWjasvByl13vTevN5d6ZQ9o6hS
 1hAQQDd6JjsdLIUyU0EbE4aD+l4h96o45IFxbV86qVH77ywa6VMNdulRKmDcONj3
 kM7jHFYL4xl0TfMjHp4IhGWXK32qGYgX1zYTOU5kSc11IExJfVzQcL2uQ9A0KSLp
 RJ0c93loUsHdMhenNkN4nSBFWBIaftKDLbS+5Ubt0DBuNN7kxWivEVts4DM/wxuB
 72PNl5h8YglcW7LHH2IXb/6HEerzbj42+6y459o+M0DcNTq18gu19OQTK5IGtRrQ
 Yf6+5BhgLR3R1REg0eaBg6njtGq0f5fmW7Iqo52eA8cXhHU0MTDJE1p6ytfN40gH
 ViA+T8HB6Mh91lWHVftbwo3wONHGcfJy+S2hGM45V5LKEGeKHILgmw0nUzO7epWE
 7VIPKGzkVd7h/Drk7X3nQR3DJFA/x5eNhjxt5LZD83cVVg34SS3ST5oH13FI9S7Q
 8zwG5KoHTDrmYug3sxQ+Q9pq8MnOl0ZbgqlVfwyiWjKYmNMbg4elsQG/2zD3t0kv
 y8zXbr7Kr4QB
 =SZV7
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add Intel TPMI (Topology Aware Register and PM Capsule
  Interface) support to the power capping subsystem, extend the
  intel_idle driver to work in VM guests where MWAIT is not available,
  extend the system-wide power management diagnostics, fix bugs and
  clean up code.

  Specifics:

   - Introduce power capping core support for Intel TPMI (Topology Aware
     Register and PM Capsule Interface) and a TPMI interface driver for
     Intel RAPL (Zhang Rui, Dan Carpenter)

   - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
     driver (Zhang Rui)

   - Fix invalid initialization for pl4_supported field in the Intel
     RAPL power capping driver (Sumeet Pawnikar)

   - Clean up the intel_idle driver, make it work with VM guests that
     cannot use the MWAIT instruction and address the case in which the
     host may enter a deep idle state when the guest is idle (Arjan van
     de Ven)

   - Prevent cpufreq drivers that provide the ->adjust_perf() callback
     without a ->fast_switch() one which is used as a fallback from the
     former in some cases (Wyes Karny)

   - Fix some issues related to the AMD P-state cpufreq driver (Mario
     Limonciello, Wyes Karny)

   - Fix the energy_performance_preference attribute handling in the
     intel_pstate driver in passive mode (Tero Kristo)

   - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
     (Kai-Heng Feng)

   - Correct spelling mistake in a comment in the hibernation code (Wang
     Honghui)

   - Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
     build warning (Arnd Bergmann)

   - Restrict pm_pr_dbg() to system-wide power transitions and use it in
     a few additional places (Mario Limonciello)

   - Drop verification of in-params from genpd_add_device() and ensure
     that all of its callers will do it (Ulf Hansson)

   - Prevent possible integer overflows from occurring in
     genpd_parse_state() (Nikita Zhandarovich)

   - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
     somewhat (Christophe JAILLET)

   - Ensure that the Exynos PPMU driver is already loaded before the
     Exynos Bus driver starts probing so as to avoid a possible freeze
     loading of the kernel modules (Marek Szyprowski)

   - Fix variable deferencing before NULL check in the mtk-cci devfreq
     driver (Sukrut Bellary)"

* tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits)
  intel_idle: Add a "Long HLT" C1 state for the VM guest mode
  cpufreq: intel_pstate: Fix energy_performance_preference for passive
  cpufreq: amd-pstate: Add a kernel config option to set default mode
  cpufreq: amd-pstate: Set a fallback policy based on preferred_profile
  ACPI: CPPC: Add definition for undefined FADT preferred PM profile value
  cpufreq: amd-pstate: Set default governor to schedutil
  PM: domains: Move the verification of in-params from genpd_add_device()
  cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated
  cpufreq: amd-pstate: Write CPPC enable bit per-socket
  intel_idle: Add support for using intel_idle in a VM guest using just hlt
  cpufreq: Fail driver register if it has adjust_perf without fast_switch
  intel_idle: clean up the (new) state_update_enter_method function
  intel_idle: refactor state->enter manipulation into its own function
  platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages
  pinctrl: amd: Use pm_pr_dbg to show debugging messages
  ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking
  include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume
  powercap: RAPL: Fix a NULL vs IS_ERR() bug
  powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
  powercap: RAPL: fix invalid initialization for pl4_supported field
  ...
2023-06-26 19:36:30 -07:00
Linus Torvalds
19300488c9 - Address -Wmissing-prototype warnings
- Remove repeated 'the' in comments
  - Remove unused current_untag_mask()
  - Document urgent tip branch timing
  - Clean up MSR kernel-doc notation
  - Clean up paravirt_ops doc
  - Update Srivatsa S. Bhat's maintained areas
  - Remove unused extern declaration acpi_copy_wakeup_routine()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmSZ6xIACgkQaDWVMHDJ
 krB9aQ/+NjB4CiWLbrnOYj9QYG6p1GE7lfu2dzIDdmcNuiai8htopXys54Igy3Rq
 BbIoW4E0SGK5E2OD7nLe4fBA/LpsYZTwDhGUu3SiovxLOoC5qkF0Q+6aVypPJE5o
 q7kn0Eo9IDL1dO0EbJptFDJRjk3K5caEoyXJRelarjIfPRbDEhUFaybVRykMZN9I
 4AOxrlb9WFggT4gUE4+N0kWyEqdgI9/aguavmasaG4lBHZ5JAHNQPNIa8bkVSAPL
 wULAzsrGp96V3tVxdjDCzD9aumk4xlJq7gk+v7mfx013dg7Cjs074Xoi2Y+TmaC7
 fdIZiGPJIkNToW+nENVO7BYtACSQhXeVTGxLQO/HNTDc//ZWiIUoJT2U4qu/6e6F
 aAIGoLwv68H4BghS2qx6Gz+BTIfl35mcPUb75MQhu+D84QZoZWrdamCYhsvHeZzc
 uC3nojrb6PBOth9nJsRae+j1zpRe/DT2LvHSWPJgK6EygOAi05ZfYUll/6sb0vze
 IXkUrVV1BvDDVpY9/HnE8RpDCDolP0/ezK9zsw48arZtkc+Qmw2WlD/2D98E+pSb
 MJPelbVmpzWTaoR4jDzXJCXkWe7CQJ5uPQj5azAE9l7YvnxgCQP5xnm5sLU9eyLu
 RsOwRzss0+3z44x5rJi9nSxQJ0LHfTAzW8/ZmNSZGHzi0ClszK0=
 =N82i
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Dave Hansen:
 "As usual, these are all over the map. The biggest cluster is work from
  Arnd to eliminate -Wmissing-prototype warnings:

   - Address -Wmissing-prototype warnings

   - Remove repeated 'the' in comments

   - Remove unused current_untag_mask()

   - Document urgent tip branch timing

   - Clean up MSR kernel-doc notation

   - Clean up paravirt_ops doc

   - Update Srivatsa S. Bhat's maintained areas

   - Remove unused extern declaration acpi_copy_wakeup_routine()"

* tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine()
  Documentation: virt: Clean up paravirt_ops doc
  x86/mm: Remove unused current_untag_mask()
  x86/mm: Remove repeated word in comments
  x86/lib/msr: Clean up kernel-doc notation
  x86/platform: Avoid missing-prototype warnings for OLPC
  x86/mm: Add early_memremap_pgprot_adjust() prototype
  x86/usercopy: Include arch_wb_cache_pmem() declaration
  x86/vdso: Include vdso/processor.h
  x86/mce: Add copy_mc_fragile_handle_tail() prototype
  x86/fbdev: Include asm/fb.h as needed
  x86/hibernate: Declare global functions in suspend.h
  x86/entry: Add do_SYSENTER_32() prototype
  x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()
  x86/mm: Include asm/numa.h for set_highmem_pages_init()
  x86: Avoid missing-prototype warnings for doublefault code
  x86/fpu: Include asm/fpu/regset.h
  x86: Add dummy prototype for mk_early_pgtbl_32()
  x86/pci: Mark local functions as 'static'
  x86/ftrace: Move prepare_ftrace_return prototype to header
  ...
2023-06-26 16:43:54 -07:00
Mario Limonciello
cdb8c100d8 include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume
All uses in the kernel are currently already oriented around
suspend/resume. As some other parts of the kernel may also use these
messages in functions that could also be used outside of
suspend/resume, only enable in suspend/resume path.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:56:08 +02:00
Christoph Hellwig
05bdb99653 block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:05 -06:00
Christoph Hellwig
2736e8eeb0 block: use the holder as indication for exclusive opens
The current interface for exclusive opens is rather confusing as it
requires both the FMODE_EXCL flag and a holder.  Remove the need to pass
FMODE_EXCL and just key off the exclusive open off a non-NULL holder.

For blkdev_put this requires adding the holder argument, which provides
better debug checking that only the holder actually releases the hold,
but at the same time allows removing the now superfluous mode argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>		[btrfs]
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig
c889d0793d swsusp: don't pass a stack address to blkdev_get_by_path
holder is just an on-stack pointer that can easily be reused by other calls,
replace it with a static variable that doesn't change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Kefeng Wang
07f44ac3c9 mm: page_alloc: move pm_* function into power
pm_restrict_gfp_mask()/pm_restore_gfp_mask() only used in power, let's
move them out of page_alloc.c.

Adding a general gfp_has_io_fs() function which return true if gfp with
both __GFP_IO and __GFP_FS flags, then use it inside of
pm_suspended_storage(), also the pm_suspended_storage() is moved into
suspend.h.

Link: https://lkml.kernel.org/r/20230516063821.121844-11-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:24 -07:00
Kefeng Wang
31a1b9d7fe mm: page_alloc: move mark_free_page() into snapshot.c
The mark_free_page() is only used in kernel/power/snapshot.c, move it out
to reduce a bit of page_alloc.c

Link: https://lkml.kernel.org/r/20230516063821.121844-10-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:24 -07:00
Christoph Hellwig
1e8c813b08 PM: hibernate: don't use early_lookup_bdev in resume_store
resume_store is a sysfs attribute written during normal kernel runtime,
and it should not use the early_lookup_bdev API that bypasses all normal
path based permission checking, and might cause problems with certain
container environments renaming devices.

Switch to lookup_bdev, which does a normal path lookup instead, and fall
back to trying to parse a numeric dev_t just like early_lookup_bdev did.

Note that this strictly speaking changes the kernel ABI as the PARTUUID=
and PARTLABEL= style syntax is now not available during a running
systems.  They never were intended for that, but this breaks things
we'll have to figure out a way to make them available again.  But if
avoidable in any way I'd rather avoid that.

Fixes: 421a5fa1a6 ("PM / hibernate: use name_to_dev_t to parse resume")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:57:40 -06:00
Christoph Hellwig
cf056a4312 init: improve the name_to_dev_t interface
name_to_dev_t has a very misleading name, that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors.  Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means it the string was valid, but there was
no device found for it.

Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:56:46 -06:00
Christoph Hellwig
cc89c63e2f PM: hibernate: move finding the resume device out of software_resume
software_resume can be called either from an init call in the boot code,
or from sysfs once the system has finished booting, and the two
invocation methods this can't race with each other.

For the latter case we did just parse the suspend device manually, while
the former might not have one.  Split software_resume so that the search
only happens for the boot case, which also means the special lockdep
nesting annotation can go away as the system transition mutex can be
taken a little later and doesn't have the sysfs locking nest inside it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:55:20 -06:00
Christoph Hellwig
d6545e6872 PM: hibernate: remove the global snapshot_test variable
Passing call dependent variable in global variables is a huge
antipattern.  Fix it up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:55:20 -06:00
Christoph Hellwig
02b42d58f3 PM: hibernate: factor out a helper to find the resume device
Split the logic to find the resume device out software_resume and into
a separate helper to start unwindig the convoluted goto logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:55:20 -06:00
Christoph Hellwig
0718afd47f block: introduce holder ops
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims.  It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:53:04 -06:00
Wang Honghui
847aea98e0 PM: hibernate: Correct spelling mistake in a comment
Fix a typo in a comment in kernel/power/snapshot.c

Signed-off-by: Wang Honghui <honghui.wang@ucas.com.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 19:00:50 +02:00
Arnd Bergmann
8a3e82d386 x86/hibernate: Declare global functions in suspend.h
Three functions that are defined in x86 specific code to override
generic __weak implementations cause a warning because of a missing
prototype:

arch/x86/power/cpu.c:298:5: error: no previous prototype for 'hibernate_resume_nonboot_cpu_disable' [-Werror=missing-prototypes]
arch/x86/power/hibernate.c:129:5: error: no previous prototype for 'arch_hibernation_header_restore' [-Werror=missing-prototypes]
arch/x86/power/hibernate.c:91:5: error: no previous prototype for 'arch_hibernation_header_save' [-Werror=missing-prototypes]

Move the declarations into a global header so it can be included
by any file defining one of these.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/all/20230516193549.544673-14-arnd%40kernel.org
2023-05-18 11:56:18 -07:00
Linus Torvalds
fa31fc82fb More power management updates for 6.4-rc1
- Make test_resume work again after the changes that made hibernation
    open the snapshot device in exclusive mode (Chen Yu).
 
  - Clean up code in several places in intel_idle (Artem Bityutskiy).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRSbLgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxbvUP/3WeXEfulhN2FRuqmfqLtWNrpMwsoFpd
 9BQRokAQdKL3V2Q+YgdEH22+cB2LNAo5ty1+SHXjzsiExxBYWbKd6/AwOwwmFVPq
 I8uHS5pSqYQrZLI7eA1ZhFiTotePOpLyHpsHO3lezxXMlvEw30tY8g09WTH1F2Cz
 uzgTB1NTHRZaHWObrjvPkq3IERAbtF1xAQVPMtyWzs7IoCOlLxsKtHpfLBwGFYpZ
 1U7dbAFoGuQYjCUE9i1wbQdee9elRhPDJ6TGCx8rqtRRybxPZOdz1M947K/N5Q23
 OtK1HHfTIFHoi36sjwfEdZC89RGr7CI3hc5CAgexxwtsCw4gpIgvFoNXjCKW1q0J
 +C5ztCntxTGzWi37pnriV3I0NlTjTdEAoS2VDNRGlj0Vvrrx5S5H35qjoqD66N3a
 +zJ9eEYl5WGJlmZWxUvycJmS0PdPp755tmxrBRWqm7nr+oVJWKY8j2OKjlADCgoG
 k74zf1dp4zZJHIml1QpaRp0EsueiHs66Xu52VoFKyrAp0+ytYtnC1/SeKoYF4jDg
 PoTJmIT5ve8Nq8vwYbrg/z497J3bKHfbf1LPxDPNHVB6gx4Nv254qUNaM8oyic9j
 aHNwna6IAl+BshickaR0lUccJLnBAgPWyxnwFlfHaDbAGVtLNLSVFYlwr7xiUvlo
 t9eB4FZNGKMU
 =CwBd
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These fix a hibernation test mode regression and clean up the
  intel_idle driver.

  Specifics:

   - Make test_resume work again after the changes that made hibernation
     open the snapshot device in exclusive mode (Chen Yu)

   - Clean up code in several places in intel_idle (Artem Bityutskiy)"

* tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: mark few variables as __read_mostly
  intel_idle: do not sprinkle module parameter definitions around
  intel_idle: fix confusing message
  intel_idle: improve C-state flags handling robustness
  intel_idle: further intel_idle_init_cstates_icpu() cleanup
  intel_idle: clean up intel_idle_init_cstates_icpu()
  intel_idle: use pr_info() instead of printk()
  PM: hibernate: Do not get block device exclusively in test_resume mode
  PM: hibernate: Turn snapshot_test into global variable
2023-05-03 12:01:05 -07:00
Linus Torvalds
cd546fa325 workqueue changes for v6.4-rc1
Mostly changes from Petr to improve warning and error reporting. Workqueue
 now reports more of the relevant failures with better context which should
 help debugging.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErbOg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfipAP9BXrUBI99uZL7XjSEe6rLWa0STODyWh67ikR+0
 nUO22QEA+ttt5ecLZyopYB18Le06VBBW7J5cs0Cg83bCBTGb4wE=
 =gHbc
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "Mostly changes from Petr to improve warning and error reporting.

  Workqueue now reports more of the relevant failures with better
  context which should help debugging"

* tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Introduce show_freezable_workqueues
  workqueue: Print backtraces from CPUs with hung CPU bound workqueues
  workqueue: Warn when a rescuer could not be created
  workqueue: Interrupted create_worker() is not a repeated event
  workqueue: Warn when a new worker could not be created
  workqueue: Fix hung time report of worker pools
  workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
  MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry
2023-04-29 09:48:52 -07:00
Chen Yu
5904de0d73 PM: hibernate: Do not get block device exclusively in test_resume mode
The system refused to do a test_resume because it found that the
swap device has already been taken by someone else. Specifically,
the swsusp_check()->blkdev_get_by_dev(FMODE_EXCL) is supposed to
do this check.

Steps to reproduce:
 dd if=/dev/zero of=/swapfile bs=$(cat /proc/meminfo |
       awk '/MemTotal/ {print $2}') count=1024 conv=notrunc
 mkswap /swapfile
 swapon /swapfile
 swap-offset /swapfile
 echo 34816 > /sys/power/resume_offset
 echo test_resume > /sys/power/disk
 echo disk > /sys/power/state

 PM: Using 3 thread(s) for compression
 PM: Compressing and saving image data (293150 pages)...
 PM: Image saving progress:   0%
 PM: Image saving progress:  10%
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata1.00: configured for UDMA/100
 ata2: SATA link down (SStatus 0 SControl 300)
 ata5: SATA link down (SStatus 0 SControl 300)
 ata6: SATA link down (SStatus 0 SControl 300)
 ata3: SATA link down (SStatus 0 SControl 300)
 ata4: SATA link down (SStatus 0 SControl 300)
 PM: Image saving progress:  20%
 PM: Image saving progress:  30%
 PM: Image saving progress:  40%
 PM: Image saving progress:  50%
 pcieport 0000:00:02.5: pciehp: Slot(0-5): No device found
 PM: Image saving progress:  60%
 PM: Image saving progress:  70%
 PM: Image saving progress:  80%
 PM: Image saving progress:  90%
 PM: Image saving done
 PM: hibernation: Wrote 1172600 kbytes in 2.70 seconds (434.29 MB/s)
 PM: S|
 PM: hibernation: Basic memory bitmaps freed
 PM: Image not found (code -16)

This is because when using the swapfile as the hibernation storage,
the block device where the swapfile is located has already been mounted
by the OS distribution(usually mounted as the rootfs). This is not
an issue for normal hibernation, because software_resume()->swsusp_check()
happens before the block device(rootfs) mount. But it is a problem for the
test_resume mode. Because when test_resume happens, the block device has
been mounted already.

Thus remove the FMODE_EXCL for test_resume mode. This would not be a
problem because in test_resume stage, the processes have already been
frozen, and the race condition described in
Commit 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
is unlikely to happen.

Fixes: 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
Reported-by: Yifan Li <yifan2.li@intel.com>
Suggested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Tested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:00:28 +02:00
Chen Yu
08169a162f PM: hibernate: Turn snapshot_test into global variable
There is need to check snapshot_test and open block device
in different mode, so as to avoid the race condition.

No functional changes intended.

Suggested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:00:28 +02:00
Mario Limonciello
b52124a78a PM: Add sysfs files to represent time spent in hardware sleep state
Userspace can't easily discover how much of a sleep cycle was spent in a
hardware sleep state without using kernel tracing and vendor specific sysfs
or debugfs files.

To make this information more discoverable, introduce 3 new sysfs files:
1) The time spent in a hw sleep state for last cycle.
2) The time spent in a hw sleep state since the kernel booted
3) The maximum time that the hardware can report for a sleep cycle.
All of these files will be present only if the system supports s2idle.

Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-20 19:06:12 +02:00
Jungseung Lee
704bc669e1 workqueue: Introduce show_freezable_workqueues
Currently show_all_workqueue is called if freeze fails at the time of
freeze the workqueues, which shows the status of all workqueues and of
all worker pools. In this cases we may only need to dump state of only
workqueues that are freezable and busy.

This patch defines show_freezable_workqueues, which uses
show_one_workqueue, a granular function that shows the state of individual
workqueues, so that dump only the state of freezable workqueues
at that time.

tj: Minor message adjustment.

Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-23 15:55:38 -10:00
Rafael J. Wysocki
ace5029856 Merge branches 'powercap', 'pm-domains', 'pm-em' and 'pm-opp'
Merge updates of the powercap framework, generic PM domains, Energy
Model and operating performance points for 6.3-rc1:

 - Fix possible name leak in powercap_register_zone() (Yang Yingliang).

 - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
   capping driver (Zhang Rui).

 - Modify the idle_inject power capping facility to support 100% idle
   injection (Srinivas Pandruvada).

 - Fix large time windows handling in the intel_rapl power capping
   driver (Zhang Rui).

 - Fix memory leaks with using debugfs_lookup() in the generic PM
   domains and Energy Model code (Greg Kroah-Hartman).

 - Add missing 'cache-unified' property in example for kryo OPP bindings
   (Rob Herring).

 - Fix error checking in opp_migrate_dentry() (Qi Zheng).

 - Remove "select SRCU" (Paul E. McKenney).

 - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
   Dybcio).

* powercap:
  powercap: intel_rapl: Fix handling for large time window
  powercap: idle_inject: Support 100% idle injection
  powercap: intel_rapl: add support for Emerald Rapids
  powercap: intel_rapl: add support for Meteor Lake
  powercap: fix possible name leak in powercap_register_zone()

* pm-domains:
  PM: domains: fix memory leak with using debugfs_lookup()

* pm-em:
  PM: EM: fix memory leak with using debugfs_lookup()

* pm-opp:
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array
  drivers/opp: Remove "select SRCU"
  dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example
2023-02-15 20:06:26 +01:00
Greg Kroah-Hartman
a0e8c13ccd PM: EM: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09 20:36:10 +01:00
Paul E. McKenney
52e0452b41 PM: sleep: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it.  Therefore, remove the "select SRCU"
Kconfig statements.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 17:53:07 +01:00
Randy Dunlap
6b37dfcb39 PM: hibernate: swap: don't use /** for non-kernel-doc comments
kernel-doc complains about multiple occurrences of "/**" being used
for something that is not a kernel-doc comment, so change all of these
to just use "/*" comment style.

The warning message for all of these is:

FILE:LINE: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

kernel/power/swap.c:585: warning: ...
Structure used for CRC32.
kernel/power/swap.c:600: warning: ...
 * CRC32 update function that runs in its own thread.
kernel/power/swap.c:627: warning: ...
 * Structure used for LZO data compression.
kernel/power/swap.c:644: warning: ...
 * Compression function that runs in its own thread.
kernel/power/swap.c:952: warning: ...
 *      The following functions allow us to read data using a swap map
kernel/power/swap.c:1111: warning: ...
 * Structure used for LZO data decompression.
kernel/power/swap.c:1127: warning: ...
 * Decompression function that runs in its own thread.

Also correct one spello/typo.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 14:49:48 +01:00
Rafael J. Wysocki
96d4b8e1ad PM: sleep: Refine error message in try_to_freeze_tasks()
A previous change amended try_to_freeze_tasks() with the "what"
variable pointing to a string describing the group of tasks subject to
the freezing which may be used in the error message in there too, so
make that happen.

Accordingly, update sleepgraph.py to catch the modified error message
as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
2022-12-06 12:04:34 +01:00
Rafael J. Wysocki
a449dfbfc0 PM: sleep: Avoid using pr_cont() in the tasks freezing code
Using pr_cont() in the tasks freezing code related to system-wide
suspend and hibernation is problematic, because the continuation
messages printed there are susceptible to interspersing with other
unrelated messages which results in output that is hard to
understand.

Address this issue by modifying try_to_freeze_tasks() to print
messages that don't require continuations and adjusting its
callers accordingly.

Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
2022-12-06 12:04:10 +01:00
Xueqin Luo
3363e0adb3 PM: hibernate: Complain about memory map mismatches during resume
The system memory map can change over a hibernation-restore cycle due
to a defect in the platform firmware, and some of the page frames used
by the kernel before hibernation may not be available any more during
the subsequent restore which leads to the error below.

[  T357] PM: Image loading progress:   0%
[  T357] PM: Read 2681596 kbytes in 0.03 seconds (89386.53 MB/s)
[  T357] PM: Error -14 resuming
[  T357] PM: Failed to load hibernation image, recovering.
[  T357] PM: Basic memory bitmaps freed
[  T357] OOM killer enabled.
[  T357] Restarting tasks ... done.
[  T357] PM: resume from hibernation failed (-14)
[  T357] PM: Hibernation image not present or could not be loaded.

Add an error message to the unpack() function to allow problematic
page frames to be identified and the source of the problem to be
diagnosed more easily. This can save developers quite a bit of
debugging time.

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
[ rjw: New subject, edited changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-01 19:20:14 +01:00
xiongxin
6e5d7300cb PM: hibernate: Fix mistake in kerneldoc comment
The actual maximum image size formula in hibernate_preallocate_memory()
is as follows:

max_size = (count - (size + PAGES_FOR_IO)) / 2
	    - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);

but the one in the kerneldoc comment of the function is different and
incorrect.

Fixes: ddeb648708 ("PM / Hibernate: Add sysfs knob to control size of memory for drivers")
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
[ rjw: Subject and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-03 17:32:58 +01:00
Mario Limonciello
85850af4fc PM: hibernate: Allow hybrid sleep to work with s2idle
Hybrid sleep is currently hardcoded to only operate with S3 even
on systems that might not support it.

Instead of assuming this mode is what the user wants to use, for
hybrid sleep follow the setting of `mem_sleep_current` which
will respect mem_sleep_default kernel command line and policy
decisions made by the presence of the FADT low power idle bit.

Fixes: 81d45bdf89 ("PM / hibernate: Untangle power_down()")
Reported-and-tested-by: kolAflash <kolAflash@kolahilft.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216574
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 14:53:19 +02:00
Linus Torvalds
30c999937f Scheduler changes for v6.1:
- Debuggability:
 
      - Change most occurances of BUG_ON() to WARN_ON_ONCE()
 
      - Reorganize & fix TASK_ state comparisons, turn it into a bitmap
 
      - Update/fix misc scheduler debugging facilities
 
  - Load-balancing & regular scheduling:
 
      - Improve the behavior of the scheduler in presence of lot of
        SCHED_IDLE tasks - in particular they should not impact other
        scheduling classes.
 
      - Optimize task load tracking, cleanups & fixes
 
      - Clean up & simplify misc load-balancing code
 
  - Freezer:
 
      - Rewrite the core freezer to behave better wrt thawing and be simpler
        in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting
        all the fallout.
 
  - Deadline scheduler:
 
      - Fix the DL capacity-aware code
 
      - Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period()
 
      - Relax/optimize locking in task_non_contending()
 
  - Cleanups:
 
      - Factor out the update_current_exec_runtime() helper
 
      - Various cleanups, simplifications
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/01cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1geZA/+PB4KC1T9aVxzaTHI36R03YgJYZmIdtxw
 wTf02MixePmz+gQCbepJbempGOh5ST28aOcI0xhdYOql5B63MaUBBMlB0HvGUyDG
 IU3zETqLMRtAbnSTdQFv8m++ECUtZYp8/x1FCel4WO7ya4ETkRu1NRfCoUepEhpZ
 aVAlae9LH3NBaF9t7s0PT2lTjf3pIzMFRkddJ0ywJhbFR3VnWat05fAK+J6fGY8+
 LS54coefNlJD4oDh5TY8uniL1j5SmWmmwbk9Cdj7bLU5P3dFSS0/+5FJNHJPVGDE
 srGT7wstRUcDrN0CnZo48VIUBiApJCCDqTfJYi9wNYd0NAHvwY6MIJJgEIY8mKsI
 L/qH26H81Wt+ezSZ/5JIlGlZ/LIeNaa6OO/fbWEYABBQogvvx3nxsRNUYKSQzumH
 CnSBasBjLnjWyLlK4qARM9cI7NFSEK6NUigrEx/7h8JFu/8T4DlSy6LsF1HUyKgq
 4+FJLAqG6cL0tcwB/fHYd0oRESN8dStnQhGxSojgufwLc7dlFULvCYF5JM/dX+/V
 IKwbOfIOeOn6ViMtSOXAEGdII+IQ2/ZFPwr+8Z5JC7NzvTVL6xlu/3JXkLZR3L7o
 yaXTSaz06h1vil7Z+GRf7RHc+wUeGkEpXh5vnarGZKXivhFdWsBdROIJANK+xR0i
 TeSLCxQxXlU=
 =KjMD
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Debuggability:

   - Change most occurances of BUG_ON() to WARN_ON_ONCE()

   - Reorganize & fix TASK_ state comparisons, turn it into a bitmap

   - Update/fix misc scheduler debugging facilities

  Load-balancing & regular scheduling:

   - Improve the behavior of the scheduler in presence of lot of
     SCHED_IDLE tasks - in particular they should not impact other
     scheduling classes.

   - Optimize task load tracking, cleanups & fixes

   - Clean up & simplify misc load-balancing code

  Freezer:

   - Rewrite the core freezer to behave better wrt thawing and be
     simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
     fixing/adjusting all the fallout.

  Deadline scheduler:

   - Fix the DL capacity-aware code

   - Factor out dl_task_is_earliest_deadline() &
     replenish_dl_new_period()

   - Relax/optimize locking in task_non_contending()

  Cleanups:

   - Factor out the update_current_exec_runtime() helper

   - Various cleanups, simplifications"

* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  sched: Fix more TASK_state comparisons
  sched: Fix TASK_state comparisons
  sched/fair: Move call to list_last_entry() in detach_tasks
  sched/fair: Cleanup loop_max and loop_break
  sched/fair: Make sure to try to detach at least one movable task
  sched: Show PF_flag holes
  freezer,sched: Rewrite core freezer logic
  sched: Widen TAKS_state literals
  sched/wait: Add wait_event_state()
  sched/completion: Add wait_for_completion_state()
  sched: Add TASK_ANY for wait_task_inactive()
  sched: Change wait_task_inactive()s match_state
  freezer,umh: Clean up freezer/initrd interaction
  freezer: Have {,un}lock_system_sleep() save/restore flags
  sched: Rename task_running() to task_on_cpu()
  sched/fair: Cleanup for SIS_PROP
  sched/fair: Default to false in test_idle_cores()
  sched/fair: Remove useless check in select_idle_core()
  sched/fair: Avoid double search on same cpu
  sched/fair: Remove redundant check in select_idle_smt()
  ...
2022-10-10 09:10:28 -07:00
Mario Limonciello
811d59fdf5 ACPI: s2idle: Add a new ->check() callback for platform_s2idle_ops
On some platforms it is found that Linux more aggressively enters s2idle
than Windows enters Modern Standby and this uncovers some synchronization
issues for the platform.  To aid in debugging this class of problems in
the future, add support for an extra optional callback intended for
drivers to emit extra debugging.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220829162953.5947-2-mario.limonciello@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-09-09 17:37:40 +02:00
Peter Zijlstra
f5d39b0208 freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-> TASK_FROZEN
	__TASK_STOPPED	-> TASK_FROZEN
	__TASK_TRACED	-> TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07 21:53:50 +02:00
Peter Zijlstra
5950e5d574 freezer: Have {,un}lock_system_sleep() save/restore flags
Rafael explained that the reason for having both PF_NOFREEZE and
PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from
kthread context that has previously called set_freezable().

In preparation of merging the flags, have {,un}lock_system_slee() save
and restore current->flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org
2022-09-07 21:53:48 +02:00
Linus Torvalds
228dfe98a3 Char / Misc driver changes for 6.0-rc1
Here is the large set of char and misc and other driver subsystem
 changes for 6.0-rc1.
 
 Highlights include:
 	- large set of IIO driver updates, additions, and cleanups
 	- new habanalabs device support added (loads of register maps
 	  much like GPUs have)
 	- soundwire driver updates
 	- phy driver updates
 	- slimbus driver updates
 	- tiny virt driver fixes and updates
 	- misc driver fixes and updates
 	- interconnect driver updates
 	- hwtracing driver updates
 	- fpga driver updates
 	- extcon driver updates
 	- firmware driver updates
 	- counter driver update
 	- mhi driver fixes and updates
 	- binder driver fixes and updates
 	- speakup driver fixes
 
 Full details are in the long shortlog contents.
 
 All of these have been in linux-next for a while without any reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYup9QQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylBKQCfaSuzl9ZP9dTvAw2FPp14oRqXnpoAnicvWAoq
 1vU9Vtq2c73uBVLdZm4m
 =AwP3
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver updates from Greg KH:
 "Here is the large set of char and misc and other driver subsystem
  changes for 6.0-rc1.

  Highlights include:

   - large set of IIO driver updates, additions, and cleanups

   - new habanalabs device support added (loads of register maps much
     like GPUs have)

   - soundwire driver updates

   - phy driver updates

   - slimbus driver updates

   - tiny virt driver fixes and updates

   - misc driver fixes and updates

   - interconnect driver updates

   - hwtracing driver updates

   - fpga driver updates

   - extcon driver updates

   - firmware driver updates

   - counter driver update

   - mhi driver fixes and updates

   - binder driver fixes and updates

   - speakup driver fixes

  All of these have been in linux-next for a while without any reported
  problems"

* tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits)
  drivers: lkdtm: fix clang -Wformat warning
  char: remove VR41XX related char driver
  misc: Mark MICROCODE_MINOR unused
  spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
  dt-bindings: iio: adc: Add compatible for MT8188
  iio: light: isl29028: Fix the warning in isl29028_remove()
  iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes
  iio: fix iio_format_avail_range() printing for none IIO_VAL_INT
  iio: adc: max1027: unlock on error path in max1027_read_single_value()
  iio: proximity: sx9324: add empty line in front of bullet list
  iio: magnetometer: hmc5843: Remove duplicate 'the'
  iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  ...
2022-08-04 11:05:48 -07:00
Linus Torvalds
c013d0af81 for-5.20/block-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLko3gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQaD/90NKFj4v8I456TUQyg1jimXEsL+e84E6o2
 ALWVb6JzQvlPVQXNLnK5YKIunMWOTtTMz0nyB8sVRwVJVJO0P5d7QopAkZM8fkyU
 MK5OCzoryENw4DTc2wJS4in6cSbGylIuN74wMzlf7+M67JTImfoZQhbTMcjwzZfn
 b3OlL6sID7zMXwGcuOJPZyUJICCpDhzdSF9JXqKma5PQuG2SBmQyvFxJAcsoFBPc
 YetnoRIOIN6yBvsIZaPaYq7XI9MIvF0e67EQtyCEHj4tHpyVnyDWkeObVFULsISU
 gGEKbkYPvNUzRAU5Q1NBBHh1tTfkf/MaUxTuZwoEwZ/s04IGBGMmrZGyfvdfzYo6
 M7NwSEg/TrUSNfTwn65mQi7uOXu1pGkJrqz84Flm8u9Qid9Vd7LExLG5p/ggnWdH
 5th93MDEmtEg29e9DXpEAuS5d0t3TtSvosflaKpyfNNfr+P0rWCN6GM/uW62VUTK
 ls69SQh/AQJRbg64jU4xper6WhaYtSXK7TKEnxJycoEn9gYNyCcdot2uekth0xRH
 ChHGmRlteiqe/y4uFWn/2dcxWjoleiHbFjTaiRL75WVl8wIDEjw02LGuoZ61Ss9H
 WOV+MT7KqNjBGe6lreUY+O/PO02dzmoR6heJXN19p8zr/pBuLCTGX7UpO7rzgaBR
 4N1HEozvIw==
 =celk
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Improve the type checking of request flags (Bart)

 - Ensure queue mapping for a single queues always picks the right queue
   (Bart)

 - Sanitize the io priority handling (Jan)

 - rq-qos race fix (Jinke)

 - Reserved tags handling improvements (John)

 - Separate memory alignment from file/disk offset aligment for O_DIRECT
   (Keith)

 - Add new ublk driver, userspace block driver using io_uring for
   communication with the userspace backend (Ming)

 - Use try_cmpxchg() to cleanup the code in various spots (Uros)

 - Finally remove bdevname() (Christoph)

 - Clean up the zoned device handling (Christoph)

 - Clean up independent access range support (Christoph)

 - Clean up and improve block sysfs handling (Christoph)

 - Clean up and improve teardown of block devices.

   This turns the usual two step process into something that is simpler
   to implement and handle in block drivers (Christoph)

 - Clean up chunk size handling (Christoph)

 - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
   Ming, Sebastian, Yang, Ying)

* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
  ublk_drv: fix double shift bug
  ublk_drv: make sure that correct flags(features) returned to userspace
  ublk_drv: fix error handling of ublk_add_dev
  ublk_drv: fix lockdep warning
  block: remove __blk_get_queue
  block: call blk_mq_exit_queue from disk_release for never added disks
  blk-mq: fix error handling in __blk_mq_alloc_disk
  ublk: defer disk allocation
  ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
  ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
  ublk: cleanup ublk_ctrl_uring_cmd
  ublk: simplify ublk_ch_open and ublk_ch_release
  ublk: remove the empty open and release block device operations
  ublk: remove UBLK_IO_F_PREFLUSH
  ublk: add a MAINTAINERS entry
  block: don't allow the same type rq_qos add more than once
  mmc: fix disk/queue leak in case of adding disk failure
  ublk_drv: fix an IS_ERR() vs NULL check
  ublk: remove UBLK_IO_F_INTEGRITY
  ublk_drv: remove unneeded semicolon
  ...
2022-08-02 13:46:35 -07:00
Rafael J. Wysocki
aa727b7b4b Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs'
Merge devfreq changes, PM QoS change, and power management tools and
documentation changes for v5.20-rc1:

 - Add new devfreq driver for Mediatek CCI (Cache Coherent
   Interconnect) (Johnson Wang).

 - Convert the Samsung Exynos SoC Bus bindings to DT schema of
   exynos-bus.c (Krzysztof Kozlowski).

 - Address kernel-doc warnings by adding the description for unused
   fucntion parameters in devfreq core (Mauro Carvalho Chehab).

 - Use NULL to pass a null pointer rather than zero according to the
   function propotype in imx-bus.c (Colin Ian King).

 - Print error message instead of error interger value in
   tegra30-devfreq.c (Dmitry Osipenko).

 - Add checks to prevent setting negative frequency QoS limits for
   CPUs (Shivnandan Kumar).

 - Update the pm-graph suite of utilities to the latest revision 5.9
   including multiple improvements (Todd Brandt).

 - Drop pme_interrupt reference from the PCI power management
   documentation (Mario Limonciello).

* pm-devfreq:
  PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
  PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
  PM / devfreq: shut up kernel-doc warnings
  dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
  PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
  dt-bindings: interconnect: Add MediaTek CCI dt-bindings

* pm-qos:
  PM: QoS: Add check to make sure CPU freq is non-negative

* pm-tools:
  pm-graph v5.9

* pm-docs:
  Documentation: PM: Drop pme_interrupt reference
2022-07-29 19:46:00 +02:00
Rafael J. Wysocki
954a83fc60 Merge branches 'pm-core', 'pm-sleep', 'powercap', 'pm-domains' and 'pm-em'
Merge core device power management changes for v5.20-rc1:

 - Extend support for wakeirq to callback wrappers used during system
   suspend and resume (Ulf Hansson).

 - Defer waiting for device probe before loading a hibernation image
   till the first actual device access to avoid possible deadlocks
   reported by syzbot (Tetsuo Handa).

 - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
   Helgaas).

 - Add Raptor Lake-P to the list of processors supported by the Intel
   RAPL driver (George D Sworo).

 - Add Alder Lake-N and Raptor Lake-P to the list of processors for
   which Power Limit4 is supported in the Intel RAPL driver (Sumeet
   Pawnikar).

 - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
   attempting to remove it (Hsin-Yi Wang).

 - Change the Energy Model code to represent power in micro-Watts and
   adjust its users accordingly (Lukasz Luba).

* pm-core:
  PM: runtime: Extend support for wakeirq for force_suspend|resume

* pm-sleep:
  PM: hibernate: defer device probing when resuming from hibernation
  PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP

* powercap:
  powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
  powercap: intel_rapl: Add support for RAPTORLAKE_P

* pm-domains:
  PM: domains: Ensure genpd_debugfs_dir exists before remove

* pm-em:
  cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
  firmware: arm_scmi: Get detailed power scale from perf
  Documentation: EM: Switch to micro-Watts scale
  PM: EM: convert power field to micro-Watts precision and align drivers
2022-07-29 19:33:13 +02:00
Shivnandan Kumar
8d36694245 PM: QoS: Add check to make sure CPU freq is non-negative
CPU frequency should never be negative.

If some client driver calls freq_qos_update_request with a
negative value which will be very high in absolute terms,
then frequency QoS sets max CPU freq at fmax as it considers
it's absolute value but it will add plist node with negative
priority.

plist node has priority from INT_MIN (highest) to INT_MAX(lowest).
Once priority is set as negative, another client will not be able
to reduce CPU frequency.

Adding check to make sure CPU freq is non-negative will fix
this problem.

Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-26 20:48:33 +02:00
Tetsuo Handa
8386c414e2 PM: hibernate: defer device probing when resuming from hibernation
syzbot is reporting hung task at misc_open() [1], for there is a race
window of AB-BA deadlock which involves probe_count variable. Currently
wait_for_device_probe() from snapshot_open() from misc_open() can sleep
forever with misc_mtx held if probe_count cannot become 0.

When a device is probed by hub_event() work function, probe_count is
incremented before the probe function starts, and probe_count is
decremented after the probe function completed.

There are three cases that can prevent probe_count from dropping to 0.

  (a) A device being probed stopped responding (i.e. broken/malicious
      hardware).

  (b) A process emulating a USB device using /dev/raw-gadget interface
      stopped responding for some reason.

  (c) New device probe requests keeps coming in before existing device
      probe requests complete.

The phenomenon syzbot is reporting is (b). A process which is holding
system_transition_mutex and misc_mtx is waiting for probe_count to become
0 inside wait_for_device_probe(), but the probe function which is called
 from hub_event() work function is waiting for the processes which are
blocked at mutex_lock(&misc_mtx) to respond via /dev/raw-gadget interface.

This patch mitigates (b) by deferring wait_for_device_probe() from
snapshot_open() to snapshot_write() and snapshot_ioctl(). Please note that
the possibility of (b) remains as long as any thread which is emulating a
USB device via /dev/raw-gadget interface can be blocked by uninterruptible
blocking operations (e.g. mutex_lock()).

Please also note that (a) and (c) are not addressed. Regarding (c), we
should change the code to wait for only one device which contains the
image for resuming from hibernation. I don't know how to address (a), for
use of timeout for wait_for_device_probe() might result in loss of user
data in the image. Maybe we should require the userland to wait for the
image device before opening /dev/snapshot interface.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-26 20:39:01 +02:00
Lukasz Luba
ae6ccaa650 PM: EM: convert power field to micro-Watts precision and align drivers
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.

Update all EM users which access 'power' field and assume the value is
in milli-Watts.

Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.

Example calculation which shows the rounding error and impact:

power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz

power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18

power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21

max_freq = 2000MHz

cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000

cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203

The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15 19:17:30 +02:00
Bart Van Assche
568e34ed73 PM: Use the enum req_op and blk_opf_t types
Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for
variables that represent request flags. Combine the first two
hib_submit_io() arguments into a single argument.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-62-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:33 -06:00
Kalesh Singh
261e224d6a pm/sleep: Add PM_USERSPACE_AUTOSLEEP Kconfig
Systems that initiate frequent suspend/resume from userspace
can make the kernel aware by enabling PM_USERSPACE_AUTOSLEEP
config.

This allows for certain sleep-sensitive code (wireguard/rng) to
decide on what preparatory work should be performed (or not) in
their pm_notification callbacks.

This patch was prompted by the discussion at [1] which attempts
to remove CONFIG_ANDROID that currently guards these code paths.

[1] https://lore.kernel.org/r/20220629150102.1582425-1-hch@lst.de/

Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Link: https://lore.kernel.org/r/20220630191230.235306-1-kaleshsingh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-01 10:39:20 +02:00
Dmitry Osipenko
2027732600 PM: hibernate: Use kernel_can_power_off()
Use new kernel_can_power_off() API instead of legacy pm_power_off global
variable to fix regressed hibernation to disk where machine no longer
powers off when it should because ACPI power driver transitioned to the
new sys-off based API and it doesn't use pm_power_off anymore.

Fixes: 98f30d0ecf ("ACPI: power: Switch to sys-off handler API")
Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>
Reported-by: Ken Moffat <zarniwhhop@ntlworld.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-21 20:57:30 +02:00
Linus Torvalds
9d004b2f4f cxl for 5.19
- Add driver-core infrastructure for lockdep validation of
   device_lock(), and fixup a deadlock report that was previously hidden
   behind the 'lockdep no validate' policy.
 
 - Add CXL _OSC support for claiming native control of CXL hotplug and
   error handling.
 
 - Disable suspend in the presence of CXL memory unless and until a
   protocol is identified for restoring PCI device context from memory
   hosted on CXL PCI devices.
 
 - Add support for snooping CXL mailbox commands to protect against
   inopportune changes, like set-partition with the 'immediate' flag set.
 
 - Rework how the driver detects legacy CXL 1.1 configurations (CXL DVSEC
   / 'mem_enable') before enabling new CXL 2.0 decode configurations (CXL
   HDM Capability).
 
 - Miscellaneous cleanups and fixes from -next exposure.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYpFUogAKCRDfioYZHlFs
 Zz+VAP9o/NkYhbaM2Ne9ImgsdJii96gA8nN7q/q/ZoXjsSx2WQD+NRC5d3ZwZDCa
 9YKEkntnvbnAZOCs+ZUuyZBgNh6vsgU=
 =p92w
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dan Williams:
 "Compute Express Link (CXL) updates for this cycle.

  The highlight is new driver-core infrastructure and CXL subsystem
  changes for allowing lockdep to validate device_lock() usage. Thanks
  to PeterZ for setting me straight on the current capabilities of the
  lockdep API, and Greg acked it as well.

  On the CXL ACPI side this update adds support for CXL _OSC so that
  platform firmware knows that it is safe to still grant Linux native
  control of PCIe hotplug and error handling in the presence of CXL
  devices. A circular dependency problem was discovered between suspend
  and CXL memory for cases where the suspend image might be stored in
  CXL memory where that image also contains the PCI register state to
  restore to re-enable the device. Disable suspend for now until an
  architecture is defined to clarify that conflict.

  Lastly a collection of reworks, fixes, and cleanups to the CXL
  subsystem where support for snooping mailbox commands and properly
  handling the "mem_enable" flow are the highlights.

  Summary:

   - Add driver-core infrastructure for lockdep validation of
     device_lock(), and fixup a deadlock report that was previously
     hidden behind the 'lockdep no validate' policy.

   - Add CXL _OSC support for claiming native control of CXL hotplug and
     error handling.

   - Disable suspend in the presence of CXL memory unless and until a
     protocol is identified for restoring PCI device context from memory
     hosted on CXL PCI devices.

   - Add support for snooping CXL mailbox commands to protect against
     inopportune changes, like set-partition with the 'immediate' flag
     set.

   - Rework how the driver detects legacy CXL 1.1 configurations (CXL
     DVSEC / 'mem_enable') before enabling new CXL 2.0 decode
     configurations (CXL HDM Capability).

   - Miscellaneous cleanups and fixes from -next exposure"

* tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (47 commits)
  cxl/port: Enable HDM Capability after validating DVSEC Ranges
  cxl/port: Reuse 'struct cxl_hdm' context for hdm init
  cxl/port: Move endpoint HDM Decoder Capability init to port driver
  cxl/pci: Drop @info argument to cxl_hdm_decode_init()
  cxl/mem: Merge cxl_dvsec_ranges() and cxl_hdm_decode_init()
  cxl/mem: Skip range enumeration if mem_enable clear
  cxl/mem: Consolidate CXL DVSEC Range enumeration in the core
  cxl/pci: Move cxl_await_media_ready() to the core
  cxl/mem: Validate port connectivity before dvsec ranges
  cxl/mem: Fix cxl_mem_probe() error exit
  cxl/pci: Drop wait_for_valid() from cxl_await_media_ready()
  cxl/pci: Consolidate wait_for_media() and wait_for_media_ready()
  cxl/mem: Drop mem_enabled check from wait_for_media()
  nvdimm: Fix firmware activation deadlock scenarios
  device-core: Kill the lockdep_mutex
  nvdimm: Drop nd_device_lock()
  ACPI: NFIT: Drop nfit_device_lock()
  nvdimm: Replace lockdep_mutex with local lock classes
  cxl: Drop cxl_device_lock()
  cxl/acpi: Add root device lockdep validation
  ...
2022-05-27 21:24:19 -07:00
Rafael J. Wysocki
16a23f394d Merge branches 'pm-em' and 'pm-cpuidle'
Marge Energy Model support updates and cpuidle updates for 5.19-rc1:

 - Update the Energy Model support code to allow the Energy Model to be
   artificial, which means that the power values may not be on a uniform
   scale with other devices providing power information, and update the
   cpufreq_cooling and devfreq_cooling thermal drivers to support
   artificial Energy Models (Lukasz Luba).

 - Make DTPM check the Energy Model type (Lukasz Luba).

 - Fix policy counter decrementation in cpufreq if Energy Model is in
   use (Pierre Gondois).

 - Add AlderLake processor support to the intel_idle driver (Zhang Rui).

 - Fix regression leading to no genpd governor in the PSCI cpuidle
   driver and fix the riscv-sbi cpuidle driver to allow a genpd
   governor to be used (Ulf Hansson).

* pm-em:
  PM: EM: Decrement policy counter
  powercap: DTPM: Check for Energy Model type
  thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling
  Documentation: EM: Add artificial EM registration description
  PM: EM: Remove old debugfs files and print all 'flags'
  PM: EM: Change the order of arguments in the .active_power() callback
  PM: EM: Use the new .get_cost() callback while registering EM
  PM: EM: Add artificial EM flag
  PM: EM: Add .get_cost() callback

* pm-cpuidle:
  cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used
  cpuidle: psci: Fix regression leading to no genpd governor
  intel_idle: Add AlderLake support
2022-05-23 19:18:51 +02:00
Pierre Gondois
c9d8923bfb PM: EM: Decrement policy counter
In commit e458716a92 ("PM: EM: Mark inefficiencies in CPUFreq"),
cpufreq_cpu_get() is called without a cpufreq_cpu_put(), permanently
increasing the reference counts of the policy struct.

Decrement the reference count once the policy struct is not used
anymore.

Fixes: e458716a92 ("PM: EM: Mark inefficiencies in CPUFreq")
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-11 19:15:14 +02:00
Dan Williams
9ea4dcf498 PM: CXL: Disable suspend
The CXL specification claims S3 support at a hardware level, but at a
system software level there are some missing pieces. Section 9.4 (CXL
2.0) rightly claims that "CXL mem adapters may need aux power to retain
memory context across S3", but there is no enumeration mechanism for the
OS to determine if a given adapter has that support. Moreover the save
state and resume image for the system may inadvertantly end up in a CXL
device that needs to be restored before the save state is recoverable.
I.e. a circular dependency that is not resolvable without a third party
save-area.

Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly
allows for suspend, but requires unbinding all CXL memory devices before
the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind
flow is intended to also tear down all CXL memory regions associated
with a given cxl_memdev.

It is reasonable to assume that any device participating in a System RAM
range published in the EFI memory map is covered by aux power and
save-area outside the device itself. So this restriction can be
minimized in the future once pre-existing region enumeration support
arrives, and perhaps a spec update to clarify if the EFI memory map is
sufficent for determining the range of devices managed by
platform-firmware for S3 support.

Per Rafael, if the CXL configuration prevents suspend then it should
fail early before tasks are frozen, and mem_sleep should stop showing
'mem' as an option [1]. Effectively CXL augments the platform suspend
->valid() op since, for example, the ACPI ops are not aware of the CXL /
PCI dependencies. Given the split role of platform firmware vs OS
provisioned CXL memory it is up to the cxl_mem driver to determine if
the CXL configuration has elements that platform firmware may not be
prepared to restore.

Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1]
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-22 16:09:42 -07:00
Haowen Bai
e5a3b0c5b6 PM: hibernate: Don't mark comment as kernel-doc
Change the comment to a normal (non-kernel-doc) comment to avoid
these kernel-doc warnings:

kernel/power/snapshot.c:335: warning: This comment starts with '/**', but
 isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Data types related to memory bitmaps.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 17:09:37 +02:00
Yang Li
467df4cfdc PM: hibernate: Fix some kernel-doc comments
Add parameter description in alloc_rtree_node()  kernel-doc
comment and fix several inconsistent function name descriptions.

Remove some warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.

kernel/power/snapshot.c:438: warning: Function parameter or member
'gfp_mask' not described in 'alloc_rtree_node'
kernel/power/snapshot.c:438: warning: Function parameter or member
'safe_needed' not described in 'alloc_rtree_node'
kernel/power/snapshot.c:438: warning: Function parameter or member 'ca'
not described in 'alloc_rtree_node'
kernel/power/snapshot.c:438: warning: Function parameter or member
'list' not described in 'alloc_rtree_node'
kernel/power/snapshot.c:916: warning: expecting prototype for
memory_bm_rtree_next_pfn(). Prototype was for memory_bm_next_pfn()
instead
kernel/power/snapshot.c:1947: warning: expecting prototype for
alloc_highmem_image_pages(). Prototype was for alloc_highmem_pages()
instead
kernel/power/snapshot.c:2230: warning: expecting prototype for load
header(). Prototype was for load_header() instead

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
[ rjw: Comment adjustments to avoid line breaks ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:44:20 +02:00
David Cohen
ce1cb680ff PM: sleep: enable dynamic debug support within pm_pr_dbg()
Currently pm_pr_dbg() is used to filter kernel pm debug messages based
on pm_debug_messages_on flag. The problem is if we enable/disable this
flag it will affect all pm_pr_dbg() calls at once, so we can't
individually control them.

This patch changes pm_pr_dbg() implementation as such:

 - If pm_debug_messages_on is enabled, print the message.
 - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is
   enabled, only print the messages explicitly enabled on
   /sys/kernel/debug/dynamic_debug/control.
 - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is
   disabled, don't print the message.

Signed-off-by: David Cohen <dacohen@pm.me>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:34:01 +02:00
David Cohen
ae20cb9aec PM: sleep: Narrow down -DDEBUG on kernel/power/ files
The macro -DDEBUG is broadly enabled on kernel/power/ directory if
CONFIG_DYNAMIC_DEBUG is enabled. As side effect all debug messages using
pr_debug() and dev_dbg() are enabled by default on dynamic debug.
We're reworking pm_pr_dbg() to support dynamic debug, where pm_pr_dbg()
will print message if either pm_debug_messages_on flag is set or if it's
explicitly enabled on dynamic debug's control. That means if we let
-DDEBUG broadly set, the pm_debug_messages_on flag will be bypassed by
default on pm_pr_dbg() if dynamic debug is also enabled.

The files that directly use pr_debug() and dev_dbg() on kernel/power/ are:
 - swap.c
 - snapshot.c
 - energy_model.c

And those files do not use pm_pr_dbg(). So if we limit -DDEBUG to them,
we keep the same functional behavior while allowing the pm_pr_dbg()
refactor.

Signed-off-by: David Cohen <dacohen@pm.me>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:34:01 +02:00
Lukasz Luba
16857482b8 PM: EM: Remove old debugfs files and print all 'flags'
The Energy Model gets more bits used in 'flags'. Avoid adding another
debugfs file just to print what is the status of a new flag. Simply
remove old debugfs files and add one generic which prints all flags
as a hex value.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:26:17 +02:00
Lukasz Luba
75a3a99a5a PM: EM: Change the order of arguments in the .active_power() callback
The .active_power() callback passes the device pointer when it's called.
Aligned with a convetion present in other subsystems and pass the 'dev'
as a first argument. It looks more cleaner.

Adjust all affected drivers which implement that API callback.

Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:26:17 +02:00
Lukasz Luba
9136246311 PM: EM: Use the new .get_cost() callback while registering EM
The Energy Model (EM) allows to provide the 'cost' values when the device
driver provides the .get_cost() optional callback. This removes
restriction which is in the EM calculation function of the 'cost'
for each performance state. Now, the driver is in charge of providing
the right values which are then used by Energy Aware Scheduler.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:26:17 +02:00
Pierre Gondois
fc3a9a9858 PM: EM: Add artificial EM flag
The Energy Model (EM) can be used on platforms which are missing real
power information. Those platforms would implement .get_cost() which
populates needed values for the Energy Aware Scheduler (EAS). The EAS
doesn't use 'power' fields from EM, but other frameworks might use them.
Thus, to avoid miss-usage of this specific type of EM, introduce a new
flags which can be checked by other frameworks.

Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:26:17 +02:00
Linus Torvalds
616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Randy Dunlap
7a64ca17e4 PM: suspend: fix return value of __setup handler
If an invalid option is given for "test_suspend=<option>", the entire
string is added to init's environment, so return 1 instead of 0 from
the __setup handler.

  Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc5
    test_suspend=invalid"

and

 Run /sbin/init as init process
   with arguments:
     /sbin/init
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc5
     test_suspend=invalid

Fixes: 2ce986892f ("PM / sleep: Enhance test_suspend option with repeat capability")
Fixes: 27ddcc6596 ("PM / sleep: Add state field to pm_states[] entries")
Fixes: a9d7052363 ("PM: Separate suspend to RAM functionality from core")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-01 18:55:07 +01:00
Randy Dunlap
ba7ffcd4c4 PM: hibernate: fix __setup handler error handling
If an invalid value is used in "resumedelay=<seconds>", it is
silently ignored. Add a warning message and then let the __setup
handler return 1 to indicate that the kernel command line option
has been handled.

Fixes: 317cf7e5e8 ("PM / hibernate: convert simple_strtoul to kstrtoul")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-01 18:51:11 +01:00
Jiapeng Chong
444e1154b2 PM: hibernate: Clean up non-kernel-doc comments
Address the following W=1 kernel build warning:

kernel/power/swap.c:120: warning: This comment starts with '/**', but
isn't a kernel-doc comment. Refer
Documentation/doc-guide/kernel-doc.rst.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-01 16:20:46 +01:00
Ye Bin
3f51aa9e29 PM: hibernate: fix load_image_and_restore() error path
As 'swsusp_check' open 'hib_resume_bdev', if call 'create_basic_memory_bitmaps'
failed, we need to close 'hib_resume_bdev' in 'load_image_and_restore' function.

Signed-off-by: Ye Bin <yebin10@huawei.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-16 19:47:52 +01:00
Rafael J. Wysocki
cb1f65c1e1 PM: s2idle: ACPI: Fix wakeup interrupts handling
After commit e3728b50cd ("ACPI: PM: s2idle: Avoid possible race
related to the EC GPE") wakeup interrupts occurring immediately after
the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
the SCI triggers again immediately after the rearming in
acpi_s2idle_wake(), that wakeup may be missed too.

The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
when pm_wakeup_irq is 0, but that's not the case any more after the
interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
there may be wakeup interrupts occurring in that time frame and if
that happens, they will be missed.

To address that issue first move the clearing of pm_wakeup_irq to
the point at which it is known that the interrupt causing
acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
for wakeup.  Moreover, because that only reduces the size of the
time window in which the issue may manifest itself, allow
pm_system_irq_wakeup() to register two second wakeup interrupts in
a row and, when discarding the first one, replace it with the second
one.  [Of course, this assumes that only one wakeup interrupt can be
discarded in one go, but currently that is the case and I am not
aware of any plans to change that.]

Fixes: e3728b50cd ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-07 21:02:31 +01:00
Christoph Hellwig
07888c665b block: pass a block_device and opf to bio_alloc
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment.  NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig
322cbb50de block: remove genhd.h
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it.  So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Amadeusz Sławiński
33569ef3c7 PM: hibernate: Remove register_nosave_region_late()
It is an unused wrapper forcing kmalloc allocation for registering
nosave regions. Also, rename __register_nosave_region() to
register_nosave_region() now that there is no need for disambiguation.

Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-25 18:34:08 +01:00
Greg Kroah-Hartman
c9d967b2ce PM: wakeup: simplify the output logic of pm_show_wakelocks()
The buffer handling in pm_show_wakelocks() is tricky, and hopefully
correct.  Ensure it really is correct by using sysfs_emit_at() which
handles all of the tricky string handling logic in a PAGE_SIZE buffer
for us automatically as this is a sysfs file being read from.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-25 18:27:02 +01:00
David Woodhouse
74d9555580 PM: hibernate: Allow ACPI hardware signature to be honoured
Theoretically, when the hardware signature in FACS changes, the OS
is supposed to gracefully decline to attempt to resume from S4:

 "If the signature has changed, OSPM will not restore the system
  context and can boot from scratch"

In practice, Windows doesn't do this and many laptop vendors do allow
the signature to change especially when docking/undocking, so it would
be a bad idea to simply comply with the specification by default in the
general case.

However, there are use cases where we do want the compliant behaviour
and we know it's safe. Specifically, when resuming virtual machines where
we know the hypervisor has changed sufficiently that resume will fail.
We really want to be able to *tell* the guest kernel not to try, so it
boots cleanly and doesn't just crash. This patch provides a way to opt
in to the spec-compliant behaviour on the command line.

A follow-up patch may do this automatically for certain "known good"
machines based on a DMI match, or perhaps just for all hypervisor
guests since there's no good reason a hypervisor would change the
hardware_signature that it exposes to guests *unless* it wants them
to obey the ACPI specification.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-08 16:06:10 +01:00
Evan Green
88a5045f17 PM: hibernate: Fix snapshot partial write lengths
snapshot_write() is inappropriately limiting the amount of data that can
be written in cases where a partial page has already been written. For
example, one would expect to be able to write 1 byte, then 4095 bytes to
the snapshot device, and have both of those complete fully (since now
we're aligned to a page again). But what ends up happening is we write 1
byte, then 4094/4095 bytes complete successfully.

The reason is that simple_write_to_buffer()'s second argument is the
total size of the buffer, not the size of the buffer minus the offset.
Since simple_write_to_buffer() accounts for the offset in its
implementation, snapshot_write() can just pass the full page size
directly down.

Signed-off-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24 13:50:18 +01:00
Thomas Zeitlhofer
cefcf24b4d PM: hibernate: use correct mode for swsusp_close()
Commit 39fbef4b0f ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).

In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.

So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.

Fixes: 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24 13:45:54 +01:00
Linus Torvalds
833db72142 Power management updates for 5.16-rc1
- Add support for inefficient operating performance points to the
    Energy Model and modify cpufreq to use them properly (Vincent
    Donnefort).
 
  - Rearrange the DTPM framework code to simplify it and make it easier
    to follow (Daniel Lezcano).
 
  - Fix power intialization in DTPM (Daniel Lezcano).
 
  - Add CPU load consideration when estimating the instaneous power
    consumption in DTPM (Daniel Lezcano).
 
  - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang
    Rui).
 
  - Make intel_pstate process HWP Guaranteed change notifications from
    the processor (Srinivas Pandruvada).
 
  - Fix typo in cpufreq.h (Rafael Wysocki).
 
  - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
 
  - Fix the parameter usage of the newly added perf-domain API (Hector
    Yuan).
 
  - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
    Guenter Roeck, and Arnd Bergmann).
 
  - Fix kobject memory leaks in cpuidle error paths (Anel Orazgaliyeva).
 
  - Make intel_idle enable interrupts before entering C1 on some Xeon
    processor models (Artem Bityutskiy).
 
  - Clean up hib_wait_io() (Falla Coulibaly).
 
  - Fix sparse warnings in hibernation-related code (Anders Roxell).
 
  - Use vzalloc() and kzalloc() instead of their open-coded
    equivalents in hibernation-related code (Cai Huoqing).
 
  - Prevent user space from crashing the kernel by attempting to
    restore the system state from a swap partition in use (Ye Bin).
 
  - Do not let "syscore" devices runtime-suspend during system PM
    transitions (Rafael Wysocki).
 
  - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).
 
  - Pause cpuidle later and resume it earlier during system PM
    transitions (Rafael Wysocki).
 
  - Make system suspend code use valid_state() consistently (Rafael
    Wysocki).
 
  - Add support for enabling wakeup IRQs after invoking the
    ->runtime_suspend() callback and make two drivers use it (Chunfeng
    Yun).
 
  - Make the association of ACPI device objects with PCI devices more
    straightforward and simplify the code doing that for all devices
    in general (Rafael Wysocki).
 
  - Eliminate struct pci_platform_pm_ops and handle the both of its
    users (PCI and Intel MID) directly in the PCI bus code (Rafael
    Wysocki).
 
  - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki).
 
  - Fix ordering of operations in pci_back_from_sleep() (Rafael
    Wysocki).
 
  - Make exynos-ppmu use hyphens in DT properties (Krzysztof
    Kozlowski).
 
  - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof
    Kozlowski).
 
  - Strengthen check for freq_table in devfreq (Samuel Holland).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGBkiQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxDZwP/1kyhVdk+FSMMwEfhn8NriOkE8drJ3nB
 Y/PWI93oPhgA7ITMwBv4ufLcxx6uYu5bctVDx3XG2lqvFC30t1fzTosJiwireRaS
 WC5p7ZTqAECxd1LiZWPjL5EzHmtbyzrz3/AxSxOEN1K66qENBd6q5QldKZylxhY3
 bawCQjabz6CLXvOjzdv8M8A7Fmd8LTcjHI5Y3/IOcDydPJqyN4/rDCoqft3/xcNB
 kwN6de73aQwB3AQYufS/VAiNN4XOOkhPwF4QfULmAlnMdCsov6YzahtMB2+oG7O4
 G3DF/OVFrONr3GPMMuMJSC6GXyFiBuW8FRva4W9HpY0MA8xVGLPUpwlpaFVaX1+c
 vAYcRBTyJvOWgRap8+q+UKTlkj37pAgHp7kRiaO1wkVnKxJB1w40OSJZO1nnsExe
 3qeCJHOJ9r+S/FsSPKCmws8vr0XQH5wPXY639Kmj9OI/t3gXGrfy3cXm9pa+gSh0
 eMyHxtCp5ItT7V2FMpYh+wn+wfe5h//sK3tESZs+h6FKwJG1hYIbG4+F3ztIgzHp
 t0rT3JXZIkY41KREGFhCMS9+wnLugOik21w9O0qVZfn/dJtDe73Kely7rY/EA3Mw
 H4aBJDD19BvbIKqaTguxJXEc9zJI737fy/Ze4rrzTDkbXU8qVmjvFoEl2i/Ef4o2
 b6aiDdz3V/CW
 =L2Jo
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These make the power management of PCI devices with ACPI companions
  more straightforwad, add support for inefficient operating performance
  points to the Energy model and make cpufreq handle them as
  appropriate, rearrange the handling of cpuidle during system PM
  transitions, update a few cpufreq drivers and intel_idle, fix assorded
  issues and clean up code in multiple places.

  Specifics:

   - Add support for inefficient operating performance points to the
     Energy Model and modify cpufreq to use them properly (Vincent
     Donnefort).

   - Rearrange the DTPM framework code to simplify it and make it easier
     to follow (Daniel Lezcano).

   - Fix power intialization in DTPM (Daniel Lezcano).

   - Add CPU load consideration when estimating the instaneous power
     consumption in DTPM (Daniel Lezcano).

   - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang
     Rui).

   - Make intel_pstate process HWP Guaranteed change notifications from
     the processor (Srinivas Pandruvada).

   - Fix typo in cpufreq.h (Rafael Wysocki).

   - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).

   - Fix the parameter usage of the newly added perf-domain API (Hector
     Yuan).

   - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
     Guenter Roeck, and Arnd Bergmann).

   - Fix kobject memory leaks in cpuidle error paths (Anel
     Orazgaliyeva).

   - Make intel_idle enable interrupts before entering C1 on some Xeon
     processor models (Artem Bityutskiy).

   - Clean up hib_wait_io() (Falla Coulibaly).

   - Fix sparse warnings in hibernation-related code (Anders Roxell).

   - Use vzalloc() and kzalloc() instead of their open-coded equivalents
     in hibernation-related code (Cai Huoqing).

   - Prevent user space from crashing the kernel by attempting to
     restore the system state from a swap partition in use (Ye Bin).

   - Do not let "syscore" devices runtime-suspend during system PM
     transitions (Rafael Wysocki).

   - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki).

   - Pause cpuidle later and resume it earlier during system PM
     transitions (Rafael Wysocki).

   - Make system suspend code use valid_state() consistently (Rafael
     Wysocki).

   - Add support for enabling wakeup IRQs after invoking the
     ->runtime_suspend() callback and make two drivers use it (Chunfeng
     Yun).

   - Make the association of ACPI device objects with PCI devices more
     straightforward and simplify the code doing that for all devices in
     general (Rafael Wysocki).

   - Eliminate struct pci_platform_pm_ops and handle the both of its
     users (PCI and Intel MID) directly in the PCI bus code (Rafael
     Wysocki).

   - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki).

   - Fix ordering of operations in pci_back_from_sleep() (Rafael
     Wysocki).

   - Make exynos-ppmu use hyphens in DT properties (Krzysztof
     Kozlowski).

   - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof
     Kozlowski).

   - Strengthen check for freq_table in devfreq (Samuel Holland)"

* tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (49 commits)
  cpufreq: Fix parameter in parse_perf_domain()
  usb: mtu3: enable wake-up interrupt after runtime_suspend called
  usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called
  PM / wakeirq: support enabling wake-up irq after runtime_suspend called
  PM / devfreq: Strengthen check for freq_table
  devfreq: exynos-ppmu: simplify parsing event-type from DT
  devfreq: exynos-ppmu: use node names with hyphens
  cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
  PM: suspend: Use valid_state() consistently
  PM: sleep: Pause cpuidle later and resume it earlier during system transitions
  PM: suspend: Do not pause cpuidle in the suspend-to-idle path
  PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
  PM: hibernate: Get block device exclusively in swsusp_check()
  powercap/drivers/dtpm: Fix power limit initialization
  powercap/drivers/dtpm: Scale the power with the load
  powercap/drivers/dtpm: Use container_of instead of a private data field
  powercap/drivers/dtpm: Simplify the dtpm table
  powercap/drivers/dtpm: Encapsulate even more the code
  PM: hibernate: swap: Use vzalloc() and kzalloc()
  PM: hibernate: fix sparse warnings
  ...
2021-11-02 16:04:28 -07:00
Rafael J. Wysocki
bf56b90797 Merge branches 'pm-em' and 'powercap'
Merge Energy Model and power capping updates for 5.16-rc1:

 - Add support for inefficient operating performance points to the
   Energy Model and modify cpufreq to use them properly (Vincent
   Donnefort).

 - Rearrange the DTPM framework code to simplify it and make it easier
   to follow (Daniel Lezcano).

 - Fix power intialization in DTPM (Daniel Lezcano).

 - Add CPU load consideration when estimating the instaneous power
   consumption in DTPM (Daniel Lezcano).

* pm-em:
  cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
  PM: EM: Mark inefficiencies in CPUFreq
  cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
  cpufreq: Introducing CPUFREQ_RELATION_E
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Make policy min/max hard requirements
  PM: EM: Allow skipping inefficient states
  PM: EM: Extend em_perf_domain with a flag field
  PM: EM: Mark inefficient states
  PM: EM: Fix inefficient states detection

* powercap:
  powercap/drivers/dtpm: Fix power limit initialization
  powercap/drivers/dtpm: Scale the power with the load
  powercap/drivers/dtpm: Use container_of instead of a private data field
  powercap/drivers/dtpm: Simplify the dtpm table
  powercap/drivers/dtpm: Encapsulate even more the code
2021-11-02 19:31:28 +01:00
Rafael J. Wysocki
9f6abfcd67 PM: suspend: Use valid_state() consistently
Make valid_state() check if the ->enter callback is present in
suspend_ops (only PM_SUSPEND_TO_IDLE can be valid otherwise) and
make sleep_state_supported() call valid_state() consistently to
validate the states other than PM_SUSPEND_TO_IDLE.

While at it, clean up the comment in valid_state().

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-26 15:52:58 +02:00
Rafael J. Wysocki
23f62d7ab2 PM: sleep: Pause cpuidle later and resume it earlier during system transitions
Commit 8651f97bd9 ("PM / cpuidle: System resume hang fix with
cpuidle") that introduced cpuidle pausing during system suspend
did that to work around a platform firmware issue causing systems
to hang during resume if CPUs were allowed to enter idle states
in the system suspend and resume code paths.

However, pausing cpuidle before the last phase of suspending
devices is the source of an otherwise arbitrary difference between
the suspend-to-idle path and other system suspend variants, so it is
cleaner to do that later, before taking secondary CPUs offline (it
is still safer to take secondary CPUs offline with cpuidle paused,
though).

Modify the code accordingly, but in order to avoid code duplication,
introduce new wrapper functions, pm_sleep_disable_secondary_cpus()
and pm_sleep_enable_secondary_cpus(), to combine cpuidle_pause()
and cpuidle_resume(), respectively, with the handling of secondary
CPUs during system-wide transitions to sleep states.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26 15:52:07 +02:00
Rafael J. Wysocki
8d89835b04 PM: suspend: Do not pause cpuidle in the suspend-to-idle path
It is pointless to pause cpuidle in the suspend-to-idle path,
because it is going to be resumed in the same path later and
pausing it does not serve any particular purpose in that case.

Rework the code to avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26 15:52:07 +02:00
Ye Bin
39fbef4b0f PM: hibernate: Get block device exclusively in swsusp_check()
The following kernel crash can be triggered:

[   89.266592] ------------[ cut here ]------------
[   89.267427] kernel BUG at fs/buffer.c:3020!
[   89.268264] invalid opcode: 0000 [#1] SMP KASAN PTI
[   89.269116] CPU: 7 PID: 1750 Comm: kmmpd-loop0 Not tainted 5.10.0-862.14.0.6.x86_64-08610-gc932cda3cef4-dirty #20
[   89.273169] RIP: 0010:submit_bh_wbc.isra.0+0x538/0x6d0
[   89.277157] RSP: 0018:ffff888105ddfd08 EFLAGS: 00010246
[   89.278093] RAX: 0000000000000005 RBX: ffff888124231498 RCX: ffffffffb2772612
[   89.279332] RDX: 1ffff11024846293 RSI: 0000000000000008 RDI: ffff888124231498
[   89.280591] RBP: ffff8881248cc000 R08: 0000000000000001 R09: ffffed1024846294
[   89.281851] R10: ffff88812423149f R11: ffffed1024846293 R12: 0000000000003800
[   89.283095] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8881161f7000
[   89.284342] FS:  0000000000000000(0000) GS:ffff88839b5c0000(0000) knlGS:0000000000000000
[   89.285711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   89.286701] CR2: 00007f166ebc01a0 CR3: 0000000435c0e000 CR4: 00000000000006e0
[   89.287919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   89.289138] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   89.290368] Call Trace:
[   89.290842]  write_mmp_block+0x2ca/0x510
[   89.292218]  kmmpd+0x433/0x9a0
[   89.294902]  kthread+0x2dd/0x3e0
[   89.296268]  ret_from_fork+0x22/0x30
[   89.296906] Modules linked in:

by running the following commands:

 1. mkfs.ext4 -O mmp  /dev/sda -b 1024
 2. mount /dev/sda /home/test
 3. echo "/dev/sda" > /sys/power/resume

That happens because swsusp_check() calls set_blocksize() on the
target partition which confuses the file system:

       Thread1                       Thread2
mount /dev/sda /home/test
get s_mmp_bh  --> has mapped flag
start kmmpd thread
				echo "/dev/sda" > /sys/power/resume
				  resume_store
				    software_resume
				      swsusp_check
				        set_blocksize
					  truncate_inode_pages_range
					    truncate_cleanup_page
					      block_invalidatepage
					        discard_buffer --> clean mapped flag
write_mmp_block
  submit_bh
    submit_bh_wbc
      BUG_ON(!buffer_mapped(bh))

To address this issue, modify swsusp_check() to open the target block
device with exclusive access.

Signed-off-by: Ye Bin <yebin10@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21 19:38:32 +02:00
Cai Huoqing
9437e39377 PM: hibernate: swap: Use vzalloc() and kzalloc()
Replace vmalloc()/memset() with vzalloc() and kmalloc()/memset() with
kzalloc() to simplify the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21 13:04:57 +02:00
Anders Roxell
01de5fcd8b PM: hibernate: fix sparse warnings
When building the kernel with sparse enabled 'C=1' the following
warnings shows up:

kernel/power/swap.c:390:29: warning: incorrect type in assignment (different base types)
kernel/power/swap.c:390:29:    expected int ret
kernel/power/swap.c:390:29:    got restricted blk_status_t

This is due to function hib_wait_io() returns a 'blk_status_t' which is
a bitwise u8. Commit 5416da01ff ("PM: hibernate: Remove
blk_status_to_errno in hib_wait_io") seemed to have mixed up the return
type. However, the 4e4cbee93d ("block: switch bios to blk_status_t")
actually broke the behaviour by returning the wrong type.

Rework so function hib_wait_io() returns a 'int' instead of
'blk_status_t' and make sure to call function
blk_status_to_errno(hb->error)' when returning from function
hib_wait_io() a int gets returned.

Fixes: 4e4cbee93d ("block: switch bios to blk_status_t")
Fixes: 5416da01ff ("PM: hibernate: Remove blk_status_to_errno in hib_wait_io")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-21 12:45:18 +02:00
Imran Khan
55df0933be workqueue: Introduce show_one_worker_pool and show_one_workqueue.
Currently show_workqueue_state shows the state of all workqueues and of
all worker pools. In certain cases we may need to dump state of only a
specific workqueue or worker pool. For example in destroy_workqueue we
only need to show state of the workqueue which is getting destroyed.

So rename show_workqueue_state to show_all_workqueues(to signify it
dumps state of all busy workqueues) and divide it into more granular
functions (show_one_workqueue and show_one_worker_pool), that would show
states of individual workqueues and worker pools and can be used in
cases such as the one mentioned above.

Also, as mentioned earlier, make destroy_workqueue dump data pertaining
to only the workqueue that is being destroyed and make user(s) of
earlier interface(show_workqueue_state), use new interface
(show_all_workqueues).

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-20 06:19:03 -10:00
Rafael J. Wysocki
c1bfc59818 Revert "PM: sleep: Do not assume that "mem" is always present"
Revert commit bfcc1e67ff ("PM: sleep: Do not assume that "mem" is
always present"), because it breaks compatibility with user space
utilities assuming that "mem" will always be present in
/sys/power/state.

Fixes: bfcc1e67ff ("PM: sleep: Do not assume that "mem" is always present")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-19 20:59:02 +02:00
Vincent Donnefort
e458716a92 PM: EM: Mark inefficiencies in CPUFreq
The Energy Model has a 1:1 mapping between OPPs and performance states
(em_perf_state). If a CPUFreq driver registers an Energy Model,
inefficiencies found by the latter can be applied to CPUFreq.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
8354eb9eb3 PM: EM: Allow skipping inefficient states
The new performance domain flag EM_PERF_DOMAIN_SKIP_INEFFICIENCIES allows
to not take into account inefficient states when estimating energy
consumption. This intends to let the Energy Model know that CPUFreq itself
will skip inefficiencies and such states don't need to be part of the
estimation anymore.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
88f7a89560 PM: EM: Extend em_perf_domain with a flag field
Merge the current "milliwatts" option into a "flag" field. This intends to
prepare the extension of this structure for inefficient states support in
the Energy Model.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
c8ed99533d PM: EM: Mark inefficient states
Some SoCs, such as the sd855 have OPPs within the same performance domain,
whose cost is higher than others with a higher frequency. Even though
those OPPs are interesting from a cooling perspective, it makes no sense
to use them when the device can run at full capacity. Those OPPs handicap
the performance domain, when choosing the most energy-efficient CPU and
are wasting energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model. The table can now
be read skipping inefficient performance states (and by extension,
inefficient OPPs).

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
aa1a43262a PM: EM: Fix inefficient states detection
Currently, a debug message is printed if an inefficient state is detected
in the Energy Model. Unfortunately, it won't detect if the first state is
inefficient or if two successive states are. Fix this behavior.

Fixes: 27871f7a8a (PM: Introduce an Energy Model management framework)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Falla Coulibaly
5416da01ff PM: hibernate: Remove blk_status_to_errno in hib_wait_io
blk_status_to_errno doesn't appear to perform extra work besides
converting blk_status_t to integer. This patch removes that unnecessary
conversion as the return type of the function is blk_status_t.

Signed-off-by: Falla Coulibaly <fallacoulibalyz@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-15 14:17:14 +02:00
Florian Fainelli
bfcc1e67ff PM: sleep: Do not assume that "mem" is always present
An implementation of suspend_ops is allowed to reject the PM_SUSPEND_MEM
suspend type from its ->valid() callback, we should not assume that it
is always present as this is not a correct reflection of what a firmware
interface may support.

Fixes: 406e79385f ("PM / sleep: System sleep state selection interface rework")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-15 14:04:04 +02:00
Rafael J. Wysocki
fe583359dd Merge branches 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap'
* pm-pci:
  PCI: PM: Enable PME if it can be signaled from D3cold
  PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
  PCI: Use pci_update_current_state() in pci_enable_device_flags()

* pm-sleep:
  PM: sleep: unmark 'state' functions as kernel-doc
  PM: sleep: check RTC features instead of ops in suspend_test
  PM: sleep: s2idle: Replace deprecated CPU-hotplug functions

* pm-domains:
  PM: domains: Fix domain attach for CONFIG_PM_OPP=n
  arm64: dts: sc7180: Add required-opps for i2c
  PM: domains: Add support for 'required-opps' to set default perf state
  opp: Don't print an error if required-opps is missing

* powercap:
  powercap: Add Power Limit4 support for Alder Lake SoC
  powercap: intel_rapl: Replace deprecated CPU-hotplug functions
2021-08-30 19:25:42 +02:00
Randy Dunlap
dbcfa7156f PM: sleep: unmark 'state' functions as kernel-doc
Fix kernel-doc warnings in kernel/power/main.c by unmarking the
comment block as kernel-doc notation. This eliminates the following
kernel-doc warnings:

kernel/power/main.c:593: warning: expecting prototype for state(). Prototype was for state_show() instead
kernel/power/main.c:593: warning: Function parameter or member 'kobj' not described in 'state_show'
kernel/power/main.c:593: warning: Function parameter or member 'attr' not described in 'state_show'
kernel/power/main.c:593: warning: Function parameter or member 'buf' not described in 'state_show'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:49:39 +02:00
Lukasz Luba
7fcc17d0cb PM: EM: Increase energy calculation precision
The Energy Model (EM) provides useful information about device power in
each performance state to other subsystems like: Energy Aware Scheduler
(EAS). The energy calculation in EAS does arithmetic operation based on
the EM em_cpu_energy(). Current implementation of that function uses
em_perf_state::cost as a pre-computed cost coefficient equal to:
cost = power * max_frequency / frequency.
The 'power' is expressed in milli-Watts (or in abstract scale).

There are corner cases when the EAS energy calculation for two Performance
Domains (PDs) return the same value. The EAS compares these values to
choose smaller one. It might happen that this values are equal due to
rounding error. In such scenario, we need better resolution, e.g. 1000
times better. To provide this possibility increase the resolution in the
em_perf_state::cost for 64-bit architectures. The cost of increasing
resolution on 32-bit is pretty high (64-bit division) and is not justified
since there are no new 32bit big.LITTLE EAS systems expected which would
benefit from this higher resolution.

This patch allows to avoid the rounding to milli-Watt errors, which might
occur in EAS energy estimation for each PD. The rounding error is common
for small tasks which have small utilization value.

There are two places in the code where it makes a difference:
1. In the find_energy_efficient_cpu() where we are searching for
best_delta. We might suffer there when two PDs return the same result,
like in the example below.

Scenario:
Low utilized system e.g. ~200 sum_util for PD0 and ~220 for PD1. There
are quite a few small tasks ~10-15 util. These tasks would suffer for
the rounding error. These utilization values are typical when running games
on Android. One of our partners has reported 5..10mA less battery drain
when running with increased resolution.

Some details:
We have two PDs: PD0 (big) and PD1 (little)
Let's compare w/o patch set ('old') and w/ patch set ('new')
We are comparing energy w/ task and w/o task placed in the PDs

a) 'old' w/o patch set, PD0
task_util = 13
cost = 480
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
energy_w_task = 480 * 228 / 1024 = 106.87 => 106
energy_diff = 106 - 100 = 6
(this is equal to 'old' PD1's energy_diff in 'c)')

b) 'new' w/ patch set, PD0
task_util = 13
cost = 480 * 1000 = 480000
sum_util_w/o_task = 215
sum_util_w_task = 228
energy_w/o_task = 480000 * 215 / 1024 = 100781
energy_w_task = 480000 * 228 / 1024  = 106875
energy_diff = 106875 - 100781 = 6094
(this is not equal to 'new' PD1's energy_diff in 'd)')

c) 'old' w/o patch set, PD1
task_util = 13
cost = 160
sum_util_w/o_task = 283
sum_util_w_task = 293
scale_cpu = 355
energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
energy_w_task = 160 * 296 / 355 = 133.41 => 133
energy_diff = 133 - 127 = 6
(this is equal to 'old' PD0's energy_diff in 'a)')

d) 'new' w/ patch set, PD1
task_util = 13
cost = 160 * 1000 = 160000
sum_util_w/o_task = 283
sum_util_w_task = 293
scale_cpu = 355
energy_w/o_task = 160000 * 283 / 355 = 127549
energy_w_task = 160000 * 296 / 355 =   133408
energy_diff = 133408 - 127549 = 5859
(this is not equal to 'new' PD0's energy_diff in 'b)')

2. Difference in the 6% energy margin filter at the end of
find_energy_efficient_cpu(). With this patch the margin comparison also
has better resolution, so it's possible to have better task placement
thanks to that.

Fixes: 27871f7a8a ("PM: Introduce an Energy Model management framework")
Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-06 15:30:42 +02:00
Alexandre Belloni
4fac49fd0a PM: sleep: check RTC features instead of ops in suspend_test
Test RTC_FEATURE_ALARM instead of relying on ops->set_alarm to know whether
alarms are available.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-04 20:23:05 +02:00
Sebastian Andrzej Siewior
d2c8cce647 PM: sleep: s2idle: Replace deprecated CPU-hotplug functions
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-04 20:18:13 +02:00
Mike Rapoport
9a436f8ff6 PM: hibernate: disable when there are active secretmem users
It is unsafe to allow saving of secretmem areas to the hibernation
snapshot as they would be visible after the resume and this essentially
will defeat the purpose of secret memory mappings.

Prevent hibernation whenever there are active secret memory users.

Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Zhen Lei
480f0de68c PM: hibernate: remove leading spaces before tabs
1) Run the following command to find and remove the leading spaces
    before tabs:
    $ find kernel/power/ -type f | xargs sed -r -i 's/^[ ]+\t/\t/'
 2) Manually check and correct if necessary

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11 18:52:05 +02:00
Zhen Lei
03466883a0 PM: sleep: remove trailing spaces and tabs
Run the following command to find and remove the trailing spaces and tabs:

$ find kernel/power/ -type f | xargs sed -r -i 's/[ \t]+$//'

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11 18:49:09 +02:00
Zhen Lei
6be2408a1e PM: hibernate: fix spelling mistakes
Fix some spelling mistakes in comments:

corresonds ==> corresponds
alocated ==> allocated
unitialized ==> uninitialized
Deompression ==> Decompression

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-24 16:17:07 +02:00
Lu Jialin
e4b2897ae1 PM: sleep: fix typos in comments
Change "occured" to "occurred" in kernel/power/autosleep.c.

Change "consiting" to "consisting" in kernel/power/snapshot.c.

Change "avaiable" to "available" in kernel/power/swap.c.

No functionality changed.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-08 19:37:21 +02:00
Lukasz Luba
fb9d62b27a PM: EM: postpone creating the debugfs dir till fs_initcall
The debugfs directory '/sys/kernel/debug/energy_model' is needed before
the Energy Model registration can happen. With the recent change in
debugfs subsystem it's not allowed to create this directory at early
stage (core_initcall). Thus creating this directory would fail.

Postpone the creation of the EM debug dir to later stage: fs_initcall.

It should be safe since all clients: CPUFreq drivers, Devfreq drivers
will be initialized in later stages.

The custom debug log below prints the time of creation the EM debug dir
at fs_initcall and successful registration of EMs at later stages.

[    1.505717] energy_model: creating rootdir
[    3.698307] cpu cpu0: EM: created perf domain
[    3.709022] cpu cpu1: EM: created perf domain

Fixes: 56348560d4 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-23 19:53:48 +01:00
Rafael J. Wysocki
a9a939cb34 Merge branches 'powercap' and 'pm-misc'
* powercap:
  powercap: intel_rapl: Use topology interface in rapl_init_domains()
  powercap: intel_rapl: Use topology interface in rapl_add_package()
  powercap/intel_rapl: add support for AlderLake Mobile
  powercap/drivers/dtpm: Fix size of object being allocated
  powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
  powercap/drivers/dtpm: Fix some missing unlock bugs
  powercap/drivers/dtpm: Fix a double shift bug
  powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
  powercap/drivers/dtpm: Add CPU energy model based support
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  Documentation/powercap/dtpm: Add documentation for dtpm
  units: Add Watt units

* pm-misc:
  PM: Kconfig: remove unneeded "default n" options
  PM: EM: update Kconfig description and drop "default n" option
2021-02-15 18:50:01 +01:00
Rikard Falkeborn
1556057413 PM: sleep: Constify static struct attribute_group
The only usage of suspend_attr_group is to put its address in an
array of pointers to const attribute_group structs.

Make it const to allow the compiler to put it into read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-12 16:46:59 +01:00
Lukasz Luba
c4cc3141b6 PM: Kconfig: remove unneeded "default n" options
Remove "default n" options. If the "default" line is removed, it
defaults to 'n'.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-12 16:43:23 +01:00
Lukasz Luba
3af2f0aa2e PM: EM: update Kconfig description and drop "default n" option
Energy Model supports now other devices like GPUs, DSPs, not only CPUs.
Thus, update the description in the config option. Remove also unneeded
"default n". If the "default" line is removed, it defaults to 'n'.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-12 16:43:23 +01:00
Zqiang
ccf7ce46ab PM: sleep: No need to check PF_WQ_WORKER in thaw_kernel_threads()
Because PF_KTHREAD is set for all wq worker threads, it is
not necessary to check PF_WQ_WORKER in addition to it in
thaw_kernel_threads(), so stop doing that.

Signed-off-by: Zqiang <qiang.zhang@windriver.com>
[ rjw: Subject and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-27 19:20:17 +01:00
Laurent Badel
fef9c8d28e PM: hibernate: flush swap writer after marking
Flush the swap writer after, not before, marking the files, to ensure the
signature is properly written.

Fixes: 6f612af578 ("PM / Hibernate: Group swap ops")
Signed-off-by: Laurent Badel <laurentbadel@eaton.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-25 18:52:30 +01:00
Linus Torvalds
b4ec805464 Power management updates for 5.11-rc1
- Use local_clock() instead of jiffies in the cpufreq statistics to
    improve accuracy (Viresh Kumar).
 
  - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
    drivers (Viresh Kumar).
 
  - Clean up the cpufreq core, the intel_pstate driver and the
    schedutil cpufreq governor (Rafael Wysocki).
 
  - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
    drivers (Yangtao Li, Qinglang Miao).
 
  - Fix cpufreq_online() to return error codes instead of success (0)
    in all cases when it fails (Wang ShaoBo).
 
  - Add mt8167 support to the mediatek cpufreq driver and blacklist
    mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
 
  - Modify the tegra194 cpufreq driver to always return values from
    the frequency table as the current frequency and clean up that
    driver (Sumit Gupta, Jon Hunter).
 
  - Modify the arm_scmi cpufreq driver to allow it to discover the
    power scale present in the performance protocol and provide this
    information to the Energy Model (Lukasz Luba).
 
  - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
    Rohár).
 
  - Clean up the CPPC cpufreq driver (Ionela Voinescu).
 
  - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
    Bergmann).
 
  - Rework the poling interval selection for the polling state in
    cpuidle (Mel Gorman).
 
  - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle
    driver (Ulf Hansson).
 
  - Modify the OPP framework to support empty (node-less) OPP tables
    in DT for passing dependency information (Nicola Mazzucato).
 
  - Fix potential lockdep issue in the OPP core and clean up the OPP
    core (Viresh Kumar).
 
  - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
    update its users accordingly (Viresh Kumar).
 
  - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
 
  - Add support for governor feature flags to devfreq, make devfreq
    sysfs file permissions depend on the governor and clean up the
    devfreq core (Chanwoo Choi).
 
  - Clean up the tegra20 devfreq driver and deprecate it to allow
    another driver based on EMC_STAT to be used instead of it (Dmitry
    Osipenko).
 
  - Add interconnect support to the tegra30 devfreq driver, allow it
    to take the interconnect and OPP information from DT and clean it
    up ((Dmitry Osipenko).
 
  - Add interconnect support to the exynos-bus devfreq driver along
    with interconnect properties documentation (Sylwester Nawrocki).
 
  - Add suport for AMD Fam17h and Fam19h processors to the RAPL power
    capping driver (Victor Ding, Kim Phillips).
 
  - Fix handling of overly long constraint names in the powercap
    framework (Lukasz Luba).
 
  - Fix the wakeup configuration handling for bridges in the ACPI
    device power management core (Rafael Wysocki).
 
  - Add support for using an abstract scale for power units in the
    Energy Model (EM) and document it (Lukasz Luba).
 
  - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
    Kondeti).
 
  - Modify the generic power domains (genpd) framwework to support
    suspend-to-idle (Ulf Hansson).
 
  - Fix creation of debugfs nodes in genpd (Thierry Strudel).
 
  - Clean up genpd (Lina Iyer).
 
  - Clean up the core system-wide suspend code and make it print
    driver flags for devices with debug enabled (Alex Shi, Patrice
    Chotard, Chen Yu).
 
  - Modify the ACPI system reboot code to make it prepare for system
    power off to avoid confusing the platform firmware (Kai-Heng Feng).
 
  - Update the pm-graph (multiple changes, mostly usability-related)
    and cpupower (online and offline CPU information support) PM
    utilities (Todd Brandt, Brahadambal Srinivasan).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl/Y8mcSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxjY4QAKsNFJeEtjGCxq7MxQIML3QLAsdJM9of
 9kkY9skMEw4v1TRmyy7sW9jZW2pLSRcLJwWRKWu4143qUS3YUp2DQ0lqX4WyXoWu
 BhnkhkMUl6iCeBO8CWnt8zsTuqSa20A13sL9LyqN1+7OZKHD8StbT4hKjBncdNNN
 4aDj+1uAPyOgj2iCUZuHQ8DtpBvOLjgTh367vbhbufjeJ//8/9+R7s4Xzrj7wtmv
 JlE0LDgvge9QeGTpjhxQJzn0q2/H5fg9jbmjPXUfbHJNuyKhrqnmjGyrN5m256JI
 8DqGqQtJpmFp7Ihrur3uKTk3gWO05YwJ1FdeEooAKEjEMObm5xuYhKVRoDhmlJAu
 G6ui+OAUvNR0FffJtbzvWe/pLovLGOEOHdvTrZxUF8Abo6br3untTm8rKTi1fhaF
 wWndSMw0apGsPzCx5T+bE7AbJz2QHFpLhaVAutenuCzNI8xoMlxNKEzsaVz/+FqL
 Pq/PdFaM4vNlMbv7hkb/fujkCs/v3EcX2ihzvt7I2o8dBS0D1X8A4mnuWJmiGslw
 1ftbJ6M9XacwkPBTHPgeXxJh2C1yxxe5VQ9Z5fWWi7sPOUeJnUwxKaluv+coFndQ
 sO6JxsPQ4hQihg8yOxLEkL6Wn68sZlmp+u2Oj+TPFAsAGANIA8rJlBPo1ppJWvdQ
 j1OCIc/qzwpH
 =BVdX
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These update cpufreq (core and drivers), cpuidle (polling state
  implementation and the PSCI driver), the OPP (operating performance
  points) framework, devfreq (core and drivers), the power capping RAPL
  (Running Average Power Limit) driver, the Energy Model support, the
  generic power domains (genpd) framework, the ACPI device power
  management, the core system-wide suspend code and power management
  utilities.

  Specifics:

   - Use local_clock() instead of jiffies in the cpufreq statistics to
     improve accuracy (Viresh Kumar).

   - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
     drivers (Viresh Kumar).

   - Clean up the cpufreq core, the intel_pstate driver and the
     schedutil cpufreq governor (Rafael Wysocki).

   - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
     drivers (Yangtao Li, Qinglang Miao).

   - Fix cpufreq_online() to return error codes instead of success (0)
     in all cases when it fails (Wang ShaoBo).

   - Add mt8167 support to the mediatek cpufreq driver and blacklist
     mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).

   - Modify the tegra194 cpufreq driver to always return values from the
     frequency table as the current frequency and clean up that driver
     (Sumit Gupta, Jon Hunter).

   - Modify the arm_scmi cpufreq driver to allow it to discover the
     power scale present in the performance protocol and provide this
     information to the Energy Model (Lukasz Luba).

   - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
     Rohár).

   - Clean up the CPPC cpufreq driver (Ionela Voinescu).

   - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
     Bergmann).

   - Rework the poling interval selection for the polling state in
     cpuidle (Mel Gorman).

   - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
     (Ulf Hansson).

   - Modify the OPP framework to support empty (node-less) OPP tables in
     DT for passing dependency information (Nicola Mazzucato).

   - Fix potential lockdep issue in the OPP core and clean up the OPP
     core (Viresh Kumar).

   - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
     update its users accordingly (Viresh Kumar).

   - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).

   - Add support for governor feature flags to devfreq, make devfreq
     sysfs file permissions depend on the governor and clean up the
     devfreq core (Chanwoo Choi).

   - Clean up the tegra20 devfreq driver and deprecate it to allow
     another driver based on EMC_STAT to be used instead of it (Dmitry
     Osipenko).

   - Add interconnect support to the tegra30 devfreq driver, allow it to
     take the interconnect and OPP information from DT and clean it up
     (Dmitry Osipenko).

   - Add interconnect support to the exynos-bus devfreq driver along
     with interconnect properties documentation (Sylwester Nawrocki).

   - Add suport for AMD Fam17h and Fam19h processors to the RAPL power
     capping driver (Victor Ding, Kim Phillips).

   - Fix handling of overly long constraint names in the powercap
     framework (Lukasz Luba).

   - Fix the wakeup configuration handling for bridges in the ACPI
     device power management core (Rafael Wysocki).

   - Add support for using an abstract scale for power units in the
     Energy Model (EM) and document it (Lukasz Luba).

   - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
     Kondeti).

   - Modify the generic power domains (genpd) framwework to support
     suspend-to-idle (Ulf Hansson).

   - Fix creation of debugfs nodes in genpd (Thierry Strudel).

   - Clean up genpd (Lina Iyer).

   - Clean up the core system-wide suspend code and make it print driver
     flags for devices with debug enabled (Alex Shi, Patrice Chotard,
     Chen Yu).

   - Modify the ACPI system reboot code to make it prepare for system
     power off to avoid confusing the platform firmware (Kai-Heng Feng).

   - Update the pm-graph (multiple changes, mostly usability-related)
     and cpupower (online and offline CPU information support) PM
     utilities (Todd Brandt, Brahadambal Srinivasan)"

* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
  cpufreq: Fix cpufreq_online() return value on errors
  cpufreq: Fix up several kerneldoc comments
  cpufreq: stats: Use local_clock() instead of jiffies
  cpufreq: schedutil: Simplify sugov_update_next_freq()
  cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
  PM: domains: create debugfs nodes when adding power domains
  opp: of: Allow empty opp-table with opp-shared
  dt-bindings: opp: Allow empty OPP tables
  media: venus: dev_pm_opp_put_*() accepts NULL argument
  drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
  drm/lima: dev_pm_opp_put_*() accepts NULL argument
  PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
  cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
  opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
  opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
  cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
  opp: Reduce the size of critical section in _opp_kref_release()
  PM / EM: Micro optimization in em_cpu_energy
  cpufreq: arm_scmi: Discover the power scale in performance protocol
  ...
2020-12-15 16:30:31 -08:00
Vlastimil Babka
03b6c9a3e8 kernel/power: allow hibernation with page_poison sanity checking
Page poisoning used to be incompatible with hibernation, as the state of
poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
forces CONFIG_PAGE_POISONING_NO_SANITY.  For the same reason, the
poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
hibernation.  The latter restriction was removed by commit 1ad1410f63
("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
similarly for init_on_free by commit 18451f9f9e ("PM: hibernate: fix
crashes with init_on_free=1") by making sure free pages are cleared after
resume.

We can use the same mechanism to instead poison free pages with
PAGE_POISON after resume.  This covers both zero and 0xAA patterns.  Thus
we can remove the Kconfig restriction that disables page poison sanity
checking when hibernation is enabled.

Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[hibernation]
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Mike Rapoport
2abf962a8d PM: hibernate: make direct map manipulations more explicit
When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a page may be
not present in the direct map and has to be explicitly mapped before it
could be copied.

Introduce hibernate_map_page() and hibernation_unmap_page() that will
explicitly use set_direct_map_{default,invalid}_noflush() for
ARCH_HAS_SET_DIRECT_MAP case and debug_pagealloc_{map,unmap}_pages() for
DEBUG_PAGEALLOC case.

The remapping of the pages in safe_copy_page() presumes that it only
changes protection bits in an existing PTE and so it is safe to ignore
return value of set_direct_map_{default,invalid}_noflush().

Still, add a pr_warn() so that future changes in set_memory APIs will not
silently break hibernation.

Link: https://lkml.kernel.org/r/20201109192128.960-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:43 -08:00
Rafael J. Wysocki
42b4ca04cb Merge branches 'pm-sleep', 'pm-acpi', 'pm-domains' and 'powercap'
* pm-sleep:
  PM: sleep: Add dev_wakeup_path() helper
  PM / suspend: fix kernel-doc markup
  PM: sleep: Print driver flags for all devices during suspend/resume

* pm-acpi:
  PM: ACPI: Refresh wakeup device power configuration every time
  PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
  PM: ACPI: reboot: Use S5 for reboot

* pm-domains:
  PM: domains: create debugfs nodes when adding power domains
  PM: domains: replace -ENOTSUPP with -EOPNOTSUPP

* powercap:
  powercap: Adjust printing the constraint name with new line
  powercap: RAPL: Add AMD Fam19h RAPL support
  powercap: Add AMD Fam17h RAPL support
  powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer
  x86/msr-index: sort AMD RAPL MSRs by address
2020-12-15 15:26:14 +01:00
Alex Shi
ab150c3f80 PM / suspend: fix kernel-doc markup
Add parameter explanation to fix kernel-doc marks:

kernel/power/suspend.c:233: warning: Function parameter or member
'state' not described in 'suspend_valid_only_mem'
kernel/power/suspend.c:344: warning: Function parameter or member
'state' not described in 'suspend_prepare'

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
[ rjw: Change the proposed parameter descriptions. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-23 18:24:53 +01:00
Lukasz Luba
f2c90b12e7 PM: EM: update the comments related to power scale
The Energy Model supports power values expressed in milli-Watts or in an
'abstract scale'. Update the related comments is the code to reflect that
state.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 20:29:28 +01:00
Lukasz Luba
c250d50fe2 PM: EM: Add a flag indicating units of power values in Energy Model
There are different platforms and devices which might use different scale
for the power values. Kernel sub-systems might need to check if all
Energy Model (EM) devices are using the same scale. Address that issue and
store the information inside EM for each device. Thanks to that they can
be easily compared and proper action triggered.

Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 20:22:00 +01:00
Jackie Zamow
4d4ce8053b PM: sleep: fix typo in kernel/power/process.c
Fix a typo in a comment in freeze_processes().

Signed-off-by: Jackie Zamow <jackie.zamow@gmail.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-27 19:11:44 +01:00
Randy Dunlap
7b7b8a2c95 kernel/: fix repeated words in comments
Fix multiple occurrences of duplicated words in kernel/.

Fix one typo/spello on the same line as a duplicate word.  Change one
instance of "the the" to "that the".  Otherwise just drop one of the
repeated words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Linus Torvalds
0b8417c141 Power management updates for 5.10-rc1
- Rework cpufreq statistics collection to allow it to take place
    when fast frequency switching is enabled in the governor (Viresh
    Kumar).
 
  - Make the cpufreq core set the frequency scale on behalf of the
    driver and update several cpufreq drivers accordingly (Ionela
    Voinescu, Valentin Schneider).
 
  - Add new hardware support to the STI and qcom cpufreq drivers and
    improve them (Alain Volmat, Manivannan Sadhasivam).
 
  - Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
    Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
    Gerhold, Viresh Kumar).
 
  - Fix several assorted issues in the operating performance points
    (OPP) framework (Stephan Gerhold, Viresh Kumar).
 
  - Allow devfreq drivers to fetch devfreq instances by DT enumeration
    instead of using explicit phandles and modify the devfreq core
    code to support driver-specific devfreq DT bindings (Leonard
    Crestez, Chanwoo Choi).
 
  - Improve initial hardware resetting in the tegra30 devfreq driver
    and clean up the tegra cpuidle driver (Dmitry Osipenko).
 
  - Update the cpuidle core to collect state entry rejection
    statistics and expose them via sysfs (Lina Iyer).
 
  - Improve the ACPI _CST code handling diagnostics (Chen Yu).
 
  - Update the PSCI cpuidle driver to allow the PM domain
    initialization to occur in the OSI mode as well as in the PC
    mode (Ulf Hansson).
 
  - Rework the generic power domains (genpd) core code to allow
    domain power off transition to be aborted in the absence of the
    "power off" domain callback (Ulf Hansson).
 
  - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
    Wysocki).
 
  - Fix the handling of timer_expires in the PM-runtime framework on
    32-bit systems and the handling of device links in it (Grygorii
    Strashko, Xiang Chen).
 
  - Add IO requests batching support to the hibernate image saving and
    reading code and drop a bogus get_gendisk() from there (Xiaoyi
    Chen, Christoph Hellwig).
 
  - Allow PCIe ports to be put into the D3cold power state if they
    are power-manageable via ACPI (Lukas Wunner).
 
  - Add missing header file include to a power capping driver (Pujin
    Shi).
 
  - Clean up the qcom-cpr AVS driver a bit (Liu Shixin).
 
  - Kevin Hilman steps down as designated reviwer of adaptive voltage
    scaling (AVS) driverrs (Kevin Hilman).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+F4A4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxX6QP/iELq9/OsH0aJdDQlY9tnh2Oa13+HB/Y
 w1e6W+ZR/YjPgUpMVARwRLKf/gn7dUEwRDHVpGvDOyun+HACCPHB2hg8iktbxdVl
 NFAVGZCCRezXqz3opL1hl8C3Dh0CqUPUjWXGMr+Lw2TZQKT+hx9K1dm9Epe3ivyT
 RlVH/wifei80cFRcUUj7DI5KLCAyk+uKkZIFnZHAGKK6qOHMqRL5sDZsMUwWpd2i
 AdghABjePbaiLTAoZuUsJINAGY4DnIt6ASRdMJ4iksiD6pFITwFs0HSOPe7hZLlv
 zbwDPI5+TIkrOy9/aWoMaEIH1OQiFN/O++Slvdjn7gMsRgoW4d300ru4Jo1pOHxb
 5twxagCCqlOf4YAaSrMCH4HT+c6fOWoGj2AKzX3DMJyO3/WN+8XNvUxKtC5Px1u+
 pWRASjfQMO2j6nNjTCTwDJdYzggiKa54rYH2k7svX7XnTIAf+2E1gv8b4rMTgQrZ
 0rq9kULYlhgk3EYjd/DndkvxunRlmiqhzrYB4jc9eDSPNzB8FZEbw1ZMRQTFfjK0
 kp0vaEpTJ7JfKSCfluB4UmTuQoGogLl0xbzc+2NNIpwdNmrH2Srvq6wbj35jEDTU
 tqsTsBP+XZFOWyFOw/L2J47LTOp0TJnz8z4aycLfrmdNUVnXJoU1sXgFlDzETMgT
 0E6cTVwLF7Zi
 =rGhy
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These rework the collection of cpufreq statistics to allow it to take
  place if fast frequency switching is enabled in the governor, rework
  the frequency invariance handling in the cpufreq core and drivers, add
  new hardware support to a couple of cpufreq drivers, fix a number of
  assorted issues and clean up the code all over.

  Specifics:

   - Rework cpufreq statistics collection to allow it to take place when
     fast frequency switching is enabled in the governor (Viresh Kumar).

   - Make the cpufreq core set the frequency scale on behalf of the
     driver and update several cpufreq drivers accordingly (Ionela
     Voinescu, Valentin Schneider).

   - Add new hardware support to the STI and qcom cpufreq drivers and
     improve them (Alain Volmat, Manivannan Sadhasivam).

   - Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
     Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
     Gerhold, Viresh Kumar).

   - Fix several assorted issues in the operating performance points
     (OPP) framework (Stephan Gerhold, Viresh Kumar).

   - Allow devfreq drivers to fetch devfreq instances by DT enumeration
     instead of using explicit phandles and modify the devfreq core code
     to support driver-specific devfreq DT bindings (Leonard Crestez,
     Chanwoo Choi).

   - Improve initial hardware resetting in the tegra30 devfreq driver
     and clean up the tegra cpuidle driver (Dmitry Osipenko).

   - Update the cpuidle core to collect state entry rejection statistics
     and expose them via sysfs (Lina Iyer).

   - Improve the ACPI _CST code handling diagnostics (Chen Yu).

   - Update the PSCI cpuidle driver to allow the PM domain
     initialization to occur in the OSI mode as well as in the PC mode
     (Ulf Hansson).

   - Rework the generic power domains (genpd) core code to allow domain
     power off transition to be aborted in the absence of the "power
     off" domain callback (Ulf Hansson).

   - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
     Wysocki).

   - Fix the handling of timer_expires in the PM-runtime framework on
     32-bit systems and the handling of device links in it (Grygorii
     Strashko, Xiang Chen).

   - Add IO requests batching support to the hibernate image saving and
     reading code and drop a bogus get_gendisk() from there (Xiaoyi
     Chen, Christoph Hellwig).

   - Allow PCIe ports to be put into the D3cold power state if they are
     power-manageable via ACPI (Lukas Wunner).

   - Add missing header file include to a power capping driver (Pujin
     Shi).

   - Clean up the qcom-cpr AVS driver a bit (Liu Shixin).

   - Kevin Hilman steps down as designated reviwer of adaptive voltage
     scaling (AVS) drivers (Kevin Hilman)"

* tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
  cpufreq: stats: Fix string format specifier mismatch
  arm: disable frequency invariance for CONFIG_BL_SWITCHER
  cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()
  cpufreq: stats: Add memory barrier to store_reset()
  cpufreq: schedutil: Simplify sugov_fast_switch()
  ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
  ACPI: EC: PM: Flush EC work unconditionally after wakeup
  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
  PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
  cpufreq: Move traces and update to policy->cur to cpufreq core
  cpufreq: stats: Enable stats for fast-switch as well
  cpufreq: stats: Mark few conditionals with unlikely()
  cpufreq: stats: Remove locking
  cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
  PM: domains: Allow to abort power off when no ->power_off() callback
  PM: domains: Rename power state enums for genpd
  PM / devfreq: tegra30: Improve initial hardware resetting
  PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function
  PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function
  PM / devfreq: Add devfreq_get_devfreq_by_node function
  ...
2020-10-14 10:45:41 -07:00
Linus Torvalds
3ad11d7ac8 block-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
 u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
 fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
 e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
 WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
 c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
 FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
 YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
 ZwfpDh2+Tg==
 =LzyE
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Christoph Hellwig
428805c0c5 PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
get_gendisk grabs a reference on the disk and file operation, so this
code will leak both of them while having absolutely no use for the
gendisk itself.

This effectively reverts commit 2df83fa4bc ("PM / Hibernate: Use
get_gendisk to verify partition if resume_file is integer format")

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-05 18:42:58 +02:00
Xiaoyi Chen
55c4478a8f PM: hibernate: Batch hibernate and resume IO requests
Hibernate and resume process submits individual IO requests for each page
of the data, so use blk_plug to improve the batching of these requests.

Testing this change with hibernate and resumes consistently shows merging
of the IO requests and more than an order of magnitude improvement in
hibernate and resume speed is observed.

One hibernate and resume cycle for 16GB RAM out of 32GB in use takes
around 21 minutes before the change, and 1 minutes after the change on
a system with limited storage IOPS.

Signed-off-by: Xiaoyi Chen <cxiaoyi@amazon.com>
Co-Developed-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
[ rjw: Subject and changelog edits, white space damage fixes ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-28 15:58:18 +02:00
Christoph Hellwig
36daaa98f7 PM: mm: cleanup swsusp_swap_check
Use blkdev_get_by_dev instead of bdget + blkdev_get.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Christoph Hellwig
21bd900572 mm: split swap_type_of
swap_type_of is used for two entirely different purposes:

 (1) check what swap type a given device/offset corresponds to
 (2) find the first available swap device that can be written to

Mixing both in a single function creates an unreadable mess.  Create two
separate functions instead, and switch both to pass a dev_t instead of
a struct block_device to further simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Christoph Hellwig
bb3247a399 PM: rewrite is_hibernate_resume_dev to not require an inode
Just check the dev_t to help simplifying the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Peter Zijlstra
70d9329857 notifier: Fix broken error handling pattern
The current notifiers have the following error handling pattern all
over the place:

	int err, nr;

	err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr);
	if (err & NOTIFIER_STOP_MASK)
		__foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL)

And aside from the endless repetition thereof, it is broken. Consider
blocking notifiers; both calls take and drop the rwsem, this means
that the notifier list can change in between the two calls, making @nr
meaningless.

Fix this by replacing all the __foo_notifier_call_chain() functions
with foo_notifier_call_chain_robust() that embeds the above pattern,
but ensures it is inside a single lock region.

Note: I switched atomic_notifier_call_chain_robust() to use
      the spinlock, since RCU cannot provide the guarantee
      required for the recovery.

Note: software_resume() error handling was broken afaict.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org
2020-09-01 09:58:03 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
4bf5e36118 libnvdimm for 5.9
- Add 'Runtime Firmware Activation' support for NVDIMMs that advertise
   the relevant capability
 - Misc libnvdimm and DAX cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT9vPEBxh63bwxRYEEPzq5USduLdgUCXzHodgAKCRAPzq5USduL
 djTjAQD1THDmizHn16zd94ueygh/BXfN0zyeVvQH352ol7kdfQEAj2A7YJ9XBbBY
 JC6/CNd+OiB9W88lLOUf3Waj1a7cUQ8=
 =Q6qn
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updayes from Vishal Verma:
 "You'd normally receive this pull request from Dan Williams, but he's
  busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
  this cycle.

  This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
  and a few small cleanups and fixes in libnvdimm and DAX. I'd
  originally intended to make separate topic-based pull requests - one
  for libnvdimm, and one for DAX, but some of the DAX material fell out
  since it wasn't quite ready.

  Summary:

   - add 'Runtime Firmware Activation' support for NVDIMMs that
     advertise the relevant capability

   - misc libnvdimm and DAX cleanups"

* tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
  libnvdimm/security: the 'security' attr never show 'overwrite' state
  libnvdimm/security: fix a typo
  ACPI: NFIT: Fix ARS zero-sized allocation
  dax: Fix incorrect argument passed to xas_set_err()
  ACPI: NFIT: Add runtime firmware activate support
  PM, libnvdimm: Add runtime firmware activation support
  libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
  drivers/dax: Expand lock scope to cover the use of addresses
  fs/dax: Remove unused size parameter
  dax: print error message by pr_info() in __generic_fsdax_supported()
  driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
  tools/testing/nvdimm: Emulate firmware activation commands
  tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
  tools/testing/nvdimm: Add command debug messages
  tools/testing/nvdimm: Cleanup dimm index passing
  ACPI: NFIT: Define runtime firmware activation commands
  ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
  libnvdimm: Validate command family indices
2020-08-11 10:59:19 -07:00
Roman Gushchin
d42f3245c7 mm: memcg: convert vmstat slab counters to bytes
In order to prepare for per-object slab memory accounting, convert
NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes.

To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and
NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB).

Internally global and per-node counters are stored in pages, however memcg
and lruvec counters are stored in bytes.  This scheme may look weird, but
only for now.  As soon as slab pages will be shared between multiple
cgroups, global and node counters will reflect the total number of slab
pages.  However memcg and lruvec counters will be used for per-memcg slab
memory tracking, which will take separate kernel objects in the account.
Keeping global and node counters in pages helps to avoid additional
overhead.

The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it
will fit into atomic_long_t we use for vmstats.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Rafael J. Wysocki
86ba54fb08 Merge branches 'pm-sleep', 'pm-domains', 'powercap' and 'pm-tools'
* pm-sleep:
  PM: sleep: spread "const char *" correctness
  PM: hibernate: fix white space in a few places
  freezer: Add unsafe version of freezable_schedule_timeout_interruptible() for NFS
  PM: sleep: core: Emit changed uevent on wakeup_sysfs_add/remove

* pm-domains:
  PM: domains: Restore comment indentation for generic_pm_domain.child_links
  PM: domains: Fix up terminology with parent/child

* powercap:
  powercap: Add Power Limit4 support
  powercap: idle_inject: Replace play_idle() with play_idle_precise() in comments
  powercap: intel_rapl: add support for Sapphire Rapids

* pm-tools:
  pm-graph v5.7 - important s2idle fixes
  cpupower: Replace HTTP links with HTTPS ones
  cpupower: Fix NULL but dereferenced coccicheck errors
  cpupower: Fix comparing pointer to 0 coccicheck warns
2020-08-03 13:12:44 +02:00
Dan Williams
48001ea50d PM, libnvdimm: Add runtime firmware activation support
Abstract platform specific mechanics for nvdimm firmware activation
behind a handful of generic ops. At the bus level ->activate_state()
indicates the unified state (idle, busy, armed) of all DIMMs on the bus,
and ->capability() indicates the system state expectations for activate.
At the DIMM level ->activate_state() indicates the per-DIMM state,
->activate_result() indicates the outcome of the last activation
attempt, and ->arm() attempts to transition the DIMM from 'idle' to
'armed'.

A new hibernate_quiet_exec() facility is added to support firmware
activation in an OS defined system quiesce state. It leverages the fact
that the hibernate-freeze state wants to assert that a memory
hibernation snapshot can be taken. This is in contrast to a platform
firmware defined quiesce state that may forcefully quiet the memory
controller independent of whether an individual device-driver properly
supports hibernate-freeze.

The libnvdimm sysfs interface is extended to support detection of a
firmware activate capability. The mechanism supports enumeration and
triggering of firmware activate, optionally in the
hibernate_quiet_exec() context.

[rafael: hibernate_quiet_exec() proposal]
[vishal: fix up sparse warning, grammar in Documentation/]

Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-07-28 19:28:32 -06:00
Alexey Dobriyan
02d7f4005e PM: sleep: spread "const char *" correctness
Fixed string literals can be referred to as "const char *".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
[ rjw: Minor subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-14 19:25:41 +02:00
Xiang Chen
6ada7ba2fa PM: hibernate: fix white space in a few places
In hibernate.c, some places lack of spaces while some places have
redundant spaces. So fix them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-14 19:25:41 +02:00
Lukasz Luba
07891f15d9 PM / EM: remove em_register_perf_domain
Remove old function em_register_perf_domain which is no longer needed.
There is em_dev_register_perf_domain that covers old use cases and new as
well.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 17:16:42 +02:00
Lukasz Luba
1bc138c622 PM / EM: add support for other devices than CPUs in Energy Model
Add support for other devices than CPUs. The registration function
does not require a valid cpumask pointer and is ready to handle new
devices. Some of the internal structures has been reorganized in order to
keep consistent view (like removing per_cpu pd pointers).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 17:16:27 +02:00
Lukasz Luba
d0351cc3b0 PM / EM: update callback structure and add device pointer
The Energy Model framework is going to support devices other that CPUs. In
order to make this happen change the callback function and add pointer to
a device as an argument.

Update the related users to use new function and new callback from the
Energy Model.

Acked-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 17:14:07 +02:00
Lukasz Luba
7d9895c7fb PM / EM: introduce em_dev_register_perf_domain function
Add now function in the Energy Model framework which is going to support
new devices. This function will help in transition and make it smoother.
For now it still checks if the cpumask is a valid pointer, which will be
removed later when the new structures and infrastructure will be ready.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 17:14:07 +02:00
Lukasz Luba
521b512b15 PM / EM: change naming convention from 'capacity' to 'performance'
The Energy Model uses concept of performance domain and capacity states in
order to calculate power used by CPUs. Change naming convention from
capacity to performance state would enable wider usage in future, e.g.
upcoming support for other devices other than CPUs.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24 17:14:07 +02:00
Masahiro Yamada
a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Linus Torvalds
0c67f6b297 More power management updates for 5.8-rc1
- Add support for interconnect bandwidth to the OPP core (Georgi
    Djakov, Saravana Kannan, Sibi Sankar, Viresh Kumar).
 
  - Add support for regulator enable/disable to the OPP core (Kamil
    Konieczny).
 
  - Add boost support to the CPPC cpufreq driver (Xiongfeng Wang).
 
  - Make the tegra186 cpufreq driver set the
    CPUFREQ_NEED_INITIAL_FREQ_CHECK flag (Mian Yousaf Kaukab).
 
  - Prevent the ACPI power management from using power resources
    with devices where the list of power resources for power state
    D0 (full power) is missing (Rafael Wysocki).
 
  - Annotate a hibernation-related function with __init (Christophe
    JAILLET).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl7g/zESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxZzgP+wTBW/WVLVkrlk2tQhcbbj+y9TNNOfU1
 FZ9C56bR+5VJhrrxTxVHeQP7PDCNxqVM57M8Bcnl3I0LFi+OHNAqkN/xW323N7ZA
 8OGkFgeqSxgG21691rTEwVnwwhdvQsNw47Fqjbu10PiNFYm6W8YNI5JMQRxfTVHb
 H8Nt7xcJ5i7wnMRnAyrotnTUYmS3nZ7IwpHFEoM2SwWCqlYr0h9rDqKz3MvbiE59
 m0G+4tFUv8egyshzwMD78PeFG+7iZP9s4uovsKujj4UGskmAVn9BGP0vI5AJiS4b
 9KdrDNdX5NAEBFn5eDVzZSMzYhRI3pebd306oWUS+A4/rDA+BtC4ECkaWKE6IlX6
 pmJuY5w8mZU0geH+W2xJQp6t/f60XJymEYKu88opm69ujgJL8X2PglWNq8tal6iX
 BfaPNMla+0mNt9L+GzIb6v5f/nbNBQ8qe6vXlQndqcxxerIBPktLTP+j18FzN9N4
 4XOyFJYoauvfMidK8+fjGlSGi54GnlseopNcVTD6IRjRkXhYYJE62oTfgoFe1DIs
 7pFyEtnTgA0Ma9h1CMs2/UwaL1Xule4HMZzdeAnkelAAivdJI181XU0k3Y6vuNwA
 4IXkpMph8utvWX/Dp+OH0a5YGvpEAuihwe8a2Ld/VtOGVINh4hVQucU/yrGN6EIJ
 Od7S+Zua3bR6
 =p5eu
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are operating performance points (OPP) framework updates mostly,
  including support for interconnect bandwidth in the OPP core, plus a
  few cpufreq changes, including boost support in the CPPC cpufreq
  driver, an ACPI device power management fix and a hibernation code
  cleanup.

  Specifics:

   - Add support for interconnect bandwidth to the OPP core (Georgi
     Djakov, Saravana Kannan, Sibi Sankar, Viresh Kumar).

   - Add support for regulator enable/disable to the OPP core (Kamil
     Konieczny).

   - Add boost support to the CPPC cpufreq driver (Xiongfeng Wang).

   - Make the tegra186 cpufreq driver set the
     CPUFREQ_NEED_INITIAL_FREQ_CHECK flag (Mian Yousaf Kaukab).

   - Prevent the ACPI power management from using power resources with
     devices where the list of power resources for power state D0 (full
     power) is missing (Rafael Wysocki).

   - Annotate a hibernation-related function with __init (Christophe
     JAILLET)"

* tag 'pm-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: Avoid using power resources if there are none for D0
  cpufreq: CPPC: add SW BOOST support
  cpufreq: change '.set_boost' to act on one policy
  PM: hibernate: Add __init annotation to swsusp_header_init()
  opp: Don't parse icc paths unnecessarily
  opp: Remove bandwidth votes when target_freq is zero
  opp: core: add regulators enable and disable
  opp: Reorder the code for !target_freq case
  opp: Expose bandwidth information via debugfs
  cpufreq: dt: Add support for interconnect bandwidth scaling
  opp: Update the bandwidth on OPP frequency changes
  opp: Add sanity checks in _read_opp_key()
  opp: Add support for parsing interconnect bandwidth
  cpufreq: tegra186: add CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
  OPP: Add helpers for reading the binding properties
  dt-bindings: opp: Introduce opp-peak-kBps and opp-avg-kBps bindings
2020-06-10 14:04:39 -07:00
Mike Rapoport
e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Linus Torvalds
081096d98b TTY/Serial driver updates for 5.8-rc1
Here is the tty and serial driver updates for 5.8-rc1
 
 Nothing huge at all, just a lot of little serial driver fixes, updates
 for new devices and features, and other small things.  Full details are
 in the shortlog.
 
 Note, you will get a conflict merging with your tree in the
 Documentation/devicetree/bindings/serial/rs485.yaml file, but it should
 be pretty obvious what to do.  If not, I'm sure Rob will clean it all up
 afterwards :)
 
 All of these have been in linux-next with no issues for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXtzpCg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylRxACgjGtOKPjahONL4lWd0F8ZYEcyw7sAn34woBCO
 BDUV3kolrRQ4OYNJWsHP
 =TvqG
 -----END PGP SIGNATURE-----

Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the tty and serial driver updates for 5.8-rc1

  Nothing huge at all, just a lot of little serial driver fixes, updates
  for new devices and features, and other small things. Full details are
  in the shortlog.

  All of these have been in linux-next with no issues for a while"

* tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits)
  tty: serial: qcom_geni_serial: Add 51.2MHz frequency support
  tty: serial: imx: clear Ageing Timer Interrupt in handler
  serial: 8250_fintek: Add F81966 Support
  sc16is7xx: Add flag to activate IrDA mode
  dt-bindings: sc16is7xx: Add flag to activate IrDA mode
  serial: 8250: Support rs485 bus termination GPIO
  serial: 8520_port: Fix function param documentation
  dt-bindings: serial: Add binding for rs485 bus termination GPIO
  vt: keyboard: avoid signed integer overflow in k_ascii
  serial: 8250: Enable 16550A variants by default on non-x86
  tty: hvc_console, fix crashes on parallel open/close
  serial: imx: Initialize lock for non-registered console
  sc16is7xx: Read the LSR register for basic device presence check
  sc16is7xx: Allow sharing the IRQ line
  sc16is7xx: Use threaded IRQ
  sc16is7xx: Always use falling edge IRQ
  tty: n_gsm: Fix bogus i++ in gsm_data_kick
  tty: n_gsm: Remove unnecessary test in gsm_print_packet()
  serial: stm32: add no_console_suspend support
  tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP
  ...
2020-06-07 09:52:36 -07:00
Christophe JAILLET
afd8d7c7f9 PM: hibernate: Add __init annotation to swsusp_header_init()
'swsusp_header_init()' is only called via 'core_initcall'.
It can be marked as __init to save a few bytes of memory.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-05 13:52:38 +02:00
Domenico Andreoli
ad1e4f74c0 PM: hibernate: Restrict writes to the resume device
Hibernation via snapshot device requires write permission to the swap
block device, the one that more often (but not necessarily) is used to
store the hibernation image.

With this patch, such permissions are granted iff:

 1) snapshot device config option is enabled
 2) swap partition is used as resume device

In other circumstances the swap device is not writable from userspace.

In order to achieve this, every write attempt to a swap device is
checked against the device configured as part of the uswsusp API [0]
using a pointer to the inode struct in memory. If the swap device being
written was not configured for resuming, the write request is denied.

NOTE: this implementation works only for swap block devices, where the
inode configured by swapon (which sets S_SWAPFILE) is the same used
by SNAPSHOT_SET_SWAP_AREA.

In case of swap file, SNAPSHOT_SET_SWAP_AREA indeed receives the inode
of the block device containing the filesystem where the swap file is
located (+ offset in it) which is never passed to swapon and then has
not set S_SWAPFILE.

As result, the swap file itself (as a file) has never an option to be
written from userspace. Instead it remains writable if accessed directly
from the containing block device, which is always writeable from root.

[0] Documentation/power/userland-swsusp.rst

v2:
 - rename is_hibernate_snapshot_dev() to is_hibernate_resume_dev()
 - fix description so to correctly refer to the resume device

Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-27 17:55:59 +02:00
Domenico Andreoli
c4f39a6c74 PM: hibernate: Split off snapshot dev option
Make it possible to reduce the attack surface in case the snapshot
device is not to be used from userspace.

Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-19 17:48:08 +02:00
Domenico Andreoli
ab7e9b067f PM: hibernate: Incorporate concurrency handling
Hibernation concurrency handling is currently delegated to user.c,
where it's also used for regulating the access to the snapshot device.

In the prospective of making user.c a separate configuration option,
such mutual exclusion is brought into hibernate.c and made available
through accessor helpers hereby introduced.

Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-19 17:48:08 +02:00
Emil Velikov
6400b5a0f6 kernel/power: constify sysrq_key_op
With earlier commits, the API no longer discards the const-ness of the
sysrq_key_op. As such we can add the notation.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-pm@vger.kernel.org
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-10-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-15 14:53:20 +02:00
Dexuan Cui
2351f8d295 PM: hibernate: Freeze kernel threads in software_resume()
Currently the kernel threads are not frozen in software_resume(), so
between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
system_freezable_power_efficient_wq can still try to submit SCSI
commands and this can cause a panic since the low level SCSI driver
(e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
any SCSI commands: https://lkml.org/lkml/2020/4/10/47

At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
to resolve the issue from hv_storvsc, but with the help of
Bart Van Assche, I realized it's better to fix software_resume(),
since this looks like a generic issue, not only pertaining to SCSI.

Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-27 10:30:30 +02:00
Christoph Hellwig
0f5c4c6e0e PM / sleep: handle the compat case in snapshot_set_swap_area()
Use in_compat_syscall to copy directly from the 32-bit ABI structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-06 21:42:36 +02:00
Christoph Hellwig
88a77559cc PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
Move the handling of the SNAPSHOT_SET_SWAP_AREA ioctl from the main
ioctl helper into a helper function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-06 21:42:36 +02:00
Linus Torvalds
ef05db16bb Additional power management updates for 5.7-rc1
- Fix corner-case suspend-to-idle wakeup issue on systems where
    the ACPI SCI is shared with another wakeup source (Hans de Goede).
 
  - Add document describing system-wide suspend and resume code flows
    to the admin guide (Rafael Wysocki).
 
  - Add kernel command line option to set pm_debug_messages (Chen Yu).
 
  - Choose schedutil as the preferred scaling governor by default on
    ARM big.LITTLE systems and on x86 systems using the intel_pstate
    driver in the passive mode (Linus Walleij, Rafael Wysocki).
 
  - Drop racy and redundant checks from the PM core's device_prepare()
    routine (Rafael Wysocki).
 
  - Make resume from hibernation take the hibernation_restore() return
    value into account (Dexuan Cui).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6LN9kSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxqj0P/3Uh16YIKOUeXosqtptJJiWgZoWu82aR
 H2uzc9hCV6k/tnSG/Y3ZGgI4dUZGcJ94By80XoWceZ97CC1YuBHpF/dHSupTCcvt
 tBEP9H2qbSLi9d2PhlEUJ0B6PBpWEjCzOb519Tzj4sd67FGhsKeihXl+WXdgaaKZ
 hpdj+tBkQ6Dxb7CXVw/mUpWlg/nmF+yRqpK5By+V0WMOlQtg4Ecb3lMqaJPr7b4F
 W/I5m4PE9QG+VetSSXoe7CKfLo2fsDtyHH6ioTJ9xklcOflIblZ3MYf3sKgOuuny
 m/6Zm71HmQwWRkjA/V+C/eTr0VX2TuMUALfA8i+uZBo//I58TTJq9oqosu8eao2P
 5MR79Vwa+eqtYRgpHCTM4JO1iyKiE+FTfVAPOS4hHOKjqY9BUj0YtWHxk8abdzQd
 owv9DjEtGKcipAnCKtWDIS9ikZ5TOsYiGgWyJQnoonpFY+0s3DwtRdi+gBLJliSj
 5j7sRv9wJrJRFZ7tRMHHAqoQkSYqf4YdBuRBG7F/M44anveORBJNaTbPgm30qbUP
 FXD9hTJn/4Q2nmBSzqvEv0+TJrWoCyKduzr5uNsR0eyg48w9mrlUDDkW7dOMb3DU
 WV6QGNomMuKIY4DHKCo3/k21NmzW2JN7zhUrErCnl0bXjV2AM71Cb2xjUD1eNOqV
 9LxPDvAmZoGt
 =QFDr
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "Additional power management updates.

  These fix a corner-case suspend-to-idle wakeup issue on systems where
  the ACPI SCI is shared with another wakeup source, add a kernel
  command line option to set pm_debug_messages via the kernel command
  line, add a document desctibing system-wide suspend and resume code
  flows, modify cpufreq Kconfig to choose schedutil as the preferred
  governor by default in a couple of cases and do some assorted
  cleanups.

  Specifics:

   - Fix corner-case suspend-to-idle wakeup issue on systems where the
     ACPI SCI is shared with another wakeup source (Hans de Goede).

   - Add document describing system-wide suspend and resume code flows
     to the admin guide (Rafael Wysocki).

   - Add kernel command line option to set pm_debug_messages (Chen Yu).

   - Choose schedutil as the preferred scaling governor by default on
     ARM big.LITTLE systems and on x86 systems using the intel_pstate
     driver in the passive mode (Linus Walleij, Rafael Wysocki).

   - Drop racy and redundant checks from the PM core's device_prepare()
     routine (Rafael Wysocki).

   - Make resume from hibernation take the hibernation_restore() return
     value into account (Dexuan Cui)"

* tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
  ACPI: PM: Add acpi_[un]register_wakeup_handler()
  Documentation: PM: sleep: Document system-wide suspend code flows
  cpufreq: Select schedutil when using big.LITTLE
  PM: sleep: Add pm_debug_messages kernel command line option
  PM: sleep: core: Drop racy and redundant checks from device_prepare()
  PM: hibernate: Propagate the return value of hibernation_restore()
  cpufreq: intel_pstate: Select schedutil as the default governor
2020-04-06 10:14:39 -07:00
Linus Torvalds
ad0bf4eb91 s390 updates for the 5.7 merge window
- Update maintainers. Niklas Schnelle takes over zpci and Vineeth Vijayan
   common io code.
 
 - Extend cpuinfo to include topology information.
 
 - Add new extended counters for IBM z15 and sampling buffer allocation
   rework in perf code.
 
 - Add control over zeroing out memory during system restart.
 
 - CCA protected key block version 2 support and other fixes/improvements
   in crypto code.
 
 - Convert to new fallthrough; annotations.
 
 - Replace zero-length arrays with flexible-arrays.
 
 - QDIO debugfs and other small improvements.
 
 - Drop 2-level paging support optimization for compat tasks. Varios
   mm cleanups.
 
 - Remove broken and unused hibernate / power management support.
 
 - Remove fake numa support which does not bring any benefits.
 
 - Exclude offline CPUs from CPU topology masks to be more consistent
   with other architectures.
 
 - Prevent last branching instruction address leaking to userspace.
 
 - Other small various fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl6Ig2YACgkQjYWKoQLX
 FBj2gggAibnHOl9d0ngX1mVT4nz51R3V8z5sEQjNMr2uHBmaTqs7pi/00gaFMxoC
 NngVEXvL443jSogQivthGgXPpRCV9xdKE3sp38j7fF4LgHoeuDtGd1oaX4W9Rqk0
 7Yii35EaO2e2WHdOKaAbu+ZvDRunFjERyntc51MYaIUivFosogSo07vC73vFIArF
 VGStS09fJ4Ny76ott896T7Ulx1Iek/MkF1vponEMLGNUIcLIQbbxZxOwgz0pHuEF
 SlyyJBnhOIaAJGOYlKREQDt1cew+hsxluPU+a01bwdsmdZv9LH1BGwLayDqTH58i
 QWvtEpzJFmDvo9jGM1v81ebaGnyCKg==
 =hiGF
 -----END PGP SIGNATURE-----

Merge tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Update maintainers. Niklas Schnelle takes over zpci and Vineeth
   Vijayan common io code.

 - Extend cpuinfo to include topology information.

 - Add new extended counters for IBM z15 and sampling buffer allocation
   rework in perf code.

 - Add control over zeroing out memory during system restart.

 - CCA protected key block version 2 support and other
   fixes/improvements in crypto code.

 - Convert to new fallthrough; annotations.

 - Replace zero-length arrays with flexible-arrays.

 - QDIO debugfs and other small improvements.

 - Drop 2-level paging support optimization for compat tasks. Varios mm
   cleanups.

 - Remove broken and unused hibernate / power management support.

 - Remove fake numa support which does not bring any benefits.

 - Exclude offline CPUs from CPU topology masks to be more consistent
   with other architectures.

 - Prevent last branching instruction address leaking to userspace.

 - Other small various fixes and improvements all over the code.

* tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits)
  s390/mm: cleanup init_new_context() callback
  s390/mm: cleanup virtual memory constants usage
  s390/mm: remove page table downgrade support
  s390/qdio: set qdio_irq->cdev at allocation time
  s390/qdio: remove unused function declarations
  s390/ccwgroup: remove pm support
  s390/ap: remove power management code from ap bus and drivers
  s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
  s390/mm: cleanup arch_get_unmapped_area() and friends
  s390/ism: remove pm support
  s390/cio: use fallthrough;
  s390/vfio: use fallthrough;
  s390/zcrypt: use fallthrough;
  s390: use fallthrough;
  s390/cpum_sf: Fix wrong page count in error message
  s390/diag: fix display of diagnose call statistics
  s390/ap: Remove ap device suspend and resume callbacks
  s390/pci: Improve handling of unset UID
  s390/pci: Fix zpci_alloc_domain() over allocation
  s390/qdio: pass ISC as parameter to chsc_sadc()
  ...
2020-04-04 09:45:50 -07:00
Linus Torvalds
0ad5b053d4 Char/Misc driver patches for 5.7-rc1
Here is the big set of char/misc/other driver patches for 5.7-rc1.
 
 Lots of things in here, and it's later than expected due to some reverts
 to resolve some reported issues.  All is now clean with no reported
 problems in linux-next.
 
 Included in here is:
 	- interconnect updates
 	- mei driver updates
 	- uio updates
 	- nvmem driver updates
 	- soundwire updates
 	- binderfs updates
 	- coresight updates
 	- habanalabs updates
 	- mhi new bus type and core
 	- extcon driver updates
 	- some Kconfig cleanups
 	- other small misc driver cleanups and updates
 
 As mentioned, all have been in linux-next for a while, and with the last
 two reverts, all is calm and good.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodfvA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynzCQCfROhar3E8EhYEqSOP6xq6uhX9uegAnRgGY2rs
 rN4JJpOcTddvZcVlD+vo
 =ocWk
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big set of char/misc/other driver patches for 5.7-rc1.

  Lots of things in here, and it's later than expected due to some
  reverts to resolve some reported issues. All is now clean with no
  reported problems in linux-next.

  Included in here is:
   - interconnect updates
   - mei driver updates
   - uio updates
   - nvmem driver updates
   - soundwire updates
   - binderfs updates
   - coresight updates
   - habanalabs updates
   - mhi new bus type and core
   - extcon driver updates
   - some Kconfig cleanups
   - other small misc driver cleanups and updates

  As mentioned, all have been in linux-next for a while, and with the
  last two reverts, all is calm and good"

* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
  Revert "driver core: platform: Initialize dma_parms for platform devices"
  Revert "amba: Initialize dma_parms for amba devices"
  amba: Initialize dma_parms for amba devices
  driver core: platform: Initialize dma_parms for platform devices
  bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
  bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
  bus: mhi: core: Add support for reading MHI info from device
  misc: rtsx: set correct pcr_ops for rts522A
  speakup: misc: Use dynamic minor numbers for speakup devices
  mei: me: add cedar fork device ids
  coresight: do not use the BIT() macro in the UAPI header
  Documentation: provide IBM contacts for embargoed hardware
  nvmem: core: remove nvmem_sysfs_get_groups()
  nvmem: core: use is_bin_visible for permissions
  nvmem: core: use device_register and device_unregister
  nvmem: core: add root_only member to nvmem device struct
  extcon: axp288: Add wakeup support
  extcon: Mark extcon_get_edev_name() function as exported symbol
  extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
  dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
  ...
2020-04-03 13:22:40 -07:00
Chen Yu
db96a75946 PM: sleep: Add pm_debug_messages kernel command line option
Debug messages from the system suspend/hibernation infrastructure
are disabled by default, and can only be enabled after the system
has boot up via /sys/power/pm_debug_messages.

This makes the hibernation resume hard to track as it involves system
boot up across hibernation.  There's no chance for software_resume()
to track the resume process, for example.

Add a kernel command line option to set pm_debug_messages during
boot up.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-02 15:29:56 +02:00
Dexuan Cui
3704a6a445 PM: hibernate: Propagate the return value of hibernation_restore()
If hibernation_restore() fails, the 'error' should not be zero.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-01 11:36:11 +02:00
Rafael J. Wysocki
ada0629bd3 Merge branches 'pm-core', 'pm-sleep', 'pm-acpi' and 'pm-domains'
* pm-core:
  PM: runtime: Add pm_runtime_get_if_active()

* pm-sleep:
  PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
  PM / hibernate: Remove unnecessary compat ioctl overrides
  PM: hibernate: fix docs for ioctls that return loff_t via pointer
  PM: sleep: wakeup: Use built-in RCU list checking
  PM: sleep: core: Use built-in RCU list checking

* pm-acpi:
  ACPI: PM: s2idle: Refine active GPEs check
  ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
  ACPI: PM: s2idle: Fix comment in acpi_s2idle_prepare_late()

* pm-domains:
  cpuidle: psci: Split psci_dt_cpu_init_idle()
  PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
2020-03-30 14:46:58 +02:00
Rafael J. Wysocki
8f1073ed8c Merge branch 'pm-qos'
* pm-qos: (30 commits)
  PM: QoS: annotate data races in pm_qos_*_value()
  Documentation: power: fix pm_qos_interface.rst format warning
  PM: QoS: Make CPU latency QoS depend on CONFIG_CPU_IDLE
  Documentation: PM: QoS: Update to reflect previous code changes
  PM: QoS: Update file information comments
  PM: QoS: Drop PM_QOS_CPU_DMA_LATENCY and rename related functions
  sound: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: usb: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: tty: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: spi: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: net: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: mmc: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: media: Call cpu_latency_qos_*() instead of pm_qos_*()
  drivers: hsi: Call cpu_latency_qos_*() instead of pm_qos_*()
  drm: i915: Call cpu_latency_qos_*() instead of pm_qos_*()
  x86: platform: iosf_mbi: Call cpu_latency_qos_*() instead of pm_qos_*()
  cpuidle: Call cpu_latency_qos_limit() instead of pm_qos_request()
  PM: QoS: Add CPU latency QoS API wrappers
  PM: QoS: Adjust pm_qos_request() signature and reorder pm_qos.h
  PM: QoS: Simplify definitions of CPU latency QoS trace events
  ...
2020-03-30 14:45:57 +02:00
Heiko Carstens
086b2d7837 PM: remove s390 specific callbacks
ARCH_SAVE_PAGE_KEYS has been introduced in order to be able to save
and restore s390 specific storage keys into a hibernation image.
With hibernation support removed from s390 there is no point in
keeping the callbacks.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23 13:41:55 +01:00
Zhenzhong Duan
6ce6ae7c17 misc: cleanup minor number definitions in c file into miscdevice.h
HWRNG_MINOR and RNG_MISCDEV_MINOR are duplicate definitions, use
unified HWRNG_MINOR instead and moved into miscdevice.h

ANSLCD_MINOR and LCD_MINOR are duplicate definitions, use unified
LCD_MINOR instead and moved into miscdevice.h

MISCDEV_MINOR is renamed to PXA3XX_GCU_MINOR and moved into
miscdevice.h

Other definitions are just moved without any change.

Link: https://lore.kernel.org/lkml/20200120221323.GJ15860@mit.edu/t/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Build-tested-by: Willy TARREAU <wtarreau@haproxy.com>
Build-tested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20200311071654.335-2-zhenzhong.duan@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 12:27:03 +01:00
Eric Biggers
fba616a49f PM / hibernate: Remove unnecessary compat ioctl overrides
Since the SNAPSHOT_GET_IMAGE_SIZE, SNAPSHOT_AVAIL_SWAP_SIZE, and
SNAPSHOT_ALLOC_SWAP_PAGE ioctls produce an loff_t result, and loff_t is
always 64-bit even in the compat case, there's no reason to have the
special compat handling for these ioctls.  Just remove this unneeded
code so that these ioctls call into snapshot_ioctl() directly, doing
just the compat_ptr() conversion on the argument.

(This unnecessary code was also causing a KMSAN false positive.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:55:08 +01:00
Qian Cai
a534e924c5 PM: QoS: annotate data races in pm_qos_*_value()
The target_value field in struct pm_qos_constraints is used for
lockless access to the effective constraint value of a given QoS
list, so the readers of it cannot expect it to always reflect the
most recent effective constraint value.  However, they can and do
expect it to be equal to a valid effective constraint value computed
at a certain time in the past (event though it may not be the most
recent one), so add READ|WRITE_ONCE() annotations around the
target_value accesses to prevent the compiler from possibly causing
that expectation to be unmet by generating code in an exceptionally
convoluted way.

Signed-off-by: Qian Cai <cai@lca.pw>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-03 23:34:51 +01:00
Alexandre Belloni
b0c609ab20 PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
Fix a mistake in a variable name in a comment.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-20 11:58:01 +01:00
Rafael J. Wysocki
814d51f888 PM: QoS: Make CPU latency QoS depend on CONFIG_CPU_IDLE
Because cpuidle is the only user of the effective constraint coming
from the CPU latency QoS, add #ifdef CONFIG_CPU_IDLE around that code
to avoid building it unnecessarily.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-14 10:37:27 +01:00
Rafael J. Wysocki
fe52de36dc PM: QoS: Update file information comments
Update the file information comments in include/linux/pm_qos.h
and kernel/power/qos.c by adding titles along with copyright and
authors information to them and changing the qos.c description to
better reflect its contents (outdated information is dropped from
it in particular).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-14 10:37:26 +01:00
Rafael J. Wysocki
67b06ba018 PM: QoS: Drop PM_QOS_CPU_DMA_LATENCY and rename related functions
Drop the PM QoS classes enum including PM_QOS_CPU_DMA_LATENCY,
drop the wrappers around pm_qos_request(), pm_qos_request_active(),
and pm_qos_add/update/remove_request() introduced previously, rename
these functions, respectively, to cpu_latency_qos_limit(),
cpu_latency_qos_request_active(), and
cpu_latency_qos_add/update/remove_request(), and update their
kerneldoc comments.  [While at it, drop some useless comments from
these functions.]

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-14 10:37:26 +01:00
Rafael J. Wysocki
e033b6c175 PM: QoS: Adjust pm_qos_request() signature and reorder pm_qos.h
Change the return type of pm_qos_request() to be the same as the
one of pm_qos_read_value() called by it internally and stop exporting
it to modules (because its only caller, cpuidle, is not modular).

Also move the pm_qos_read_value() header away from the CPU latency
QoS API function headers in pm_qos.h (because it technically does
not belong to that API).

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:46 +01:00
Rafael J. Wysocki
333eed7d20 PM: QoS: Simplify definitions of CPU latency QoS trace events
Modify the definitions of the CPU latency QoS trace events to take
one argument (since PM_QOS_CPU_DMA_LATENCY is always passed as the
pm_qos_class argument to them) and update the documentation of them
accordingly (while at it, make it explicitly mention CPU latency QoS
and relocate it after the device PM QoS trace events documentation).

The names and output format of the trace events do not change to
preserve user space compatibility.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:39 +01:00
Rafael J. Wysocki
2552d35201 PM: QoS: Rename things related to the CPU latency QoS
First, rename PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE to
PM_QOS_CPU_LATENCY_DEFAULT_VALUE and update all of the code
referring to it accordingly.

Next, rename cpu_dma_constraints to cpu_latency_constraints, move
the definition of it closer to the functions referring to it and
update all of them accordingly.  [While at it, add a comment to mark
the start of the code related to the CPU latency QoS.]

Finally, rename the pm_qos_power_*() family of functions and
pm_qos_power_fops to cpu_latency_qos_*() and cpu_latency_qos_fops,
respectively, and update the definition of cpu_latency_qos_miscdev.
[While at it, update the miscdev interface code start comment.]

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:33 +01:00
Rafael J. Wysocki
3a4a004222 PM: QoS: Drop PM_QOS_CPU_DMA_LATENCY notifier chain
Notice that pm_qos_remove_notifier() is not used at all and the only
caller of pm_qos_add_notifier() is the cpuidle core, which only needs
the PM_QOS_CPU_DMA_LATENCY notifier to invoke wake_up_all_idle_cpus()
upon changes of the PM_QOS_CPU_DMA_LATENCY target value.

First, to ensure that wake_up_all_idle_cpus() will be called
whenever the PM_QOS_CPU_DMA_LATENCY target value changes, modify the
pm_qos_add/update/remove_request() family of functions to check if
the effective constraint for the PM_QOS_CPU_DMA_LATENCY has changed
and call wake_up_all_idle_cpus() directly in that case.

Next, drop the PM_QOS_CPU_DMA_LATENCY notifier from cpuidle as it is
not necessary any more.

Finally, drop both pm_qos_add_notifier() and pm_qos_remove_notifier(),
as they have no callers now, along with cpu_dma_lat_notifier which is
only used by them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:27 +01:00
Rafael J. Wysocki
02c92a3789 PM: QoS: Redefine struct pm_qos_request and drop struct pm_qos_object
First, change the definition of struct pm_qos_request so that it
contains a struct pm_qos_constraints pointer (called "qos") instead
of a PM QoS class number (in preparation for dropping the PM QoS
classes concept altogether going forward) and move its definition
(along with the definition of struct pm_qos_flags_request that does
not change) after the definition of struct pm_qos_constraints.

Next, drop the definition of struct pm_qos_object and the null_pm_qos
and cpu_dma_pm_qos variables of that type along with pm_qos_array[]
holding pointers to them and change the code to refer to the
pm_qos_constraints structure directly or to use the new qos pointer
in struct pm_qos_request for that instead of going through
pm_qos_array[] to access it.  Also update kerneldoc comments that
mention pm_qos_class to refer to PM_QOS_CPU_DMA_LATENCY directly
instead.

Finally, drop register_pm_qos_misc(), introduce cpu_latency_qos_miscdev
(with the name field set to "cpu_dma_latency") to implement the
CPU latency QoS interface in /dev/ and register it directly from
pm_qos_power_init().

After these changes the notion of PM QoS classes remains only in the
API (in the form of redundant function parameters that are ignored)
and in the definitions of PM QoS trace events.

While at it, some redundant local variables are dropped etc.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:19 +01:00
Rafael J. Wysocki
299a229830 PM: QoS: Clean up misc device file operations
Reorder the code to avoid using extra function header declarations
for the pm_qos_power_*() family of functions and drop those
declarations.

Also clean up the internals of those functions to consolidate checks,
avoid using redundant local variables and similar.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:13 +01:00
Rafael J. Wysocki
63cffc0534 PM: QoS: Drop iterations over global QoS classes
After commit c3082a674f ("PM: QoS: Get rid of unused flags") the
only global PM QoS class in use is PM_QOS_CPU_DMA_LATENCY, so it
does not really make sense to iterate over global QoS classes
anywhere, since there is only one.

Remove iterations over global QoS classes from the code and use
PM_QOS_CPU_DMA_LATENCY as the target class directly where needed.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:07 +01:00
Rafael J. Wysocki
dcd70ca1a3 PM: QoS: Clean up pm_qos_read_value() and pm_qos_get/set_value()
Move the definition of pm_qos_read_value() before the one of
pm_qos_get_value() and add a kerneldoc comment to it (as it is
not static).

Also replace the BUG() in pm_qos_get_value() with WARN() (to
prevent the kernel from crashing if an unknown PM QoS type is
used by mistake) and drop the comment next to it that is not
necessary any more.

Additionally, drop the unnecessary inline modifier from the header
of pm_qos_set_value().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:26:02 +01:00
Rafael J. Wysocki
7b35370b2e PM: QoS: Clean up pm_qos_update_target() and pm_qos_update_flags()
Clean up the pm_qos_update_target() function:
 * Update its kerneldoc comment.
 * Drop the redundant ret local variable from it.
 * Reorder definitions of local variables in it.
 * Update a comment in it.

Also update the kerneldoc comment of pm_qos_update_flags() (e.g.
notifiers are not called by it any more) and add one emtpy line
to its body (for more visual clarity).

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:25:55 +01:00
Rafael J. Wysocki
87ad735679 PM: QoS: Drop the PM_QOS_SUM QoS type
The PM_QOS_SUM QoS type is not used, so drop it along with the
code referring to it in pm_qos_get_value() and the related local
variables in there.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:25:48 +01:00
Rafael J. Wysocki
5a7ea52b6f PM: QoS: Drop pm_qos_update_request_timeout()
The pm_qos_update_request_timeout() function is not called from
anywhere, so drop it along with the work member in struct
pm_qos_request needed by it.

Also drop the useless pm_qos_update_request_timeout trace event
that is only triggered by that function (so it never triggers at
all) and update the trace events documentation accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-13 11:25:40 +01:00
Rafael J. Wysocki
3489662042 PM: QoS: Drop debugfs interface
After commit c3082a674f ("PM: QoS: Get rid of unused flags") the
only global PM QoS class in use is PM_QOS_CPU_DMA_LATENCY and the
existing PM QoS debugfs interface has become overly complicated (as
it takes other potentially possible PM QoS classes that are not there
any more into account).  It is also not particularly useful (the
"type" of the PM_QOS_CPU_DMA_LATENCY is known, its aggregate value
can be read from /dev/cpu_dma_latency and the number of requests in
the queue does not really matter) and there are no known users
depending on it.  Moreover, there are dedicated trace events that
can be used for tracking PM QoS usage with much higher precision.

For these reasons, drop the PM QoS debugfs interface altogether.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-02-12 10:28:42 +01:00
Rafael J. Wysocki
e3728b50cd ACPI: PM: s2idle: Avoid possible race related to the EC GPE
It is theoretically possible for the ACPI EC GPE to be set after the
s2idle_ops->wake() called from s2idle_loop() has returned and before
the subsequent pm_wakeup_pending() check is carried out.  If that
happens, the resulting wakeup event will cause the system to resume
even though it may be a spurious one.

To avoid that race, first make the ->wake() callback in struct
platform_s2idle_ops return a bool value indicating whether or not
to let the system resume and rearrange s2idle_loop() to use that
value instad of the direct pm_wakeup_pending() call if ->wake() is
present.

Next, rework acpi_s2idle_wake() to process EC events and check
pm_wakeup_pending() before re-arming the SCI for system wakeup
to prevent it from triggering prematurely and add comments to
that function to explain the rationale for the new code flow.

Fixes: 56b9918490 ("PM: sleep: Simplify suspend-to-idle control flow")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 10:11:02 +01:00
Rafael J. Wysocki
322e929d19 Merge back new material related to system-wide PM for v5.6. 2020-01-23 16:00:56 +01:00
Alexander Potapenko
18451f9f9e PM: hibernate: fix crashes with init_on_free=1
Upon resuming from hibernation, free pages may contain stale data from
the kernel that initiated the resume. This breaks the invariant
inflicted by init_on_free=1 that freed pages must be zeroed.

To deal with this problem, make clear_free_pages() also clear the free
pages when init_on_free is enabled.

Fixes: 6471384af2 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-16 23:51:45 +01:00
Jonas Meurer
c052bf82c6 PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
The sysfs attribute `/sys/power/sync_on_suspend` controls, whether or not
filesystems are synced by the kernel before system suspend.

Congruously, the behaviour of build-time switch CONFIG_SUSPEND_SKIP_SYNC
is slightly changed: It now defines the run-tim default for the new sysfs
attribute `/sys/power/sync_on_suspend`.

The run-time attribute is added because the existing corresponding
build-time Kconfig flag for (`CONFIG_SUSPEND_SKIP_SYNC`) is not flexible
enough. E.g. Linux distributions that provide pre-compiled kernels
usually want to stick with the default (sync filesystems before suspend)
but under special conditions this needs to be changed.

One example for such a special condition is user-space handling of
suspending block devices (e.g. using `cryptsetup luksSuspend` or `dmsetup
suspend`) before system suspend. The Kernel trying to sync filesystems
after the underlying block device already got suspended obviously leads
to dead-locks. Be aware that you have to take care of the filesystem sync
yourself before suspending the system in those scenarios.

Signed-off-by: Jonas Meurer <jonas@freesources.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-16 21:47:03 +01:00
Colin Ian King
5c0e9de065 PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
There is a spelling mistake in a pr_info message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-10 12:15:30 +01:00
Luigi Semenzato
7a7b99bf80 PM: hibernate: Add more logging on hibernation failure
Hibernation fails when the kernel cannot allocate enough memory
to copy all pages of RAM in use.

Ensure that the failure reason is clearly logged, and clearly
attributable to the hibernation module.

Signed-off-by: Luigi Semenzato <semenzato@google.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-07 13:31:12 +01:00
Wen Yang
809ed78a83 PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
do_div() does a 64-by-32 division. Use div64_u64() instead of
do_div() if the divisor is u64, to avoid truncation to 32-bit.

This change also cleans up code a tad.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-07 12:42:56 +01:00
Alexandre Belloni
2a2ef473cc PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()
Call the 64bit versions of rtc_tm time conversion to avoid the y2038 issue.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-20 09:58:08 +01:00
Linus Torvalds
ef867c12f3 Additional power management updates for 5.5-rc1
- Avoid a race condition in the ACPI EC driver that may cause
    systems to be unable to leave suspend-to-idle (Rafael Wysocki).
 
  - Drop the "disabled" field, which is redundant, from struct
    cpuidle_state (Rafael Wysocki).
 
  - Reintroduce device PM QoS frequency constraints (temporarily
    introduced and than dropped during the 5.4 cycle) in preparation
    for adding QoS support to devfreq (Leonard Crestez).
 
  - Clean up indentation (in multiple places) and the cpuidle drivers
    help text in Kconfig (Krzysztof Kozlowski, Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl3nhpQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxQj4P/2HbVROWMON7q9iWhgO59qABEbqU8M7L
 DaJ2gu+bDe3FQ9Ek6Y2EObfGw3nl9riyGbZH/jVmcOkbuXE+aQXv/j7eEnM9G35+
 8+JSfhucVsohaHVxT2ROMv+7YD+pLyWK1ivuVK/dNcvmxQaC9CKrmn3GF2ujkqNR
 ahdRRzZobGeC6mc8tms3GYpWkd1R5zd74ALGVsw9i/eB3P/YgrlS8HaQynpbaflZ
 qhRKZgsTf8QD6+OG+6HQhWpOfAlG36dsJnvuk0Oa0Cpnw+Zfj6WoR1jpL9ufNWBM
 Re1faTfppy6Hnyxr62Ytkbq2pYozTVAnQM+TKNIGoqxA4OIXvhgQpBqApmuJXpRx
 ZFBfr943f7I2jmAAznHeiW9l3n+4h725rpoxKapnlO3OMRDwCTqxbMahiS+CDULd
 gSu4prnoBdd9WrwiR7M1PA4X2Eb2M0kYFQUr7BltlTgjLHjQy47Mnazh9WxYBAv8
 p1tip39QHeZcdO3rdW1O21ljNekEIOFAi5bVVECsR6RyA+KR+vHgFP9pMUWyCpgU
 +rde+MdGKIL3sw/szNhTTDfQ49vz/ObcipJg3/rakq6jXeFL4n5NwMy5jYrquPlx
 xxHx3Yp1PCBEZ1TXS6+JjznvQBU/G/7YvoWobpqwN/IL1wa55rWOX8Ah1+YnfLzF
 fGzh0EvPJKyM
 =KAyd
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull additional power management updates from Rafael Wysocki:
 "These fix an ACPI EC driver bug exposed by the recent rework of the
  suspend-to-idle code flow, reintroduce frequency constraints into
  device PM QoS (in preparation for adding QoS support to devfreq), drop
  a redundant field from struct cpuidle_state and clean up Kconfig in
  some places.

  Specifics:

   - Avoid a race condition in the ACPI EC driver that may cause systems
     to be unable to leave suspend-to-idle (Rafael Wysocki)

   - Drop the "disabled" field, which is redundant, from struct
     cpuidle_state (Rafael Wysocki)

   - Reintroduce device PM QoS frequency constraints (temporarily
     introduced and than dropped during the 5.4 cycle) in preparation
     for adding QoS support to devfreq (Leonard Crestez)

   - Clean up indentation (in multiple places) and the cpuidle drivers
     help text in Kconfig (Krzysztof Kozlowski, Randy Dunlap)"

* tag 'pm-5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: s2idle: Rework ACPI events synchronization
  ACPI: EC: Rework flushing of pending work
  PM / devfreq: Add missing locking while setting suspend_freq
  PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY
  PM / QoS: Reorder pm_qos/freq_qos/dev_pm_qos structs
  PM / QoS: Initial kunit test
  PM / QoS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX
  power: avs: Fix Kconfig indentation
  cpufreq: Fix Kconfig indentation
  cpuidle: minor Kconfig help text fixes
  cpuidle: Drop disabled field from struct cpuidle_state
  cpuidle: Fix Kconfig indentation
2019-12-04 10:48:09 -08:00
Linus Torvalds
ceb3074745 y2038: syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended
 for namespace cleaning: the kernel defines the traditional
 time_t, timeval and timespec types that often lead to y2038-unsafe
 code. Even though the unsafe usage is mostly gone from the kernel,
 having the types and associated functions around means that we
 can still grow new users, and that we may be missing conversions
 to safe types that actually matter.
 
 There are still a number of driver specific patches needed to
 get the last users of these types removed, those have been
 submitted to the respective maintainers.
 
 Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJd3D+wAAoJEJpsee/mABjZfdcQAJvl6e+4ddKoDMIVJqVCE25N
 meFRgA7S8jy6BefEVeUgI8TxK+amGO36szMBUEnZxSSxq9u+gd13m5bEK6Xq/ov7
 4KTAiA3Irm/W5FBTktu1zc5ROIra1Xj7jLdubf8wEC3viSXIXB3+68Y28iBN7D2O
 k9kSpwINC5lWeC8guZy2I+2yc4ywUEXao9nVh8C/J+FQtU02TcdLtZop9OhpAa8u
 U19VVH3WHkQI7ZfLvBTUiYK6tlYTiYCnpr8l6sm850CnVv1fzBW+DzmVhPJ6FdFd
 4m5staC0sQ6gVqtjVMBOtT5CdzREse6hpwbKo2GRWFroO5W9tljMOJJXHvv/f6kz
 DxrpUmj37JuRbqAbr8KDmQqPo6M2CRkxFxjol1yh5ER63u1xMwLm/PQITZIMDvPO
 jrFc2C2SdM2E9bKP/RMCVoKSoRwxCJ5IwJ2AF237rrU0sx/zB2xsrOGssx5CWEgc
 3bbk6tDQujJJubnCfgRy1tTxpLZOHEEKw8YhFLLbR2LCtA9pA/0rfLLad16cjA5e
 5jIHxfsFc23zgpzrJeB7kAF/9xgu1tlA5BotOs3VBE89LtWOA9nK5dbPXng6qlUe
 er3xLCfS38ovhUw6DusQpaYLuaYuLM7DKO4iav9kuTMcY9GkbPk7vDD3KPGh2goy
 hY5cSM8+kT1q/THLnUBH
 =Bdbv
 -----END PGP SIGNATURE-----

Merge tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 cleanups from Arnd Bergmann:
 "y2038 syscall implementation cleanups

  This is a series of cleanups for the y2038 work, mostly intended for
  namespace cleaning: the kernel defines the traditional time_t, timeval
  and timespec types that often lead to y2038-unsafe code. Even though
  the unsafe usage is mostly gone from the kernel, having the types and
  associated functions around means that we can still grow new users,
  and that we may be missing conversions to safe types that actually
  matter.

  There are still a number of driver specific patches needed to get the
  last users of these types removed, those have been submitted to the
  respective maintainers"

Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
  y2038: alarm: fix half-second cut-off
  y2038: ipc: fix x32 ABI breakage
  y2038: fix typo in powerpc vdso "LOPART"
  y2038: allow disabling time32 system calls
  y2038: itimer: change implementation to timespec64
  y2038: move itimer reset into itimer.c
  y2038: use compat_{get,set}_itimer on alpha
  y2038: itimer: compat handling to itimer.c
  y2038: time: avoid timespec usage in settimeofday()
  y2038: timerfd: Use timespec64 internally
  y2038: elfcore: Use __kernel_old_timeval for process times
  y2038: make ns_to_compat_timeval use __kernel_old_timeval
  y2038: socket: use __kernel_old_timespec instead of timespec
  y2038: socket: remove timespec reference in timestamping
  y2038: syscalls: change remaining timeval to __kernel_old_timeval
  y2038: rusage: use __kernel_old_timeval
  y2038: uapi: change __kernel_time_t to __kernel_old_time_t
  y2038: stat: avoid 'time_t' in 'struct stat'
  y2038: ipc: remove __kernel_time_t reference from headers
  y2038: vdso: powerpc: avoid timespec references
  ...
2019-12-01 14:00:59 -08:00
Leonard Crestez
36a8015f89 PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY
Support for adding per-device frequency limits was removed in
commit 2aac8bdf7a ("PM: QoS: Drop frequency QoS types from device PM QoS")
after cpufreq switched to use a new "freq_constraints" construct.

Restore support for per-device freq limits but base this upon
freq_constraints. This is primarily meant to be used by the devfreq
subsystem.

This removes the "static" marking on freq_qos_apply but does not export
it for modules.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-29 12:04:50 +01:00
Rafael J. Wysocki
5a97aa5bbc Merge branches 'pm-sleep', 'pm-domains', 'pm-opp' and 'powercap'
* pm-sleep:
  PM / wakeirq: remove unnecessary parentheses
  PM / core: Clean up some function headers in power.h
  PM / hibernate: memory_bm_find_bit(): Tighten node optimisation

* pm-domains:
  PM / Domains: Convert to dev_to_genpd_safe() in genpd_syscore_switch()
  mmc: tmio: Avoid boilerplate code in ->runtime_suspend()
  PM / Domains: Implement the ->start() callback for genpd
  PM / Domains: Introduce dev_pm_domain_start()

* pm-opp:
  PM / OPP: Support adjusting OPP voltages at runtime

* powercap:
  powercap/intel_rapl: add support for Cometlake desktop
  powercap/intel_rapl: add support for CometLake Mobile
2019-11-26 10:27:49 +01:00
Rafael J. Wysocki
05ff1ba412 PM: QoS: Invalidate frequency QoS requests after removal
Switching cpufreq drivers (or switching operation modes of the
intel_pstate driver from "active" to "passive" and vice versa)
does not work on some x86 systems with ACPI after commit
3000ce3c52 ("cpufreq: Use per-policy frequency QoS"), because
the ACPI _PPC and thermal code uses the same frequency QoS request
object for a given CPU every time a cpufreq driver is registered
and freq_qos_remove_request() does not invalidate the request after
removing it from its QoS list, so freq_qos_add_request() complains
and fails when that request is passed to it again.

Fix the issue by modifying freq_qos_remove_request() to clear the qos
and type fields of the frequency request pointed to by its argument
after removing it from its QoS list so as to invalidate it.

Fixes: 3000ce3c52 ("cpufreq: Use per-policy frequency QoS")
Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-11-20 10:46:42 +01:00
Arnd Bergmann
75d319c06e y2038: syscalls: change remaining timeval to __kernel_old_timeval
All of the remaining syscalls that pass a timeval (gettimeofday, utime,
futimesat) can trivially be changed to pass a __kernel_old_timeval
instead, which has a compatible layout, but avoids ambiguity with
the timeval type in user space.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:29 +01:00
Rafael J. Wysocki
77751a466e PM: QoS: Introduce frequency QoS
Introduce frequency QoS, based on the "raw" low-level PM QoS, to
represent min and max frequency requests and aggregate constraints.

The min and max frequency requests are to be represented by
struct freq_qos_request objects and the aggregate constraints are to
be represented by struct freq_constraints objects.  The latter are
expected to be initialized with the help of freq_constraints_init().

The freq_qos_read_value() helper is defined to retrieve the aggregate
constraints values from a given struct freq_constraints object and
there are the freq_qos_add_request(), freq_qos_update_request() and
freq_qos_remove_request() helpers to manipulate the min and max
frequency requests.  It is assumed that the the helpers will not
run concurrently with each other for the same struct freq_qos_request
object, so if that may be the case, their uses must ensure proper
synchronization between them (e.g. through locking).

In addition, freq_qos_add_notifier() and freq_qos_remove_notifier()
are provided to add and remove notifiers that will trigger on aggregate
constraint changes to and from a given struct freq_constraints object,
respectively.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-21 02:05:21 +02:00
Andy Whitcroft
da6043fe85 PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
When looking for a bit by number we make use of the cached result from the
preceding lookup to speed up operation.  Firstly we check if the requested
pfn is within the cached zone and if not lookup the new zone.  We then
check if the offset for that pfn falls within the existing cached node.
This happens regardless of whether the node is within the zone we are
now scanning.  With certain memory layouts it is possible for this to
false trigger creating a temporary alias for the pfn to a different bit.
This leads the hibernation code to free memory which it was never allocated
with the expected fallout.

Ensure the zone we are scanning matches the cached zone before considering
the cached node.

Deep thanks go to Andrea for many, many, many hours of hacking and testing
that went into cornering this bug.

Reported-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-10-21 01:24:27 +02:00
Ben Dooks
f49249d58a PM: sleep: include <linux/pm_runtime.h> for pm_wq
Include the <linux/runtime_pm.h> for the definition of
pm_wq to avoid the following warning:

kernel/power/main.c:890:25: warning: symbol 'pm_wq' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-10-10 11:11:56 +02:00
Linus Torvalds
aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Rafael J. Wysocki
fc6763a2d7 Merge branches 'pm-opp', 'pm-qos', 'acpi-pm', 'pm-domains' and 'pm-tools'
* pm-opp:
  PM / OPP: Correct Documentation about library location
  opp: of: Support multiple suspend OPPs defined in DT
  dt-bindings: opp: Support multiple opp-suspend properties
  opp: core: add regulators enable and disable
  opp: Don't decrement uninitialized list_kref

* pm-qos:
  PM: QoS: Get rid of unused flags

* acpi-pm:
  ACPI: PM: Print debug messages on device power state changes

* pm-domains:
  PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
  PM / Domains: Simplify genpd_lookup_dev()
  PM / Domains: Align in-parameter names for some genpd functions

* pm-tools:
  pm-graph: make setVal unbuffered again for python2 and python3
  cpupower: update German translation
  tools/power/cpupower: fix 64bit detection when cross-compiling
  cpupower: Add missing newline at end of file
  pm-graph v5.5
2019-09-17 09:49:19 +02:00
Rafael J. Wysocki
1b531e55c5 Merge suspend-to-idle rework material for v5.4.
* pm-s2idle-rework: (21 commits)
  ACPI: PM: s2idle: Always set up EC GPE for system wakeup
  ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily
  PM: suspend: Fix platform_suspend_prepare_noirq()
  intel-hid: Disable button array during suspend-to-idle
  intel-hid: intel-vbtn: Avoid leaking wakeup_mode set
  ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices
  ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message
  ACPI: EC: PM: Consolidate some code depending on PM_SLEEP
  ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events()
  ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend
  ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter
  ACPI: PM: s2idle: Rearrange lps0_device_attach()
  ACPI: PM: Set up EC GPE for system wakeup from drivers that need it
  PM: sleep: Drop dpm_noirq_begin() and dpm_noirq_end()
  PM: sleep: Integrate suspend-to-idle with generig suspend flow
  PM: sleep: Simplify suspend-to-idle control flow
  ACPI: PM: Set s2idle_wakeup earlier and clear it later
  PM: sleep: Fix possible overflow in pm_system_cancel_wakeup()
  ACPI: EC: Return bool from acpi_ec_dispatch_gpe()
  ACPICA: Return u32 from acpi_dispatch_gpe()
  ...
2019-09-17 09:35:35 +02:00
Amit Kucheria
c3082a674f PM: QoS: Get rid of unused flags
The network_latency and network_throughput flags for PM-QoS have not
found much use in drivers or in userspace since they were introduced.

Commit 4a733ef1be ("mac80211: remove PM-QoS listener") removed the
only user PM_QOS_NETWORK_LATENCY in the kernel a while ago and there
don't seem to be any userspace tools using the character device files
either.

PM_QOS_MEMORY_BANDWIDTH was never even added to the trace events.

Remove all the flags except cpu_dma_latency.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-21 00:38:54 +02:00
Tri Vo
c8377adfa7 PM / wakeup: Show wakeup sources stats in sysfs
Add an ID and a device pointer to 'struct wakeup_source'. Use them to to
expose wakeup sources statistics in sysfs under
/sys/class/wakeup/wakeup<ID>/*.

Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Tri Vo <trong@android.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-21 00:20:40 +02:00
Tri Vo
2434aea58e PM / wakeup: Use wakeup_source_register() in wakelock.c
kernel/power/wakelock.c duplicates wakeup source creation and
registration code from drivers/base/power/wakeup.c.

Change struct wakelock's wakeup source to a pointer and use
wakeup_source_register() function to create and register said wakeup
source. Use wakeup_source_unregister() on cleanup path.

Signed-off-by: Tri Vo <trong@android.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-21 00:20:40 +02:00
Josh Boyer
38bd94b8a1 hibernate: Disable when the kernel is locked down
There is currently no way to verify the resume image when returning
from hibernate.  This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it when the
kernel is locked down.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
cc: linux-pm@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:15 -07:00
Chuhong Yuan
d30bdfc0ec PM: sleep: Replace strncmp() with str_has_prefix()
Use str_has_prefix() instead of strncmp() which is less
straightforward in decode_state().

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-16 14:18:53 +02:00
Rafael J. Wysocki
11f26633cc PM: suspend: Fix platform_suspend_prepare_noirq()
After commit ac9eafbe93 ("ACPI: PM: s2idle: Execute LPS0 _DSM
functions with suspended devices"), a NULL pointer may be dereferenced
if suspend-to-idle is attempted on a platform without "traditional"
suspend support due to invalid fall-through in
platform_suspend_prepare_noirq().

Fix that and while at it add missing braces in platform_resume_noirq().

Fixes: ac9eafbe93 ("ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-10 13:18:06 +02:00
Rafael J. Wysocki
ac9eafbe93 ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices
According to Section 3.5 of the "Intel Low Power S0 Idle" document [1],
Function 5 of the LPS0 _DSM is expected to be invoked when the system
configuration matches the criteria for entering the target low-power
state of the platform.  In particular, this means that all devices
should be suspended and in low-power states already when that function
is invoked.

This is not the case currently, however, because Function 5 of the
LPS0 _DSM is invoked by it before the "noirq" phase of device suspend,
which means that some devices may not have been put into low-power
states yet at that point.  That is a consequence of the previous
design of the suspend-to-idle flow that allowed the "noirq" phase of
device suspend and the "noirq" phase of device resume to be carried
out for multiple times while "suspended" (if any spurious wakeup
events were detected) and the point of the LPS0 _DSM Function 5
invocation was chosen so as to call it (and LPS0 _DSM Function 6
analogously) once per suspend-resume cycle (regardless of how many
times the "noirq" phases of device suspend and resume were carried
out while "suspended").

Now that the suspend-to-idle flow has been redesigned to carry out
the "noirq" phases of device suspend and resume once in each cycle,
the code can be reordered to follow the specification that it is
based on more closely.

For this purpose, add ->prepare_late and ->restore_early platform
callbacks for suspend-to-idle, to be executed, respectively, after
the "noirq" phase of suspending devices and before the "noirq"
phase of resuming them and make ACPI use them for the invocation
of LPS0 _DSM functions as appropriate.

While at it, move the LPS0 entry requirements check to be made
before invoking Functions 3 and 5 of the LPS0 _DSM (also once
per cycle) as follows from the specification [1].

Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf # [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
2019-08-08 11:26:01 +02:00
Kalesh Singh
2c8db5bef9 PM/sleep: Expose suspend stats in sysfs
Userspace can get suspend stats from the suspend stats debugfs node.
Since debugfs doesn't have stable ABI, expose suspend stats in
sysfs under /sys/power/suspend_stats.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-05 12:03:18 +02:00
Rafael J. Wysocki
8eb0fd3b55 PM: sleep: Integrate suspend-to-idle with generig suspend flow
After previous changes the suspend-to-idle code flow can be
integrated more tightly with the generic system suspend code flow
by making suspend_enter() call s2idle_loop() later and removing
the direct invocations of dpm_noirq_begin(),
dpm_noirq_suspend_devices(), dpm_noirq_end(), and
dpm_noirq_resume_devices() from the latter, so do that.

This change is not expected to alter functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-23 09:46:50 +02:00
Rafael J. Wysocki
56b9918490 PM: sleep: Simplify suspend-to-idle control flow
After commit 33e4f80ee6 ("ACPI / PM: Ignore spurious SCI wakeups
from suspend-to-idle") the "noirq" phases of device suspend and
resume may run for multiple times during suspend-to-idle, if there
are spurious system wakeup events while suspended.  However, this
is complicated and fragile and actually unnecessary.

The main reason for doing this is that on some systems the EC may
signal system wakeup events (power button events, for example) as
well as events that should not cause the system to resume (spurious
system wakeup events).  Thus, in order to determine whether or not
a given event signaled by the EC while suspended is a proper system
wakeup one, the EC GPE needs to be dispatched and to start with that
was achieved by allowing the ACPI SCI action handler to run, which
was only possible after calling resume_device_irqs().

However, dispatching the EC GPE this way turned out to take too much
time in some cases and some EC events might be missed due to that, so
commit 68e2201185 ("ACPI: EC: Dispatch the EC GPE directly on
s2idle wake") started to dispatch the EC GPE right after a wakeup
event has been detected, so in fact the full ACPI SCI action handler
doesn't need to run any more to deal with the wakeups coming from the
EC.

Use this observation to simplify the suspend-to-idle control flow
so that the "noirq" phases of device suspend and resume are each
run only once in every suspend-to-idle cycle, which is reported to
significantly reduce power drawn by some systems when suspended to
idle (by allowing them to reach a deep platform-wide low-power state
through the suspend-to-idle flow).  [What appears to happen is that
the "noirq" resume of devices after a spurious EC wakeup brings some
devices into a state in which they prevent the platform from reaching
the deep low-power state going forward, even after a subsequent
"noirq" suspend phase, and on some systems the EC triggers such
wakeups already when the "noirq" suspend of devices is running for
the first time in the given suspend/resume cycle, so the platform
cannot reach the deep low-power state at all.]

First, make acpi_s2idle_wake() use the acpi_ec_dispatch_gpe() return
value to determine whether or not the wakeup may have been triggered
by the EC (in which case the system wakeup is canceled and ACPI
events are processed in order to determine whether or not the event
is a proper system wakeup one) and use rearm_wake_irq() (introduced
by a previous change) in it to rearm the ACPI SCI for system wakeup
detection in case the system will remain suspended.

Second, drop acpi_s2idle_sync(), which is not needed any more, and
the corresponding global platform suspend-to-idle callback.

Next, drop the pm_wakeup_pending() check (which is an optimization
only) from __device_suspend_noirq() to prevent it from returning
errors on system wakeups occurring before the "noirq" phase of
device suspend is complete (as in the case of suspend-to-idle it is
not known whether or not these wakeups are suprious at that point),
in order to avoid having to carry out a "noirq" resume of devices
on a spurious system wakeup.

Finally, change the code flow in s2idle_loop() to (1) run the
"noirq" suspend of devices once before starting the loop, (2) check
for spurious EC wakeups (via the platform ->wake callback) for the
first time before calling s2idle_enter(), and (3) run the "noirq"
resume of devices once after leaving the loop.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-23 09:46:40 +02:00
Linus Torvalds
fb4da215ed pci-v5.3-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl0siFoUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzi9A//S4jRyyZrgUr88Az0GbgMhE4b3yqc
 uL7om/Sf+443gG6C+aKkZSM/IE9hrbyIKuYq7GGxDkzZ/HkucZo2yIuAHkPgG4ik
 QQYJ8fJsmMq1bUht87c1ZZwGP0++Deq/Ns2+VNy/WBYqKLulnV0DvEEaJgPs9C5D
 ppwccGdo6UghiujBTpE4ddUBjFjjURWqT6wSnMRDQ4EGwfUhG0MWwwHKI4hbBuaL
 N6refuggdYyUUX5FeUOHa6VF6uTnSSAQ75k+40n4nljdayqoumHLskst77o9q5ZI
 oXjdpwgmuEqYhfp03HEA4Xo/bBxiRj76NuTiEMKvPokxjpanwbLrdV0GhF0OIlM0
 rp1NOI1w+vppFrU+rc2gtq+7hYXFmvdhjS29hFLeD91PP36N5d29jW5NVFpm7GCm
 n4TMGAOsu8RB+bNua6ZbZVcDk2EnPgQeIcM0ZPoBtPK19Fg/rScdEU4u/aFE1Y0Q
 C+Ks7D1qCvFpHzl/xAg0oo9v/jFsWef3qnQWOzot964Zz4W4NSVvB9Ox6Vbfj6C4
 v331LJmlPxG8fxBNA3q28FrTxcG1NW6sgo3WY9VoSp/vc0aqaPKhm7sbraTt5IrI
 TwqA/WhnAHv90MQCGFcofANyYTkjPkKk2QBFK6b0suoAmVdwVWWELi1WaZ+HdvgQ
 JP7YpmC2cXcQBPk=
 =ZGxL
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration changes:

   - Evaluate PCI Boot Configuration _DSM to learn if firmware wants us
     to preserve its resource assignments (Benjamin Herrenschmidt)

   - Simplify resource distribution (Nicholas Johnson)

   - Decode 32 GT/s link speed (Gustavo Pimentel)

  Virtualization:

   - Fix incorrect caching of VF config space size (Alex Williamson)

   - Fix VF driver probing sysfs knobs (Alex Williamson)

  Peer-to-peer DMA:

   - Fix dma_virt_ops check (Logan Gunthorpe)

  Altera host bridge driver:

   - Allow building as module (Ley Foon Tan)

  Armada 8K host bridge driver:

   - add PHYs support (Miquel Raynal)

  DesignWare host bridge driver:

   - Export APIs to support removable loadable module (Vidya Sagar)

   - Enable Relaxed Ordering erratum workaround only on Tegra20 &
     Tegra30 (Vidya Sagar)

  Hyper-V host bridge driver:

   - Fix use-after-free in eject (Dexuan Cui)

  Mobiveil host bridge driver:

   - Clean up and fix many issues, including non-identify mapped
     windows, 64-bit windows, multi-MSI, class code, INTx clearing (Hou
     Zhiqiang)

  Qualcomm host bridge driver:

   - Use clk bulk API for 2.4.0 controllers (Bjorn Andersson)

   - Add QCS404 support (Bjorn Andersson)

   - Assert PERST for at least 100ms (Niklas Cassel)

  R-Car host bridge driver:

   - Add r8a774a1 DT support (Biju Das)

  Tegra host bridge driver:

   - Add support for Gen2, opportunistic UpdateFC and ACK (PCIe protocol
     details) AER, GPIO-based PERST# (Manikanta Maddireddy)

   - Fix many issues, including power-on failure cases, interrupt
     masking in suspend, UPHY settings, AFI dynamic clock gating,
     pending DLL transactions (Manikanta Maddireddy)

  Xilinx host bridge driver:

   - Fix NWL Multi-MSI programming (Bharat Kumar Gogada)

  Endpoint support:

   - Fix 64bit BAR support (Alan Mikhak)

   - Fix pcitest build issues (Alan Mikhak, Andy Shevchenko)

  Bug fixes:

   - Fix NVIDIA GPU multi-function power dependencies (Abhishek Sahu)

   - Fix NVIDIA GPU HDA enablement issue (Lukas Wunner)

   - Ignore lockdep for sysfs "remove" (Marek Vasut)

  Misc:

   - Convert docs to reST (Changbin Du, Mauro Carvalho Chehab)"

* tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (107 commits)
  PCI: Enable NVIDIA HDA controllers
  tools: PCI: Fix installation when `make tools/pci_install`
  PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
  PCI: Fix typos and whitespace errors
  PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr()
  PCI: mobiveil: Fix infinite-loop in the INTx handling function
  PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine
  PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window
  PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window
  PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup
  PCI: mobiveil: Clear the control fields before updating it
  PCI: mobiveil: Add configured inbound windows counter
  PCI: mobiveil: Fix the valid check for inbound and outbound windows
  PCI: mobiveil: Clean-up program_{ib/ob}_windows()
  PCI: mobiveil: Remove an unnecessary return value check
  PCI: mobiveil: Fix error return values
  PCI: mobiveil: Refactor the MEM/IO outbound window initialization
  PCI: mobiveil: Make some register updates more readable
  PCI: mobiveil: Reformat the code for readability
  dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional
  ...
2019-07-15 20:44:49 -07:00
Linus Torvalds
cf2d213e49 Power management updates for 5.3-rc1
- Improve the handling of shared ACPI power resources in the PCI
    bus type layer (Mika Westerberg).
 
  - Make the PCI layer take link delays required by the PCIe spec
    into account as appropriate and avoid polling devices in D3cold
    for PME (Mika Westerberg).
 
  - Fix some corner case issues in ACPI device power management and
    in the PCI bus type layer, optimiza and clean up the handling of
    runtime-suspended PCI devices during system-wide transitions to
    sleep states (Rafael Wysocki).
 
  - Rework hibernation handling in the ACPI core and the PCI bus type
    to resume runtime-suspended devices before hibernation (which
    allows some functional problems to be avoided) and fix some ACPI
    power management issues related to hiberation (Rafael Wysocki).
 
  - Extend the operating performance points (OPP) framework to support
    a wider range of devices (Rajendra Nayak, Stehpen Boyd).
 
  - Fix issues related to genpd_virt_devs and issues with platforms
    using the set_opp() callback in the OPP framework (Viresh Kumar,
    Dmitry Osipenko).
 
  - Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
 
  - Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
 
  - Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210,
    and armada-37xx cpufreq drivers (David Arcari, Florian Fainelli,
    Paweł Chmiel, YueHaibing).
 
  - Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
 
  - Fix minor issue in the ACPI system sleep support code and export
    one function from it (Lenny Szubowicz, Dexuan Cui).
 
  - Clean up assorted pieces of PM code and documentation (Kefeng Wang,
    Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
    Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
 
  - Update the pm-graph utility to v5.4 (Todd Brandt).
 
  - Fix and clean up the cpupower utility (Abhishek Goel, Nick Black).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl0jK18SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxgEAP/RbPe71Y9ufKu64L3EtgV6mS9iuLEhux
 /Ad9laLNeM1b0oceT3QxGk7xQCacnZcBlcaqXVWI4NRsn4RBZp1cYZngpgJ9DP6E
 ONr8hzyzDOMVReba3XJIF8H+WoTKjywMYtFutjdx6dRe2ZJLutqnuZ0JbH1YSSK7
 IxOt0mJVALf2M4Zz7F17d+n3yGE/4xAPBVbj/rBRcTEsGYlR/Hoxs7iF6EBau7fy
 R5drUH6XSrWk8adc+z7l3BTGqMMYj9deRSfAWB3wpM4YK7Fv7msX/amBoGINkdn6
 xP/ZcrHvhKKzE89MS8OUGP4rGVwq+7tu6mktnYL/tpKgutJqqx5LVvrLsGDSWr+W
 /aJExN8Eb4Jh98C6vog3XUJoqBxkVGbU8qoCBU3jlFsaznFEWjW9IKhBHs5CIaqz
 MXte6AsJ8lvFzxILjvx0m2206wNpRJRXYLX3a/BSBxa4OgOESjIpBTmbPfOwbxwj
 8z9hIVlDTTDtnF6BEyDQr1fjPi3Mxl7ibGnoqRrJm36VKBy9VZNNwG/0Y2oSvm6k
 Es8CiTWA3A/46dCZxGr18/9Vbfxn1Yvg9QZ1lCE5Fqij+0F2yRApbHZgUro+1rji
 6J8OWC5r5JccdKuGHh4RH/asMFhD0cAR/gUsRzS4dz/wz2jVIN1FstdD1aKN5GBy
 d0lchx/AKR5H
 =aBN3
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These update PCI and ACPI power management (improved handling of ACPI
  power resources and PCIe link delays, fixes related to corner cases,
  hibernation handling rework), fix and extend the operating performance
  points (OPP) framework, add new cpufreq drivers for Raspberry Pi and
  imx8m chips, update some other cpufreq drivers, clean up assorted
  pieces of PM code and documentation and update tools.

  Specifics:

   - Improve the handling of shared ACPI power resources in the PCI bus
     type layer (Mika Westerberg).

   - Make the PCI layer take link delays required by the PCIe spec into
     account as appropriate and avoid polling devices in D3cold for PME
     (Mika Westerberg).

   - Fix some corner case issues in ACPI device power management and in
     the PCI bus type layer, optimiza and clean up the handling of
     runtime-suspended PCI devices during system-wide transitions to
     sleep states (Rafael Wysocki).

   - Rework hibernation handling in the ACPI core and the PCI bus type
     to resume runtime-suspended devices before hibernation (which
     allows some functional problems to be avoided) and fix some ACPI
     power management issues related to hiberation (Rafael Wysocki).

   - Extend the operating performance points (OPP) framework to support
     a wider range of devices (Rajendra Nayak, Stehpen Boyd).

   - Fix issues related to genpd_virt_devs and issues with platforms
     using the set_opp() callback in the OPP framework (Viresh Kumar,
     Dmitry Osipenko).

   - Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).

   - Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).

   - Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210, and
     armada-37xx cpufreq drivers (David Arcari, Florian Fainelli, Paweł
     Chmiel, YueHaibing).

   - Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).

   - Fix minor issue in the ACPI system sleep support code and export
     one function from it (Lenny Szubowicz, Dexuan Cui).

   - Clean up assorted pieces of PM code and documentation (Kefeng Wang,
     Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
     Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).

   - Update the pm-graph utility to v5.4 (Todd Brandt).

   - Fix and clean up the cpupower utility (Abhishek Goel, Nick Black)"

* tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (57 commits)
  ACPI: PM: Make acpi_sleep_state_supported() non-static
  PM: sleep: Drop dev_pm_skip_next_resume_phases()
  ACPI: PM: Unexport acpi_device_get_power()
  Documentation: ABI: power: Add missing newline at end of file
  ACPI: PM: Drop unused function and function header
  ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
  ACPI: PM: Simplify and fix PM domain hibernation callbacks
  PCI: PM: Simplify bus-level hibernation callbacks
  PM: ACPI/PCI: Resume all devices during hibernation
  cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
  cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
  kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
  cpufreq: Don't skip frequency validation for has_target() drivers
  PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete()
  PCI / ACPI: Add _PR0 dependent devices
  ACPI / PM: Introduce concept of a _PR0 dependent device
  PCI / ACPI: Use cached ACPI device state to get PCI device power state
  ACPI: PM: Allow transitions to D0 to occur in special cases
  ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3cold
  cpufreq: Use has_target() instead of !setpolicy
  ...
2019-07-09 10:05:22 -07:00
Linus Torvalds
dad1c12ed8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Remove the unused per rq load array and all its infrastructure, by
   Dietmar Eggemann.

 - Add utilization clamping support by Patrick Bellasi. This is a
   refinement of the energy aware scheduling framework with support for
   boosting of interactive and capping of background workloads: to make
   sure critical GUI threads get maximum frequency ASAP, and to make
   sure background processing doesn't unnecessarily move to cpufreq
   governor to higher frequencies and less energy efficient CPU modes.

 - Add the bare minimum of tracepoints required for LISA EAS regression
   testing, by Qais Yousef - which allows automated testing of various
   power management features, including energy aware scheduling.

 - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
   kernel used to modify the scheduler's CPU affinity logic such as
   migrate_disable() - introduce the task->cpus_ptr value instead of
   taking the address of &task->cpus_allowed directly - by Sebastian
   Andrzej Siewior.

 - Misc optimizations, fixes, cleanups and small enhancements - see the
   Git log for details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  sched/uclamp: Add uclamp support to energy_compute()
  sched/uclamp: Add uclamp_util_with()
  sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
  sched/uclamp: Set default clamps for RT tasks
  sched/uclamp: Reset uclamp values on RESET_ON_FORK
  sched/uclamp: Extend sched_setattr() to support utilization clamping
  sched/core: Allow sched_setattr() to use the current policy
  sched/uclamp: Add system default clamps
  sched/uclamp: Enforce last task's UCLAMP_MAX
  sched/uclamp: Add bucket local max tracking
  sched/uclamp: Add CPU's clamp buckets refcounting
  sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
  sched/debug: Export the newly added tracepoints
  sched/debug: Add sched_overutilized tracepoint
  sched/debug: Add new tracepoint to track PELT at se level
  sched/debug: Add new tracepoints to track PELT at rq level
  sched/debug: Add a new sched_trace_*() helper functions
  sched/autogroup: Make autogroup_path() always available
  sched/wait: Deduplicate code with do-while
  sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
  ...
2019-07-08 16:39:53 -07:00
Rafael J. Wysocki
3dbeb44854 Merge branch 'pm-sleep'
* pm-sleep:
  PM: sleep: Drop dev_pm_skip_next_resume_phases()
  ACPI: PM: Drop unused function and function header
  ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
  ACPI: PM: Simplify and fix PM domain hibernation callbacks
  PCI: PM: Simplify bus-level hibernation callbacks
  PM: ACPI/PCI: Resume all devices during hibernation
  kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
  PM: sleep: Update struct wakeup_source documentation
  drivers: base: power: remove wakeup_sources_stats_dentry variable
  PM: suspend: Rename pm_suspend_via_s2idle()
  PM: sleep: Show how long dpm_suspend_start() and dpm_suspend_end() take
  PM: hibernate: powerpc: Expose pfn_is_nosave() prototype
2019-07-08 10:51:25 +02:00
Fuqian Huang
2f02a7ecd5 kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
Use zeroing allocator instead of using allocator
followed with memset with 0

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-28 10:20:39 +02:00
Rafael J. Wysocki
471a739a47 PCI: PM: Avoid skipping bus-level PM on platforms without ACPI
There are platforms that do not call pm_set_suspend_via_firmware(),
so pm_suspend_via_firmware() returns 'false' on them, but the power
states of PCI devices (PCIe ports in particular) are changed as a
result of powering down core platform components during system-wide
suspend.  Thus the pm_suspend_via_firmware() checks in
pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by
commit 3e26c5feed ("PCI: PM: Skip devices in D0 for suspend-to-
idle") are not sufficient to determine that devices left in D0
during suspend will remain in D0 during resume and so the bus-level
power management can be skipped for them.

For this reason, introduce a new global suspend flag,
PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only
and replace the pm_suspend_via_firmware() checks mentioned above
with checks against this flag.

Fixes: 3e26c5feed ("PCI: PM: Skip devices in D0 for suspend-to-idle")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-26 23:51:56 +02:00
Vincent Guittot
8ec59c0f5f sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
unused since commit:

  765d0af19f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")

Remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: linux@armlinux.org.uk
Cc: quentin.perret@arm.com
Cc: rafael@kernel.org
Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:23:39 +02:00
Thomas Gleixner
8092f73c51 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248
Based on 1 normalized pattern(s):

  this file is released under the gpl v2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 3 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204655.103854853@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:08 +02:00
Rafael J. Wysocki
0b385a0c3b PM: suspend: Rename pm_suspend_via_s2idle()
The name of pm_suspend_via_s2idle() is confusing, as it doesn't
reflect the purpose of the function precisely enough and it is
very similar to pm_suspend_via_firmware(), which has a different
purpose, so rename it as pm_suspend_default_s2idle() and update
its only caller, i8042_register_ports(), accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-06-19 11:41:26 +02:00
Mauro Carvalho Chehab
151f4e2bdc docs: power: convert docs to ReST and rename to *.rst
Convert the PM documents to ReST, in order to allow them to
build with Sphinx.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
2019-06-14 16:08:36 -05:00
Mathieu Malaterre
1ec0cd8286 PM: hibernate: powerpc: Expose pfn_is_nosave() prototype
The declaration for pfn_is_nosave is only available in
kernel/power/power.h. Since this function can be override in arch,
expose it globally. Having a prototype will make sure to avoid warning
(sometime treated as error with W=1) such as:

  arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]

This moves the declaration into a globally visible header file and add
missing include to avoid a warning on powerpc.

Also remove the duplicated prototypes since not required anymore.

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-14 10:48:56 +02:00
Linus Torvalds
9331b6740f SPDX update for 5.2-rc4
Another round of SPDX header file fixes for 5.2-rc4
 
 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
 added, based on the text in the files.  We are slowly chipping away at
 the 700+ different ways people tried to write the license text.  All of
 these were reviewed on the spdx mailing list by a number of different
 people.
 
 We now have over 60% of the kernel files covered with SPDX tags:
 	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
 	Files checked:            64533
 	Files with SPDX:          40392
 	Files with errors:            0
 
 I think the majority of the "easy" fixups are now done, it's now the
 start of the longer-tail of crazy variants to wade through.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
 sxVEiFZo8ZU9v1IoRb1I
 =qU++
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
 "Another round of SPDX header file fixes for 5.2-rc4

  These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
  added, based on the text in the files. We are slowly chipping away at
  the 700+ different ways people tried to write the license text. All of
  these were reviewed on the spdx mailing list by a number of different
  people.

  We now have over 60% of the kernel files covered with SPDX tags:
	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
	Files checked:            64533
	Files with SPDX:          40392
	Files with errors:            0

  I think the majority of the "easy" fixups are now done, it's now the
  start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
  ...
2019-06-08 12:52:42 -07:00
Rafael J. Wysocki
a964d23c94 Merge branch 'pm-x86'
* pm-x86:
  x86/power: Fix 'nosmt' vs hibernation triple fault during resume
  x86: intel_epb: Do not build when CONFIG_PM is unset
2019-06-07 10:48:57 +02:00
Thomas Gleixner
55716d2643 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428
Based on 1 normalized pattern(s):

  this file is released under the gplv2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 68 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:16 +02:00
Jiri Kosina
ec527c3180 x86/power: Fix 'nosmt' vs hibernation triple fault during resume
As explained in

	0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")

we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.

That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.

This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.

That results in triple fault shortly after cr3 is switched, and machine
reboots.

Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.

Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.

Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-03 12:02:03 +02:00
Rafael J. Wysocki
a613734761 PM: sleep: Add kerneldoc comments to some functions
Add kerneldoc comments to pm_suspend_via_firmware(),
pm_resume_via_firmware() and pm_suspend_via_s2idle()
to explain what they do.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-03 10:44:21 +02:00
Rafael J. Wysocki
bb1869012d ACPI: PM: Call pm_set_suspend_via_firmware() during hibernation
On systems with ACPI platform firmware the last stage of hibernation
is analogous to system suspend to S3 (suspend-to-RAM), so it should
be handled analogously.  In particular, pm_suspend_via_firmware()
should return 'true' in that stage to let the callers of it know that
control will be passed to the platform firmware going forward, so
pm_set_suspend_via_firmware() needs to be called then in analogy with
acpi_suspend_begin().

However, the platform hibernation ->begin() callback is invoked
during the "freeze" transition (before creating a snapshot image of
system memory) as well as during the "hibernate" transition which is
the last stage of it and pm_set_suspend_via_firmware() should be
invoked by that callback in the latter stage only.

In order to implement that redefine the hibernation ->begin()
callback to take a pm_message_t argument to indicate which stage
of hibernation is taking place and rework acpi_hibernation_begin()
and acpi_hibernation_begin_old() to take it into account as needed.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-27 10:51:45 +02:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Linus Torvalds
8f5e823f91 Power management updates for 5.2-rc1
- Fix the handling of Performance and Energy Bias Hint (EPB) on
    Intel processors and expose it to user space via sysfs to avoid
    having to access it through the generic MSR I/F (Rafael Wysocki).
 
  - Improve the handling of global turbo changes made by the platform
    firmware in the intel_pstate driver (Rafael Wysocki).
 
  - Convert some slow-path static_cpu_has() callers to boot_cpu_has()
    in cpufreq (Borislav Petkov).
 
  - Fix the frequency calculation loop in the armada-37xx cpufreq
    driver (Gregory CLEMENT).
 
  - Fix possible object reference leaks in multuple cpufreq drivers
    (Wen Yang).
 
  - Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
 
  - Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
    Kumar).
 
  - Add support for lx2160a and ls1028a to the qoriq cpufreq driver
    (Vabhav Sharma, Yuantian Tang).
 
  - Fix kobject memory leak in the cpufreq core (Viresh Kumar).
 
  - Simplify the IOwait boosting in the schedutil cpufreq governor
    and rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
 
  - Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
 
  - Improve the cpufreq documentation, add SPDX license tags to
    some PM documentation files and unify copyright notices in
    them (Rafael Wysocki).
 
  - Add support for "CPU" domains to the generic power domains (genpd)
    framework and provide low-level PSCI firmware support for that
    feature (Ulf Hansson).
 
  - Rearrange the PSCI firmware support code and add support for
    SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
 
  - Improve genpd support for devices in multiple power domains (Ulf
    Hansson).
 
  - Unify target residency for the AFTR and coupled AFTR states in the
    exynos cpuidle driver (Marek Szyprowski).
 
  - Introduce new helper routine in the operating performance points
    (OPP) framework (Andrew-sh.Cheng).
 
  - Add support for passing on-die termination (ODT) and auto power
    down parameters from the kernel to Trusted Firmware-A (TF-A) to
    the rk3399_dmc devfreq driver (Enric Balletbo i Serra).
 
  - Add tracing to devfreq (Lukasz Luba).
 
  - Make the exynos-bus devfreq driver suspend all devices on system
    shutdown (Marek Szyprowski).
 
  - Fix a few minor issues in the devfreq subsystem and clean it up
    somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
    Saravana Kannan, Yangtao Li).
 
  - Improve system wakeup diagnostics (Stephen Boyd).
 
  - Rework filesystem sync messages emitted during system suspend and
    hibernation (Harry Pan).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzQEwUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxxXwP/jrxikIXdCOV3CJVioV0NetyebwlOqYp
 UsIA7lQBfZ/DY6dHw/oKuAT9LP01vcFg6XGe83Alkta9qczR5KZ/MYHFNSZXjXjL
 kEvIMBCS/oykaBuW+Xn9am8Ke3Yq/rBSTKWVom3vzSQY0qvZ9GBwPDrzw+k63Zhz
 P3afB4ThyY0e9ftgw4HvSSNm13Kn0ItUIQOdaLatXMMcPqP5aAdnUma5Ibinbtpp
 rpTHuHKYx7MSjaCg6wl3kKTJeWbQP4wYO2ISZqH9zEwQgdvSHeFAvfPKTegUkmw9
 uUsQnPD1JvdglOKovr2muehD1Ur+zsjKDf2OKERkWsWXHPyWzA/AqaVv1mkkU++b
 KaWaJ9pE86kGlJ3EXwRbGfV0dM5rrl+dUUQW6nPI1XJnIOFlK61RzwAbqI26F0Mz
 AlKxY4jyPLcM3SpQz9iILqyzHQqB67rm29XvId/9scoGGgoqEI4S+v6LYZqI3Vx6
 aeSRu+Yof7p5w4Kg5fODX+HzrtMnMrPmLUTXhbExfsYZMi7hXURcN6s+tMpH0ckM
 4yiIpnNGCKUSV4vxHBm8XJdAuUnR4Vcz++yFslszgDVVvw5tkvF7SYeHZ6HqcQVm
 af9HdWzx3qajs/oyBwdRBedZYDnP1joC5donBI2ofLeF33NA7TEiPX8Zebw8XLkv
 fNikssA7PGdv
 =nY9p
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
  handling and expose it to user space via sysfs, fix and clean up
  several cpufreq drivers, add support for two new chips to the qoriq
  cpufreq driver, fix, simplify and clean up the cpufreq core and the
  schedutil governor, add support for "CPU" domains to the generic power
  domains (genpd) framework and provide low-level PSCI firmware support
  for that feature, fix the exynos cpuidle driver and fix a couple of
  issues in the devfreq subsystem and clean it up.

  Specifics:

   - Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
     processors and expose it to user space via sysfs to avoid having to
     access it through the generic MSR I/F (Rafael Wysocki).

   - Improve the handling of global turbo changes made by the platform
     firmware in the intel_pstate driver (Rafael Wysocki).

   - Convert some slow-path static_cpu_has() callers to boot_cpu_has()
     in cpufreq (Borislav Petkov).

   - Fix the frequency calculation loop in the armada-37xx cpufreq
     driver (Gregory CLEMENT).

   - Fix possible object reference leaks in multuple cpufreq drivers
     (Wen Yang).

   - Fix kerneldoc comment in the centrino cpufreq driver (dongjian).

   - Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
     Kumar).

   - Add support for lx2160a and ls1028a to the qoriq cpufreq driver
     (Vabhav Sharma, Yuantian Tang).

   - Fix kobject memory leak in the cpufreq core (Viresh Kumar).

   - Simplify the IOwait boosting in the schedutil cpufreq governor and
     rework the TSC cpufreq notifier on x86 (Rafael Wysocki).

   - Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).

   - Improve the cpufreq documentation, add SPDX license tags to some PM
     documentation files and unify copyright notices in them (Rafael
     Wysocki).

   - Add support for "CPU" domains to the generic power domains (genpd)
     framework and provide low-level PSCI firmware support for that
     feature (Ulf Hansson).

   - Rearrange the PSCI firmware support code and add support for
     SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).

   - Improve genpd support for devices in multiple power domains (Ulf
     Hansson).

   - Unify target residency for the AFTR and coupled AFTR states in the
     exynos cpuidle driver (Marek Szyprowski).

   - Introduce new helper routine in the operating performance points
     (OPP) framework (Andrew-sh.Cheng).

   - Add support for passing on-die termination (ODT) and auto power
     down parameters from the kernel to Trusted Firmware-A (TF-A) to the
     rk3399_dmc devfreq driver (Enric Balletbo i Serra).

   - Add tracing to devfreq (Lukasz Luba).

   - Make the exynos-bus devfreq driver suspend all devices on system
     shutdown (Marek Szyprowski).

   - Fix a few minor issues in the devfreq subsystem and clean it up
     somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
     Saravana Kannan, Yangtao Li).

   - Improve system wakeup diagnostics (Stephen Boyd).

   - Rework filesystem sync messages emitted during system suspend and
     hibernation (Harry Pan)"

* tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
  cpufreq: Fix kobject memleak
  cpufreq: armada-37xx: fix frequency calculation for opp
  cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
  cpufreq: qoriq: add support for lx2160a
  x86: tsc: Rework time_cpufreq_notifier()
  PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
  PM / Domains: Search for the CPU device outside the genpd lock
  PM / Domains: Drop unused in-parameter to some genpd functions
  PM / Domains: Use the base device for driver_deferred_probe_check_state()
  cpufreq: qoriq: Add ls1028a chip support
  PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
  PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
  PM / Domains: Don't kfree() the virtual device in the error path
  cpufreq: Move ->get callback check outside of __cpufreq_get()
  PM / Domains: remove unnecessary unlikely()
  cpufreq: Remove needless bios_limit check in show_bios_limit()
  drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
  firmware/psci: add support for SYSTEM_RESET2
  PM / devfreq: add tracing for scheduling work
  trace: events: add devfreq trace event file
  ...
2019-05-06 19:40:31 -07:00
Linus Torvalds
0bc40e549a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The changes in here are:

   - text_poke() fixes and an extensive set of executability lockdowns,
     to (hopefully) eliminate the last residual circumstances under
     which we are using W|X mappings even temporarily on x86 kernels.
     This required a broad range of surgery in text patching facilities,
     module loading, trampoline handling and other bits.

   - tweak page fault messages to be more informative and more
     structured.

   - remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
     default.

   - reduce KASLR granularity on 5-level paging kernels from 512 GB to
     1 GB.

   - misc other changes and updates"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm: Initialize PGD cache during mm initialization
  x86/alternatives: Add comment about module removal races
  x86/kprobes: Use vmalloc special flag
  x86/ftrace: Use vmalloc special flag
  bpf: Use vmalloc special flag
  modules: Use vmalloc special flag
  mm/vmalloc: Add flag for freeing of special permsissions
  mm/hibernation: Make hibernation handle unmapped pages
  x86/mm/cpa: Add set_direct_map_*() functions
  x86/alternatives: Remove the return value of text_poke_*()
  x86/jump-label: Remove support for custom text poker
  x86/modules: Avoid breaking W^X while loading modules
  x86/kprobes: Set instruction page as executable
  x86/ftrace: Set trampoline pages as executable
  x86/kgdb: Avoid redundant comparison of patched code
  x86/alternatives: Use temporary mm for text poking
  x86/alternatives: Initialize temporary mm for patching
  fork: Provide a function for copying init_mm
  uprobes: Initialize uprobes earlier
  x86/mm: Save debug registers when loading a temporary mm
  ...
2019-05-06 16:13:31 -07:00
Nicholas Piggin
9ca12ac04b kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze
This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
opt-in to allowing suspend to occur on one of the housekeeping CPUs
rather than hardcoded CPU0.

This will allow CPU0 to be a nohz_full CPU with a later change.

It may be possible for platforms with hardware/firmware restrictions
on suspend/wake effectively support this by handing off the final
stage to CPU0 when kernel housekeeping is no longer required. Another
option is to make housekeeping / nohz_full mask dynamic at runtime,
but the complexity could not be justified at this time.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-4-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03 19:42:58 +02:00
Nicholas Piggin
2f1a6fbbef power/suspend: Add function to disable secondaries for suspend
This adds a function to disable secondary CPUs for suspend that are
not necessarily non-zero / non-boot CPUs. Platforms will be able to
use this to suspend using non-zero CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03 19:42:41 +02:00
Rick Edgecombe
d633269286 mm/hibernation: Make hibernation handle unmapped pages
Make hibernate handle unmapped pages on the direct map when
CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages
to invalid configurations, so now hibernate should check if the pages have
valid mappings and handle if they are unmapped when doing a hibernate
save operation.

Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y
was configured. It does not appear to have a big hibernating performance
impact. The speed of the saving operation before this change was measured
as 819.02 MB/s, and after was measured at 813.32 MB/s.

Before:
[    4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s)

After:
[    4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s)

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-16-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 12:37:57 +02:00
Harry Pan
c64546b17b PM / sleep: Measure the time of filesystems syncing
Measure the filesystems sync time during system sleep more precisely.

Among other things, this allows the pr_cont() to be dropped from
ksys_sync_helper() and makes automatic system suspend and hibernation
profiling somewhat more straightforward.

Signed-off-by: Harry Pan <harry.pan@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-02 10:53:19 +02:00
Harry Pan
b5dee3130b PM / sleep: Refactor filesystems sync to reduce duplication
Create a common helper to sync filesystems for system suspend and
hibernation.

Signed-off-by: Harry Pan <harry.pan@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-02 10:53:19 +02:00
Mike Rapoport
8a7f97b902 treewide: add checks for the return value of memblock_alloc*()
Add check for the return value of memblock_alloc*() functions and call
panic() in case of error.  The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

  @@
  expression ptr, size, align;
  @@
  ptr = memblock_alloc(size, align);
  + if (!ptr)
  + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);

[anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
  Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
[rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
  Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
[rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
  Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[akpm@linux-foundation.org: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Linus Torvalds
ef8006846a Power management updates for 5.1-rc1
- Update the PM-runtime framework to use ktime instead of
    jiffies for accounting (Thara Gopinath, Vincent Guittot).
 
  - Optimize the autosuspend code in the PM-runtime framework
    somewhat (Ladislav Michl).
 
  - Add a PM core flag to mark devices that don't need any form of
    power management (Sudeep Holla).
 
  - Introduce driver API documentation for cpuidle and add a new
    cpuidle governor for tickless systems (Rafael Wysocki).
 
  - Add Jacobsville support to the intel_idle driver (Zhang Rui).
 
  - Clean up a cpuidle core header file and the cpuidle-dt and ACPI
    processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam).
 
  - Add new cpufreq driver for Armada 8K (Gregory Clement).
 
  - Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar,
    Amit Kucheria).
 
  - Add support for light-weight tear-down and bring-up of CPUs to the
    cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar).
 
  - Fix cpu_cooling Kconfig dependencies, add support for CPU cooling
    auto-registration to the cpufreq core and use it in multiple
    cpufreq drivers (Amit Kucheria).
 
  - Fix some minor issues and do some cleanups in the davinci,
    e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers
    (Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel,
    Taniya Das, Viresh Kumar).
 
  - Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng
    Wang).
 
  - Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu,
    Rafael Wysocki).
 
  - Clean up multiple cpufreq drivers (Yangtao Li).
 
  - Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas
    Bulwahn).
 
  - Add support for exposing the Energy Model via debugfs and make
    multiple cpufreq drivers register an Energy Model to support
    energy-aware scheduling (Quentin Perret, Dietmar Eggemann,
    Matthias Kaehlcke).
 
  - Add Ice Lake mobile and Jacobsville support to the Intel RAPL
    power-capping driver (Gayatri Kammela, Zhang Rui).
 
  - Add a power estimation helper to the operating performance points
    (OPP) framework and clean up a core function in it (Quentin Perret,
    Viresh Kumar).
 
  - Make minor improvements in the generic power domains (genpd), OPP
    and system suspend frameworks and in the PM core (Aditya Pakki,
    Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJcfSGlAAoJEILEb/54YlRxikwP/1rQ9+HqDmDUvO2QeYREGO/m
 R4kK+iUQW7O4ZJzsSvoGyuKCl7c2ANPlJWmbsEZKbevpKZ4XuUcv/CJDqKD1izV7
 hfsQyum34ePSCUEMf6CpMAGAkdmK//NVysHiLXZ4j1hhzi6gA6Cm50qyNZ8xX6kF
 Ri6zYG5x7nhn/o/l569FDe+K5W/LDDaZUmvr858pPsrZZR5c4p3ylq+HBrZt0FPQ
 70D+u7RcT5v3DQLTghNrgHHiOJ0/DQM43I7aZvkKM3JA8BCDou/Nvq+gH0C0YUP0
 QE+oFK9C8CBPEz9N9cSMTb0+S78GQNB0GntJPDN3QQFCHRe6EYKUtu6CvllIE1v9
 5pFfagXGVi9UmShu80v+qGGUILVK1ZJ5fjSyxx4UcneTsarNJZg7Y7d72mrX+0zi
 J3KodcqQi295jNq9P55K/9XtAiRdpRR6bQzXBtrprpw8PA94yqBHPpxbD32Wl05/
 U2+ss/SNyMAzhsP9kqzxSxPBlTFek/ArxZm0Uk4kHt75gkl09CG64r+6OG8gLtwD
 Skkr02AeYvx6fx0kFnKIS4sc2c2/8xW3FUtHlv+TDPvuzCEaL0ooqsWgt7rcwlmg
 Xz5ufXbEIiVSlLlH/YGZxbgy+WfIzYA5WMpYrA1Givn8s5jI9Sm+ROD2qhOKA2n4
 aekEDkum/bxVVeykZaXy
 =TSKG
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are PM-runtime framework changes to use ktime instead of jiffies
  for accounting, new PM core flag to mark devices that don't need any
  form of power management, cpuidle updates including driver API
  documentation and a new governor, cpufreq updates including a new
  driver for Armada 8K, thermal cleanups and more, some energy-aware
  scheduling (EAS) enabling changes, new chips support in the intel_idle
  and RAPL drivers and assorted cleanups in some other places.

  Specifics:

   - Update the PM-runtime framework to use ktime instead of jiffies for
     accounting (Thara Gopinath, Vincent Guittot)

   - Optimize the autosuspend code in the PM-runtime framework somewhat
     (Ladislav Michl)

   - Add a PM core flag to mark devices that don't need any form of
     power management (Sudeep Holla)

   - Introduce driver API documentation for cpuidle and add a new
     cpuidle governor for tickless systems (Rafael Wysocki)

   - Add Jacobsville support to the intel_idle driver (Zhang Rui)

   - Clean up a cpuidle core header file and the cpuidle-dt and ACPI
     processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam)

   - Add new cpufreq driver for Armada 8K (Gregory Clement)

   - Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar, Amit
     Kucheria)

   - Add support for light-weight tear-down and bring-up of CPUs to the
     cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar)

   - Fix cpu_cooling Kconfig dependencies, add support for CPU cooling
     auto-registration to the cpufreq core and use it in multiple
     cpufreq drivers (Amit Kucheria)

   - Fix some minor issues and do some cleanups in the davinci,
     e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers
     (Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel,
     Taniya Das, Viresh Kumar)

   - Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng
     Wang)

   - Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu,
     Rafael Wysocki)

   - Clean up multiple cpufreq drivers (Yangtao Li)

   - Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas
     Bulwahn)

   - Add support for exposing the Energy Model via debugfs and make
     multiple cpufreq drivers register an Energy Model to support
     energy-aware scheduling (Quentin Perret, Dietmar Eggemann, Matthias
     Kaehlcke)

   - Add Ice Lake mobile and Jacobsville support to the Intel RAPL
     power-capping driver (Gayatri Kammela, Zhang Rui)

   - Add a power estimation helper to the operating performance points
     (OPP) framework and clean up a core function in it (Quentin Perret,
     Viresh Kumar)

   - Make minor improvements in the generic power domains (genpd), OPP
     and system suspend frameworks and in the PM core (Aditya Pakki,
     Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li)"

* tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (80 commits)
  cpufreq: kryo: Release OPP tables on module removal
  cpufreq: ap806: add missing of_node_put after of_device_is_available
  cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
  cpufreq: Pass updated policy to driver ->setpolicy() callback
  cpufreq: Fix two debug messages in cpufreq_set_policy()
  cpufreq: Reorder and simplify cpufreq_update_policy()
  cpufreq: Add kerneldoc comments for two core functions
  PM / core: Add support to skip power management in device/driver model
  cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
  cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
  cpufreq: intel_pstate: Avoid redundant initialization of local vars
  powercap/intel_rapl: add Ice Lake mobile
  ACPI / processor: Set P_LVL{2,3} idle state descriptions
  cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
  ACPI / CPPC: Add a helper to get desired performance
  cpufreq: davinci: move configuration to include/linux/platform_data
  cpufreq: speedstep: convert BUG() to BUG_ON()
  cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
  cpufreq: longhaul: remove unneeded semicolon
  cpufreq: pcc-cpufreq: remove unneeded semicolon
  ..
2019-03-06 12:59:46 -08:00
David Hildenbrand
abd02ac616 PM/Hibernate: exclude all PageOffline() pages
The content of pages that are marked PG_offline is not of interest (e.g.
inflated by a balloon driver), let's skip these pages.

In saveable_highmem_page(), move the PageReserved() check to a new check
along with the PageOffline() check to separate it from the swsusp
checks.

[david@redhat.com: v2]
  Link: http://lkml.kernel.org/r/20181122100627.5189-9-david@redhat.com
Link: http://lkml.kernel.org/r/20181119101616.8901-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:14 -08:00
David Hildenbrand
5b56db3721 PM/Hibernate: use pfn_to_online_page()
Let's use pfn_to_online_page() instead of pfn_to_page() when checking
for saveable pages to not save/restore offline memory sections.

Link: http://lkml.kernel.org/r/20181119101616.8901-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:14 -08:00
Rafael J. Wysocki
c3739c50ef Merge branches 'pm-core', 'pm-sleep', 'pm-qos', 'pm-domains' and 'pm-em'
* pm-core:
  PM / core: Add support to skip power management in device/driver model
  PM / suspend: Print debug messages for device using direct-complete
  PM-runtime: update time accounting only when enabled
  PM-runtime: Switch accounting over to ktime_get_mono_fast_ns()
  PM-runtime: Optimize pm_runtime_autosuspend_expiration()
  PM-runtime: Replace jiffies-based accounting with ktime-based accounting
  PM-runtime: update accounting_timestamp on enable
  PM: clock_ops: fix missing clk_prepare() return value check
  drm/i915: Move on the new pm runtime interface
  PM-runtime: Add new interface to get accounted time

* pm-sleep:
  PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event()

* pm-qos:
  PM: QoS: no need to check return value of debugfs_create functions

* pm-domains:
  PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name()
  PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name()
  PM: domains: no need to check return value of debugfs_create functions

* pm-em:
  PM / EM: Expose the Energy Model in debugfs
2019-03-04 11:18:28 +01:00
Quentin Perret
9cac42d064 PM / EM: Expose the Energy Model in debugfs
The recently introduced Energy Model (EM) framework manages power cost
tables of CPUs. These tables are currently only visible from kernel
space. However, in order to debug the behaviour of subsystems that use
the EM (EAS for example), it is often required to know what the power
costs are from userspace.

For this reason, introduce under /sys/kernel/debug/energy_model a set of
directories representing the performance domains of the system. Each
performance domain contains a set of sub-directories representing the
different capacity states (cs) and their attributes, as well as a file
exposing the related CPUs.

The resulting hierarchy is as follows on Arm juno r0 for example:

    /sys/kernel/debug/energy_model
    ├── pd0
    │   ├── cpus
    │   ├── cs:450000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:575000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:700000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:775000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   └── cs:850000
    │       ├── cost
    │       ├── frequency
    │       └── power
    └── pd1
        ├── cpus
        ├── cs:1100000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:450000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:625000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:800000
        │   ├── cost
        │   ├── frequency
        │   └── power
        └── cs:950000
            ├── cost
            ├── frequency
            └── power

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-23 23:07:57 +01:00
Greg Kroah-Hartman
659dc4562c PM: QoS: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-22 23:12:12 +01:00
Arun KS
ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are made static inline function.

Main motivation was that managed_page_count_lock handling was complicating
things.  It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Linus Torvalds
17bf423a1f Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Introduce "Energy Aware Scheduling" - by Quentin Perret.

     This is a coherent topology description of CPUs in cooperation with
     the PM subsystem, with the goal to schedule more energy-efficiently
     on asymetric SMP platform - such as waking up tasks to the more
     energy-efficient CPUs first, as long as the system isn't
     oversubscribed.

     For details of the design, see:

        https://lore.kernel.org/lkml/20180724122521.22109-1-quentin.perret@arm.com/

   - Misc cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  sched/fair: Select an energy-efficient CPU on task wake-up
  sched/fair: Introduce an energy estimation helper function
  sched/fair: Add over-utilization/tipping point indicator
  sched/fair: Clean-up update_sg_lb_stats parameters
  sched/toplogy: Introduce the 'sched_energy_present' static key
  sched/topology: Make Energy Aware Scheduling depend on schedutil
  sched/topology: Disable EAS on inappropriate platforms
  sched/topology: Add lowest CPU asymmetry sched_domain level pointer
  sched/topology: Reference the Energy Model of CPUs when available
  PM: Introduce an Energy Model management framework
  sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
  sched/topology: Relocate arch_scale_cpu_capacity() to the internal header
  sched/core: Remove unnecessary unlikely() in push_*_task()
  sched/topology: Remove the ::smt_gain field from 'struct sched_domain'
  sched: Fix various typos in comments
  sched/core: Clean up the #ifdef block in add_nr_running()
  sched/fair: Make some variables static
  sched/core: Create task_has_idle_policy() helper
  sched/fair: Add lsub_positive() and use it consistently
  sched/fair: Mask UTIL_AVG_UNCHANGED usages
  ...
2018-12-26 14:56:10 -08:00
Rafael J. Wysocki
442a5d000a Merge branches 'pm-core', 'pm-qos', 'pm-domains' and 'pm-sleep'
* pm-core:
  PM-runtime: Switch autosuspend over to using hrtimers

* pm-qos:
  PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macro

* pm-domains:
  PM / Domains: remove define_genpd_open_function() and define_genpd_debugfs_fops()

* pm-sleep:
  PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
2018-12-21 10:06:44 +01:00
Yangtao Li
943a10f852 PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-12 23:23:58 +01:00
Quentin Perret
27871f7a8a PM: Introduce an Energy Model management framework
Several subsystems in the kernel (task scheduler and/or thermal at the
time of writing) can benefit from knowing about the energy consumed by
CPUs. Yet, this information can come from different sources (DT or
firmware for example), in different formats, hence making it hard to
exploit without a standard API.

As an attempt to address this, introduce a centralized Energy Model
(EM) management framework which aggregates the power values provided
by drivers into a table for each performance domain in the system. The
power cost tables are made available to interested clients (e.g. task
scheduler or thermal) via platform-agnostic APIs. The overall design
is represented by the diagram below (focused on Arm-related drivers as
an example, but applicable to any architecture):

     +---------------+  +-----------------+  +-------------+
     | Thermal (IPA) |  | Scheduler (EAS) |  |    Other    |
     +---------------+  +-----------------+  +-------------+
             |                   | em_pd_energy()   |
             |                   | em_cpu_get()     |
             +-----------+       |         +--------+
                         |       |         |
                         v       v         v
                      +---------------------+
                      |                     |
                      |    Energy Model     |
                      |                     |
                      |     Framework       |
                      |                     |
                      +---------------------+
                         ^       ^       ^
                         |       |       | em_register_perf_domain()
              +----------+       |       +---------+
              |                  |                 |
      +---------------+  +---------------+  +--------------+
      |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
      +---------------+  +---------------+  +--------------+
              ^                  ^                 ^
              |                  |                 |
      +--------------+   +---------------+  +--------------+
      | Device Tree  |   |   Firmware    |  |      ?       |
      +--------------+   +---------------+  +--------------+

Drivers (typically, but not limited to, CPUFreq drivers) can register
data in the EM framework using the em_register_perf_domain() API. The
calling driver must provide a callback function with a standardized
signature that will be used by the EM framework to build the power
cost tables of the performance domain. This design should offer a lot of
flexibility to calling drivers which are free of reading information
from any location and to use any technique to compute power costs.
Moreover, the capacity states registered by drivers in the EM framework
are not required to match real performance states of the target. This
is particularly important on targets where the performance states are
not known by the OS.

The power cost coefficients managed by the EM framework are specified in
milli-watts. Although the two potential users of those coefficients (IPA
and EAS) only need relative correctness, IPA specifically needs to
compare the power of CPUs with the power of other components (GPUs, for
example), which are still expressed in absolute terms in their
respective subsystems. Hence, specifying the power of CPUs in
milli-watts should help transitioning IPA to using the EM framework
without introducing new problems by keeping units comparable across
sub-systems.
On the longer term, the EM of other devices than CPUs could also be
managed by the EM framework, which would enable to remove the absolute
unit. However, this is not absolutely required as a first step, so this
extension of the EM framework is left for later.

On the client side, the EM framework offers APIs to access the power
cost tables of a CPU (em_cpu_get()), and to estimate the energy
consumed by the CPUs of a performance domain (em_pd_energy()). Clients
such as the task scheduler can then use these APIs to access the shared
data structures holding the Energy Model of CPUs.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:58 +01:00
Yangtao Li
96c6935212 PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-29 22:28:11 +01:00
Mike Rapoport
7e1c4e2792 memblock: stop using implicit alignment to SMP_CACHE_BYTES
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.

Implicit alignment is done deep in the memblock allocator and it can
come as a surprise.  Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.

Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.

For the case when memblock APIs are used via helper functions, e.g.  like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.

The direct memblock APIs users were updated using the semantic patch below:

@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)

[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
  Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00