mirror of
https://github.com/torvalds/linux.git
synced 2026-07-02 19:17:03 +02:00
3067a33d5f
24675 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3067a33d5f |
sched/clock: Remove watchdog touching
Commit:
|
||
|
|
ac1e843f09 |
sched/clock: Remove unused argument to sched_clock_idle_wakeup_event()
The argument to sched_clock_idle_wakeup_event() has not been used in a long time. Remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
b421b22b00 |
x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide stable sync points
Currently we keep sched_clock_tick() active for stable TSC in order to keep the per-CPU state semi up-to-date. The (obvious) problem is that by the time we detect TSC is borked, our per-CPU state is also borked. So hook into the clocksource watchdog and call a method after we've found it to still be stable. There's the obvious race where the TSC goes wonky between finding it stable and us running the callback, but closing that is too much work and not really worth it, since we're already detecting TSC wobbles after the fact, so we cannot, per definition, fully avoid funny clock values. And since the watchdog runs less often than the tick, this is also an optimization. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
cf15ca8ded |
sched/clock: Initialize all per-CPU state before switching (back) to unstable
In preparation for not keeping the sched_clock_tick() active for stable TSC, we need to explicitly initialize all per-CPU state before switching back to unstable. Note: this patch looses the __gtod_offset calculation; it will be restored in the next one. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
625ed2bf04 |
sched/cfs: Make util/load_avg more stable
In the current implementation of load/util_avg, we assume that the ongoing time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX, even if part of the time segment still remains to run. As a consequence, this remaining part is considered as idle time and generates unexpected variations of util_avg of a busy CPU in the range [1002..1024[ whereas util_avg should stay at 1023. In order to keep the metric stable, we should not consider the ongoing time segment when computing load/util_avg but only the segments that have already fully elapsed. But to not consider the current time segment adds unwanted latency in the load/util_avg responsivness especially when the time is scaled instead of the contribution. Instead of waiting for the current time segment to have fully elapsed before accounting it in load/util_avg, we can already account the elapsed part but change the range used to compute load/util_avg accordingly. At the very beginning of a new time segment, the past segments have been decayed and the max value is LOAD_AVG_MAX*y. At the very end of the current time segment, the max value becomes: LOAD_AVG_MAX*y + 1024(us) (== LOAD_AVG_MAX) In fact, the max value is: LOAD_AVG_MAX*y + sa->period_contrib at any time in the time segment. Taking advantage of the fact that: LOAD_AVG_MAX*y == LOAD_AVG_MAX-1024 the range becomes [0..LOAD_AVG_MAX-1024+sa->period_contrib]. As the elapsed part is already accounted in load/util_sum, we update the max value according to the current position in the time segment instead of removing its contribution. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1493188076-2767-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
8663effb24 |
sched/core: Call __schedule() from do_idle() without enabling preemption
I finally got around to creating trampolines for dynamically allocated ftrace_ops with using synchronize_rcu_tasks(). For users of the ftrace function hook callbacks, like perf, that allocate the ftrace_ops descriptor via kmalloc() and friends, ftrace was not able to optimize the functions being traced to use a trampoline because they would also need to be allocated dynamically. The problem is that they cannot be freed when CONFIG_PREEMPT is set, as there's no way to tell if a task was preempted on the trampoline. That was before Paul McKenney implemented synchronize_rcu_tasks() that would make sure all tasks (except idle) have scheduled out or have entered user space. While testing this, I triggered this bug: BUG: unable to handle kernel paging request at ffffffffa0230077 ... RIP: 0010:0xffffffffa0230077 ... Call Trace: schedule+0x5/0xe0 schedule_preempt_disabled+0x18/0x30 do_idle+0x172/0x220 What happened was that the idle task was preempted on the trampoline. As synchronize_rcu_tasks() ignores the idle thread, there's nothing that lets ftrace know that the idle task was preempted on a trampoline. The idle task shouldn't need to ever enable preemption. The idle task is simply a loop that calls schedule or places the cpu into idle mode. In fact, having preemption enabled is inefficient, because it can happen when idle is just about to call schedule anyway, which would cause schedule to be called twice. Once for when the interrupt came in and was returning back to normal context, and then again in the normal path that the idle loop is running in, which would be pointless, as it had already scheduled. The only reason schedule_preempt_disable() enables preemption is to be able to call sched_submit_work(), which requires preemption enabled. As this is a nop when the task is in the RUNNING state, and idle is always in the running state, there's no reason that idle needs to enable preemption. But that means it cannot use schedule_preempt_disable() as other callers of that function require calling sched_submit_work(). Adding a new function local to kernel/sched/ that allows idle to call the scheduler without enabling preemption, fixes the synchronize_rcu_tasks() issue, as well as removes the pointless spurious schedule calls caused by interrupts happening in the brief window where preemption is enabled just before it calls schedule. Reviewed: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
0538421343 |
gcov: support GCC 7.1
Starting from GCC 7.1, __gcov_exit is a new symbol expected to be implemented in a profiling runtime. [akpm@linux-foundation.org: coding-style fixes] [mliska@suse.cz: v2] Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.cz Signed-off-by: Martin Liska <mliska@suse.cz> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
572e0ca9b9 |
time: delete current_fs_time()
All uses of the current_fs_time() function have been replaced by other time interfaces. And, its use cases can be fulfilled by current_time() or ktime_get_* variants. Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e0c4a5fc75 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates/fixes from Ingo Molnar: "Mostly tooling updates, but also two kernel fixes: a call chain handling robustness fix and an x86 PMU driver event definition fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/callchain: Force USER_DS when invoking perf_callchain_user() tools build: Fixup sched_getcpu feature test perf tests kmod-path: Don't fail if compressed modules aren't supported perf annotate: Fix AArch64 comment char perf tools: Fix spelling mistakes perf/x86: Fix Broadwell-EP DRAM RAPL events perf config: Refactor a duplicated code for obtaining config file name perf symbols: Allow user probes on versioned symbols perf symbols: Accept symbols starting at address 0 tools lib string: Adopt prefixcmp() from perf and subcmd perf units: Move parse_tag_value() to units.[ch] perf ui gtk: Move gtk .so name to the only place where it is used perf tools: Move HAS_BOOL define to where perl headers are used perf memswap: Split the byteswap memory range wrappers from util.[ch] perf tools: Move event prototypes from util.h to event.h perf buildid: Move prototypes from util.h to build-id.h |
||
|
|
1b84fc1503 |
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stackprotector fixlet from Ingo Molnar: "A single fix/enhancement to increase stackprotector canary randomness on 64-bit kernels with very little cost" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms |
||
|
|
29250d301b |
This is a trivial patch that changes a check for a cpumask from a NULL
pointer to using cpumask_available(), which will do the check. This is because cpumasks when not allocated are always set, and clang complains about it. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJZEcUIFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L zygH/051hNj3aZlSZCD1pXwPkHgeVBn7lSB9k6hcJ5J/OknL/hNXws3Dv4Lb7Dzj cZhg62LTwwS6PVCJtOHk+PE/c5FIdY9o1mXJpAst6wbl9Sp1lzPJbFum45UadvWn UyU3RP0ncSgfojyrwIu6XyND7/NatdYk9irTMWL9+cDuy9xGvJgRX1sf7tXmxj4C AbZzQorDw7XDczDbvFM1XyPU3ApGUDqQ7VhCEBP6ivE+5Ceoo9xi/z7yfKyjLeb+ H7+/eA8ztaMLgTzLWwkFKdP/knqwPmAb+MHTR0DoLHcVe7fbbxFS7x+cfR8mfIA9 8tA5SUxc7bymRvDAcN2dMrtL7f8= =3hKI -----END PGP SIGNATURE----- Merge tag 'trace-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This is a trivial patch that changes a check for a cpumask from a NULL pointer to using cpumask_available(), which will do the check. This is because cpumasks when not allocated are always set, and clang complains about it" * tag 'trace-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use cpumask_available() to check if cpumask variable may be used |
||
|
|
de4d195308 |
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ... |
||
|
|
2e4ab937ec |
More power management updates for v4.12-rc1
- Add Intel Gemini Lake CPU IDs to the intel_idle and intel_rapl
drivers (David Box).
- Add a NULL pointer check to the cpuidle core to prevent it from
crashing on platforms with incomplete cpuidle configuration (Fei
Li).
- Fix DT-related documentation in the generic power domains (genpd)
framework and add a MAINTAINERS entry for DT-related material in
genpd (Viresh Kumar).
- Update the system suspend/resume infrastructure to improve the
handling of aborts of suspend transitions in progress in the
wakeup framework and rework the suspend-to-idle core loop to make
it possible to filter out spurious wakeup events (specifically the
ones coming from ACPI) without resuming all the way up to user
space every time (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZEkVZAAoJEILEb/54YlRxypUP/RFSaiK7j/l9oBWfzzZYVqli
V5WtbaG04/2thHT7kycaxQLxMknT6MrhlHsGw9hM3VtLHHJBoBCYNybl8rTJlWdp
4eTe5wzYR+depKJcqo2DC4kiKDkghehG6QXG3dr40neV+hc+C54sssYddUXeK3Wm
nkAGqMEc/xjkul+DzmTy0KQd60cp9vWKedPkj1knrUpqkHa7fNj7YH3nX/OIXn+X
YMpj6hro5O2Zi3l2bz0M7x0aTqAynAOC+7vXTm+S4GDfTX/aPVpYonEt82N/NSGP
0esiLQDdX/52XAMwuIxyE092R4zatGMYTPLkG4SvZnkpdcnhUw5uCCJEUtBJYaKQ
5h9KaJ244XW1EPCrw/dzOnt4ysC7HXJBcqYSiysPlF+GB2vO8WxcOk0Duvzlyifp
9TPNTo+8h/uX5dCPDUfd4Usv7+yvDxhLJ635XAOsWlhhg8rm3RFYCUaG+snE7/Q0
GJVyNvywvxHu1CHC+bNeQ9OcMwzeiqn5CgBx2g2UjLlt9H8GS/8jsgQXnug1XPRC
wBVCYCQm9+JoKvKoCulov/KGAKQEXkJEDNZ0Ulc0vI0x1CWfEI1E07yHLEE8k+Ej
hcheMCFXlBgiIX6AeMK1asHNTcw4L/xo4gnrS9MS7DEOwZRqifI4CwLfU7iqcFdF
JBEIg6CNkc90Cszj6Xtx
=K1Q+
-----END PGP SIGNATURE-----
Merge tag 'pm-extra-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These add new CPU IDs to a couple of drivers, fix a possible NULL
pointer dereference in the cpuidle core, update DT-related things in
the generic power domains framework and finally update the
suspend/resume infrastructure to improve the handling of wakeups from
suspend-to-idle.
Specifics:
- Add Intel Gemini Lake CPU IDs to the intel_idle and intel_rapl
drivers (David Box).
- Add a NULL pointer check to the cpuidle core to prevent it from
crashing on platforms with incomplete cpuidle configuration (Fei
Li).
- Fix DT-related documentation in the generic power domains (genpd)
framework and add a MAINTAINERS entry for DT-related material in
genpd (Viresh Kumar).
- Update the system suspend/resume infrastructure to improve the
handling of aborts of suspend transitions in progress in the wakeup
framework and rework the suspend-to-idle core loop to make it
possible to filter out spurious wakeup events (specifically the
ones coming from ACPI) without resuming all the way up to user
space every time (Rafael Wysocki)"
* tag 'pm-extra-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle
PM / wakeup: Integrate mechanism to abort transitions in progress
x86/intel_idle: add Gemini Lake support
cpuidle: check dev before usage in cpuidle_use_deepest_state()
powercap: intel_rapl: Add support for Gemini Lake
PM / Domains: Add DT file to MAINTAINERS
PM / Domains: Fix DT example
|
||
|
|
88b0193d94 |
perf/callchain: Force USER_DS when invoking perf_callchain_user()
Perf can generate and record a user callchain in response to a synchronous request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we can end up walking the user stack (and dereferencing/saving whatever we find there) without the protections usually afforded by checks such as access_ok. Rather than play whack-a-mole with each architecture's stack unwinding implementation, fix the root of the problem by ensuring that we force USER_DS when invoking perf_callchain_user from the perf core. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
50fb55d88c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko.
2) cdc_ncm doesn't actually fully zero out the padding area is
allocates on TX, from Jim Baxter.
3) Don't leak map addresses in BPF verifier, from Daniel Borkmann.
4) If we randomize TCP timestamps, we have to do it everywhere
including SYN cookies. From Eric Dumazet.
5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous.
6) Fix allocation size for ntp filter bitmap in bnxt_en driver, from
Dan Carpenter.
7) Add missing memory allocation return value check to DSA loop driver,
from Christophe Jaillet.
8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy
Kalluru.
9) Don't inherit MC list from parent inet connection sockets, another
syzkaller spotted gem. Fix from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
dccp/tcp: do not inherit mc_list from parent
qede: Split PF/VF ndos.
qed: Correct doorbell configuration for !4Kb pages
qed: Tell QM the number of tasks
qed: Fix VF removal sequence
qede: Fix XDP memory leak on unload
net/mlx4_core: Reduce harmless SRIOV error message to debug level
net/mlx4_en: Avoid adding steering rules with invalid ring
net/mlx4_en: Change the error print to debug print
drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
DECnet: Use container_of() for embedded struct
Revert "ipv4: restore rt->fi for reference counting"
net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
net: cdc_ncm: Fix TX zero padding
stmmac: pci: split out common_default_data() helper
stmmac: pci: RX queue routing configuration
stmmac: pci: TX and RX queue priority configuration
stmmac: pci: set default number of rx and tx queues
...
|
||
|
|
80449d89d4 |
Merge branches 'pm-domains', 'pm-cpuidle', 'pm-sleep' and 'powercap'
* pm-domains: PM / Domains: Add DT file to MAINTAINERS PM / Domains: Fix DT example * pm-cpuidle: x86/intel_idle: add Gemini Lake support cpuidle: check dev before usage in cpuidle_use_deepest_state() * pm-sleep: ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle PM / wakeup: Integrate mechanism to abort transitions in progress * powercap: powercap: intel_rapl: Add support for Gemini Lake |
||
|
|
11fbf53d66 |
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro: "Assorted bits and pieces from various people. No common topic in this pile, sorry" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/affs: add rename exchange fs/affs: add rename2 to prepare multiple methods Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx() fs: don't set *REFERENCED on single use objects fs: compat: Remove warning from COMPATIBLE_IOCTL remove pointless extern of atime_need_update_rcu() fs: completely ignore unknown open flags fs: add a VALID_OPEN_FLAGS fs: remove _submit_bh() fs: constify tree_descr arrays passed to simple_fill_super() fs: drop duplicate header percpu-rwsem.h fs/affs: bugfix: Write files greater than page size on OFS fs/affs: bugfix: enable writes on OFS disks fs/affs: remove node generation check fs/affs: import amigaffs.h fs/affs: bugfix: make symbolic links work again |
||
|
|
00d9593335 |
These are three simple changes.
The first one is just a switch from using strcpy() to strlcpy(). Someone thought that it may cause an overflow bug, but since it only copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm should ever be bigger than that, nor not end with a nul character, this change is more of a safety precaution than fixing anything that is actually broken. The other two changes are simply cleaning and optimizing some code. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJZEKYJFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L +NcH/jK6ELGykogqi2FfLNzwJTjVpHdKrrMKyxHcC+jXv3mJUyK+0qKHkCO6zyy1 EWAbTrSMjHGG6r6AP42QLRRehsijk7xXjJm86T771PNtSgY4xCKobFisk73KR4YB 2Y1JXkSpKH2kKgdixR9hcg4h5RTv16KeAMu2cLSMxRfDEr1mBvv7LU8ZrobJSx2C LGR/241bTxOB6mWCmjqSTVrhHyEAMgJhVwV+ym7qfjqQULGhgFmq3CVTicFU0PWx UkzrcwYT2T56jU3Ngu/e1KkEZq0/rG7O86iSxgwnuraW4n48u3rpkl/q9eZ029Hd /kxyyXBKQDxx6cQd4hZrYUTW4IU= =8/K8 -----END PGP SIGNATURE----- Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: "These are three simple changes. The first one is just a switch from using strcpy() to strlcpy(). Someone thought that it may cause an overflow bug, but since it only copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm should ever be bigger than that, nor not end with a nul character, this change is more of a safety precaution than fixing anything that is actually broken. The other two changes are simply cleaning and optimizing some code" * tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Simplify ftrace_match_record() even more ftrace: Remove an unneeded condition tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline() |
||
|
|
857f864014 |
pci-v4.12-changes
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+ IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq ags00JtMLitfNPBH3eSl =eE4D -----END PGP SIGNATURE----- Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ... |
||
|
|
8f3207c7ea |
TTY/Serial patches for 4.12-rc1
Here is the "big" TTY/Serial patch updates for 4.12-rc1
Not a lot of new things here, the normal number of serial driver updates
and additions, tiny bugs fixed, and some core files split up to make
future changes a bit easier for Nicolas's "tiny-tty" work.
All of these have been in linux-next for a while. There will be a merge
conflict with include/linux/serdev.h coming from the bluetooth tree
merge, which we knew about, as we wanted some of the serdev changes to
go in through that tree. I'll send the expected merge result as a
follow-on message.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iGwEABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWRA9rw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn8OwCXSoCtZMGl25ohu1osCL5G0UEMtgCg2Z9k7hDk
LpQTTN98hHn/VwM47ro=
=X8sk
-----END PGP SIGNATURE-----
Merge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the "big" TTY/Serial patch updates for 4.12-rc1
Not a lot of new things here, the normal number of serial driver
updates and additions, tiny bugs fixed, and some core files split up
to make future changes a bit easier for Nicolas's "tiny-tty" work.
All of these have been in linux-next for a while"
* tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
serial: small Makefile reordering
tty: split job control support into a file of its own
tty: move baudrate handling code to a file of its own
console: move console_init() out of tty_io.c
serial: 8250_early: Add earlycon support for Palmchip UART
tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
vt: make mouse selection of non-ASCII consistent
vt: set mouse selection word-chars to gpm's default
imx-serial: Reduce RX DMA startup latency when opening for reading
serial: omap: suspend device on probe errors
serial: omap: fix runtime-pm handling on unbind
tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
serial: samsung: Remove useless spinlock
serial: samsung: Add missing checks for dma_map_single failure
serial: samsung: Use right device for DMA-mapping calls
serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
tty: fix comment typo s/repsonsible/responsible/
tty: amba-pl011: Fix spurious TX interrupts
serial: xuartps: Enable clocks in the pm disable case also
serial: core: Re-use struct uart_port {name} field
...
|
||
|
|
4dbbe2d8e9 |
tracing: Use cpumask_available() to check if cpumask variable may be used
This fixes the following clang warning:
kernel/trace/trace.c:3231:12: warning: address of array 'iter->started'
will always evaluate to 'true' [-Wpointer-bool-conversion]
if (iter->started)
Link: http://lkml.kernel.org/r/20170421234110.117075-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
||
|
|
51aad0aee5 |
trace: make trace_hwlat timestamp y2038 safe
struct timespec is not y2038 safe on 32 bit machines and needs to be replaced by struct timespec64 in order to represent times beyond year 2038 on such machines. Fix all the timestamp representation in struct trace_hwlat and all the corresponding implementations. Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
50327ddfbc |
kernel/power/snapshot.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-13-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bbca07c307 |
kernel/module.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-12-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
19809c2da2 |
mm, vmalloc: use __GFP_HIGHMEM implicitly
__vmalloc* allows users to provide gfp flags for the underlying allocation. This API is quite popular $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l 77 The only problem is that many people are not aware that they really want to give __GFP_HIGHMEM along with other flags because there is really no reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages which are mapped to the kernel vmalloc space. About half of users don't use this flag, though. This signals that we make the API unnecessarily too complex. This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM are simplified and drop the flag. Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Cristopher Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f61e869d51 |
kcov: simplify interrupt check
in_interrupt() semantics are confusing and wrong for most users as it also returns true when bh is disabled. Thus we open coded a proper check for interrupts in __sanitizer_cov_trace_pc() with a lengthy explanatory comment. Use the new in_task() predicate instead. Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: James Morse <james.morse@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8c733420bd |
taskstats: add e/u/stime for TGID command
The elapsed time, user CPU time and system CPU time for the thread group status request are presently left at zero. Fill these in. [akpm@linux-foundation.org: run ktime_get_ns() a single time] [akpm@linux-foundation.org: include linux/sched/cputime.h for task_cputime()] Link: http://lkml.kernel.org/r/1488508424-12322-1-git-send-email-xiao.zhang@windriver.com Signed-off-by: Zhang Xiao <xiao.zhang@windriver.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
eaa0d190bf |
pidns: expose task pid_ns_for_children to userspace
pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. This patch solves the problem, and it exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8896c23d2e |
pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()
alloc_pidmap() advances pid_namespace::last_pid. When first pid
allocation fails, then next created process will have pid 2 and
pid_ns_prepare_proc() won't be called. So, pid_namespace::proc_mnt will
never be initialized (not to mention that there won't be a child
reaper).
I saw crash stack of such case on kernel 3.10:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: proc_flush_task+0x8f/0x1b0
Call Trace:
release_task+0x3f/0x490
wait_consider_task.part.10+0x7ff/0xb00
do_wait+0x11f/0x280
SyS_wait4+0x7d/0x110
We may fix this by restore of last_pid in 0 or by prohibiting of futher
allocations. Since there was a similar issue in Oleg Nesterov's commit
|
||
|
|
51dbd92520 |
ia64: reuse append_elf_note() and final_note() functions
Get rid of multiple definitions of append_elf_note() & final_note() functions. Reuse these functions compiled under CONFIG_CRASH_CORE Also, define Elf_Word and use it instead of generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
692f66f26a |
crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4. Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. This patchset removes dependency with CONFIG_KEXEC for crashkernel parameter and vmcoreinfo related code as it can be reused without kexec support. Also, crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump. The first patch moves crashkernel parameter parsing and vmcoreinfo related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) in powerpc. The next patch reuses crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter. This has the advantage of using all syntaxes crashkernel parameter supports, for fadump as well. The last patch updates fadump kernel documentation about use of crashkernel parameter. This patch (of 5): Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. But currently, code related to vmcoreinfo and parsing of crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above mentioned code under this config, allowing code reuse without dependency on CONFIG_KEXEC. There is no functional change with this patch. Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
19659c59af |
fork: free vmapped stacks in cache when cpus are offline
Using virtually mapped stack, kernel stacks are allocated via vmalloc. In the current implementation, two stacks per cpu can be cached when tasks are freed and the cached stacks are used again in task duplications. But the cached stacks may remain unfreed even when cpu are offline. By adding a cpu hotplug callback to free the cached stacks when a cpu goes offline, the pages of the cached stacks are not wasted. Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
780cbcf287 |
kernel/hung_task.c: defer showing held locks
When I was running my testcase which may block hundreds of threads on fs
locks, I got lockup due to output from debug_show_all_locks() added by
commit
|
||
|
|
63259457a2 |
proc/sysctl: fix the int overflow for jiffies conversion
do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to avoid overflow. But actually the *valp is int type, so it still causes overflow. For example, echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time Then, cat ./sys/net/ipv4/tcp_keepalive_time The output is "-1", it is not expected. Now use INT_MAX/HZ as the max value instead LONG_MAX/HZ to fix it. Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com Signed-off-by: Gao Feng <fgao@ikuai8.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0d0e57697f |
bpf: don't let ldimm64 leak map addresses on unprivileged
The patch fixes two things at once: 1) It checks the env->allow_ptr_leaks and only prints the map address to the log if we have the privileges to do so, otherwise it just dumps 0 as we would when kptr_restrict is enabled on %pK. Given the latter is off by default and not every distro sets it, I don't want to rely on this, hence the 0 by default for unprivileged. 2) Printing of ldimm64 in the verifier log is currently broken in that we don't print the full immediate, but only the 32 bit part of the first insn part for ldimm64. Thus, fix this up as well; it's okay to access, since we verified all ldimm64 earlier already (including just constants) through replace_map_fd_with_map_ptr(). Fixes: |
||
|
|
eed4d47efe |
ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle
The ACPI SCI (System Control Interrupt) is set up as a wakeup IRQ during suspend-to-idle transitions and, consequently, any events signaled through it wake up the system from that state. However, on some systems some of the events signaled via the ACPI SCI while suspended to idle should not cause the system to wake up. In fact, quite often they should just be discarded. Arguably, systems should not resume entirely on such events, but in order to decide which events really should cause the system to resume and which are spurious, it is necessary to resume up to the point when ACPI SCIs are actually handled and processed, which is after executing dpm_resume_noirq() in the system resume path. For this reasons, add a loop around freeze_enter() in which the platforms can process events signaled via multiplexed IRQ lines like the ACPI SCI and add suspend-to-idle hooks that can be used for this purpose to struct platform_freeze_ops. In the ACPI case, the ->wake hook is used for checking if the SCI has triggered while suspended and deferring the interrupt-induced system wakeup until the events signaled through it are actually processed sufficiently to decide whether or not the system should resume. In turn, the ->sync hook allows all of the relevant event queues to be flushed so as to prevent events from being missed due to race conditions. In addition to that, some ACPI code processing wakeup events needs to be modified to use the "hard" version of wakeup triggers, so that it will cause a system resume to happen on device-induced wakeup events even if the "soft" mechanism to prevent the system from suspending is not enabled (that also helps to catch device-induced wakeup events occurring during suspend transitions in progress). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
7246f60068 |
powerpc updates for 4.12 part 1.
Highlights include:
- Larger virtual address space on 64-bit server CPUs. By default we use a 128TB
virtual address space, but a process can request access to the full 512TB by
passing a hint to mmap().
- Support for the new Power9 "XIVE" interrupt controller.
- TLB flushing optimisations for the radix MMU on Power9.
- Support for CAPI cards on Power9, using the "Coherent Accelerator Interface
Architecture 2.0".
- The ability to configure the mmap randomisation limits at build and runtime.
- Several small fixes and cleanups to the kprobes code, as well as support for
KPROBES_ON_FTRACE.
- Major improvements to handling of system reset interrupts, correctly treating
them as NMIs, giving them a dedicated stack and using a new hypervisor call
to trigger them, all of which should aid debugging and robustness.
Many fixes and other minor enhancements.
Thanks to:
Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben
Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian
Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson,
Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli,
Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan,
Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev
Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler,
Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZDHUMAAoJEFHr6jzI4aWAT7oQALkE2Nj3gjcn1z0SkFhq/1iO
Py9Elmqm4E+L6NKYtBY5dS8xVAJ088ffzERyqJ1FY1LHkB8tn8bWRcMQmbjAFzTI
V4TAzDNI890BN/F4ptrYRwNFxRBHAvZ4NDunTzagwYnwmTzW9PYHmOi4pvWTo3Tw
KFUQ0joLSEgHzyfXxYB3fyj41u8N0FZvhfazdNSqia2Y5Vwwv/ION5jKplDM+09Y
EtVEXFvaKAS1sjbM/d/Jo5rblHfR0D9/lYV10+jjyIokjzslIpyTbnj3izeYoM5V
I4h99372zfsEjBGPPXyM3khL3zizGMSDYRmJHQSaKxjtecS9SPywPTZ8ufO/aSzV
Ngq6nlND+f1zep29VQ0cxd3Jh40skWOXzxJaFjfDT25xa6FbfsWP2NCtk8PGylZ7
EyqTuCWkMgIP02KlX3oHvEB2LRRPCDmRU2zECecRGNJrIQwYC2xjoiVi7Q8Qe8rY
gr7Ib5Jj/a+uiTcCIy37+5nXq2s14/JBOKqxuYZIxeuZFvKYuRUipbKWO05WDOAz
m/pSzeC3J8AAoYiqR0gcSOuJTOnJpGhs7zrQFqnEISbXIwLW+ICumzOmTAiBqOEY
Rt8uW2gYkPwKLrE05445RfVUoERaAjaE06eRMOWS6slnngHmmnRJbf3PcoALiJkT
ediqGEj0/N1HMB31V5tS
=vSF3
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Larger virtual address space on 64-bit server CPUs. By default we
use a 128TB virtual address space, but a process can request access
to the full 512TB by passing a hint to mmap().
- Support for the new Power9 "XIVE" interrupt controller.
- TLB flushing optimisations for the radix MMU on Power9.
- Support for CAPI cards on Power9, using the "Coherent Accelerator
Interface Architecture 2.0".
- The ability to configure the mmap randomisation limits at build and
runtime.
- Several small fixes and cleanups to the kprobes code, as well as
support for KPROBES_ON_FTRACE.
- Major improvements to handling of system reset interrupts,
correctly treating them as NMIs, giving them a dedicated stack and
using a new hypervisor call to trigger them, all of which should
aid debugging and robustness.
- Many fixes and other minor enhancements.
Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple,
Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton
Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt,
Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy,
Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy,
Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin,
Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J
Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver
O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell
Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C.
Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar,
Yang Shi"
* tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it
powerpc/powernv: Fix TCE kill on NVLink2
powerpc/mm/radix: Drop support for CPUs without lockless tlbie
powerpc/book3s/mce: Move add_taint() later in virtual mode
powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body
powerpc/smp: Document irq enable/disable after migrating IRQs
powerpc/mpc52xx: Don't select user-visible RTAS_PROC
powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset()
powerpc/eeh: Clean up and document event handling functions
powerpc/eeh: Avoid use after free in eeh_handle_special_event()
cxl: Mask slice error interrupts after first occurrence
cxl: Route eeh events to all drivers in cxl_pci_error_detected()
cxl: Force context lock during EEH flow
powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST
powerpc/xmon: Teach xmon oops about radix vectors
powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
powerpc/pseries: Enable VFIO
powerpc/powernv: Fix iommu table size calculation hook for small tables
powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
powerpc: Add arch/powerpc/tools directory
...
|
||
|
|
e579dde654 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman: "This is a set of small fixes that were mostly stumbled over during more significant development. This proc fix and the fix to posix-timers are the most significant of the lot. There is a lot of good development going on but unfortunately it didn't quite make the merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Fix unbalanced hard link numbers signal: Make kill_proc_info static rlimit: Properly call security_task_setrlimit signal: Remove unused definition of sig_user_definied ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET ipc: Remove unused declaration of recompute_msgmni posix-timers: Correct sanity check in posix_cpu_nsleep sysctl: Remove dead register_sysctl_root |
||
|
|
5ea30e4e58 |
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
The stack canary is an 'unsigned long' and should be fully initialized to random data rather than only 32 bits of random data. Signed-off-by: Daniel Micay <danielmicay@gmail.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arjan van Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-hardening@lists.openwall.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
77c0eddeee |
ftrace: Simplify ftrace_match_record() even more
Dan Carpenter sent a patch to remove a check in ftrace_match_record() because the logic of the code made the check redundant. I looked deeper into the code, and made the following logic table, with the three variables and the result of the original code. modname mod_matches exclude_mod result ------- ----------- ----------- ------ 0 0 0 return 0 0 0 1 func_match 0 1 * < cannot exist > 1 0 0 return 0 1 0 1 func_match 1 1 0 func_match 1 1 1 return 0 Notice that when mod_matches == exclude mod, the result is always to return 0, and when mod_matches != exclude_mod, then the result is to test the function. This means we only need test if mod_matches is equal to exclude_mod. Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
|
|
31805c9052 |
ftrace: Remove an unneeded condition
We know that "mod_matches" is true here so there is no need to check again. Link: http://lkml.kernel.org/r/20170331152130.GA4947@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
|
|
e09e28671c |
tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()
Strcpy is inherently not safe, and strlcpy() should be used instead. __trace_find_cmdline() uses strcpy() because the comms saved must have a terminating nul character, but it doesn't hurt to add the extra protection of using strlcpy() instead of strcpy(). Link: http://lkml.kernel.org/r/1493806274-13936-1-git-send-email-amit.pundir@linaro.org Signed-off-by: Amey Telawane <ameyt@codeaurora.org> [AmitP: Cherry-picked this commit from CodeAurora kernel/msm-3.10 https://source.codeaurora.org/quic/la/kernel/msm-3.10/commit/?id=2161ae9a70b12cf18ac8e5952a20161ffbccb477] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> [ Updated change log and removed the "- 1" from len parameter ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
|
|
a1be8edda4 |
Modules updates for v4.12
Summary of modules changes for the 4.12 merge window: - Minor code cleanups - Fix section alignment for .init_array Signed-off-by: Jessica Yu <jeyu@redhat.com> -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJZClptAAoJEMBFfjjOO8FygIkP/iUB2+Ek2i5aD5w5cLpJ3Sm3 w5DnoSLLHATd64TmvtwIv1UUHJMkUv7Ls9soz7aegxETzp6E1RCRwGxH7FZfuEk7 T8wPiOzhr76rEylr0y7iSYjRv3j5x9PKY5mUejldeiUGIS5crGG14wvnVFzOPIJQ Y2J7pjCPKLgtDxIHfBZFV/ut8TEaepf3du/qGi8UDqxEexEiBixXq3VGOqP/YmYt earCKOU1EhHCIo7LDU4QvrK/6vWq1Ip7yzRtho/LHsgtNeRg5sQ9DO12HvylvXUo IRYLW2yKM9RZnw2XNVt4mHt6zCDTK3gshfLg5SiCBr4AWP5JMX4GLF/w+YpC11tt Ec2M9S9xqWuk5z6rhJyHcEsRgzfDRYRrz79c0wvH+fqKL6kwj7CSPudGkbFIQCXy LjDEe/Fk0RPDSUzHSDpQJWf3u3/mD5rwAcX3X673mRyc9mmm/HzNDOOxJV0EuO6G J2qhjO5a0vLlZ4tpd4uKUgoO9x8jc2Y3jV8wDDDRfOWwr6DD399l20/wnJ9VUkcN 55rQsmKCHEf+5SyGwXUcVsICy1jBg+rfg/SnSBhCrP07uGYz+iGSQ+FZ3xaBCj3f SY/9sAACA3Tn/tTJCY+ncj71oTigTRxU3aYlfqoiwIp/lIpC+gd567SUKssUQmi7 RhAC280Y6SNEB6iK2t7I =Z/Mw -----END PGP SIGNATURE----- Merge tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: - Minor code cleanups - Fix section alignment for .init_array * tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kallsyms: Use bounded strnchr() when parsing string module: Unify the return value type of try_module_get module: set .init_array alignment to 8 |
||
|
|
4c174688ee |
New features for this release:
o Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
o The rewrite was needed to add plugins to be unique to tracing instances.
i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter
The old way was written very hacky. This removes a lot of those hacks.
o New "function-fork" tracing option. When set, pids in the set_ftrace_pid
will have their children added when the processes with their pids
listed in the set_ftrace_pid file forks.
o Exposure of "maxactive" for kretprobe in kprobe_events
o Allow for builtin init functions to be traced by the function tracer
(via the kernel command line). Module init function tracing will come
in the next release.
o Added more selftests, and have selftests also test in an instance.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJZCRchFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
zuIH/RsLUb8Hj6GmhAvn/tblUDzWyqlXX2h79VVlo/XrWayHYNHnKOmua1WwMZC6
xESXb/AffAc89VWTkKsrwaK7yfRPG6+w8zTZOcFuXSBpqSGG/oey9Fxj5Wqqpche
oJ2UY7ngxANAipkP5GxdYTafFSoWhGZGfUUtW+5tAHoFHzqO2lOjO8olbXP69sON
kVX/b461S20cVvRe5H/F0klXLSc37Tlp5YznXy4H4V4HcJSN1Fb6/uozOXALZ4se
SBpVMWmVVoGJorzj+ic7gVOeohvC8RnR400HbeMVwaI0Lj50noidDj/5Hv8F7T+D
h1B8vATNZLFAFUOSHINCBIu6Vj0=
=t8mg
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New features for this release:
- Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
- The rewrite was needed to add plugins to be unique to tracing
instances. i.e. mkdir instance/foo; cd instances/foo; echo
do_IRQ:stacktrace > set_ftrace_filter The old way was written very
hacky. This removes a lot of those hacks.
- New "function-fork" tracing option. When set, pids in the
set_ftrace_pid will have their children added when the processes
with their pids listed in the set_ftrace_pid file forks.
- Exposure of "maxactive" for kretprobe in kprobe_events
- Allow for builtin init functions to be traced by the function
tracer (via the kernel command line). Module init function tracing
will come in the next release.
- Added more selftests, and have selftests also test in an instance"
* tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits)
ring-buffer: Return reader page back into existing ring buffer
selftests: ftrace: Allow some event trigger tests to run in an instance
selftests: ftrace: Have some basic tests run in a tracing instance too
selftests: ftrace: Have event tests also run in an tracing instance
selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
selftests: ftrace: Allow some tests to be run in a tracing instance
tracing/ftrace: Allow for instances to trigger their own stacktrace probes
tracing/ftrace: Allow for the traceonoff probe be unique to instances
tracing/ftrace: Enable snapshot function trigger to work with instances
tracing/ftrace: Allow instances to have their own function probes
tracing/ftrace: Add a better way to pass data via the probe functions
ftrace: Dynamically create the probe ftrace_ops for the trace_array
tracing: Pass the trace_array into ftrace_probe_ops functions
tracing: Have the trace_array hold the list of registered func probes
ftrace: If the hash for a probe fails to update then free what was initialized
ftrace: Have the function probes call their own function
ftrace: Have each function probe use its own ftrace_ops
ftrace: Have unregister_ftrace_function_probe_func() return a value
ftrace: Add helper function ftrace_hash_move_and_update_ops()
ftrace: Remove data field from ftrace_func_probe structure
...
|
||
|
|
9c35baf6ce |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- There is a situation when early console is not deregistered because
the preferred one matches a wrong entry. It caused messages to appear
twice.
This is the 2nd attempt to fix it. The first one was wrong, see the
commit
|
||
|
|
dd23f273d9 |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton: - a few misc things - most of MM - KASAN updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) kasan: separate report parts by empty lines kasan: improve double-free report format kasan: print page description after stacks kasan: improve slab object description kasan: change report header kasan: simplify address description logic kasan: change allocation and freeing stack traces headers kasan: unify report headers kasan: introduce helper functions for determining bug type mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page mm: hwpoison: call shake_page() unconditionally mm/swapfile.c: fix swap space leak in error path of swap_free_entries() mm/gup.c: fix access_ok() argument type mm/truncate: avoid pointless cleancache_invalidate_inode() calls. mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty fs/block_dev: always invalidate cleancache in invalidate_bdev() fs: fix data invalidation in the cleancache during direct IO zram: reduce load operation in page_same_filled zram: use zram_free_page instead of open-coded zram: introduce zram data accessor ... |
||
|
|
7dea19f9ee |
mm: introduce memalloc_nofs_{save,restore} API
GFP_NOFS context is used for the following 5 reasons currently:
- to prevent from deadlocks when the lock held by the allocation
context would be needed during the memory reclaim
- to prevent from stack overflows during the reclaim because the
allocation is performed from a deep context already
- to prevent lockups when the allocation context depends on other
reclaimers to make a forward progress indirectly
- just in case because this would be safe from the fs POV
- silence lockdep false positives
Unfortunately overuse of this allocation context brings some problems to
the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.
In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than isolated cases. Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.
Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.
Introduce memalloc_nofs_{save,restore} API to control the scope of
GFP_NOFS allocation context. This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.
PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
their semantic. kmem_flags_convert() doesn't need to evaluate the flag
anymore.
This patch shouldn't introduce any functional changes.
Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.
[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
7e7844226f |
lockdep: allow to disable reclaim lockup detection
The current implementation of the reclaim lockup detection can lead to false positives and those even happen and usually lead to tweak the code to silence the lockdep by using GFP_NOFS even though the context can use __GFP_FS just fine. See http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example. ================================= [ INFO: inconsistent lock state ] 4.5.0-rc2+ #4 Tainted: G O --------------------------------- inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage. kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs] {RECLAIM_FS-ON-R} state was registered at: mark_held_locks+0x79/0xa0 lockdep_trace_alloc+0xb3/0x100 kmem_cache_alloc+0x33/0x230 kmem_zone_alloc+0x81/0x120 [xfs] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs] __xfs_refcount_find_shared+0x75/0x580 [xfs] xfs_refcount_find_shared+0x84/0xb0 [xfs] xfs_getbmap+0x608/0x8c0 [xfs] xfs_vn_fiemap+0xab/0xc0 [xfs] do_vfs_ioctl+0x498/0x670 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x12/0x6f CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 3 locks held by kswapd0/543: stack backtrace: CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4 Call Trace: lock_acquire+0xd8/0x1e0 down_write_nested+0x5e/0xc0 xfs_ilock+0x177/0x200 [xfs] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs] xfs_fs_evict_inode+0xdc/0x1e0 [xfs] evict+0xc5/0x190 dispose_list+0x39/0x60 prune_icache_sb+0x4b/0x60 super_cache_scan+0x14f/0x1a0 shrink_slab.part.63.constprop.79+0x1e9/0x4e0 shrink_zone+0x15e/0x170 kswapd+0x4f1/0xa80 kthread+0xf2/0x110 ret_from_fork+0x3f/0x70 To quote Dave: "Ignoring whether reflink should be doing anything or not, that's a "xfs_refcountbt_init_cursor() gets called both outside and inside transactions" lockdep false positive case. The problem here is lockdep has seen this allocation from within a transaction, hence a GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context. Also note that we have an active reference to this inode. So, because the reclaim annotations overload the interrupt level detections and it's seen the inode ilock been taken in reclaim ("interrupt") context, this triggers a reclaim context warning where it thinks it is unsafe to do this allocation in GFP_KERNEL context holding the inode ilock..." This sounds like a fundamental problem of the reclaim lock detection. It is really impossible to annotate such a special usecase IMHO unless the reclaim lockup detection is reworked completely. Until then it is much better to provide a way to add "I know what I am doing flag" and mark problematic places. This would prevent from abusing GFP_NOFS flag which has a runtime effect even on configurations which have lockdep disabled. Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to skip the current allocation request. While we are at it also make sure that the radix tree doesn't accidentaly override tags stored in the upper part of the gfp_mask. Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
6d7225f0cc |
lockdep: teach lockdep about memalloc_noio_save
Patch series "scope GFP_NOFS api", v5. This patch (of 7): Commit |
||
|
|
2f34c1231b |
main drm pull request for 4.12 kernel
-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZCTzvAAoJEAx081l5xIa+9kcQAJsQiija4/7QGx6IzakOMqjx WulJ3zYG/cU/HLwCBcuWRDF6wAj+7iWNeLCPmolHwEazcI8tQVdgMlWtbdMbDh8U ckzD3FBXsEVfIfab+u6tyoUkm3l/VDhMXbjkUK7NTo/+dkRqe5LuFfZPCGN09jft Y+5salkRXzDhXPSFsqmjfzhx1v7PTgf0a5HUenKWEWOv+sJQaW4/iPvcDSIcg5qR l9WjAqro1NpFYhUodnh6DkLeledL1U5whdtp/yvrUAck8y+WP/jwGYmQ7pZ0UkQm f0M3kV6K67ox9eqN++jsGX5o8sB1qF01Uh95kBAnyzYzsw4ZlMCx6pV7PDX+J88M UBNMEqX10hrLkNJA9lGjPWx+/6fudcwg9anKvTRO3Uyx7MbYoJAgjzAM+yBqqtV0 8Otxa4Bw0V2pmUD+0lqJDERRvE77VCXkLb8SaI5lQo0MHpQqT2cZA+GD+B+rZHO6 Ie5LDFY87vM2GG1IECufG+xOa3v6sn2FfQ1ouu1KNGKOAMBKcQCQyQx3kGVuNW2i HDACVXALJgXdRlVLm4jydOCZdRoguX7AWmRjtdwxgaO+lBcGfLhkXdjLQ7Ho+29p 32ArJfkZPfA53vMB6lHxAfbtrs1q2RzyVnPHj/KqeJnGZbABKTsF2HQ5BQc4Xq/J mqXoz6Oubdvk4Pwyx7Ne =UxFF -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux Pull drm u pdates from Dave Airlie: "This is the main drm pull request for v4.12. Apart from two fixes pulls, everything should have been in drm-next for at least 2 weeks. The biggest thing in here is AMD released the public headers for their upcoming VEGA GPUs. These as always are quite a sizeable chunk of header files. They've also added initial non-display support for those GPUs, though they aren't available in production yet. Otherwise it's pretty much normal. New bridge drivers: - megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++ - generic LVDS bridge support. Core: - Displayport link train failure reporting to userspace - debugfs interface cleaned up - subsystem TODO in kerneldoc now - Extended fbdev support (flipping and vblank wait) - drm_platform removed - EDP CRC support in helper - HF-VSDB SCDC support in EDID parser - Lots of code cleanups and header extraction - Thunderbolt external GPU awareness - Atomic helper improvements - Documentation improvements panel: - Sitronix and Samsung new panel support amdgpu: - Preliminary vega10 support - Multi-level page table support - GPU sensor support for userspace - PRT support for sparse buffers - SR-IOV improvements - Non-contig VRAM CPU mapping i915: - Atomic modesetting enabled by default on Gen5+ - LSPCON improvements - Atomic state handling for cdclk - GPU reset improvements - In-kernel unit tests - Geminilake improvements and color manager support - Designware i2c fixes - vblank evasion improvements - Hotplug safe connector iterators - GVT scheduler QoS support - GVT Kabylake support nouveau: - Acceleration support for Pascal (GP10x). - Rearchitecture of code handling proprietary signed firmware - Fix GTX 970 with odd MMU configuration - GP10B support - GP107 acceleration support vmwgfx: - Atomic modesetting support for vmwgfx omapdrm: - Support for render nodes - Refactor omapdss code - Fix some probe ordering issues - Fix too dark RGB565 rendering sunxi: - prelim rework for multiple pipes. mali-dp: - Color management support - Plane scaling - Power management improvements imx-drm: - Prefetch Resolve Engine/Gasket on i.MX6QP - Deferred plane disabling - Separate alpha support mediatek: - Mediatek SoC MT2701 support rcar-du: - Gen3 HDMI support msm: - 4k support for newer chips - OPP bindings for gpu - prep work for per-process pagetables vc4: - HDMI audio support - fixes qxl: - minor fixes. dw-hdmi: - PHY improvements - CSC fixes - Amlogic GX SoC support" * tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits) drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr() drm/nouveau/kms: Increase max retries in scanout position queries. drm/nouveau/bios/bitP: check that table is long enough for optional pointers drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine drm: mali-dp: use div_u64 for expensive 64-bit divisions drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm: mali-dp: Check the mclk rate and allow up/down scaling drm: mali-dp: Enable image enhancement when scaling drm: mali-dp: Add plane upscaling support ... |