Provides the scheduler with a cpu scaling correction factor for more
accurate load-tracking and cpu capacity handling.
The Energy Model (EM) (in fact the capacity value of the last element
of the capacity states vector of the core (MC) level sched_group_energy
structure) is used as the source for this cpu scaling factor.
The cpu capacity value depends on the micro-architecture and the
maximum frequency of the cpu.
The maximum frequency part should not be confused with the frequency
invariant scheduler load-tracking support which deals with frequency
related scaling due to DFVS functionality.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Provides the scheduler with a cpu scaling correction factor for more
accurate load-tracking and cpu capacity handling.
The Energy Model (EM) (in fact the capacity value of the last element
of the capacity states vector of the core (MC) level sched_group_energy
structure) is used instead of the arm arch specific cpu_efficiency and
dtb property 'clock-frequency' values as the source for this cpu
scaling factor.
The cpu capacity value depends on the micro-architecture and the
maximum frequency of the cpu.
The maximum frequency part should not be confused with the frequency
invariant scheduler load-tracking support which deals with frequency
related scaling due to DFVS functionality.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
cpufreq is currently keeping it a secret which cpus are sharing
clock source. The scheduler needs to know about clock domains as well
to become more energy aware. The SD_SHARE_CAP_STATES domain flag
indicates whether cpus belonging to the sched_domain share capacity
states (P-states).
There is no connection with cpufreq (yet). The flag must be set by
the arch specific topology code.
cc: Russell King <linux@arm.linux.org.uk>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
The sched_group_energy (sge) pointer of the first sched_group (sg) in
the sched_domain (sd) is initialized to point to the appropriate (in
terms of sd level and cpu) sge data defined in the arch and so to the
correct part of the Energy Model (EM).
Energy-aware scheduling allows that a system has only EM data up to a
certain sd level (so called highest energy aware balancing sd level).
A check in init_sched_energy() enforces that all sd's below this sd
level contain EM data.
The 'int cpu' parameter of sched_domain_energy_f requires that
check_sched_energy_data() makes sure that all cpus spanned by a sg
are provisioned with the same EM data.
This patch has also been tested with feature FORCE_SD_OVERLAP enabled.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
The struct sched_group_energy represents the per sched_group related
data which is needed for energy aware scheduling. It contains:
(1) number of elements of the idle state array
(2) pointer to the idle state array which comprises 'power consumption'
for each idle state
(3) number of elements of the capacity state array
(4) pointer to the capacity state array which comprises 'compute
capacity and power consumption' tuples for each capacity state
The struct sched_group obtains a pointer to a struct sched_group_energy.
The function pointer sched_domain_energy_f is introduced into struct
sched_domain_topology_level which will allow the arch to pass a particular
struct sched_group_energy from the topology shim layer into the scheduler
core.
The function pointer sched_domain_energy_f has an 'int cpu' parameter
since the folding of two adjacent sd levels via sd degenerate doesn't work
for all sd levels. I.e. it is not possible for example to use this feature
to provide per-cpu energy in sd level DIE on ARM's TC2 platform.
It was discussed that the folding of sd levels approach is preferable
over the cpu parameter approach, simply because the user (the arch
specifying the sd topology table) can introduce less errors. But since
it is not working, the 'int cpu' parameter is the only way out. It's
possible to use the folding of sd levels approach for
sched_domain_flags_f and the cpu parameter approach for the
sched_domain_energy_f at the same time though. With the use of the
'int cpu' parameter, an extra check function has to be provided to make
sure that all cpus spanned by a sched group are provisioned with the same
energy data.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
This patch introduces the ENERGY_AWARE sched feature, which is
implemented using jump labels when SCHED_DEBUG is defined. It is
statically set false when SCHED_DEBUG is not defined. Hence this doesn't
allow energy awareness to be enabled without SCHED_DEBUG. This
sched_feature knob will be replaced later with a more appropriate
control knob when things have matured a bit.
ENERGY_AWARE is based on per-entity load-tracking hence FAIR_GROUP_SCHED
must be enable. This dependency isn't checked at compile time yet.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
This documentation patch provides an overview of the experimental
scheduler energy costing model, associated data structures, and a
reference recipe on how platforms can be characterized to derive energy
models.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Scenarios with the busiest group having just one task and the local
being idle on topologies with sched groups with different numbers of
cpus manage to dodge all load-balance bailout conditions resulting the
nr_balance_failed counter to be incremented. This eventually causes a
pointless active migration of the task. This patch prevents this by not
incrementing the counter when the busiest group only has one task.
ASYM_PACKING migrations and migrations due to reduced capacity should
still take place as these are explicitly captured by
need_active_balance().
A better solution would be to not attempt the load-balance in the first
place, but that requires significant changes to the order of bailout
conditions and statistics gathering.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
We do not want to miss out on the ability to pull a single remaining
task from a potential source cpu towards an idle destination cpu. Add an
extra criteria to need_active_balance() to kick off active load balance
if the source cpu is over-utilized and has lower capacity than the
destination cpu.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
find_idlest_group() selects the wake-up target group purely
based on group load which leads to suboptimal choices in low load
scenarios. An idle group with reduced capacity (due to RT tasks or
different cpu type) isn't necessarily a better target than a lightly
loaded group with higher capacity.
The patch adds spare capacity as an additional group selection
parameter. The target group is now selected based on the following
criteria:
1. Return the group with the cpu with most spare capacity and this
capacity is significant if such group exists. Significant spare capacity
is currently at least 20% to spare.
2. Return the group with the lowest load, unless it is the local group
in which case NULL is returned and the search is continued at the next
(lower) level.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Wakeup balancing is completely unaware of cpu capacity, cpu utilization
and task utilization. The task is preferably placed on a cpu which is
idle in the instant the wakeup happens. New tasks
(SD_BALANCE_{FORK,EXEC} are placed on an idle cpu in the idlest group if
such can be found, otherwise it goes on the least loaded one. Existing
tasks (SD_BALANCE_WAKE) are placed on the previous cpu or an idle cpu
sharing the same last level cache unless the wakee_flips heuristic in
wake_wide() decides to fallback to considering cpus outside SD_LLC.
Hence existing tasks are not guaranteed to get a chance to migrate to a
different group at wakeup in case the current one has reduced cpu
capacity (due RT/IRQ pressure or different uarch e.g. ARM big.LITTLE).
They may eventually get pulled by other cpus doing
periodic/idle/nohz_idle balance, but it may take quite a while before it
happens.
This patch adds capacity awareness to find_idlest_{group,queue} (used by
SD_BALANCE_{FORK,EXEC} and SD_BALANCE_WAKE under certain circumstances)
such that groups/cpus that can accommodate the waking task based on task
utilization are preferred. In addition, wakeup of existing tasks
(SD_BALANCE_WAKE) is sent through find_idlest_{group,queue} also if the
task doesn't fit the capacity of the previous cpu to allow it to escape
(override wake_affine) when necessary instead of relying on
periodic/idle/nohz_idle balance to eventually sort it out.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
To be able to compare the capacity of the target cpu with the highest
cpu capacity of the system in the wakeup path, store the system-wide
maximum cpu capacity in the root domain.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
arch_scale_cpu_capacity() is no longer a weak function but a #define
instead. Include the #define in topology.h.
cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Defines arch_scale_freq_capacity() to use cpufreq implementation.
Including <linux/cpufreq.h> in topology.h like for the arm arch doesn't
work because of CONFIG_COMPAT=y (Kernel support for 32-bit EL0).
That's why cpufreq_scale_freq_capacity() has to be declared extern in
topology.h.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
frequency scaling correction factor for more accurate load-tracking.
The factor is:
current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
In fact, freq_scale should be a struct cpufreq_policy data member. But
this would require that the scheduler hot path (__update_load_avg()) would
have to grab the cpufreq lock. This can be avoided by using per-cpu data
initialized to SCHED_CAPACITY_SCALE for freq_scale.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
If a newly created task is selected to go to a different CPU in fork
balance when it wakes up the first time, its load averages should
not be removed from the source CPU since they are never added to
it before. The same is also applicable to a never used group entity.
Fix it in remove_entity_load_avg(): when entity's last_update_time
is 0, simply return. This should precisely identify the case in
question, because in other migrations, the last_update_time is set
to 0 after remove_entity_load_avg().
Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <Juri.Lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJW4LNjAAoJEDjbvchgkmk+XfUP/R7mD1vHLjLRVMKfbwlIEqOv
pisPGxmxzZwHYUeMkIyU0a4i6jd3COEJDSIpHOhGVbCRs/1wdYYzMEnT6TyvpDup
dWrIkr+zlT9NGg3Gh5MIpjDNnEml8xQT1GdLX1+fqmiCXM7N9/XnecSa7n6hut7k
GzR3N/3FTUrqz65mcuUm1b2VJkLWSyMxjX/QpVXnJnLyTqpdzCMglA0fMkUaGk7H
emcc9JpFXn4yPWZ/WY1WrVWF3CGTFg5PUIaIZJ19iyNKNlS8mXovh6VNy4E7Jter
BOeaR+e+ZPvptg0YbApVV5dRzQ0sFJKW74b3XGGV31La85vqeX43WJcX98SBoGI0
A/qaz87fkuSKfaMrtwuy4vnTZCGNtrFNcBtovm6DqFBBo5jOtHphf8ShJ7k4IVYK
c9g2k31OSJM9RGraNMNFMvH7DZr1/Ib8pHcGnr6DnVK2N+gnBDvbhPOcBM10RRtI
Q35y3Y1Mdg2ssnFIO3qtewMlptdME5z76zkCfUwtG4nVqExMlA4qvrPhculK/Jn/
mEjLW2QCIIu1weFsSLc/2vCZ+9pguh+yudr0fk0unlHxUdJ2fMAY4SarFu8zLlBe
/VJwdjbYOSKTE6Mdcn0Zdwyv6OPmMLc4MGimBdC7JIV5EH0gOpu3nsDdOY3KsQYT
5nhZDVB18+M8+hfag+aL
=9ghs
-----END PGP SIGNATURE-----
Merge tag 'v4.4.5' into linux-linaro-lsk-v4.4
This is the 4.4.5 stable release
# gpg: Signature made Wed 09 Mar 2016 23:36:03 GMT using RSA key ID 6092693E
# gpg: Good signature from "Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 647F 2865 4894 E3BD 4571 99BE 38DB BDC8 6092 693E
commit 11bf9b8658 upstream.
Although the ARM vDSO is cleanly separated by code/data with the code
being read-only in userspace mappings, the code page is still writable
from the kernel.
There have been exploits (such as http://itszn.com/blog/?p=21) that
take advantage of this on x86 to go from a bad kernel write to full
root.
Prevent this specific exploit class on ARM as well by putting the vDSO
code page in post-init read-only memory as well.
Before:
vdso: 1 text pages at base 80927000
root@Vexpress:/ cat /sys/kernel/debug/kernel_page_tables
---[ Modules ]---
---[ Kernel Mapping ]---
0x80000000-0x80100000 1M RW NX SHD
0x80100000-0x80600000 5M ro x SHD
0x80600000-0x80800000 2M ro NX SHD
0x80800000-0xbe000000 984M RW NX SHD
After:
vdso: 1 text pages at base 8072b000
root@Vexpress:/ cat /sys/kernel/debug/kernel_page_tables
---[ Modules ]---
---[ Kernel Mapping ]---
0x80000000-0x80100000 1M RW NX SHD
0x80100000-0x80600000 5M ro x SHD
0x80600000-0x80800000 2M ro NX SHD
0x80800000-0xbe000000 984M RW NX SHD
Inspired by https://lkml.org/lkml/2016/1/19/494 based on work by the
PaX Team, Brad Spengler, and Kees Cook.
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathan_lynch@mentor.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1455748879-21872-8-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit 7cca071ccb upstream.
The new __ro_after_init section should be writable before init, but
not after. Validate that it gets updated at init and can't be written
to afterwards.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit c74ba8b348 upstream.
One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.
Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.
This introduces __ro_after_init as a way to mark such memory, and adds
some documentation about the existing __read_mostly marking.
This improves the security of the Linux kernel by marking formerly
read-write memory regions as read-only on a fully booted up system.
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit 9ccaf77cf0 upstream.
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.
This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit d2aa1acad2 upstream.
It may be useful to debug writes to the readonly sections of memory,
so provide a cmdline "rodata=off" to allow for this. This can be
expanded in the future to support "log" and "write" modes, but that
will need to be architecture-specific.
This also makes KDB software breakpoints more usable, as read-only
mappings can now be disabled on any kernel.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit 4b7b1ef2c2 upstream.
The ld-version.sh script fails on some versions of awk with the
following error, resulting in build failures for MIPS:
awk: scripts/ld-version.sh: line 4: regular expression compile failed (missing '(')
This is due to the regular expression ".*)", meant to strip off the
beginning of the ld version string up to the close bracket, however
brackets have a meaning in regular expressions, so lets escape it so
that awk doesn't expect a corresponding open bracket.
Fixes: ccbef1674a ("Kbuild, lto: add ld-version and ld-ifversion ...")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Michal Marek <mmarek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12838/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f54ab5ff5 upstream.
This patch fixes a recent ABORT_TASK regression associated
with commit febe562c, where a left-over target_put_sess_cmd()
would still be called when __target_check_io_state() detected
a command has already been completed, and explicit ABORT must
be avoided.
Note commit febe562c dropped the local kref_get_unless_zero()
check in core_tmr_abort_task(), but did not drop this extra
corresponding target_put_sess_cmd() in the failure path.
So go ahead and drop this now bogus target_put_sess_cmd(),
and avoid this potential use-after-free.
Reported-by: Dan Lane <dracodan@gmail.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90d0f0f115 upstream.
For !BIO_CLONED bio, we can use .bi_vcnt safely, but it
doesn't mean we can just simply return .bi_io_vec[.bi_vcnt - 1]
because the start postion may have been moved in the middle of
the bvec, such as splitting in the middle of bvec.
Fixes: 7bcd79ac50d9(block: bio: introduce helpers to get the 1st and last bvec)
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d825c06bfe upstream.
When calculate_cpu_foreign_map() recalculates the cpu_foreign_map
cpumask it uses the local variable temp_foreign_map without initialising
it to zero. Since the calculation only ever sets bits in this cpumask
any existing bits at that memory location will remain set and find their
way into cpu_foreign_map too. This could potentially lead to cache
operations suboptimally doing smp calls to multiple VPEs in the same
core, even though the VPEs share primary caches.
Therefore initialise temp_foreign_map using cpumask_clear() before use.
Fixes: cccf34e941 ("MIPS: c-r4k: Fix cache flushing for MT cores")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12759/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a50e4688d upstream.
The MIPS_GIC_IPI should only be selected when MIPS_GIC is also
selected, otherwise it results in a compile error. smp-gic.c uses some
functions from include/linux/irqchip/mips-gic.h like
plat_ipi_call_int_xlate() which are only added to the header file when
MIPS_GIC is set. The Lantiq SoC does not use the GIC, but supports SMP.
The calls top the functions from smp-gic.c are already protected by
some #ifdefs
The first part of this was introduced in commit 72e20142b2 ("MIPS:
Move GIC IPI functions out of smp-cmp.c")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12774/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce9113bbcb upstream.
ovl_remove_upper() should do d_drop() only after it successfully
removes the dir, otherwise a subsequent getcwd() system call will
fail, breaking userspace programs.
This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491
Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b81de061fa upstream.
Overlayfs must update uid/gid after chown, otherwise functions
like inode_owner_or_capable() will check user against stale uid.
Catched by xfstests generic/087, it chowns file and calls utimes.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39680f50ae upstream.
The exit path will do some final updates to the VM of an exiting process
to inform others of the fact that the process is going away.
That happens, for example, for robust futex state cleanup, but also if
the parent has asked for a TID update when the process exits (we clear
the child tid field in user space).
However, at the time we do those final VM accesses, we've already
stopped accepting signals, so the usual "stop waiting for userfaults on
signal" code in fs/userfaultfd.c no longer works, and the process can
become an unkillable zombie waiting for something that will never
happen.
To solve this, just make handle_userfault() abort any user fault
handling if we're already in the exit path past the signal handling
state being dead (marked by PF_EXITING).
This VM special case is pretty ugly, and it is possible that we should
look at finalizing signals later (or move the VM final accesses
earlier). But in the meantime this is a fairly minimally intrusive fix.
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c88c5d4373 upstream.
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
parameters and returned nothing. The call was updated to accept the
terminal number to flush, and returned various values depending on the
state of the output buffer.
The prototype has been updated and its usage in the OPAL kmsg dumper has
been modified to support its new behaviour as an incremental flush.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit affddff69c upstream.
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called. When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.
Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines. Before this
patch, however, only partial output would be printed during the timeout wait.
This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost. It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.
The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic(). As such, the "end Kernel panic"
message may still be truncated as follows:
>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not
This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f15838e9ca upstream.
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name. To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.
Backport to all stables to avoid breakage when people update their
binutils - mpe.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d74e766e19 upstream.
This reverts commit 39d4275058.
This caused a regression on some older hardware.
bug:
https://bugzilla.kernel.org/show_bug.cgi?id=113891
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e1490a385 upstream.
This is a port of the patch "drm/amdgpu: Fix error handling in amdgpu_flip_work_func."
to fix the following problem for radeon as well which was
reported against amdgpu:
The patch e1d09dc0ccc6: "drm/amdgpu: Don't hang in
amdgpu_flip_work_func on disabled crtc." from Feb 19, 2016, leads to
the following static checker warning, as reported by Dan Carpenter in
https://lists.freedesktop.org/archives/dri-devel/2016-February/101987.html
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:127 amdgpu_flip_work_func() warn: should this be 'repcnt == -1'
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:136 amdgpu_flip_work_func() error: double unlock 'spin_lock:&crtc->dev->event_lock'
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:136 amdgpu_flip_work_func() error: double unlock 'irqsave:flags'
This patch fixes both reported problems:
Change post-decrement of repcnt to pre-decrement, so
it can't underflow anymore, but still performs up to
three repetitions - three is the maximum one could
expect in practice.
Move the spin_unlock_irqrestore to where it actually
belongs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90e94b160c upstream.
The patch e1d09dc0ccc6: "drm/amdgpu: Don't hang in
amdgpu_flip_work_func on disabled crtc." from Feb 19, 2016, leads to
the following static checker warning, as reported by Dan Carpenter in
https://lists.freedesktop.org/archives/dri-devel/2016-February/101987.html
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:127 amdgpu_flip_work_func() warn: should this be 'repcnt == -1'
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:136 amdgpu_flip_work_func() error: double unlock 'spin_lock:&crtc->dev->event_lock'
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:136 amdgpu_flip_work_func() error: double unlock 'irqsave:flags'
This patch fixes both reported problems:
Change post-decrement of repcnt to pre-decrement, so
it can't underflow anymore, but still performs up to
three repetitions - three is the maximum one could
expect in practice.
Move the spin_unlock_irqrestore to where it actually
belongs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 256faedcfd upstream.
This reverts commit dbb17a21c1.
It turns out that commit can cause problems for systems with multiple
GPUs, and causes X to hang on at least a HP Pavilion dv7 with hybrid
graphics.
This got noticed originally in 4.4.4, where this patch had already
gotten back-ported, but 4.5-rc7 was verified to have the same problem.
Alexander Deucher says:
"It looks like you have a muxed system so I suspect what's happening is
that one of the display is being reported as connected for both the
IGP and the dGPU and then the desktop environment gets confused or
there some sort problem in the detect functions since the mux is not
switched to the dGPU. I don't see an easy fix unless Dave has any
ideas. I'd say just revert for now"
Reported-by: Jörg-Volker Peetz <jvpeetz@web.de>
Acked-by: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf70e5513d upstream.
"d1cd12108346: x86, pageattr: Prevent overflow in slow_virt_to_phys() for
X86_PAE" was unintentionally removed by the recent "34437e67a672: x86/mm: Fix
slow_virt_to_phys() to handle large PAT bit".
And, the variable 'phys_addr' was defined as "unsigned long" by mistake -- it should
be "phys_addr_t".
As a result, Hyper-V network driver in 32-PAE Linux guest can't work again.
Fixes: commit 34437e67a672: "x86/mm: Fix slow_virt_to_phys() to handle large PAT bit"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: olaf@aepfle.de
Cc: jasowang@redhat.com
Cc: driverdev-devel@linuxdriverproject.org
Cc: linux-mm@kvack.org
Cc: apw@canonical.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/1456394292-9030-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17e0521750 upstream.
The port nodes are documented as optional, treat them accordingly.
Reported-by: Martin Fuzzey <mfuzzey@parkeon.com>
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Fixes: 304e6be652 ("gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ec7bae8be upstream.
Public Action frames use special rules for how the BSSID field (Address
3) is set. A wildcard BSSID is used in cases where the transmitter and
recipient are not members of the same BSS. As such, we need to accept
Public Action frames with wildcard BSSID.
Commit db8e173245 ("mac80211: ignore frames between TDLS peers when
operating as AP") added a rule that drops Action frames to TDLS-peers
based on an Action frame having different DA (Address 1) and BSSID
(Address 3) values. This is not correct since it misses the possibility
of BSSID being a wildcard BSSID in which case the Address 1 would not
necessarily match.
Fix this by allowing mac80211 to accept wildcard BSSID in an Action
frame when in AP mode.
Fixes: db8e173245 ("mac80211: ignore frames between TDLS peers when operating as AP")
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9acc54beb4 upstream.
Just like for CCMP we need to check that for GCMP the fragments
have PNs that increment by one; the spec was updated to fix this
security issue and now has the following text:
The receiver shall discard MSDUs and MMPDUs whose constituent
MPDU PN values are not incrementing in steps of 1.
Adapt the code for CCMP to work for GCMP as well, luckily the
relevant fields already alias each other so no code duplication
is needed (just check the aliasing with BUILD_BUG_ON.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c36dd3eaf1 upstream.
RTS/CTS needs to be enabled if the rate is a fallback rate *or* if it's
a dual-stream rate and the sta is in dynamic SMPS mode.
Fixes: a3ebb4e1b7 ("mac80211: minstrel_ht: handle peers in dynamic SMPS")
Reported-by: Matías Richart <mrichart@fing.edu.uy>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a36b930e6 upstream.
The value 5000 was put here with the addition of the timeout field to
ieee80211_start_tx_ba_session. It was originally added in mac80211 to
save resources for drivers like iwlwifi, which only supports a limited
number of concurrent aggregation sessions.
Since iwlwifi does not use minstrel_ht and other drivers don't need
this, 0 is a better default - especially since there have been
recent reports of aggregation setup related issues reproduced with
ath9k. This should improve stability without causing any adverse
effects.
Acked-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>