Disable periodic stats update background thread and update stats in
background on demand when ndo_get_stats is called.
Having a background thread running in the driver all the time is bad for
power consumption and normally a user space daemon will query the stats
once every specific interval, so ideally the background thread and its
interval can be done in user space..
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
A per-ring counter for NOP operations already exists.
Here I add a counter that sums them up.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add ethtool counter to indicate the number of strides consumed
by filler CQEs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-channel and global ethtool counters for channel events.
Each event indicates an interrupt on one of the channel's
completion queues.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for congested UMR requests.
These events indicate congestion in UMR handlers in HW.
Such event is concluded when there's an outstanding UMR post,
yet the SW consumed at least two additional MPWQEs in the meanwhile.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-channel and global ethtool counters for NAPI.
This helps us monitor and analyze performance in general.
- ch[i]_poll:
the number of times the channel's NAPI poll was invoked.
- ch[i]_arm:
the number of times the channel's NAPI poll completed
and armed the completion queues.
- ch[i]_aff_change:
the number of times the channel's NAPI poll explicitly
stopped execution on a cpu due to a change in affinity.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for XDP_TX completions.
This helps us monitor and analyze XDP_TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for TX completions.
This helps us monitor and analyze TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Local variable 'wq' already points to &sq->wq, use it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Replace calls to kzalloc_node with kvzalloc_node, as it fallsback
to lower-order pages if the higher-order trials fail.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch adds a counter for tx UDP GSO packets that contain a segment
that is not aligned to MSS - remaining segment.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch enables UDP GSO support. We enable this by using two WQEs
the first is a UDP LSO WQE for all segments with equal length, and the
second is for the last segment in case it has different length.
Due to HW limitation, before sending, we must adjust the packet length fields.
We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
We compare single stream UDP, UDP GSO and UDP GSO with offload.
Performance:
| MSS (bytes) | Throughput (Gbps) | CPU utilization (%)
UDP GSO offload | 1472 | 35.6 | 8%
UDP GSO | 1472 | 25.5 | 17%
UDP | 1472 | 10.2 | 17%
UDP GSO offload | 1024 | 35.6 | 8%
UDP GSO | 1024 | 19.2 | 17%
UDP | 1024 | 5.7 | 17%
UDP GSO offload | 512 | 33.8 | 16%
UDP GSO | 512 | 10.4 | 17%
UDP | 512 | 3.5 | 17%
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Coldfire still provides its own variant of the clk API rather than using
the generic COMMON_CLK API. This generally works, but it causes some
link errors with drivers using the clk_round_rate(), clk_set_rate(),
clk_set_parent(), or clk_get_parent() functions when a platform lacks
those interfaces.
This adds empty stub implementations for each of them, and I don't even
try to do something useful here but instead just print a WARN() message
to make it obvious what is going on if they ever end up being called.
The drivers that call these won't be used on these platforms (otherwise
we'd get a link error today), so the added code is harmless bloat and
will warn about accidental use.
Based on commit bd7fefe1f0 ("ARM: w90x900: normalize clk API").
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19503/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
The VDSO Makefile filters CFLAGS to select a subset which it uses whilst
building the VDSO ELF. One of the flags it allows through is the -march=
flag that selects the architecture/ISA to target.
Unfortunately in cases where CONFIG_CPU_MIPS32_R{1,2}=y and the
toolchain defaults to building for MIPS64, the main MIPS Makefile ends
up using the short-form -<arch> flags in cflags-y. This is because the
calls to cc-option always fail to use the long-form -march=<arch> flag
due to the lack of an -mabi=<abi> flag in KBUILD_CFLAGS at the point
where the cc-option function is executed. The resulting GCC invocation
is something like:
$ mips64-linux-gcc -Werror -march=mips32r2 -c -x c /dev/null -o tmp
cc1: error: '-march=mips32r2' is not compatible with the selected ABI
These short-form -<arch> flags are dropped by the VDSO Makefile's
filtering, and so we attempt to build the VDSO without specifying any
architecture. This results in an attempt to build the VDSO using
whatever the compiler's default architecture is, regardless of whether
that is suitable for the kernel configuration.
One encountered build failure resulting from this mismatch is a
rejection of the sync instruction if the kernel is configured for a
MIPS32 or MIPS64 r1 or r2 target but the toolchain defaults to an older
architecture revision such as MIPS1 which did not include the sync
instruction:
CC arch/mips/vdso/gettimeofday.o
/tmp/ccGQKoOj.s: Assembler messages:
/tmp/ccGQKoOj.s:273: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:329: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:520: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:714: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1009: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1066: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1114: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1279: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1334: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1374: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1459: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1514: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:1814: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:2002: Error: opcode not supported on this processor: mips1 (mips1) `sync'
/tmp/ccGQKoOj.s:2066: Error: opcode not supported on this processor: mips1 (mips1) `sync'
make[2]: *** [scripts/Makefile.build:318: arch/mips/vdso/gettimeofday.o] Error 1
make[1]: *** [scripts/Makefile.build:558: arch/mips/vdso] Error 2
make[1]: *** Waiting for unfinished jobs....
This can be reproduced for example by attempting to build
pistachio_defconfig using Arnd's GCC 8.1.0 mips64 toolchain from
kernel.org:
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/x86_64-gcc-8.1.0-nolibc-mips64-linux.tar.xz
Resolve this problem by using the long-form -march=<arch> in all cases,
which makes it through the arch/mips/vdso/Makefile's filtering & is thus
consistently used to build both the kernel proper & the VDSO.
The use of cc-option to prefer the long-form & fall back to the
short-form flags makes no sense since the short-form is just an
abbreviation for the also-supported long-form in all GCC versions that
we support building with. This means there is no case in which we have
to use the short-form -<arch> flags, so we can simply remove them.
The manual redefinition of _MIPS_ISA is removed naturally along with the
use of the short-form flags that it accompanied, and whilst here we
remove the separate assembler ISA selection. I suspect that both of
these were only required due to the mips32 vs mips2 mismatch that was
introduced by commit 59b3e8e9aa ("[MIPS] Makefile crapectomy.") and
fixed but not cleaned up by commit 9200c0b2a0 ("[MIPS] Fix Makefile
bugs for MIPS32/MIPS64 R1 and R2.").
I've marked this for backport as far as v4.4 where the MIPS VDSO was
introduced. In earlier kernels there should be no ill effect to using
the short-form flags.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: James Hogan <jhogan@kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/19579/
random_ether_addr is a #define for eth_random_addr which is
generally preferred in kernel code by ~3:1
Convert the uses of random_ether_addr to enable removing the #define
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19600/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Annotate cpu_wait implementations using the __cpuidle macro which
places these functions in the .cpuidle.text section. This allows
cpu_in_idle() to return true for PC values which fall within these
functions, allowing nmi_backtrace() to produce cleaner output for CPUs
running idle functions. For example:
# echo l >/proc/sysrq-trigger
[ 38.587170] sysrq: SysRq : Show backtrace of all active CPUs
[ 38.593657] NMI backtrace for cpu 1
[ 38.597611] CPU: 1 PID: 161 Comm: sh Not tainted 4.18.0-rc1+ #27
[ 38.604306] Stack : 00000000 00000004 00000006 80486724 00000000 00000000 00000000 00000000
[ 38.613647] 80e17eda 00000034 00000000 00000000 80d20000 80b67e98 8e559c90 0ffe1e88
[ 38.622986] 00000000 00000000 80e70000 00000000 8f61db18 38312e34 722d302e 202b3163
[ 38.632324] 8e559d3c 8e559adc 00000001 6b636162 80d20000 80000000 00000000 80d1cfa4
[ 38.641664] 00000001 80d20000 80d19520 00000000 00000003 80836724 00000004 80e10004
[ 38.650993] ...
[ 38.653724] Call Trace:
[ 38.656499] [<8040cdd0>] show_stack+0xa0/0x144
[ 38.661475] [<80b67e98>] dump_stack+0xe8/0x120
[ 38.666455] [<80b6f6d4>] nmi_cpu_backtrace+0x1b4/0x1cc
[ 38.672189] [<80b6f81c>] nmi_trigger_cpumask_backtrace+0x130/0x1e4
[ 38.679081] [<808295d8>] __handle_sysrq+0xc0/0x180
[ 38.684421] [<80829b84>] write_sysrq_trigger+0x50/0x64
[ 38.690176] [<8061c984>] proc_reg_write+0xd0/0xfc
[ 38.695447] [<805aac1c>] __vfs_write+0x54/0x194
[ 38.700500] [<805aaf24>] vfs_write+0xe0/0x18c
[ 38.705360] [<805ab190>] ksys_write+0x7c/0xf0
[ 38.710238] [<80416018>] syscall_common+0x34/0x58
[ 38.715558] Sending NMI from CPU 1 to CPUs 0,2-3:
[ 38.720916] NMI backtrace for cpu 0 skipped: idling at r4k_wait_irqoff+0x2c/0x34
[ 38.729186] NMI backtrace for cpu 3 skipped: idling at r4k_wait_irqoff+0x2c/0x34
[ 38.737449] NMI backtrace for cpu 2 skipped: idling at r4k_wait_irqoff+0x2c/0x34
Without this we get register value & backtrace output from all CPUs,
which is generally useless for those running the idle function & serves
only to overwhelm & obfuscate the meaningful output from non-idle CPUs.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/19598/
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled,
causing the CPU that's attempting to generate the backtrace output
to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for
example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - ie. if it's hung - and we print a warning to
the console in this case.
I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b768 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5c ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Fixes: 856839b768 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5c ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
The other day I was testing one of the HP laptops at my office with an
i915/amdgpu hybrid setup and noticed that hotplugging was non-functional
on almost all of the display outputs. I eventually discovered that all
of the external outputs were connected to the amdgpu device instead of
i915, and that the hotplugs weren't being detected so long as the GPU
was in runtime suspend. After some talking with folks at AMD, I learned
that amdgpu is actually supposed to support hotplug detection in runtime
suspend so long as the OEM has implemented it properly in the firmware.
On this HP ZBook 15 G4 (the machine in question), amdgpu wasn't managing
to find the ATIF handle at all despite the fact that I could see acpi
events being sent in response to any hotplugging. After going through
dumps of the firmware, I discovered that this machine did in fact
support ATIF, but that it's ATIF method lived in an entirely different
namespace than this device's handle (the device handle was
\_SB_.PCI0.PEG0.PEGP, but ATIF lives in ATPX's handle at
\_SB_.PCI0.GFX0).
So, fix this by probing ATPX's ACPI parent's namespace if we can't find
ATIF elsewhere, along with storing a pointer to the proper handle to use
for ATIF and using that instead of the device's handle.
This fixes HPD detection while in runtime suspend for this ZBook!
v2: Update the comment to reflect how the namespaces are arranged
based on the system configuration. (Alex)
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Since it seems that some vendors are storing the ATIF ACPI methods under
the same handle that ATPX lives under instead of the device's own
handle, we're going to need to be able to retrieve this handle later so
we can probe for ATIF there.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Currently, there is nothing in amdgpu that actually uses these structs
other than amdgpu_acpi.c. Additionally, since we're about to start
saving the correct ACPI handle to use for calling ATIF in this struct
this saves us from having to handle making sure that the acpi_handle
(and by proxy, the type definition for acpi_handle and all of the other
acpi headers) doesn't need to be included within the amdgpu_drv struct
itself. This follows the example set by amdgpu_atpx_handler.c.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
- A single fix in meson for an unhandled error path in meson_drv_bind_master().
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAls0rxUACgkQ/lWMcqZw
E8NZIQ/6An8Mtj+cqLu6kBRNOFXpaWdouAy/LfHICrIA3lhNt7D6ANNs+H7Po6uO
d+S18rliimcxPrxAO3LXPoSk3PNDnScBAgZFTLHaQflcRKwGjHCcSeTAKzBFR/Ek
7Nl3rR2dD14atQ4Z7sdcXEpr7jMyK/7n8qCcYLn6EG1scolH6Rk+SWBiLnmyCylZ
BjfNzZcOoiP9RjoyOJMOUH46AQ+AOTeTaY5lTZbmHrNJR1DjttRrymaWbaJQfHsE
2AndEUjEEhr8NSVASi/RL6ds7q9jcNqbudCJXji9I8Y+BWaCUKG29jNzd2Tg3pC6
wgFzztxHzMRctCkuxbsJ6M0XGP5thj2/6uPqU0jryj27S9fh0ptm6nhaGA4RH0uS
nstZWZlA7TTyfaiJVxyKFwSoHhdDzOlhyoLhYRS1oStC88KfJwAPfvvk/vfYhEb/
IUME201f8PAM+O+0nyiw+cQXQsmSR/XZ8TPUgojZu6nzYPd4Lb/Yffk7THw/QMO1
1cV18uzlRE52q1QK7fl8+rCa0PZN/lpRC1do7qRgAZExwu4+NN0jOWqiPLoEWNA6
KPDao27gFFZoYeNBF1mN7nbM1ENQKCuCWzZIsN1BZpnxDF7X8GgvZWjMn63NQmBV
2U/woZ4FmNWaj4swf6qQUP/9r1Gbayue8rOyv9lqzVl0bmjQtvA=
=zPj3
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2018-06-28' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v4.18-rc3:
- A single fix in meson for an unhandled error path in meson_drv_bind_master().
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fa740f31-5a8d-ed45-5e8a-aecd3f6f11b7@linux.intel.com
A few fixes for 4.18:
- fix a read past the end of an array due to vega20 changes
- fix driver on systems with non-4K pages
- fix locking with pageflipping in DC that could lead to a sleep while atomic
- fix VCN firmware version reporting for upcoming firmware
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628032641.2765-1-alexander.deucher@amd.com
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX
flag is set on the device's request queue to decide whether or not the
device supports filesystem DAX. Really we should be using
bdev_dax_supported() like filesystems do at mount time. This performs
other tests like checking to make sure the dax_direct_access() path works.
We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if
any of the underlying devices do not support DAX. This makes the handling
of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags
in dm_table_set_restrictions().
Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this
will ensure that filesystems built upon DM devices will only be able to
mount with DAX if all underlying devices also support DAX.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: commit 545ed20e6d ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This
is needed for DM configurations where the first element in the dm-linear or
dm-stripe target supports DAX, but other elements do not. Without this
check __bdev_dax_supported() will pass for such devices, letting a
filesystem on that device mount with the DAX option.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: commit 545ed20e6d ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
QUEUE_FLAG_DAX is an indication that a given block device supports
filesystem DAX and should not be set for PMEM namespaces which are in "raw"
mode. These namespaces lack struct page and are prevented from
participating in filesystem DAX as of commit 569d0365f5 ("dax: require
'struct page' by default for filesystem dax").
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 569d0365f5 ("dax: require 'struct page' by default for filesystem dax")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
During assemble, the spare marked for replacement is not checked.
conf->fullsync cannot be updated to be 1. As a result, recovery will
treat it as a clean array. All recovering sectors are skipped. Original
device is replaced with the not-recovered spare.
mdadm -C /dev/md0 -l10 -n4 -pn2 /dev/loop[0123]
mdadm /dev/md0 -a /dev/loop4
mdadm /dev/md0 --replace /dev/loop0
mdadm -S /dev/md0 # stop array during recovery
mdadm -A /dev/md0 /dev/loop[01234]
After reassemble, you can see recovery go on, but it completes
immediately. In fact, recovery is not actually processed.
To solve this problem, we just add the missing logics for replacment
spares. (In raid1.c or raid5.c, they have already been checked.)
Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
If we have more interrupts pending (because we know there are more
breadcrumb signals before the completion), then we do not need to
trigger an irq_seqno_barrier or even wakeup the task on this interrupt
as there will be another. To allow some margin of error (we are trying
to work around incoherent seqno after all), we wakeup the breadcrumb
before the target as well as on the target.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-2-chris@chris-wilson.co.uk
By taking advantage of the RCU protection of the task struct, we can find
the appropriate signaler under the spinlock and then release the spinlock
before waking the task and signaling the fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-1-chris@chris-wilson.co.uk
At the moment, gem_exec_gttfill fails with a sporadic EBUSY due to us
wanting to unbind a pinned batch. Let's dump who first bound that vma to
see if that helps us identify who still unexpectedly has it pinned.
v2: We cannot allocate inside the printer (as it may be on an fs-reclaim
path), so hope for the best and build the string on the stack
v3: stack depth of 16 routinely overflows a 512 character string, limit
it to 12 to avoid unsightly truncation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628132206.8329-1-chris@chris-wilson.co.uk
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJbMkMKAAoJEFKgDEdIgJTy8PMP+QEF+x9/+wi2AWqm2cnmgm9t
eXyzejBEEyFNVQNu5ndhXujfsBO+BE+ZyGKcbV9jGehz43IWGT8YD05r3w1yHIpq
IReWc6AiDIs0sWKjliYJKm1GWv10zfEb5hXiwaQsdQBXqj5vSEb+qQFI8noxvnp7
Ogk2V32Et6+ZCI6FA8858vVq8vyDGEJG/xNwPB3ANdOOvQoTGd4SmNneovUY5kzA
yOYViATwH28bEP/x6p4WDO282uryQhMopkIWpUIbZ08WmHxHg8KYlHCk/IOBEhoG
gxU38YDZMkEolh2Ptgd8y2VVMu2YBeVf3N+bPoxalUsfvIjgMGNXDr37hL6+bsc0
gXRiHLtAyIwBKeH+eTzj0phNwZ/JTTmqsoI0JAmu2x2CVlWBo2VsGtw0um9GTYmt
eZ4WdrZo7QmlfdQzvGdPe2OBcTLBqx9jZ3UyZvK882V88mMpmxew5jAZFxN1nqPu
NUI9grCd/H80gLEi5gjDSjCrKrOGqaBbInZ/pQb4ETLDfueGCoeYCOvVzNwdQklE
FheDcVMpuZOMliXI2jsYuGcMdlTRnUM0NDBh4NJaL+cjAgvAK7TQccejK88TInea
K1EE66PZnmDVlx+EIuIdm1rSLwGZHwfjEjn27siMKfsJKfh6qY5EleWO8s4qJNoD
ZVMU+0BWmiFAcemevcu9
=yyKY
-----END PGP SIGNATURE-----
Merge tag 'printk-for-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
"Revert a commit that went in by mistake. I already have a better fix
in the queue for 4.19"
* tag 'printk-for-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
Revert "lib/test_printf.c: call wait_for_random_bytes() before plain %p tests"
Over a dozen of changes, but all small and clear fixes.
A half of them are the regression fixes for CA0132 HD-audio codec,
and the rest are, again, a few more fixups for HD-audio, two UBSAN
fixes in the core ioctls, and a trivial fix in the error path
handling in lx6464es driver.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAls0CQUOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8S7A//U4T/cbxv+ozwnrtD0Flz962JM1sfe2d8hCp1
kQyS7ImYtiqx7UjZCm6nmwNMM4vnUDfSDoTB/OjAQgSIXsu/gme9FvmZeFbso4j2
9mIGaU5te2StylaUrwHyLt3OMBKZGJ6xPxXSI36Fe+YktWB1jlul7kmGxfw1PHCp
VHLTcEebgAhgJYPDlZFCu0XOZXRCjr4bKnwVSXA/HPMk+5kvDIP1wfcG5b5dC/6R
Q0y3tKJZqfIK13eivppdOYQ/0AvaognZXvCA3NeFTjmuDCe+9B1QNOqnzUba53TI
/EZDKmMU3wZ0UnO6NVnpFoFzxl7Z82qTAcOPXC+QSPTCzqk6j+vYuEx9TmZAsaE6
sOoTIXAFRdksMBcC4zh5KdhsspuPEtPeG2yuOtm/J/32Iome2G9pZd3aT5YpfqbI
sX0h7bDSLpgsvvueBaLimBgs0gpCUYE7AqLUlHPtSBF8Dl7mOKVz9vlXCI0v0Q4/
PHhPYA4XBKRnexTaj8qmv0WlhPQb3vXq9nVJ7LTvYCGJUIHgnXj7duWMrVBZvsOT
Bci8r9p9RB+LRsoGwAvZ9yP/q0TlnyCs0CwNRoBudOlU/u4SiGvav0mkhA58/czZ
JoMmhq5SL7RrDGPJ9e3Z+UVx+YNyG6abuKuuCm9WTCEYzmwoh9WElcjgdcl7THP6
RZCNdgg=
=yeOb
-----END PGP SIGNATURE-----
Merge tag 'sound-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Over a dozen changes, but all small and clear fixes.
Half of them are the regression fixes for CA0132 HD-audio codec, and
the rest are, again, a few more fixups for HD-audio, two UBSAN fixes
in the core ioctls, and a trivial fix in the error path handling in
lx6464es driver"
* tag 'sound-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: seq: Fix UBSAN warning at SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl
ALSA: timer: Fix UBSAN warning at SNDRV_TIMER_IOCTL_NEXT_DEVICE ioctl
ALSA: hda/realtek - Fix the problem of two front mics on more machines
ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210
ALSA: hda/ca0132: make array ca0132_alt_chmaps static
ALSA: hda - Force to link down at runtime suspend on ATI/AMD HDMI
ALSA: lx6464es: Missing error code in snd_lx6464es_create()
ALSA: hda/ca0132: Fix DMic data rate for Alienware M17x R4
ALSA: hda/ca0132: Restore PCM Analog Mic-In2
ALSA: hda/ca0132: Don't test for QUIRK_NONE
ALSA: hda/ca0132: Restore behavior of QUIRK_ALIENWARE
ALSA: hda/ca0132: Delete redundant UNSOL event requests
ALSA: hda/ca0132: Delete pointless assignments to struct auto_pin_cfg fields
ALSA: hda/realtek - Fix pop noise on Lenovo P50 & co
The driver was suggested for deletion as it implements a subset of the
netdev trigger. It's in the way for further cleanups in the trigger
code but doesn't get an Ack by someone who can actually test and confirm
that the netdev trigger works for can devices. So marking as broken to
get forward with the cleanups.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
The leds.txt was moved and renamed. Fix references to
it accordingly.
Fixes: f67605394f ("devicetree/bindings: Move gpio-leds binding into leds directory")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
All the triggers are defined in a big if LEDS_TRIGGERS...endif block.
So there is no need to let each driver depend on LEDS_TRIGGERS explicitly
once more.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
PC Engines apu3 is an improved version of the apu2, using the same SoC
and almost everything else.
This patch reuses as much as possible from the apu2 definitions, to
avoid redundancy.
Signed-off-by: Raffaello D. Di Napoli <rafdev@dinapo.li>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
- Add a quirk for a bunch of broken Macronix chips
- Fix nand_block_bad() when chip->ecc.read_oob() returns a positive
value encoding the number of bitflips
- Fix OOB handling in the MXC driver fo V2.1 controllers
- Flag the ONFI_FEATURE_ON_DIE_ECC as supported in the Micron driver
- Hardcode clk rate in the denali_dt driver to address a bad DT
representation (the proper fix will be queued for 4.19)
SPI NOR fixes:
- Add an ULL constant to some ID definitions so that the ID is not
truncated on 32-bit platforms
MTD fixes:
- Fix the sector unlocking logic in the CFI driver
-----BEGIN PGP SIGNATURE-----
iQI5BAABCAAjBQJbNJHjHBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ
Ze02AX4ItwBEFBAAw65L3Su2REWkqWe0x2lfT/OB61CUd7NlLLifjjxWj6ysRrlO
BiomFoeXITlDPNMWLMYygzm7e8Lf2+Nb59pM4aMS/V+Yech6HD6j8qld7IJrz7/U
YBRUNTKTfkc1jI2KothXWWLcltAtS0XzADTs+Lxn5BZ0a4idFay/iqeB/wDIwZ/T
dQi08OrlbZ/H3ggLN7PoCK+vRnamjpnLecYdkHSMNP/T0msKPT6UJxZaoTZURBlq
qeI6rClcwFlfjYFV70UchgjeD++rhE1cy14jO38dodbpPl3qoRiqsFi3kFHjn+a1
b+nJXIQWL5U1NzWiVNwxQHpoeRHU39Cpg1VxMcAjtkHJxnHKZ9C+dkYZXYAmchfv
QYD0cd7KlmYgHNBjYKlUeBdS/X1qnTYx7su/69YavxgPzscaFuOHS8AgctvmJJhc
dMxDJR29lgtKOOB+AsNo+8wRtZpEef3pE/s14QZmR7/71tIAxBVVlakWCH3ND3NJ
7lTL8BT8bAp0v2EcEMghyc71ryT0gCZdSHMOvJ/bQadnB//LTT9aXkDW2IdRHfQ0
DyBtnHlZxudYLZJ/uIB8zsoYY852rYYFaJ/H5gUafQdZ+8NWT7zYdlizXtzGxcEF
Eq8CxDdbKmVSwpic/Nfl5i5T7zs9BOST3UkG/yiimx9X+GlSAGTRaKv7JIw=
=BjBU
-----END PGP SIGNATURE-----
Merge tag 'mtd/fixes-for-4.18-rc3' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
"NAND fixes:
- add a quirk for a bunch of broken Macronix chips
- fix nand_block_bad() when chip->ecc.read_oob() returns a positive
value encoding the number of bitflips
- fix OOB handling in the MXC driver fo V2.1 controllers
- flag the ONFI_FEATURE_ON_DIE_ECC as supported in the Micron driver
- hardcode clk rate in the denali_dt driver to address a bad DT
representation (the proper fix will be queued for 4.19)
SPI NOR fixes:
- add an ULL constant to some ID definitions so that the ID is not
truncated on 32-bit platforms
MTD fixes:
- fix the sector unlocking logic in the CFI driver"
* tag 'mtd/fixes-for-4.18-rc3' of git://git.infradead.org/linux-mtd:
mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally
mtd: dataflash: Use ULL suffix for 64-bit constants
mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking.
mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary
mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips
mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock()
mtd: rawnand: All AC chips have a broken GET_FEATURES(TIMINGS).
mtd: rawnand: fix return value check for bad block status
mtd: rawnand: mxc: set spare area size register explicitly
mtd: rawnand: micron: add ONFI_FEATURE_ON_DIE_ECC to supported features
Since NFS allows open-by-fhandle, we have to cope with the possibility
of mkdir vs. open-by-guessed-handle races. A local filesystem could
decide what the inumber of the new object will be and insert a locked
inode with that inumber into icache _before_ the on-disk data structures
begin to look good and unlock it only once it has a dentry alias, so
that open-by-handle coming first would quietly fail and mkdir coming
first would have open-by-handle grab its dentry.
For NFS it's a non-starter - the icache key is server-supplied fhandle
and we do not get that until the object has been fully created on server.
We really have to deal with the possibility that open-by-handle gets
the in-core inode and attaches a dentry to it before mkdir does.
Solution: let nfs_mkdir() use d_splice_alias() to catch those. We can
* get an error. Just return it to our caller.
* get NULL - no preexisting dentry aliases, we'd just done what
d_add() would've done. Success.
* get a reference to preexisting alias. In that case the alias
had been moved in place of nfs_mkdir() argument (and hashed there), while
nfs_mkdir() argument is left unhashed negative. Which is just fine for
->mkdir() callers, all we need is to release the reference we'd got from
d_splice_alias() and report success.
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A new member Vr2_I2C_address is added.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The vbios firmware structure changed between v3_1 and v3_2. So,
the code to setup bootup values needs different paths based
on header version.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Thermal support is enabled on vega12.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The generic nmi_cpu_backtrace() function calls show_regs() when a struct
pt_regs is available, and dump_stack() otherwise. If we were to make use
of the generic nmi_cpu_backtrace() with MIPS' current implementation of
show_regs() this would mean that we see only register data with no
accompanying stack information, in contrast with our current
implementation which calls dump_stack() regardless of whether register
state is available.
In preparation for making use of the generic nmi_cpu_backtrace() to
implement arch_trigger_cpumask_backtrace(), have our implementation of
show_regs() call dump_stack() and drop the explicit dump_stack() call in
arch_dump_stack() which is invoked by arch_trigger_cpumask_backtrace().
This will allow the output we produce to remain the same after a later
patch switches to using nmi_cpu_backtrace(). It may mean that we produce
extra stack output in other uses of show_regs(), but this:
1) Seems harmless.
2) Is good for consistency between arch_trigger_cpumask_backtrace()
and other users of show_regs().
3) Matches the behaviour of the ARM & PowerPC architectures.
Marked for stable back to v4.9 as a prerequisite of the following patch
"MIPS: Call dump_stack() from show_regs()".
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19596/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Merge fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
proc: add Alexey to MAINTAINERS
kasan: depend on CONFIG_SLUB_DEBUG
include/linux/dax.h: dax_iomap_fault() returns vm_fault_t
x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
slub: fix failure when we delete and create a slab cache
Revert mm/vmstat.c: fix vmstat_update() preemption BUG
lib/percpu_ida.c: don't do alloc from per-CPU list if there is none
KASAN depends on having access to some of the accounting that SLUB_DEBUG
does; without it, there are immediate crashes [1]. So, the natural
thing to do is to make KASAN select SLUB_DEBUG.
[1] http://lkml.kernel.org/r/CAHmME9rtoPwxUSnktxzKso14iuVCWT7BE_-_8PAC=pGw1iJnQg@mail.gmail.com
Link: http://lkml.kernel.org/r/20180622154623.25388-1-Jason@zx2c4.com
Fixes: f9e13c0a5a ("slab, slub: skip unnecessary kasan_cache_shutdown()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1c8f422059 ("mm: change return type to vm_fault_t") missed a
conversion. It's not a big problem at present because mainline is still
using
typedef int vm_fault_t;
Fixes: 1c8f422059 ("mm: change return type to vm_fault_t")
Link: http://lkml.kernel.org/r/20180620172046.GA27894@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a kernel panic that is triggered when reading /proc/kpageflags
on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
BUG: unable to handle kernel paging request at fffffffffffffffe
PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
RIP: 0010:stable_page_flags+0x27/0x3c0
Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
Call Trace:
kpageflags_read+0xc7/0x120
proc_reg_read+0x3c/0x60
__vfs_read+0x36/0x170
vfs_read+0x89/0x130
ksys_pread64+0x71/0x90
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7efc42e75e23
Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
According to kernel bisection, this problem became visible due to commit
f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
which changes how struct pages are initialized.
Memblock layout affects the pfn ranges covered by node/zone. Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:
MEMBLOCK configuration:
memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x4
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...
If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:
MEMBLOCK configuration:
memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x3
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...
This causes shrinking node 0's pfn range because it is calculated by the
address range of memblock.memory. So some of struct pages in the gap
range are left uninitialized.
We have a function zero_resv_unavail() which does zeroing the struct pages
within the reserved unavailable range (i.e. memblock.memory &&
!memblock.reserved). This patch utilizes it to cover all unavailable
ranges by putting them into memblock.reserved.
Link: http://lkml.kernel.org/r/20180615072947.GB23273@hori1.linux.bs1.fc.nec.co.jp
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: "Herton R. Krzesinski" <herton@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In kernel 4.17 I removed some code from dm-bufio that did slab cache
merging (commit 21bb13276768: "dm bufio: remove code that merges slab
caches") - both slab and slub support merging caches with identical
attributes, so dm-bufio now just calls kmem_cache_create and relies on
implicit merging.
This uncovered a bug in the slub subsystem - if we delete a cache and
immediatelly create another cache with the same attributes, it fails
because of duplicate filename in /sys/kernel/slab/. The slub subsystem
offloads freeing the cache to a workqueue - and if we create the new
cache before the workqueue runs, it complains because of duplicate
filename in sysfs.
This patch fixes the bug by moving the call of kobject_del from
sysfs_slab_remove_workfn to shutdown_cache. kobject_del must be called
while we hold slab_mutex - so that the sysfs entry is deleted before a
cache with the same attributes could be created.
Running device-mapper-test-suite with:
dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/
triggered:
Buffer I/O error on dev dm-0, logical block 1572848, async page read
device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
device-mapper: thin: 253:1: aborting current metadata transaction
sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
Workqueue: dm-thin do_worker [dm_thin_pool]
Call Trace:
dump_stack+0x5a/0x73
sysfs_warn_dup+0x58/0x70
sysfs_create_dir_ns+0x77/0x80
kobject_add_internal+0xba/0x2e0
kobject_init_and_add+0x70/0xb0
sysfs_slab_add+0xb1/0x250
__kmem_cache_create+0x116/0x150
create_cache+0xd9/0x1f0
kmem_cache_create_usercopy+0x1c1/0x250
kmem_cache_create+0x18/0x20
dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
__create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
metadata_operation_failed+0x59/0x100 [dm_thin_pool]
alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
process_cell+0x2a3/0x550 [dm_thin_pool]
do_worker+0x28d/0x8f0 [dm_thin_pool]
process_one_work+0x171/0x370
worker_thread+0x49/0x3f0
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
kmem_cache_create(dm_bufio_buffer-16) failed with error -17
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>