Commit Graph

5702 Commits

Author SHA1 Message Date
Alexander A. Klimov
c8ed9fc9d2 powerpc: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200718103958.5455-1-grandmaster@al2klimov.de
2020-07-22 00:01:23 +10:00
Aneesh Kumar K.V
e0d8e991be powerpc/book3s64/kuap: Move UAMOR setup to key init function
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.

Move UAMOR initialization to key initialization code and remove
thread_struct.uamor because it is not used anymore.

Before commit: 4a4a5e5d2a ("powerpc/pkeys: key allocation/deallocation must not change pkey registers")
we used to update uamor based on key allocation and free.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V
000a42b35a powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
As we kexec across kernels that use AMR/IAMR for different purposes
we need to ensure that new kernels get kexec'd with a reset value
of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old
AMR value prevents access to key 0.

This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR
is not needed and the IAMR reset is partial (it doesn't do the reset
on secondary cpus) and is redundant with this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V
f7045a4511 powerpc/book3s64/pkeys: Use MMU_FTR_PKEY instead of pkey_disabled static key
Instead of pkey_disabled static key use mmu feature MMU_FTR_PKEY.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-17-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V
2daf298de7 powerpc/book3s64/pkeys: Use pkey_execute_disable_supported
Use pkey_execute_disable_supported to check for execute key support instead
of pkey_disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-16-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V
e10cc8715d powerpc/book3s64/kuep: Add MMU_FTR_KUEP
This will be used to enable/disable Kernel Userspace Execution
Prevention (KUEP).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-15-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
d3cd91fb8d powerpc/book3s64/pkeys: Add MMU_FTR_PKEY
Parse storage keys related device tree entry in early_init_devtree
and enable MMU feature MMU_FTR_PKEY if pkeys are supported.

MMU feature is used instead of CPU feature because this enables us
to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-14-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
3c8ab47362 powerpc/book3s64/pkeys: Make initial_allocation_mask static
initial_allocation_mask is not used outside this file.
Also mark reserved_allocation_mask and initial_allocation_mask __ro_after_init;

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-12-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
c529afd7cb powerpc/book3s64/pkeys: Convert pkey_total to num_pkey
num_pkey now represents max number of keys supported such that we return
to userspace 0 - num_pkey - 1.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-11-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
a4678d4b47 powerpc/book3s64/pkeys: Simplify pkey disable branch
Make the default value FALSE (pkey enabled) and set to TRUE when we
find the total number of keys supported to be zero.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-10-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
a24204c307 powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY
We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it
free.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-9-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V
ee8b39331f powerpc/book3s64/pkeys: Move pkey related bits in the linux page table
To keep things simple, all the pkey related bits are kept together
in linux page table for 64K config with hash translation. With hash-4k
kernel requires 4 bits to store slots details. This is done by overloading
some of the RPN bits for storing the slot details. Due to this PKEY_BIT0 on
the 4K config is used for storing hash slot details.

64K before

|....|RSV1| RSV2| RSV3 | RSV4 | RPN44| RPN43   |.... | RSV5|
|....| P4 |  P3 |  P2  |  P1  | Busy | HASHPTE |.... |  P0 |

after

|....|RSV1| RSV2| RSV3 | RSV4 | RPN44 | RPN43   |.... | RSV5 |
|....| P4 |  P3 |  P2  |  P1  | P0    | HASHPTE |.... | Busy |

4k before

|....| RSV1 | RSV2     | RSV3 | RSV4 | RPN44| RPN43.... | RSV5|
|....| Busy |  HASHPTE |  P2  |  P1  | F_SEC| F_GIX.... |  P0 |

after

|....| RSV1    | RSV2| RSV3 | RSV4 | Free | RPN43.... | RSV5 |
|....| HASHPTE |  P2 |  P1  |  P0  | F_SEC| F_GIX.... | BUSY |

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-5-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:57 +10:00
Aneesh Kumar K.V
b9658f83e7 powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-4-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:57 +10:00
Aneesh Kumar K.V
33699023f5 powerpc/book3s64/pkeys: Fixup bit numbering
This number the pkey bit such that it is easy to follow. PKEY_BIT0 is
the lower order bit. This makes further changes easy to follow.

No functional change in this patch other than linux page table for
hash translation now maps pkeys differently.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-3-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:57 +10:00
Santosh Sivaraj
c37a63afc4 powerpc/mce: Add MCE notification chain
Introduce notification chain which lets us know about uncorrected memory
errors(UE). This would help prospective users in pmem or nvdimm subsystem
to track bad blocks for better handling of persistent memory allocations.

Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
2020-07-20 22:57:56 +10:00
Aneesh Kumar K.V
af9d00e93a powerpc/mm/radix: Create separate mappings for hot-plugged memory
To enable memory unplug without splitting kernel page table
mapping, we force the max mapping size to the LMB size. LMB
size is the unit in which hypervisor will do memory add/remove
operation.

Pseries systems supports max LMB size of 256MB. Hence on pseries,
we now end up mapping memory with 2M page size instead of 1G. To improve
that we want hypervisor to hint the kernel about the hotplug
memory range. That was added that as part of

commit b6eca183e2 ("powerpc/kernel: Enables memory
hot-remove after reboot on pseries guests")

But PowerVM doesn't provide that hint yet. Once we get PowerVM
updated, we can then force the 2M mapping only to hot-pluggable
memory region using memblock_is_hotpluggable(). Till then
let's depend on LMB size for finding the mapping page size
for linear range.

With this change KVM guest will also be doing linear mapping with
2M page size.

The actual TLB benefit of mapping guest page table entries with
hugepage size can only be materialized if the partition scoped
entries are also using the same or higher page size. A guest using
1G hugetlbfs backing guest memory can have a performance impact with
the above change.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Fold in fix from Aneesh spotted by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709131925.922266-5-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:56 +10:00
Aneesh Kumar K.V
645d5ce2f7 powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
We can hit the following BUG_ON during memory unplug:

kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c000000000093308] pmd_fragment_free+0x48/0xc0
LR [c00000000147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc000008050000000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74

This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.

This can be fixed by allocating memory in PAGE_SIZE granularity
during early page table allocation. This makes sure a specific
page is not shared for another memblock allocation and we can
free them correctly on removing page-table pages.

Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:

1. 64T system
-------------
64T RAM would need 64G for vmemmap with struct page size being 64B.

128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K

2. 2G system
------------
2G RAM would need 2M for vmemmap with struct page size being 64B.

1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709131925.922266-2-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:56 +10:00
Michael Ellerman
ef9f7cfaa5 Merge branch 'fixes' into next
Merge our fixes branch, primarily to bring in the ebb selftests build
fix and the pkey fix, which is a dependency for some future work.
2020-07-18 22:43:55 +10:00
Anju T Sudhakar
77ca3951cc powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc
IMC trace-mode record has MSR[HV PR] bits added in the third DW.
These bits can be used to set the cpumode for the instruction pointer
captured in each sample.

Add support in kernel to use these bits to set the cpumode for
each sample.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713144623.508695-1-maddy@linux.ibm.com
2020-07-16 13:12:46 +10:00
YueHaibing
29d9407e10 powerpc/xive: Remove unused inline function xive_kexec_teardown_cpu()
commit e27e0a9465 ("powerpc/xive: Remove xive_kexec_teardown_cpu()")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715025040.33952-1-yuehaibing@huawei.com
2020-07-16 13:12:44 +10:00
Anton Blanchard
ade7667a98 powerpc: Add cputime_to_nsecs()
Generic code has a wrapper to implement cputime_to_nsecs() on top of
cputime_to_usecs() but we can easily return the full nanosecond
resolution directly.

Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713083601.1103978-1-anton@ozlabs.org
2020-07-16 13:12:43 +10:00
Balamuruhan S
e4208f1399 powerpc/ppc-opcode: Fold PPC_INST_* macros into PPC_RAW_* macros
Lots of PPC_INST_* macros are used only ever in PPC_* macros, fold
those PPC_INST_* into PPC_RAW_* to avoid using PPC_INST_*
accidentally.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Deal with PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-7-bala24@linux.ibm.com
2020-07-16 13:12:43 +10:00
Balamuruhan S
357c572948 powerpc/ppc-opcode: Reuse raw instruction macros to stringify
Wrap existing stringify macros to reuse raw instruction encoding
macros that are newly added.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-6-bala24@linux.ibm.com
2020-07-16 13:12:43 +10:00
Balamuruhan S
3a18123791 powerpc/ppc-opcode: Consolidate powerpc instructions from bpf_jit.h
Move macro definitions of powerpc instructions from bpf_jit.h to
ppc-opcode.h and adopt the users of the macros accordingly. `PPC_MR()`
is defined twice in bpf_jit.h, remove the duplicate one.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-5-bala24@linux.ibm.com
2020-07-16 13:12:42 +10:00
Balamuruhan S
1d33dd8408 powerpc/ppc-opcode: Move ppc instruction encoding from test_emulate_step
Few ppc instructions are encoded in test_emulate_step.c, consolidate
them and use it from ppc-opcode.h

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-3-bala24@linux.ibm.com
2020-07-16 13:12:42 +10:00
Balamuruhan S
db551f8cc6 powerpc/ppc-opcode: Introduce PPC_RAW_* macros for base instruction encoding
Introduce PPC_RAW_* macros to have all the bare encoding of ppc
instructions. Move `VSX_XX*()` and `TMRN()` macros up to reuse it.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-2-bala24@linux.ibm.com
2020-07-16 13:12:41 +10:00
Nathan Lynch
38c392cef1 powerpc/pseries: remove dlpar_cpu_readd()
dlpar_cpu_readd() is unused now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-18-nathanl@linux.ibm.com
2020-07-16 13:12:40 +10:00
Nathan Lynch
4abe60c644 powerpc/pseries: remove memory "re-add" implementation
dlpar_memory() no longer has any callers which pass
PSERIES_HP_ELOG_ACTION_READD. Remove this case and the corresponding
unreachable code.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-17-nathanl@linux.ibm.com
2020-07-16 13:12:40 +10:00
Nathan Lynch
cdf082c457 powerpc/numa: remove arch_update_cpu_topology
Since arch_update_cpu_topology() doesn't do anything on powerpc now,
remove it and associated dead code.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-15-nathanl@linux.ibm.com
2020-07-16 13:12:39 +10:00
Nathan Lynch
042ef7cc43 powerpc/numa: remove prrn_is_enabled()
All users of this prrn_is_enabled() are gone; remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-14-nathanl@linux.ibm.com
2020-07-16 13:12:39 +10:00
Nathan Lynch
1835303e56 powerpc/numa: remove start/stop_topology_update()
These APIs have become no-ops, so remove them and all call sites.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-12-nathanl@linux.ibm.com
2020-07-16 13:12:38 +10:00
Nathan Lynch
b1815aeac7 powerpc/numa: remove timed_topology_update()
timed_topology_update is a no-op now, so remove it and all call sites.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-11-nathanl@linux.ibm.com
2020-07-16 13:12:37 +10:00
Nathan Lynch
ec2fc2a9e9 powerpc/rtas: don't online CPUs for partition suspend
Partition suspension, used for hibernation and migration, requires
that the OS place all but one of the LPAR's processor threads into one
of two states prior to calling the ibm,suspend-me RTAS function:

  * the architected offline state (via RTAS stop-self); or
  * the H_JOIN hcall, which does not return until the partition
    resumes execution

Using H_CEDE as the offline mode, introduced by
commit 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU into
an appropriate offline state"), means that any threads which are
offline from Linux's point of view must be moved to one of those two
states before a partition suspension can proceed.

This was eventually addressed in commit 120496ac2d ("powerpc: Bring
all threads online prior to migration/hibernation"), which added code
to temporarily bring up any offline processor threads so they can call
H_JOIN. Conceptually this is fine, but the implementation has had
multiple races with cpu hotplug operations initiated from user
space[1][2][3], the error handling is fragile, and it generates
user-visible cpu hotplug events which is a lot of noise for a platform
feature that's supposed to minimize disruption to workloads.

With commit 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU
into an appropriate offline state") reverted, this code becomes
unnecessary, so remove it. Since any offline CPUs now are truly
offline from the platform's point of view, it is no longer necessary
to bring up CPUs only to have them call H_JOIN and then go offline
again upon resuming. Only active threads are required to call H_JOIN;
stopped threads can be left alone.

[1] commit a6717c01dd ("powerpc/rtas: use device model APIs and
    serialization during LPM")
[2] commit 9fb603050f ("powerpc/rtas: retry when cpu offline races
    with suspend/migration")
[3] commit dfd718a2ed ("powerpc/rtas: Fix a potential race between
    CPU-Offline & Migration")

Fixes: 120496ac2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
2020-07-16 13:12:35 +10:00
Nicholas Piggin
4d24e21cc6 powerpc/security: Allow for processors that flush the link stack using the special bcctr
If both count cache and link stack are to be flushed, and can be flushed
with the special bcctr, patch that in directly to the flush/branch nop
site.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-7-npiggin@gmail.com
2020-07-16 13:12:32 +10:00
Nicholas Piggin
70d7cdaf05 powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
2020-07-16 13:12:32 +10:00
Nicholas Piggin
1026798c64 powerpc/security: re-name count cache flush to branch cache flush
The count cache flush mostly refers to both count cache and link stack
flushing. As a first step to untangling these a bit, re-name the bits
that apply to both.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-2-npiggin@gmail.com
2020-07-16 13:12:31 +10:00
Aneesh Kumar K.V
76e6c73f33 powerpc/pmem: Update ppc64 to use the new barrier instruction.
pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-6-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:23 +10:00
Aneesh Kumar K.V
d358042793 powerpc/pmem: Add flush routines using new pmem store and sync instruction
Start using dcbstps; phwsync; sequence for flushing persistent memory range.
The new instructions are implemented as a variant of dcbf and hwsync and on
P8 and P9 they will be executed as those instructions. We avoid using them on
older hardware. This helps to avoid difficult to debug bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-4-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:22 +10:00
Aneesh Kumar K.V
32db09d992 powerpc/pmem: Add new instructions for persistent storage and sync
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-3-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:22 +10:00
Nicholas Piggin
dd3d9aa558 powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE
When platform doesn't support GTSE, let TLB invalidation requests
for radix guests be off-loaded to the host using H_RPT_INVALIDATE
hcall.

	[hcall wrapper, error path handling and renames]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703053608.12884-4-bharata@linux.ibm.com
2020-07-16 13:00:21 +10:00
Bharata B Rao
029ab30b4c powerpc/mm: Enable radix GTSE only if supported.
Make GTSE an MMU feature and enable it by default for radix.
However for guest, conditionally enable it if hypervisor supports
it via OV5 vector. Let prom_init ask for radix GTSE only if the
support exists.

Having GTSE as an MMU feature will make it easy to enable radix
without GTSE. Currently radix assumes GTSE is enabled by default.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703053608.12884-2-bharata@linux.ibm.com
2020-07-16 13:00:21 +10:00
Haren Myneni
6068e1a442 powerpc/vas: Report proper error code for address translation failure
P9 DD2 NX workbook (Table 4-36) says DMA controller uses CC=5
internally for translation fault handling. NX reserves CC=250 for
OS to notify user space when NX encounters address translation
failure on the request buffer. Not an issue in earlier releases
as NX does not get faults on kernel addresses.

This patch defines CSB_CC_FAULT_ADDRESS(250) and updates CSB.CC with
this proper error code for user space.

Fixes: c96c4436ab ("powerpc/vas: Update CSB and notify process for fault CRBs")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Added Fixes tag and fix typo in comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/019fd53e7538c6f8f332d175df74b1815ef5aa8c.camel@linux.ibm.com
2020-07-15 23:09:55 +10:00
Christophe Leroy
b506923ee4 Revert "powerpc/kasan: Fix shadow pages allocation failure"
This reverts commit d2a91cef9b.

This commit moved too much work in kasan_init(). The allocation
of shadow pages has to be moved for the reason explained in that
patch, but the allocation of page tables still need to be done
before switching to the final hash table.

First revert the incorrect commit, following patch redoes it
properly.

Fixes: d2a91cef9b ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable@vger.kernel.org
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Link: https://lore.kernel.org/r/3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu
2020-07-15 12:04:39 +10:00
Nicholas Piggin
0138ba5783 powerpc/64/signal: Balance return predictor stack in signal trampoline
Returning from an interrupt or syscall to a signal handler currently
begins execution directly at the handler's entry point, with LR set to
the address of the sigreturn trampoline. When the signal handler
function returns, it runs the trampoline. It looks like this:

    # interrupt at user address xyz
    # kernel stuff... signal is raised
    rfid
    # void handler(int sig)
    addis 2,12,.TOC.-.LCF0@ha
    addi 2,2,.TOC.-.LCF0@l
    mflr 0
    std 0,16(1)
    stdu 1,-96(1)
    # handler stuff
    ld 0,16(1)
    mtlr 0
    blr
    # __kernel_sigtramp_rt64
    addi    r1,r1,__SIGNAL_FRAMESIZE
    li      r0,__NR_rt_sigreturn
    sc
    # kernel executes rt_sigreturn
    rfid
    # back to user address xyz

Note the blr with no matching bl. This can corrupt the return
predictor.

Solve this by instead resuming execution at the signal trampoline
which then calls the signal handler. qtrace-tools link_stack checker
confirms the entire user/kernel/vdso cycle is balanced after this
patch, whereas it's not upstream.

Alan confirms the dwarf unwind info still looks good. gdb still
recognises the signal frame and can step into parent frames if it
break inside a signal handler.

Performance is pretty noisy, not a very significant change on a POWER9
here, but branch misses are consistently a lot lower on a
microbenchmark:

 Performance counter stats for './signal':

       13,085.72 msec task-clock                #    1.000 CPUs utilized
  45,024,760,101      cycles                    #    3.441 GHz
  65,102,895,542      instructions              #    1.45  insn per cycle
  11,271,673,787      branches                  #  861.372 M/sec
      59,468,979      branch-misses             #    0.53% of all branches

       12,989.09 msec task-clock                #    1.000 CPUs utilized
  44,692,719,559      cycles                    #    3.441 GHz
  65,109,984,964      instructions              #    1.46  insn per cycle
  11,282,136,057      branches                  #  868.585 M/sec
      39,786,942      branch-misses             #    0.35% of all branches

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
2020-07-15 11:08:27 +10:00
Christophe Leroy
105fb38124 powerpc/8xx: Modify ptep_get()
Move ptep_get() close to pte_update(), in an ifdef section already
dedicated to powerpc 8xx. This section contains explanation about
the layout of page table entries.

Also modify it to return 4 times the pte value instead of padding
with zeroes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9f2df6621fcaf9eba15fadc61c169d0c8e2fb849.1592481938.git.christophe.leroy@csgroup.eu
2020-06-22 20:37:33 +10:00
Christophe Leroy
03fd42d458 powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k
FIX_EARLY_DEBUG_BASE reserves a 128k area for debuging.

When page size is 256k, the calculation results in a 0 number of
pages, leading to the following failure:

  CC      arch/powerpc/kernel/asm-offsets.s
In file included from ./arch/powerpc/include/asm/nohash/32/pgtable.h:77:0,
                 from ./arch/powerpc/include/asm/nohash/pgtable.h:8,
                 from ./arch/powerpc/include/asm/pgtable.h:20,
                 from ./include/linux/pgtable.h:6,
                 from ./arch/powerpc/include/asm/kup.h:42,
                 from ./arch/powerpc/include/asm/uaccess.h:9,
                 from ./include/linux/uaccess.h:11,
                 from ./include/linux/crypto.h:21,
                 from ./include/crypto/hash.h:11,
                 from ./include/linux/uio.h:10,
                 from ./include/linux/socket.h:8,
                 from ./include/linux/compat.h:15,
                 from arch/powerpc/kernel/asm-offsets.c:14:
./arch/powerpc/include/asm/fixmap.h:75:2: error: overflow in enumeration values
  __end_of_permanent_fixed_addresses,
  ^
make[2]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1

Ensure the debug area is at least one page.

Fixes: b8e8efaa86 ("powerpc: reserve fixmap entries for early debug")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca8c9f8249f523b1fab873e67b81b11989d46553.1592207216.git.christophe.leroy@csgroup.eu
2020-06-22 14:19:12 +10:00
Linus Torvalds
7561393908 powerpc fixes for 5.8 #3
One fix for the interrupt rework we did last release which broke KVM-PR.
 
 Three commits fixing some fallout from the READ_ONCE() changes interacting badly
 with our 8xx 16K pages support, which uses a pte_t that is a structure of 4
 actual PTEs.
 
 A cleanup of the 8xx pte_update() to use the newly added pmd_off().
 
 A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is enabled.
 
 A minor fix for the SPU syscall generation.
 
 Thanks to:
   Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike Rapoport,
   Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7vNVsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgO3rD/46cXJQ9AMQqtZh3+sgWu95Zd6JOviL
 vfhWeH/kbt/p6OGPoLXoYChoFD44Mf7BmTEDflslYICrxvhu9zI2lYN+948zfrEY
 lIjjP+Dd6fr1D2o3+hnOOX/LHAVyyZJTsZp5i6ehTOXeUw8KOCF1ulVB3o5GgQK0
 I/0oewL/SXNFnZS5qLgF2/OFS/BH3OnDG6mpICxCetZC9mNbHrTzos403ijyrvcX
 AsE4JSzI2UM9kT0pWXLa9QR3RgfBZ4wtMrnKAwdGI/E+YqAa7TuHZatPDAqoCJYY
 aePEZdweaeLWHQaQYSqlNP7YLAHuSdvZ2SvU65c2EKaaXug9sZJImyboJl/fo0Xo
 EtZiVbfaTfqsyi7EVQnsLMFYmtquacXoUH//nIoTro4pRkeMsM94BiK2HISa+8Bs
 KGQxBsnK2UaTgWERZHiK2VaKY/Tl1vGs09u7R21GiE2aD25ly+/q1Uo+WUr6iRKh
 1v42AsH1VCeEZKAog43gBGOr7bCez8/90GNtTJnKTKndSRSybCH68ME/zBKdNACn
 A7M9E0CNNjTOQNJyQ2UhyiBJzUK6kT/5g+C4mEH5WG4FkO6YHT1JyEusvsfj6Oe9
 RwDr98iNuM8AhaT30XmUXithDAl6JA5+3S0OcC2bL2xQ0O/VBPGZhIzgSFU8T7BY
 qpDj8l/8zk64Fw==
 =qws5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - One fix for the interrupt rework we did last release which broke
   KVM-PR

 - Three commits fixing some fallout from the READ_ONCE() changes
   interacting badly with our 8xx 16K pages support, which uses a pte_t
   that is a structure of 4 actual PTEs

 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
   enabled

 - A minor fix for the SPU syscall generation

Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
Rapoport, Nicholas Piggin.

* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-21 10:02:53 -07:00
Linus Torvalds
eede2b9b3f libnvdimm for 5.8-rc2
- Fix the visibility of the region 'align' attribute. The new unit tests
   for region alignment handling caught a corner case where the alignment
   cannot be specified if the region is converted from static to dynamic
   provisioning at runtime.
 
 - Add support for device health retrieval for the persistent memory
   supported by the papr_scm driver. This includes both the standard
   sysfs "health flags" that the nfit persistent memory driver publishes
   and a mechanism for the ndctl tool to retrieve a health-command payload.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl7tMD8ACgkQHtKRamZ9
 iAJNdA//aoULVhq/ZEo+tiPPXwgC9yHPJJ+Oe3+pcV2saWNeSBuHu047IdSQayRi
 6VZgsQ3lLRlEmk5aUbK00T84tVtwN20fppw1H4XyqfyGg4hmW7O5HomY8ZyOAETc
 0lEJet3+jf4Ym36BxBvfFTU9fOwtvLum5EHUw3Dc1Xl7UZVtp2zb0lklZx5xU/xr
 /nKNkPktvj7ENtmIOKegPFC/eSeKtOmD9shzjFygLeG52ZU1n+TEvhm4LCUU7Z+Y
 e5s/Ny3LUWzPY+bESXxTI1XpLthWOR6T+yPkNp6JoZHzk3P6vKawMGzRrvI1bUai
 hOj7Bf6BeNnKqIXIIZOXO8/ISgveHbnvUp6Kk5Lq5OYzPuGn81zZhen3EfHkB0y2
 Dc+xyceDsTeTlu2RGcrGrV60hCo2+5DqSLodu+kUAX6btvhtdrEgt3kmWAZSctk2
 7gHy8H+54p5ocXS+eDqafX6qYGmQIdGbtfTtiTuZJKIIH+Dsh9+mcu3KtH1AwtpN
 bV+/sbjuWOMmMBNnlVRtqhy1xA9bSVSaodthl+iWpw1nKXFPSKdl3w8ngSeZsfFD
 eZnGa4hO5G02s96gozh1n61ye8KFJLk8gCjx100PQKCemvIWTWLIzlh2NjPnCdTx
 SAI0vNfJgpmNZfb62NHYAGFwNwvOn4b7j9xFSqtHVh7+Vuyt6JM=
 =qVy4
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "A feature (papr_scm health retrieval) and a fix (sysfs attribute
  visibility) for v5.8.

  Vaibhav explains in the merge commit below why missing v5.8 would be
  painful and I agreed to try a -rc2 pull because only cosmetics kept
  this out of -rc1 and his initial versions were posted in more than
  enough time for v5.8 consideration:

   'These patches are tied to specific features that were committed to
    customers in upcoming distros releases (RHEL and SLES) whose
    time-lines are tied to 5.8 kernel release.

    Being able to track the health of an nvdimm is critical for our
    customers that are running workloads leveraging papr-scm nvdimms.
    Missing the 5.8 kernel would mean missing the distro timelines and
    shifting forward the availability of this feature in distro kernels
    by at least 6 months'

  Summary:

   - Fix the visibility of the region 'align' attribute.

     The new unit tests for region alignment handling caught a corner
     case where the alignment cannot be specified if the region is
     converted from static to dynamic provisioning at runtime.

   - Add support for device health retrieval for the persistent memory
     supported by the papr_scm driver.

     This includes both the standard sysfs "health flags" that the nfit
     persistent memory driver publishes and a mechanism for the ndctl
     tool to retrieve a health-command payload"

* tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/region: always show the 'align' attribute
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  seq_buf: Export seq_buf_printf
  powerpc: Document details on H_SCM_HEALTH hcall
2020-06-20 13:13:21 -07:00
Christophe Leroy
c0e1c8c22b powerpc/8xx: Provide ptep_get() with 16k pages
READ_ONCE() now enforces atomic read, which leads to:

  CC      mm/gup.o
In file included from ./include/linux/kernel.h:11:0,
                 from mm/gup.c:2:
In function 'gup_hugepte.constprop',
    inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
  compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
  ^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
  compiletime_assert_rwonce_type(x);    \
  ^
mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
  pte = READ_ONCE(*ptep);
        ^
In function 'gup_get_pte',
    inlined from 'gup_pte_range' at mm/gup.c:2228:9,
    inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
    inlined from 'gup_pud_range' at mm/gup.c:2641:15,
    inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
    inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
    inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
    prefix ## suffix();    \
    ^
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
  compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
  ^
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
  compiletime_assert_rwonce_type(x);    \
  ^
mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
  return READ_ONCE(*ptep);
         ^
make[2]: *** [mm/gup.o] Error 1

Define ptep_get() on 8xx when using 16k pages.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/341688399c1b102756046d19ea6ce39db1ae4742.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:54 +10:00
Linus Torvalds
0c389d89ab maccess: make get_kernel_nofault() check for minimal type compatibility
Now that we've renamed probe_kernel_address() to get_kernel_nofault()
and made it look and behave more in line with get_user(), some of the
subtle type behavior differences end up being more obvious and possibly
dangerous.

When you do

        get_user(val, user_ptr);

the type of the access comes from the "user_ptr" part, and the above
basically acts as

        val = *user_ptr;

by design (except, of course, for the fact that the actual dereference
is done with a user access).

Note how in the above case, the type of the end result comes from the
pointer argument, and then the value is cast to the type of 'val' as
part of the assignment.

So the type of the pointer is ultimately the more important type both
for the access itself.

But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
differently.  When you do

        get_kernel_nofault(val, kernel_ptr);

it behaves like

        val = *(typeof(val) *)kernel_ptr;

except, of course, for the fact that the actual dereference is done with
exception handling so that a faulting access is suppressed and returned
as the error code.

But note how different the casting behavior of the two superficially
similar accesses are: one does the actual access in the size of the type
the pointer points to, while the other does the access in the size of
the target, and ignores the pointer type entirely.

Actually changing get_kernel_nofault() to act like get_user() is almost
certainly the right thing to do eventually, but in the meantime this
patch adds logit to at least verify that the pointer type is compatible
with the type of the result.

In many cases, this involves just casting the pointer to 'void *' to
make it obvious that the type of the pointer is not the important part.
It's not how 'get_user()' acts, but at least the behavioral difference
is now obvious and explicit.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18 12:10:37 -07:00