Commit 3e86e4d74c ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it
from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and
ELF_DETAILS was present in all architecture specific vmlinux linker
scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and
COMMON_DISCARDS may be used by other linker scripts, such as the s390
and x86 compressed boot images, which may not expect to have a .modinfo
section. In certain circumstances, this could result in a bootloader
failing to load the compressed kernel [1].
Commit ddc6cbef3e ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with
SecureBoot trailer") recently addressed this for the s390 bzImage but
the same bug remains for arm, parisc, and x86. The presence of .modinfo
in the x86 bzImage was the root cause of the issue worked around with
commit d50f210913 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes
lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its
MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition.
Split .modinfo out from ELF_DETAILS into its own macro and handle it in
all vmlinux linker scripts. Discard .modinfo in the places where it was
previously being discarded from being in COMMON_DISCARDS, as it has
never been necessary in those uses.
Cc: stable@vger.kernel.org
Fixes: 3e86e4d74c ("kbuild: keep .modinfo section in vmlinux.unstripped")
Reported-by: Ed W <lists@wildgooses.com>
Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1]
Tested-by: Ed W <lists@wildgooses.com> # x86_64
Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
- Provide a new interface for dynamic configuration and deconfiguration of
hotplug memory, allowing with and without memmap_on_memory support. This
makes the way memory hotplug is handled on s390 much more similar to
other architectures
- Remove compat support. There shouldn't be any compat user space around
anymore, therefore get rid of a lot of code which also doesn't need to be
tested anymore
- Add stackprotector support. GCC 16 will get new compiler options, which
allow to generate code required for kernel stackprotector support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This removes
a lot of duplicated code. The new driver is also extendable and allows
to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area and
the kernel image are within the same 4GB area. This eliminates the need
of weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmktZioACgkQIg7DeRsp
bsK4Rw//VzkvHyzOtGKZ8Hb4S+Sh/PFlaZQXNhj+Xt5gWoOhP1uPmmhBe6LxjYaB
J9Ns3hpONQ1dTHV7VVkds8FvM/SBcGe8m5RpefmChC/bjm5UEOV/MppKtA0aLnEH
hJmdubIrrRAXKggxlHEfRLzBsFvV/rJ9Xf16FhRxGDc4pgmgkI1NPQ41/dyCHklQ
dB3YrFVPIETywVYYVB/G3h11JgF5Z6CKtjYCdSx72Fkbj65+6JPfcPgLKMpcJuPd
UxUXtCo1FCXlP70jsz8JQI8cdieG0KDQTtnZP4P/pqjQ3wirOqvMewNa9t9xmQ2e
p6Rc1Vx5DESkq9bRWtQEaprTVVzK7DDLH3RuZwB+uLrcLGD8JvVS6/m9n9CgzBMT
BnJXG2sLZH+gdQy+DSD/fVDD7OvIk8TGrH+OFwVIKhrT/J3B2E7ZSYyZZCNIS7VG
yiuypoDGYg3ZpYjH9+qOXWB3nc0vQWrlFzb1bsQu1omJGmunLv4jtTjAKGN82C33
auBsIYAlQW20X7DV0vZa59PwqwtBqtdQQcTidwtSztzKogRXAdK8KKHtN60JM4S2
7sWFOFCQaTChAeDNw6MF5EtULb551nwH2RtJ9x3CrJj+OGK6clbQNcxIA7Oy0veR
Sl9v1lMfeKOgDrPdDy3ArQBJ8WLlF9qX9wLKbiaNyIKmkz2ymkg=
=CNrb
-----END PGP SIGNATURE-----
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
of hotplug memory, allowing with and without memmap_on_memory
support. This makes the way memory hotplug is handled on s390 much
more similar to other architectures
- Remove compat support. There shouldn't be any compat user space
around anymore, therefore get rid of a lot of code which also doesn't
need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options,
which allow to generate code required for kernel stackprotector
support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This
removes a lot of duplicated code. The new driver is also extendable
and allows to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area
and the kernel image are within the same 4GB area. This eliminates
the need of weak per cpu variables. Get rid of
ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
s390/entry: Use lay instead of aghik
s390/vdso: Get rid of -m64 flag handling
s390/vdso: Rename vdso64 to vdso
s390: Rename head64.S to head.S
s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
s390: Add stackprotector support
s390/modules: Simplify module_finalize() slightly
s390: Remove KMSG_COMPONENT macro
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
s390/ap: Restrict driver_override versus apmask and aqmask use
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
s390/ap: Support driver_override for AP queue devices
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
s390/debug: Update description of resize operation
s390/syscalls: Switch to generic system call table generation
s390/syscalls: Remove system call table pointer from thread_struct
s390/uapi: Remove 31 bit support from uapi header files
s390: Remove compat support
tools: Remove s390 compat support
...
Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is within access registers 0 and 1.
Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access registers contents would have to be exchanged on every kernel entry
and exit.
With the upcoming gcc 16 release new compiler options will become available
which allow to generate code suitable for the kernel. [1]
Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.
Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.
To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.
Also add sanity checks and debugging option to be consistent with other
kernel code patching mechanisms.
Full debugging output can be enabled with the following kernel command line
options:
debug_stackprotector
bootdebug
ignore_loglevel
earlyprintk
dyndbg="file stackprotector.c +p"
Example debug output:
stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240
where "<insn address>: <old insn> -> <new insn>".
[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The s390 indirect thunks are placed in the .text.__s390_indirect_jump_*
sections.
Certain config options which enable -ffunction-sections have a custom
version of the TEXT_TEXT macro:
.text.[0-9a-zA-Z_]*
That unintentionally matches the thunk sections, causing them to get
grouped with normal text rather than being handled by their intended
rule later in the script:
*(.text.*_indirect_*)
Fix that by adding another period to the thunk section names, following
the kernel's general convention for distinguishing code-generated text
sections from compiler-generated ones.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
When building s390 defconfig with binutils older than 2.32, there are
several warnings during the final linking stage:
s390-linux-ld: .tmp_vmlinux1: warning: allocated section `.got.plt' not in segment
s390-linux-ld: .tmp_vmlinux2: warning: allocated section `.got.plt' not in segment
s390-linux-ld: vmlinux.unstripped: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: vmlinux: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: st7afZyb: warning: allocated section `.got.plt' not in segment
binutils commit afca762f598 ("S/390: Improve partial relro support for
64 bit") [1] in 2.32 changed where .got.plt is emitted, avoiding the
warning.
The :NONE in the .vmlinux.info output section description changes the
segment for subsequent allocated sections. Move .vmlinux.info right
above the discards section to place all other sections in the previously
defined segment, .data.
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=afca762f598d453c563f244cd3777715b1a0cb72 [1]
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Alexey Gladkov <legion@kernel.org>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251008-kbuild-fix-modinfo-regressions-v1-3-9fc776c5887c@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
In the upcoming changes, the ELF_DETAILS macro will be extended with
the ".modinfo" section, which will cause an error:
>> s390x-linux-ld: .tmp_vmlinux1: warning: allocated section `.modinfo' not in segment
>> s390x-linux-ld: .tmp_vmlinux2: warning: allocated section `.modinfo' not in segment
>> s390x-linux-ld: vmlinux.unstripped: warning: allocated section `.modinfo' not in segment
This happens because the .vmlinux.info use :NONE to override the default
segment and tell the linker to not put the section in any segment at all.
To avoid this, we need to change the sections order that will be placed
in the default segment.
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506062053.zbkFBEnJ-lkp@intel.com/
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://patch.msgid.link/20d40a7a3a053ba06a54155e777dcde7fdada1db.1758182101.git.legion@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
The current assumption is that kernel code is always executed with access
key zero, which means that storage key protection does not apply.
However this assumption is not correct: cmpxchg_user_key() may be executed
with a non-zero key; if then the storage key of the page which belongs to
the cmpxchg_user_key() code contains a key with fetch-protection enabled
the result is a protection exception.
For several performance optimizations storage keys are not initialized on
system boot. To keep these optimizations add infrastructure which allows to
define code ranges within functions which are executed with a non-default
key. When such code is executed such functions must explicitly call
skey_regions_initialize().
This will initialize all storage keys belonging to such code ranges in a
way that no protection exceptions happen when the code is executed with a
non-default access key.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The fixup section was added again by mistake when test_fp_ctl() was
removed. The reason for the removal of the fixup section is described in
commit 484a8ed8b7 ("s390/extable: add dedicated uaccess handler").
Remove it again for the same reason.
Add an exception handler which handles exceptions when the floating point
control register is attempted to be set to invalid values. The exception
handler sets the floating point control register to zero and continues
execution at the specified address.
The new sfpc inline assembly is open-coded to make back porting a bit
easier.
Fixes: 702644249d ("s390/fpu: get rid of test_fp_ctl()")
Cc: stable@vger.kernel.org
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
These are only two small patches, one cleanup for arch/alpha
and a preparation patch cleaning up the handling of runtime
constants in the linker scripts.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmboHV0ACgkQYKtH/8kJ
UifHfhAAqTHHxxe+HiphGBPHN0ODyLVUs7fOQHtLOSmJlQa6x1TCR/+1nL1kTDbe
j6EcIRxZrllQZ+jZBA8z2XsAmjjBLUxCB4yu6oxYJh8OdFyqeVM/myZEr2TAyb0o
A3D9b+rfnY8sr9XaFHSHGWbh4c33cGQhACumHVAjtPvU06Voskq4pAf9ZnpGkNBe
AdKNTVG6+w84dKUNuzXcexP8d7SnsXNfd6T9+evtW/M+fziWzs3aPQr+GZED96E5
8IRldXi2nzIwm9LT5IzZAt+QvpVb2Qob1+rej9p5WpptGp840CROTo61SwaYHCMV
DDxTlmADsApWJQ3B5gDu6QS2jXT4eeOrY3JI2baeCyOV6auj15UXKiWc2QVoHOVU
6+PzlSFuLatI6WsxXfOcD0o3bfQXMKS6zCC/4eD7Y/SmmMqBbL5+d9sU5lwkiOFl
swoswF4HTwo5d6NdkSuJOt6KA/V8a68lBhKYBXHu2yuLi/LDNOaipEvBHQLzfnlY
91e5DtDiHK9CYDNkwiR+bV9rQnhA535JSlfR8VtpU/SJTTjyF+dkt9JGPdivXoIA
8Zv+DN/oyrahUtCrgzzPXahOuBrfD/WfIajsvpEK6vNPuBhscsZFg/thc70FMIXo
qn8Dmpi/CnDWFNOy0xO0cbYWrGBGn9E7kzbSZ78tUIjPUmmEKfk=
=OOMl
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are only two small patches, one cleanup for arch/alpha and a
preparation patch cleaning up the handling of runtime constants in the
linker scripts"
* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
runtime constants: move list of constants to vmlinux.lds.h
alpha: no need to include asm/xchg.h twice
Symbol offsets to the KASLR base do not match symbol address in
the vmlinux image. That is the result of setting the KASLR base
to the beginning of .text section as result of an optimization.
Revert that optimization and allocate virtual memory for the
whole kernel image including __START_KERNEL bytes as per the
linker script. That allows keeping the semantics of the KASLR
base offset in sync with other architectures.
Rename __START_KERNEL to TEXT_OFFSET, since it represents the
offset of the .text section within the kernel image, rather than
a virtual address.
Still skip mapping TEXT_OFFSET bytes to save memory on pgtables
and provoke exceptions in case an attempt to access this area is
made, as no kernel symbol may reside there.
In case CONFIG_KASAN is enabled the location counter might exceed
the value of TEXT_OFFSET, while the decompressor linker script
forcefully resets it to TEXT_OFFSET, which leads to a sections
overlap link failure. Use MAX() expression to avoid that.
Reported-by: Omar Sandoval <osandov@osandov.com>
Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/
Fixes: 56b1069c40 ("s390/boot: Rework deployment of the kernel image")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Refactor the list of constant variables into a macro.
This should make it easier to add more constants in the future.
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add the required code to patch alternatives early in the decompressor.
This is required for the upcoming lowcore relocation changes, where
alternatives for facility 193 need to get patched before lowcore
alternatives.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Implement the runtime constant infrastructure for s390, allowing the
dcache d_hash() function to be generated using as a constant for hash
table address followed by shift by a constant of the hash index.
This is the s390 variant of commit 94a2bc0f61 ("arm64: add 'runtime
constant' support") and commit e3c92e8171 ("runtime constants: add
x86 architecture support").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When the kernel is built with CONFIG_PIE_BUILD option enabled it
uses dynamic symbols, for which the linker does not allow more
than 64K number of entries. This can break features like kpatch.
Hence, whenever possible the kernel is built with CONFIG_PIE_BUILD
option disabled. For that support of unaligned symbols generated by
linker scripts in the compiler is necessary.
However, older compilers might lack such support. In that case the
build process resorts to CONFIG_PIE_BUILD option-enabled build.
Compile object files with -fPIC option and then link the kernel
binary with -no-pie linker option.
As result, the dynamic symbols are not generated and not only kpatch
feature succeeds, but also the whole CONFIG_PIE_BUILD option-enabled
code could be dropped.
[ agordeev: Reworded the commit message ]
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
option is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled.
[ agordeev: Reworded the commit message ]
Fixes: 778666df60 ("s390: compile relocatable kernel without -fPIE")
Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED
kernel configuration variable.
In case CONFIG_KERNEL_UNCOMPRESSED is disabled avoid uncompressing
the kernel to a temporary buffer and copying it to the target
address. Instead, uncompress it directly to the target destination.
In case CONFIG_KERNEL_UNCOMPRESSED is enabled avoid moving the
kernel to default 0x100000 location when KASLR is disabled or
failed. Instead, use the uncompressed kernel image directly.
In case KASLR is disabled or failed .amode31 section location in
memory is not randomized and precedes the kernel image. In case
CONFIG_KERNEL_UNCOMPRESSED is disabled that location overlaps the
area used by the decompression algorithm. That is fine, since that
area is not used after the decompression finished and the size of
.amode31 section is not expected to exceed BOOT_HEAP_SIZE ever.
There is no decompression in case CONFIG_KERNEL_UNCOMPRESSED is
enabled. Therefore, rename decompress_kernel() to deploy_kernel(),
which better describes both uncompressed and compressed cases.
Introduce AMODE31_SIZE macro to avoid immediate value of 0x3000
(the size of .amode31 section) in the decompressor linker script.
Modify the vmlinux linker script to force the size of .amode31
section to AMODE31_SIZE (the value of (_eamode31 - _samode31)
could otherwise differ as result of compiler options used).
Introduce __START_KERNEL macro that defines the kernel ELF image
entry point and set it to the currrent value of 0x100000.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The end of GOT is calculated dynamically on boot. The size of GOT
is calculated on build from the start and end of GOT. Avoid both
calculations and use the end of GOT directly.
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Naresh reported this build error on linux-next:
s390x-linux-gnu-ld: Unexpected GOT/PLT entries detected!
make[3]: *** [/builds/linux/arch/s390/boot/Makefile:87:
arch/s390/boot/vmlinux.syms] Error 1
make[3]: Target 'arch/s390/boot/bzImage' not remade because of errors.
The reason for the build error is an incorrect/incomplete assertion which
checks the size of the .got.plt section. Similar to x86 the size is either
zero or 24 bytes (three entries).
See commit 262b5cae67 ("x86/boot/compressed: Move .got.plt entries out of
the .got section") for more details. The three reserved/additional entries
for s390 are described in chapter 3.2.2 of the s390x ABI [1] (thanks to
Andreas Krebbel for pointing this out!).
[1] https://github.com/IBM/s390x-abi/releases/download/v1.6.1/lzsabi_s390x.pdf
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvWp8TY-fMEvc3UhoVtoR_eM5VsfHj3+n+kexcfJJ+Cvw@mail.gmail.com
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux. This has a few problems:
- It uses dynamic symbols (.dynsym), for which the linker refuses to
allow more than 64k sections. This can break features which use
'-ffunction-sections' and '-fdata-sections', including kpatch-build
[1] and Function Granular KASLR.
- It unnecessarily uses GOT relocations, adding an extra layer of
indirection for many memory accesses.
Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.
This is done by first telling the linker to preserve all relocations
during the vmlinux link. (Note this is harmless: they are later
stripped in the vmlinux.bin link.)
Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections. The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.
(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'. So Clang-compiled kernels have a GOT, which needs to
be adjusted.)
On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.
[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html
Compiler consideration:
Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.
Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.
In addition to it:
move vmlinux.relocs to safe relocation
When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.
Compile warning fix from Sumanth Korikkar:
When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:
ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.
Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.
This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84598 ("x86/build: Add asserts for unwanted sections")
[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are some warnings around certain
ELF sections:
s390-linux-ld: warning: orphan section `.dynstr' from `arch/s390/kernel/head64.o' being placed in section `.dynstr'
s390-linux-ld: warning: orphan section `.dynamic' from `arch/s390/kernel/head64.o' being placed in section `.dynamic'
s390-linux-ld: warning: orphan section `.hash' from `arch/s390/kernel/head64.o' being placed in section `.hash'
s390-linux-ld: warning: orphan section `.gnu.hash' from `arch/s390/kernel/head64.o' being placed in section `.gnu.hash'
Explicitly keep those sections like other architectures when
CONFIG_RELOCATABLE is enabled, which is always true for s390.
[hca@linux.ibm.com: keep sections instead of discarding]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-4-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around the
GOT and PLT sections:
s390-linux-ld: warning: orphan section `.plt' from `arch/s390/kernel/head64.o' being placed in section `.plt'
s390-linux-ld: warning: orphan section `.got' from `arch/s390/kernel/head64.o' being placed in section `.got'
s390-linux-ld: warning: orphan section `.got.plt' from `arch/s390/kernel/head64.o' being placed in section `.got.plt'
s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/kernel/head64.o' being placed in section `.iplt'
s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/kernel/head64.o' being placed in section `.igot.plt'
s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/boot/head.o' being placed in section `.iplt'
s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/boot/head.o' being placed in section `.igot.plt'
s390-linux-ld: warning: orphan section `.got' from `arch/s390/boot/head.o' being placed in section `.got'
Currently, only the '.got' section is actually emitted in the final
binary. In a manner similar to other architectures, put the '.got'
section near the '.data' section and coalesce the PLT sections,
checking that the final section is zero sized, which is a safe/tested
approach versus full discard.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-3-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around
'.data.rel' sections:
s390-linux-ld: warning: orphan section `.data.rel' from `kernel/sched/build_utility.o' being placed in section `.data.rel'
s390-linux-ld: warning: orphan section `.data.rel.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.local'
s390-linux-ld: warning: orphan section `.data.rel.ro' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro'
s390-linux-ld: warning: orphan section `.data.rel.ro.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro.local'
Describe these in vmlinux.lds.S so there is no more warning and the
sections are placed consistently between linkers.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-2-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
It is quite subtle to use test_fp_ctl() correctly. Therefore remove it -
instead copy whatever new floating point control (fpc) register values are
supposed to be used into its save area.
Test the validity of the new value when loading it. If the new value is
invalid, load the fpc register with zero.
This seems to be a the best way to approach this problem. Even though this
changes behavior:
- sigreturn with an invalid fpc value on the stack will succeed, and
continue with zero value, instead of returning with SIGSEGV
- ptraced processes will also use a zero value instead of letting the
request fail with -EINVAL
However all of this seems to acceptable. After all testing of the value was
only implemented to avoid that user space can crash the kernel. It is not
there to test values for validity; and the assumption is that there is no
existing user space which is doing this.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Currently, exception tables are marked as ro_after_init. However,
since they are sorted during compile time using scripts/sorttable,
they can be moved to RO_DATA using the RO_EXCEPTION_TABLE_ALIGN macro,
which is specifically designed for this purpose.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since regular paging structs are initialized in decompressor already
move KASAN shadow mapping to decompressor as well. This helps to avoid
allocating KASAN required memory in 1 large chunk, de-duplicate paging
structs creation code and start the uncompressed kernel with KASAN
instrumentation right away. This also allows to avoid all pitfalls
accidentally calling KASAN instrumented code during KASAN initialization.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Large cleanup of the con3270/tty3270 driver. Among others this fixes:
* Background Color Support
* ASCII Line Character Support
* VT100 Support
* Geometries other than 80x24
- Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to
uaccess functions, which will be used by KVM to access KVM guest memory
with a specific storage key.
- Add support for user space events counting to CPUMF.
- Cleanup the vfio/ccw code, which also allows now to properly support 2K
Format-2 IDALs.
- Move kernel page table allocation and initialization to decompressor,
which finally allows to enter the kernel with dynamic address translation
enabled. This in turn allows to get rid of code with special handling in
the kernel, which has to distinguish if DAT is on or off.
- Replace kretprobe with rethook.
- Various improvements to vfio/ap queue resets:
* Use TAPQ to verify completion of a reset in progress rather than
multiple invocations of ZAPQ.
* Check TAPQ response codes when verifying successful completion of ZAPQ.
* Fix erroneous handling of some error response codes.
* Increase the maximum amount of time to wait for successful completion
of ZAPQ.
- Rework system call wrappers to get rid of alias functions, which were
only left on s390.
- Cleanup diag288_wdt watchdog driver. It has been agreed on with Guenter
Roeck that this goes upstream via the s390 tree.
- Add missing loadparm parameter handling for list-directed ECKD ipl/reipl.
- Various improvements to memory detection code.
- Remove arch_cpu_idle_time() since the current implementation is broken,
and allows user space observable accounted idle times which can
temporarily decrease.
- Add Reset DAT-Protection support: (only) allow to change PTEs from RO to
RW with a new RDP instruction. Unlike the currently used IPTE
instruction, this does not necessarily guarantee that TLBs of all CPUs
are synchronously flushed; and that remote CPUs can see spurious
protection faults. The overall improvement for not requiring an all CPU
synchronization, like it is required with IPTE, should be beneficial.
- Fix KFENCE page fault reporting.
- Smaller cleanups and improvement all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmPzWhQACgkQIg7DeRsp
bsKkdQ//QbCwDMt6T3bmi6gGgs9HRSkLOTHAlOIRuetEzRiBzm/O4Gm8NycvPspl
BIcuXmQKt+gBS44tWikKpwuhmWrAtiFUxs/M1uPfRXqjUf+ZFJinPJgtPCBa/3rv
tQkh541QxpX4K5Ks71WKv2Kh0RjaTqw5Kj+rlDBYHsxZvb28mDigINRYoVSxNUKi
dTVlR0UgdGLecXfezpvWeEAbJu6Q2pbIkOT3tNOumNqRAoUN4cbH3P0agHJdq8oj
L/++d4tfVbwL8N/VCwIVBeW/AQzA0B2UCDVz75Pd55+FFrIGVp1hn7QC9QQieomL
fzGOTrL4D9U8JkAIJqhioA1NlcN1+QW2svoMVo0N3vBJpIbzX4bZKTDxZZ26dG9H
ox7YvhsZtJA7p34X5hetoObzZcmiYJStT+BDao7q1x3oLf4G31HaP+YUDIKBPmNW
ieZa+ujYbKor1pD6ysaMVX+c1qhfX6S/V0uBAikoqMWUVUvH/ZeuSxCSfMuvWUrQ
KFuc0HnPiiIO1Ux3wN5oN33+pWCSdUcJOeg4aj0jkkFT9Ct3TOBupIGBGyhOAh6r
OTp1iqJuQjwOmkWPLyRuGMzRmDDp+hWz9qNF/DFwIV1IMi9AzJjEOg31cMVxx3gG
iM25560uqvhhUwQFbyjVgJmj00dmqMX07q8QSVOgg9oUrnRhQrc=
=MW9x
-----END PGP SIGNATURE-----
Merge tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Large cleanup of the con3270/tty3270 driver. Among others this fixes:
- Background Color Support
- ASCII Line Character Support
- VT100 Support
- Geometries other than 80x24
- Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to
uaccess functions, which will be used by KVM to access KVM guest
memory with a specific storage key
- Add support for user space events counting to CPUMF
- Cleanup the vfio/ccw code, which also allows now to properly support
2K Format-2 IDALs
- Move kernel page table allocation and initialization to decompressor,
which finally allows to enter the kernel with dynamic address
translation enabled. This in turn allows to get rid of code with
special handling in the kernel, which has to distinguish if DAT is on
or off
- Replace kretprobe with rethook
- Various improvements to vfio/ap queue resets:
- Use TAPQ to verify completion of a reset in progress rather than
multiple invocations of ZAPQ.
- Check TAPQ response codes when verifying successful completion of
ZAPQ.
- Fix erroneous handling of some error response codes.
- Increase the maximum amount of time to wait for successful
completion of ZAPQ
- Rework system call wrappers to get rid of alias functions, which were
only left on s390
- Cleanup diag288_wdt watchdog driver. It has been agreed on with
Guenter Roeck that this goes upstream via the s390 tree
- Add missing loadparm parameter handling for list-directed ECKD
ipl/reipl
- Various improvements to memory detection code
- Remove arch_cpu_idle_time() since the current implementation is
broken, and allows user space observable accounted idle times which
can temporarily decrease
- Add Reset DAT-Protection support: (only) allow to change PTEs from RO
to RW with a new RDP instruction. Unlike the currently used IPTE
instruction, this does not necessarily guarantee that TLBs of all
CPUs are synchronously flushed; and that remote CPUs can see spurious
protection faults. The overall improvement for not requiring an all
CPU synchronization, like it is required with IPTE, should be
beneficial
- Fix KFENCE page fault reporting
- Smaller cleanups and improvement all over the place
* tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (182 commits)
s390/irq,idle: simplify idle check
s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag()
s390/processor: let cpu helper functions return boolean values
s390/kfence: fix page fault reporting
s390/zcrypt: introduce ctfm field in struct CPRBX
s390: remove confusing comment from uapi types header file
vfio/ccw: remove WARN_ON during shutdown
s390/entry: remove toolchain dependent micro-optimization
s390/mem_detect: do not truncate online memory ranges info
s390/vx: remove __uint128_t type from __vector128 struct again
s390/mm: add support for RDP (Reset DAT-Protection)
s390/mm: define private VM_FAULT_* reasons from top bits
Documentation: s390: correct spelling
s390/ap: fix status returned by ap_qact()
s390/ap: fix status returned by ap_aqic()
s390: vfio-ap: tighten the NIB validity check
Revert "s390/mem_detect: do not update output parameters on failure"
s390/idle: remove arch_cpu_idle_time() and corresponding code
s390/vx: use simple assignments to access __vector128 members
s390/vx: add 64 and 128 bit members to __vector128 struct
...
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
SDc/Y/c=
=zpaD
-----END PGP SIGNATURE-----
Merge tag 'v6.2-rc6' into sched/core, to pick up fixes
Pick up fixes before merging another batch of cpuidle updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When debugging vmlinux with QEMU + GDB, the following GDB error may
occur:
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint -1.
Cannot access memory at address 0xffffffffffff95c0
Command aborted.
(gdb)
The reason is that, when .interp section is present, GDB tries to
locate the file specified in it in memory and put a number of
breakpoints there (see enable_break() function in gdb/solib-svr4.c).
Sometimes GDB finds a bogus location that matches its heuristics,
fails to set a breakpoint and stops. This makes further debugging
impossible.
The .interp section contains misleading information anyway (vmlinux
does not need ld.so), so fix by discarding it.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The setup of the kernel virtual address space is spread
throughout the sources, boot stages and config options
like this:
1. The available physical memory regions are queried
and stored as mem_detect information for later use
in the decompressor.
2. Based on the physical memory availability the virtual
memory layout is established in the decompressor;
3. If CONFIG_KASAN is disabled the kernel paging setup
code populates kernel pgtables and turns DAT mode on.
It uses the information stored at step [1].
4. If CONFIG_KASAN is enabled the kernel early boot
kasan setup populates kernel pgtables and turns DAT
mode on. It uses the information stored at step [1].
The kasan setup creates early_pg_dir directory and
directly overwrites swapper_pg_dir entries to make
shadow memory pages available.
Move the kernel virtual memory setup to the decompressor
and start the kernel with DAT turned on right from the
very first istruction. That completely eliminates the
boot phase when the kernel runs in DAT-off mode, simplies
the overall design and consolidates pgtables setup.
The identity mapping is created in the decompressor, while
kasan shadow mappings are still created by the early boot
kernel code.
Share with decompressor the existing kasan memory allocator.
It decreases the size of a newly requested memory block from
pgalloc_pos and ensures that kernel image is not overwritten.
pgalloc_low and pgalloc_pos pointers are made preserved boot
variables for that.
Use the bootdata infrastructure to setup swapper_pg_dir
and invalid_pg_dir directories used by the kernel later.
The interim early_pg_dir directory established by the
kasan initialization code gets eliminated as result.
As the kernel runs in DAT-on mode only the PSW_KERNEL_BITS
define gets PSW_MASK_DAT bit by default. Additionally, the
setup_lowcore_dat_off() and setup_lowcore_dat_on() routines
get merged, since there is no DAT-off mode stage anymore.
The memory mappings are created with RW+X protection that
allows the early boot code setting up all necessary data
and services for the kernel being booted. Just before the
paging is enabled the memory protection is changed to
RO+X for text, RO+NX for read-only data and RW+NX for
kernel data and the identity mapping.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
Nathan Chancellor reports that the s390 vmlinux fails to link with
GNU ld < 2.36 since commit 99cb0d917f ("arch: fix broken BuildID
for arm64 and riscv").
It happens for defconfig, or more specifically for CONFIG_EXPOLINE=y.
$ s390x-linux-gnu-ld --version | head -n1
GNU ld (GNU Binutils for Debian) 2.35.2
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allnoconfig
$ ./scripts/config -e CONFIG_EXPOLINE
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- olddefconfig
$ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu-
`.exit.text' referenced in section `.s390_return_reg' of drivers/base/dd.o: defined in discarded section `.exit.text' of drivers/base/dd.o
make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make: *** [Makefile:1252: vmlinux] Error 2
arch/s390/kernel/vmlinux.lds.S wants to keep EXIT_TEXT:
.exit.text : {
EXIT_TEXT
}
But, at the same time, EXIT_TEXT is thrown away by DISCARD because
s390 does not define RUNTIME_DISCARD_EXIT.
I still do not understand why the latter wins after 99cb0d917f,
but defining RUNTIME_DISCARD_EXIT seems correct because the comment
line in arch/s390/kernel/vmlinux.lds.S says:
/*
* .exit.text is discarded at runtime, not link time,
* to deal with references from __bug_table
*/
Nathan also found that binutils commit 21401fc7bf67 ("Duplicate output
sections in scripts") cured this issue, so we cannot reproduce it with
binutils 2.36+, but it is better to not rely on it.
Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20230105031306.1455409-1-masahiroy@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Symbols _edata and _end in the linker script are the
only unaligned expicitly on page boundary. Although
_end is aligned implicitly by BSS_SECTION macro that
is still inconsistent and could lead to a bug if a tool
or function would assume that _edata is as aligned as
others.
For example, vmem_map_init() function does not align
symbols _etext, _einittext etc. Should these symbols
be unaligned as well, the size of ranges to update
were short on one page.
Instead of fixing every occurrence of this kind in the
code and external tools just force the alignment on
these two symbols.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add proper alignment for .nospec_call_table and .nospec_return_table in
vmlinux.
[hca@linux.ibm.com]: The problem with the missing alignment of the nospec
tables exist since a long time, however only since commit e6ed91fd07
("s390/alternatives: remove padding generation code") and with
CONFIG_RELOCATABLE=n the kernel may also crash at boot time.
The above named commit reduced the size of struct alt_instr by one byte,
so its new size is 11 bytes. Therefore depending on the number of cpu
alternatives the size of the __alt_instructions array maybe odd, which
again also causes that the addresses of the nospec tables will be odd.
If the address of __nospec_call_start is odd and the kernel is compiled
With CONFIG_RELOCATABLE=n the compiler may generate code that loads the
address of __nospec_call_start with a 'larl' instruction.
This will generate incorrect code since the 'larl' instruction only works
with even addresses. In result the members of the nospec tables will be
accessed with an off-by-one offset, which subsequently may lead to
addressing exceptions within __nospec_revert().
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/8719bf1ce4a72ebdeb575200290094e9ce047bcc.1661557333.git.jpoimboe@kernel.org
Cc: <stable@vger.kernel.org> # 4.16
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The only user is gone. Remove the section.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The memory for amode31 section is allocated from the decompressed
kernel. Instead, allocate that memory from the decompressor. This
is a prerequisite to allow initialization of the virtual memory
before the decompressed kernel takes over.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Rename amod31 to amode31 like it was supposed to be.
Fixes: c78d0c7484 ("s390: rename dma section to amode31")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The dma section name is confusing, since the code which resides within
that section has nothing to do with direct memory access. Instead the
limitation is that the code has to run in 31 bit addressing mode, and
therefore has to reside below 2GB. So the name was chosen since
ZONE_DMA is the same region.
To reduce confusion rename the section to amode31, which hopefully
describes better what this is about.
Note: this will also change vmcoreinfo strings
- SDMA=... gets renamed to SAMODE31=...
- EDMA=... gets renamed to EAMODE31=...
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390 allows hotpatching the mask of a conditional jump instruction.
Make use of this feature in order to avoid the expensive stop_machine()
call.
The new trampolines are split in 3 stages:
- A first stage is a 6-byte relative conditional long branch located at
each function's entry point. Its offset always points to the second
stage for the corresponding function, and its mask is either all 0s
(ftrace off) or all 1s (ftrace on). The code for flipping the mask is
borrowed from ftrace_{enable,disable}_ftrace_graph_caller. After
flipping, ftrace_arch_code_modify_post_process() syncs with all the
other CPUs by sending SIGPs.
- Second stages for vmlinux are stored in a separate part of the .text
section reserved by the linker script, and in dynamically allocated
memory for modules. This prevents the icache pollution. The total
size of second stages is about 1.5% of that of the kernel image.
Putting second stages in the .bss section is possible and decreases
the size of the non-compressed vmlinux, but splits the kernel 1:1
mapping, which is a bad tradeoff.
Each second stage contains a call to the third stage, a pointer to
the part of the intercepted function right after the first stage, and
a pointer to an interceptor function (e.g. ftrace_caller).
Second stages are 8-byte aligned for the future direct calls
implementation.
- There are only two copies of the third stage: in the .text section
for vmlinux and in dynamically allocated memory for modules. It can be
an expoline, which is relatively large, so inlining it into each
second stage is prohibitively expensive.
As a result of this organization, phoronix-test-suite with ftrace off
does not show any performance degradation.
Suggested-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20210728212546.128248-3-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This change simplifies the task of making the decompressor relocatable.
The decompressor's image contains special DMA sections between _sdma and
_edma. This DMA segment is loaded at boot as part of the decompressor and
then simply handed over to the decompressed kernel. The decompressor itself
never uses it in any way. The primary reason for this is the need to keep
the aforementioned DMA segment below 2GB which is required by architecture,
and because the decompressor is always loaded at a fixed low physical
address, it is guaranteed that the DMA region will not cross the 2GB
memory limit. If the DMA region had been placed in the decompressed kernel,
then KASLR would make this guarantee impossible to fulfill or it would
be restricted to the first 2GB of memory address space.
This commit moves all DMA sections between _sdma and _edma from
the decompressor's image to the decompressed kernel's image. The complete
DMA region is placed in the init section of the decompressed kernel and
immediately relocated below 2GB at start-up before it is needed by other
parts of the decompressed kernel. The relocation of the DMA region happens
even if the decompressed kernel is already located below 2GB in order
to keep the first implementation simple. The relocation should not have
any noticeable impact on boot time because the DMA segment is only a couple
of pages.
After relocating the DMA sections, the kernel has to fix all references
which point into it. In order to automate this, place all variables
pointing into the DMA sections in a special .dma.refs section. All such
variables must be defined using the new __dma_ref macro. Only variables
containing addresses within the DMA sections must be placed in the new
.dma.refs section.
Furthermore, move the initialization of control registers from
the decompressor to the decompressed kernel because some control registers
reference tables that must be placed in the DMA data section to
guarantee that their addresses are below 2G. Because the decompressed
kernel relocates the DMA sections at startup, the content of control
registers CR2, CR5 and CR15 must be updated with new addresses after
the relocation. The decompressed kernel initializes all control registers
early at boot and then updates the content of CR2, CR5 and CR15
as soon as the DMA relocation has occurred. This practically reverts
the commit a80313ff91 ("s390/kernel: introduce .dma sections").
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Create a region 3 page table which contains only invalid entries, and
use that via "s390_invalid_asce" instead of the kernel ASCE whenever
there is either
- no user address space available, e.g. during early startup
- as an intermediate ASCE when address spaces are switched
This makes sure that user space accesses in such situations are
guaranteed to fail.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
Perf relies on _etext and _stext symbols being one of 't', 'T', 'v' or
'V'. Put them into .text section to guarantee that.
Also moves padding to page boundary inside .text which has an effect that
.text section is now padded with nops rather than 0's, which apparently
has been the initial intention for specifying 0x0700 fill expression.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>