Commit Graph

1586 Commits

Author SHA1 Message Date
Claudio Imbrenda
9029496abf KVM: s390: Properly reset zero bit in PGSTE
In case of memory pressure, it's possible that a guest page gets freed
and then almost immediately reused by the guest. If CMMA is enabled,
_essa_clear_cbrl() will discard all pages that are either unused or
zero. If a discarded page is reused before _essa_clear_cbrl() is called,
and the pgste.zero bit is not cleared, the page will be discarded
despite not being unused.

When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This
prevents the page from being accidentally discarded when not unused.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22 11:25:11 +02:00
Claudio Imbrenda
a488e753de KVM: s390: vsie: Fix redundant rmap entries
The address passed to the gmap rmap was not being masked. As a
consequence several different (but functionally equivalent) rmap
entries were being created for each shadowed table.

Fix this by properly masking the address depending on the table level.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22 11:25:11 +02:00
Claudio Imbrenda
2d505c2906 KVM: s390: vsie: Fix unshadowing logic
In some cases (i.e. under extreme memory pressure on the host),
attempting to shadow memory will result in the same memory being
unshadowed, causing a loop.

Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT
tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent
unnecessary unshadowing and perform better checks.

Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did
not unshadow properly when the large page would become unprotected.

Opportunistically add a check in gmap_protect_rmap() to make sure it
won't be called with level == TABLE_TYPE_PAGE_TABLE.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22 11:25:10 +02:00
Claudio Imbrenda
4df4b7cdf5 KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors
Fix a memory leak that can happen if gmap_ucas_map_one() or
kvm_s390_mmu_cache_topup() return error values.

Also fix a similar issue in gmap_set_limit().

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reported-by: Jiaxin Fan <jiaxin.fan@ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22 11:25:10 +02:00
Claudio Imbrenda
d0f2eb4493 KVM: s390: vsie: Fix memory leak when unshadowing
When performing a partial unshadowing, the rmap was being leaked.

Add the missing kfree().

Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22 11:25:10 +02:00
Paolo Bonzini
ef7e0c51d9 KVM: s390: pci: fix array indexing
For large amounts of PCI devices its possible to overrun the arrays as
 the index was miscalculated in 2 places.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmn4p9UACgkQEXu8gLWm
 HHw7kA//cr8wtdVq2CWwvLpHIvpjRYmQDCApB2vIYPE1AECJqtddiJhq9TolT5rw
 +kqn3hcYmjVhqgqay2IukbuJXFfruPPK2UrF46NmGSxsc/iCglcefRoTOkvJsOXo
 wNzJ/Y7AzZNT1vTTm396vdb/8ACv2zuh073iowDFdRSDLMLt087rJNPf8MQkfxhj
 ZwIqOsGsl1p4WYnnwSy3E5ZsRxPK3kV/JGYvLQyAtx0PGMaTkbAB7KR/PmaxJPal
 IeawpKrpsGzvajXV2EfGVpisTSdKvJ2dwM7NQtX8Q0qVuDufYfsSNlbDKKl3lWIq
 8y5wA2z9oAumZynejBeG46b/nq6Sbeq9lyTNk/52u9ED4RoNmh0s5K0FajD/f229
 xx2XAwsLTrF3ojb0ynHfXKfyzBKMjYu/Y4LtE8bL/wi/BKfs9puoBFixnbvwMrDe
 J8zhxlQyLeZ7Z/hjSWP3UI6w+idA72Z0thf9Nrh0MjhqsKOW4TAD2ZRh/9KQ6B66
 TcmCVe57ehp0aMJ/cqNhXBrvVSH7HL31F/g6Qj8CMiZzJlq3+mlPbWULlqiiBGLr
 Aoxytqlg6YquB8T7SPuopWNkmEU3B9edAn35sqz7Q6/1kzyWdLKMpJLZvU5gtQC1
 KTxpm8aeLdmAzXcwuakdGin9wCT6VfLDEj+wo9qGSLr58wQdwco=
 =CGuC
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: pci: fix array indexing

For large amounts of PCI devices its possible to overrun the arrays as
the index was miscalculated in 2 places.
2026-05-12 23:15:38 +02:00
Matthew Rosato
0cfe660559 KVM: s390: pci: Fix aisb calculation
The current implementation of aisb calculation will erroneously index
via an unsigned long * as well as multiply by 8B for every 64-bits in
the offset; only one or the other is required.  This throws off aisb
calculations once the number of devices exceeds 64, and can result
in out-of-bounds access as well as failure to indicate summary bits
associated with those devices in guests.

Fix this by converting to a physical address before applying the
offset, as is already done in arch/s390/pci/pci_irq.c.

Fixes: 3c5a1b6f0a ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-04-27 11:14:45 +02:00
Junrui Luo
16d990a154 KVM: s390: pci: fix GAIT table indexing due to double-scaling pointer arithmetic
kvm_s390_pci_aif_enable(), kvm_s390_pci_aif_disable(), and
aen_host_forward() index the GAIT by manually multiplying the index
with sizeof(struct zpci_gaite).

Since aift->gait is already a struct zpci_gaite pointer, this
double-scales the offset, accessing element aisb*16 instead of aisb.

This causes out-of-bounds accesses when aisb >= 32 (with
ZPCI_NR_DEVICES=512)

Fix by removing the erroneous sizeof multiplication.

Fixes: 3c5a1b6f0a ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding")
Fixes: 73f91b0043 ("KVM: s390: pci: enable host forwarding of Adapter Event Notifications")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-04-17 13:12:07 +02:00
Paolo Bonzini
6b80203187 - ESA nesting support
- 4k memslots
 - LPSW/E fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmncpfkACgkQ41TmuOI4
 ufj4ehAA0fTpaA4VdUbF/uH1o4BLu/hElPXhJYnyDa6hUK0XiFS6bpouz50wTMz/
 QjbmM+uCLKxVBK2FPE0cPj3iobvlfTTgP0tNkgwHDFlLfuZ9914cxYc4HYPrRJ/y
 Ey+6TT4ynkf2mihiLFHKKuBPi4DjfC3rAjy8ZHOnNh5ro+00uXVCGhssBUKvXNST
 X45q6JaN6p3eDVjC/ov/K593BJgMoW5x/kDmoyICuhDYs+8TiY+n+61BdVARKdtu
 3+vwkjQ/mrl+IwJMvfeH+nO2qnjREc6EZd9YTJOCheThhELw0tX4jeha4PldeeZY
 fg+8uObSmbzxcmsvWRGTuVpobEBpOqRP9sdADxF77dq1ExFXwthXFT8AQw8NzI2k
 leU8DQqXVUOkykmpvacV96AGlYrRWb47806TdVM+fJmLkvmt0llS/MK6fQNz+Jlb
 okFx1kLnqSKz7x0O6Avgz/+F6yjFAwTp7mwKmd8bHzKCkLCYq8Gl6WPxx/peFY0P
 dwEwq0k89Wld7gjkAXwtwjttIrQcwghacqBCJAu4cA/3NnM2DCAPf3gSiY1PoYPX
 06ZUYBzLH8wQJRZLToWpYvH9xOOfMmTETx7LDsYuMztxyesS+ReR/dVkCCei/2oD
 KeoGD0vBA0d8/wW+ZmB6YYxUiWT0WOllb/9s26NG/7lCTY1UgLI=
 =YTUc
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-7.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- ESA nesting support
- 4k memslots
- LPSW/E fix
2026-04-13 19:01:15 +02:00
Claudio Imbrenda
3ffe5eb4a5 KVM: s390: vsie: Fix races with partial gmap invalidations
Introduce a new boolean flag, used for shadow gmaps, to keep track of
whether the gmap has been invalidated, either partially or totally.

Use the new flag to check whether shadow gmap invalidations happened
during shadowing. In such cases, abort whatever was going on, return
-EAGAIN and let the caller try again.

Fixes: 19d6c5b804 ("KVM: s390: vsie: Fix unshadowing while shadowing")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260407161721.247044-1-imbrenda@linux.ibm.com>
2026-04-07 18:20:58 +02:00
Claudio Imbrenda
9b8e8aad58 KVM: s390: ucontrol: Fix memslot handling
Fix memslots handling for UCONTROL guests. Attempts to delete user
memslots will fail, as they should, without the risk of a NULL pointer
dereference.

Fixes: 413c98f24c ("KVM: s390: fake memslot for ucontrol VMs")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:20:42 +02:00
Claudio Imbrenda
06a20c3ab6 KVM: s390: Allow 4k granularity for memslots
Until now memslots on s390 needed to have 1M granularity and be 1M
aligned. Since the new gmap code can handle memslots with 4k
granularity and alignment, remove the restrictions.

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:07:22 +02:00
Claudio Imbrenda
4204067f99 KVM: s390: Add alignment checks for hugepages
When backing a guest page with a large page, check that the alignment
of the guest page matches the alignment of the host physical page
backing it within the large page.

Also check that the memslot is large enough to fit the large page.

Those checks are currently not needed, because memslots are guaranteed
to be 1m-aligned, but this will change.

Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:07:19 +02:00
Claudio Imbrenda
6da4b1a435 KVM: s390: Add some useful mask macros
Add _{SEGMENT,REGION3}_FR_MASK, similar to _{SEGMENT,REGION3}_MASK, but
working on gfn/pfn instead of addresses. Use them in gaccess.c instead
of using the normal masks plus gpa_to_gfn().

Also add _PAGES_PER_{SEGMENT,REGION3} to make future code more readable.

Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07 17:07:14 +02:00
Hendrik Brueckner
4aebd7d5c7 KVM: s390: Add KVM capability for ESA mode guests
Now that all the bits are properly addressed, provide a mechanism
for testing ESA mode guests in nested configurations.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
[farman@us.ibm.com: Updated commit message]
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02 15:37:02 +02:00
Eric Farman
c0dcada088 KVM: s390: vsie: Accommodate ESA prefix pages
The prefix page address occupies a different number
of bits for z/Architecture versus ESA mode. Adjust the
definition to cover both, and permit an ESA mode
address within the nested codepath.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02 15:37:01 +02:00
Eric Farman
a9640e2eb7 KVM: s390: vsie: Disable some bits when in ESA mode
In the event that a nested guest is put in ESA mode,
ensure that some bits are scrubbed from the shadow SCB.

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02 15:37:01 +02:00
Eric Farman
b0ad874d98 KVM: s390: vsie: Allow non-zarch guests
Linux/KVM runs in z/Architecture-only mode. Although z/Architecture
is built upon a long history of hardware refinements, any other
CPU mode is not permitted.

Allow a userspace to explicitly enable the use of ESA mode for
nested guests, otherwise usage will be rejected.

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02 15:37:01 +02:00
Janosch Frank
1653545abc KVM: s390: Fix lpsw/e breaking event handling
LPSW and LPSWE need to set the gbea on completion but currently don't.
Time to fix this up.

LPSWEY was designed to not set the bear.

Fixes: 48a3e950f4 ("KVM: s390: Add support for machine checks.")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-03-31 08:37:06 +00:00
Eric Farman
2623c96f11 KVM: s390: only deliver service interrupt with payload
Routine __inject_service() may set both the SERVICE and SERVICE_EV
pending bits, and in the case of a pure service event the corresponding
trip through __deliver_service_ev() will clear the SERVICE_EV bit only.
This necessitates an additional trip through __deliver_service() for
the other pending interrupt bit, however it is possible that the
external interrupt parameters are zero and there is nothing to be
delivered to the guest.

To avoid sending empty data to the guest, let's only write out the SCLP
data when there is something for the guest to do, otherwise bail out.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-03-31 08:36:51 +00:00
Claudio Imbrenda
0a28e06575 KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl
A previous commit changed the behaviour of the KVM_S390_VCPU_FAULT
ioctl. The current (wrong) implementation will trigger a guest
addressing exception if the requested address lies outside of a
memslot, unless the VM is UCONTROL.

Restore the previous behaviour by open coding the fault-in logic.

Fixes: 3762e905ec ("KVM: s390: use __kvm_faultin_pfn()")
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:38 +01:00
Claudio Imbrenda
a12cc7e3d6 KVM: s390: vsie: Fix guest page tables protection
When shadowing, the guest page tables are write-protected, in order to
trap changes and properly unshadow the shadow mapping for the nested
guest. Already shadowed levels are skipped, so that only the needed
levels are write protected.

Currently the levels that get write protected are exactly one level too
deep: the last level (nested guest memory) gets protected in the wrong
way, and will be protected again correctly a few lines afterwards; most
importantly, the highest non-shadowed level does *not* get write
protected.

Moreover, if the nested guest is running in a real address space, there
are no DAT tables to shadow.

Write protect the correct levels, so that all the levels that need to
be protected are protected, and avoid double protecting the last level;
skip attempting to shadow the DAT tables when the nested guest is
running in a real address space.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:34 +01:00
Claudio Imbrenda
19d6c5b804 KVM: s390: vsie: Fix unshadowing while shadowing
If shadowing causes the shadow gmap to get unshadowed, exit early to
prevent an attempt to dereference the parent pointer, which at this
point is NULL.

Opportunistically add some more checks to prevent NULL parents.

Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Fixes: e5f98a6899 ("KVM: s390: Add some helper functions needed for vSIE")
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:30 +01:00
Claudio Imbrenda
0ec456b8a5 KVM: s390: vsie: Fix refcount overflow for shadow gmaps
In most cases gmap_put() was not called when it should have.

Add the missing gmap_put() in vsie_run().

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:25 +01:00
Claudio Imbrenda
fd7bc612cf KVM: s390: vsie: Fix nested guest memory shadowing
Fix _do_shadow_pte() to use the correct pointer (guest pte instead of
nested guest) to set up the new pte.

Add a check to return -EOPNOTSUPP if the mapping for the nested guest
is writeable but the same page in the guest is only read-only.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:21 +01:00
Claudio Imbrenda
0f2b760a17 KVM: s390: Correctly handle guest mappings without struct page
Introduce a new special softbit for large pages, like already presend
for normal pages, and use it to mark guest mappings that do not have
struct pages.

Whenever a leaf DAT entry becomes dirty, check the special softbit and
only call SetPageDirty() if there is an actual struct page.

Move the logic to mark pages dirty inside _gmap_ptep_xchg() and
_gmap_crstep_xchg_atomic(), to avoid needlessly duplicating the code.

Fixes: 5a74e3d934 ("KVM: s390: KVM-specific bitfields and helper functions")
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:18 +01:00
Claudio Imbrenda
45921d0212 KVM: s390: Fix gmap_link()
The slow path of the fault handler ultimately called gmap_link(), which
assumed the fault was a major fault, and blindly called dat_link().

In case of minor faults, things were not always handled properly; in
particular the prefix and vsie marker bits were ignored.

Move dat_link() into gmap.c, renaming it accordingly. Once moved, the
new _gmap_link() function will be able to correctly honour the prefix
and vsie markers.

This will cause spurious unshadows in some uncommon cases.

Fixes: 94fd9b16cc ("KVM: s390: KVM page table management functions: lifecycle management")
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:13 +01:00
Claudio Imbrenda
6f93d1ed6f KVM: s390: vsie: Fix check for pre-existing shadow mapping
When shadowing a nested guest, a check is performed and no shadowing is
attempted if the nested guest is already shadowed.

The existing check was incomplete; fix it by also checking whether the
leaf DAT table entry in the existing shadow gmap has the same protection
as the one specified in the guest DAT entry.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:07 +01:00
Claudio Imbrenda
b827ef02f4 KVM: s390: Remove non-atomic dat_crstep_xchg()
In practice dat_crstep_xchg() is racy and hard to use correctly. Simply
remove it and replace its uses with dat_crstep_xchg_atomic().

This solves some actual races that lead to system hangs / crashes.

Opportunistically fix an alignment issue in _gmap_crstep_xchg_atomic().

Fixes: 589071eaaa ("KVM: s390: KVM page table management functions: clear and replace")
Fixes: 94fd9b16cc ("KVM: s390: KVM page table management functions: lifecycle management")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:12:03 +01:00
Claudio Imbrenda
0f54755343 KVM: s390: vsie: Fix dat_split_ste()
If the guest misbehaves and puts the page tables for its nested guest
inside the memory of the nested guest itself, and the guest and nested
guest are being mapped with large pages, the shadow mapping will
lose synchronization with the actual mapping, since this will cause the
large page with the vsie notification bit to be split, but the
vsie notification bit will not be propagated to the resulting small
pages.

Fix this by propagating the vsie_notif bit from large pages to normal
pages when splitting a large page.

Fixes: 2db149a0a6 ("KVM: s390: KVM page table management functions: walks")
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26 16:11:58 +01:00
Paolo Bonzini
12fd965871 KVM: s390: Fixes for 7.0
- fix deadlock in new memory management
 - handle kernel faults on donated memory properly
 - fix bounds checking for irq routing + selftest
 - fix invalid machine checks + logging
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmm5TzoACgkQEXu8gLWm
 HHyrjQ/+KlX/odZnN6KE/WGxB0pf06aXfQTBhM8vmfrig/vimIZrm2xszO6TIdZQ
 rYcUik1mMv1VTCYi4RWnKPklj70NgXRRKwfUNrHzql4VFiTlCPmALHw7LDUDrJEf
 OriU4wL+T9G/638logfZJBmfhunHR6HqHP+LJLm6eIIQKIYmEjPoGpSB1HBP+9YN
 viz2dvKXO8NR41rx14NkqMeyR6zQl+I+1CQCuJmSqxtnAyRFPCTrWLElPFO+J+ha
 02jurSiQk89nLlgEqlzthnbv9NopyaLErSXXx9FzESjHli6hhP8rPtxDL2oJB1VF
 YHDW5ln1w1H22i1VXuyU5jg4D3OOUz7e//CaP5wZBHFUIJxpYzeK7faDLYJHphk4
 JNg4uI+mhQ/6E2Dlos8efefP/gqdVAfqOHr7l+4nCYtfh3aQhezbQAB24W6wQL9/
 gs/TnTRt8Rs2UGXLAY0t3+Y7ATrRynDD5DzmQodc19l26076QodvI1xCeptX5Kth
 N855SIIcCcEbYSK1fSquIeCoJ9aAAyQbLDefNLHtWzgzX+Lz77lnmu90tpVnq4qk
 sjIsFq6qw8xso3bDKviiFOLdJz/zTW33YCHKPAl43iFgc6yC8pTT4hp6J5kcGHmD
 bwRSnUz9mmgmyCzU/DetXo3P+n5mqXG2c+iMMQ8vkig+NVduQ7w=
 =uUMD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 7.0

- fix deadlock in new memory management
- handle kernel faults on donated memory properly
- fix bounds checking for irq routing + selftest
- fix invalid machine checks + logging
2026-03-24 17:32:13 +01:00
Christian Borntraeger
ab5119735e KVM: s390: vsie: Avoid injecting machine check on signal
The recent XFER_TO_GUEST_WORK change resulted in a situation, where the
vsie code would interpret a signal during work as a machine check during
SIE as both use the EINTR return code.
The exit_reason of the sie64a function has nothing to do with the
kvm_run exit_reason. Rename it and define a specific code for machine
checks instead of abusing -EINTR.
rename exit_reason into sie_return to avoid the naming conflict
and change the code flow in vsie.c to have a separate variable for rc
and sie_return.

Fixes: 2bd1337a12 ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions")
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-16 16:56:39 +01:00
Christian Borntraeger
1ca90f4ae5 KVM: s390: log machine checks more aggressively
KVM will reinject machine checks that happen during guest activity.
From a host perspective this machine check is no longer visible
and even for the guest, the guest might decide to only kill a
userspace program or even ignore the machine check.
As this can be a disruptive event nevertheless, we should log this
not only in the VM debug event (that gets lost after guest shutdown)
but also on the global KVM event as well as syslog.
Consolidate the logging and log with loglevel 2 and higher.

Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
2026-03-16 16:56:39 +01:00
Janosch Frank
dcf96f7ad5 KVM: s390: Limit adapter indicator access to mapped page
While we check the address for errors, we don't seem to check the bit
offsets and since they are 32 and 64 bits a lot of memory can be
reached indirectly via those offsets.

Fixes: 8422359877 ("KVM: s390: irq routing for adapter interrupts.")
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-03-16 16:56:39 +01:00
Paolo Bonzini
94fe3e6515 KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
    unnecessary and confusing, triggered compiler warnings due to
    -Wflex-array-member-not-at-end.
 
  - Document that vcpu->mutex is take outside of kvm->slots_lock and
    kvm->slots_arch_lock, which is intentional and desirable despite being
    rather unintuitive.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmp19MACgkQOlYIJqCj
 N/02KA//e7D1DqCcDC46tMyLI+/Q6Wy0F40nXp0tTzJ+gRT5QesEw3jSQdXCRmPV
 yTFLyDaGYD2jqV+EpJLPYBT41oU2FXsjD5NFJRAISD5KPIJbACHvJUxWGYWLvaLU
 iMlwhqZimXKUFAECW2QpwLV8BQenyOEj5dVeKYdPjX6seIEeFlK6JAdteLK0g9gR
 gksE+9QzCFXt0cRfgkaA4UKcA+xWb3ThKMej1AadB6dGF7ezkMvyyQynGLB2N19L
 LZRpOXr70ypyaihC553Msgi4vrpVTPN2BjLrsudGN/IJv6QbdAz5jTU8Lwu9R5QT
 y9LiEPfdMT7WmIBxnH6V7HO5OoN8V2rGJpB/a3KvKO73QjhJJqNyqB6LDPqEbHyw
 AmhQCuQ8Pn1RLKQDXdKll+aI19vi7aOVpq67ii+I9xbzHgg5+uAzKr8hkPAibnVw
 KPGYqgYQa5j3jyRq6jRkAZSkEKZ9PoM8LMiqgnNW1ZrlrDqsPajKaegXODfLuvGf
 yLYtfXbZLMAIAM32YeIH0LrcAT7SEPUFkoh85IB2YOk0mfU1PxqrXOVTPh1GkY2Q
 bKH16T9S4zCfB20V+NYCn+juX4uCNb56b7/jbjI0Ueu/AGv/ITHwRrlhQvXuGSvN
 A65w+LSWlcgRQwLglCPpX308A4DcGCPcY4RvzoirBG+WWNn/Aj4=
 =bD3g
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 7.0

 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.

 - Document that vcpu->mutex is take outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
2026-03-11 18:01:55 +01:00
Claudio Imbrenda
f303406efd KVM: s390: Fix a deadlock
In some scenarios, a deadlock can happen, involving _do_shadow_pte().

Convert all usages of pgste_get_lock() to pgste_get_trylock() in
_do_shadow_pte() and return -EAGAIN. All callers can already deal with
-EAGAIN being returned.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-06 12:41:28 +01:00
Paolo Bonzini
70295a479d KVM: always define KVM_CAP_SYNC_MMU
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always
available.  Move the definition from individual architectures to common
code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28 15:31:35 +01:00
Paolo Bonzini
407fd8b8d8 KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
All architectures now use MMU notifier for KVM page table management.
Remove the Kconfig symbol and the code that is used when it is
disabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28 15:31:35 +01:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Steffen Eiden
e3372ffb5f KVM: s390: Increase permitted SE header size to 1 MiB
Relax the maximum allowed Secure Execution (SE) header size from
8 KiB to 1 MiB. This allows individual secure guest images to run on a
wider range of physical machines.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10 12:21:30 +01:00
Claudio Imbrenda
f8f296ea1c KVM: s390: vsie: Fix race in acquire_gmap_shadow()
The shadow gmap returned by gmap_create_shadow() could get dropped
before taking the gmap->children_lock. This meant that the shadow gmap
was sometimes being used while its reference count was 0.

Fix this by taking the additional reference inside gmap_create_shadow()
while still holding gmap->children_lock, instead of afterwards.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10 11:33:34 +01:00
Claudio Imbrenda
b6ab71a27c KVM: s390: vsie: Fix race in walk_guest_tables()
It is possible that walk_guest_tables() is called on a shadow gmap that
has been removed already, in which case its parent will be NULL.

In such case, return -EAGAIN and let the callers deal with it.

Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10 11:33:30 +01:00
Claudio Imbrenda
898885477e KVM: s390: Use guest address to mark guest page dirty
Stop using the userspace address to mark the guest page dirty.
mark_page_dirty() expects a guest frame number, but was being passed a
host virtual frame number. When slot == NULL, mark_page_dirty_in_slot()
does nothing and does not complain.

This means that in some circumstances the dirtiness of the guest page
might have been lost.

Fix by adding two fields in struct kvm_s390_adapter_int to keep the
guest addressses, and use those for mark_page_dirty().

Fixes: f65470661f ("KVM: s390/interrupt: do not pin adapter interrupt pages")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10 11:33:25 +01:00
Claudio Imbrenda
0ee4ddc164 KVM: s390: Storage key manipulation IOCTL
Add a new IOCTL to allow userspace to manipulate storage keys directly.

This will make it easier to write selftests related to storage keys.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:10 +01:00
Claudio Imbrenda
0fdd5c18a9 KVM: s390: Enable 1M pages for gmap
While userspace is allowed to have pages of any size, the new gmap
would always use 4k pages to back the guest.

Enable 1M pages for gmap.

This allows 1M pages to be used to back a guest when userspace is using
1M pages for the corresponding addresses (e.g. THP or hugetlbfs).

Remove the limitation that disallowed having nested guests and
hugepages at the same time.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:10 +01:00
Claudio Imbrenda
728b0e21b4 KVM: S390: Remove PGSTE code from linux/s390 mm
Remove the PGSTE config option.
Remove all code from linux/s390 mm that involves PGSTEs.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:10 +01:00
Claudio Imbrenda
e38c884df9 KVM: s390: Switch to new gmap
Switch KVM/s390 to use the new gmap code.

Remove includes to <gmap.h> and include "gmap.h" instead; fix all the
existing users of the old gmap functions to use the new ones instead.

Fix guest storage key access functions to work with the new gmap.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:10 +01:00
Claudio Imbrenda
d29a29a9e1 KVM: s390: Storage key functions refactoring
Refactor some storage key functions to improve readability.

Introduce helper functions that will be used in the next patches.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:09 +01:00