mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-01-12 01:20:14 +00:00
706 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
feb06d2690 |
hyperv-next for v6.19
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmk2b0ITHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXkefCACpUWTK0U0i47hXT+s4aA0T3sq6V3/T
+su9WnT3GPQ3BuRCRk51w6u9ADYt1EXtu8gRwq/wZiES9PJtz+9DmNuLT8nkkHXH
exbaRIBAiwLGg6QFC2VpbQzeHLp7qeko0MsLWyMiVPkw+lw9QPqcLKVEWuzPZfOn
UCkPB+XpzZg9Ft4vKRjXLyUMpwKzkqJw/aiXMfwonuaelcrzLw0hkzO3/I+eKRHv
JKxaHCwLgrPZyGCJpWtwiLxgu0DKLeDDhj0WSqDz/kUNhjo/GEshLA25UQJUdzI0
O+tFN9my7SZSYtq7fGoyfo16mAsLaXh0oYuwP8UnR4CDm4UF4JB4QTsM
=laZR
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Enhancements to Linux as the root partition for Microsoft Hypervisor:
- Support a new mode called L1VH, which allows Linux to drive the
hypervisor running the Azure Host directly
- Support for MSHV crash dump collection
- Allow Linux's memory management subsystem to better manage guest
memory regions
- Fix issues that prevented a clean shutdown of the whole system on
bare metal and nested configurations
- ARM64 support for the MSHV driver
- Various other bug fixes and cleanups
- Add support for Confidential VMBus for Linux guest on Hyper-V
- Secure AVIC support for Linux guests on Hyper-V
- Add the mshv_vtl driver to allow Linux to run as the secure kernel in
a higher virtual trust level for Hyper-V
* tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits)
mshv: Cleanly shutdown root partition with MSHV
mshv: Use reboot notifier to configure sleep state
mshv: Add definitions for MSHV sleep state configuration
mshv: Add support for movable memory regions
mshv: Add refcount and locking to mem regions
mshv: Fix huge page handling in memory region traversal
mshv: Move region management to mshv_regions.c
mshv: Centralize guest memory region destruction
mshv: Refactor and rename memory region handling functions
mshv: adjust interrupt control structure for ARM64
Drivers: hv: use kmalloc_array() instead of kmalloc()
mshv: Add ioctl for self targeted passthrough hvcalls
Drivers: hv: Introduce mshv_vtl driver
Drivers: hv: Export some symbols for mshv_vtl
static_call: allow using STATIC_CALL_TRAMP_STR() from assembly
mshv: Extend create partition ioctl to support cpu features
mshv: Allow mappings that overlap in uaddr
mshv: Fix create memory region overlap check
mshv: add WQ_PERCPU to alloc_workqueue users
Drivers: hv: Use kmalloc_array() instead of kmalloc()
...
|
||
|
|
e0c26d47de |
- SCA rework
- VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmktiX8ACgkQ41TmuOI4 ufhozBAAuyPxu1cZqfAiuEpftR0fUFZeyqRLHqfFPNQUGW/kPZRz2uNd38qulboV gmbu5jcwf8SdbF+p8f7RLvkEyTEnzuXELrfSwcwyv9IUiK++p9gRNkuppHbNnTI7 yK21hJz+jZmRzUrSxnLylTC3++RZczhVeHqHzwosnHcNerK6FLcIjjsl7YinJToI T3jiTmprXl5NzFu7O5N/3J2KAIqNr+3DfnOf2lnLzHeupc52Z6TtvdizypAAV7Yk qWQ/81HI8GtIPFWss1kNwrJXQBjgBObz3XBOtq0bw1Ycs+BijsQh424vFoetV1/n bdmEh38lfY3sbbSE3RomnEATRdzremiYb63v5E4Bg7/bpLPhXw+jMF2Hp8jNqOiZ jI7KpGPOA4+C1EzS+Uge81fksW+ylNEYk/dZgGQgOFtF8Vf+Ana0NloDAqMHUeXq gVI2Sd9nMR80WslVzs5DMj/XK86J2TsFxtKYPa1cHV9PkHegO+eJm2nWCRHbfddz iEymokTm9xmfykjFfKDwZ4EcB5vdV7cuNE8aedsp9NXgICrgDbPn8ualG6aZUB0c ScvfRuoiZT7e4D8UZ79uCOCPQqwGCffOfIOee3ocf/95ZVY+9xv7FTTh200DjBU2 Jv1NoTe9ZOO4+dYWRsht0fzC7zBVDO3CEb6OcNRB9wgNidDQaeM= =PtzZ -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - SCA rework - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups |
||
|
|
f58e70cc31 |
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaS3m5RIcb3VwdG9uQGtl
cm5lbC5vcmcACgkQor51iCR83Rb4NAD8C1fGoiCErb6htQMHf1I7ua0ThdIx7OnY
Mk1EysNWu94BAI/VKEYgz+UC5uapHh+gnsoOdVTMJZedI/OPrnKa3QIA
=/Vl1
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
|
||
|
|
63a9b0bc65 |
KVM/riscv changes for 6.19
- SBI MPXY support for KVM guest - New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmkpa8kACgkQrUjsVaLH LAd3lBAAhNlBVnva6fZseKf1ICGpwclXT/Ndqhn6CKWAPuvqsZvApQzTkW6f/txI dwhu7SfAJeH62bQRHoyH/gpd5I1cplogp/xmUcAQJrzD4W0Wf0799hdFNOm9PAJf IWeMMXSvj4CT8s3xinoKPt1YbmNvdDq3KkK776CET5B0/mIaGi3zBWC9ThU0aMl9 mlUTvIojApqmdhe6rXpjIZWj/nSP8XrDuYVmJS1Ys4xvCRW4Qyiu4QU1OKYMcwYR xh6fgXDYufxojMs+h59mL8HOqBO5Kf79aO4lvjesFfZiRIii0+BATf16InH3XPyn bkX3RD4LqgkU4q9I5TtwZ+UpxFvrkigliUewLYrxWFgLzJu6kSBpACduQYDyNSgm X33iAm+m8V2tbl0FLHWRQGw970H9z4ycmEa4eII//+AePGTeFlHK90Qy9As2uW4E XQet0Wqh/tw+qHRpy7Bls1k5MRtyYGJwi4fbSOp/g8Kjgg/DzSsF+qN2FyNE8GNj +w8044fNYpDqd13BsSR99K/cUtFiAOjWN+RiMsu1wM8MRXpAL1lgW01KWqcH/LaD gKZjmevETiWMKDUdERkXj+e7xZCb2cfyheJ+vw9Ds5u8Dwp9p8cga8dGyvgcUTEX gF+4dx+MoW6uirX+Cd/TJYluu5c19bYKhgEybVBG/5er24cnshE= =9ob6 -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.19 - SBI MPXY support for KVM guest - New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores |
||
|
|
679fcce002 |
KVM SVM changes for 6.19:
- Fix a few missing "VMCB dirty" bugs.
- Fix the worst of KVM's lack of EFER.LMSLE emulation.
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
- Fix incorrect handling of selective CR0 writes when checking intercepts
during emulation of L2 instructions.
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
VMRUN and #VMEXIT.
- Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
interrupt if the guest patched the underlying code after the VM-Exit, e.g.
when Linux patches code with a temporary INT3.
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
userspace, and extend KVM "support" to all policy bits that don't require
any actual support from KVM.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmVFgACgkQOlYIJqCj
N/3GVA/+ITWLRuY28kGmpDBflp6EyMIDlBe4v+JRFV5Ll/6kq+sYWZTIE8uiDHNP
E6dbrAnU3xxF2fK5pb3Fq0kKslu/+UTnldt3VN0uGzHlTqvw9itKaCdFnOBMND4Z
Fs7SbXUd1ZWz5Z/Uq2niJDd3p1YA69i1At+udXterFVKzl3GSmNMvsXhNPjjF3gL
+purCkHfWLnXhFJYMYgnaWDJUQldR+YfulWwEd2y4qXyfoqOBJhg7DpKuHlewBVT
2ice4k4nBu47rNl+ZRFM9sFX0959OX8MxykO902UB4+qS39jFzTlyM+LjnxKmCfC
dzGTh4lhG/1QQme6TYBQ+OgXMj6H+8KqQ+YNbjxjAEgY8hWDdVK0bZMIq/iS16aQ
VPSf1/GufdvV+dUyyb2DZzf7NhWmKyVGjlN5PnGQQl0x6+LwI5m3EODDLfTHlmlb
0UEZkXdN74ghT2ExepVyVKeDtbQPJNFN/voBnYr8n0P+9Jf28QuoWD5bTloJIxIJ
OwjwJq3HbDduq/RCFbiSERMBPYFxYCkxVlt+TI+ONhNCUNfvxefNfftHIx+6Yk73
IV5g3gWNWkIo4h1yp8zsglwiTStY4qpiR52YlDLN3+btgYcPOAXt/U4nigaomfdR
NoYbuqD1N+u1P4Vlnr8uUZExQVY+JoIPrB3zPnITni+aucSpp+c=
=p1kg
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.19:
- Fix a few missing "VMCB dirty" bugs.
- Fix the worst of KVM's lack of EFER.LMSLE emulation.
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
- Fix incorrect handling of selective CR0 writes when checking intercepts
during emulation of L2 instructions.
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
VMRUN and #VMEXIT.
- Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
interrupt if the guest patched the underlying code after the VM-Exit, e.g.
when Linux patches code with a temporary INT3.
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
userspace, and extend KVM "support" to all policy bits that don't require
any actual support from KVM.
|
||
|
|
9aca52b552 |
KVM generic changes for 6.19:
- Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
irqfd cleanup.
- Fix a goof in the dirty ring documentation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmSKsACgkQOlYIJqCj
N/00fA/6A15+hw7l/McGvaiYU1u7BZP2ST9ySK6j+opLJaly1CzJF17164+0sxRr
igwVZrS4pzjlucC8T308AjqMAqkOPkOeBW3YBftdRWUO8gd0ns8XZuYtUxEXcCM1
s6kIQrtuYmMplW0Gb6zCcXuVCFiVkTbV/ZBsk/URgf5tA1hX8W5Tj/VtufEEPqzv
40N6HRvUz2xg5wQWEx9hOG1WQtvPhjOhbRXwDk/Tz/BYoHPK5lh3HEfNAGgyosoW
nOtpm41iA0S/AFCnucuhK9Sv0JzIvcArp1znjXo/f742k8LJZ8UwNjk7iccAYspk
4QzHpXi4MkUoIpUQ/is6ODgu4E4g6/f3QUOmbvX2snXXLluvwfRUrEaoDRaJtaXG
3W5Wz2f5j+J6oNRNWT96qDBbbI9s16tHEFhds1cyk964JYUs5mfVx8d0iNfsmXa0
SDDPvqoFWx3yXtEHg2rgGGQm/oQ02D/azn2XZ/ZMy6EcUP33nVAm4zkeJZXFjofE
Uk7t0kfE4tRweHIxZuOsp9GmSu8Ps5asZpNZuaJWBivPiHxCE7WCSNjM9USUbqFt
NShsZjPMOejGmDfP5e+xbwMrIDr1Ywwp6+QZXDyfj9+O+u3HJ/mg7QyfwWCjuFLy
zD9OCwYnZzvmTFA8zB6kE9DRZY1TLDZWhdmJEWHwIrqrAu0kXv8=
=NRH/
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-generic-6.19' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.19:
- Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
irqfd cleanup.
- Fix a goof in the dirty ring documentation.
|
||
|
|
df60cb2e67 |
KVM: riscv: Support enabling dirty log gradually in small chunks
There is already support of enabling dirty log gradually in small chunks
for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in
small chunks") and c862626 ("KVM: arm64: Support enabling dirty log
gradually in small chunks"). This adds support for riscv.
x86 and arm64 writes protect both huge pages and normal pages now, so
riscv protect also protects both huge pages and normal pages.
On a nested virtualization setup (RISC-V KVM running inside a QEMU VM
on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux
VM using different backing page sizes. The time taken for
memory_global_dirty_log_start in the L2 QEMU is listed below:
Page Size Before After Optimization
4K 4490.23ms 31.94ms
2M 48.97ms 45.46ms
1G 28.40ms 30.93ms
Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
||
|
|
8e8678e740 |
KVM: s390: Add capability that forwards operation exceptions
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation exceptions to user space. This also includes the 0x0000 instructions managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants to emulate instructions which do not (yet) have an opcode. While we're at it refine the documentation for KVM_CAP_S390_USER_INSTR0. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> |
||
|
|
92c7053b44 |
Documentation: hyperv: Confidential VMBus
Define what the confidential VMBus is and describe what advantages it offers on the capable hardware. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> |
||
|
|
9f4ce48788 |
KVM: x86: Document a virtualization gap for GIF on AMD CPUs
According to the APM Volume #2, Section 15.17, Table 15-10 (24593—Rev. 3.42—March 2024), When "GIF==0", an "Debug exception or trap, due to breakpoint register match" should be "Ignored and discarded". KVM lacks any handling of this. Even when vGIF is enabled and vGIF==0, the CPU does not ignore #DBs and relies on the VMM to do so. Handling this is possible, but the complexity is unjustified given the rarity of using HW breakpoints when GIF==0 (e.g. near VMRUN). KVM would need to intercept the #DB, temporarily disable the breakpoint, singe-step over the instruction (probably reusing NMI singe-stepping), and re-enable the breakpoint. Instead, document this as an erratum. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251030223757.2950309-1-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
4debb5e895 |
Documentation: kvm: new UAPI for handling SEA
Document the new userspace-visible features and APIs for handling synchronous external abort (SEA) - KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature. - KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA and what userspace gets while taking the SEA. Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com [ oliver: make documentation concise, remove implementation detail ] Signed-off-by: Oliver Upton <oupton@kernel.org> |
||
|
|
182a258b5e |
Documentation: kvm: Fix ordering
7.43 has been assigned twice, make
KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 7.44.
Fixes: f55ce5a6cd33 ("KVM: arm64: Expose new KVM cap for cacheable PFNMAP")
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
||
|
|
4361f5aa8b |
KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjyszoACgkQOlYIJqCj
N/03+g/9FPoIPxGL9+tJyGahdH2Mygiip8Q3tUTlGVkskp+dplf+6T51ogdqBOkS
tvlGjccxAOVW73ijn8ox7UtGSY9B1IJ+rj/uhsBOMFQptyUJEv4iEujFKB/t5RIF
gOTVVR6Z/mcrYJY7F21qBPCHPbz++rEXrfgyAsosz6tpS/nL6vrNdp1LZlcsLM/k
5DuhkQHLEwJoUXO5VUsBWRa3gRdOuZ+SJGd4C1mfDwXagxe8uFYRO/vraOV9dKJx
l9RxKPIhf42cfu9tIeJYDJyXIDgzXUEND4smVw/ito1SeakBlC9rQ15ya8ETLAEX
tHzmj38RPOfjsycGKIRhlLzQx77b+t7FhNdse19QC3p3u9Jn8FuMyFcGZuiaP5kK
e9Xrp/zcaCNfHB6gGKEud7h7fpV9yB8SNlCsP81if73buq3qHx0+jeVg3jCS4mkb
zGY7CG+oKJOdTSN8JAh8nJMM4bUv5m8myr6yYGU+SsGBQsyPyqQdRWuKdQyOQIVC
RZSWXSjKfmT5FwSs0KRI0yUjMeUYbgwrsysFuS3qX62mZpr0vLaPoiYFvuffe9gB
W3Tt98QNPtBmJdINNncgKvbn3sp/CzUHirygaI0APZwh6QQAkL1Dp2beA1i95uVI
8uin4zUjRbRjUpMUaEtAQIaVMTVQqgfiPAvtcDCRBBeOGaNr81M=
=lBtj
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
|
||
|
|
04fd067b77 |
KVM: Fix VM exit code for full dirty ring in API documentation
While reading the documentation, I saw a exit code I could not grep for, to figure out it has a slightly different name. Fix that name in documentation so it points to the right exit code. Signed-off-by: Leonardo Bras <leo.bras@arm.com> Link: https://lore.kernel.org/r/20251014152802.13563-1-leo.bras@arm.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
164ecbf73c |
Documentation: KVM: Update GICv3 docs for GICv5 hosts
GICv5 hosts optionally include FEAT_GCIE_LEGACY, which allows them to execute GICv3-based VMs on GICv5 hardware. Update the GICv3 documentation to reflect this now that GICv3 guests are supports on compatible GICv5 hosts. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
|
|
cc4309324d |
KVM: arm64: Document vCPU event ioctls as requiring init'ed vCPU
KVM rejects calls to KVM_{GET,SET}_VCPU_EVENTS for an uninitialized vCPU
as of commit cc96679f3c03 ("KVM: arm64: Prevent access to vCPU events
before init"). Update the corresponding API documentation.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
||
|
|
fe2bf6234e |
KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set
Add a guest_memfd flag to allow userspace to state that the underlying
memory should be configured to be initialized as shared, and reject user
page faults if the guest_memfd instance's memory isn't shared. Because
KVM doesn't yet support in-place private<=>shared conversions, all
guest_memfd memory effectively follows the initial state.
Alternatively, KVM could deduce the initial state based on MMAP, which for
all intents and purposes is what KVM currently does. However, implicitly
deriving the default state based on MMAP will result in a messy ABI when
support for in-place conversions is added.
For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
by default (otherwise the memory would be unusable). If MMAP implies
memory is shared by default, then the default state for CoCo VMs will vary
based on MMAP, and from userspace's perspective, will change when in-place
conversion support is added. I.e. to maintain guest<=>host ABI, userspace
would need to immediately convert all memory from shared=>private, which
is both ugly and inefficient. The inefficiency could be avoided by adding
a flag to state that memory is _private_ by default, irrespective of MMAP,
but that would lead to an equally messy and hard to document ABI.
Bite the bullet and immediately add a flag to control the default state so
that the effective behavior is explicit and straightforward.
Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
||
|
|
d2042d8f96 |
KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS
Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't require a new capability, and so that developers aren't tempted to bundle multiple flags into a single capability. Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit value, but that limitation can be easily circumvented by adding e.g. KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more than 32 flags. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
256e341706 |
Generic:
* Rework almost all of KVM's exports to expose symbols only to KVM's x86
vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
x86:
* Rework almost all of KVM x86's exports to expose symbols only to KVM's
vendor modules, i.e. to kvm-{amd,intel}.ko.
* Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
It's worth noting that while SHSTK and IBT can be enabled separately in CPUID,
it is not really possible to virtualize them separately. Therefore, Intel
processors will really allow both SHSTK and IBT under the hood if either is
made visible in the guest's CPUID. The alternative would be to intercept
XSAVES/XRSTORS, which is not feasible for performance reasons.
* Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when
completing userspace I/O. KVM has already committed to allowing L2 to
to perform I/O at that point.
* Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
supposed to exist for v2 PMUs.
* Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
* Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside of
forced emulation and other contrived testing scenarios).
* Clean up the MSR APIs in preparation for CET and FRED virtualization, as
well as mediated vPMU support.
* Clean up a pile of PMU code in anticipation of adding support for mediated
vPMUs.
* Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits
needed to faithfully emulate an I/O APIC for such guests.
* Many cleanups and minor fixes.
* Recover possible NX huge pages within the TDP MMU under read lock to
reduce guest jitter when restoring NX huge pages.
* Return -EAGAIN during prefault if userspace concurrently deletes/moves the
relevant memslot, to fix an issue where prefaulting could deadlock with the
memslot update.
x86 (AMD):
* Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported.
* Require a minimum GHCB version of 2 when starting SEV-SNP guests via
KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors
instead of latent guest failures.
* Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack. This feature splits the shared
SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests,
therefore a new module parameter is needed to control the number of ASIDs
that can be used for VMs with CipherText Hiding vs. how many can be used to
run SEV-ES guests.
* Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted
host from tampering with the guest's TSC frequency, while still allowing the
the VMM to configure the guest's TSC frequency prior to launch.
* Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs
resulting from bogus XCR0 values.
* Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to
avoid leaving behind stale state (thankfully not consumed in KVM).
* Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE
instead of subtly relying on guest_memfd to deal with them.
* Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's
desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's
TSC_AUX in the host MSR until return to userspace.
KVM (Intel):
* Preparation for FRED support.
* Don't retry in TDX's anti-zero-step mitigation if the target memslot is
invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar
to the aforementioned prefaulting case.
* Misc bugfixes and minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjjx/0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMLFwf9HXZdqBn6VvkbSL/HIGdNG1BEzeJ0
MQVEMMdmWJ72JtI6soJ6oN5NWTIJJeMTPuCgRrNxFbIivSdm9vYPTSCNwNBhKb+H
FEsr62a9T4XgnTqy20h+yZJiKNvwtaggdTWFnUAUqsBSFkEtksAP72odvZx+GNv/
cndqtxy/84TcJ4ZXFdxElylCcQ9xRoRkqkU8KaVfg88wqMIMbSR3OBSH/g8bqR+3
cjvDGNC7TPHPEN2Wmq2AYluRlBxB2ZhsOauArsdidPXHAevO+AFnbS27fz6bixZK
LTS/qwKOsvhFzyHngemuG6s6HgkgBEshfcKk5i7d2ReRjaGP4EvkhmlImA==
=k49c
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
"Generic:
- Rework almost all of KVM's exports to expose symbols only to KVM's
x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko
x86:
- Rework almost all of KVM x86's exports to expose symbols only to
KVM's vendor modules, i.e. to kvm-{amd,intel}.ko
- Add support for virtualizing Control-flow Enforcement Technology
(CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
(Shadow Stacks).
It is worth noting that while SHSTK and IBT can be enabled
separately in CPUID, it is not really possible to virtualize them
separately. Therefore, Intel processors will really allow both
SHSTK and IBT under the hood if either is made visible in the
guest's CPUID. The alternative would be to intercept
XSAVES/XRSTORS, which is not feasible for performance reasons
- Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
when completing userspace I/O. KVM has already committed to
allowing L2 to to perform I/O at that point
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
MSR is supposed to exist for v2 PMUs
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside
of forced emulation and other contrived testing scenarios)
- Clean up the MSR APIs in preparation for CET and FRED
virtualization, as well as mediated vPMU support
- Clean up a pile of PMU code in anticipation of adding support for
mediated vPMUs
- Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
vmexits needed to faithfully emulate an I/O APIC for such guests
- Many cleanups and minor fixes
- Recover possible NX huge pages within the TDP MMU under read lock
to reduce guest jitter when restoring NX huge pages
- Return -EAGAIN during prefault if userspace concurrently
deletes/moves the relevant memslot, to fix an issue where
prefaulting could deadlock with the memslot update
x86 (AMD):
- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
supported
- Require a minimum GHCB version of 2 when starting SEV-SNP guests
via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
errors instead of latent guest failures
- Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
prevents unauthorized CPU accesses from reading the ciphertext of
SNP guest private memory, e.g. to attempt an offline attack. This
feature splits the shared SEV-ES/SEV-SNP ASID space into separate
ranges for SEV-ES and SEV-SNP guests, therefore a new module
parameter is needed to control the number of ASIDs that can be used
for VMs with CipherText Hiding vs. how many can be used to run
SEV-ES guests
- Add support for Secure TSC for SEV-SNP guests, which prevents the
untrusted host from tampering with the guest's TSC frequency, while
still allowing the the VMM to configure the guest's TSC frequency
prior to launch
- Validate the XCR0 provided by the guest (via the GHCB) to avoid
bugs resulting from bogus XCR0 values
- Save an SEV guest's policy if and only if LAUNCH_START fully
succeeds to avoid leaving behind stale state (thankfully not
consumed in KVM)
- Explicitly reject non-positive effective lengths during SNP's
LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
them
- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
host's desired TSC_AUX, to fix a bug where KVM was keeping a
different vCPU's TSC_AUX in the host MSR until return to userspace
KVM (Intel):
- Preparation for FRED support
- Don't retry in TDX's anti-zero-step mitigation if the target
memslot is invalid, i.e. is being deleted or moved, to fix a
deadlock scenario similar to the aforementioned prefaulting case
- Misc bugfixes and minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: x86: Export KVM-internal symbols for sub-modules only
KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
KVM: Export KVM-internal symbols for sub-modules only
KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
KVM: VMX: Make CR4.CET a guest owned bit
KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
KVM: selftests: Add coverage for KVM-defined registers in MSRs test
KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
KVM: selftests: Extend MSRs test to validate vCPUs without supported features
KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
KVM: selftests: Add an MSR test to exercise guest/host and read/write
KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
KVM: x86: Define Control Protection Exception (#CP) vector
KVM: x86: Add human friendly formatting for #XM, and #VE
KVM: SVM: Enable shadow stack virtualization for SVM
KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
KVM: SVM: Pass through shadow stack MSRs as appropriate
KVM: SVM: Update dump_vmcb with shadow stack save area additions
KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
...
|
||
|
|
f3826aa996 |
guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the kernel direct
map, as well as for limited mmap() for guest_memfd-backed memory.
For more information see:
* a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
* https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
(guest_memfd in Firecracker)
* https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
(direct map removal)
* https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
(mmap support)
ARM:
* Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
* Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
* Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
* Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This will
hopefully help debugging non-VHE setups.
* Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
* Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
* Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
* Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then allows
a bunch of features to be turned off, which helps cross-host
migration.
* Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
* Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
* Detect page table walk feature on new hardware
* Add sign extension with kernel MMIO/IOCSR emulation
* Improve in-kernel IPI emulation
* Improve in-kernel PCH-PIC emulation
* Move kvm_iocsr tracepoint out of generic code
RISC-V:
* Added SBI FWFT extension for Guest/VM with misaligned delegation and
pointer masking PMLEN features
* Added ONE_REG interface for SBI FWFT extension
* Added Zicbop and bfloat16 extensions for Guest/VM
* Enabled more common KVM selftests for RISC-V
* Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
* Improve interrupt cpu for wakeup, in particular the heuristic to decide
which vCPU to deliver a floating interrupt to.
* Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
* Add #DE coverage in the fastops test (the only exception that's guest-
triggerable in fastop-emulated instructions).
* Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
Forest (SRF) and Clearwater Forest (CWF).
* Minor cleanups and improvements
x86 (guest side):
* For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
could map devices as WB and prevent the device drivers from mapping their
devices with UC/UC-.
* Make kvm_async_pf_task_wake() a local static helper and remove its
export.
* Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
bindings even when PV_UNHALT is unsupported.
Generic:
* Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
now included in GFP_NOWAIT.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
=fyov
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This excludes the bulk of the x86 changes, which I will send
separately. They have two not complex but relatively unusual conflicts
so I will wait for other dust to settle.
guest_memfd:
- Add support for host userspace mapping of guest_memfd-backed memory
for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
(which isn't precisely the same thing as CoCo VMs, since x86's
SEV-MEM and SEV-ES have no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the
kernel direct map, as well as for limited mmap() for
guest_memfd-backed memory.
For more information see:
- commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD")
- guest_memfd in Firecracker:
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
- direct map removal:
https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
- mmap support:
https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
ARM:
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This
will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then
allows a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
- Detect page table walk feature on new hardware
- Add sign extension with kernel MMIO/IOCSR emulation
- Improve in-kernel IPI emulation
- Improve in-kernel PCH-PIC emulation
- Move kvm_iocsr tracepoint out of generic code
RISC-V:
- Added SBI FWFT extension for Guest/VM with misaligned delegation
and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V
- Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
- Improve interrupt cpu for wakeup, in particular the heuristic to
decide which vCPU to deliver a floating interrupt to.
- Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
- Add #DE coverage in the fastops test (the only exception that's
guest- triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR),
Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements
x86 (guest side):
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI
auto-mapping could map devices as WB and prevent the device drivers
from mapping their devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated
vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
Generic:
- Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
__GFP_NOWARN is now included in GFP_NOWAIT.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
KVM: s390: Fix to clear PTE when discarding a swapped page
KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
KVM: arm64: selftests: Add basic test for running in VHE EL2
KVM: arm64: selftests: Enable EL2 by default
KVM: arm64: selftests: Initialize HCR_EL2
KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
KVM: arm64: selftests: Select SMCCC conduit based on current EL
KVM: arm64: selftests: Provide helper for getting default vCPU target
KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
KVM: arm64: selftests: Add helper to check for VGICv3 support
KVM: arm64: selftests: Initialize VGICv3 only once
KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
KVM: selftests: Add ex_str() to print human friendly name of exception vectors
selftests/kvm: remove stale TODO in xapic_state_test
KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
...
|
||
|
|
12abeb81c8 |
KVM x86 CET virtualization support for 6.18
Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI). SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).
Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice. By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.
SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations. The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.
When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses. If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.
IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH. If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.
From a virtualization perspective, CET presents several problems. While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.
To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.
Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.
Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.
Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXbisACgkQOlYIJqCj
N/373w//ckB4c9MjS6eDRp+LtTXQfXyAs8eMcs9YTs7yD3uMvqcbaNuDsf1U2cI6
i2qcuOdxlnKSJphn6oH2JKDWPjRAfHhCqmYghUPaJwgeYqsTfork9s8rzU2tC82q
38mQ6BhAuOwa/plodvDp/+POEIoXUyexSoWX+cngGVTmFWdbfA4NNGjWMZOl1XG2
qLBck6t+IxxUTs1Ij+OsexlAKdY7FcZZ85Ok6I/VE4/lITEhuTJkwkYdh8td3KK/
IVVk1jb1Z7t8lGQ5fi3+N/D8iHJ/0ladmOux6Yxzw88uyj6XLIFOOFsdK09GyhUS
QzV06syFkV2vU68VDYiOcMZIdeGmYR5jDpmy9N+o0s86YLU6rKKEaXRP7vW5yHj/
99AU+DfRHvhqKwWyQ51B+rhr80F3EQrkZXI0QBr8KO7sseFZvZNNVozwKjSyZtNH
VBhxjIlVQm5Z1rjucKjc573sONK95z9XUSZjYnCUwB1NH7VsvdULQmJBucCmzW/p
9j49CpmShwggceV6LcYg4Miuvjl/bL1B8Go5Fg+1Fdg7L6Nepi16yywxHmyPqreJ
Wx/6N0gqZ3LKDdl5CFYxAxvJoldJR6lbw/AGjvFkre8A+TGGRdz3uS9XXqGHvtbu
W5wKhnvGov69lm4xYbxbI+rvxYmmQLm9SgQXel23icbKJ5kmE48=
=zsBl
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM x86 CET virtualization support for 6.18
Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).
CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI). SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).
Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice. By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.
SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations. The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.
When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses. If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.
IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH. If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.
From a virtualization perspective, CET presents several problems. While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.
To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.
Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.
Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.
Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
|
||
|
|
d05ca6b793 |
KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
intercepts and trigger a variety of WARNs in KVM.
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
supposed to exist for v2 PMUs.
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
- Clean up KVM's vector hashing code for delivering lowest priority IRQs.
- Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
in the process of doing so add fastpath support for TSC_DEADLINE writes on
AMD CPUs.
- Clean up a pile of PMU code in anticipation of adding support for mediated
vPMUs.
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside of
forced emulation and other contrived testing scenarios).
- Clean up the MSR APIs in preparation for CET and FRED virtualization, as
well as mediated vPMU support.
- Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
as KVM can't faithfully emulate an I/O APIC for such guests.
- KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
response to PMU refreshes for guests with mediated vPMUs.
- Misc cleanups and minor fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXIr0ACgkQOlYIJqCj
N/1bbhAAxHzxN7IcizgAYf1BZWMjRU4zJgwlkoGuBeH/IgUOODPjs93L9kyrzvVL
tcFgIe9o5fZRGmUfyZbCKnJaQi/4u/2QPRSGhsYt7vyDjCoXzO5CJPMYIqDz5Z2r
qg+GNMlLtWI8EbcDd4qT22SWC8GufoXFEQnX6PUNhasOHeKit5ye8wmttcG+zvYV
KeIkPluddQkQ2JKyG53IFNmm1lkY05oAibv61hkxqUSwCIJKsQFuDjl4GVouAd/H
eu0+pzNmzPUTQ/qJzr2cNL5Nqz08DGp2OCFFRO6bgXaWkvHnFG3EAEHlhTAUh92t
LPJxmhb6R8SUc+z8rYTgyF/zVpgeJcJO7F44FrXa7r2iV58ds3TfuO53hVaEfyNp
1GUMH0m8N2vfjtFyUVP1KwZHuFxiGKLd1wZ1h0yKpj1Eg1FjR2cEontqwH44tHn2
ENq8MIbWIBhvCsz5fIbM4y591JSevJUrDlYu60Lz7VyXHAw8Cq92t/dN9O7oH5mJ
pIyoracU1g0Q6bbATZYsOGhkCTYLtdelZaBb5AYIgQ+U4C1TA4GpgEBUSVH8HXDy
kXzVqSFlL0v5rrFkBPjiNFb5WD3iLjJIM3DLGoNegOM8+79r/USGHUY+XU3z/kCH
rV8JBlTnLBCrNOHEiHJUI2kwBQ9C9/l88X/VwvRUNv7SthuExSo=
=9IB0
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw
where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's
intercepts and trigger a variety of WARNs in KVM.
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is
supposed to exist for v2 PMUs.
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
- Clean up KVM's vector hashing code for delivering lowest priority IRQs.
- Clean up the fastpath handler code to only handle IPIs and WRMSRs that are
actually "fast", as opposed to handling those that KVM _hopes_ are fast, and
in the process of doing so add fastpath support for TSC_DEADLINE writes on
AMD CPUs.
- Clean up a pile of PMU code in anticipation of adding support for mediated
vPMUs.
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside of
forced emulation and other contrived testing scenarios).
- Clean up the MSR APIs in preparation for CET and FRED virtualization, as
well as mediated vPMU support.
- Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs,
as KVM can't faithfully emulate an I/O APIC for such guests.
- KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation
for mediated vPMU support, as KVM will need to recalculate MSR intercepts in
response to PMU refreshes for guests with mediated vPMUs.
- Misc cleanups and minor fixes.
|
||
|
|
9d6812d415 |
KVM: x86: Enable guest SSP read/write interface with new uAPIs
Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace save and restore the guest's Shadow Stack Pointer (SSP). On both Intel and AMD, SSP is a hardware register that can only be accessed by software via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware to context switch SSP at entry/exit). As a result, SSP doesn't fit in any of KVM's existing interfaces for saving/restoring state. Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP MSRs. Use a translation layer to hide the KVM-internal MSR index so that the arbitrary index doesn't become ABI, e.g. so that KVM can rework its implementation as needed, so long as the ONE_REG ABI is maintained. Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack support to avoid running afoul of ignore_msrs, which unfortunately applies to host-initiated accesses (which is a discussion for another day). I.e. ensure consistent behavior for KVM-defined registers irrespective of ignore_msrs. Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-14-seanjc@google.com Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
06f2969c6a |
KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
other non-MSR registers through them, along with support for
KVM_GET_REG_LIST to enumerate support for KVM-defined registers.
This is in preparation for allowing userspace to read/write the guest SSP
register, which is needed for the upcoming CET virtualization support.
Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
added for registers that lack existing KVM uAPIs to access them. The "KVM"
in the name is intended to be vague to give KVM flexibility to include
other potential registers. More precise names like "SYNTHETIC" and
"SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
be conflated with synthetic guest-visible MSRs) and may put KVM into a
corner (e.g. if KVM wants to change how a KVM-defined register is modeled
internally).
Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
||
|
|
86bcd23df9 |
KVM: x86: Fix hypercalls docs section number order
Commit 4180bf1b655a79 ("KVM: X86: Implement "send IPI" hypercall")
documents KVM_HC_SEND_IPI hypercall, yet its section number duplicates
KVM_HC_CLOCK_PAIRING one (which both are 6th). Fix the numbering order
so that the former should be 7th.
Fixes: 4180bf1b655a ("KVM: X86: Implement "send IPI" hypercall")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250909003952.10314-1-bagasdotme@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
||
|
|
5b5133e6a5 |
Documentation: KVM: Call out that KVM strictly follows the 8254 PIT spec
Explicitly document that the behavior of KVM_SET_PIT2 strictly conforms to the Intel 8254 PIT hardware specification, specifically that a write of '0' adheres to the spec's definition that a programmed count of '0' is converted to the maximum possible value (2^16). E.g. an unaware userspace might attempt to validate that KVM_GET_PIT2 returns the exact state set via KVM_SET_PIT2, and be surprised when the returned count is 65536, not 0. Add a references to the Intel 8254 PIT datasheet that will hopefully stay fresh for some time (the internet isn't exactly brimming with copies of the 8254 datasheet). Link: https://lore.kernel.org/all/CANypQFbEySjKOFLqtFFf2vrEe=NBr7XJfbkjQhqXuZGg7Rpoxw@mail.gmail.com Signed-off-by: Jiaming Zhang <r772577952@gmail.com> Link: https://lore.kernel.org/r/20250905174736.260694-1-r772577952@gmail.com [sean: add context Link, drop local APIC change, massage changelog accordingly] Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
a4c2ff6e50 |
Documentation: Fix spelling mistakes
Corrected a few spelling mistakes to improve the readability. Signed-off-by: Ranganath V N <vnranganath.20@gmail.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250902193822.6349-1-vnranganath.20@gmail.com |
||
|
|
3d3a04fad2 |
KVM: Allow and advertise support for host mmap() on guest_memfd files
Now that all the x86 and arm64 plumbing for mmap() on guest_memfd is in place, allow userspace to set GUEST_MEMFD_FLAG_MMAP and advertise support via a new capability, KVM_CAP_GUEST_MEMFD_MMAP. The availability of this capability is determined per architecture, and its enablement for a specific guest_memfd instance is controlled by the GUEST_MEMFD_FLAG_MMAP flag at creation time. Update the KVM API documentation to detail the KVM_CAP_GUEST_MEMFD_MMAP capability, the associated GUEST_MEMFD_FLAG_MMAP, and provide essential information regarding support for mmap in guest_memfd. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
6836e1f30f |
Documentation: KVM: Use unordered list for pre-init VGIC registers
The intent was to create a single column table, however the markup used was actually for a header which led to docs build failures: Sphinx parallel build error: docutils.utils.SystemMessage: Documentation/virt/kvm/devices/arm-vgic-v3.rst:128: (SEVERE/4) Unexpected section title or transition. Fix the issue by converting the attempted table to an unordered list. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250729142217.0d4e64cd@canb.auug.org.au/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Message-ID: <20250729152242.3232229-1-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
314b40b3b6 |
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou
pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk=
=r+sp
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
|
||
|
|
1a14928e2e |
Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.17 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from the "source" on MSR filter changes, and drop the dedicated "shadow" bitmaps (and their awful "max" size defines). - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED, a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use the same approach KVM uses for dealing with "impossible" emulation when running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU has architecturally impossible state. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can "virtualize" APERF/MPERF (with many caveats). - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default" frequency is unsupported for VMs with a "secure" TSC, and there's no known use case for changing the default frequency for other VM types. |
||
|
|
65164fd0f6 |
KVM/riscv changes for 6.17
- Enabled ring-based dirty memory tracking - Improved perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU related improvements for KVM RISC-V for upcoming nested virtualization -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmiIr0QACgkQrUjsVaLH LAf4yA/+KkfQCgMFhwpml6tIzFaa9yS+C9oqemnlWVT/wTADg18/+hraomqUnYhY HqOPdWo6O3vmH6E0jR6+AQXz5f4whYl8X8m+HdBHz8c1rF6ozoLMF4Qh3lsDDuZ0 7pdIEzkNLCA3umBxXUzy7yexKWSzcGD3521toFMPADQHaZTwT8Om5/KLblHB+he3 1DzsvErPWknWW1pynvSUJoD4zB41Qn364sJvyq4tAW6i8DmxLAmM/+Reh7GBP83r t7nAYVdnYikFj0oCb60NcFHqOQpk88mZTqCPMeZD1BoazEDXCPkdx0J44NsBRjun BhEpgBLIZDIwOF1A/DDIPrNuNOjSeeUAsAY1sK/yVEOkVZ1HPndyCil5SkE1FJHT dmsJBXq96dTlXYo9jBfExFUaUCI1mivLbX7uziIT1876IgLr5NlEJSbwk+TQ8VmR IS1PISi7yp5LeZcJEh6PBYgo02UE9gQ/C3tvvcaHbxXyQjVacB6Dw7EnaBArYMwv dbEPkxXem90Vup90ixgdLBW3nGCDckpogbsmlqoxV5m3MOknE5L8IuWxKXKB56Tp pxp7o1JywBV+Ym5w1BkpTvyEL/a2VDGFOfjryq/h8NAVxXPfwk2plyNhrwLt+Naz dYF7RprQL3XTtPnVOND6mMHyl2Sz9y0dc2tj6mpGamSBWPxlP/w= =ifud -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.17-2' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.17 - Enabled ring-based dirty memory tracking - Improved perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU related improvements for KVM RISC-V for upcoming nested virtualization |
||
|
|
f55ffaf896 |
RISC-V: KVM: Enable ring-based dirty memory tracking
Enable ring-based dirty memory tracking on riscv: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as riscv is weakly ordered. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add a check to kvm_vcpu_kvm_riscv_check_vcpu_requests for checking whether the dirty ring is soft full. To handle vCPU requests that cause exits to userspace, modified the `kvm_riscv_check_vcpu_requests` to return a value (currently only returns 0 or 1). Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20e116efb1f7aff211dd8e3cf8990c5521ed5f34.1749810735.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org> |
||
|
|
0d46e324c0 |
Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next
* kvm-arm64/vgic-v4-ctl: : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta : : Allow userspace to decide if support for SGIs without an active state is : advertised to the guest, allowing VMs from GICv3-only hardware to be : migrated to to GICv4.1 capable machines. Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init KVM: arm64: selftests: Add test for nASSGIcap attribute KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling KVM: arm64: Disambiguate support for vSGIs v. vLPIs Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
|
|
a7f49a9bf4 |
Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next
* kvm-arm64/el2-reg-visibility:
: Fixes to EL2 register visibility, courtesy of Marc Zyngier
:
: - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
: KVM_{GET,SET}_ONE_REG ioctls
:
: - Condition visibility of FGT registers on the presence of FEAT_FGT in
: the VM
KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
KVM: arm64: Enforce the sorting of the GICv3 system register table
KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
KVM: arm64: selftests: get-reg-list: Add base EL2 registers
KVM: arm64: selftests: get-reg-list: Simplify feature dependency
KVM: arm64: Advertise FGT2 registers to userspace
KVM: arm64: Condition FGT registers on feature availability
KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
KVM: arm64: Let GICv3 save/restore honor visibility attribute
KVM: arm64: Define helper for ICH_VTR_EL2
KVM: arm64: Define constant value for ICC_SRE_EL2
KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
KVM: arm64: Make RVBAR_EL2 accesses UNDEF
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
||
|
|
eed9b14209 |
Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
KVM allows userspace to control GICD_IIDR.Revision and GICD_TYPER2.nASSGIcap prior to initialization for the sake of provisioning the guest-visible feature set. Document the userspace expectations surrounding accesses to these registers. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250724062805.2658919-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
|
|
4b7d440de2 |
KVM TDX fixes for 6.16
- Fix a formatting goof in the TDX documentation.
- Reject KVM_SET_TSC_KHZ for guests with a protected TSC (currently only TDX).
- Ensure struct kvm_tdx_capabilities fields that are not explicitly set by KVM
are zeroed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmh5C9oACgkQOlYIJqCj
N/1IlRAArAgX2C69qzTWougXFbwl0qiJhY3RpM+jCkiQsSkgICs8kTBSp5OFDRKE
njtMvdCrhDSXGabRmOlTOjlbauZUU7Ady9Z9GIevVskTu/feknU3I33CQJMBUjCI
49NG3o3p1vUWluK+WzzclHeojgAFVrPdoWZ3SLhgZf/N9s/hElI46bqpNN/zZOYd
x2oDC1YJiRDoQmAF/hGDZxvQSZB7ZWvufUEi/lFomjSbWMecn26ynNaFe8jMdjyj
cBXmsIn41qLMOdVZt+C5h7btcSar1uf0CZ4BfgZbKiAwZn6YH76rzBrAARP15rsI
iKRfnOTaENx2SkKEX17Xlh1mOFKIMFP5wl2oQbW0pHMEObdUgIcwUlzOtKHmN/Nz
dCN3W1sYat0fGec4lcx7emNmDDW8EfhmznS9eGo67okGKEbtUnzsDR01Ob335oVN
F8ORqT1vthn2d3Na1FyAo3a5NKhL2/J67lNThDKzfdiQ7UxYeTjsFW4ys3Li6jDM
H+LIcXpxLfEwbesbuCHAoyPbAcROFLuNgbTBr1CNm3zsfeqHzxeZAk7aUX6wbBV1
+F7C7ANfvhae8OBkZoAgJJ3aJEbeloXboQUiijzM12l6Qz+shcpfqmIy5ZrZHQeI
SKJhHlKDa11vy488sT23Pz1go23sQU83ZbB0x7sdcyOH+1dqCYQ=
=V3cQ
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.16-rc7' of https://github.com/kvm-x86/linux into HEAD
KVM TDX fixes for 6.16
- Fix a formatting goof in the TDX documentation.
- Reject KVM_SET_TSC_KHZ for guests with a protected TSC (currently only TDX).
- Ensure struct kvm_tdx_capabilities fields that are not explicitly set by KVM
are zeroed.
|
||
|
|
f68df3aee7 |
KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
We never documented which GICv3 registers are available for save/restore via the KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS interface. Let's take the opportunity of adding the EL2 registers to document the whole thing in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250714122634.3334816-12-maz@kernel.org [ oliver: fix trailing whitespace ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
|
|
8a73c8dbb2 |
KVM: Documentation: document how KVM is tested
Proper testing greatly simplifies both patch development and review, but it can be unclear what kind of userspace or guest support should accompany new features. Clarify maintainer expectations in terms of testing expectations; additionally, list the cases in which open-source userspace support is pretty much a necessity and its absence can only be mitigated by selftests. While these ideas have long been followed implicitly by KVM contributors and maintainers, formalize them in writing to provide consistent (though not universal) guidelines. Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
83131d8469 |
KVM: Documentation: minimal updates to review-checklist.rst
While the file could stand a larger update, these are the bare minimum changes needed to make it more widely applicable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
b24bbb534c |
KVM: x86: Reject KVM_SET_TSC_KHZ vCPU ioctl for TSC protected guest
Reject KVM_SET_TSC_KHZ vCPU ioctl if guest's TSC is protected and not
changeable by KVM, and update the documentation to reflect it.
For such TSC protected guests, e.g. TDX guests, typically the TSC is
configured once at VM level before any vCPU are created and remains
unchanged during VM's lifetime. KVM provides the KVM_SET_TSC_KHZ VM
scope ioctl to allow the userspace VMM to configure the TSC of such VM.
After that the userspace VMM is not supposed to call the KVM_SET_TSC_KHZ
vCPU scope ioctl anymore when creating the vCPU.
The de facto userspace VMM Qemu does this for TDX guests. The upcoming
SEV-SNP guests with Secure TSC should follow.
Note, TDX support hasn't been fully released as of the "buggy" commit,
i.e. there is no established ABI to break.
Fixes: adafea110600 ("KVM: x86: Add infrastructure for secure TSC")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/r/71bbdf87fdd423e3ba3a45b57642c119ee2dd98c.1752444335.git.kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
||
|
|
dcbe5a466c |
KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
Reject the KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created and update the documentation to reflect it. The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC frequency that all subsequently created vCPUs can use. It is only intended to be called before any vCPU is created. Allowing it to be called after that only results in confusion but nothing good. Note this is an ABI change. But currently in Qemu (the de facto userspace VMM) only TDX uses this VM ioctl, and it is only called once before creating any vCPU, therefore the risk of breaking userspace is pretty low. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/135a35223ce8d01cea06b6cef30bfe494ec85827.1752444335.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
073b3eca08 |
Documentation: KVM: Fix unexpected unindent warning
Add proper indentations to bullet list item to resolve the warning: "Bullet list ends without a blank line; unexpected unindent." Closes:https://lore.kernel.org/kvm/20250623162110.6e2f4241@canb.auug.org.au Fixes: 4580dbef5ce0 ("KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250701012536.1281367-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
a7cec20845 |
KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs without interception. The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not handled at all. The MSR values are not zeroed on vCPU creation, saved on suspend, or restored on resume. No accommodation is made for processor migration or for sharing a logical processor with other tasks. No adjustments are made for non-unit TSC multipliers. The MSRs do not account for time the same way as the comparable PMU events, whether the PMU is virtualized by the traditional emulation method or the new mediated pass-through approach. Nonetheless, in a properly constrained environment, this capability can be combined with a guest CPUID table that advertises support for CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the effective physical CPU frequency in /proc/cpuinfo. Moreover, there is no performance cost for this capability. Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
f55ce5a6cd |
KVM: arm64: Expose new KVM cap for cacheable PFNMAP
Introduce a new KVM capability to expose to the userspace whether cacheable mapping of PFNMAP is supported. The ability to safely do the cacheable mapping of PFNMAP is contingent on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing the D cache, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache and turns icache_inval_pou() into a NOP. The cap would be false if those requirements are missing and is checked by making use of kvm_arch_supports_cacheable_pfnmap. This capability would allow userspace to discover the support. It could for instance be used by userspace to prevent live-migration across FWB and non-FWB hosts. CC: Catalin Marinas <catalin.marinas@arm.com> CC: Jason Gunthorpe <jgg@nvidia.com> CC: Oliver Upton <oliver.upton@linux.dev> CC: David Hildenbrand <david@redhat.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
|
|
0c84b53404 |
Documentation: KVM: Fix unexpected unindent warnings
Add proper indentations to bullet list items to resolve the warning: "Bullet list ends without a blank line; unexpected unindent." Closes:https://lore.kernel.org/kvm/20250623162110.6e2f4241@canb.auug.org.au/ Fixes: cf207eac06f6 ("KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>") Fixes: 25e8b1dd4883 ("KVM: TDX: Exit to userspace for GetTdVmCallInfo") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20250625014829.82289-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
|
|
28224ef02b |
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
Allow userspace to advertise TDG.VP.VMCALL subfunctions that the kernel also supports. For each output register of GetTdVmCallInfo's leaf 1, add two fields to KVM_TDX_CAPABILITIES: one for kernel-supported TDVMCALLs (userspace can set those blindly) and one for user-supported TDVMCALLs (userspace can set those if it knows how to handle them). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
4580dbef5c |
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
25e8b1dd48 |
KVM: TDX: Exit to userspace for GetTdVmCallInfo
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX, to allow userspace to provide information about the support of TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API. GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>, <ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>, <Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and <Instruction.WRMSR>. They must be supported by VMM to support TDX guests. For GetTdVmCallInfo - When leaf (r12) to enumerate TDVMCALL functionality is set to 0, successful execution indicates all GHCI base TDVMCALLs listed above are supported. Update the KVM TDX document with the set of the GHCI base APIs. - When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it indicates the TDX guest is querying the supported TDVMCALLs beyond the GHCI base TDVMCALLs. Exit to userspace to let userspace set the TDVMCALL sub-function bit(s) accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s) supported by itself when the TDVMCALLs don't need support from userspace after returning from userspace and before entering guest. Currently, no such TDVMCALLs implemented, KVM just sets the values returned from userspace. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
cf207eac06 |
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
Handle TDVMCALL for GetQuote to generate a TD-Quote. GetQuote is a doorbell-like interface used by TDX guests to request VMM to generate a TD-Quote signed by a service hosting TD-Quoting Enclave operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in a shared-memory area as parameter. Host VMM can access it and queue the operation for a service hosting TD-Quoting enclave. When completed, the Quote is returned via the same shared-memory area. KVM only checks the GPA from the TDX guest has the shared-bit set and drops the shared-bit before exiting to userspace to avoid bleeding the shared-bit into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU) and userspace VMM queues the operation asynchronously. KVM sets the return code according to the 'ret' field set by userspace to notify the TDX guest whether the request has been queued successfully or not. When the request has been queued successfully, the TDX guest can poll the status field in the shared-memory area to check whether the Quote generation is completed or not. When completed, the generated Quote is returned via the same buffer. Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is required to handle the KVM exit reason as the initial support for TDX, by reentering KVM to ensure that the TDVMCALL is complete. While at it, add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |