mirror of
https://github.com/hardkernel/linux.git
synced 2026-06-06 10:58:48 +09:00
Merge 3a755ebcc2 ("Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Steps on the way to 5.19-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia2bebe5c894cfdc798293af4c35ec79c9d1c812e
@@ -26,6 +26,7 @@ x86-specific Documentation
|
||||
intel_txt
|
||||
amd-memory-encryption
|
||||
amd_hsmp
|
||||
tdx
|
||||
pti
|
||||
mds
|
||||
microcode
|
||||
|
||||
218
Documentation/x86/tdx.rst
Normal file
@@ -0,0 +1,218 @@
.. SPDX-License-Identifier: GPL-2.0

=====================================
Intel Trust Domain Extensions (TDX)
=====================================

Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from
the host and physical attacks by isolating the guest register state and by
encrypting the guest memory. In TDX, a special module running in a special
mode sits between the host and the guest and manages the guest/host
separation.

Since the host cannot directly access guest registers or memory, much
normal functionality of a hypervisor must be moved into the guest. This is
implemented using a Virtualization Exception (#VE) that is handled by the
guest kernel. Some #VEs are handled entirely inside the guest kernel, but
others require the hypervisor to be consulted.

TDX includes new hypercall-like mechanisms for communicating from the
guest to the hypervisor or the TDX module.

New TDX Exceptions
==================

TDX guests behave differently from bare-metal and traditional VMX guests.
In TDX guests, otherwise normal instructions or memory accesses can cause
#VE or #GP exceptions.

Instructions marked with an '*' conditionally cause exceptions. The
details for these instructions are discussed below.

Instruction-based #VE
---------------------

- Port I/O (INS, OUTS, IN, OUT)
- HLT
- MONITOR, MWAIT
- WBINVD, INVD
- VMCALL
- RDMSR*,WRMSR*
- CPUID*

Instruction-based #GP
---------------------

- All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
  VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
- ENCLS, ENCLU
- GETSEC
- RSM
- ENQCMD
- RDMSR*,WRMSR*

RDMSR/WRMSR Behavior
--------------------

MSR access behavior falls into three categories:

- #GP generated
- #VE generated
- "Just works"

In general, the #GP MSRs should not be used in guests. Their use likely
indicates a bug in the guest. The guest may try to handle the #GP with a
hypercall but it is unlikely to succeed.

The #VE MSRs can typically be handled by the hypervisor. Guests
can make a hypercall to the hypervisor to handle the #VE.

The "just works" MSRs do not need any special guest handling. They might
be implemented by directly passing through the MSR to the hardware or by
trapping and handling in the TDX module. Other than possibly being slow,
these MSRs appear to function just as they would on bare metal.

CPUID Behavior
--------------

For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID
return values (in guest EAX/EBX/ECX/EDX) are configurable by the
hypervisor. For such cases, the Intel TDX module architecture defines two
virtualization types:

- Bit fields for which the hypervisor controls the value seen by the guest
  TD.

- Bit fields for which the hypervisor configures the value such that the
  guest TD either sees their native value or a value of 0. For these bit
  fields, the hypervisor can mask off the native values, but it can not
  turn *on* values.

A #VE is generated for CPUID leaves and sub-leaves that the TDX module does
not know how to handle. The guest kernel may ask the hypervisor for the
value with a hypercall.
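
The second virtualization type can be pictured as a simple mask operation. The following is a minimal sketch, not code from the TDX module: ``cpuid_apply_host_mask()`` and its parameters are hypothetical names chosen for illustration. It shows why the hypervisor can hide a native feature bit but cannot turn on a bit the hardware does not report.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or TDX module API): model a CPUID
 * bit field of the "native value or 0" type.
 *
 * native:     value the hardware reports for the register
 * host_allow: bits the hypervisor has chosen to leave visible
 *
 * The guest sees a bit only if it is set natively AND allowed by the
 * host, so the host can clear bits but can never set new ones.
 */
static inline uint32_t cpuid_apply_host_mask(uint32_t native,
					     uint32_t host_allow)
{
	return native & host_allow;
}
```

For example, with a native value of 0x0000000f and a host mask of 0xfffffff3, the guest would see 0x00000003. Bits that are 0 natively stay 0 whatever mask the host supplies.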

#VE on Memory Accesses
======================

There are essentially two classes of TDX memory: private and shared.
Private memory receives full TDX protections. Its content is protected
against access from the hypervisor. Shared memory is expected to be
shared between guest and hypervisor and does not receive full TDX
protections.

A TD guest is in control of whether its memory accesses are treated as
private or shared. It selects the behavior with a bit in its page table
entries. This helps ensure that a guest does not place sensitive
information in shared memory, exposing it to the untrusted hypervisor.
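
As an illustration of the page table bit, the sketch below mirrors the Intel convention used by cc_mkenc() and cc_mkdec() in this commit: the bit clear means private (encrypted), the bit set means shared. The mask value and the ``pte_*`` helper names are assumptions made for the example. In the kernel, the mask comes from get_cc_mask(), which queries the TDX module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the shared bit, for illustration only. */
#define EXAMPLE_SHARED_MASK	(1ULL << 51)

/* Intel convention: shared bit clear => private (encrypted) mapping. */
static inline uint64_t pte_mk_private(uint64_t pte)
{
	return pte & ~EXAMPLE_SHARED_MASK;
}

/* Intel convention: shared bit set => shared (decrypted) mapping. */
static inline uint64_t pte_mk_shared(uint64_t pte)
{
	return pte | EXAMPLE_SHARED_MASK;
}

/* True if the entry would be treated as shared with the hypervisor. */
static inline bool pte_is_shared(uint64_t pte)
{
	return (pte & EXAMPLE_SHARED_MASK) != 0;
}
```

Note that the polarity is the opposite of the AMD case, where the bit being set marks the page as encrypted.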

#VE on Shared Memory
--------------------

Access to shared mappings can cause a #VE. The hypervisor ultimately
controls whether a shared memory access causes a #VE, so the guest must be
careful to only reference shared pages for which it can safely handle a
#VE. For instance, the guest should be careful not to access shared memory
in the #VE handler before it reads the #VE info structure
(TDG.VP.VEINFO.GET).

Shared mapping content is entirely controlled by the hypervisor. The guest
should only use shared mappings for communicating with the hypervisor.
Shared mappings must never be used for sensitive memory content like kernel
stacks. A good rule of thumb is that hypervisor-shared memory should be
treated the same as memory mapped to userspace. Both the hypervisor and
userspace are completely untrusted.

MMIO for virtual devices is implemented as shared memory. The guest must
be careful not to access device MMIO regions unless it is also prepared to
handle a #VE.

#VE on Private Pages
--------------------

An access to private mappings can also cause a #VE. Since all kernel
memory is also private memory, the kernel might theoretically need to
handle a #VE on arbitrary kernel memory accesses. This is not feasible, so
TDX guests ensure that all guest memory has been "accepted" before memory
is used by the kernel.

A modest amount of memory (typically 512M) is pre-accepted by the firmware
before the kernel runs to ensure that the kernel can start up without
being subjected to a #VE.

The hypervisor is permitted to unilaterally move accepted pages to a
"blocked" state. However, if it does this, page access will not generate a
#VE. It will, instead, cause a "TD Exit" where the hypervisor is required
to handle the exception.

Linux #VE handler
=================

Just like page faults or #GP's, #VE exceptions can either be handled or be
fatal. Typically, an unhandled userspace #VE results in a SIGSEGV.
An unhandled kernel #VE results in an oops.

Handling nested exceptions on x86 is typically nasty business. A #VE
could be interrupted by an NMI which triggers another #VE and hilarity
ensues. The TDX #VE architecture anticipated this scenario and includes a
feature to make it slightly less nasty.

During #VE handling, the TDX module ensures that all interrupts (including
NMIs) are blocked. The block remains in place until the guest makes a
TDG.VP.VEINFO.GET TDCALL. This allows the guest to control when interrupts
or a new #VE can be delivered.

However, the guest kernel must still be careful to avoid potential
#VE-triggering actions (discussed above) while this block is in place.
While the block is in place, any #VE is elevated to a double fault (#DF)
which is not recoverable.
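
The blocking rules can be modelled as a small state machine. This is a user-space sketch for reasoning only, not kernel code: ``struct ve_model`` and its functions are hypothetical, and real delivery is performed by the TDX module and the CPU.

```c
#include <stdbool.h>

enum ve_outcome {
	VE_DELIVERED,		/* #VE handed to the guest handler */
	VE_DOUBLE_FAULT,	/* #VE raised while blocked: #DF, fatal */
};

/* Hypothetical model of the per-vCPU "#VE blocked" state. */
struct ve_model {
	bool blocked;	/* true from #VE delivery until VEINFO.GET */
};

/* A #VE-triggering action occurs (e.g. port I/O, shared memory access). */
static enum ve_outcome ve_raise(struct ve_model *m)
{
	if (m->blocked)
		return VE_DOUBLE_FAULT;

	m->blocked = true;	/* interrupts, NMIs and new #VEs are held off */
	return VE_DELIVERED;
}

/* The guest handler issues the TDG.VP.VEINFO.GET TDCALL. */
static void ve_info_get(struct ve_model *m)
{
	m->blocked = false;	/* the guest has re-opened the window */
}
```

In this model, calling ve_raise() twice without an intervening ve_info_get() yields VE_DOUBLE_FAULT. That is why a handler would read the #VE info first and only then touch anything that might trigger another #VE.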

MMIO handling
=============

In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
mapping which will cause a VMEXIT on access, and then the hypervisor
emulates the access. That is not possible in TDX guests because VMEXIT
will expose the register state to the host. TDX guests don't trust the host
and can't have their state exposed to the host.

In TDX, MMIO regions typically trigger a #VE exception in the guest. The
guest #VE handler then emulates the MMIO instruction inside the guest and
converts it into a controlled TDCALL to the host, rather than exposing
guest state to the host.

MMIO addresses on x86 are just special physical addresses. They can
theoretically be accessed with any instruction that accesses memory.
However, the kernel instruction decoding method is limited. It is only
designed to decode instructions like those generated by io.h macros.

MMIO access via other means (like structure overlays) may result in an
oops.
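
To make the distinction concrete, the sketch below contrasts the two access styles in plain C. It is a user-space illustration only: ``example_read32()`` stands in for an io.h-style accessor such as readl(), and ``struct example_regs`` is a hypothetical device layout. In the kernel, readl() uses inline assembly to emit a single MOV; the volatile access used here only approximates that.

```c
#include <stdint.h>

/* Hypothetical register layout of an example device. */
struct example_regs {
	uint32_t status;
	uint32_t control;
};

/*
 * Accessor style: one explicit, fixed-width load per call. This is the
 * access pattern the #VE handler's instruction decoder is designed for.
 */
static inline uint32_t example_read32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Preferred: go through the accessor. */
static inline uint32_t read_status_accessor(const volatile struct example_regs *regs)
{
	return example_read32(&regs->status);
}

/*
 * Structure overlay: the compiler chooses the instructions. It may
 * combine, widen or reorder accesses, producing something the decoder
 * does not recognise. Shown only as the pattern to avoid for MMIO.
 */
static inline uint32_t read_status_overlay(const struct example_regs *regs)
{
	return regs->status;
}
```

On ordinary memory both functions return the same value. The difference only matters when the address is an MMIO region and the access has to be decoded and emulated by the #VE handler.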

Shared Memory Conversions
=========================

All TDX guest memory starts out as private at boot. This memory can not
be accessed by the hypervisor. However, some kernel users like device
drivers might have a need to share data with the hypervisor. To do this,
memory must be converted between shared and private. This can be
accomplished using some existing memory encryption helpers:

* set_memory_decrypted() converts a range of pages to shared.
* set_memory_encrypted() converts memory back to private.

Device drivers are the primary user of shared memory, but there's no need
to touch every driver. DMA buffers and ioremap() do the conversions
automatically.

TDX uses SWIOTLB for most DMA allocations. The SWIOTLB buffer is
converted to shared on boot.

For coherent DMA allocation, the DMA buffer gets converted on the
allocation. Check force_dma_unencrypted() for details.

References
==========

TDX reference material is collected here:

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
@@ -878,6 +878,21 @@ config ACRN_GUEST
|
||||
IOT with small footprint and real-time features. More details can be
|
||||
found in https://projectacrn.org/.
|
||||
|
||||
config INTEL_TDX_GUEST
|
||||
bool "Intel TDX (Trust Domain Extensions) - Guest Support"
|
||||
depends on X86_64 && CPU_SUP_INTEL
|
||||
depends on X86_X2APIC
|
||||
select ARCH_HAS_CC_PLATFORM
|
||||
select X86_MEM_ENCRYPT
|
||||
select X86_MCE
|
||||
help
|
||||
Support running as a guest under Intel TDX. Without this support,
|
||||
the guest kernel can not boot or run under TDX.
|
||||
TDX includes memory encryption and integrity capabilities
|
||||
which protect the confidentiality and integrity of guest
|
||||
memory contents and CPU state. TDX guests are protected from
|
||||
some attacks from the VMM.
|
||||
|
||||
endif #HYPERVISOR_GUEST
|
||||
|
||||
source "arch/x86/Kconfig.cpu"
|
||||
|
||||
@@ -26,6 +26,7 @@
|
||||
#include "bitops.h"
|
||||
#include "ctype.h"
|
||||
#include "cpuflags.h"
|
||||
#include "io.h"
|
||||
|
||||
/* Useful macros */
|
||||
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
|
||||
@@ -35,44 +36,10 @@ extern struct boot_params boot_params;
|
||||
|
||||
#define cpu_relax() asm volatile("rep; nop")
|
||||
|
||||
/* Basic port I/O */
|
||||
static inline void outb(u8 v, u16 port)
|
||||
{
|
||||
asm volatile("outb %0,%1" : : "a" (v), "dN" (port));
|
||||
}
|
||||
static inline u8 inb(u16 port)
|
||||
{
|
||||
u8 v;
|
||||
asm volatile("inb %1,%0" : "=a" (v) : "dN" (port));
|
||||
return v;
|
||||
}
|
||||
|
||||
static inline void outw(u16 v, u16 port)
|
||||
{
|
||||
asm volatile("outw %0,%1" : : "a" (v), "dN" (port));
|
||||
}
|
||||
static inline u16 inw(u16 port)
|
||||
{
|
||||
u16 v;
|
||||
asm volatile("inw %1,%0" : "=a" (v) : "dN" (port));
|
||||
return v;
|
||||
}
|
||||
|
||||
static inline void outl(u32 v, u16 port)
|
||||
{
|
||||
asm volatile("outl %0,%1" : : "a" (v), "dN" (port));
|
||||
}
|
||||
static inline u32 inl(u16 port)
|
||||
{
|
||||
u32 v;
|
||||
asm volatile("inl %1,%0" : "=a" (v) : "dN" (port));
|
||||
return v;
|
||||
}
|
||||
|
||||
static inline void io_delay(void)
|
||||
{
|
||||
const u16 DELAY_PORT = 0x80;
|
||||
asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT));
|
||||
outb(0, DELAY_PORT);
|
||||
}
|
||||
|
||||
/* These functions are used to reference data in other segments. */
|
||||
|
||||
@@ -101,6 +101,7 @@ ifdef CONFIG_X86_64
|
||||
endif
|
||||
|
||||
vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
|
||||
vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
|
||||
|
||||
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
|
||||
vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
|
||||
|
||||
@@ -289,7 +289,7 @@ SYM_FUNC_START(startup_32)
|
||||
pushl %eax
|
||||
|
||||
/* Enter paged protected Mode, activating Long Mode */
|
||||
movl $(X86_CR0_PG | X86_CR0_PE), %eax /* Enable Paging and Protected mode */
|
||||
movl $CR0_STATE, %eax
|
||||
movl %eax, %cr0
|
||||
|
||||
/* Jump from 32bit compatibility mode into 64bit mode. */
|
||||
@@ -649,12 +649,28 @@ SYM_CODE_START(trampoline_32bit_src)
|
||||
movl $MSR_EFER, %ecx
|
||||
rdmsr
|
||||
btsl $_EFER_LME, %eax
|
||||
/* Avoid writing EFER if no change was made (for TDX guest) */
|
||||
jc 1f
|
||||
wrmsr
|
||||
popl %edx
|
||||
1: popl %edx
|
||||
popl %ecx
|
||||
|
||||
#ifdef CONFIG_X86_MCE
|
||||
/*
|
||||
* Preserve CR4.MCE if the kernel will enable #MC support.
|
||||
* Clearing MCE may fault in some environments (that also force #MC
|
||||
* support). Any machine check that occurs before #MC support is fully
|
||||
* configured will crash the system regardless of the CR4.MCE value set
|
||||
* here.
|
||||
*/
|
||||
movl %cr4, %eax
|
||||
andl $X86_CR4_MCE, %eax
|
||||
#else
|
||||
movl $0, %eax
|
||||
#endif
|
||||
|
||||
/* Enable PAE and LA57 (if required) paging modes */
|
||||
movl $X86_CR4_PAE, %eax
|
||||
orl $X86_CR4_PAE, %eax
|
||||
testl %edx, %edx
|
||||
jz 1f
|
||||
orl $X86_CR4_LA57, %eax
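
The two changes in this hunk can be restated in C. This is a sketch of the added assembly's logic under stated assumptions, not code from the kernel: the function names and ``EXAMPLE_`` constants are hypothetical, although the bit positions used (EFER.LME bit 8, CR4.PAE bit 5, CR4.MCE bit 6, CR4.LA57 bit 12) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_EFER_LME	(1u << 8)	/* EFER.LME */
#define EXAMPLE_CR4_PAE		(1u << 5)
#define EXAMPLE_CR4_MCE		(1u << 6)
#define EXAMPLE_CR4_LA57	(1u << 12)

/*
 * Model of "btsl $_EFER_LME, %eax; jc 1f; wrmsr": set LME in the local
 * copy and report whether the MSR actually has to be written back.
 */
static bool efer_needs_write(uint32_t *efer_lo)
{
	bool already_set = (*efer_lo & EXAMPLE_EFER_LME) != 0;

	*efer_lo |= EXAMPLE_EFER_LME;
	return !already_set;
}

/*
 * Model of the CR4 value built by the trampoline: keep only the current
 * MCE bit when #MC support is configured, then add PAE and, if
 * requested, LA57.
 */
static uint32_t trampoline_cr4(uint32_t cur_cr4, bool mce_support, bool la57)
{
	uint32_t cr4 = mce_support ? (cur_cr4 & EXAMPLE_CR4_MCE) : 0;

	cr4 |= EXAMPLE_CR4_PAE;
	if (la57)
		cr4 |= EXAMPLE_CR4_LA57;
	return cr4;
}
```

When LME is already set, efer_needs_write() returns false and the WRMSR is skipped, which is the behavior the "Avoid writing EFER if no change was made" comment describes.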
|
||||
@@ -668,8 +684,9 @@ SYM_CODE_START(trampoline_32bit_src)
|
||||
pushl $__KERNEL_CS
|
||||
pushl %eax
|
||||
|
||||
/* Enable paging again */
|
||||
movl $(X86_CR0_PG | X86_CR0_PE), %eax
|
||||
/* Enable paging again. */
|
||||
movl %cr0, %eax
|
||||
btsl $X86_CR0_PG_BIT, %eax
|
||||
movl %eax, %cr0
|
||||
|
||||
lret
|
||||
|
||||
@@ -48,6 +48,8 @@ void *memmove(void *dest, const void *src, size_t n);
|
||||
*/
|
||||
struct boot_params *boot_params;
|
||||
|
||||
struct port_io_ops pio_ops;
|
||||
|
||||
memptr free_mem_ptr;
|
||||
memptr free_mem_end_ptr;
|
||||
|
||||
@@ -374,6 +376,16 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
|
||||
lines = boot_params->screen_info.orig_video_lines;
|
||||
cols = boot_params->screen_info.orig_video_cols;
|
||||
|
||||
init_default_io_ops();
|
||||
|
||||
/*
|
||||
* Detect TDX guest environment.
|
||||
*
|
||||
* It has to be done before console_init() in order to use
|
||||
* paravirtualized port I/O operations if needed.
|
||||
*/
|
||||
early_tdx_detect();
|
||||
|
||||
console_init();
|
||||
|
||||
/*
|
||||
|
||||
@@ -22,17 +22,19 @@
|
||||
#include <linux/linkage.h>
|
||||
#include <linux/screen_info.h>
|
||||
#include <linux/elf.h>
|
||||
#include <linux/io.h>
|
||||
#include <asm/page.h>
|
||||
#include <asm/boot.h>
|
||||
#include <asm/bootparam.h>
|
||||
#include <asm/desc_defs.h>
|
||||
|
||||
#include "tdx.h"
|
||||
|
||||
#define BOOT_CTYPE_H
|
||||
#include <linux/acpi.h>
|
||||
|
||||
#define BOOT_BOOT_H
|
||||
#include "../ctype.h"
|
||||
#include "../io.h"
|
||||
|
||||
#include "efi.h"
|
||||
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
|
||||
|
||||
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
|
||||
#define TRAMPOLINE_32BIT_CODE_SIZE 0x70
|
||||
#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
|
||||
|
||||
#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
|
||||
|
||||
|
||||
3
arch/x86/boot/compressed/tdcall.S
Normal file
@@ -0,0 +1,3 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
|
||||
#include "../../coco/tdx/tdcall.S"
|
||||
77
arch/x86/boot/compressed/tdx.c
Normal file
@@ -0,0 +1,77 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
#include "../cpuflags.h"
|
||||
#include "../string.h"
|
||||
#include "../io.h"
|
||||
#include "error.h"
|
||||
|
||||
#include <vdso/limits.h>
|
||||
#include <uapi/asm/vmx.h>
|
||||
|
||||
#include <asm/shared/tdx.h>
|
||||
|
||||
/* Called from __tdx_hypercall() for unrecoverable failure */
|
||||
void __tdx_hypercall_failed(void)
|
||||
{
|
||||
error("TDVMCALL failed. TDX module bug?");
|
||||
}
|
||||
|
||||
static inline unsigned int tdx_io_in(int size, u16 port)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = EXIT_REASON_IO_INSTRUCTION,
|
||||
.r12 = size,
|
||||
.r13 = 0,
|
||||
.r14 = port,
|
||||
};
|
||||
|
||||
if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
|
||||
return UINT_MAX;
|
||||
|
||||
return args.r11;
|
||||
}
|
||||
|
||||
static inline void tdx_io_out(int size, u16 port, u32 value)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = EXIT_REASON_IO_INSTRUCTION,
|
||||
.r12 = size,
|
||||
.r13 = 1,
|
||||
.r14 = port,
|
||||
.r15 = value,
|
||||
};
|
||||
|
||||
__tdx_hypercall(&args, 0);
|
||||
}
|
||||
|
||||
static inline u8 tdx_inb(u16 port)
|
||||
{
|
||||
return tdx_io_in(1, port);
|
||||
}
|
||||
|
||||
static inline void tdx_outb(u8 value, u16 port)
|
||||
{
|
||||
tdx_io_out(1, port, value);
|
||||
}
|
||||
|
||||
static inline void tdx_outw(u16 value, u16 port)
|
||||
{
|
||||
tdx_io_out(2, port, value);
|
||||
}
|
||||
|
||||
void early_tdx_detect(void)
|
||||
{
|
||||
u32 eax, sig[3];
|
||||
|
||||
cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
|
||||
|
||||
if (memcmp(TDX_IDENT, sig, sizeof(sig)))
|
||||
return;
|
||||
|
||||
/* Use hypercalls instead of I/O instructions */
|
||||
pio_ops.f_inb = tdx_inb;
|
||||
pio_ops.f_outb = tdx_outb;
|
||||
pio_ops.f_outw = tdx_outw;
|
||||
}
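
The register order in the cpuid_count() call of early_tdx_detect() is easy to misread, so here is a small stand-alone sketch of the signature check. The 12-byte string "IntelTDX    " matches TDX_IDENT in asm/shared/tdx.h. ``tdx_sig_matches()`` and ``pack4()`` are hypothetical helpers, and the register values are constructed by hand rather than read from a CPU.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_TDX_IDENT	"IntelTDX    "	/* 12 bytes, NUL not compared */

/*
 * Hypothetical helper: rebuild the signature from the CPUID output
 * registers in the order EBX, EDX, ECX and compare it with the
 * expected string, as early_tdx_detect() does with sig[0..2].
 */
static bool tdx_sig_matches(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	uint32_t sig[3] = { ebx, edx, ecx };

	return memcmp(EXAMPLE_TDX_IDENT, sig, sizeof(sig)) == 0;
}

/* Pack four characters into a register value in memory (native) order. */
static uint32_t pack4(const char *s)
{
	uint32_t v;

	memcpy(&v, s, sizeof(v));
	return v;
}
```

With EBX holding "Inte", EDX holding "lTDX" and ECX holding four spaces, tdx_sig_matches() returns true. Swapping the ECX and EDX values makes the comparison fail, which is why the call passes &sig[2] for ECX and &sig[1] for EDX.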
|
||||
13
arch/x86/boot/compressed/tdx.h
Normal file
@@ -0,0 +1,13 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef BOOT_COMPRESSED_TDX_H
|
||||
#define BOOT_COMPRESSED_TDX_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
void early_tdx_detect(void);
|
||||
#else
|
||||
static inline void early_tdx_detect(void) { };
|
||||
#endif
|
||||
|
||||
#endif /* BOOT_COMPRESSED_TDX_H */
|
||||
@@ -71,8 +71,7 @@ int has_eflag(unsigned long mask)
|
||||
# define EBX_REG "=b"
|
||||
#endif
|
||||
|
||||
static inline void cpuid_count(u32 id, u32 count,
|
||||
u32 *a, u32 *b, u32 *c, u32 *d)
|
||||
void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d)
|
||||
{
|
||||
asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t"
|
||||
"cpuid \n\t"
|
||||
|
||||
@@ -17,5 +17,6 @@ extern u32 cpu_vendor[3];
|
||||
|
||||
int has_eflag(unsigned long mask);
|
||||
void get_cpuflags(void);
|
||||
void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d);
|
||||
|
||||
#endif
|
||||
|
||||
41
arch/x86/boot/io.h
Normal file
@@ -0,0 +1,41 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef BOOT_IO_H
|
||||
#define BOOT_IO_H
|
||||
|
||||
#include <asm/shared/io.h>
|
||||
|
||||
#undef inb
|
||||
#undef inw
|
||||
#undef inl
|
||||
#undef outb
|
||||
#undef outw
|
||||
#undef outl
|
||||
|
||||
struct port_io_ops {
|
||||
u8 (*f_inb)(u16 port);
|
||||
void (*f_outb)(u8 v, u16 port);
|
||||
void (*f_outw)(u16 v, u16 port);
|
||||
};
|
||||
|
||||
extern struct port_io_ops pio_ops;
|
||||
|
||||
/*
|
||||
* Use the normal I/O instructions by default.
|
||||
* TDX guests override these to use hypercalls.
|
||||
*/
|
||||
static inline void init_default_io_ops(void)
|
||||
{
|
||||
pio_ops.f_inb = __inb;
|
||||
pio_ops.f_outb = __outb;
|
||||
pio_ops.f_outw = __outw;
|
||||
}
|
||||
|
||||
/*
|
||||
* Redirect port I/O operations via pio_ops callbacks.
|
||||
* TDX guests override these callbacks with TDX-specific helpers.
|
||||
*/
|
||||
#define inb pio_ops.f_inb
|
||||
#define outb pio_ops.f_outb
|
||||
#define outw pio_ops.f_outw
|
||||
|
||||
#endif
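
The indirection introduced by this header is a plain function-pointer table. The following user-space sketch shows the pattern with stand-in back ends. ``fake_hw_inb()``, ``fake_tdx_inb()`` and all ``example_`` names are hypothetical, and no real port I/O or hypercall is performed.

```c
#include <stdint.h>

struct example_port_io_ops {
	uint8_t (*f_inb)(uint16_t port);
};

static struct example_port_io_ops example_pio_ops;

/* Stand-in for the IN instruction path. */
static uint8_t fake_hw_inb(uint16_t port)
{
	(void)port;
	return 0x11;
}

/* Stand-in for the hypercall-based path a TDX guest would install. */
static uint8_t fake_tdx_inb(uint16_t port)
{
	(void)port;
	return 0x22;
}

/* Default: use the "hardware" back end, like init_default_io_ops(). */
static void example_init_default_io_ops(void)
{
	example_pio_ops.f_inb = fake_hw_inb;
}

/* Override after detection, like early_tdx_detect(). */
static void example_use_tdx_io_ops(void)
{
	example_pio_ops.f_inb = fake_tdx_inb;
}

/* Callers keep using one name; the table decides what runs. */
#define example_inb example_pio_ops.f_inb
```

Because callers only ever see the macro name, existing code that calls inb() or outb() needs no changes. The cost is that the table must be initialized before the first I/O call, which is why init_default_io_ops() runs at the very start of main() and extract_kernel().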
|
||||
@@ -17,6 +17,8 @@
|
||||
|
||||
struct boot_params boot_params __attribute__((aligned(16)));
|
||||
|
||||
struct port_io_ops pio_ops;
|
||||
|
||||
char *HEAP = _end;
|
||||
char *heap_end = _end; /* Default end of heap = no heap */
|
||||
|
||||
@@ -133,6 +135,8 @@ static void init_heap(void)
|
||||
|
||||
void main(void)
|
||||
{
|
||||
init_default_io_ops();
|
||||
|
||||
/* First, copy the boot header into the "zeropage" */
|
||||
copy_boot_params();
|
||||
|
||||
|
||||
@@ -4,3 +4,5 @@ KASAN_SANITIZE_core.o := n
|
||||
CFLAGS_core.o += -fno-stack-protector
|
||||
|
||||
obj-y += core.o
|
||||
|
||||
obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
|
||||
|
||||
@@ -18,7 +18,15 @@ static u64 cc_mask __ro_after_init;
|
||||
|
||||
static bool intel_cc_platform_has(enum cc_attr attr)
|
||||
{
|
||||
return false;
|
||||
switch (attr) {
|
||||
case CC_ATTR_GUEST_UNROLL_STRING_IO:
|
||||
case CC_ATTR_HOTPLUG_DISABLED:
|
||||
case CC_ATTR_GUEST_MEM_ENCRYPT:
|
||||
case CC_ATTR_MEM_ENCRYPT:
|
||||
return true;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -90,9 +98,18 @@ EXPORT_SYMBOL_GPL(cc_platform_has);
|
||||
|
||||
u64 cc_mkenc(u64 val)
|
||||
{
|
||||
/*
|
||||
* Both AMD and Intel use a bit in the page table to indicate
|
||||
* encryption status of the page.
|
||||
*
|
||||
* - for AMD, bit *set* means the page is encrypted
|
||||
* - for Intel *clear* means encrypted.
|
||||
*/
|
||||
switch (vendor) {
|
||||
case CC_VENDOR_AMD:
|
||||
return val | cc_mask;
|
||||
case CC_VENDOR_INTEL:
|
||||
return val & ~cc_mask;
|
||||
default:
|
||||
return val;
|
||||
}
|
||||
@@ -100,9 +117,12 @@ u64 cc_mkenc(u64 val)
|
||||
|
||||
u64 cc_mkdec(u64 val)
|
||||
{
|
||||
/* See comment in cc_mkenc() */
|
||||
switch (vendor) {
|
||||
case CC_VENDOR_AMD:
|
||||
return val & ~cc_mask;
|
||||
case CC_VENDOR_INTEL:
|
||||
return val | cc_mask;
|
||||
default:
|
||||
return val;
|
||||
}
|
||||
|
||||
3
arch/x86/coco/tdx/Makefile
Normal file
@@ -0,0 +1,3 @@
|
||||
# SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
obj-y += tdx.o tdcall.o
|
||||
205
arch/x86/coco/tdx/tdcall.S
Normal file
@@ -0,0 +1,205 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#include <asm/asm-offsets.h>
|
||||
#include <asm/asm.h>
|
||||
#include <asm/frame.h>
|
||||
#include <asm/unwind_hints.h>
|
||||
|
||||
#include <linux/linkage.h>
|
||||
#include <linux/bits.h>
|
||||
#include <linux/errno.h>
|
||||
|
||||
#include "../../virt/vmx/tdx/tdxcall.S"
|
||||
|
||||
/*
|
||||
* Bitmasks of exposed registers (with VMM).
|
||||
*/
|
||||
#define TDX_R10 BIT(10)
|
||||
#define TDX_R11 BIT(11)
|
||||
#define TDX_R12 BIT(12)
|
||||
#define TDX_R13 BIT(13)
|
||||
#define TDX_R14 BIT(14)
|
||||
#define TDX_R15 BIT(15)
|
||||
|
||||
/*
|
||||
* These registers are clobbered to hold arguments for each
|
||||
* TDVMCALL. They are safe to expose to the VMM.
|
||||
* Each bit in this mask represents a register ID. Bit field
|
||||
* details can be found in TDX GHCI specification, section
|
||||
* titled "TDCALL [TDG.VP.VMCALL] leaf".
|
||||
*/
|
||||
#define TDVMCALL_EXPOSE_REGS_MASK ( TDX_R10 | TDX_R11 | \
|
||||
TDX_R12 | TDX_R13 | \
|
||||
TDX_R14 | TDX_R15 )
|
||||
|
||||
/*
|
||||
* __tdx_module_call() - Used by TDX guests to request services from
|
||||
* the TDX module (does not include VMM services) using TDCALL instruction.
|
||||
*
|
||||
* Transforms function call register arguments into the TDCALL register ABI.
|
||||
* After TDCALL operation, TDX module output is saved in @out (if it is
|
||||
* provided by the user).
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
* TDCALL ABI:
|
||||
*-------------------------------------------------------------------------
|
||||
* Input Registers:
|
||||
*
|
||||
* RAX - TDCALL Leaf number.
|
||||
* RCX,RDX,R8-R9 - TDCALL Leaf specific input registers.
|
||||
*
|
||||
* Output Registers:
|
||||
*
|
||||
* RAX - TDCALL instruction error code.
|
||||
* RCX,RDX,R8-R11 - TDCALL Leaf specific output registers.
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
*
|
||||
* __tdx_module_call() function ABI:
|
||||
*
|
||||
* @fn (RDI) - TDCALL Leaf ID, moved to RAX
|
||||
* @rcx (RSI) - Input parameter 1, moved to RCX
|
||||
* @rdx (RDX) - Input parameter 2, moved to RDX
|
||||
* @r8 (RCX) - Input parameter 3, moved to R8
|
||||
* @r9 (R8) - Input parameter 4, moved to R9
|
||||
*
|
||||
* @out (R9) - struct tdx_module_output pointer
|
||||
* stored temporarily in R12 (not
|
||||
* shared with the TDX module). It
|
||||
* can be NULL.
|
||||
*
|
||||
* Return status of TDCALL via RAX.
|
||||
*/
|
||||
SYM_FUNC_START(__tdx_module_call)
|
||||
FRAME_BEGIN
|
||||
TDX_MODULE_CALL host=0
|
||||
FRAME_END
|
||||
RET
|
||||
SYM_FUNC_END(__tdx_module_call)
|
||||
|
||||
/*
|
||||
* __tdx_hypercall() - Make hypercalls to a TDX VMM using TDVMCALL leaf
|
||||
* of TDCALL instruction
|
||||
*
|
||||
* Transforms values in function call argument struct tdx_hypercall_args @args
|
||||
* into the TDCALL register ABI. After TDCALL operation, VMM output is saved
|
||||
* back in @args.
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
* TD VMCALL ABI:
|
||||
*-------------------------------------------------------------------------
|
||||
*
|
||||
* Input Registers:
|
||||
*
|
||||
* RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
|
||||
* RCX - BITMAP which controls which part of TD Guest GPR
|
||||
* is passed as-is to the VMM and back.
|
||||
* R10 - Set 0 to indicate TDCALL follows standard TDX ABI
|
||||
* specification. Non zero value indicates vendor
|
||||
* specific ABI.
|
||||
* R11 - VMCALL sub function number
|
||||
* RBX, RBP, RDI, RSI - Used to pass VMCALL sub function specific arguments.
|
||||
* R8-R9, R12-R15 - Same as above.
|
||||
*
|
||||
* Output Registers:
|
||||
*
|
||||
* RAX - TDCALL instruction status (Not related to hypercall
|
||||
* output).
|
||||
* R10 - Hypercall output error code.
|
||||
* R11-R15 - Hypercall sub function specific output values.
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
*
|
||||
* __tdx_hypercall() function ABI:
|
||||
*
|
||||
* @args (RDI) - struct tdx_hypercall_args for input and output
|
||||
* @flags (RSI) - TDX_HCALL_* flags
|
||||
*
|
||||
* On successful completion, return the hypercall error code.
|
||||
*/
|
||||
SYM_FUNC_START(__tdx_hypercall)
|
||||
FRAME_BEGIN
|
||||
|
||||
/* Save callee-saved GPRs as mandated by the x86_64 ABI */
|
||||
push %r15
|
||||
push %r14
|
||||
push %r13
|
||||
push %r12
|
||||
|
||||
/* Mangle function call ABI into TDCALL ABI: */
|
||||
/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
|
||||
xor %eax, %eax
|
||||
|
||||
/* Copy hypercall registers from arg struct: */
|
||||
movq TDX_HYPERCALL_r10(%rdi), %r10
|
||||
movq TDX_HYPERCALL_r11(%rdi), %r11
|
||||
movq TDX_HYPERCALL_r12(%rdi), %r12
|
||||
movq TDX_HYPERCALL_r13(%rdi), %r13
|
||||
movq TDX_HYPERCALL_r14(%rdi), %r14
|
||||
movq TDX_HYPERCALL_r15(%rdi), %r15
|
||||
|
||||
movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
|
||||
|
||||
/*
|
||||
* For the idle loop STI needs to be called directly before the TDCALL
|
||||
* that enters idle (EXIT_REASON_HLT case). STI instruction enables
|
||||
* interrupts only one instruction later. If there is a window between
|
||||
* STI and the instruction that emulates the HALT state, there is a
|
||||
* chance for interrupts to happen in this window, which can delay the
|
||||
* HLT operation indefinitely. Since this is not the desired
|
||||
* result, conditionally call STI before TDCALL.
|
||||
*/
|
||||
testq $TDX_HCALL_ISSUE_STI, %rsi
|
||||
jz .Lskip_sti
|
||||
sti
|
||||
.Lskip_sti:
|
||||
tdcall
|
||||
|
||||
/*
|
||||
* RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
|
||||
* something has gone horribly wrong with the TDX module.
|
||||
*
|
||||
* The return status of the hypercall operation is in a separate
|
||||
* register (in R10). Hypercall errors are a part of normal operation
|
||||
* and are handled by callers.
|
||||
*/
|
||||
testq %rax, %rax
|
||||
jne .Lpanic
|
||||
|
||||
/* TDVMCALL leaf return code is in R10 */
|
||||
movq %r10, %rax
|
||||
|
||||
/* Copy hypercall result registers to arg struct if needed */
|
||||
testq $TDX_HCALL_HAS_OUTPUT, %rsi
|
||||
jz .Lout
|
||||
|
||||
movq %r10, TDX_HYPERCALL_r10(%rdi)
|
||||
movq %r11, TDX_HYPERCALL_r11(%rdi)
|
||||
movq %r12, TDX_HYPERCALL_r12(%rdi)
|
||||
movq %r13, TDX_HYPERCALL_r13(%rdi)
|
||||
movq %r14, TDX_HYPERCALL_r14(%rdi)
|
||||
movq %r15, TDX_HYPERCALL_r15(%rdi)
|
||||
.Lout:
|
||||
/*
|
||||
* Zero out registers exposed to the VMM to avoid speculative execution
|
||||
* with VMM-controlled values. This needs to include all registers
|
||||
* present in TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). R12-R15
|
||||
* context will be restored.
|
||||
*/
|
||||
xor %r10d, %r10d
|
||||
xor %r11d, %r11d
|
||||
|
||||
/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
|
||||
pop %r12
|
||||
pop %r13
|
||||
pop %r14
|
||||
pop %r15
|
||||
|
||||
FRAME_END
|
||||
|
||||
RET
|
||||
.Lpanic:
|
||||
call __tdx_hypercall_failed
|
||||
/* __tdx_hypercall_failed never returns */
|
||||
REACHABLE
|
||||
jmp .Lpanic
|
||||
SYM_FUNC_END(__tdx_hypercall)
|
||||
692
arch/x86/coco/tdx/tdx.c
Normal file
@@ -0,0 +1,692 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* Copyright (C) 2021-2022 Intel Corporation */
|
||||
|
||||
#undef pr_fmt
|
||||
#define pr_fmt(fmt) "tdx: " fmt
|
||||
|
||||
#include <linux/cpufeature.h>
|
||||
#include <asm/coco.h>
|
||||
#include <asm/tdx.h>
|
||||
#include <asm/vmx.h>
|
||||
#include <asm/insn.h>
|
||||
#include <asm/insn-eval.h>
|
||||
#include <asm/pgtable.h>
|
||||
|
||||
/* TDX module Call Leaf IDs */
|
||||
#define TDX_GET_INFO 1
|
||||
#define TDX_GET_VEINFO 3
|
||||
#define TDX_ACCEPT_PAGE 6
|
||||
|
||||
/* TDX hypercall Leaf IDs */
|
||||
#define TDVMCALL_MAP_GPA 0x10001
|
||||
|
||||
/* MMIO direction */
|
||||
#define EPT_READ 0
|
||||
#define EPT_WRITE 1
|
||||
|
||||
/* Port I/O direction */
|
||||
#define PORT_READ 0
|
||||
#define PORT_WRITE 1
|
||||
|
||||
/* See Exit Qualification for I/O Instructions in VMX documentation */
|
||||
#define VE_IS_IO_IN(e) ((e) & BIT(3))
|
||||
#define VE_GET_IO_SIZE(e) (((e) & GENMASK(2, 0)) + 1)
|
||||
#define VE_GET_PORT_NUM(e) ((e) >> 16)
|
||||
#define VE_IS_IO_STRING(e) ((e) & BIT(4))
|
||||
|
||||
/*
|
||||
* Wrapper for standard use of __tdx_hypercall with no output aside from
|
||||
* return code.
|
||||
*/
|
||||
static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = fn,
|
||||
.r12 = r12,
|
||||
.r13 = r13,
|
||||
.r14 = r14,
|
||||
.r15 = r15,
|
||||
};
|
||||
|
||||
return __tdx_hypercall(&args, 0);
|
||||
}
|
||||
|
||||
/* Called from __tdx_hypercall() for unrecoverable failure */
|
||||
void __tdx_hypercall_failed(void)
|
||||
{
|
||||
panic("TDVMCALL failed. TDX module bug?");
|
||||
}
|
||||
|
||||
/*
|
||||
* The TDG.VP.VMCALL-Instruction-execution sub-functions are defined
|
||||
* independently from but are currently matched 1:1 with VMX EXIT_REASONs.
|
||||
* Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
|
||||
* guest sides of these calls.
|
||||
*/
|
||||
static u64 hcall_func(u64 exit_reason)
|
||||
{
|
||||
return exit_reason;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_KVM_GUEST
|
||||
long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
|
||||
unsigned long p3, unsigned long p4)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = nr,
|
||||
.r11 = p1,
|
||||
.r12 = p2,
|
||||
.r13 = p3,
|
||||
.r14 = p4,
|
||||
};
|
||||
|
||||
return __tdx_hypercall(&args, 0);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(tdx_kvm_hypercall);
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Used for TDX guests to make calls directly to the TD module. This
|
||||
* should only be used for calls that have no legitimate reason to fail
|
||||
* or where the kernel cannot survive the call failing.
|
||||
*/
|
||||
static inline void tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
|
||||
struct tdx_module_output *out)
|
||||
{
|
||||
if (__tdx_module_call(fn, rcx, rdx, r8, r9, out))
|
||||
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
|
||||
}
|
||||
|
||||
static u64 get_cc_mask(void)
|
||||
{
|
||||
struct tdx_module_output out;
|
||||
unsigned int gpa_width;
|
||||
|
||||
/*
|
||||
* TDINFO TDX module call is used to get the TD execution environment
|
||||
* information like GPA width, number of available vcpus, debug mode
|
||||
* information, etc. More details about the ABI can be found in TDX
|
||||
* Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL
|
||||
* [TDG.VP.INFO].
|
||||
*
|
||||
* The GPA width that comes out of this call is critical. TDX guests
|
||||
* cannot meaningfully run without it.
|
||||
*/
|
||||
tdx_module_call(TDX_GET_INFO, 0, 0, 0, 0, &out);
|
||||
|
||||
gpa_width = out.rcx & GENMASK(5, 0);
|
||||
|
||||
/*
|
||||
* The highest bit of a guest physical address is the "sharing" bit.
|
||||
* Set it for shared pages and clear it for private pages.
|
||||
*/
|
||||
return BIT_ULL(gpa_width - 1);
|
||||
}
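The shared-bit arithmetic in get_cc_mask() can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: `shared_bit_mask()` and `trim_physical_mask()` are hypothetical names, and the RCX output of TDG.VP.INFO is passed in as a plain integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same arithmetic as get_cc_mask(). */
static uint64_t shared_bit_mask(uint64_t tdinfo_rcx)
{
    /* Bits 5:0 of RCX hold the GPA width, i.e. GENMASK(5, 0). */
    unsigned int gpa_width = tdinfo_rcx & 0x3f;

    assert(gpa_width >= 1);
    /* The highest bit of a guest physical address is the "sharing" bit. */
    return 1ULL << (gpa_width - 1);
}

/* Hypothetical helper: the physical_mask adjustment done in tdx_early_init(). */
static uint64_t trim_physical_mask(uint64_t physical_mask, uint64_t cc_mask)
{
    /* Keep only the bits below the shared bit. */
    return physical_mask & (cc_mask - 1);
}
```

With a GPA width of 52, the shared bit is bit 51, and the trimmed physical mask covers bits 50:0.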
|
||||
|
||||
static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_HLT),
|
||||
.r12 = irq_disabled,
|
||||
};
|
||||
|
||||
/*
|
||||
* Emulate HLT operation via hypercall. More info about ABI
|
||||
* can be found in TDX Guest-Host-Communication Interface
|
||||
* (GHCI), section 3.8 TDG.VP.VMCALL<Instruction.HLT>.
|
||||
*
|
||||
* The VMM uses the "IRQ disabled" param to understand IRQ
|
||||
* enabled status (RFLAGS.IF) of the TD guest and to determine
|
||||
* whether or not it should schedule the halted vCPU if an
|
||||
* IRQ becomes pending. E.g. if IRQs are disabled, the VMM
|
||||
* can keep the vCPU in virtual HLT, even if an IRQ is
|
||||
* pending, without hanging/breaking the guest.
|
||||
*/
|
||||
return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
|
||||
}
|
||||
|
||||
static bool handle_halt(void)
|
||||
{
|
||||
/*
|
||||
* Since non-safe halt is mainly used in CPU offlining
|
||||
* and the guest will always stay in the halt state, don't
|
||||
* call the STI instruction (set do_sti as false).
|
||||
*/
|
||||
const bool irq_disabled = irqs_disabled();
|
||||
const bool do_sti = false;
|
||||
|
||||
if (__halt(irq_disabled, do_sti))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
void __cpuidle tdx_safe_halt(void)
|
||||
{
|
||||
/*
|
||||
* For do_sti=true case, __tdx_hypercall() function enables
|
||||
* interrupts using the STI instruction before the TDCALL. So
|
||||
* set irq_disabled as false.
|
||||
*/
|
||||
const bool irq_disabled = false;
|
||||
const bool do_sti = true;
|
||||
|
||||
/*
|
||||
* Use WARN_ONCE() to report the failure.
|
||||
*/
|
||||
if (__halt(irq_disabled, do_sti))
|
||||
WARN_ONCE(1, "HLT instruction emulation failed\n");
|
||||
}
|
||||
|
||||
static bool read_msr(struct pt_regs *regs)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_MSR_READ),
|
||||
.r12 = regs->cx,
|
||||
};
|
||||
|
||||
/*
|
||||
* Emulate the MSR read via hypercall. More info about ABI
|
||||
* can be found in TDX Guest-Host-Communication Interface
|
||||
* (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
|
||||
*/
|
||||
if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
|
||||
return false;
|
||||
|
||||
regs->ax = lower_32_bits(args.r11);
|
||||
regs->dx = upper_32_bits(args.r11);
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool write_msr(struct pt_regs *regs)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_MSR_WRITE),
|
||||
.r12 = regs->cx,
|
||||
.r13 = (u64)regs->dx << 32 | regs->ax,
|
||||
};
|
||||
|
||||
/*
|
||||
* Emulate the MSR write via hypercall. More info about ABI
|
||||
* can be found in TDX Guest-Host-Communication Interface
|
||||
* (GHCI) section titled "TDG.VP.VMCALL<Instruction.WRMSR>".
|
||||
*/
|
||||
return !__tdx_hypercall(&args, 0);
|
||||
}
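read_msr() and write_msr() convert between the 64-bit value carried in a hypercall register and the EDX:EAX pair used by RDMSR/WRMSR. A minimal user-space sketch of that conversion, assuming 32-bit register halves; `msr_pack()` and `msr_unpack()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Combine EDX:EAX into one 64-bit value, as write_msr() does for R13. */
static uint64_t msr_pack(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit value back into EAX and EDX, as read_msr() does with R11. */
static void msr_unpack(uint64_t val, uint32_t *eax, uint32_t *edx)
{
    assert(eax && edx);
    *eax = (uint32_t)val;         /* lower_32_bits() */
    *edx = (uint32_t)(val >> 32); /* upper_32_bits() */
}
```

Taking the halves as `uint32_t` makes it explicit that only the low 32 bits of each register take part in the transfer.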
|
||||
|
||||
static bool handle_cpuid(struct pt_regs *regs)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_CPUID),
|
||||
.r12 = regs->ax,
|
||||
.r13 = regs->cx,
|
||||
};
|
||||
|
||||
/*
|
||||
* Only allow VMM to control range reserved for hypervisor
|
||||
* communication.
|
||||
*
|
||||
* Return all-zeros for any CPUID outside the range. This matches CPU
|
||||
* behaviour for an unsupported leaf.
|
||||
*/
|
||||
if (regs->ax < 0x40000000 || regs->ax > 0x4FFFFFFF) {
|
||||
regs->ax = regs->bx = regs->cx = regs->dx = 0;
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* Emulate the CPUID instruction via a hypercall. More info about
|
||||
* ABI can be found in TDX Guest-Host-Communication Interface
|
||||
* (GHCI), section titled "TDG.VP.VMCALL<Instruction.CPUID>".
|
||||
*/
|
||||
if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
|
||||
return false;
|
||||
|
||||
/*
|
||||
* As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
|
||||
* EAX, EBX, ECX, EDX registers after the CPUID instruction execution.
|
||||
* So copy the register contents back to pt_regs.
|
||||
*/
|
||||
regs->ax = args.r12;
|
||||
regs->bx = args.r13;
|
||||
regs->cx = args.r14;
|
||||
regs->dx = args.r15;
|
||||
|
||||
return true;
|
||||
}
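The leaf filter at the top of handle_cpuid() can be illustrated in isolation. This is a sketch under stated assumptions: `struct cpuid_regs` and `cpuid_filter()` are hypothetical, and the hypercall itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four CPUID output registers. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Returns true if the leaf lies in the range reserved for hypervisor
 * communication and would be forwarded to the VMM. Otherwise zero-fills
 * the outputs and returns false.
 */
static bool cpuid_filter(uint32_t leaf, struct cpuid_regs *out)
{
    assert(out);
    if (leaf < 0x40000000u || leaf > 0x4FFFFFFFu) {
        *out = (struct cpuid_regs){ 0, 0, 0, 0 };
        return false;
    }
    return true;
}
```

Only leaves 0x40000000 through 0x4FFFFFFF reach the VMM; everything else that raises a #VE is answered locally with zeros.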
|
||||
|
||||
static bool mmio_read(int size, unsigned long addr, unsigned long *val)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_EPT_VIOLATION),
|
||||
.r12 = size,
|
||||
.r13 = EPT_READ,
|
||||
.r14 = addr,
|
||||
.r15 = *val,
|
||||
};
|
||||
|
||||
if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
|
||||
return false;
|
||||
*val = args.r11;
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool mmio_write(int size, unsigned long addr, unsigned long val)
|
||||
{
|
||||
return !_tdx_hypercall(hcall_func(EXIT_REASON_EPT_VIOLATION), size,
|
||||
EPT_WRITE, addr, val);
|
||||
}
|
||||
|
||||
static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
|
||||
{
|
||||
char buffer[MAX_INSN_SIZE];
|
||||
unsigned long *reg, val;
|
||||
struct insn insn = {};
|
||||
enum mmio_type mmio;
|
||||
int size, extend_size;
|
||||
u8 extend_val = 0;
|
||||
|
||||
/* Only in-kernel MMIO is supported */
|
||||
if (WARN_ON_ONCE(user_mode(regs)))
|
||||
return false;
|
||||
|
||||
if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
|
||||
return false;
|
||||
|
||||
if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
|
||||
return false;
|
||||
|
||||
mmio = insn_decode_mmio(&insn, &size);
|
||||
if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
|
||||
return false;
|
||||
|
||||
if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
|
||||
reg = insn_get_modrm_reg_ptr(&insn, regs);
|
||||
if (!reg)
|
||||
return false;
|
||||
}
|
||||
|
||||
ve->instr_len = insn.length;
|
||||
|
||||
/* Handle writes first */
|
||||
switch (mmio) {
|
||||
case MMIO_WRITE:
|
||||
memcpy(&val, reg, size);
|
||||
return mmio_write(size, ve->gpa, val);
|
||||
case MMIO_WRITE_IMM:
|
||||
val = insn.immediate.value;
|
||||
return mmio_write(size, ve->gpa, val);
|
||||
case MMIO_READ:
|
||||
case MMIO_READ_ZERO_EXTEND:
|
||||
case MMIO_READ_SIGN_EXTEND:
|
||||
/* Reads are handled below */
|
||||
break;
|
||||
case MMIO_MOVS:
|
||||
case MMIO_DECODE_FAILED:
|
||||
/*
|
||||
* MMIO was accessed with an instruction that could not be
|
||||
* decoded or handled properly. It was likely not using io.h
|
||||
* helpers or accessed MMIO accidentally.
|
||||
*/
|
||||
return false;
|
||||
default:
|
||||
WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Handle reads */
|
||||
if (!mmio_read(size, ve->gpa, &val))
|
||||
return false;
|
||||
|
||||
switch (mmio) {
|
||||
case MMIO_READ:
|
||||
/* Zero-extend for 32-bit operation */
|
||||
extend_size = size == 4 ? sizeof(*reg) : 0;
|
||||
break;
|
||||
case MMIO_READ_ZERO_EXTEND:
|
||||
/* Zero extend based on operand size */
|
||||
extend_size = insn.opnd_bytes;
|
||||
break;
|
||||
case MMIO_READ_SIGN_EXTEND:
|
||||
/* Sign extend based on operand size */
|
||||
extend_size = insn.opnd_bytes;
|
||||
if (size == 1 && val & BIT(7))
|
||||
extend_val = 0xFF;
|
||||
else if (size > 1 && val & BIT(15))
|
||||
extend_val = 0xFF;
|
||||
break;
|
||||
default:
|
||||
/* All other cases have to be covered by the first switch() */
|
||||
WARN_ON_ONCE(1);
|
||||
return false;
|
||||
}
|
||||
|
||||
if (extend_size)
|
||||
memset(reg, extend_val, extend_size);
|
||||
memcpy(reg, &val, size);
|
||||
return true;
|
||||
}
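The final memset()/memcpy() pair in handle_mmio() first fills `extend_size` bytes of the destination register with the extension byte and then overwrites the low `size` bytes with the value read. A minimal sketch of the same effect using masks instead of byte copies, so it does not depend on host byte order; `low_bytes_mask()` and `mmio_merge()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low nbytes bytes of a 64-bit register. */
static uint64_t low_bytes_mask(int nbytes)
{
    assert(nbytes >= 0 && nbytes <= 8);
    if (nbytes == 8)
        return UINT64_MAX;
    return (1ULL << (8 * nbytes)) - 1;
}

/*
 * reg:         old register contents
 * val:         value returned by the MMIO read
 * size:        access size in bytes
 * extend_size: number of bytes to pre-fill (0 means none)
 * sign_fill:   pre-fill with 0xFF bytes (true) or 0x00 bytes (false)
 */
static uint64_t mmio_merge(uint64_t reg, uint64_t val, int size,
                           int extend_size, bool sign_fill)
{
    uint64_t ext = low_bytes_mask(extend_size);
    uint64_t low = low_bytes_mask(size);

    /* Equivalent of memset(reg, extend_val, extend_size). */
    reg = (reg & ~ext) | (sign_fill ? ext : 0);
    /* Equivalent of memcpy(reg, &val, size). */
    return (reg & ~low) | (val & low);
}
```

For example, a sign-extending one-byte read of 0x80 into a 4-byte operand fills the low four bytes with 0xFF and then stores 0x80 in the lowest byte.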
|
||||
|
||||
static bool handle_in(struct pt_regs *regs, int size, int port)
|
||||
{
|
||||
struct tdx_hypercall_args args = {
|
||||
.r10 = TDX_HYPERCALL_STANDARD,
|
||||
.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
|
||||
.r12 = size,
|
||||
.r13 = PORT_READ,
|
||||
.r14 = port,
|
||||
};
|
||||
u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
|
||||
bool success;
|
||||
|
||||
/*
|
||||
* Emulate the I/O read via hypercall. More info about ABI can be found
|
||||
* in TDX Guest-Host-Communication Interface (GHCI) section titled
|
||||
* "TDG.VP.VMCALL<Instruction.IO>".
|
||||
*/
|
||||
success = !__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
|
||||
|
||||
/* Update part of the register affected by the emulated instruction */
|
||||
regs->ax &= ~mask;
|
||||
if (success)
|
||||
regs->ax |= args.r11 & mask;
|
||||
|
||||
return success;
|
||||
}
|
||||
|
||||
static bool handle_out(struct pt_regs *regs, int size, int port)
|
||||
{
|
||||
u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
|
||||
|
||||
/*
|
||||
* Emulate the I/O write via hypercall. More info about ABI can be found
|
||||
* in TDX Guest-Host-Communication Interface (GHCI) section titled
|
||||
* "TDG.VP.VMCALL<Instruction.IO>".
|
||||
*/
|
||||
return !_tdx_hypercall(hcall_func(EXIT_REASON_IO_INSTRUCTION), size,
|
||||
PORT_WRITE, port, regs->ax & mask);
|
||||
}
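handle_in() only updates the part of RAX that the emulated instruction would have written. A minimal user-space sketch of that merge, assuming a mask that covers exactly `size` bytes (GENMASK(h, l) includes bit h, so such a mask is GENMASK(8 * size - 1, 0)); `port_io_mask()` and `merge_in_result()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering exactly the bytes touched by a 1-, 2- or 4-byte port access. */
static uint64_t port_io_mask(int size)
{
    assert(size == 1 || size == 2 || size == 4);
    return (1ULL << (8 * size)) - 1;
}

/*
 * Clear the affected part of RAX and, if the hypercall succeeded, fill it
 * with the value returned in R11. On failure the affected part stays zero.
 */
static uint64_t merge_in_result(uint64_t rax, uint64_t r11, int size,
                                bool success)
{
    uint64_t mask = port_io_mask(size);

    rax &= ~mask;
    if (success)
        rax |= r11 & mask;
    return rax;
}
```

For a one-byte read the mask is 0xff, so only AL changes and the rest of RAX is preserved.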
|
||||
|
||||
/*
|
||||
* Emulate I/O using hypercall.
|
||||
*
|
||||
* Assumes the IO instruction was using ax, which is enforced
|
||||
* by the standard io.h macros.
|
||||
*
|
||||
* Return true on success or false on failure.
|
||||
*/
|
||||
static bool handle_io(struct pt_regs *regs, u32 exit_qual)
|
||||
{
|
||||
int size, port;
|
||||
bool in;
|
||||
|
||||
if (VE_IS_IO_STRING(exit_qual))
|
||||
return false;
|
||||
|
||||
in = VE_IS_IO_IN(exit_qual);
|
||||
size = VE_GET_IO_SIZE(exit_qual);
|
||||
port = VE_GET_PORT_NUM(exit_qual);
|
||||
|
||||
|
||||
if (in)
|
||||
return handle_in(regs, size, port);
|
||||
else
|
||||
return handle_out(regs, size, port);
|
||||
}
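The VE_* macros used by handle_io() follow the VMX exit qualification layout for I/O instructions: bits 2:0 hold the access size minus one, bit 3 the direction, bit 4 the string flag and bits 31:16 the port number. A small sketch of the decoding, with the hypothetical names `struct io_exit` and `decode_io_exit_qual()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an I/O exit qualification. */
struct io_exit {
    bool in;       /* VE_IS_IO_IN():     bit 3 */
    bool string;   /* VE_IS_IO_STRING(): bit 4 */
    int size;      /* VE_GET_IO_SIZE():  bits 2:0, plus one */
    uint16_t port; /* VE_GET_PORT_NUM(): bits 31:16 */
};

static struct io_exit decode_io_exit_qual(uint32_t exit_qual)
{
    struct io_exit io = {
        .in     = (exit_qual >> 3) & 1,
        .string = (exit_qual >> 4) & 1,
        .size   = (int)(exit_qual & 0x7) + 1,
        .port   = (uint16_t)(exit_qual >> 16),
    };

    /* Only 1-, 2- and 4-byte accesses are valid encodings. */
    assert(io.size == 1 || io.size == 2 || io.size == 4);
    return io;
}
```

A one-byte `in` from port 0x3f8 would therefore arrive as `(0x3f8 << 16) | (1 << 3)`.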
|
||||
|
||||
/*
|
||||
* Early #VE exception handler. Only handles a subset of port I/O.
|
||||
* Intended only for earlyprintk. Returns false on failure.
|
||||
*/
|
||||
__init bool tdx_early_handle_ve(struct pt_regs *regs)
|
||||
{
|
||||
struct ve_info ve;
|
||||
|
||||
tdx_get_ve_info(&ve);
|
||||
|
||||
if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
|
||||
return false;
|
||||
|
||||
return handle_io(regs, ve.exit_qual);
|
||||
}
|
||||
|
||||
void tdx_get_ve_info(struct ve_info *ve)
|
||||
{
|
||||
struct tdx_module_output out;
|
||||
|
||||
/*
|
||||
* Called during #VE handling to retrieve the #VE info from the
|
||||
* TDX module.
|
||||
*
|
||||
* This has to be called early in #VE handling. A "nested" #VE which
|
||||
* occurs before this will raise a #DF and is not recoverable.
|
||||
*
|
||||
* The call retrieves the #VE info from the TDX module, which also
|
||||
* clears the "#VE valid" flag. This must be done before anything else
|
||||
* because any #VE that occurs while the valid flag is set will lead to
|
||||
* #DF.
|
||||
*
|
||||
* Note, the TDX module treats virtual NMIs as inhibited if the #VE
|
||||
* valid flag is set. It means that NMI=>#VE will not result in a #DF.
|
||||
*/
|
||||
tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
|
||||
|
||||
/* Transfer the output parameters */
|
||||
ve->exit_reason = out.rcx;
|
||||
ve->exit_qual = out.rdx;
|
||||
ve->gla = out.r8;
|
||||
ve->gpa = out.r9;
|
||||
ve->instr_len = lower_32_bits(out.r10);
|
||||
ve->instr_info = upper_32_bits(out.r10);
|
||||
}
|
||||
|
||||
/* Handle the user initiated #VE */
|
||||
static bool virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
|
||||
{
|
||||
switch (ve->exit_reason) {
|
||||
case EXIT_REASON_CPUID:
|
||||
return handle_cpuid(regs);
|
||||
default:
|
||||
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* Handle the kernel #VE */
|
||||
static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
|
||||
{
|
||||
switch (ve->exit_reason) {
|
||||
case EXIT_REASON_HLT:
|
||||
return handle_halt();
|
||||
case EXIT_REASON_MSR_READ:
|
||||
return read_msr(regs);
|
||||
case EXIT_REASON_MSR_WRITE:
|
||||
return write_msr(regs);
|
||||
case EXIT_REASON_CPUID:
|
||||
return handle_cpuid(regs);
|
||||
case EXIT_REASON_EPT_VIOLATION:
|
||||
return handle_mmio(regs, ve);
|
||||
case EXIT_REASON_IO_INSTRUCTION:
|
||||
return handle_io(regs, ve->exit_qual);
|
||||
default:
|
||||
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve)
|
||||
{
|
||||
bool ret;
|
||||
|
||||
if (user_mode(regs))
|
||||
ret = virt_exception_user(regs, ve);
|
||||
else
|
||||
ret = virt_exception_kernel(regs, ve);
|
||||
|
||||
/* After successful #VE handling, move the IP */
|
||||
if (ret)
|
||||
regs->ip += ve->instr_len;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static bool tdx_tlb_flush_required(bool private)
|
||||
{
|
||||
/*
|
||||
* TDX guest is responsible for flushing TLB on private->shared
|
||||
* transition. VMM is responsible for flushing on shared->private.
|
||||
*
|
||||
* The VMM _can't_ flush private addresses as it can't generate PAs
|
||||
* with the guest's HKID. Shared memory isn't subject to integrity
|
||||
* checking, i.e. the VMM doesn't need to flush for its own protection.
|
||||
*
|
||||
* There's no need to flush when converting from shared to private,
|
||||
* as flushing is the VMM's responsibility in this case, e.g. it must
|
||||
* flush to avoid integrity failures in the face of a buggy or
|
||||
* malicious guest.
|
||||
*/
|
||||
return !private;
|
||||
}
|
||||
|
||||
static bool tdx_cache_flush_required(void)
|
||||
{
|
||||
/*
|
||||
* AMD SME/SEV can avoid cache flushing if HW enforces cache coherence.
|
||||
* TDX doesn't have such capability.
|
||||
*
|
||||
* Flush cache unconditionally.
|
||||
*/
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool try_accept_one(phys_addr_t *start, unsigned long len,
|
||||
enum pg_level pg_level)
|
||||
{
|
||||
unsigned long accept_size = page_level_size(pg_level);
|
||||
u64 tdcall_rcx;
|
||||
u8 page_size;
|
||||
|
||||
if (!IS_ALIGNED(*start, accept_size))
|
||||
return false;
|
||||
|
||||
if (len < accept_size)
|
||||
return false;
|
||||
|
||||
/*
|
||||
* Pass the page physical address to the TDX module to accept the
|
||||
* pending, private page.
|
||||
*
|
||||
* Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
|
||||
*/
|
||||
switch (pg_level) {
|
||||
case PG_LEVEL_4K:
|
||||
page_size = 0;
|
||||
break;
|
||||
case PG_LEVEL_2M:
|
||||
page_size = 1;
|
||||
break;
|
||||
case PG_LEVEL_1G:
|
||||
page_size = 2;
|
||||
break;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
|
||||
tdcall_rcx = *start | page_size;
|
||||
if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
|
||||
return false;
|
||||
|
||||
*start += accept_size;
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* Inform the VMM of the guest's intent for this physical page: shared with
|
||||
* the VMM or private to the guest. The VMM is expected to change its mapping
|
||||
* of the page in response.
|
||||
*/
|
||||
static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
|
||||
{
|
||||
phys_addr_t start = __pa(vaddr);
|
||||
phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
|
||||
|
||||
if (!enc) {
|
||||
/* Set the shared (decrypted) bits: */
|
||||
start |= cc_mkdec(0);
|
||||
end |= cc_mkdec(0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Notify the VMM about page mapping conversion. More info about ABI
|
||||
* can be found in TDX Guest-Host-Communication Interface (GHCI),
|
||||
* section "TDG.VP.VMCALL<MapGPA>"
|
||||
*/
|
||||
if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
|
||||
return false;
|
||||
|
||||
/* private->shared conversion requires only MapGPA call */
|
||||
if (!enc)
|
||||
return true;
|
||||
|
||||
/*
|
||||
* For shared->private conversion, accept the page using
|
||||
* TDX_ACCEPT_PAGE TDX module call.
|
||||
*/
|
||||
while (start < end) {
|
||||
unsigned long len = end - start;
|
||||
|
||||
/*
|
||||
* Try larger accepts first. This gives the VMM a chance to keep
|
||||
* 1G/2M SEPT entries where possible and speeds up the process by
|
||||
* cutting the number of TDX module calls (if successful).
|
||||
*/
|
||||
|
||||
if (try_accept_one(&start, len, PG_LEVEL_1G))
|
||||
continue;
|
||||
|
||||
if (try_accept_one(&start, len, PG_LEVEL_2M))
|
||||
continue;
|
||||
|
||||
if (!try_accept_one(&start, len, PG_LEVEL_4K))
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
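The accept loop above tries the largest page size first and falls back to smaller ones when alignment or remaining length does not permit it. The following user-space sketch counts how many accepts of each size such a loop would issue. It assumes every accept succeeds (the real try_accept_one() can also fail in the TDX module call), and `try_accept()`, `plan_accepts()` and `struct accept_plan` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

struct accept_plan {
    unsigned long n1g, n2m, n4k;
};

/* Alignment and length checks of try_accept_one(), without the TDCALL. */
static bool try_accept(uint64_t *start, uint64_t len, uint64_t size)
{
    if (*start & (size - 1))
        return false;
    if (len < size)
        return false;
    *start += size;
    return true;
}

static struct accept_plan plan_accepts(uint64_t start, uint64_t end)
{
    struct accept_plan plan = { 0, 0, 0 };

    assert(start % SZ_4K == 0 && end % SZ_4K == 0 && start <= end);

    while (start < end) {
        uint64_t len = end - start;

        if (try_accept(&start, len, SZ_1G)) {
            plan.n1g++;
            continue;
        }
        if (try_accept(&start, len, SZ_2M)) {
            plan.n2m++;
            continue;
        }
        /* Always succeeds here: start is 4K aligned and len >= 4K. */
        if (try_accept(&start, len, SZ_4K))
            plan.n4k++;
    }
    return plan;
}
```

For the range 0x1ff000 to 0x401000 this yields one 4K accept to reach 2M alignment, one 2M accept, and a final 4K accept for the tail.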
|
||||
|
||||
void __init tdx_early_init(void)
|
||||
{
|
||||
u64 cc_mask;
|
||||
u32 eax, sig[3];
|
||||
|
||||
cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
|
||||
|
||||
if (memcmp(TDX_IDENT, sig, sizeof(sig)))
|
||||
return;
|
||||
|
||||
setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
|
||||
|
||||
cc_set_vendor(CC_VENDOR_INTEL);
|
||||
cc_mask = get_cc_mask();
|
||||
cc_set_mask(cc_mask);
|
||||
|
||||
/*
|
||||
* All bits above the GPA width are reserved and the kernel treats the
|
||||
* shared bit as a flag, not as part of the physical address.
|
||||
*
|
||||
* Adjust physical mask to only cover valid GPA bits.
|
||||
*/
|
||||
physical_mask &= cc_mask - 1;
|
||||
|
||||
x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
|
||||
x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
|
||||
x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
|
||||
|
||||
pr_info("Guest detected\n");
|
||||
}
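tdx_early_init() detects a TDX guest by comparing a 12-byte identifier against the EBX, EDX and ECX outputs of CPUID leaf TDX_CPUID_LEAF_ID, in that order. A user-space sketch of the comparison, assuming the identifier is "IntelTDX" padded with spaces to 12 bytes; `pack4()` and `is_tdx_signature()` are hypothetical helpers, and no CPUID instruction is executed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Pack the first four characters of s into a register-sized value. */
static uint32_t pack4(const char *s)
{
    uint32_t v;

    assert(strlen(s) >= 4);
    memcpy(&v, s, sizeof(v));
    return v;
}

/*
 * Same register order as the cpuid_count() call in tdx_early_init():
 * sig[0] = EBX, sig[1] = EDX, sig[2] = ECX.
 */
static bool is_tdx_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    uint32_t sig[3] = { ebx, edx, ecx };

    return memcmp("IntelTDX    ", sig, sizeof(sig)) == 0;
}
```

Note that the comparison spans all three registers, so the padding spaces are part of the signature.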
|
||||
@@ -13,7 +13,19 @@
|
||||
|
||||
/* Asm macros */
|
||||
|
||||
#define ACPI_FLUSH_CPU_CACHE() wbinvd()
|
||||
/*
|
||||
* ACPI_FLUSH_CPU_CACHE() flushes caches on entering sleep states.
|
||||
* It is required to prevent data loss.
|
||||
*
|
||||
* While running inside a virtual machine, the kernel can bypass cache flushing.
|
||||
* Changing sleep state in a virtual machine doesn't affect the host system
|
||||
* sleep state and cannot lead to data loss.
|
||||
*/
|
||||
#define ACPI_FLUSH_CPU_CACHE() \
|
||||
do { \
|
||||
if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) \
|
||||
wbinvd(); \
|
||||
} while (0)
|
||||
|
||||
int __acpi_acquire_global_lock(unsigned int *lock);
|
||||
int __acpi_release_global_lock(unsigned int *lock);
|
||||
|
||||
@@ -328,6 +328,8 @@ struct apic {
|
||||
|
||||
/* wakeup_secondary_cpu */
|
||||
int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
|
||||
/* wakeup secondary CPU using 64-bit wakeup point */
|
||||
int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip);
|
||||
|
||||
void (*inquire_remote_apic)(int apicid);
|
||||
|
||||
@@ -488,6 +490,11 @@ static inline unsigned int read_apic_id(void)
|
||||
return apic->get_apic_id(reg);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
typedef int (*wakeup_cpu_handler)(int apicid, unsigned long start_eip);
|
||||
extern void acpi_wake_cpu_handler_update(wakeup_cpu_handler handler);
|
||||
#endif
|
||||
|
||||
extern int default_apic_id_valid(u32 apicid);
|
||||
extern int default_acpi_madt_oem_check(char *, char *);
|
||||
extern void default_setup_apic_routing(void);
|
||||
|
||||
@@ -238,6 +238,7 @@
|
||||
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
|
||||
#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
|
||||
#define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
|
||||
#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Intel Trust Domain Extensions Guest */
|
||||
|
||||
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
|
||||
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
|
||||
|
||||
@@ -68,6 +68,12 @@
|
||||
# define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31))
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
# define DISABLE_TDX_GUEST 0
|
||||
#else
|
||||
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Make sure to add features to the correct mask
|
||||
*/
|
||||
@@ -79,7 +85,7 @@
|
||||
#define DISABLED_MASK5 0
|
||||
#define DISABLED_MASK6 0
|
||||
#define DISABLED_MASK7 (DISABLE_PTI)
|
||||
#define DISABLED_MASK8 0
|
||||
#define DISABLED_MASK8 (DISABLE_TDX_GUEST)
|
||||
#define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX)
|
||||
#define DISABLED_MASK10 0
|
||||
#define DISABLED_MASK11 0
|
||||
|
||||
@@ -632,6 +632,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER, exc_xen_hypervisor_callback);
|
||||
DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER, exc_xen_unknown_trap);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
DECLARE_IDTENTRY(X86_TRAP_VE, exc_virtualization_exception);
|
||||
#endif
|
||||
|
||||
/* Device interrupts common/spurious */
|
||||
DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt);
|
||||
#ifdef CONFIG_X86_LOCAL_APIC
|
||||
|
||||
@@ -44,6 +44,7 @@
|
||||
#include <asm/page.h>
|
||||
#include <asm/early_ioremap.h>
|
||||
#include <asm/pgtable_types.h>
|
||||
#include <asm/shared/io.h>
|
||||
|
||||
#define build_mmio_read(name, size, type, reg, barrier) \
|
||||
static inline type name(const volatile void __iomem *addr) \
|
||||
@@ -256,37 +257,23 @@ static inline void slow_down_io(void)
|
||||
#endif
|
||||
|
||||
#define BUILDIO(bwl, bw, type) \
|
||||
static inline void out##bwl(unsigned type value, int port) \
|
||||
{ \
|
||||
asm volatile("out" #bwl " %" #bw "0, %w1" \
|
||||
: : "a"(value), "Nd"(port)); \
|
||||
} \
|
||||
\
|
||||
static inline unsigned type in##bwl(int port) \
|
||||
{ \
|
||||
unsigned type value; \
|
||||
asm volatile("in" #bwl " %w1, %" #bw "0" \
|
||||
: "=a"(value) : "Nd"(port)); \
|
||||
return value; \
|
||||
} \
|
||||
\
|
||||
static inline void out##bwl##_p(unsigned type value, int port) \
|
||||
static inline void out##bwl##_p(type value, u16 port) \
|
||||
{ \
|
||||
out##bwl(value, port); \
|
||||
slow_down_io(); \
|
||||
} \
|
||||
\
|
||||
static inline unsigned type in##bwl##_p(int port) \
|
||||
static inline type in##bwl##_p(u16 port) \
|
||||
{ \
|
||||
unsigned type value = in##bwl(port); \
|
||||
type value = in##bwl(port); \
|
||||
slow_down_io(); \
|
||||
return value; \
|
||||
} \
|
||||
\
|
||||
static inline void outs##bwl(int port, const void *addr, unsigned long count) \
|
||||
static inline void outs##bwl(u16 port, const void *addr, unsigned long count) \
|
||||
{ \
|
||||
if (cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \
|
||||
unsigned type *value = (unsigned type *)addr; \
|
||||
type *value = (type *)addr; \
|
||||
while (count) { \
|
||||
out##bwl(*value, port); \
|
||||
value++; \
|
||||
@@ -299,10 +286,10 @@ static inline void outs##bwl(int port, const void *addr, unsigned long count) \
|
||||
} \
|
||||
} \
|
||||
\
|
||||
static inline void ins##bwl(int port, void *addr, unsigned long count) \
|
||||
static inline void ins##bwl(u16 port, void *addr, unsigned long count) \
|
||||
{ \
|
||||
if (cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \
|
||||
unsigned type *value = (unsigned type *)addr; \
|
||||
type *value = (type *)addr; \
|
||||
while (count) { \
|
||||
*value = in##bwl(port); \
|
||||
value++; \
|
||||
@@ -315,13 +302,11 @@ static inline void ins##bwl(int port, void *addr, unsigned long count) \
|
||||
} \
|
||||
}
|
||||
|
||||
BUILDIO(b, b, char)
|
||||
BUILDIO(w, w, short)
|
||||
BUILDIO(l, , int)
|
||||
BUILDIO(b, b, u8)
|
||||
BUILDIO(w, w, u16)
|
||||
BUILDIO(l, , u32)
|
||||
#undef BUILDIO
|
||||
|
||||
#define inb inb
|
||||
#define inw inw
|
||||
#define inl inl
|
||||
#define inb_p inb_p
|
||||
#define inw_p inw_p
|
||||
#define inl_p inl_p
|
||||
@@ -329,9 +314,6 @@ BUILDIO(l, , int)
|
||||
#define insw insw
|
||||
#define insl insl
|
||||
|
||||
#define outb outb
|
||||
#define outw outw
|
||||
#define outl outl
|
||||
#define outb_p outb_p
|
||||
#define outw_p outw_p
|
||||
#define outl_p outl_p
|
||||
|
||||
@@ -7,6 +7,8 @@
|
||||
#include <linux/interrupt.h>
|
||||
#include <uapi/asm/kvm_para.h>
|
||||
|
||||
#include <asm/tdx.h>
|
||||
|
||||
#ifdef CONFIG_KVM_GUEST
|
||||
bool kvm_check_and_clear_guest_paused(void);
|
||||
#else
|
||||
@@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
|
||||
static inline long kvm_hypercall0(unsigned int nr)
|
||||
{
|
||||
long ret;
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
|
||||
return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
|
||||
|
||||
asm volatile(KVM_HYPERCALL
|
||||
: "=a"(ret)
|
||||
: "a"(nr)
|
||||
@@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr)
|
||||
static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
|
||||
{
|
||||
long ret;
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
|
||||
return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
|
||||
|
||||
asm volatile(KVM_HYPERCALL
|
||||
: "=a"(ret)
|
||||
: "a"(nr), "b"(p1)
|
||||
@@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
|
||||
unsigned long p2)
|
||||
{
|
||||
long ret;
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
|
||||
return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
|
||||
|
||||
asm volatile(KVM_HYPERCALL
|
||||
: "=a"(ret)
|
||||
: "a"(nr), "b"(p1), "c"(p2)
|
||||
@@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
|
||||
unsigned long p2, unsigned long p3)
|
||||
{
|
||||
long ret;
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
|
||||
return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
|
||||
|
||||
asm volatile(KVM_HYPERCALL
|
||||
: "=a"(ret)
|
||||
: "a"(nr), "b"(p1), "c"(p2), "d"(p3)
|
||||
@@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
|
||||
unsigned long p4)
|
||||
{
|
||||
long ret;
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
|
||||
return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
|
||||
|
||||
asm volatile(KVM_HYPERCALL
|
||||
: "=a"(ret)
|
||||
: "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
|
||||
|
||||
@@ -49,9 +49,6 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
|
||||
|
||||
void __init mem_encrypt_free_decrypted_mem(void);
|
||||
|
||||
/* Architecture __weak replacement functions */
|
||||
void __init mem_encrypt_init(void);
|
||||
|
||||
void __init sev_es_init_vc_handling(void);
|
||||
|
||||
#define __bss_decrypted __section(".bss..decrypted")
|
||||
@@ -89,6 +86,9 @@ static inline void mem_encrypt_free_decrypted_mem(void) { }
|
||||
|
||||
#endif /* CONFIG_AMD_MEM_ENCRYPT */
|
||||
|
||||
/* Architecture __weak replacement functions */
|
||||
void __init mem_encrypt_init(void);
|
||||
|
||||
/*
|
||||
* The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
|
||||
* writing to or comparing values from the cr3 register. Having the
|
||||
|
||||
@@ -25,6 +25,7 @@ struct real_mode_header {
|
||||
u32 sev_es_trampoline_start;
|
||||
#endif
|
||||
#ifdef CONFIG_X86_64
|
||||
u32 trampoline_start64;
|
||||
u32 trampoline_pgd;
|
||||
#endif
|
||||
/* ACPI S3 wakeup */
|
||||
|
||||
34
arch/x86/include/asm/shared/io.h
Normal file
@@ -0,0 +1,34 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef _ASM_X86_SHARED_IO_H
|
||||
#define _ASM_X86_SHARED_IO_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#define BUILDIO(bwl, bw, type) \
|
||||
static inline void __out##bwl(type value, u16 port) \
|
||||
{ \
|
||||
asm volatile("out" #bwl " %" #bw "0, %w1" \
|
||||
: : "a"(value), "Nd"(port)); \
|
||||
} \
|
||||
\
|
||||
static inline type __in##bwl(u16 port) \
|
||||
{ \
|
||||
type value; \
|
||||
asm volatile("in" #bwl " %w1, %" #bw "0" \
|
||||
: "=a"(value) : "Nd"(port)); \
|
||||
return value; \
|
||||
}
|
||||
|
||||
BUILDIO(b, b, u8)
|
||||
BUILDIO(w, w, u16)
|
||||
BUILDIO(l, , u32)
|
||||
#undef BUILDIO
|
||||
|
||||
#define inb __inb
|
||||
#define inw __inw
|
||||
#define inl __inl
|
||||
#define outb __outb
|
||||
#define outw __outw
|
||||
#define outl __outl
|
||||
|
||||
#endif
|
||||
40
arch/x86/include/asm/shared/tdx.h
Normal file
@@ -0,0 +1,40 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef _ASM_X86_SHARED_TDX_H
|
||||
#define _ASM_X86_SHARED_TDX_H
|
||||
|
||||
#include <linux/bits.h>
|
||||
#include <linux/types.h>
|
||||
|
||||
#define TDX_HYPERCALL_STANDARD 0
|
||||
|
||||
#define TDX_HCALL_HAS_OUTPUT BIT(0)
|
||||
#define TDX_HCALL_ISSUE_STI BIT(1)
|
||||
|
||||
#define TDX_CPUID_LEAF_ID 0x21
|
||||
#define TDX_IDENT "IntelTDX    "
|
||||
|
||||
#ifndef __ASSEMBLY__
|
||||
|
||||
/*
|
||||
* Used in __tdx_hypercall() to pass down and get back registers' values of
|
||||
* the TDCALL instruction when requesting services from the VMM.
|
||||
*
|
||||
* This is a software only structure and not part of the TDX module/VMM ABI.
|
||||
*/
|
||||
struct tdx_hypercall_args {
|
||||
u64 r10;
|
||||
u64 r11;
|
||||
u64 r12;
|
||||
u64 r13;
|
||||
u64 r14;
|
||||
u64 r15;
|
||||
};
|
||||
|
||||
/* Used to request services from the VMM */
|
||||
u64 __tdx_hypercall(struct tdx_hypercall_args *args, unsigned long flags);
|
||||
|
||||
/* Called from __tdx_hypercall() for unrecoverable failure */
|
||||
void __tdx_hypercall_failed(void);
|
||||
|
||||
#endif /* !__ASSEMBLY__ */
|
||||
#endif /* _ASM_X86_SHARED_TDX_H */
|
||||
91
arch/x86/include/asm/tdx.h
Normal file
@@ -0,0 +1,91 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/* Copyright (C) 2021-2022 Intel Corporation */
|
||||
#ifndef _ASM_X86_TDX_H
|
||||
#define _ASM_X86_TDX_H
|
||||
|
||||
#include <linux/init.h>
|
||||
#include <linux/bits.h>
|
||||
#include <asm/ptrace.h>
|
||||
#include <asm/shared/tdx.h>
|
||||
|
||||
/*
|
||||
* SW-defined error codes.
|
||||
*
|
||||
* Bits 47:40 == 0xFF indicate the Reserved status code class, which is never used by
|
||||
* TDX module.
|
||||
*/
|
||||
#define TDX_ERROR _BITUL(63)
|
||||
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
|
||||
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
|
||||
|
||||
#ifndef __ASSEMBLY__
|
||||
|
||||
/*
|
||||
* Used to gather the output register values of the TDCALL and SEAMCALL
|
||||
* instructions when requesting services from the TDX module.
|
||||
*
|
||||
* This is a software only structure and not part of the TDX module/VMM ABI.
|
||||
*/
|
||||
struct tdx_module_output {
|
||||
u64 rcx;
|
||||
u64 rdx;
|
||||
u64 r8;
|
||||
u64 r9;
|
||||
u64 r10;
|
||||
u64 r11;
|
||||
};
|
||||
|
||||
/*
|
||||
* Used by the #VE exception handler to gather the #VE exception
|
||||
* info from the TDX module. This is a software only structure
|
||||
* and not part of the TDX module/VMM ABI.
|
||||
*/
|
||||
struct ve_info {
|
||||
u64 exit_reason;
|
||||
u64 exit_qual;
|
||||
/* Guest Linear (virtual) Address */
|
||||
u64 gla;
|
||||
/* Guest Physical Address */
|
||||
u64 gpa;
|
||||
u32 instr_len;
|
||||
u32 instr_info;
|
||||
};
|
||||
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
|
||||
void __init tdx_early_init(void);
|
||||
|
||||
/* Used to communicate with the TDX module */
|
||||
u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
|
||||
struct tdx_module_output *out);
|
||||
|
||||
void tdx_get_ve_info(struct ve_info *ve);
|
||||
|
||||
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
|
||||
|
||||
void tdx_safe_halt(void);
|
||||
|
||||
bool tdx_early_handle_ve(struct pt_regs *regs);
|
||||
|
||||
#else
|
||||
|
||||
static inline void tdx_early_init(void) { };
|
||||
static inline void tdx_safe_halt(void) { };
|
||||
|
||||
static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
|
||||
|
||||
#endif /* CONFIG_INTEL_TDX_GUEST */
|
||||
|
||||
#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
|
||||
long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
|
||||
unsigned long p3, unsigned long p4);
|
||||
#else
|
||||
static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
|
||||
unsigned long p2, unsigned long p3,
|
||||
unsigned long p4)
|
||||
{
|
||||
return -ENODEV;
|
||||
}
|
||||
#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
|
||||
#endif /* !__ASSEMBLY__ */
|
||||
#endif /* _ASM_X86_TDX_H */
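The software-defined error codes above are plain bit compositions, so their numeric values can be checked outside the kernel. The sketch below re-creates them with local stand-ins for `_BITUL()` and `GENMASK_ULL()`; every `SKETCH_`/`sketch_` name is hypothetical, and `sketch_is_sw_error()` is an illustrative helper, not something this header provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's bit helpers (64-bit only). */
#define SKETCH_BIT_ULL(n)	(1ULL << (n))
#define SKETCH_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Same composition as TDX_ERROR, TDX_SW_ERROR, TDX_SEAMCALL_VMFAILINVALID. */
#define SKETCH_TDX_ERROR	SKETCH_BIT_ULL(63)
#define SKETCH_TDX_SW_ERROR	(SKETCH_TDX_ERROR | SKETCH_GENMASK_ULL(47, 40))
#define SKETCH_TDX_SEAMCALL_VMFAILINVALID \
	(SKETCH_TDX_SW_ERROR | 0xFFFF0000ULL)

/* Error bit 63, class bits 47:40 all ones, low word 0xFFFF0000. */
static_assert(SKETCH_TDX_SEAMCALL_VMFAILINVALID == 0x8000FF00FFFF0000ULL,
	      "unexpected VMFAILINVALID encoding");

/* Hypothetical helper: true if a status lies in the software-reserved class. */
static bool sketch_is_sw_error(uint64_t status)
{
	return (status & SKETCH_TDX_SW_ERROR) == SKETCH_TDX_SW_ERROR;
}
```

Because the class field 0xFF is reserved, a value built this way cannot collide with a status returned by the TDX module itself.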
|
||||
@@ -65,6 +65,13 @@ static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
|
||||
static bool acpi_support_online_capable;
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
/* Physical address of the Multiprocessor Wakeup Structure mailbox */
|
||||
static u64 acpi_mp_wake_mailbox_paddr;
|
||||
/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
|
||||
static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_X86_IO_APIC
|
||||
/*
|
||||
* Locks related to IOAPIC hotplug
|
||||
@@ -336,7 +343,60 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long e
|
||||
return 0;
|
||||
}
|
||||
|
||||
#endif /*CONFIG_X86_LOCAL_APIC */
|
||||
#ifdef CONFIG_X86_64
|
||||
static int acpi_wakeup_cpu(int apicid, unsigned long start_ip)
|
||||
{
|
||||
/*
|
||||
* Remap mailbox memory only for the first call to acpi_wakeup_cpu().
|
||||
*
|
||||
* Wakeup of secondary CPUs is fully serialized in the core code.
|
||||
* No need to protect acpi_mp_wake_mailbox from concurrent accesses.
|
||||
*/
|
||||
if (!acpi_mp_wake_mailbox) {
|
||||
acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
|
||||
sizeof(*acpi_mp_wake_mailbox),
|
||||
MEMREMAP_WB);
|
||||
}
|
||||
|
||||
/*
|
||||
* Mailbox memory is shared between the firmware and OS. Firmware will
|
||||
* listen on mailbox command address, and once it receives the wakeup
|
||||
* command, the CPU associated with the given apicid will be booted.
|
||||
*
|
||||
* The value of 'apic_id' and 'wakeup_vector' must be visible to the
|
||||
* firmware before the wakeup command is visible. smp_store_release()
|
||||
* ensures ordering and visibility.
|
||||
*/
|
||||
acpi_mp_wake_mailbox->apic_id = apicid;
|
||||
acpi_mp_wake_mailbox->wakeup_vector = start_ip;
|
||||
smp_store_release(&acpi_mp_wake_mailbox->command,
|
||||
ACPI_MP_WAKE_COMMAND_WAKEUP);
|
||||
|
||||
/*
|
||||
* Wait for the CPU to wake up.
|
||||
*
|
||||
* The CPU being woken up is essentially in a spin loop waiting to be
|
||||
* woken up. It should not take long for it to wake up and acknowledge by
|
||||
* zeroing out ->command.
|
||||
*
|
||||
* ACPI specification doesn't provide any guidance on how long kernel
|
||||
* has to wait for a wake up acknowledgement. It also doesn't provide
|
||||
* a way to cancel a wake up request if it takes too long.
|
||||
*
|
||||
* In TDX environment, the VMM has control over how long it takes to
|
||||
* wake up a secondary CPU. It can postpone scheduling a secondary vCPU
|
||||
* indefinitely. Giving up on wake up request and reporting error opens
|
||||
* possible attack vector for VMM: it can wake up a secondary CPU when
|
||||
* kernel doesn't expect it. Wait until positive result of the wake up
|
||||
* request.
|
||||
*/
|
||||
while (READ_ONCE(acpi_mp_wake_mailbox->command))
|
||||
cpu_relax();
|
||||
|
||||
return 0;
|
||||
}
|
||||
#endif /* CONFIG_X86_64 */
|
||||
#endif /* CONFIG_X86_LOCAL_APIC */
|
||||
|
||||
#ifdef CONFIG_X86_IO_APIC
|
||||
#define MP_ISA_BUS 0
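The ordering requirement described in acpi_wakeup_cpu() (payload fields visible before the command, then wait for the command to be cleared) can be modeled in userspace with C11 atomics. This is a sketch under stated assumptions: `struct sketch_mailbox` is a simplified stand-in for the real mailbox layout, the command value 1 for "wakeup" is assumed, and `sketch_firmware_poll()` only imitates what the firmware side is expected to do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CMD_NOOP		0
#define SKETCH_CMD_WAKEUP	1	/* assumed encoding of the wakeup command */

/* Simplified, hypothetical mailbox: only the fields used by the handshake. */
struct sketch_mailbox {
	_Atomic uint16_t command;
	uint32_t apic_id;
	uint64_t wakeup_vector;
};

/* OS side: publish the payload first, then the command with release order. */
static void sketch_post_wakeup(struct sketch_mailbox *mb, uint32_t apicid,
			       uint64_t start_ip)
{
	mb->apic_id = apicid;
	mb->wakeup_vector = start_ip;
	atomic_store_explicit(&mb->command, SKETCH_CMD_WAKEUP,
			      memory_order_release);
}

/*
 * Imitated firmware side: an AP sees the command with acquire order, checks
 * that it is the target, reads the vector, and acknowledges by zeroing the
 * command. Returns true if this AP was woken.
 */
static bool sketch_firmware_poll(struct sketch_mailbox *mb, uint32_t my_apicid,
				 uint64_t *vector)
{
	assert(vector);
	if (atomic_load_explicit(&mb->command, memory_order_acquire) !=
	    SKETCH_CMD_WAKEUP)
		return false;
	if (mb->apic_id != my_apicid)
		return false;
	*vector = mb->wakeup_vector;
	atomic_store_explicit(&mb->command, SKETCH_CMD_NOOP,
			      memory_order_release);
	return true;
}

/* OS side: one iteration of the acknowledgement wait loop. */
static bool sketch_wakeup_acked(struct sketch_mailbox *mb)
{
	return atomic_load_explicit(&mb->command, memory_order_relaxed) ==
	       SKETCH_CMD_NOOP;
}
```

The release store pairs with the acquire load, so an AP that observes the wakeup command is also guaranteed to observe the APIC ID and vector written before it.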
|
||||
@@ -1083,6 +1143,29 @@ static int __init acpi_parse_madt_lapic_entries(void)
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
static int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
|
||||
const unsigned long end)
|
||||
{
|
||||
struct acpi_madt_multiproc_wakeup *mp_wake;
|
||||
|
||||
if (!IS_ENABLED(CONFIG_SMP))
|
||||
return -ENODEV;
|
||||
|
||||
mp_wake = (struct acpi_madt_multiproc_wakeup *)header;
|
||||
if (BAD_MADT_ENTRY(mp_wake, end))
|
||||
return -EINVAL;
|
||||
|
||||
acpi_table_print_madt_entry(&header->common);
|
||||
|
||||
acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
|
||||
|
||||
acpi_wake_cpu_handler_update(acpi_wakeup_cpu);
|
||||
|
||||
return 0;
|
||||
}
|
||||
#endif /* CONFIG_X86_64 */
|
||||
#endif /* CONFIG_X86_LOCAL_APIC */
|
||||
|
||||
#ifdef CONFIG_X86_IO_APIC
|
||||
@@ -1278,6 +1361,14 @@ static void __init acpi_process_madt(void)
|
||||
|
||||
smp_found_config = 1;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
/*
|
||||
* Parse MADT MP Wake entry.
|
||||
*/
|
||||
acpi_table_parse_madt(ACPI_MADT_TYPE_MULTIPROC_WAKEUP,
|
||||
acpi_parse_mp_wake, 1);
|
||||
#endif
|
||||
}
|
||||
if (error == -EINVAL) {
|
||||
/*
|
||||
|
||||
@@ -2551,6 +2551,16 @@ u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid);
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
void __init acpi_wake_cpu_handler_update(wakeup_cpu_handler handler)
|
||||
{
|
||||
struct apic **drv;
|
||||
|
||||
for (drv = __apicdrivers; drv < __apicdrivers_end; drv++)
|
||||
(*drv)->wakeup_secondary_cpu_64 = handler;
|
||||
}
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Override the generic EOI implementation with an optimized version.
|
||||
* Only called during early boot when only one CPU is active and with
|
||||
|
||||
@@ -65,6 +65,7 @@
|
||||
#include <asm/irq_remapping.h>
|
||||
#include <asm/hw_irq.h>
|
||||
#include <asm/apic.h>
|
||||
#include <asm/pgtable.h>
|
||||
|
||||
#define for_each_ioapic(idx) \
|
||||
for ((idx) = 0; (idx) < nr_ioapics; (idx)++)
|
||||
@@ -2677,6 +2678,19 @@ static struct resource * __init ioapic_setup_resources(void)
|
||||
return res;
|
||||
}
|
||||
|
||||
static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys)
|
||||
{
|
||||
pgprot_t flags = FIXMAP_PAGE_NOCACHE;
|
||||
|
||||
/*
|
||||
* Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
|
||||
* bits, just like normal ioremap():
|
||||
*/
|
||||
flags = pgprot_decrypted(flags);
|
||||
|
||||
__set_fixmap(idx, phys, flags);
|
||||
}
|
||||
|
||||
void __init io_apic_init_mappings(void)
|
||||
{
|
||||
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
|
||||
@@ -2709,7 +2723,7 @@ fake_ioapic_page:
|
||||
__func__, PAGE_SIZE, PAGE_SIZE);
|
||||
ioapic_phys = __pa(ioapic_phys);
|
||||
}
|
||||
set_fixmap_nocache(idx, ioapic_phys);
|
||||
io_apic_set_fixmap(idx, ioapic_phys);
|
||||
apic_printk(APIC_VERBOSE, "mapped IOAPIC to %08lx (%08lx)\n",
|
||||
__fix_to_virt(idx) + (ioapic_phys & ~PAGE_MASK),
|
||||
ioapic_phys);
|
||||
@@ -2838,7 +2852,7 @@ int mp_register_ioapic(int id, u32 address, u32 gsi_base,
|
||||
ioapics[idx].mp_config.flags = MPC_APIC_USABLE;
|
||||
ioapics[idx].mp_config.apicaddr = address;
|
||||
|
||||
set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
|
||||
io_apic_set_fixmap(FIX_IO_APIC_BASE_0 + idx, address);
|
||||
if (bad_ioapic_register(idx)) {
|
||||
clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
|
||||
return -ENODEV;
|
||||
|
||||
@@ -18,6 +18,7 @@
|
||||
#include <asm/bootparam.h>
|
||||
#include <asm/suspend.h>
|
||||
#include <asm/tlbflush.h>
|
||||
#include <asm/tdx.h>
|
||||
|
||||
#ifdef CONFIG_XEN
|
||||
#include <xen/interface/xen.h>
|
||||
@@ -65,6 +66,22 @@ static void __used common(void)
|
||||
OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
|
||||
#endif
|
||||
|
||||
BLANK();
|
||||
OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
|
||||
OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
|
||||
OFFSET(TDX_MODULE_r8, tdx_module_output, r8);
|
||||
OFFSET(TDX_MODULE_r9, tdx_module_output, r9);
|
||||
OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
|
||||
OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
|
||||
|
||||
BLANK();
|
||||
OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10);
|
||||
OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11);
|
||||
OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12);
|
||||
OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13);
|
||||
OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14);
|
||||
OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15);
|
||||
|
||||
BLANK();
|
||||
OFFSET(BP_scratch, boot_params, scratch);
|
||||
OFFSET(BP_secure_boot, boot_params, secure_boot);
|
||||
|
||||
@@ -40,6 +40,7 @@
|
||||
#include <asm/extable.h>
|
||||
#include <asm/trapnr.h>
|
||||
#include <asm/sev.h>
|
||||
#include <asm/tdx.h>
|
||||
|
||||
/*
|
||||
* Manage page tables very early on.
|
||||
@@ -417,6 +418,9 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr)
|
||||
trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs))
|
||||
return;
|
||||
|
||||
if (trapnr == X86_TRAP_VE && tdx_early_handle_ve(regs))
|
||||
return;
|
||||
|
||||
early_fixup_exception(regs, trapnr);
|
||||
}
|
||||
|
||||
@@ -515,6 +519,9 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
|
||||
|
||||
idt_setup_early_handler();
|
||||
|
||||
/* Needed before cc_platform_has() can be used for TDX */
|
||||
tdx_early_init();
|
||||
|
||||
copy_bootdata(__va(real_mode_data));
|
||||
|
||||
/*
|
||||
|
||||
@@ -173,8 +173,22 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
|
||||
addq $(init_top_pgt - __START_KERNEL_map), %rax
|
||||
1:
|
||||
|
||||
#ifdef CONFIG_X86_MCE
|
||||
/*
|
||||
* Preserve CR4.MCE if the kernel will enable #MC support.
|
||||
* Clearing MCE may fault in some environments (that also force #MC
|
||||
* support). Any machine check that occurs before #MC support is fully
|
||||
* configured will crash the system regardless of the CR4.MCE value set
|
||||
* here.
|
||||
*/
|
||||
movq %cr4, %rcx
|
||||
andl $X86_CR4_MCE, %ecx
|
||||
#else
|
||||
movl $0, %ecx
|
||||
#endif
|
||||
|
||||
/* Enable PAE mode, PGE and LA57 */
|
||||
movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
|
||||
orl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
|
||||
#ifdef CONFIG_X86_5LEVEL
|
||||
testl $1, __pgtable_l5_enabled(%rip)
|
||||
jz 1f
|
||||
@@ -280,13 +294,23 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
|
||||
/* Setup EFER (Extended Feature Enable Register) */
|
||||
movl $MSR_EFER, %ecx
|
||||
rdmsr
|
||||
/*
|
||||
* Preserve current value of EFER for comparison and to skip
|
||||
* EFER writes if no change was made (for TDX guest)
|
||||
*/
|
||||
movl %eax, %edx
|
||||
btsl $_EFER_SCE, %eax /* Enable System Call */
|
||||
btl $20,%edi /* No Execute supported? */
|
||||
jnc 1f
|
||||
btsl $_EFER_NX, %eax
|
||||
btsq $_PAGE_BIT_NX,early_pmd_flags(%rip)
|
||||
1: wrmsr /* Make changes effective */
|
||||
|
||||
/* Avoid writing EFER if no change was made (for TDX guest) */
|
||||
1: cmpl %edx, %eax
|
||||
je 1f
|
||||
xor %edx, %edx
|
||||
wrmsr /* Make changes effective */
|
||||
1:
|
||||
/* Setup cr0 */
|
||||
movl $CR0_STATE, %eax
|
||||
/* Make changes effective */
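The EFER change in this hunk follows a read, modify, compare, conditionally write pattern, so that a TDX guest whose EFER already holds the desired value never executes WRMSR. A minimal C model of that control flow is sketched below; the MSR is mocked with a variable, all `sketch_` names are hypothetical, and the bit positions (SCE = bit 0, NX = bit 11) are the architectural ones rather than values taken from this diff.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EFER_SCE	(1u << 0)	/* System Call Enable */
#define SKETCH_EFER_NX	(1u << 11)	/* No-Execute Enable */

static uint64_t sketch_efer;		/* mock MSR contents */
static unsigned int sketch_efer_writes;	/* number of mock WRMSRs issued */

static uint64_t sketch_rdmsr(void)
{
	return sketch_efer;
}

static void sketch_wrmsr(uint64_t value)
{
	sketch_efer = value;
	sketch_efer_writes++;
}

/* Mirrors the assembly: only the low 32 bits are examined and compared. */
static void sketch_setup_efer(int nx_supported)
{
	uint32_t low = (uint32_t)sketch_rdmsr();
	uint32_t saved = low;		/* movl %eax, %edx */

	low |= SKETCH_EFER_SCE;		/* btsl $_EFER_SCE, %eax */
	if (nx_supported)
		low |= SKETCH_EFER_NX;	/* btsl $_EFER_NX, %eax */

	if (low == saved)		/* cmpl %edx, %eax; je 1f */
		return;

	/* The high half is written as zero (xor %edx, %edx before wrmsr). */
	sketch_wrmsr((uint64_t)low);
	assert(sketch_rdmsr() == low);
}
```

Calling `sketch_setup_efer()` twice in a row issues at most one write, which is the property the assembly relies on to avoid a #VE in a TDX guest.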
|
||||
|
||||
@@ -69,6 +69,9 @@ static const __initconst struct idt_data early_idts[] = {
|
||||
*/
|
||||
INTG(X86_TRAP_PF, asm_exc_page_fault),
|
||||
#endif
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
INTG(X86_TRAP_VE, asm_exc_virtualization_exception),
|
||||
#endif
|
||||
};
|
||||
|
||||
/*
|
||||
|
||||
@@ -46,6 +46,7 @@
|
||||
#include <asm/proto.h>
|
||||
#include <asm/frame.h>
|
||||
#include <asm/unwind.h>
|
||||
#include <asm/tdx.h>
|
||||
|
||||
#include "process.h"
|
||||
|
||||
@@ -873,6 +874,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
|
||||
} else if (prefer_mwait_c1_over_halt(c)) {
|
||||
pr_info("using mwait in idle threads\n");
|
||||
x86_idle = mwait_idle;
|
||||
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
|
||||
pr_info("using TDX aware idle routine\n");
|
||||
x86_idle = tdx_safe_halt;
|
||||
} else
|
||||
x86_idle = default_idle;
|
||||
}
|
||||
|
||||
@@ -1083,6 +1083,11 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
|
||||
unsigned long boot_error = 0;
|
||||
unsigned long timeout;
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
|
||||
if (apic->wakeup_secondary_cpu_64)
|
||||
start_ip = real_mode_header->trampoline_start64;
|
||||
#endif
|
||||
idle->thread.sp = (unsigned long)task_pt_regs(idle);
|
||||
early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
|
||||
initial_code = (unsigned long)start_secondary;
|
||||
@@ -1124,11 +1129,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
|
||||
|
||||
/*
|
||||
* Wake up a CPU in different cases:
|
||||
* - Use the method in the APIC driver if it's defined
|
||||
* - Use a method from the APIC driver if one is defined, with wakeup
|
||||
* straight to 64-bit mode preferred over wakeup to RM.
|
||||
* Otherwise,
|
||||
* - Use an INIT boot APIC message for APs or NMI for BSP.
|
||||
*/
|
||||
if (apic->wakeup_secondary_cpu)
|
||||
if (apic->wakeup_secondary_cpu_64)
|
||||
boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip);
|
||||
else if (apic->wakeup_secondary_cpu)
|
||||
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
|
||||
else
|
||||
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
|
||||
|
||||
@@ -62,6 +62,7 @@
|
||||
#include <asm/insn.h>
|
||||
#include <asm/insn-eval.h>
|
||||
#include <asm/vdso.h>
|
||||
#include <asm/tdx.h>
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
#include <asm/x86_init.h>
|
||||
@@ -686,13 +687,40 @@ static bool try_fixup_enqcmd_gp(void)
|
||||
#endif
|
||||
}
|
||||
|
||||
static bool gp_try_fixup_and_notify(struct pt_regs *regs, int trapnr,
|
||||
unsigned long error_code, const char *str)
|
||||
{
|
||||
if (fixup_exception(regs, trapnr, error_code, 0))
|
||||
return true;
|
||||
|
||||
current->thread.error_code = error_code;
|
||||
current->thread.trap_nr = trapnr;
|
||||
|
||||
/*
|
||||
* To be potentially processing a kprobe fault and to trust the result
|
||||
* from kprobe_running(), we have to be non-preemptible.
|
||||
*/
|
||||
if (!preemptible() && kprobe_running() &&
|
||||
kprobe_fault_handler(regs, trapnr))
|
||||
return true;
|
||||
|
||||
return notify_die(DIE_GPF, str, regs, error_code, trapnr, SIGSEGV) == NOTIFY_STOP;
|
||||
}
|
||||
|
||||
static void gp_user_force_sig_segv(struct pt_regs *regs, int trapnr,
|
||||
unsigned long error_code, const char *str)
|
||||
{
|
||||
current->thread.error_code = error_code;
|
||||
current->thread.trap_nr = trapnr;
|
||||
show_signal(current, SIGSEGV, "", str, regs, error_code);
|
||||
force_sig(SIGSEGV);
|
||||
}
|
||||
|
||||
DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
|
||||
{
|
||||
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
|
||||
enum kernel_gp_hint hint = GP_NO_HINT;
|
||||
struct task_struct *tsk;
|
||||
unsigned long gp_addr;
|
||||
int ret;
|
||||
|
||||
if (user_mode(regs) && try_fixup_enqcmd_gp())
|
||||
return;
|
||||
@@ -711,40 +739,18 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
|
||||
return;
|
||||
}
|
||||
|
||||
tsk = current;
|
||||
|
||||
if (user_mode(regs)) {
|
||||
if (fixup_iopl_exception(regs))
|
||||
goto exit;
|
||||
|
||||
tsk->thread.error_code = error_code;
|
||||
tsk->thread.trap_nr = X86_TRAP_GP;
|
||||
|
||||
if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
|
||||
goto exit;
|
||||
|
||||
show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
|
||||
force_sig(SIGSEGV);
|
||||
gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
|
||||
goto exit;
|
||||
}
|
||||
|
||||
if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
|
||||
goto exit;
|
||||
|
||||
tsk->thread.error_code = error_code;
|
||||
tsk->thread.trap_nr = X86_TRAP_GP;
|
||||
|
||||
/*
|
||||
* To be potentially processing a kprobe fault and to trust the result
|
||||
* from kprobe_running(), we have to be non-preemptible.
|
||||
*/
|
||||
if (!preemptible() &&
|
||||
kprobe_running() &&
|
||||
kprobe_fault_handler(regs, X86_TRAP_GP))
|
||||
goto exit;
|
||||
|
||||
ret = notify_die(DIE_GPF, desc, regs, error_code, X86_TRAP_GP, SIGSEGV);
|
||||
if (ret == NOTIFY_STOP)
|
||||
if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc))
|
||||
goto exit;
|
||||
|
||||
if (error_code)
|
||||
@@ -1343,6 +1349,91 @@ DEFINE_IDTENTRY(exc_device_not_available)
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef CONFIG_INTEL_TDX_GUEST
|
||||
|
||||
#define VE_FAULT_STR "VE fault"
|
||||
|
||||
static void ve_raise_fault(struct pt_regs *regs, long error_code)
|
||||
{
|
||||
if (user_mode(regs)) {
|
||||
gp_user_force_sig_segv(regs, X86_TRAP_VE, error_code, VE_FAULT_STR);
|
||||
return;
|
||||
}
|
||||
|
||||
if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code, VE_FAULT_STR))
|
||||
return;
|
||||
|
||||
die_addr(VE_FAULT_STR, regs, error_code, 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Virtualization Exceptions (#VE) are delivered to TDX guests due to
|
||||
* specific guest actions which may happen in either user space or the
|
||||
* kernel:
|
||||
*
|
||||
* * Specific instructions (WBINVD, for example)
|
||||
* * Specific MSR accesses
|
||||
* * Specific CPUID leaf accesses
|
||||
* * Access to specific guest physical addresses
|
||||
*
|
||||
* In the settings that Linux will run in, virtualization exceptions are
|
||||
* never generated on accesses to normal, TD-private memory that has been
|
||||
* accepted (by BIOS or with tdx_enc_status_changed()).
|
||||
*
|
||||
* Syscall entry code has a critical window where the kernel stack is not
|
||||
* yet set up. Any exception in this window leads to hard to debug issues
|
||||
* and can be exploited for privilege escalation. Exceptions in the NMI
|
||||
* entry code also cause issues. Returning from the exception handler with
|
||||
* IRET will re-enable NMIs and nested NMI will corrupt the NMI stack.
|
||||
*
|
||||
* For these reasons, the kernel avoids #VEs during the syscall gap and
|
||||
* the NMI entry code. Entry code paths do not access TD-shared memory,
|
||||
* MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves
|
||||
* that might generate #VE. VMM can remove memory from TD at any point,
|
||||
* but access to unaccepted (or missing) private memory leads to VM
|
||||
* termination, not to #VE.
|
||||
*
|
||||
* Similarly to page faults and breakpoints, #VEs are allowed in NMI
|
||||
* handlers once the kernel is ready to deal with nested NMIs.
|
||||
*
|
||||
* During #VE delivery, all interrupts, including NMIs, are blocked until
|
||||
* TDGETVEINFO is called. It prevents #VE nesting until the kernel reads
|
||||
* the VE info.
|
||||
*
|
||||
* If a guest kernel action which would normally cause a #VE occurs in
|
||||
* the interrupt-disabled region before TDGETVEINFO, a #DF (fault
|
||||
* exception) is delivered to the guest which will result in an oops.
|
||||
*
|
||||
* The entry code has been audited carefully for following these expectations.
|
||||
* Changes in the entry code have to be audited for correctness vs. this
|
||||
* aspect. Similarly to #PF, #VE in these places will expose kernel to
|
||||
* privilege escalation or may lead to random crashes.
|
||||
*/
|
||||
DEFINE_IDTENTRY(exc_virtualization_exception)
|
||||
{
|
||||
struct ve_info ve;
|
||||
|
||||
/*
|
||||
* NMIs/Machine-checks/Interrupts will be in a disabled state
|
||||
* till TDGETVEINFO TDCALL is executed. This ensures that VE
|
||||
* info cannot be overwritten by a nested #VE.
|
||||
*/
|
||||
tdx_get_ve_info(&ve);
|
||||
|
||||
cond_local_irq_enable(regs);
|
||||
|
||||
/*
|
||||
* If tdx_handle_virt_exception() could not process
|
||||
* it successfully, treat it as #GP(0) and handle it.
|
||||
*/
|
||||
if (!tdx_handle_virt_exception(regs, &ve))
|
||||
ve_raise_fault(regs, 0);
|
||||
|
||||
cond_local_irq_disable(regs);
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_X86_32
|
||||
DEFINE_IDTENTRY_SW(iret_error)
|
||||
{
|
||||
|
||||
@@ -11,7 +11,7 @@
|
||||
#include <asm/msr.h>
|
||||
#include <asm/archrandom.h>
|
||||
#include <asm/e820/api.h>
|
||||
#include <asm/io.h>
|
||||
#include <asm/shared/io.h>
|
||||
|
||||
/*
|
||||
* When built for the regular kernel, several functions need to be stubbed out
|
||||
|
||||
@@ -242,10 +242,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
|
||||
* If the page being mapped is in memory and SEV is active then
|
||||
* make sure the memory encryption attribute is enabled in the
|
||||
* resulting mapping.
|
||||
* In TDX guests, memory is marked private by default. If encryption
|
||||
* is not requested (using encrypted), explicitly set decrypt
|
||||
* attribute in all IOREMAPPED memory.
|
||||
*/
|
||||
prot = PAGE_KERNEL_IO;
|
||||
if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
|
||||
prot = pgprot_encrypted(prot);
|
||||
else
|
||||
prot = pgprot_decrypted(prot);
|
||||
|
||||
switch (pcm) {
|
||||
case _PAGE_CACHE_MODE_UC:
|
||||
|
||||
@@ -42,7 +42,14 @@ bool force_dma_unencrypted(struct device *dev)
|
||||
|
||||
static void print_mem_encrypt_feature_info(void)
|
||||
{
|
||||
pr_info("AMD Memory Encryption Features active:");
|
||||
pr_info("Memory Encryption Features active:");
|
||||
|
||||
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
|
||||
pr_cont(" Intel TDX\n");
|
||||
return;
|
||||
}
|
||||
|
||||
pr_cont(" AMD");
|
||||
|
||||
/* Secure Memory Encryption */
|
||||
if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
|
||||
|
||||
@@ -24,6 +24,7 @@ SYM_DATA_START(real_mode_header)
|
||||
.long pa_sev_es_trampoline_start
|
||||
#endif
|
||||
#ifdef CONFIG_X86_64
|
||||
.long pa_trampoline_start64
|
||||
.long pa_trampoline_pgd;
|
||||
#endif
|
||||
/* ACPI S3 wakeup */
|
||||
|
||||
@@ -70,7 +70,7 @@ SYM_CODE_START(trampoline_start)
|
||||
movw $__KERNEL_DS, %dx # Data segment descriptor
|
||||
|
||||
# Enable protected mode
|
||||
movl $X86_CR0_PE, %eax # protected mode (PE) bit
|
||||
movl $(CR0_STATE & ~X86_CR0_PG), %eax
|
||||
movl %eax, %cr0 # into protected mode
|
||||
|
||||
# flush prefetch and jump to startup_32
|
||||
@@ -143,13 +143,24 @@ SYM_CODE_START(startup_32)
|
||||
movl %eax, %cr3
|
||||
|
||||
# Set up EFER
|
||||
movl $MSR_EFER, %ecx
|
||||
rdmsr
|
||||
/*
|
||||
* Skip writing to EFER if the register already has desired
|
||||
* value (to avoid #VE for the TDX guest).
|
||||
*/
|
||||
cmp pa_tr_efer, %eax
|
||||
jne .Lwrite_efer
|
||||
cmp pa_tr_efer + 4, %edx
|
||||
je .Ldone_efer
|
||||
.Lwrite_efer:
|
||||
movl pa_tr_efer, %eax
|
||||
movl pa_tr_efer + 4, %edx
|
||||
movl $MSR_EFER, %ecx
|
||||
wrmsr
|
||||
|
||||
# Enable paging and in turn activate Long Mode
|
||||
movl $(X86_CR0_PG | X86_CR0_WP | X86_CR0_PE), %eax
|
||||
.Ldone_efer:
|
||||
# Enable paging and in turn activate Long Mode.
|
||||
movl $CR0_STATE, %eax
|
||||
movl %eax, %cr0
|
||||
|
||||
/*
|
||||
@@ -161,6 +172,19 @@ SYM_CODE_START(startup_32)
|
||||
ljmpl $__KERNEL_CS, $pa_startup_64
|
||||
SYM_CODE_END(startup_32)
|
||||
|
||||
SYM_CODE_START(pa_trampoline_compat)
|
||||
/*
|
||||
* In compatibility mode. Prep ESP and DX for startup_32, then disable
|
||||
* paging and complete the switch to legacy 32-bit mode.
|
||||
*/
|
||||
movl $rm_stack_end, %esp
|
||||
movw $__KERNEL_DS, %dx
|
||||
|
||||
movl $(CR0_STATE & ~X86_CR0_PG), %eax
|
||||
movl %eax, %cr0
|
||||
ljmpl $__KERNEL32_CS, $pa_startup_32
|
||||
SYM_CODE_END(pa_trampoline_compat)
|
||||
|
||||
.section ".text64","ax"
|
||||
.code64
|
||||
.balign 4
|
||||
@@ -169,6 +193,20 @@ SYM_CODE_START(startup_64)
|
||||
jmpq *tr_start(%rip)
|
||||
SYM_CODE_END(startup_64)
|
||||
|
||||
SYM_CODE_START(trampoline_start64)
|
||||
/*
|
||||
* APs start here on a direct transfer from 64-bit BIOS with identity
|
||||
* mapped page tables. Load the kernel's GDT in order to gear down to
|
||||
* 32-bit mode (to handle 4-level vs. 5-level paging), and to (re)load
|
||||
* segment registers. Load the zero IDT so any fault triggers a
|
||||
* shutdown instead of jumping back into BIOS.
|
||||
*/
|
||||
lidt tr_idt(%rip)
|
||||
lgdt tr_gdt64(%rip)
|
||||
|
||||
ljmpl *tr_compat(%rip)
|
||||
SYM_CODE_END(trampoline_start64)
|
||||
|
||||
.section ".rodata","a"
|
||||
# Duplicate the global descriptor table
|
||||
# so the kernel can live anywhere
|
||||
@@ -182,6 +220,17 @@ SYM_DATA_START(tr_gdt)
|
||||
.quad 0x00cf93000000ffff # __KERNEL_DS
|
||||
SYM_DATA_END_LABEL(tr_gdt, SYM_L_LOCAL, tr_gdt_end)
|
||||
|
||||
SYM_DATA_START(tr_gdt64)
|
||||
.short tr_gdt_end - tr_gdt - 1 # gdt limit
|
||||
.long pa_tr_gdt
|
||||
.long 0
|
||||
SYM_DATA_END(tr_gdt64)
|
||||
|
||||
SYM_DATA_START(tr_compat)
|
||||
.long pa_trampoline_compat
|
||||
.short __KERNEL32_CS
|
||||
SYM_DATA_END(tr_compat)
|
||||
|
||||
.bss
|
||||
.balign PAGE_SIZE
|
||||
SYM_DATA(trampoline_pgd, .space PAGE_SIZE)
|
||||
|
||||
@@ -1,4 +1,14 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
.section ".rodata","a"
|
||||
.balign 16
|
||||
SYM_DATA_LOCAL(tr_idt, .fill 1, 6, 0)
|
||||
|
||||
/*
|
||||
* When a bootloader hands off to the kernel in 32-bit mode an
|
||||
* IDT with a 2-byte limit and 4-byte base is needed. When a boot
|
||||
* loader hands off to a kernel in 64-bit mode the base address
|
||||
* extends to 8-bytes. Reserve enough space for either scenario.
|
||||
*/
|
||||
SYM_DATA_START_LOCAL(tr_idt)
|
||||
.short 0
|
||||
.quad 0
|
||||
SYM_DATA_END(tr_idt)
|
||||
|
||||
@@ -62,8 +62,12 @@ static void send_morse(const char *pattern)
|
||||
}
|
||||
}
|
||||
|
||||
struct port_io_ops pio_ops;
|
||||
|
||||
void main(void)
|
||||
{
|
||||
init_default_io_ops();
|
||||
|
||||
/* Kill machine if structures are wrong */
|
||||
if (wakeup_header.real_magic != 0x12345678)
|
||||
while (1)
|
||||
|
||||
96
arch/x86/virt/vmx/tdx/tdxcall.S
Normal file
@@ -0,0 +1,96 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#include <asm/asm-offsets.h>
|
||||
#include <asm/tdx.h>
|
||||
|
||||
/*
|
||||
* TDCALL and SEAMCALL are supported in Binutils >= 2.36.
|
||||
*/
|
||||
#define tdcall .byte 0x66,0x0f,0x01,0xcc
|
||||
#define seamcall .byte 0x66,0x0f,0x01,0xcf
|
||||
|
||||
/*
|
||||
* TDX_MODULE_CALL - common helper macro for both
|
||||
* TDCALL and SEAMCALL instructions.
|
||||
*
|
||||
* TDCALL - used by TDX guests to make requests to the
|
||||
* TDX module and hypercalls to the VMM.
|
||||
* SEAMCALL - used by TDX hosts to make requests to the
|
||||
* TDX module.
|
||||
*/
|
||||
.macro TDX_MODULE_CALL host:req
|
||||
/*
|
||||
* R12 will be used as temporary storage for struct tdx_module_output
|
||||
* pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL
|
||||
* services supported by this function, it can be reused.
|
||||
*/
|
||||
|
||||
/* Callee saved, so preserve it */
|
||||
push %r12
|
||||
|
||||
/*
|
||||
* Push output pointer to stack.
|
||||
* After the operation, it will be fetched into R12 register.
|
||||
*/
|
||||
push %r9
|
||||
|
||||
/* Mangle function call ABI into TDCALL/SEAMCALL ABI: */
|
||||
/* Move Leaf ID to RAX */
|
||||
mov %rdi, %rax
|
||||
/* Move input 4 to R9 */
|
||||
mov %r8, %r9
|
||||
/* Move input 3 to R8 */
|
||||
mov %rcx, %r8
|
||||
/* Move input 1 to RCX */
|
||||
mov %rsi, %rcx
|
||||
/* Leave input param 2 in RDX */
|
||||
|
||||
.if \host
|
||||
seamcall
|
||||
/*
|
||||
* SEAMCALL instruction is essentially a VMExit from VMX root
|
||||
* mode to SEAM VMX root mode. VMfailInvalid (CF=1) indicates
|
||||
* that the targeted SEAM firmware is not loaded or disabled,
|
||||
* or P-SEAMLDR is busy with another SEAMCALL. %rax is not
|
||||
* changed in this case.
|
||||
*
|
||||
* Set %rax to TDX_SEAMCALL_VMFAILINVALID for VMfailInvalid.
|
||||
* This value will never be used as actual SEAMCALL error code as
|
||||
* it is from the Reserved status code class.
|
||||
*/
|
||||
jnc .Lno_vmfailinvalid
|
||||
mov $TDX_SEAMCALL_VMFAILINVALID, %rax
|
||||
.Lno_vmfailinvalid:
|
||||
|
||||
.else
|
||||
tdcall
|
||||
.endif
|
||||
|
||||
/*
|
||||
* Fetch output pointer from stack to R12 (It is used
|
||||
* as temporary storage)
|
||||
*/
|
||||
pop %r12
|
||||
|
||||
/*
|
||||
* Since this macro can be invoked with NULL as an output pointer,
|
||||
* check if caller provided an output struct before storing output
|
||||
* registers.
|
||||
*
|
||||
* Update output registers, even if the call failed (RAX != 0).
|
||||
* Other registers may contain details of the failure.
|
||||
*/
|
||||
test %r12, %r12
|
||||
jz .Lno_output_struct
|
||||
|
||||
/* Copy result registers to output struct: */
|
||||
movq %rcx, TDX_MODULE_rcx(%r12)
|
||||
movq %rdx, TDX_MODULE_rdx(%r12)
|
||||
movq %r8, TDX_MODULE_r8(%r12)
|
||||
movq %r9, TDX_MODULE_r9(%r12)
|
||||
movq %r10, TDX_MODULE_r10(%r12)
|
||||
movq %r11, TDX_MODULE_r11(%r12)
|
||||
|
||||
.Lno_output_struct:
|
||||
/* Restore the state of R12 register */
|
||||
pop %r12
|
||||
.endm
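The register shuffle at the top of TDX_MODULE_CALL depends on the order of the moves: each destination is overwritten only after its old value has been consumed or saved. The sketch below models the registers as struct fields to make that order checkable in C. The struct and function names are hypothetical, and the mapping of C arguments to RDI, RSI, RDX, RCX, R8, R9 assumes the usual x86-64 System V calling convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file holding only the registers the macro touches. */
struct sketch_regs {
	uint64_t rax, rcx, rdx, rdi, rsi, r8, r9;
};

/*
 * On entry (assumed System V order for fn, rcx, rdx, r8, r9, out):
 *   rdi = leaf, rsi = input 1, rdx = input 2, rcx = input 3,
 *   r8 = input 4, r9 = output pointer (already pushed on the stack).
 * On exit: rax = leaf, rcx/rdx/r8/r9 = inputs 1..4.
 */
static void sketch_mangle(struct sketch_regs *r)
{
	assert(r);
	r->rax = r->rdi;	/* leaf ID */
	r->r9 = r->r8;		/* input 4: r9 is free once the pointer is saved */
	r->r8 = r->rcx;		/* input 3: old r8 was copied just above */
	r->rcx = r->rsi;	/* input 1: old rcx was copied just above */
	/* input 2 is already in rdx */
}
```

Reordering these moves, for example writing RCX before copying it into R8, would lose input 3, which is why the macro works from R9 backwards.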
|
||||
@@ -80,6 +80,16 @@ enum cc_attr {
|
||||
* using AMD SEV-SNP features.
|
||||
*/
|
||||
CC_ATTR_GUEST_SEV_SNP,
|
||||
|
||||
/**
|
||||
* @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
|
||||
*
|
||||
* The platform/OS is running as a guest/virtual machine and does not
|
||||
* support CPU hotplug feature.
|
||||
*
|
||||
* Examples include TDX Guest.
|
||||
*/
|
||||
CC_ATTR_HOTPLUG_DISABLED,
|
||||
};
|
||||
|
||||
#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
|
||||
|
||||
@@ -35,6 +35,7 @@
|
||||
#include <linux/percpu-rwsem.h>
|
||||
#include <linux/cpuset.h>
|
||||
#include <linux/random.h>
|
||||
#include <linux/cc_platform.h>
|
||||
|
||||
#include <trace/events/power.h>
|
||||
#define CREATE_TRACE_POINTS
|
||||
@@ -1190,6 +1191,12 @@ out:
|
||||
|
||||
static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
|
||||
{
|
||||
/*
|
||||
* If the platform does not support hotplug, report it explicitly to
|
||||
* differentiate it from a transient offlining failure.
|
||||
*/
|
||||
if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
|
||||
return -EOPNOTSUPP;
|
||||
if (cpu_hotplug_disabled)
|
||||
return -EBUSY;
|
||||
return _cpu_down(cpu, 0, target);
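The check added to cpu_down_maps_locked() gives the platform-level restriction precedence over the transient one, so callers can tell "never possible here" from "not possible right now". A small userspace sketch of that precedence follows; `sketch_cpu_down_check()` and its parameters are hypothetical stand-ins for `cc_platform_has(CC_ATTR_HOTPLUG_DISABLED)` and `cpu_hotplug_disabled`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early-exit order:
 *   platform forbids hotplug   -> -EOPNOTSUPP (permanent)
 *   hotplug temporarily off    -> -EBUSY      (transient)
 *   otherwise                  -> 0, i.e. proceed with the offline
 */
static int sketch_cpu_down_check(bool platform_forbids_hotplug,
				 int hotplug_disabled_count)
{
	assert(hotplug_disabled_count >= 0);
	if (platform_forbids_hotplug)
		return -EOPNOTSUPP;
	if (hotplug_disabled_count)
		return -EBUSY;
	return 0;
}
```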
|
||||
|
||||