Virtual Trust Levels (VTL) helps enable Hyper-V Virtual Secure Mode (VSM)
feature. VSM is a set of hypervisor capabilities and enlightenments
offered to host and guest partitions which enable the creation and
management of new security boundaries within operating system software.
VSM achieves and maintains isolation through VTLs.
Add early initialization for Virtual Trust Levels (VTL). This includes
initializing the x86 platform for VTL and enabling boot support for
secondary CPUs to start in targeted VTL context. For now, only enable
the code for targeted VTL level as 2.
When starting an AP at a VTL other than VTL0, the AP must start directly
in 64-bit mode, bypassing the usual 16-bit -> 32-bit -> 64-bit mode
transition sequence that occurs after waking up an AP with SIPI whose
vector points to the 16-bit AP startup trampoline code.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Link: https://lore.kernel.org/r/1681192532-15460-6-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In the case where page tables are not freed, native_flush_tlb_multi()
does not do a remote TLB flush on CPUs in lazy TLB mode because the
CPU will flush itself at the next context switch. By comparison, the
Hyper-V enlightened TLB flush does not exclude CPUs in lazy TLB mode
and so performs unnecessary flushes.
If we're not freeing page tables, add logic to test for lazy TLB
mode when adding CPUs to the input argument to the Hyper-V TLB
flush hypercall. Exclude lazy TLB mode CPUs so the behavior
matches native_flush_tlb_multi() and the unnecessary flushes are
avoided. Handle both the <=64 vCPU case and the _ex case for >64
vCPUs.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1679922967-26582-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
When copying CPUs from a Linux cpumask to a Hyper-V VPset,
cpumask_to_vpset() currently has a "_noself" variant that doesn't copy
the current CPU to the VPset. Generalize this variant by replacing it
with a "_skip" variant having a callback function that is invoked for
each CPU to decide if that CPU should be copied. Update the one caller
of cpumask_to_vpset_noself() to use the new "_skip" variant instead.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1679922967-26582-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.
Add functions to make the new MMIO read and MMIO write hypercalls.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].
[1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/
Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.
While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.
While here, fix typos in two error messages.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove the code to create and manage the second
mapping for the pre-allocated send and recv buffers. This mapping
is the last user of hv_map_memory()/hv_unmap_memory(), so delete
these functions as well. Finally, hv_map_memory() is the last
user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig
selection of VMAP_PFN.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-11-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), it's no longer necessary to
have separate code paths for mapping VMBus ring buffers for
for normal VMs and for Confidential VMs.
As such, remove the code path that uses vmap_pfn(), and set
the protection flags argument to vmap() to account for the
difference between normal and Confidential VMs.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-10-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove the code to create and manage the second
mapping for VMBus monitor pages. Because set_memory_decrypted()
and set_memory_encrypted() are no-ops in normal VMs, it's
not even necessary to test for being in a Confidential VM
(a.k.a., "Isolation VM").
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-9-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove swiotlb_unencrypted_base and the associated
code.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1679838727-87310-8-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Merge the following 6 patches from tip/x86/sev, which are taken from
Michael Kelley's series [0]. The rest of Michael's series depend on
them.
x86/hyperv: Change vTOM handling to use standard coco mechanisms
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/hyperv: Reorder code to facilitate future work
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
0: https://lore.kernel.org/linux-hyperv/1679838727-87310-1-git-send-email-mikelley@microsoft.com/
VMBus driver code currently has direct dependency on ACPI and struct
acpi_device. As a staging step toward optionally configuring based on
Devicetree instead of ACPI, use a more generic platform device to reduce
the dependency on ACPI where possible, though the dependency on ACPI
is not completely removed. Also rename the function vmbus_acpi_remove()
to the more generic vmbus_mmio_remove().
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1679298460-11855-4-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull scheduler fix from Borislav Petkov:
- Do not pull tasks to the local scheduling group if its average load
is higher than the average system load
* tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix imbalance overflow
Pull x86 fix from Borislav Petkov:
- Drop __init annotation from two rtc functions which get called after
boot is done, in order to prevent a crash
* tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Remove __init for runtime functions
Pull powerpc fix from Michael Ellerman:
- A fix for NUMA distance handling in the pseries SCM (pmem) driver.
Thanks to Aneesh Kumar K.V.
* tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Update the NUMA distance table for the target node
Pull Kbuild fixes from Masahiro Yamada:
- Drop debug info from purgatory objects again
- Document that kernel.org provides prebuilt LLVM toolchains
- Give up handling untracked files for source package builds
- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
with a pre-epoch data.
- Change panic_show_mem() to a macro to handle variable-length argument
- Compress tarballs on-the-fly again
* tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: do not create intermediate *.tar for tar packages
kbuild: do not create intermediate *.tar for source tarballs
kbuild: merge cmd_archive_linux and cmd_archive_perf
init/initramfs: Fix argument forwarding to panic() in panic_show_mem()
initramfs: Check negative timestamp to prevent broken cpio archive
kbuild: give up untracked files for source package builds
Documentation/llvm: Add a note about prebuilt kernel.org toolchains
purgatory: fix disabling debug info
Pull ksmbd server fix from Steve French:
"smb311 server preauth integrity negotiate context parsing fix (check
for out of bounds access)"
* tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: avoid out of bounds access in decode_preauth_ctxt()
Commit 05e96e96a3 ("kbuild: use git-archive for source package
creation") split the compression as a separate step to factor out
the common build rules.
With the previous commit, we got back to the situation where source
tarballs are compressed on-the-fly.
There is no reason to keep the separate compression rules.
Generate the comressed tar packages directly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Since commit 05e96e96a3 ("kbuild: use git-archive for source package
creation"), a source tarball is created in two steps; create *.tar file
then compress it. I split the compression as a separate rule because I
just thought 'git archive' supported only gzip.
For other compression algorithms, I could pipe the two commands:
$ git archive HEAD | xz > linux.tar.xz
I read git-archive(1) carefully, and I realized GIT had provided a
more elegant way:
$ git -c tar.tar.xz.command=xz archive -o linux.tar.xz HEAD
This commit uses 'tar.tar.*.command' configuration to specify the
compression backend so we can compress a source tarball on-the-fly.
GIT commit 767cf4579f0e ("archive: implement configurable tar filters")
is more than a decade old, so it should be available on almost all build
environments.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
The two commands, cmd_archive_linux and cmd_archive_perf, are similar.
Merge them to make it easier to add more changes to the git-archive
command.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Forwarding variadic argument lists can't be done by passing a va_list
to a function with signature foo(...) (as panic() has). It ends up
interpreting the va_list itself as a single argument instead of
iterating it. printf() happily accepts it of course, leading to corrupt
output.
Convert panic_show_mem() to a macro to allow forwarding the arguments.
The function is trivial enough that it's easier than trying to introduce
a vpanic() variant.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Similar to commit 4c9d410f32 ("initramfs: Check timestamp to prevent
broken cpio archive"), except asserts that the timestamp is
non-negative. This can happen when the KBUILD_BUILD_TIMESTAMP is a value
before UNIX epoch, which may be set when making reproducible builds that
don't want to look like they use a valid date.
While support for dates before 1970 might not be supported, this is more
about preventing undetected CPIO corruption. The printf's use a minimum
length format specifier, and will happily make the field longer than 8
characters if they need to.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull cifs fix from Steve French:
"Small client fix for better checking for smb311 negotiate context
overflows, also marked for stable"
* tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix negotiate context parsing
Pull UBI fixes from Richard Weinberger:
- Fix failure to attach when vid_hdr offset equals the (sub)page size
- Fix for a deadlock in UBI's worker thread
* tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
ubi: Fix deadlock caused by recursively holding work_sem
smb311_decode_neg_context() doesn't properly check against SMB packet
boundaries prior to accessing individual negotiate context entries. This
is due to the length check omitting the eight byte smb2_neg_context
header, as well as incorrect decrementing of len_of_ctxts.
Fixes: 5100d8a3fe ("SMB311: Improve checking of negotiate security contexts")
Reported-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull i2c fixes from Wolfram Sang:
"Just two driver fixes"
* tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: generate stop condition after timeout in polling mode
i2c: mchp-pci1xxxx: Update Timing registers
Pull SCSI fix from James Bottomley:
"One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Handle enclosure with just a primary component gracefully
Pull block fix from Jens Axboe:
"A single NVMe quirk entry addition"
* tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
Pull io_uring fix from Jens Axboe:
"Just a small tweak to when task_work needs redirection, marked for
stable as well"
* tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux:
io_uring: complete request via task work in case of DEFER_TASKRUN
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a missing fence when generating the NOMMU sigreturn
trampoline
- A set of fixes for early DTB handling of reserved memory nodes
* tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: No need to relocate the dtb as it lies in the fixmap region
riscv: Do not set initial_boot_params to the linear address of the dtb
riscv: Move early dtb mapping into the fixmap region
riscv: add icache flush for nommu sigreturn trampoline
Pull ACPI fixes from Rafael Wysocki:
"These add two ACPI-related quirks:
- Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello)
- Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul
Menzel)"
* tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
Pull power management fix from Rafael Wysocki:
"Make the amd-pstate cpufreq driver take all of the possible
combinations of the 'old' and 'new' status values correctly while
changing the operation mode via sysfs (Wyes Karny)"
* tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
amd-pstate: Fix amd_pstate mode switch
Pull thermal control fix from Rafael Wysocki:
"Modify the Intel thermal throttling code to avoid updating unsupported
status clearing mask bits which causes the kernel to complain about
unchecked MSR access (Srinivas Pandruvada)"
* tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits
Pull sound fixes from Takashi Iwai:
"A collection of small fixes.
At this time, quite a few fixes for the old PCI drivers are found.
Although they are not regression fixes, I took these as they are
materials for stable kernels.
In addition, a couple of regression fixes and another couple of
HD-audio quirks are included"
* tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: disable KAE for Intel DG2
ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
ALSA: emu10k1: don't create old pass-through playback device on Audigy
ALSA: emu10k1: fix capture interrupt handler unlinking
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Pull rdma fixes from Jason Gunthorpe:
"We had a fairly slow cycle on the rc side this time, here are the
accumulated fixes, mostly in drivers:
- irdma should not generate extra completions during flushing
- Fix several memory leaks
- Do not get confused in irdma's iwarp mode if IPv6 is present
- Correct a link speed calculation in mlx5
- Increase the EQ/WQ limits on erdma as they are too small for big
applications
- Use the right math for erdma's inline mtt feature
- Make erdma probing more robust to boot time ordering differences
- Fix a KMSAN crash in CMA due to uninitialized qkey"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Fix GID entry ref leak when create_ah fails
RDMA/cma: Allow UD qp_type to join multicast only
RDMA/erdma: Defer probing if netdevice can not be found
RDMA/erdma: Inline mtt entries into WQE if supported
RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
RDMA/erdma: Fix some typos
IB/mlx5: Add support for 400G_8X lane speed
RDMA/irdma: Add ipv4 check to irdma_find_listener()
RDMA/irdma: Increase iWARP CM default rexmit count
RDMA/irdma: Fix memory leak of PBLE objects
RDMA/irdma: Do not generate SW completions for NOPs
Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello).
* acpi-x86:
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.
However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.
The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command & DEFER_TASKRUN, so solve it in
io_uring core code.
Fixes: e6aeb2721d ("io_uring: complete all requests in task context")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/
Reported-by: Jens Axboe <axboe@kernel.dk>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting
duplicate NGUIDs.
Signed-off-by: Duy Truong <dory@dory.moe>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>