linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 12:17:12 +09:00

Author	SHA1	Message	Date
Paolo Bonzini	9389d5774a	Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control" This reverts commit `03a8871add`. Since commit `03a8871add` ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control"), KVM has taken ownership of the "load IA32_PERF_GLOBAL_CTRL" VMX entry/exit control bits, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports the architectural PMU (CPUID[EAX=0Ah].EAX[7:0]=1), and clear otherwise. This was a misguided attempt at mimicking what commit `5f76f6f5ff` ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, that commit was a workaround for another KVM bug and not something that should be imitated. Mucking with the VMX MSRs creates a subtle, difficult to maintain ABI as KVM must ensure that any internal changes, e.g. to how KVM handles _any_ guest CPUID changes, yield the same functional result. Therefore, KVM's policy is to let userspace have full control of the guest vCPU model so long as the host kernel is not at risk. Now that KVM really truly ensures kvm_set_msr() will succeed by loading PERF_GLOBAL_CTRL if and only if it exists, revert KVM's misguided and roundabout behavior. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [sean: make it a pure revert] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:24:41 -04:00
Sean Christopherson	4496a6f9b4	KVM: nVMX: Attempt to load PERF_GLOBAL_CTRL on nVMX xfer iff it exists Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and only if the MSR exists (according to the guest vCPU model). KVM has very misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and attempts to force the nVMX MSR settings to match the vPMU model, i.e. to hide/expose the control based on whether or not the MSR exists from the guest's perspective. KVM's modifications fail to handle the scenario where the vPMU is hidden from the guest _after_ being exposed to the guest, e.g. by userspace doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a recognized vPMU, i.e. KVM will leave the bits in the allow state and then ultimately reject the MSR load and WARN. KVM should not force the VMX MSRs in the first place. KVM taking control of the MSRs was a misguided attempt at mimicking what commit `5f76f6f5ff` ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, the MPX commit was a workaround for another KVM bug and not something that should be imitated (and it should never been done in the first place). In other words, KVM's ABI _should_ be that userspace has full control over the MSRs, at which point triggering the WARN that loading the MSR must not fail is trivial. The intent of the WARN is still valid; KVM has consistency checks to ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The problem is that '0' must be considered a valid value at all times, and so the simple/obvious solution is to just not actually load the MSR when it does not exist. It is userspace's responsibility to provide a sane vCPU model, i.e. KVM is well within its ABI and Intel's VMX architecture to skip the loads if the MSR does not exist. Fixes: `03a8871add` ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:24:22 -04:00
Sean Christopherson	b663f0b5f3	KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL Add a helper to check of the guest PMU has PERF_GLOBAL_CTRL, which is unintuitive _and_ diverges from Intel's architecturally defined behavior. Even worse, KVM currently implements the check using two different (but equivalent) checks, _and_ there has been at least one attempt to add a _third_ flavor. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:23:57 -04:00
Sean Christopherson	93255bf929	KVM: VMX: Mark all PERF_GLOBAL_(OVF)_CTRL bits reserved if there's no vPMU Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits as reserved if there is no guest vPMU. The nVMX VM-Entry consistency checks do not check for a valid vPMU prior to consuming the masks via kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the actual MSR load will be rejected by intel_is_valid_msr()). Fixes: `f5132b0138` ("KVM: Expose a version 2 architectural PMU to a guests") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:23:40 -04:00
Paolo Bonzini	8805875aa4	Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled" Since commit `5f76f6f5ff` ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise. The intent of the patch was to apply it to L0 in order to work around L1 kernels that lack the fix in commit `691bd4340b` ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS", 2017-07-04): by hiding the control bits from L0, L1 hides BNDCFGS from KVM_GET_MSR_INDEX_LIST, and the L1 bug is neutralized even in the lack of commit `691bd4340b`. This was perhaps a sensible kludge at the time, but a horrible idea in the long term and in fact it has not been extended to other CPUID bits like these: X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, SECONDARY_EXEC_TSC_SCALING X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES These days it's sort of common knowledge that any MSR in KVM_GET_MSR_INDEX_LIST must allow at least setting it with KVM_SET_MSR to a default value, so it is unlikely that something like commit `5f76f6f5ff` will be needed again. So revert it, at the potential cost of breaking L1s with a 6 year old kernel. While in principle the L0 owner doesn't control what runs on L1, such an old hypervisor would probably have many other bugs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:28 -04:00
Sean Christopherson	f8ae08f978	KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Restrict the nVMX MSRs based on KVM's config, not based on the guest's current config. Using the guest's config to audit the new config prevents userspace from restoring the original config (KVM's config) if at any point in the past the guest's config was restricted in any way. Fixes: `62cc6b9dc6` ("KVM: nVMX: support restore of VMX capability MSRs") Cc: stable@vger.kernel.org Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:28 -04:00
Sean Christopherson	a645c2b506	KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}() Rename the exit handlers for VMXON and VMXOFF to match the instruction names, the terms "vmon" and "vmoff" are not used anywhere in Intel's documentation, nor are they used elsehwere in KVM. Sadly, the exit reasons are exposed to userspace and so cannot be renamed without breaking userspace. :-( Fixes: `ec378aeef9` ("KVM: nVMX: Implement VMXON and VMXOFF") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:27 -04:00
Sean Christopherson	c7d855c2af	KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed per the associated nested VMX MSRs' fixed0/1 settings. KVM cannot rely on hardware to perform the checks, even for the few checks that have higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in hardware while running the guest, (b) there may incompatible CR0/CR4 bits that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may have further restricted the allowed CR0/CR4 values by manipulating the guest's nested VMX MSRs. Note, despite a very strong desire to throw shade at Jim, commit `70f3aac964` ("kvm: nVMX: Remove superfluous VMX instruction fault checks") is not to blame for the buggy behavior (though the comment...). That commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks (though it did erroneously drop the CPL check, but that has already been remedied). KVM may force CR0.PE=1, but will do so only when also forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033 Fixes: `ec378aeef9` ("KVM: nVMX: Implement VMXON and VMXOFF") Reported-by: Eric Li <ercli@ucdavis.edu> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:26 -04:00
Sean Christopherson	ca58f3aa53	KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks Check that the guest (L2) and host (L1) CR4 values that would be loaded by nested VM-Enter and VM-Exit respectively are valid with respect to KVM's (L0 host) allowed CR4 bits. Failure to check KVM reserved bits would allow L1 to load an illegal CR4 (or trigger hardware VM-Fail or failed VM-Entry) by massaging guest CPUID to allow features that are not supported by KVM. Amusingly, KVM itself is an accomplice in its doom, as KVM adjusts L1's MSR_IA32_VMX_CR4_FIXED1 to allow L1 to enable bits for L2 based on L1's CPUID model. Note, although nested_{guest,host}_cr4_valid() are _currently_ used if and only if the vCPU is post-VMXON (nested.vmxon == true), that may not be true in the future, e.g. emulating VMXON has a bug where it doesn't check the allowed/required CR0/CR4 bits. Cc: stable@vger.kernel.org Fixes: `3899152ccb` ("KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:26 -04:00
Sean Christopherson	c33f6f2228	KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:25 -04:00
Sean Christopherson	cfe12e64b0	KVM: selftests: Add an option to run vCPUs while disabling dirty logging Add a command line option to dirty_log_perf_test to run vCPUs for the entire duration of disabling dirty logging. By default, the test stops running runs vCPUs before disabling dirty logging, which is faster but less interesting as it doesn't stress KVM's handling of contention between page faults and the zapping of collapsible SPTEs. Enabling the flag also lets the user verify that KVM is indeed rebuilding zapped SPTEs as huge pages by checking KVM's pages_{1g,2m,4k} stats. Without vCPUs to fault in the zapped SPTEs, the stats will show that KVM is zapping pages, but they never show whether or not KVM actually allows huge pages to be recreated. Note! Enabling the flag can _significantly_ increase runtime, especially if the thread that's disabling dirty logging doesn't have a dedicated pCPU, e.g. if all pCPUs are used to run vCPUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715232107.3775620-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:24 -04:00
Sean Christopherson	85f44f8cc0	KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs When zapping collapsible SPTEs in the TDP MMU, don't bottom out on a leaf SPTE now that KVM doesn't require a PFN to compute the host mapping level, i.e. now that there's no need to first find a leaf SPTE and then step back up. Drop the now unused tdp_iter_step_up(), as it is not the safest of helpers (using any of the low level iterators requires some understanding of the various side effects). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715232107.3775620-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:24 -04:00
Sean Christopherson	65e3b446bc	KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Add a comment to document how host_pfn_mapping_level() can be used safely, as the line between safe and dangerous is quite thin. E.g. if KVM were to ever support in-place promotion to create huge pages, consuming the level is safe if the caller holds mmu_lock and checks that there's an existing _leaf_ SPTE, but unsafe if the caller only checks that there's a non-leaf SPTE. Opportunistically tweak the existing comments to explicitly document why KVM needs to use READ_ONCE(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715232107.3775620-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:23 -04:00
Sean Christopherson	a8ac499bb6	KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs Drop the requirement that a pfn be backed by a refcounted, compound or or ZONE_DEVICE, struct page, and instead rely solely on the host page tables to identify huge pages. The PageCompound() check is a remnant of an old implementation that identified (well, attempt to identify) huge pages without walking the host page tables. The ZONE_DEVICE check was added as an exception to the PageCompound() requirement. In other words, neither check is actually a hard requirement, if the primary has a pfn backed with a huge page, then KVM can back the pfn with a huge page regardless of the backing store. Dropping the @pfn parameter will also allow KVM to query the max host mapping level without having to first get the pfn, which is advantageous for use outside of the page fault path where KVM wants to take action if and only if a page can be mapped huge, i.e. avoids the pfn lookup for gfns that can't be backed with a huge page. Cc: Mingwei Zhang <mizhang@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220715232107.3775620-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:22 -04:00
Sean Christopherson	d5e90a6998	KVM: x86/mmu: Restrict mapping level based on guest MTRR iff they're used Restrict the mapping level for SPTEs based on the guest MTRRs if and only if KVM may actually use the guest MTRRs to compute the "real" memtype. For all forms of paging, guest MTRRs are purely virtual in the sense that they are completely ignored by hardware, i.e. they affect the memtype only if software manually consumes them. The only scenario where KVM consumes the guest MTRRs is when shadow_memtype_mask is non-zero and the guest has non-coherent DMA, in all other cases KVM simply leaves the PAT field in SPTEs as '0' to encode WB memtype. Note, KVM may still ultimately ignore guest MTRRs, e.g. if the backing pfn is host MMIO, but false positives are ok as they only cause a slight performance blip (unless the guest is doing weird things with its MTRRs, which is extremely unlikely). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220715230016.3762909-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:22 -04:00
Sean Christopherson	38bf9d7bf2	KVM: x86/mmu: Add shadow mask for effective host MTRR memtype Add shadow_memtype_mask to capture that EPT needs a non-zero memtype mask instead of relying on TDP being enabled, as NPT doesn't need a non-zero mask. This is a glorified nop as kvm_x86_ops.get_mt_mask() returns zero for NPT anyways. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220715230016.3762909-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:21 -04:00
Sean Christopherson	82ffad2ddf	KVM: x86: Drop unnecessary goto+label in kvm_arch_init() Return directly if kvm_arch_init() detects an error before doing any real work, jumping through a label obfuscates what's happening and carries the unnecessary risk of leaving 'r' uninitialized. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220715230016.3762909-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:20 -04:00
Sean Christopherson	94bda2f4cd	KVM: x86: Reject loading KVM if host.PAT[0] != WB Reject KVM if entry '0' in the host's IA32_PAT MSR is not programmed to writeback (WB) memtype. KVM subtly relies on IA32_PAT entry '0' to be programmed to WB by leaving the PAT bits in shadow paging and NPT SPTEs as '0'. If something other than WB is in PAT[0], at _best_ guests will suffer very poor performance, and at worst KVM will crash the system by breaking cache-coherency expecations (e.g. using WC for guest memory). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220715230016.3762909-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:20 -04:00
Suravee Suthikulpanit	01e69cef63	KVM: SVM: Fix x2APIC MSRs interception The index for svm_direct_access_msrs was incorrectly initialized with the APIC MMIO register macros. Fix by introducing a macro for calculating x2APIC MSRs. Fixes: `5c127c8547` ("KVM: SVM: Adding support for configuring x2APIC MSRs interception") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220718083833.222117-1-suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:19 -04:00
Sean Christopherson	3c2e10373e	KVM: x86/mmu: Remove underscores from __pte_list_remove() Remove the underscores from __pte_list_remove(), the function formerly known as pte_list_remove() is now named kvm_zap_one_rmap_spte() to show that it zaps rmaps/PTEs, i.e. doesn't just remove an entry from a list. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:18 -04:00
Sean Christopherson	9202aee816	KVM: x86/mmu: Rename pte_list_{destroy,remove}() to show they zap SPTEs Rename pte_list_remove() and pte_list_destroy() to kvm_zap_one_rmap_spte() and kvm_zap_all_rmap_sptes() respectively to document that (a) they zap SPTEs and (b) to better document how they differ (remove vs. destroy does not exactly scream "one vs. all"). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:18 -04:00
Sean Christopherson	f8480721a7	KVM: x86/mmu: Rename rmap zap helpers to eliminate "unmap" wrapper Rename kvm_unmap_rmap() and kvm_zap_rmap() to kvm_zap_rmap() and __kvm_zap_rmap() respectively to show that what was the "unmap" helper is just a wrapper for the "zap" helper, i.e. that they do the exact same thing, one just exists to deal with its caller passing in more params. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:17 -04:00
Sean Christopherson	2833eda0e2	KVM: x86/mmu: Rename __kvm_zap_rmaps() to align with other nomenclature Rename __kvm_zap_rmaps() to kvm_rmap_zap_gfn_range() to avoid future confusion with a soon-to-be-introduced __kvm_zap_rmap(). Using a plural "rmaps" is somewhat ambiguous without additional context, as it's not obvious whether it's referring to multiple rmap lists, versus multiple rmap entries within a single list. Use kvm_rmap_zap_gfn_range() to align with the pattern established by kvm_rmap_zap_collapsible_sptes(), without losing the information that it zaps only rmap-based MMUs, i.e. don't rename it to __kvm_zap_gfn_range(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:16 -04:00
Sean Christopherson	aed02fe3ca	KVM: x86/mmu: Drop the "p is for pointer" from rmap helpers Drop the trailing "p" from rmap helpers, i.e. rename functions to simply be kvm_<action>_rmap(). Declaring that a function takes a pointer is completely unnecessary and goes against kernel style. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:16 -04:00
Sean Christopherson	a42989e7fb	KVM: x86/mmu: Directly "destroy" PTE list when recycling rmaps Use pte_list_destroy() directly when recycling rmaps instead of bouncing through kvm_unmap_rmapp() and kvm_zap_rmapp(). Calling kvm_unmap_rmapp() is unnecessary and odd as it requires passing dummy parameters; passing NULL for @slot when __rmap_add() already has a valid slot is especially weird and confusing. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:15 -04:00
Sean Christopherson	35d539c3e4	KVM: x86/mmu: Return a u64 (the old SPTE) from mmu_spte_clear_track_bits() Return a u64, not an int, from mmu_spte_clear_track_bits(). The return value is the old SPTE value, which is very much a 64-bit value. The sole caller that consumes the return value, drop_spte(), already uses a u64. The only reason that truncating the SPTE value is not problematic is because drop_spte() only queries the shadow-present bit, which is in the lower 32 bits. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220715224226.3749507-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:15 -04:00
Maciej S. Szmigiero	da0b93d65e	KVM: nSVM: Pull CS.Base from actual VMCB12 for soft int/ex re-injection enter_svm_guest_mode() first calls nested_vmcb02_prepare_control() to copy control fields from VMCB12 to the current VMCB, then nested_vmcb02_prepare_save() to perform a similar copy of the save area. This means that nested_vmcb02_prepare_control() still runs with the previous save area values in the current VMCB so it shouldn't take the L2 guest CS.Base from this area. Explicitly pull CS.Base from the actual VMCB12 instead in enter_svm_guest_mode(). Granted, having a non-zero CS.Base is a very rare thing (and even impossible in 64-bit mode), having it change between nested VMRUNs is probably even rarer, but if it happens it would create a really subtle bug so it's better to fix it upfront. Fixes: `6ef88d6e36` ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <4caa0f67589ae3c22c311ee0e6139496902f2edc.1658159083.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-28 13:22:14 -04:00
Linus Torvalds	e64ab2dbd8	watch_queue: Fix missing locking in add_watch_to_object() If a watch is being added to a queue, it needs to guard against interference from addition of a new watch, manual removal of a watch and removal of a watch due to some other queue being destroyed. KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by holding the key->sem writelocked and by holding refs on both the key and the queue - but that doesn't prevent interaction from other {key,queue} pairs. While add_watch_to_object() does take the spinlock on the event queue, it doesn't take the lock on the source's watch list. The assumption was that the caller would prevent that (say by taking key->sem) - but that doesn't prevent interference from the destruction of another queue. Fix this by locking the watcher list in add_watch_to_object(). Fixes: `c73be61ced` ("pipe: Add general notification queue support") Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: keyrings@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-28 10:06:49 -07:00
David Howells	e0339f036e	watch_queue: Fix missing rcu annotation Since __post_watch_notification() walks wlist->watchers with only the RCU read lock held, we need to use RCU methods to add to the list (we already use RCU methods to remove from the list). Fix add_watch_to_object() to use hlist_add_head_rcu() instead of hlist_add_head() for that list. Fixes: `c73be61ced` ("pipe: Add general notification queue support") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-28 10:06:49 -07:00
Sumanth Korikkar	ded466e180	s390/unwind: fix fgraph return address recovery When HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is defined, the return address to the fgraph caller is recovered by tagging it along with the stack pointer of ftrace stack. This makes the stack unwinding more reliable. When the fgraph return address is modified to return_to_handler, ftrace_graph_ret_addr tries to restore it to the original value using tagged stack pointer. Fix this by passing tagged sp to ftrace_graph_ret_addr. Fixes: `d81675b60d` ("s390/unwind: recover kretprobe modified return address in stacktrace") Cc: <stable@vger.kernel.org> # 5.18 Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:24 +02:00
Sven Schnelle	520763a327	s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit() With generic entry in place switch the nmi handler to use the generic entry helper functions. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:24 +02:00
Janosch Frank	a0c0c44e9a	s390: add ELF note type for encrypted CPU state of a PV VCPU The type NT_S390_PV_CPU_DATA note contains the encrypted CPU state of a PV VCPU. It's only relevant in dumps of s390 PV VMs and can't be decrypted without a second block of encrypted data which provides key parts. Therefore we only reserve the note type here. The zgetdump tool from the s390-tools package can, together with a Customer Communication Key, be used to convert a PV VM dump into a normal VM dump. zgetdump will decrypt the CPU data and overwrite the other respective notes to make the data accessible for crash and other debugging tools. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [agordeev@linux.ibm.com changed desctiption] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:24 +02:00
Alexander Gordeev	e409b7f191	s390/smp,ptdump: add absolute lowcore markers Add "Lowcore Area Start" and "Lowcore Area End" markers that fence pages where absolute lowcore resides. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:24 +02:00
Alexander Gordeev	7d06fed77b	s390/smp: rework absolute lowcore access Temporary unsetting of the prefix page in memcpy_absolute() routine poses a risk of executing code path with unexpectedly disabled prefix page. This rework avoids the prefix page uninstalling and disabling of normal and machine check interrupts when accessing the absolute zero memory. Although memcpy_absolute() routine can access the whole memory, it is only used to update the absolute zero lowcore. This rework therefore introduces a new mechanism for the absolute zero lowcore access and scraps memcpy_absolute() routine for good. Instead, an area is reserved in the virtual memory that is used for the absolute lowcore access only. That area holds an array of 8KB virtual mappings - one per CPU. Whenever a CPU is brought online, the corresponding item is mapped to the real address of the previously installed prefix page. The absolute zero lowcore access works like this: a CPU calls the new primitive get_abs_lowcore() to obtain its 8KB mapping as a pointer to the struct lowcore. Virtual address references to that pointer get translated to the real addresses of the prefix page, which in turn gets swapped with the absolute zero memory addresses due to prefixing. Once the pointer is not needed it must be released with put_abs_lowcore() primitive: struct lowcore *abs_lc; unsigned long flags; abs_lc = get_abs_lowcore(&flags); abs_lc->... = ...; put_abs_lowcore(abs_lc, flags); To ensure the described mechanism works large segment- and region- table entries must be avoided for the 8KB mappings. Failure to do so results in usage of Region-Frame Absolute Address (RFAA) or Segment-Frame Absolute Address (SFAA) large page fields. In that case absolute addresses would be used to address the prefix page instead of the real ones and the prefixing would get bypassed. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:23 +02:00
Alexander Gordeev	2e2493c675	s390/setup: rearrange absolute lowcore initialization Make the absolute lowcore assignments immediately follow the boot CPU lowcore same member assignments. This way readability improves when reading from up to down, with no out of order mcck stack allocation in-between. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:23 +02:00
Alexander Gordeev	57ad19bcde	s390/boot: cleanup adjust_to_uv_max() function Uncouple input and output arguments by making the latter the function return value. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:23 +02:00
Alexander Gordeev	6f5c672d17	s390/smp: enforce lowcore protection on CPU restart As result of commit `915fea04f9` ("s390/smp: enable DAT before CPU restart callback is called") the low-address protection bit gets mistakenly unset in control register 0 save area of the absolute zero memory. That area is used when manual PSW restart happened to hit an offline CPU. In this case the low-address protection for that CPU will be dropped. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Fixes: `915fea04f9` ("s390/smp: enable DAT before CPU restart callback is called") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:23 +02:00
Jason Wang	fc7fab3f91	s390/tape: fix comment typo Remove duplicated `that' in a comment Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Link: https://lore.kernel.org/r/20220715053838.5005-1-wangborong@cdjrlc.com [agordeev@linux.ibm.com rephrased commit message] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:23 +02:00
Randy Dunlap	57c3ae8e44	s390/hmcdrv: fix Kconfig "its" grammar Use the possessive "its" instead of the contraction "it's" where appropriate. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: linux-s390@vger.kernel.org Link: https://lore.kernel.org/r/20220715020010.12678-1-rdunlap@infradead.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 18:05:22 +02:00
Alexander Gordeev	2f089a3846	Merge branch 'vmcore-iov_iter' into features Pull changes that finalize switching of copy_oldmem_page() callback to iov_iter interface. These changes were pulled in work.iov_iter of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2022-07-28 17:53:11 +02:00
wangjianli	dd390cba54	IB/qib: Fix repeated "in" within comments Delete the redundant word 'in'. Link: https://lore.kernel.org/r/20220724074407.18552-1-wangjianli@cdjrlc.com Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-28 12:50:05 -03:00
João Paulo Rechi Vita	339170d8d3	docs: efi-stub: Fix paths for x86 / arm stubs This fixes the paths of x86 / arm efi-stub source files. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessos.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220727140539.10021-1-jprvita@endlessos.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:41:56 -06:00
Yanteng Si	4116ff7974	Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8 Update to commit `6c757e9f55` ("docs/scheduler: fix unit error") `ddb21d27a6` ("docs/scheduler: Change unit of cpu_time and rq_time to nanoseconds") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Wu XiangCheng <bobwxc@email.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/3cb1c4c466dfa38d72a867dc6e2c833ceb69ecb7.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	ce1120076c	Docs/zh_CN: Update the translation of pci to 5.19-rc8 Update to commit `f21949c149` ("PCI/doc:Update obsolete pci_set_dma_mask() references") `05b0ebd06a` ("PCI/doc: cleanup references to the legacy PCI DMA API") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Wu XiangCheng <bobwxc@email.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/bccb98a1c6706ecfd4aaba8c27f1c64024e1c139.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	c78478e164	Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8 Update to commit `4f23bd5d09` ("PCI/doc: Convert examples to generic power management") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/24c53aabc9d942f6fe38b8d3a843329edca3d18c.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	83b41bb27b	Docs/zh_CN: Update the translation of usage to 5.19-rc8 Update to commit `b57e39a743` ("Docs/admin-guide /damon/sysfs: document 'LRU_DEPRIO' scheme action") `0bcba960b1` ("Docs/admin-guide/damon/sysfs: document 'LRU_PRIO' scheme action") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/130795ae1a4097e25e8e354870e65c3018241a8e.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	63c1d2516b	Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8 Update to commit `12379401c0` ("Documentation: dev-tools: Add a section for static analysis tools") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/b9911916446f858fe95e4d4bfeaecdab14bc0b6a.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	6a5057e9dc	Docs/zh_CN: Update the translation of sparse to 5.19-rc8 Update to commit179fd6ba3bac ("Documentation/sparse: add hints about __CHECKER__") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/2ecdf7ddc8644e9031e4e2af947b97d63ab046f0.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	507f48799a	Docs/zh_CN: Update the translation of kasan to 5.19-rc8 update to commit `3ff16d30f5` ("kasan: test: improve failure message in KUNIT_EXPECT_KASAN_FAIL()") `c9d1af2b78` ("mm/kasan: move kasan.fault to mm/kasan/report.c") `2d27e58514` ("kasan: Extend KASAN mode kernel parameter") `8479d7b5be` ("kasan: documentation updates") `c2ec0c8f68` ("kasan: update documentation") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/abbb6c5cc5a7daf0720a9cb0a8255472d4ac69b4.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00
Yanteng Si	659797dc4d	Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8 update to commit `dafcf4ed83` ("iio: hrtimer: Allow sub Hz granularity"). `c1d82dbcb0` ("docs: iio: fix example formatting"). Remove some useless spaces. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/1c0ae3e0932d813d41de666c1c65379786e79ba6.1658983157.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-07-28 09:37:25 -06:00

... 84 85 86 87 88 ...

1123688 Commits