In ns_to_kernel_old_timeval() definition, the function argument is defined
with const identifier in kernel/time/time.c, but the prototype in
include/linux/time32.h looks different.
- The function is defined in kernel/time/time.c as below:
struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec)
- The function is decalared in include/linux/time32.h as below:
extern struct __kernel_old_timeval ns_to_kernel_old_timeval(s64 nsec);
Because the variable of arithmethic types isn't modified in the calling scope,
there's no need to mark arguments as const, which was already mentioned during
review (Link[1) of the original patch.
Likewise remove the "const" keyword in both definition and declaration of
ns_to_timespec64() as requested by Arnd (Link[2]).
Fixes: a84d116916 ("y2038: Introduce struct __kernel_old_timeval")
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20220712094715.2918823-1-youngmin.nam@samsung.com
Link[1]: https://lore.kernel.org/all/20180310081123.thin6wphgk7tongy@gmail.com/
Link[2]: https://lore.kernel.org/all/CAK8P3a3nknJgEDESGdJH91jMj6R_xydFqWASd8r5BbesdvMBgA@mail.gmail.com/
(cherry picked from commit 46dae32fe6)
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I223421667b583168d4ce95ec1af2f76fef9fcf3d
Revert "ANDROID: Convert db845c to a mixed build."
Revert submission 2188970-db845c_mixed_build
Reason for revert: It breaks android-mainline
Reverted Changes:
I6cfb1ef19:ANDROID: Convert db845c to a mixed build.
I6680cb8ef:kleaf: Convert db845c to a mixed build.
I6cfb1ef19:ANDROID: Convert db845c to a mixed build.
Signed-off-by: Ulises Mendez Martinez <umendez@google.com>
Change-Id: I0b7b411d2e5cf13aff257b5827479d20092f2b3b
UFS storage devices require bRefClkFreq attribute to be set to operate
correctly at high speed mode. The necessary value is determined by what the
SoC / board supports. The standard doesn't specify a method to query the
value, so the information needs to be fed in separately.
DT information feeds into setting up the clock framework, so platforms
using DT can get the UFS reference clock frequency from the clock
framework. A special node "ref_clk" from the clock array for the UFS
controller node is used as the source for the information.
On the platforms that do not use DT (e.g. Intel), the alternative mechanism
to feed the intended reference clock frequency is necessary. Specifying the
necessary information in DSD of the UFS controller ACPI node is an
alternative mechanism proposed in this patch. Those can be accessed via
firmware property facility in the kernel and in many ways simillar to
querying properties defined in DT.
This patch introduces a small helper function to query a predetermined ACPI
supplied property of the UFS controller, and uses it to attempt retrieving
reference clock value, unless that was already done by the clock
infrastructure.
Link: https://lore.kernel.org/r/20220715210230.1.I365d113d275117dee8fd055ce4fc7e6aebd0bce9@changeid
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ca452621b8 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: Ifbc20404d72b7a1bbc49fe0652657f397418c055
Since commit 1599069a62 ("phy: core: Warn when phy_power_on is called
before phy_init"), the following warning has been reported:
phy_power_on was called before phy_init
To address this, we need to remove phy_power_on from exynos_ufs_phy_init()
and move it after phy_init. phy_power_off and phy_exit are also necessary
in exynos_ufs_remove().
Link: https://lore.kernel.org/r/20220706020255.151177-4-chanho61.park@samsung.com
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3d73b200f9 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I4bbb4c05281e7a07ae8860b8a53a3dcca1e94d9d
Checks information about tx_lanes, but is not used.
Since commit 1e1e465c6d ("scsi/ufs: qcom: Remove ufs_qcom_phy_*() calls
from host"), tx_lanes is deprecated.
As a result, ufs_qcom_link_startup_notify -> POST_CHANGE action does
nothing. If it is not going to be updated, it can be removed.
Link: https://lore.kernel.org/r/20220627235545.16943-1-cw9316.lee@samsung.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bcec04b3cc git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I5e0f2daf8506d4798f13091fd7e48976285fe023
Fix build warnings:
1.
../drivers/ufs/host/ufs-mediatek.c:1375:5: error: no previous prototype for function 'ufs_mtk_system_suspend' [-Werror,-Wmissing-prototypes]
int ufs_mtk_system_suspend(struct device *dev)
^
../drivers/ufs/host/ufs-mediatek.c:1375:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int ufs_mtk_system_suspend(struct device *dev)
^
static
2.
../drivers/ufs/host/ufs-mediatek.c:702:50: error: format specifies type 'unsigned long' but the argument has type 'int' [-Werror,-Wformat]
snprintf(vcc_name, MAX_VCC_NAME, "vcc-ufs%lu", ver);
~~~ ^~~
%d
Link: https://lore.kernel.org/r/20220623035052.18802-2-stanley.chu@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e7bf1d5006 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: Id57c619219f48dde266be7fceaece0b0085d827d
If CONFIG_PM_SLEEP is not set.
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-, will fail:
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_vreg_fix_vcc’:
drivers/ufs/host/ufs-mediatek.c:688:46: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ [-Wformat=]
snprintf(vcc_name, MAX_VCC_NAME, "vcc-opt%u", res.a1);
~^ ~~~~~~
%lu
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_system_suspend’:
drivers/ufs/host/ufs-mediatek.c:1371:8: error: implicit declaration of function ‘ufshcd_system_suspend’; did you mean ‘ufs_mtk_system_suspend’? [-Werror=implicit-function-declaration]
ret = ufshcd_system_suspend(dev);
^~~~~~~~~~~~~~~~~~~~~
ufs_mtk_system_suspend
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_system_resume’:
drivers/ufs/host/ufs-mediatek.c:1386:9: error: implicit declaration of function ‘ufshcd_system_resume’; did you mean ‘ufs_mtk_system_resume’? [-Werror=implicit-function-declaration]
return ufshcd_system_resume(dev);
^~~~~~~~~~~~~~~~~~~~
ufs_mtk_system_resume
cc1: some warnings being treated as errors
The declaration of func "ufshcd_system_suspend()" depends on
CONFIG_PM_SLEEP, so the function wrapper ufs_mtk_system_suspend() should
wrapped by CONFIG_PM_SLEEP too.
Link: https://lore.kernel.org/r/20220619115432.205504-1-renzhijie2@huawei.com
Fixes: 3fd23b8dfb ("scsi: ufs: ufs-mediatek: Fix the timing of configuring device regulators")
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f54912b228 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I6fcd2d3c4f88bea6d103a2adfb3c6e9d2d27c64d
Some MediaTek UFS platforms support different VCCQx power rails, for
example, both 1.2v and 1.8v VCCQx, in a single kernel image.
To optimize the system power consumption, provide a way to disable and
release the unused power rail during the device probing.
Link: https://lore.kernel.org/r/20220616053725.5681-12-stanley.chu@mediatek.com
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cb142b6d2f git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: If6da47c0bf8b997f894b82502f29db114be45970
Support multiple VCC sources in MediaTek UFS platforms.
Two options are provided and distinguished by specific device tree
attributes as below examples,
[Option 1: By numbering]
mediatek,ufs-vcc-by-num;
vcc-opt1-supply = <&mt6373_vbuck4_ufs>;
vcc-opt2-supply = <&mt6363_vemc>;
[Option 2: By UFS version]
mediatek,ufs-vcc-by-ver;
vcc-ufs3-supply = <&mt6373_vbuck4_ufs>;
Link: https://lore.kernel.org/r/20220616053725.5681-11-stanley.chu@mediatek.com
Signed-off-by: Alice Chao <alice.chao@mediatek.com>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ece418d029 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I1e141d3eb81e140532b8f277970cb784fe510a53
Patch "Introduce workaround for power mode change"
(commit b57c3d8b54) is not the same patch
as the patch that has been merged upstream. Make the Android code match
the upstream code. This patch modifies the code formatting but does not
change any functionality.
Bug: 239946304
Change-Id: Ifdac5baf0b63a6cff304fcb8a4634e81903532ec
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prevent that both the interrupt handler and the reset handler try to
complete a request at the same time. This patch is the result of an
analysis of the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000120
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 5.10.107-android13-4-00051-g1e48e8970cca-ab8664745 #1
pc : ufshcd_release_scsi_cmd+0x30/0x46c
lr : __ufshcd_transfer_req_compl+0x4fc/0x9c0
Call trace:
ufshcd_release_scsi_cmd+0x30/0x46c
__ufshcd_transfer_req_compl+0x4fc/0x9c0
ufshcd_poll+0xf0/0x208
ufshcd_sl_intr+0xb8/0xf0
ufshcd_intr+0x168/0x2f4
__handle_irq_event_percpu+0xa0/0x30c
handle_irq_event+0x84/0x178
handle_fasteoi_irq+0x150/0x2e8
__handle_domain_irq+0x114/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
cpuidle_enter_state+0x3ac/0x8c4
do_idle+0x2fc/0x55c
cpu_startup_entry+0x84/0x90
kernel_init+0x0/0x310
start_kernel+0x0/0x608
start_kernel+0x4ec/0x608
Link: https://lore.kernel.org/r/20220613214442.212466-4-bvanassche@acm.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2acd76e7b8 git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next)
Bug: 239946304
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Change-Id: I40f87272d6a8360aadf6e811ba204772ffa661bd
If CONFIG_DMA_DECLARE_COHERENT is not set,
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- will be failed, like this:
drivers/remoteproc/remoteproc_core.c: In function ‘rproc_rvdev_release’:
./include/linux/dma-map-ops.h:182:42: error: statement with no effect [-Werror=unused-value]
#define dma_release_coherent_memory(dev) (0)
^
drivers/remoteproc/remoteproc_core.c:464:2: note: in expansion of macro ‘dma_release_coherent_memory’
dma_release_coherent_memory(dev);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The return type of function dma_release_coherent_memory in CONFIG_DMA_DECLARE_COHERENT area is void, so in !CONFIG_DMA_DECLARE_COHERENT area it should neither return any value nor be defined as zero.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e61c451476 ("dma-mapping: Add dma_release_coherent_memory to DMA API")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220630123528.251181-1-renzhijie2@huawei.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
(cherry picked from commit 50d6281ce9)
Fixes: 2f36eb1b6a ("FROMLIST: dma-mapping: Add dma_release_coherent_memory to DMA API")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic213836e456bb2fbb20a5ba6d69feb63d6a80101
emulation_proc_handler() changes table->data for proc_dointvec_minmax
and can generate the following Oops if called concurrently with itself:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
| Internal error: Oops: 96000006 [#1] SMP
| Call trace:
| update_insn_emulation_mode+0xc0/0x148
| emulation_proc_handler+0x64/0xb8
| proc_sys_call_handler+0x9c/0xf8
| proc_sys_write+0x18/0x20
| __vfs_write+0x20/0x48
| vfs_write+0xe4/0x1d0
| ksys_write+0x70/0xf8
| __arm64_sys_write+0x20/0x28
| el0_svc_common.constprop.0+0x7c/0x1c0
| el0_svc_handler+0x2c/0xa0
| el0_svc+0x8/0x200
To fix this issue, keep the table->data as &insn->current_mode and
use container_of() to retrieve the insn pointer. Another mutex is
used to protect against the current_mode update but not for retrieving
insn_emulation as table->data is no longer changing.
Bug: 237540956
Co-developed-by: hewenliang <hewenliang4@huawei.com>
Signed-off-by: hewenliang <hewenliang4@huawei.com>
Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220128090324.2727688-1-hewenliang4@huawei.com
Link: https://lore.kernel.org/r/9A004C03-250B-46C5-BF39-782D7551B00E@tencent.com
Signed-off-by: Will Deacon <will@kernel.org>
[Lee: Added Fixes: tag]
(cherry picked from commit af483947d4
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core)
Fixes: 587064b610 ("arm64: Add framework for legacy instruction emulation")
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If9b96bb79c79903f9d8292e719b06fdef57ef1c5
This driver creates per-cpu hrtimers which are required to do the
periodic 'pet' operation. On a conventional watchdog-core driver, the
userspace is responsible for delivering the 'pet' events by writing to
the particular /dev/watchdogN node. In this case we require a strong
thread affinity to be able to account for lost time on a per vCPU.
This part of the driver is the 'frontend' which is reponsible for
delivering the periodic 'pet' events, configuring the virtual peripheral
and listening for cpu hotplug events. The other part of the driver is
an emulated MMIO device which is part of the KVM virtual machine
monitor and this part accounts for lost time by looking at the
/proc/{}/task/{}/stat entries.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Will Deacon <will@kernel.org>
Bug: 213422094
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220711081720.2870509-3-sebastianene@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I979871fa3487d8320cef628675f164660e30ed8b
(cherry picked from commit 6c93c6f3ba)
[seb: Resolved minor conflict in drivers/misc/Makefile ]
The VCPU stall detection mechanism allows to configure the expiration
duration and the internal counter clock frequency measured in Hz.
Add these properties in the schema.
While this is a memory mapped virtual device, it is expected to be loaded
when the DT contains the compatible: "qemu,vcpu-stall-detector" node.
In a protected VM we trust the generated DT nodes and we don't rely on
the host to present the hardware peripherals.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220711081720.2870509-2-sebastianene@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 952ae488b9)
Change-Id: I495de803e7c8443e378273d1f52365a00b8fc6f2
Changes in 5.15.60
x86/speculation: Make all RETbleed mitigations 64-bit only
selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
selftests/bpf: Check dst_port only on the client socket
block: fix default IO priority handling again
tools/vm/slabinfo: Handle files in debugfs
ACPI: video: Force backlight native for some TongFang devices
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
ACPI: APEI: Better fix to avoid spamming the console with old error logs
crypto: arm64/poly1305 - fix a read out-of-bound
KVM: x86: do not report a vCPU as preempted outside instruction boundaries
KVM: x86: do not set st->preempted when going back to user space
KVM: selftests: Make hyperv_clock selftest more stable
tools/kvm_stat: fix display of error when multiple processes are found
selftests: KVM: Handle compiler optimizations in ucall
KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
arm64: set UXN on swapper page tables
btrfs: zoned: prevent allocation from previous data relocation BG
btrfs: zoned: fix critical section of relocation inode writeback
Bluetooth: hci_bcm: Add BCM4349B1 variant
Bluetooth: hci_bcm: Add DT compatible for CYW55572
dt-bindings: bluetooth: broadcom: Add BCM4349B1 DT binding
Bluetooth: btusb: Add support of IMC Networks PID 0x3568
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x04CA:0x4007
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x04C5:0x1675
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x0CB8:0xC558
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x13D3:0x3587
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x13D3:0x3586
macintosh/adb: fix oob read in do_adb_query() function
x86/speculation: Add RSB VM Exit protections
x86/speculation: Add LFENCE to RSB fill sequence
Linux 5.15.60
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I297301e2121a2fdda063cada7377da1ae414da2a
commit ba6e31af2b upstream.
RSB fill sequence does not have any protection for miss-prediction of
conditional branch at the end of the sequence. CPU can speculatively
execute code immediately after the sequence, while RSB filling hasn't
completed yet.
#define __FILL_RETURN_BUFFER(reg, nr, sp) \
mov $(nr/2), reg; \
771: \
ANNOTATE_INTRA_FUNCTION_CALL; \
call 772f; \
773: /* speculation trap */ \
UNWIND_HINT_EMPTY; \
pause; \
lfence; \
jmp 773b; \
772: \
ANNOTATE_INTRA_FUNCTION_CALL; \
call 774f; \
775: /* speculation trap */ \
UNWIND_HINT_EMPTY; \
pause; \
lfence; \
jmp 775b; \
774: \
add $(BITS_PER_LONG/8) * 2, sp; \
dec reg; \
jnz 771b; <----- CPU can miss-predict here.
Before RSB is filled, RETs that come in program order after this macro
can be executed speculatively, making them vulnerable to RSB-based
attacks.
Mitigate it by adding an LFENCE after the conditional branch to prevent
speculation while RSB is being filled.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b12993220 upstream.
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.
== Background ==
Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.
To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.
== Problem ==
Here's a simplification of how guests are run on Linux' KVM:
void run_kvm_guest(void)
{
// Prepare to run guest
VMRESUME();
// Clean up after guest runs
}
The execution flow for that would look something like this to the
processor:
1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()
Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:
* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.
* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".
IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.
However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.
Balanced CALL/RET instruction pairs such as in step #5 are not affected.
== Solution ==
The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.
However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.
Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.
In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.
There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
[ bp: Massage, incorporate review comments from Andy Cooper. ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>