enable CONFIG_BLK_INLINE_ENCRYPTION and
CONFIG_FS_ENCRYPTION_INLINE_CRYPT
Bug: 137270441
Test: Test cuttlefish boots both with and without inlinecrypt mount
option specified in fstab, while using both F2FS and EXT4 for
userdata.img. Also tested by running gce-xfstests on both the
auto and encrypt test groups on EXT4 and F2FS both with and
without the inlinecrypt mount option. The UFS changes were
tested on a Pixel 4 device.
Change-Id: I26aac0ac7845a9064f28bb1421eb2522828a6dec
Signed-off-by: Satya Tangirala <satyat@google.com>
Wire up ext4 to support inline encryption via the helper functions which
fs/crypto/ now provides. This includes:
- Adding a mount option 'inlinecrypt' which enables inline encryption
on encrypted files where it can be used.
- Setting the bio_crypt_ctx on bios that will be submitted to an
inline-encrypted file.
Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for
this part, since ext4 sometimes uses ll_rw_block() on file data.
- Not adding logically discontiguous data to bios that will be submitted
to an inline-encrypted file.
- Not doing filesystem-layer crypto on inline-encrypted files.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I54a8efe388289918f4144d8138fb87aa507ae760
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214781/
Wire up f2fs to support inline encryption via the helper functions which
fs/crypto/ now provides. This includes:
- Adding a mount option 'inlinecrypt' which enables inline encryption
on encrypted files where it can be used.
- Setting the bio_crypt_ctx on bios that will be submitted to an
inline-encrypted file.
- Not adding logically discontiguous data to bios that will be submitted
to an inline-encrypted file.
- Not doing filesystem-layer crypto on inline-encrypted files.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I50aee94acab3cf0922bb023bcc0d450744781812
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214785/
Add support for inline encryption to fs/crypto/. With "inline
encryption", the block layer handles the decryption/encryption as part
of the bio, instead of the filesystem doing the crypto itself via
Linux's crypto API. This model is needed in order to take advantage of
the inline encryption hardware present on most modern mobile SoCs.
To use inline encryption, the filesystem needs to be mounted with
'-o inlinecrypt'. The contents of any AES-256-XTS encrypted files will
then be encrypted using blk-crypto, instead of using the traditional
filesystem-layer crypto. fscrypt still provides the key and IV to use,
and the actual ciphertext on-disk is still the same; therefore it's
testable using the existing fscrypt ciphertext verification tests.
Note that since blk-crypto has a fallback to Linux's crypto API, this
feature is usable and testable even without actual inline encryption
hardware.
Per-filesystem changes will be needed to set encryption contexts when
submitting bios and to implement the 'inlinecrypt' mount option. This
patch just adds the common code.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I238b5484f3798dd4d829be5535234b53951db0ea
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214761/
Wire up ufshcd.c with the UFS Crypto API, the block layer inline
encryption additions and the keyslot manager.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I274282d9209932156ce806c3d656470bd040f5b3
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214753/
Introduce functions to manipulate UFS inline encryption hardware
in line with the JEDEC UFSHCI v2.1 specification and to work with the
block keyslot manager.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I4c6ba30f30eaea83da6822381ae2aa85b40e7b90
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214745/
Add the crypto registers and structs defined in v2.1 of the JEDEC UFSHCI
specification in preparation to add support for inline encryption to
UFS.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I640812448f9b7f25dc5b4927f143c89b02529edb
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214737/
We introduce blk-crypto, which manages programming keyslots for struct
bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
the encryption key, algorithm and data_unit_num; they don't have to worry
about getting a keyslot for each encryption context, as blk-crypto handles
that. Blk-crypto also makes it possible for layered devices like device
mapper to make use of inline encryption hardware.
Blk-crypto delegates crypto operations to inline encryption hardware when
available, and also contains a software fallback to the kernel crypto API.
For more details, refer to Documentation/block/inline-encryption.rst.
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I6a98e518e5de50f1d4110441568ecd142a02e900
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214731/
bio_crypt_should_process would WARN that the bio did not have a
keyslot in any keyslot manager even when we were on the decrypt path
of blk-crypto, which is a bug. The WARN is now conditional on the
caller being responsible for handling encryption rather than blk-crypto
(i.e. the WARN happens only if this function returns true).
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: Id7ef6b066d43bebae146b28edc76e506c7b03235
Signed-off-by: Satya Tangirala <satyat@google.com>
We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the filesystem/fscrypt that knows about and manages encryption
contexts. As such, when the filesystem layer submits a bio to the block
layer, and this bio eventually reaches a device driver with support for
inline encryption, the device driver will need to have been told the
encryption context for that bio.
We want to communicate the encryption context from the filesystem layer
to the storage device along with the bio, when the bio is submitted to the
block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which
can represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.
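
As a rough illustration of what "merging logic aware of the
bio_crypt_ctx" can mean, the userspace sketch below checks whether two
I/Os may be merged. The struct and function names are hypothetical and
are not the kernel's; it assumes that a merge is allowed only when both
I/Os use the same key and the second starts at the data unit number
immediately following the first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the encryption context attached to a bio. */
struct crypt_ctx {
	const void *key;	/* key identity; compared by pointer here */
	uint64_t dun;		/* data unit number of the first data unit */
};

/*
 * Return true if an I/O carrying `next` may be appended to an I/O
 * carrying `prev` that covers `prev_units` data units.
 */
bool crypt_ctx_mergeable(const struct crypt_ctx *prev, uint64_t prev_units,
			 const struct crypt_ctx *next)
{
	/* Two unencrypted I/Os are always compatible as far as crypto goes. */
	if (prev == NULL && next == NULL)
		return true;
	/* Never mix encrypted and unencrypted data in one request. */
	if (prev == NULL || next == NULL)
		return false;
	/* A single request can only use one key. */
	if (prev->key != next->key)
		return false;
	/* The data unit numbers must continue without a gap. */
	return prev->dun + prev_units == next->dun;
}
```

With this model, an 8-unit I/O starting at DUN 100 can be followed by
one starting at DUN 108 under the same key, but not by one starting at
DUN 109 or by one using a different key.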
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I16d99bb97f8cd7971cc11281a0d7120c5f87d83c
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214719/
Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size, etc.)
along with a data transfer request to a storage device, and the inline
encryption hardware will use that context to en/decrypt the data. The
inline encryption hardware is part of the storage device, and it
conceptually sits on the data path between system memory and the storage
device.
Inline Encryption hardware implementations often function around the
concept of "keyslots". These implementations often have a limited number
of "keyslots", each of which can hold an encryption context (we say that
an encryption context can be "programmed" into a keyslot). Requests made
to the storage device may have a keyslot associated with them, and the
inline encryption hardware will en/decrypt the data in the requests using
the encryption context programmed into that associated keyslot. As
keyslots are limited, and programming keys may be expensive in many
implementations, and multiple requests may use exactly the same encryption
contexts, we introduce a Keyslot Manager to efficiently manage keyslots.
The keyslot manager also functions as the interface that upper layers will
use to program keys into inline encryption hardware. For more information
on the Keyslot Manager, refer to documentation found in
block/keyslot-manager.c and linux/keyslot-manager.h.
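
The keyslot idea can be sketched in userspace as follows. This is a
simplified model, not the code in block/keyslot-manager.c: all names
are hypothetical, a 64-bit id stands in for a full encryption context,
and it assumes least-recently-used eviction of idle slots. It shows why
requests that share a context avoid reprogramming the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 2	/* assumed to be tiny for illustration */

struct keyslot {
	uint64_t key_id;	/* stand-in for a full encryption context */
	int in_use;		/* nonzero once the slot has been programmed */
	unsigned int refcount;	/* in-flight requests using this slot */
	uint64_t last_used;	/* logical timestamp used for LRU eviction */
};

struct keyslot_mgr {
	struct keyslot slots[NUM_SLOTS];
	uint64_t clock;		/* incremented on every acquisition */
	unsigned int programs;	/* number of times a slot was programmed */
};

/* Return a slot index, or -1 if every slot is busy with another context. */
int ksm_get_slot(struct keyslot_mgr *m, uint64_t key_id)
{
	int victim = -1;

	for (int i = 0; i < NUM_SLOTS; i++) {
		struct keyslot *s = &m->slots[i];

		/* Reuse a slot that already holds this context. */
		if (s->in_use && s->key_id == key_id) {
			s->refcount++;
			s->last_used = ++m->clock;
			return i;
		}
		/* Otherwise remember the least recently used idle slot. */
		if (s->refcount == 0 &&
		    (victim < 0 || s->last_used < m->slots[victim].last_used))
			victim = i;
	}
	if (victim < 0)
		return -1;	/* a real implementation would wait instead */

	/* "Program" the context into the idle slot, evicting the old one. */
	m->slots[victim].key_id = key_id;
	m->slots[victim].in_use = 1;
	m->slots[victim].refcount = 1;
	m->slots[victim].last_used = ++m->clock;
	m->programs++;
	return victim;
}

/* Drop one reference; the slot stays programmed for later reuse. */
void ksm_put_slot(struct keyslot_mgr *m, int slot)
{
	assert(slot >= 0 && slot < NUM_SLOTS);
	assert(m->slots[slot].refcount > 0);
	m->slots[slot].refcount--;
}
```

Two requests for the same key_id share one slot and increment
`programs` only once; a third context has to wait until some slot's
refcount drops to zero.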
Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I9a2dc72d61d5a3c64af379a97dd46155b41193eb
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214713/
f2fs inode numbers are stable across filesystem resizing, and f2fs inode
and file logical block numbers are always 32-bit. So f2fs can always
support IV_INO_LBLK_64 encryption policies. Wire up the needed
fscrypt_operations to declare support.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Change-Id: Ifc5b6bf883d0a049dbccf6a06955f69a1dfc4617
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11210903/
IV_INO_LBLK_64 encryption policies have special requirements from the
filesystem beyond those of the existing encryption policies:
- Inode numbers must never change, even if the filesystem is resized.
- Inode numbers must be <= 32 bits.
- File logical block numbers must be <= 32 bits.
ext4 has 32-bit inode and file logical block numbers. However,
resize2fs can re-number inodes when shrinking an ext4 filesystem.
Typically, though, the people who would want to use this format don't
care about filesystem shrinking. They'd be fine with a solution that
just prevents the filesystem from being shrunk.
Therefore, add a new feature flag EXT4_FEATURE_COMPAT_STABLE_INODES that
will do exactly that. Then wire up the fscrypt_operations to expose
this flag to fs/crypto/, so that it allows IV_INO_LBLK_64 policies when
this flag is set.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Change-Id: Ia1c456692b0b91eda63e55b050ec16e8c53b499f
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11210907/
Inline encryption hardware compliant with the UFS v2.1 standard or with
the upcoming version of the eMMC standard has the following properties:
(1) Per I/O request, the encryption key is specified by a previously
loaded keyslot. There might be only a small number of keyslots.
(2) Per I/O request, the starting IV is specified by a 64-bit "data unit
number" (DUN). IV bits 64-127 are assumed to be 0. The hardware
automatically increments the DUN for each "data unit" of
configurable size in the request, e.g. for each filesystem block.
Property (1) makes it inefficient to use the traditional fscrypt
per-file keys. Property (2) precludes the use of the existing
DIRECT_KEY fscrypt policy flag, which needs at least 192 IV bits.
Therefore, add a new fscrypt policy flag IV_INO_LBLK_64 which causes the
encryption to be modified as follows:
- The encryption keys are derived from the master key, encryption mode
number, and filesystem UUID.
- The IVs are chosen as (inode_number << 32) | file_logical_block_num.
For filename encryption, file_logical_block_num is 0.
Since the file nonces aren't used in the key derivation, many files may
share the same encryption key. This is much more efficient on the
target hardware. Including the inode number in the IVs and mixing the
filesystem UUID into the keys ensures that data in different files is
nevertheless still encrypted differently.
Additionally, limiting the inode and block numbers to 32 bits and
placing the block number in the low bits maintains compatibility with
the 64-bit DUN convention (property (2) above).
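
The IV layout described above can be sketched as a small helper. The
function name is hypothetical and this is not the fs/crypto/
implementation; it only illustrates the formula
(inode_number << 32) | file_logical_block_num and the 32-bit limits on
both inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the 64-bit IV/DUN for one block: inode number in the high
 * 32 bits, file logical block number in the low 32 bits.
 * Returns false if either input does not fit in 32 bits.
 */
bool ino_lblk_64_dun(uint64_t ino, uint64_t lblk, uint64_t *dun)
{
	if (ino > UINT32_MAX || lblk > UINT32_MAX)
		return false;
	*dun = (ino << 32) | lblk;
	return true;
}
```

Because the block number occupies the low bits, consecutive blocks of
one file get consecutive values (inode 5, blocks 7 and 8 give
0x0000000500000007 and 0x0000000500000008), which is what hardware that
increments the DUN per data unit expects.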
Since this scheme assumes that inode numbers are stable (which may
preclude filesystem shrinking) and that inode and file logical block
numbers are at most 32-bit, IV_INO_LBLK_64 will only be allowed on
filesystems that meet these constraints. These are acceptable
limitations for the cases where this format would actually be used.
Note that IV_INO_LBLK_64 is an on-disk format, not an implementation.
This patch just adds support for it using the existing filesystem layer
encryption. A later patch will add support for inline encryption.
Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Change-Id: Iedecd7fa1ce8eefffdec57257e27e679938b0ad7
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11210909/
memset the struct fscrypt_info to zero before freeing. This isn't
really needed currently, since there's no secret key directly in the
fscrypt_info. But there's a decent chance that someone will add such a
field in the future, e.g. in order to use an API that takes a raw key
such as siphash(). So it's good to do this as a hardening measure.
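
A userspace sketch of the same hardening pattern is shown below. The
struct is a hypothetical stand-in, not the kernel's struct
fscrypt_info, and the wipe helper is one common way (assumed here) to
keep a compiler from dropping a memset that immediately precedes a
free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-file crypto state; not the kernel's fscrypt_info. */
struct file_crypt_info {
	unsigned char mode;
	unsigned char nonce[16];
};

/* Zero a buffer through a volatile pointer so the stores are kept. */
void wipe(void *p, size_t len)
{
	volatile unsigned char *b = p;

	while (len--)
		*b++ = 0;
}

/* Zero the whole structure before returning it to the allocator. */
void file_crypt_info_free(struct file_crypt_info *ci)
{
	if (ci == NULL)
		return;
	wipe(ci, sizeof(*ci));
	free(ci);
}
```

Wiping the whole structure rather than individual fields means that a
secret field added later is covered without touching the free path.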
Change-Id: I1942c6e977f4373c49915d164afddd589cd869c7
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11182405/
Now that ext4 and f2fs implement their own post-read workflow that
supports both fscrypt and fsverity, the fscrypt-only workflow based
around struct fscrypt_ctx is no longer used. So remove the unused code.
This is based on a patch from Chandan Rajendra's "Consolidate FS read
I/O callbacks code" patchset, but rebased onto the latest kernel, folded
__fscrypt_decrypt_bio() into fscrypt_decrypt_bio(), cleaned up
fscrypt_initialize(), and updated the commit message.
Change-Id: I21d126db69eea53c3e6dcec8710fa06ae35f980d
Originally-from: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11182387/
Instead of open-coding the calculations for ESSIV handling, use an ESSIV
skcipher which does all of this under the hood. ESSIV was added to the
crypto API in v5.4.
This is based on a patch from Ard Biesheuvel, but reworked to apply
after all the fscrypt changes that went into v5.4.
Tested with 'kvm-xfstests -c ext4,f2fs -g encrypt', including the
ciphertext verification tests for v1 and v2 encryption policies.
Change-Id: Id0e3cc38fcd9a25a4d55cf19c1b87e5798bf7d90
Originally-from: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11182383/
Causes CONFIG_RELR to be enabled, resulting in a gki_defconfig image size
decrease of 2.2MB/10.4% uncompressed or 170KB/2.0% compressed.
Bug: 137200966
Change-Id: I85d36e346ca54bfc50aaca6804684b9bf16c47f0
Signed-off-by: Peter Collingbourne <pcc@google.com>
When GCC_PLUGIN_STRUCTLEAK was backported, a prompt text mysteriously
made its way into the Kconfig option. Because this option is not
dependent on GCC_PLUGINS, it could become enabled even when building
with "CC=clang allmodconfig", which is not correct. The option is
correctly selected by GCC_PLUGIN_STRUCTLEAK_BYREF_ALL so this prompt
text seems to be unnecessary.
This change also aligns the help text to match upstream, to match the
version that was claimed to have been backported.
Fixes: e0c6791d04 ("BACKPORT: security: Create "kernel hardening" config area")
Bug: 143965122
Test: make CC=clang allmodconfig && make -j
Change-Id: Ia9dc88ec1bbfd3950eda5a3eb698ecd41c7e0c9a
Signed-off-by: Alistair Delva <adelva@google.com>
Changes in 4.19.84
bonding: fix state transition issue in link monitoring
CDC-NCM: handle incomplete transfer of MTU
ipv4: Fix table id reference in fib_sync_down_addr
net: ethernet: octeon_mgmt: Account for second possible VLAN header
net: fix data-race in neigh_event_send()
net: qualcomm: rmnet: Fix potential UAF when unregistering
net: usb: qmi_wwan: add support for DW5821e with eSIM support
NFC: fdp: fix incorrect free object
nfc: netlink: fix double device reference drop
NFC: st21nfca: fix double free
qede: fix NULL pointer deref in __qede_remove()
net: mscc: ocelot: don't handle netdev events for other netdevs
net: mscc: ocelot: fix NULL pointer on LAG slave removal
ipv6: fixes rt6_probe() and fib6_nh->last_probe init
net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
ALSA: timer: Fix incorrectly assigned timer instance
ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series
ALSA: hda/ca0132 - Fix possible workqueue stall
mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
mm, meminit: recalculate pcpu batch and high limits after init completes
mm: thp: handle page cache THP correctly in PageTransCompoundMap
mm, vmstat: hide /proc/pagetypeinfo from normal users
dump_stack: avoid the livelock of the dump_lock
tools: gpio: Use !building_out_of_srctree to determine srctree
perf tools: Fix time sorting
drm/radeon: fix si_enable_smc_cac() failed issue
HID: wacom: generic: Treat serial number and related fields as unsigned
soundwire: depend on ACPI
soundwire: bus: set initial value to port_status
arm64: Do not mask out PTE_RDONLY in pte_same()
ceph: fix use-after-free in __ceph_remove_cap()
ceph: add missing check in d_revalidate snapdir handling
iio: adc: stm32-adc: fix stopping dma
iio: imu: adis16480: make sure provided frequency is positive
iio: srf04: fix wrong limitation in distance measuring
ARM: sunxi: Fix CPU powerdown on A83T
netfilter: nf_tables: Align nft_expr private data to 64-bit
netfilter: ipset: Fix an error code in ip_set_sockfn_get()
intel_th: pci: Add Comet Lake PCH support
intel_th: pci: Add Jasper Lake PCH support
x86/apic/32: Avoid bogus LDR warnings
SMB3: Fix persistent handles reconnect
can: usb_8dev: fix use-after-free on disconnect
can: flexcan: disable completely the ECC mechanism
can: c_can: c_can_poll(): only read status register after status IRQ
can: peak_usb: fix a potential out-of-sync while decoding packets
can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak
can: gs_usb: gs_can_open(): prevent memory leak
can: dev: add missing of_node_put() after calling of_get_child_by_name()
can: mcba_usb: fix use-after-free on disconnect
can: peak_usb: fix slab info leak
configfs: stash the data we need into configfs_buffer at open time
configfs_register_group() shouldn't be (and isn't) called in rmdirable parts
configfs: new object reprsenting tree fragments
configfs: provide exclusion between IO and removals
configfs: fix a deadlock in configfs_symlink()
ALSA: usb-audio: More validations of descriptor units
ALSA: usb-audio: Simplify parse_audio_unit()
ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects
ALSA: usb-audio: Remove superfluous bLength checks
ALSA: usb-audio: Clean up check_input_term()
ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk()
ALSA: usb-audio: remove some dead code
ALSA: usb-audio: Fix copy&paste error in the validator
sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
sched/fair: Fix -Wunused-but-set-variable warnings
usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path
usbip: Implement SG support to vhci-hcd and stub driver
PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30
HID: google: add magnemite/masterball USB ids
dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
dmaengine: sprd: Fix the possible memory leak issue
HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring()
RDMA/mlx5: Clear old rate limit when closing QP
iw_cxgb4: fix ECN check on the passive accept
RDMA/qedr: Fix reported firmware version
net/mlx5e: TX, Fix consumer index of error cqe dump
net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq
scsi: qla2xxx: fixup incorrect usage of host_byte
RDMA/uverbs: Prevent potential underflow
net: openvswitch: free vport unless register_netdevice() succeeds
scsi: lpfc: Honor module parameter lpfc_use_adisc
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
netfilter: nf_flow_table: set timeout before insertion into hashes
ipvs: don't ignore errors in case refcounting ip_vs module fails
ipvs: move old_secure_tcp into struct netns_ipvs
bonding: fix unexpected IFF_BONDING bit unset
macsec: fix refcnt leak in module exit routine
usb: fsl: Check memory resource before releasing it
usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode.
usb: gadget: composite: Fix possible double free memory bug
usb: dwc3: pci: prevent memory leak in dwc3_pci_probe
usb: gadget: configfs: fix concurrent issue between composite APIs
usb: dwc3: remove the call trace of USBx_GFLADJ
perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
perf/x86/uncore: Fix event group support
USB: Skip endpoints with 0 maxpacket length
USB: ldusb: use unsigned size format specifiers
usbip: tools: Fix read_usb_vudc_device() error path handling
RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
RDMA/hns: Prevent memory leaks of eq->buf_list
scsi: qla2xxx: stop timer in shutdown path
nvme-multipath: fix possible io hang after ctrl reconnect
fjes: Handle workqueue allocation failure
net: hisilicon: Fix "Trying to free already-free IRQ"
net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up
net: mscc: ocelot: refuse to overwrite the port's native vlan
iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41
drm/amdgpu: If amdgpu_ib_schedule fails return back the error.
drm/amd/display: Passive DP->HDMI dongle detection fix
hv_netvsc: Fix error handling in netvsc_attach()
usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
NFSv4: Don't allow a cached open with a revoked delegation
net: ethernet: arc: add the missed clk_disable_unprepare
igb: Fix constant media auto sense switching when no cable is connected
e1000: fix memory leaks
pinctrl: intel: Avoid potential glitches if pin is in GPIO mode
ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()
pinctrl: cherryview: Fix irq_valid_mask calculation
blkcg: make blkcg_print_stat() print stats only for online blkgs
iio: imu: mpu6050: Add support for the ICM 20602 IMU
iio: imu: inv_mpu6050: fix no data on MPU6050
mm/filemap.c: don't initiate writeback if mapping has no dirty pages
cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
usbip: Fix free of unallocated memory in vhci tx
netfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets
net: prevent load/store tearing on sk->sk_stamp
iio: imu: mpu6050: Fix FIFO layout for ICM20602
vsock/virtio: fix sock refcnt holding during the shutdown
drm/i915: Rename gen7 cmdparser tables
drm/i915: Disable Secure Batches for gen6+
drm/i915: Remove Master tables from cmdparser
drm/i915: Add support for mandatory cmdparsing
drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
drm/i915: Allow parsing of unsized batches
drm/i915: Add gen9 BCS cmdparsing
drm/i915/cmdparser: Use explicit goto for error paths
drm/i915/cmdparser: Add support for backward jumps
drm/i915/cmdparser: Ignore Length operands during command matching
drm/i915: Lower RM timeout to avoid DSI hard hangs
drm/i915/gen8+: Add RC6 CTX corruption WA
drm/i915/cmdparser: Fix jump whitelist clearing
KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
x86/msr: Add the IA32_TSX_CTRL MSR
x86/cpu: Add a helper function x86_read_arch_cap_msr()
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
x86/speculation/taa: Add mitigation for TSX Async Abort
x86/speculation/taa: Add sysfs reporting for TSX Async Abort
kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
x86/tsx: Add "auto" option to the tsx= cmdline parameter
x86/speculation/taa: Add documentation for TSX Async Abort
x86/tsx: Add config options to set tsx=on|off|auto
x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
x86/bugs: Add ITLB_MULTIHIT bug infrastructure
x86/cpu: Add Tremont to the cpu vulnerability whitelist
cpu/speculation: Uninline and export CPU mitigations helpers
Documentation: Add ITLB_MULTIHIT documentation
kvm: x86, powerpc: do not allow clearing largepages debugfs entry
kvm: Convert kvm_lock to a mutex
kvm: mmu: Do not release the page inside mmu_set_spte()
KVM: x86: make FNAME(fetch) and __direct_map more similar
KVM: x86: remove now unneeded hugepage gfn adjustment
KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
kvm: mmu: ITLB_MULTIHIT mitigation
kvm: Add helper function for creating VM worker threads
kvm: x86: mmu: Recovery of shattered NX large pages
Linux 4.19.84
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7a820f00c4b868ed677bb49613f835b7e67a3a06
This reverts commit 87337fb791.
The patch I sent upstream to add iommu support is nicer than this and
also adds mboxes and io-channels support. So just revert this and pull
in the upstream patches to avoid conflicts and pull in support for
mboxes and io-channels.
Change-Id: I98ef50eb5cff310a5717d0fb78eceb04ff2510ec
Signed-off-by: Saravana Kannan <saravanak@google.com>
commit 1aa9b9572b upstream.
The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
no longer being used for execution. This removes the performance penalty
for walking deeper EPT page tables.
By default, one large page will last about one hour once the guest
reaches a steady state.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c57c80467f upstream.
Add a function to create a kernel thread associated with a given VM. In
particular, it ensures that the worker thread inherits the priority and
cgroups of the calling thread.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8e8c8303f upstream.
With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.
Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.
Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.
This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen. With nested EPT, again the nested guest can cause problems, so shadow
and direct EPT are treated in the same way.
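
The policy can be modelled with a tiny state machine. This is a
conceptual sketch with hypothetical names, not the KVM MMU code: it
only captures that huge mappings are installed non-executable when the
knob is on, and that an instruction fetch from such a mapping splits it
into 4K mappings that are executable.

```c
#include <stdbool.h>

enum map_size { MAP_4K, MAP_2M };

struct mapping {
	enum map_size size;
	bool exec;	/* false means the mapping is marked NX */
};

/* Install a mapping; with the knob on, huge mappings are never executable. */
struct mapping map_install(bool want_huge, bool nx_huge_pages)
{
	struct mapping m;

	m.size = want_huge ? MAP_2M : MAP_4K;
	m.exec = !(want_huge && nx_huge_pages);
	return m;
}

/*
 * Handle an instruction fetch that faulted on `m`.
 * Returns true if the huge mapping had to be broken down.
 */
bool exec_fault(struct mapping *m)
{
	if (m->size == MAP_2M && !m->exec) {
		m->size = MAP_4K;	/* break down into 4K pages */
		m->exec = true;		/* which are then marked executable */
		return true;
	}
	return false;
}
```

In this model the same address is never mapped executable at both
sizes, which is the condition the mitigation is meant to avoid.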
[ tglx: Fixup default to auto and massage wording a bit ]
Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9167ab7993 upstream.
VMX already does so if the host has SMEP, in order to support the combination of
CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in
fact VMX also ends up running with EFER.NXE=1 on old processors that lack the
"load EFER" controls, because it may help avoid a slow MSR write.
SVM does not have similar code, but it should since recent AMD processors do
support SMEP. So this patch makes the code for the two vendors simpler and
more similar, while fixing an issue with CR0.WP=1 and CR4.SMEP=1 on AMD.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9f2a760b1 upstream.
Note that in such a case it is quite likely that KVM will BUG_ON
in __pte_list_remove when the VM is closed. However, there is no
immediate risk of memory corruption in the host so a WARN_ON is
enough and it lets you gather traces for debugging.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d679b32611 upstream.
After the previous patch, the low bits of the gfn are masked in
both FNAME(fetch) and __direct_map, so we do not need to clear them
in transparent_hugepage_adjust.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fcf2d1bde upstream.
These two functions are basically doing the same thing through
kvm_mmu_get_page, link_shadow_page and mmu_set_spte; yet, for historical
reasons, their code looks very different. This patch tries to take the
best of each and make them very similar, so that it is easy to understand
changes that apply to both of them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43fdcda96e upstream.
Release the page at the call-site where it was originally acquired.
This makes the exit code cleaner for most call sites, since they
do not need to duplicate code between success and the failure
label.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d9ce162cf upstream.
It doesn't seem as if there is any particular need for kvm_lock to be a
spinlock, so convert the lock to a mutex so that sleepable functions (in
particular cond_resched()) can be called while holding it.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 833b45de69 upstream.
The largepages debugfs entry is incremented/decremented as shadow
pages are created or destroyed. Clearing it will result in an
underflow, which is harmless to KVM but ugly (and could be
misinterpreted by tools that use debugfs information), so make
this particular statistic read-only.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm-ppc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 731dc9df97 upstream.
A kernel module may need to check the value of the "mitigations=" kernel
command line parameter as part of its setup when the module needs
to perform software mitigations for a CPU flaw.
Uninline and export the helper functions surrounding the cpu_mitigations
enum to allow for their usage from a module.
Lastly, privatize the enum and cpu_mitigations variable since the value of
cpu_mitigations can be checked with the exported helper functions.
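
A rough sketch of the resulting layout, written as ordinary C so it can
be compiled outside the kernel. The helper and enumerator names follow
my recollection of kernel/cpu.c and should be treated as assumptions;
the setter at the end is purely hypothetical and exists only so the
sketch can be exercised.

```c
#include <stdbool.h>

/* Private to this translation unit, as the commit describes. */
enum cpu_mitigations {
	CPU_MITIGATIONS_OFF,
	CPU_MITIGATIONS_AUTO,
	CPU_MITIGATIONS_AUTO_NOSMT,
};

static enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;

/*
 * Out-of-line helpers. In the kernel these would additionally be
 * exported so that modules can call them.
 */
bool cpu_mitigations_off(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_OFF;
}

bool cpu_mitigations_auto_nosmt(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
}

/* Hypothetical hook; the kernel derives the value from "mitigations=". */
void sketch_set_mitigations_off(void)
{
	cpu_mitigations = CPU_MITIGATIONS_OFF;
}
```

Callers only ever see the two boolean helpers, so the enum can change
without touching any module that uses them.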
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cad14885a8 upstream.
Add the new CPU family ATOM_TREMONT_D to the CPU vulnerability
whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT.
ATOM_TREMONT_D might have mitigations against other issues as well, but
only the ITLB multihit mitigation is confirmed at this point.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db4d30fbb7 upstream.
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:
https://bugzilla.kernel.org/show_bug.cgi?id=205195
There are other processors affected for which the erratum does not fully
disclose the impact.
This issue affects both bare-metal x86 page tables and EPT.
It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.
Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.
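
As an illustration, testing that bit on a raw IA32_ARCH_CAPABILITIES
value could look like the sketch below. The bit position (6) is an
assumption that should be checked against msr-index.h, and the helper
name is hypothetical. The sketch also ignores any per-model whitelist.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position; verify before relying on it. */
#define ARCH_CAP_PSCHANGE_MC_NO	(1ULL << 6)

/*
 * Hypothetical helper: returns true if, judging only by the raw MSR
 * value, the CPU may still hit the iTLB multihit machine check.
 */
static bool itlb_multihit_possible(uint64_t arch_cap)
{
	return !(arch_cap & ARCH_CAP_PSCHANGE_MC_NO);
}
```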
Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 012206a822 upstream.
For new IBRS_ALL CPUs, the Enhanced IBRS check at the beginning of
cpu_bugs_smt_update() causes the function to return early, unintentionally
skipping the MDS and TAA logic.
This is not a problem for MDS, because there appears to be no overlap
between IBRS_ALL and MDS-affected CPUs. So the MDS mitigation would be
disabled and nothing would need to be done in this function anyway.
But for TAA, the TAA_MSG_SMT string will never get printed on Cascade
Lake and newer.
The check is superfluous anyway: when 'spectre_v2_enabled' is
SPECTRE_V2_IBRS_ENHANCED, 'spectre_v2_user' is always
SPECTRE_V2_USER_NONE, and so the 'spectre_v2_user' switch statement
handles it appropriately by doing nothing. So just remove the check.
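
The effect of the early return can be modelled in a few lines. This is
a simplified model of the control flow, not the kernel function; the
parameter names are hypothetical, and "taa_verw_mitigation" stands for
the assumption that the TAA mitigation in use is the buffer-clearing
one, which is the case where an SMT warning is relevant.

```c
#include <stdbool.h>

/*
 * Returns true if the TAA SMT warning would be reached.
 * keep_early_return selects the behaviour before (true) or after
 * (false) this change.
 */
static bool taa_warning_reached(bool enhanced_ibrs, bool taa_verw_mitigation,
				bool smt_active, bool keep_early_return)
{
	/* The check that this commit removes. */
	if (keep_early_return && enhanced_ibrs)
		return false;

	/*
	 * The spectre_v2_user handling would sit here; per the text
	 * above it does nothing when Enhanced IBRS is in use.
	 */

	/* TAA handling further down in the function. */
	return taa_verw_mitigation && smt_active;
}
```

With Enhanced IBRS, the buffer-clearing mitigation and SMT all active,
the model reaches the warning only once the early return is gone.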
Fixes: 1b42f01741 ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db616173d7 upstream.
There is a general consensus that TSX is not widely used, while history
shows that it leaves non-trivial room for side channel attacks.
Therefore TSX is disabled by default, even on platforms that might have
a safe implementation of TSX according to current knowledge. This is a
fair trade-off to make.
There are, however, workloads that really do benefit from using TSX, and
updating to a newer kernel with TSX disabled might introduce noticeable
regressions. This would be a particular problem for Linux distributions
which will provide TAA mitigations.
Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON
and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config
setting can be overridden by the tsx cmdline options.
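
The precedence between the build-time default and the command line can
be sketched as follows. The enum and function are hypothetical
user-space stand-ins. Accepting "auto" on the command line and falling
back to "off" for an unrecognised value are assumptions of this sketch,
not something the text above states.

```c
#include <stddef.h>
#include <string.h>

enum tsx_mode_sketch { TSX_SKETCH_OFF, TSX_SKETCH_ON, TSX_SKETCH_AUTO };

/*
 * Pick the effective mode. config_default models the selected
 * X86_INTEL_TSX_MODE_* option; cmdline_value is the text after
 * "tsx=", or NULL when the option was not given.
 */
static enum tsx_mode_sketch
tsx_effective_mode(enum tsx_mode_sketch config_default,
		   const char *cmdline_value)
{
	if (cmdline_value == NULL)
		return config_default;	/* no override present */
	if (strcmp(cmdline_value, "on") == 0)
		return TSX_SKETCH_ON;
	if (strcmp(cmdline_value, "auto") == 0)
		return TSX_SKETCH_AUTO;
	/* "off", or an unrecognised value (assumed to mean off). */
	return TSX_SKETCH_OFF;
}
```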
[ bp: Text cleanups from Josh. ]
Suggested-by: Borislav Petkov <bpetkov@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1d38b63ac upstream.
Export the IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0 to guests on TSX
Async Abort(TAA) affected hosts that have TSX enabled and updated
microcode. This is required so that the guests don't complain,
"Vulnerable: Clear CPU buffers attempted, no microcode"
when the host has the updated microcode to clear CPU buffers.
Microcode update also adds support for MSR_IA32_TSX_CTRL which is
enumerated by the ARCH_CAP_TSX_CTRL bit in IA32_ARCH_CAPABILITIES MSR.
Guests can't do this check themselves when the ARCH_CAP_TSX_CTRL bit is
not exported to the guests.
In this case export MDS_NO=0 to the guests. When guests have
CPUID.MD_CLEAR=1, they deploy MDS mitigation which also mitigates TAA.
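
The condition can be written down as a small sketch operating on a raw
capabilities value. The function name and boolean parameters are
hypothetical, and the bit positions are assumptions to be checked
against msr-index.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify before relying on them. */
#define ARCH_CAP_MDS_NO		(1ULL << 5)
#define ARCH_CAP_TSX_CTRL	(1ULL << 7)

/*
 * Compute the capabilities value shown to a guest: clear MDS_NO when
 * the host is TAA affected, has TSX enabled and carries the updated
 * microcode (detected through the TSX_CTRL enumeration bit).
 */
static uint64_t guest_arch_capabilities(uint64_t host_cap,
					bool host_has_taa_bug,
					bool host_tsx_enabled)
{
	if (host_has_taa_bug && host_tsx_enabled &&
	    (host_cap & ARCH_CAP_TSX_CTRL))
		host_cap &= ~ARCH_CAP_MDS_NO;

	return host_cap;
}
```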
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b42f01741 upstream.
TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microarchitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.
This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. The scope of exposure is both same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.
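
The decision described in the last two paragraphs can be summarised in
a sketch. The enum, function and parameter names are hypothetical. The
TAA_NO bit position comes from the text above, the MDS_NO position is
an assumption, and the final fallback covers a case the text does not
spell out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_MDS_NO	(1ULL << 5)	/* assumed bit position */
#define ARCH_CAP_TAA_NO	(1ULL << 8)	/* bit 8, as stated above */

enum taa_action_sketch {
	TAA_NOT_AFFECTED,	/* no TSX, or TAA_NO is set */
	TAA_COVERED_BY_MDS,	/* existing buffer clearing suffices */
	TAA_DISABLE_TSX,	/* MDS_NO=1: mitigate by turning TSX off */
	TAA_UNRESOLVED,		/* combination not covered by the text */
};

static enum taa_action_sketch taa_action(uint64_t arch_cap, bool has_tsx,
					 bool md_clear,
					 bool mds_clearing_active)
{
	if (!has_tsx || (arch_cap & ARCH_CAP_TAA_NO))
		return TAA_NOT_AFFECTED;
	if (arch_cap & ARCH_CAP_MDS_NO)
		return TAA_DISABLE_TSX;
	if (md_clear && mds_clearing_active)
		return TAA_COVERED_BY_MDS;
	return TAA_UNRESOLVED;
}
```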
A new MSR, IA32_TSX_CTRL, which is available in future processors and in
current processors after a microcode update, can be used to control the
TSX feature. There are two bits in that MSR:
* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).
* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.
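
A minimal sketch of how the two bits combine, operating on a plain
integer rather than on the real MSR. The bit positions and helper names
are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed bit positions within IA32_TSX_CTRL. */
#define TSX_CTRL_RTM_DISABLE	(1ULL << 0)
#define TSX_CTRL_CPUID_CLEAR	(1ULL << 1)

/* Disable RTM and hide its CPUID enumeration; other bits are kept. */
static uint64_t tsx_ctrl_disable(uint64_t val)
{
	return val | TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
}

/* Re-enable RTM and its CPUID enumeration; other bits are kept. */
static uint64_t tsx_ctrl_enable(uint64_t val)
{
	return val & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);
}
```

Setting both bits together keeps CPUID consistent with the actual state
of RTM, so software does not try to use a feature that is switched off.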
The second mitigation approach is similar to the one used for MDS:
clearing the affected CPU buffers on return to user space and when
entering a guest. The relevant microcode update is required for the
mitigation to work. More details on this approach can be found here:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.
[ bp:
- massage + comments cleanup
- s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
- remove partial TAA mitigation in update_mds_branch_idle() - Josh.
- s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95c5824f75 upstream.
Add a kernel cmdline parameter "tsx" to control the Transactional
Synchronization Extensions (TSX) feature. On CPUs that support TSX
control, use "tsx=on|off" to enable or disable TSX. Not specifying this
option is equivalent to "tsx=off". This is because on certain processors
TSX may be used as a part of a speculative side channel attack.
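
The on/off semantics can be sketched in user space as below. The helper
is hypothetical and scans a space-separated command line itself instead
of using the kernel's option parser. It assumes that the last
occurrence of the option wins and ignores whether the CPU supports TSX
control.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the command line asks for TSX to be enabled. */
static bool tsx_requested_on(const char *cmdline)
{
	const char *p = cmdline;
	bool on = false;	/* no option given behaves like tsx=off */

	while (*p) {
		size_t len = strcspn(p, " ");	/* length of this token */

		if (len == 6 && strncmp(p, "tsx=on", 6) == 0)
			on = true;
		else if (len == 7 && strncmp(p, "tsx=off", 7) == 0)
			on = false;

		p += len;
		p += strspn(p, " ");		/* skip separators */
	}
	return on;
}
```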
Carve out the TSX controlling functionality into a separate compilation
unit because TSX is a CPU feature while the TSX async abort control
machinery will go to cpu/bugs.c.
[ bp: - Massage, shorten and clear the arg buffer.
- Clarifications of the tsx= possible options - Josh.
- Expand on TSX_CTRL availability - Pawan. ]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>