In preparation for poisoning guest memory pages in the pKVM hypervisor
when being reclaimed by the host, introduce a new 'flags' field in
'struct hyp_page' so that we will be able to track on a per-page basis
whether or not poisoning is required.
Rather than increase the total size of the structure, shrink the 16-bit
'order' field to a single byte and use the recovered space for the new
field.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8eb1f7ed8da0374b878a315eb1a6f867d0f379a9
As a stepping stone towards deprivileging the host's access to the
guest's vCPU structures, introduce some naive flush/sync routines to
copy most of the host vCPU into the hyp vCPU on vCPU run and back
again on return to EL1.
This allows us to run using the pKVM hyp structures when KVM is
initialised in protected mode.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-26-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iaf1c07cbf58eaff8a2968e9dc6457d36dcef83cf
We no longer need to map the host's '.rodata' and '.bss' sections in the
stage-1 page-table of the pKVM hypervisor at EL2, so remove those
mappings and avoid creating any future dependencies at EL2 on
host-controlled data structures.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-25-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iddb819bc1ef4006b6ed7490476f28f0e880e1d8c
The pkvm hypervisor at EL2 may need to read the 'kvm_vgic_global_state'
variable from the host, for example when saving and restoring the state
of the virtual GIC.
Explicitly map 'kvm_vgic_global_state' in the stage-1 page-table of the
pKVM hypervisor rather than relying on mapping all of the host '.rodata'
section.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-24-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I13a190d4164a4e0fd68dd5ec88ab5647dd4e73fc
Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanians as this is used to configure the VTTBR register for the
guest stage-2.
In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it
from the host value while it is still trusted.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-23-will@kernel.org
[willdeacon@: Fixed context conflict with CPU cap symbols in image-vars.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I613e7c0ef747324e73cf4d4f543354a3641c7505
When pKVM is enabled, the hypervisor at EL2 does not trust the host at
EL1 and must therefore prevent it from having unrestricted access to
internal hypervisor state.
The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor
per-cpu allocations, so move this this into the nVHE code where it
cannot be modified by the untrusted host at EL1.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-22-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8d67b2905ac97e15f4252d45c36b97e53d3072ce
Rather than relying on the host to free the previously-donated pKVM
hypervisor VM pages explicitly on teardown, introduce a dedicated
teardown memcache which allows the host to reclaim guest memory
resources without having to keep track of all of the allocations made by
the pKVM hypervisor at EL2.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-21-will@kernel.org
[willdeacon@: Fix GCC compat error due to variable declaration in for loop
initializer prior to C99]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ib43b68f36fdaf5aac578f177ab8260c72acc6ed5
Extend the initialisation of guest data structures within the pKVM
hypervisor at EL2 so that we instantiate a memory pool and a full
'struct kvm_s2_mmu' structure for each VM, with a stage-2 page-table
entirely independent from the one managed by the host at EL1.
The 'struct kvm_pgtable_mm_ops' used by the page-table code is populated
with a set of callbacks that can manage guest pages in the hypervisor
without any direct intervention from the host, allocating page-table
pages from the provided pool and returning these to the host on VM
teardown. To keep things simple, the stage-2 MMU for the guest is
configured identically to the host stage-2 in the VTCR register and so
the IPA size of the guest must match the PA size of the host.
For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-20-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2fc61f4f16d16b4786a50f4e2e4c5b06ed357b22
The initialisation of guest stage-2 page-tables is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32-bit Arm.
Simplify this code path by merging both functions into one, taking care
to map the 'struct kvm' into the hypervisor stage-1 early on in order to
simplify the failure path.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-19-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I47433ed294fb5329b17b5e8b290e41b8e6d5b4a9
The host at EL1 and the pKVM hypervisor at EL2 will soon need to
exchange memory pages dynamically for creating and destroying VM state.
Indeed, the hypervisor will rely on the host to donate memory pages it
can use to create guest stage-2 page-tables and to store VM and vCPU
metadata. In order to ease this process, introduce a
'struct hyp_memcache' which is essentially a linked list of available
pages, indexed by physical addresses so that it can be passed
meaningfully between the different virtual address spaces configured at
EL1 and EL2.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-18-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I69996a4018e61473f25c5079f9ebcd82754914f4
In preparation for handling cache maintenance of guest pages from within
the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou()
which will later be plumbed into the stage-2 page-table cache
maintenance callbacks, ensuring that the initial contents of pages
mapped as executable into the guest stage-2 page-table is visible to the
instruction fetcher.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-17-will@kernel.org
[willdeacon@: Adjusted new icache_inval_pou() asm function to use old
_PI() helper macros]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I0a1fb40d49bbc19eb53d59805acb93ebd7702b8b
The nVHE object at EL2 maintains its own copies of some host variables
so that, when pKVM is enabled, the host cannot directly modify the
hypervisor state. When running in normal nVHE mode, however, these
variables are still mirrored at EL2 but are not initialised.
Initialise the hypervisor symbols from the host copies regardless of
pKVM, ensuring that any reference to this data at EL2 with normal nVHE
will return a sensibly initialised value.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-16-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I1e490d403e85ae5dbc984c1f0b8447c93cdf8700
Mapping pages in a guest page-table from within the pKVM hypervisor at
EL2 may require cache maintenance to ensure that the initialised page
contents is visible even to non-cacheable (e.g. MMU-off) accesses from
the guest.
In preparation for performing this maintenance at EL2, introduce a
per-vCPU fixmap which allows the pKVM hypervisor to map guest pages
temporarily into its stage-1 page-table for the purposes of cache
maintenance and, in future, poisoning on the reclaim path. The use of a
fixmap avoids the need for memory allocation or locking on the map()
path.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Co-developed-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-15-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I8054dac551f4fd5c4afb5b497816589f1fae5285
With the pKVM hypervisor at EL2 now offering hypercalls to the host for
creating and destroying VM and vCPU structures, plumb these in to the
existing arm64 KVM backend to ensure that the hypervisor data structures
are allocated and initialised on first vCPU run for a pKVM guest.
In the host, 'struct kvm_protected_vm' is introduced to hold the handle
of the pKVM VM instance as well as to track references to the memory
donated to the hypervisor so that it can be freed back to the host
allocator following VM teardown. The stage-2 page-table, hypervisor VM
and vCPU structures are allocated separately so as to avoid the need for
a large physically-contiguous allocation in the host at run-time.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-14-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2823f936b69f756d867bf584c645b29357c6abe9
Introduce a global table (and lock) to track pKVM instances at EL2, and
provide hypercalls that can be used by the untrusted host to create and
destroy pKVM VMs and their vCPUs. pKVM VM/vCPU state is directly
accessible only by the trusted hypervisor (EL2).
Each pKVM VM is directly associated with an untrusted host KVM instance,
and is referenced by the host using an opaque handle. Future patches
will provide hypercalls to allow the host to initialize/set/get pKVM
VM/vCPU state using the opaque handle.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-13-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ibe946e86f9a773b6024bf74faeccbb6929548a1f
In preparation for introducing VM and vCPU state at EL2, rename the
existing 'struct host_kvm' and its singleton 'host_kvm' instance to
'host_mmu' so as to avoid confusion between the structure tracking the
host stage-2 MMU state and the host instance of a 'struct kvm' for a
protected guest.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-12-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Iaf93fa2f37f1e87e56182b8494be808ce0dfcf95
nvhe/mem_protect.h refers to __load_stage2() in the definition of
__load_host_stage2() but doesn't include the relevant header.
Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
don't have to do this themselves.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-10-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I9f1eea7b57d6ca5cc5ac172ce5175272f166164a
Add helpers allowing the hypervisor to check whether a range of pages
are currently shared by the host, and 'pin' them if so by blocking host
unshare operations until the memory has been unpinned.
This will allow the hypervisor to take references on host-provided
data-structures (e.g. 'struct kvm') with the guarantee that these pages
will remain in a stable state until the hypervisor decides to release
them, for example during guest teardown.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-9-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I2918af9b4986b3d5d19ed1c6168b20b59b6a563e
Memory regions marked as "no-map" in the host device-tree routinely
include TrustZone carev-outs and DMA pools. Although donating such pages
to the hypervisor may not breach confidentiality, it could be used to
corrupt its state in uncontrollable ways. To prevent this, let's block
host-initiated memory transitions targeting "no-map" pages altogether in
nVHE protected mode as there should be no valid reason to do this in
current operation.
Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
of memblock regions, so we can easily check for the presence of the
MEMBLOCK_NOMAP flag on a region containing pages being donated from the
host.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-8-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I21198f93a6d9c727b70c504ddd31345329eabb8f
Transferring ownership information of a memory region from one component
to another can be achieved using a "donate" operation, which results
in the previous owner losing access to the underlying pages entirely
and the new owner having exclusive access to the page.
Implement a do_donate() helper, along the same lines as do_{un,}share,
and provide this functionality for the host-{to,from}-hyp cases as this
will later be used to donate/reclaim memory pages to store VM metadata
at EL2.
In a similar manner to the sharing transitions, permission checks are
performed by the hypervisor to ensure that the component initiating the
transition really is the owner of the page and also that the completer
does not currently have a page mapped at the target address.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-7-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I19c349d40f63c1bd5eae0baf9acd21697e1e8762
The 'pkvm_component_id' enum type provides constants to refer to the
host and the hypervisor, yet this information is duplicated by the
'pkvm_hyp_id' constant.
Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
type definition to 'mem_protect.h' so that it can be used outside of
the memory protection code, for example when initialising the owner for
hypervisor-owned pages.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-6-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I78454dc315a0993808a741ebd7fabf9dbcaf09bc
In order to allow unmapping arbitrary memory pages from the hypervisor
stage-1 page-table, fix-up the initial refcount for pages that have been
mapped before the 'vmemmap' array was up and running so that it
accurately accounts for all existing hypervisor mappings.
This is achieved by traversing the entire hypervisor stage-1 page-table
during initialisation of EL2 and updating the corresponding
'struct hyp_page' for each valid mapping.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-5-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I10df07bab97198ac4fd9477091b2e9340cc441b0
The EL2 'vmemmap' array in nVHE Protected mode is currently very sparse:
only memory pages owned by the hypervisor itself have a matching 'struct
hyp_page'. However, as the size of this struct has been reduced
significantly since its introduction, it appears that we can now afford
to back the vmemmap for all of memory.
Having an easily accessible 'struct hyp_page' for every physical page in
memory provides the hypervisor with a simple mechanism to store metadata
(e.g. a refcount) that wouldn't otherwise fit in the very limited number
of software bits available in the host stage-2 page-table entries. This
will be used in subsequent patches when pinning host memory pages for
use by the hypervisor at EL2.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-4-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ifced0fb12faf2af6b757d68e18ba7cb74fdd0871
All the contiguous pages used to initialize a 'struct hyp_pool' are
considered coalescable, which means that the hyp page allocator will
actively try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the inital
memory range given to a hyp_pool() is currently unsupported.
In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy donation
of pages from the host to the hypervisor when allocating guest stage-2
page-table pages at EL2.
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221020133827.5541-3-will@kernel.org
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: Ie417f839ae6d30705847ab3e27e2b3c3ac6ee8dc
In order to perform more open-coded replacements of common allocation
size arithmetic, the kernel needs saturating (SIZE_MAX) helpers for
multiplication, addition, and subtraction. For example, it is common in
allocators, especially on realloc, to add to an existing size:
p = krealloc(map->patch,
sizeof(struct reg_sequence) * (map->patch_regs + num_regs),
GFP_KERNEL);
There is no existing saturating replacement for this calculation, and
just leaving the addition open coded inside array_size() could
potentially overflow as well. For example, an overflow in an expression
for a size_t argument might wrap to zero:
array_size(anything, something_at_size_max + 1) == 0
Introduce size_mul(), size_add(), and size_sub() helpers that
implicitly promote arguments to size_t and saturated calculations for
use in allocations. With these helpers it is also possible to redefine
array_size(), array3_size(), flex_array_size(), and struct_size() in
terms of the new helpers.
As with the check_*_overflow() helpers, the new helpers use __must_check,
though what is really desired is a way to make sure that assignment is
only to a size_t lvalue. Without this, it's still possible to introduce
overflow/underflow via type conversion (i.e. from size_t to int).
Enforcing this will currently need to be left to static analysis or
future use of -Wconversion.
Additionally update the overflow unit tests to force runtime evaluation
for the pathological cases.
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Len Baker <len.baker@gmx.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit e1be43d9b5)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I7743104617dd2ed530002cc06574203daf6d13d9
This reverts commit efe9cb9c8f.
Reason for revert: Android14-5.15+ tooling has been updated to not
consider the GKI modules using dynamic partition between vendor
and GKI modules making this change not required. This flag has been
removed from kleaf use already which is the only supported build
infrastructure for this branch going forward.
Bug: 232431411
Change-Id: I87b311d323b61716fc595fd869155f5961593b27
Test: TH
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
v1 -> v2:
- Address Ulf's suggestions
Some SD-cards that are SDA-6.0 compliant reports they supports discard
while they actually don't. This might cause mk2fs to fail while trying
to format the card and revert it to a read-only mode.
While at it, add SD MID for SANDISK. This is because eMMC MID is assign
by JEDEC and SD MID is assigned by SD 3c-LLC.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Bug: 249507619
(cherry picked from commit 07d2872bf4 git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git)
Change-Id: I43db47edbffc83dd680bad42ae691cef38edbf12
Signed-off-by: Bart Van Assche <bvanassche@google.com>
This reverts commit 71e7a059a0.
Reason of revert: build.sh will be deprecated on android14-5.15.
This change is accidentally merged into android14-5.15 when
android13-5.15 is merged into android14-5.15. However, it is intentional
that this change should not be merged here. Hence reverting.
Bug: 222074706
Test: manual
Signed-off-by: Yifan Hong <elsk@google.com>
Change-Id: I67a5d2feda64aa35348387a1d15d9d3aaf71271f
find_get_page() returns a page with increased refcount, assuming a page
exists at the given index. Ensure this refcount is dropped on error.
Bug: 253068137
Fixes: cd333a037c ("BACKPORT: FROMLIST: mm: implement speculative handling in filemap_fault()")
Change-Id: Idc7b9e3f11f32a02bed4c6f4e11cec9200a5c790
Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com>
(cherry picked from commit 6232eecfa7ca0d8d0ca088da6d0edb2c3a879ff9)
* aosp/upstream-f2fs-stable-linux-5.15.y:
f2fs: simplify f2fs_force_buffered_io()
f2fs: move f2fs_force_buffered_io() into file.c
fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
fscrypt: work on block_devices instead of request_queues
fscrypt: stop holding extra request_queue references
fscrypt: stop using keyrings subsystem for fscrypt_master_key
fscrypt: stop using PG_error to track error status
f2fs: change to use atomic_t type form sbi.atomic_files
f2fs: account swapfile inodes
f2fs: allow direct read for zoned device
f2fs: support recording errors into superblock
f2fs: support recording stop_checkpoint reason into super_block
f2fs: remove the unnecessary check in f2fs_xattr_fiemap
f2fs: introduce cp_status sysfs entry
f2fs: fix to detect corrupted meta ino
f2fs: fix to account FS_CP_DATA_IO correctly
f2fs: code clean and fix a type error
f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file
f2fs: fix to do sanity check on summary info
f2fs: fix to do sanity check on destination blkaddr during recovery
f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT
f2fs: fix race condition on setting FI_NO_EXTENT flag
f2fs: remove redundant check in f2fs_sanity_check_cluster
f2fs: add static init_idisk_time function to reduce the code
f2fs: fix typo
f2fs: fix wrong dirty page count when race between mmap and fallocate.
f2fs: use COMPRESS_MAPPING to get compress cache mapping
f2fs: return the tmp_ptr directly in __bitmap_ptr
f2fs: simplify code in f2fs_prepare_decomp_mem
f2fs: replace logical value "true" with a int number
f2fs: increase the limit for reserve_root
f2fs: complete checkpoints during remount
f2fs: flush pending checkpoints when freezing super
f2fs: remove gc_urgent_high_limited for cleanup
f2fs: iostat: support accounting compressed IO
f2fs: use memcpy_{to,from}_page() where possible
f2fs: fix wrong continue condition in GC
f2fs: LFS mode does not support ATGC
Conflicts:
fs/crypto/inline_crypt.c
include/linux/fscrypt.h
Bug: 243472081
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I00b415ead603aa64502dbfeede0221415c41e174
Presently stage2_apply_range() works on a batch of memory addressed by a
stage 2 root table entry for the VM. Depending on the IPA limit of the
VM and PAGE_SIZE of the host, this could address a massive range of
memory. Some examples:
4 level, 4K paging -> 512 GB batch size
3 level, 64K paging -> 4TB batch size
Unsurprisingly, working on such a large range of memory can lead to soft
lockups. When running dirty_log_perf_test:
./dirty_log_perf_test -m -2 -s anonymous_thp -b 4G -v 48
watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [dirty_log_perf_:16703]
Modules linked in: vfat fat cdc_ether usbnet mii xhci_pci xhci_hcd sha3_generic gq(O)
CPU: 0 PID: 16703 Comm: dirty_log_perf_ Tainted: G O 6.0.0-smp-DEV #1
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dcache_clean_inval_poc+0x24/0x38
lr : clean_dcache_guest_page+0x28/0x4c
sp : ffff800021763990
pmr_save: 000000e0
x29: ffff800021763990 x28: 0000000000000005 x27: 0000000000000de0
x26: 0000000000000001 x25: 00400830b13bc77f x24: ffffad4f91ead9c0
x23: 0000000000000000 x22: ffff8000082ad9c8 x21: 0000fffafa7bc000
x20: ffffad4f9066ce50 x19: 0000000000000003 x18: ffffad4f92402000
x17: 000000000000011b x16: 000000000000011b x15: 0000000000000124
x14: ffff07ff8301d280 x13: 0000000000000000 x12: 00000000ffffffff
x11: 0000000000010001 x10: fffffc0000000000 x9 : ffffad4f9069e580
x8 : 000000000000000c x7 : 0000000000000000 x6 : 000000000000003f
x5 : ffff07ffa2076980 x4 : 0000000000000001 x3 : 000000000000003f
x2 : 0000000000000040 x1 : ffff0830313bd000 x0 : ffff0830313bcc40
Call trace:
dcache_clean_inval_poc+0x24/0x38
stage2_unmap_walker+0x138/0x1ec
__kvm_pgtable_walk+0x130/0x1d4
__kvm_pgtable_walk+0x170/0x1d4
__kvm_pgtable_walk+0x170/0x1d4
__kvm_pgtable_walk+0x170/0x1d4
kvm_pgtable_stage2_unmap+0xc4/0xf8
kvm_arch_flush_shadow_memslot+0xa4/0x10c
kvm_set_memslot+0xb8/0x454
__kvm_set_memory_region+0x194/0x244
kvm_vm_ioctl_set_memory_region+0x58/0x7c
kvm_vm_ioctl+0x49c/0x560
__arm64_sys_ioctl+0x9c/0xd4
invoke_syscall+0x4c/0x124
el0_svc_common+0xc8/0x194
do_el0_svc+0x38/0xc0
el0_svc+0x2c/0xa4
el0t_64_sync_handler+0x84/0xf0
el0t_64_sync+0x1a0/0x1a4
Use the largest supported block mapping for the configured page size as
the batch granularity. In so doing the walker is guaranteed to visit a
leaf only once.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221007234151.461779-3-oliver.upton@linux.dev
(cherry picked from commit 5994bc9e05)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I6e4582c33968f80748a8e72f894386acae0689fd
Work out the minimum page table level where KVM supports block mappings
at compile time. While at it, rewrite the comment around supported block
mappings to directly describe what KVM supports instead of phrasing in
terms of what it does not.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221007234151.461779-2-oliver.upton@linux.dev
(cherry picked from commit 3b5c082bbf)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I5621213ac94e42f5aec0a33ce807e62cd85a7cd1
As announced on the kvmarm list, we're moving the mailing list over
to kvmarm@lists.linux.dev:
<quote>
As you probably all know, the kvmarm mailing has been hosted on
Columbia's machines for as long as the project existed (over 13
years). After all this time, the university has decided to retire the
list infrastructure and asked us to find a new hosting.
A new mailing list has been created on lists.linux.dev[1], and I'm
kindly asking everyone interested in following the KVM/arm64
developments to start subscribing to it (and start posting your
patches there). I hope that people will move over to it quickly enough
that we can soon give Columbia the green light to turn their systems
off.
Note that the new list will only get archived automatically once we
fully switch over, but I'll make sure we fill any gap and not lose any
message. In the meantime, please Cc both lists.
[...]
[1] https://subspace.kernel.org/lists.linux.dev.html
</quote>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221001091245.3900668-1-maz@kernel.org
(cherry picked from commit ac107abef1)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I0bd23aa382d4274954995a8f10e70a49298b90f6
Ignore kvm-arm.mode if !is_hyp_mode_available(). Specifically, we want
to avoid switching kvm_mode to KVM_MODE_PROTECTED if hypervisor mode is
not available. This prevents "Protected KVM" cpu capability being
reported when Linux is booting in EL1 and would not have KVM enabled.
Reasonably though, we should warn if the command line is requesting a
KVM mode at all if KVM isn't actually available. Allow
"kvm-arm.mode=none" to skip the warning since this would disable KVM
anyway.
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220920190658.2880184-1-quic_eberman@quicinc.com
(cherry picked from commit b2a4d007c3)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I557abb374599ba3f39d21350f8f7f4c7def523c6
The 'coll' parameter to update_affinity_collection() is never NULL,
so comparing it with 'ite->collection' is enough to cover both
the NULL case and the "another collection" case.
Remove the duplicate check in update_affinity_collection().
Signed-off-by: Gavin Shan <gshan@redhat.com>
[maz: repainted commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220923065447.323445-1-gshan@redhat.com
(cherry picked from commit 096560dd13)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I442cdd0831067b0787eafb5ab621769cf118ac84
While userspace enables single-step, if the Software Step state at the
last guest exit was "Active-pending", clear PSTATE.SS on guest entry
to restore the state.
Currently, KVM sets PSTATE.SS to 1 on every guest entry while userspace
enables single-step for the vCPU (with KVM_GUESTDBG_SINGLESTEP).
It means KVM always makes the vCPU's Software Step state
"Active-not-pending" on the guest entry, which lets the VCPU perform
single-step (then Software Step exception is taken). This could cause
extra single-step (without returning to userspace) if the Software Step
state at the last guest exit was "Active-pending" (i.e. the last
exit was triggered by an asynchronous exception after the single-step
is performed, but before the Software Step exception is taken.
See "Figure D2-3 Software step state machine" and "D2.12.7 Behavior
in the active-pending state" in ARM DDI 0487I.a for more info about
this behavior).
Fix this by clearing PSTATE.SS on guest entry if the Software Step state
at the last exit was "Active-pending" so that KVM restore the state (and
the exception is taken before further single-step is performed).
Fixes: 337b99bf7e ("KVM: arm64: guest debug, add support for single-step")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-3-reijiw@google.com
(cherry picked from commit 370531d1e9)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I214f1064e8e233facfee08254c42db7c8c707cf0
Preserve the PSTATE.SS value for the guest while userspace enables
single-step (i.e. while KVM manipulates the PSTATE.SS) for the vCPU.
Currently, while userspace enables single-step for the vCPU
(with KVM_GUESTDBG_SINGLESTEP), KVM sets PSTATE.SS to 1 on every
guest entry, not saving its original value.
When userspace disables single-step, KVM doesn't restore the original
value for the subsequent guest entry (use the current value instead).
Exception return instructions copy PSTATE.SS from SPSR_ELx.SS
only in certain cases when single-step is enabled (and set it to 0
in other cases). So, the value matters only when the guest enables
single-step (and when the guest's Software step state isn't affected
by single-step enabled by userspace, practically), though.
Fix this by preserving the original PSTATE.SS value while userspace
enables single-step, and restoring the value once it is disabled.
This fix modifies the behavior of GET_ONE_REG/SET_ONE_REG for the
PSTATE.SS while single-step is enabled by userspace.
Presently, GET_ONE_REG/SET_ONE_REG gets/sets the current PSTATE.SS
value, which KVM will override on the next guest entry (i.e. the
value userspace gets/sets is not used for the next guest entry).
With this patch, GET_ONE_REG/SET_ONE_REG will get/set the guest's
preserved value, which KVM will preserve and try to restore after
single-step is disabled.
Fixes: 337b99bf7e ("KVM: arm64: guest debug, add support for single-step")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-2-reijiw@google.com
(cherry picked from commit 34fbdee086)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I3658c129124e4c4266c32bd309bd17d803f12840
Currently the kernel refers to the versions of the PMU and SPE features by
the version of the architecture where those features were updated but the
ARM refers to them using the FEAT_ names for the features. To improve
consistency and help with updating for newer features and since v9 will
make our current naming scheme a bit more confusing update the macros
identfying features to use the FEAT_ based scheme.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220910163354.860255-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 121a8fc088)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I94244aacede897083e9d56d0416f13cf3544dd7a
One of the oddities of the architecture is that the AArch64 views of the
AArch32 ID registers are UNKNOWN if AArch32 isn't implemented at any EL.
Nonetheless, KVM exposes these registers to userspace for the sake of
save/restore. It is possible that the UNKNOWN value could differ between
systems, leading to a rejected write from userspace.
Avoid the issue altogether by handling the AArch32 ID registers as
RAZ/WI when on an AArch64-only system.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220913094441.3957645-7-oliver.upton@linux.dev
(cherry picked from commit d5efec7ed8)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I9c22fd3de55a909e65628b675b2400fedca57dd8
There is no longer a need for caller-specified RAZ visibility. Hoist the
call to sysreg_visible_as_raz() into read_id_reg() and drop the
parameter.
No functional change intended.
Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220913094441.3957645-4-oliver.upton@linux.dev
(cherry picked from commit cdd5036d04)
[willdeacon@: Fix conflict with ID_AA64PFR0_EL1_CSV2_SHIFT rename as per upstream]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Ib71a7a19f2497491b3b73feb5688db380534a426
After the conversion to automatically generating the ID_AA64DFR0_EL1
definition names, the build fails in a few different places because some
of the definitions were not changed to their new names along the way.
Update the names to resolve the build errors.
Fixes: c0357a73fa ("arm64/sysreg: Align field names in ID_AA64DFR0_EL1 with architecture")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220919160928.3905780-1-nathan@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit db74cd6337)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I608cefb411735e05e6730871a82d6a5194a51794