IOVAs are aligned to the smallest PAGE_SIZE order in which the requested
IOVA can fit, but this might not work for all use-cases. It can cause
IOVA fragmentation in some multimedia and 8K video use-cases that may
require larger buffers to be allocated and mapped.
When the above allocation pattern is used with the current alignment
scheme, the IOVA space could be quickly exhausted for 32-bit devices.
In order to get better IOVA space utilization and reduce fragmentation,
a new kernel command line parameter is introduced to make the alignment
limit configurable by the user during boot.
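As a rough illustration of the idea (not the actual patch), the alignment
order chosen for an IOVA request can be capped at a configurable limit.
The names size_to_order(), iova_alignment() and max_align_order are
hypothetical, and 4 KiB pages are assumed:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages cover 'size' bytes. */
static unsigned int size_to_order(size_t size)
{
	size_t pages = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: alignment in bytes for an IOVA of 'size' bytes,
 * with the alignment order capped at 'max_align_order'.
 */
static size_t iova_alignment(size_t size, unsigned int max_align_order)
{
	unsigned int order;

	assert(size > 0);
	order = size_to_order(size);
	if (order > max_align_order)
		order = max_align_order;
	return (size_t)1 << (order + PAGE_SHIFT);
}
```

With a cap of order 9, a 64 MiB buffer would be aligned to 2 MiB in this
sketch rather than to 64 MiB, which leaves fewer large holes in a 32-bit
IOVA space.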
Bug: 190519428
Change-Id: I0c8e72370fc3266a5a242837d82aae4f9831aef3
Link: https://lore.kernel.org/r/1634148667-409263-1-git-send-email-quic_c_gdjako@quicinc.com/
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
This reverts three increment-fs commits:
d5faa13b5910412e10c67ad88c9349
This is to fix the incrementalinstall test.
Can now install the same apk twice, and repeated installs are stable.
Bug: 217661925
Bug: 219731048
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: Ia8488d728218881ed17e4d68cab21b0b152e3ca4
During teardown, we currently walk the guest stage-2 page-table and
annotate all of its pages as 'pending poisoning' in the host stage-2.
Sadly, this requires a host stage-2 walk for every guest page, which is
rather inefficient and can lead to a long non-preemptible amount of time
spent at EL2. This gets particularly bad with IOMMUs as, in its current
form, the host stage-2 annotation triggers IOMMU updates.
To avoid the host stage-2 walks, let's annotate the pages pending
poisoning using a flag in the hyp_vmemmap instead.
Bug: 219180169
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8894bd8e0b10ea8817763479412b540c0291e8f5
Add a 'flags' field to struct hyp_page, and reduce the size of the order
field to u8 to avoid growing the struct size.
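A minimal sketch of the layout change, assuming the struct previously
held two 16-bit fields. The struct names, the flag name and the helpers
are hypothetical and only illustrate why the size does not grow:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout before the change: two 16-bit fields. */
struct hyp_page_old {
	uint16_t refcount;
	uint16_t order;
};

/* Sketch of the layout after the change: order shrinks, flags is added. */
struct hyp_page_new {
	uint16_t refcount;
	uint8_t order;
	uint8_t flags;
};

/* Hypothetical flag bit, for illustration only. */
#define HYP_PAGE_FLAG_EXAMPLE (1u << 0)

static_assert(sizeof(struct hyp_page_new) == sizeof(struct hyp_page_old),
	      "adding flags must not grow the struct");

static void hyp_page_set_flag(struct hyp_page_new *p)
{
	p->flags |= HYP_PAGE_FLAG_EXAMPLE;
}

static int hyp_page_test_flag(const struct hyp_page_new *p)
{
	return !!(p->flags & HYP_PAGE_FLAG_EXAMPLE);
}
```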
Bug: 219180169
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: If629935bb6fa7d832c595685083f7985cfcfa221
We calculate nr_queues from nr_ports, which is based on max_nr_ports:
  nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;
If the device advertises a large max_nr_ports, we will end up with an
integer overflow. Fix this by validating max_nr_ports and failing the
probe if it is invalid.
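The overflow and the validation can be sketched in user-space C as
follows. The bound EXAMPLE_MAX_PORTS and both function names are
assumptions made for illustration; the real limit is whatever the
driver defines:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on the advertised port count. */
#define EXAMPLE_MAX_PORTS 0x8000u

/* With a bounded port count the queue computation cannot wrap. */
static_assert(((uint64_t)EXAMPLE_MAX_PORTS + 1) * 2 <= UINT32_MAX,
	      "bounded nr_ports must not overflow nr_queues");

/* Reject a zero or unreasonably large port count before using it. */
static bool max_nr_ports_valid(uint32_t max_nr_ports)
{
	return max_nr_ports != 0 && max_nr_ports <= EXAMPLE_MAX_PORTS;
}

/* Multiport branch of the formula above, in 32-bit arithmetic. */
static uint32_t nr_queues_multiport(uint32_t nr_ports)
{
	return (nr_ports + 1) * 2;
}
```

Without the check, a device advertising 0xffffffff ports would make the
32-bit computation wrap around to 0 queues in this sketch.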
Cc: Amit Shah <amit@kernel.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 28962ec595)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Idb5462a1268d2bde5f867f5455da0957ca68035a
Although FF-A claims to require version v1.2 of SMCCC, in reality the
current set of calls works just fine with v1.1 and some devices ship with
EL3 firmware that advertises this configuration.
Allow pKVM to proxy FF-A calls for these devices by relaxing our SMCCC
version check to permit SMCCC v1.1+
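A minimal sketch of the relaxed check, assuming the usual SMCCC version
encoding (major number in the upper 16 bits, minor in the lower 16).
The macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SMCCC version encoding: major in bits [30:16], minor in bits [15:0]. */
#define EXAMPLE_SMCCC_VERSION(major, minor) \
	(((uint32_t)(major) << 16) | (uint32_t)(minor))

static_assert(EXAMPLE_SMCCC_VERSION(1, 1) == 0x10001u,
	      "v1.1 encodes as 0x10001");

/* Accept any firmware that advertises SMCCC v1.1 or later. */
static bool smccc_version_ok_for_ffa(uint32_t advertised)
{
	return advertised >= EXAMPLE_SMCCC_VERSION(1, 1);
}
```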
Reported-by: Alan Stokes <alanstokes@google.com>
Bug: 222663556
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I41e9ff35f169df3609acee7bbc67999c1d11c9d1
Currently, the trace hook for is_cpu_allowed only executes if the
task is not a kthread. Modules need to be able to reject cpus
regardless of whether the task is a kthread or not. Modules also
need to have the flexibility to execute, or not, the remainder of
is_cpu_allowed.
Move the tracepoint for is_cpu_allowed so that it is invoked
regardless of task's kthread status, but do not interfere with
per-cpu-kthread cpu assignment.
Bug: 222550772
Change-Id: Ide48a82a33129448bb22be28814267b0b76535a2
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
Document the functionality of disable_dma32 as introduced in commit
c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make
ZONE_DMA32 empty").
Bug: 199917449
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I32ab2969f59fcc49e9ac49e7e6b545f816d120f9
zone_dma32_is_empty() currently lacks the proper validation to ensure
that the NUMA node ID it receives as an argument is valid. This has no
effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the
same pglist_data on these devices, but on kernels with CONFIG_NUMA=y,
this is not the case, and the node passed to NODE_DATA must be
validated.
Rather than trying to find the node containing ZONE_DMA32, replace
calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which
iterates over all nodes and returns false if one of the nodes holds
DMA32 and it is non-empty).
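The all-nodes check can be sketched as below. This is a user-space
model, not the kernel code: struct node_info and its fields are
hypothetical stand-ins for the per-node data the kernel would consult:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node summary used only for this sketch. */
struct node_info {
	bool has_dma32;			/* node holds a DMA32 zone */
	unsigned long dma32_pages;	/* pages present in that zone */
};

/*
 * Return false as soon as any node holds a non-empty DMA32 zone,
 * true if every node's DMA32 zone is absent or empty.
 */
static bool zone_dma32_are_empty_sketch(const struct node_info *nodes,
					size_t nr_nodes)
{
	size_t i;

	assert(nodes != NULL || nr_nodes == 0);
	for (i = 0; i < nr_nodes; i++) {
		if (nodes[i].has_dma32 && nodes[i].dma32_pages > 0)
			return false;
	}
	return true;
}
```

Because every node is visited, no caller has to supply (or validate) a
node ID.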
Bug: 199917449
Fixes: c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty")
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb
Where commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.
Commit 13765de814 ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this; instead, fix the whole class
of issues, effectively reverting that commit.
Change-Id: I4d34311eac28b23ee32e9308a21c66afe8fa8a3b
Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
BUG: 221850698
(cherry picked from commit b1e8206582)
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
Syzbot found a GPF in reweight_entity. This has been bisected to
commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid
sched_task_group")
There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in
reweight_entity() in CFS. The scenario is that the main process spawns
a number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
wait, and exit. For each of the new threads the copy_process() gets
invoked, which adds the new task_struct and calls sched_post_fork()
for it.
In the above scenario there is a possibility that
setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
in the group that is just being created by copy_process(), and for
which the sched_post_fork() has not been executed yet. This will
trigger a null pointer dereference in reweight_entity(), as it will
try to access the run queue pointer, which hasn't been set.
Before the mentioned change the cfs_rq pointer for the task has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group. Now it is done in
the sched_post_fork(), which is called after that. To fix the issue,
remove the update_load param from the set_load_weight() function
and call reweight_task() only if the task doesn't have the
TASK_NEW flag set.
Change-Id: I22d5b9d0b06cd85f0f02446b1e8a2389935cffa8
Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220203161846.1160750-1-tadeusz.struk@linaro.org
BUG: 221850698
(cherry picked from commit 13765de814)
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
It enables the power capping sysfs interface for
different power zone devices.
Bug: 220884335
Change-Id: I11bc3efe06d2a02dcc602d223d3e6757088ca771
Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
kvm_flush_dcache_to_poc() converts its (start,len) parameters into
(start,end) parameters for dcache_clean_inval_poc(). This mostly works
out except for the case when 'len == 0', where dcache_clean_inval_poc()
will still issue cache maintenance for the cache line containing 'start'.
If 'start' is not mapped, then this can generate an unexpected fault.
Don't call into dcache_clean_inval_poc() from kvm_flush_dcache_to_poc()
if the supplied length is 0.
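The fix amounts to an early return, sketched here in user-space C. The
stub stands in for the real cache maintenance routine and merely counts
calls; all names in the block are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned int maint_calls;	/* number of stub invocations */

/* Stub for a (start, end) cache maintenance routine. */
static void clean_inval_range_stub(uintptr_t start, uintptr_t end)
{
	/* An empty range would still touch the line containing 'start'. */
	assert(end > start);
	maint_calls++;
}

/* Convert (start, len) to (start, end), skipping zero-length requests. */
static void flush_dcache_to_poc_sketch(uintptr_t start, size_t len)
{
	if (!len)
		return;
	clean_inval_range_stub(start, start + len);
}
```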
Reported-by: John Stultz <john.stultz@linaro.org>
Bug: 196204410
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idae2b22289398e941938821d1d3b3a5a1da3fd8f
When page allocation in direct reclaim path fails, the system will make
one attempt to shrink per-cpu page lists and free pages from high alloc
reserves. Draining per-cpu pages into buddy allocator can be a very slow
operation because it's done using workqueues and the task in direct
reclaim waits for all of them to finish before proceeding. Currently this
time is not accounted as psi memory stall.
While testing mobile devices under extreme memory pressure, when
allocations are failing during direct reclaim, we noticed that psi events
which would be expected in such conditions were not triggered. After
profiling these cases it was determined that the reason for missing psi
events was that a big chunk of time spent in direct reclaim is not
accounted as memory stall, therefore psi would not reach the levels at
which an event is generated. Further investigation revealed that the bulk
of that unaccounted time was spent inside drain_all_pages call.
A typical captured case when drain_all_pages path gets activated:
__alloc_pages_slowpath  took 44.644.613ns
    __perform_reclaim   took    751.668ns (1.7%)
    drain_all_pages     took 43.887.167ns (98.3%)
PSI in this case records the time spent in __perform_reclaim but ignores
drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath.
Annotate __alloc_pages_direct_reclaim in its entirety so that delays from
handling page allocation failure in the direct reclaim path are accounted
as memory stall.
Link: https://lkml.kernel.org/r/20220223194812.1299646-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Tim Murray <timmurray@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d4f448732857375eb3dc422225a61e64f8257cb1
https://github.com/hnaz/linux-mm.git master)
Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia3a4138f8d5e8ce612bd5c371cfcc0f21e1ebc42
In order to support the Protected KVM (pKVM) development effort, ensure
that the GKI kernel initialises KVM in "protected" mode when booted at
EL2, even if the underlying CPU hardware supports VHE.
This has no impact on platforms entering the kernel at EL1.
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <mzyngier@google.com>
Cc: Alistair Delva <adelva@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 178098380
Test: atest VirtualizationHostTestCases on an EL2-enabled device
Change-Id: Id84d0b0d08706658d1fc080c09ad8ee5b51ed517
ABI XML is tidied unconditionally from Android 13.
Bug: 221390852
Change-Id: If2d6ad724450d8affbf302f449e408ae2b0d3b2a
Signed-off-by: Giuliano Procida <gprocida@google.com>
Do not use a variable to reflect something it wasn't intended to reflect, i.e.,
the number of created vcpus vs the number of vcpus pinned so far.
Consolidate pinning and error handling to the same level to make
code more readable.
Ensure that the donated pgd is big enough for all vcpus.
Bug: 220830416
Bug: 216808671
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ibf41a93bb1175e59b3ab82d2f735f25505d2892a
Change the variable names to avoid confusion between the total memory
area size and the number of pages.
Use host_kvm.vtcr to make future refactoring easier.
Simplifies future fixes of the bug below.
Bug: 216808671
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ica0a0dfcf839dae0625a26a2095e56212385bbe7
This function only works for loaded vcpus and no more information
is needed by hyp. This removes the need to access potentially
unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I09cb49b06e541bba09e91ce5885b963b88a3c315
This function only works for loaded vcpus and no more information
is needed by hyp. This removes the need to access potentially
unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Id705e9d8f1d147d474cb81af4ce974bbe45f3614
Split it into two functions, sync/flush, which correspond to the
direction the data is going. Remove the need to explicitly pass
the host vcpu since the shadow already has a trusted pointer to
it.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Ibb5a34d66254788782b219565833e061c664abb2
This function only works for loaded vcpus and no more information
is needed by hyp. This removes the need to access potentially
unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I2dae77b900139bd61e91fcff52beedffa2746d9b
Pass the handle and other safe data instead for hyp to use to
lookup the shadow vcpu. This removes the need to access
potentially unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Iff01f981aad8f1a064f8a8147e5443807558884c
Better to have the creation and teardown code in the same file to
understand what's happening. Simplifies subsequent patches.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I07bc8a9e254753f000c4faffffcf52a0d8f3a831
Pass the handle and other safe data instead for hyp to use to
lookup the shadow vcpu. This removes the need to access
potentially unsafe host memory.
Bug: 220830416
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I65a2ffc75dbdd34f36cf4d3cc860bbc7a2d9671e
Check that the donated memory for the hyp shadow vm is page-aligned.
Bug: 217683487
Reported-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I289cf1704eea9c2036cf26a8d767b101626620ed
When the host shuts down cleanly under pKVM, it is EL2's responsibility
to clear the pvmfw pages before forwarding the PSCI call onto EL3.
Wipe the pvmfw pages on SYSTEM_OFF, SYSTEM_RESET and SYSTEM_RESET2 calls
from the host, cleaning the zeroed memory to the PoC for good measure.
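As an illustration only, the set of host PSCI calls that would trigger
the wipe can be expressed as a predicate. The function IDs are the
values defined by the Arm PSCI specification; the macro names and the
predicate itself are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI function IDs as defined by the Arm PSCI specification. */
#define EX_PSCI_SYSTEM_OFF		0x84000008u
#define EX_PSCI_SYSTEM_RESET		0x84000009u
#define EX_PSCI_SYSTEM_RESET2		0x84000012u	/* SMC32 */
#define EX_PSCI_SYSTEM_RESET2_64	0xC4000012u	/* SMC64 */

static_assert((EX_PSCI_SYSTEM_RESET2_64 & ~0x40000000u) ==
	      EX_PSCI_SYSTEM_RESET2,
	      "SMC64 variant differs only in bit 30");

/* True for calls after which the host does not run again. */
static bool psci_call_needs_pvmfw_wipe(uint32_t func_id)
{
	switch (func_id) {
	case EX_PSCI_SYSTEM_OFF:
	case EX_PSCI_SYSTEM_RESET:
	case EX_PSCI_SYSTEM_RESET2:
	case EX_PSCI_SYSTEM_RESET2_64:
		return true;
	default:
		return false;
	}
}
```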
Reported-by: Andrew Scull <ascull@google.com>
Bug: 196204410
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I0dd2757e355f384813319034c6eed0fa2c2328c2
The data abort fault IPA obtained from HPFAR_EL2 has the bottom 12 bits
zeroed out. This broke the host MMIO DABT handler because the offsets
of accessed MMIO registers were rounded down to the nearest page.
Include FAR_EL2 in the address to fix the issue.
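A sketch of how the full faulting IPA can be assembled, assuming 4 KiB
pages and the architectural layout in which the hypervisor IPA fault
address register (HPFAR_EL2) holds the faulting page number starting at
bit 4. The macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_OFFSET_MASK	0xfffULL	/* assumption: 4 KiB pages */
#define EX_HPFAR_FIPA_MASK	(~0xfULL)	/* drop the low reserved bits */

/* The page-number bits and the page-offset bits must not overlap. */
static_assert(((EX_HPFAR_FIPA_MASK << 8) & EX_PAGE_OFFSET_MASK) == 0,
	      "page number and page offset must be disjoint");

/*
 * The page number comes from the HPFAR_EL2 value; the offset within
 * the page is only available in the low bits of the FAR_EL2 value.
 */
static uint64_t fault_ipa(uint64_t hpfar_el2, uint64_t far_el2)
{
	uint64_t ipa = (hpfar_el2 & EX_HPFAR_FIPA_MASK) << 8;

	return ipa | (far_el2 & EX_PAGE_OFFSET_MASK);
}
```

Using only the first term would round every MMIO access down to the
start of its page, which is the behaviour described above.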
Bug: 220194478
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I2ee7352dba69c673e5d5bddca7e1df9db1b4ce1f
Adds vb2_dma_sg_memops to the symbol list now that VIDEOBUF2_DMA_SG is
built-in to the GKI kernel.
Bug: 219998156
Signed-off-by: Will McVicker <willmcvicker@google.com>
Change-Id: I59af06d1da835e21751636dd758ac25d9d00c8b1
The virtio video driver v2 [1] uses the videobuf2 structure
vb2_dma_sg_memops when the virtio device supports non-contiguous DMA video
buffers.
The DMA SG memory allocator for videobuf2
(drivers/media/common/videobuf2/videobuf2-dma-sg.c) is common code and
has no hardware dependencies.
[1]: https://lore.kernel.org/all/20200218202753.652093-2-dmitry.sepp@opensynergy.com/
Bug: 219998156
Signed-off-by: Mikhail Golubev <Mikhail.Golubev@opensynergy.com>
Change-Id: I897898090d7a97b13202c05aae28955595e09468