This reverts commit b923dd1052.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: Ib7087614a16570125233f26d582d449fe5ead163
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Non-protected mode relies on the host to restore its SVE state if
necessary. However, protected VMs shouldn't reveal any
information to the host, including whether they have potentially
dirtied the host's sve state. Therefore, save and restore the
host's sve state at hyp in protected mode.
Currently this behavior applies to protected and non-protected
VMs in protected mode. It could be optimised for non-protected
VMs by applying the same behavior as non-protected mode, which is
to inform the host that it should restore its sve state. But for
now it's kept this way to maintain the same behavior for all VMs
in protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ifbcc64b387c3f821a6c1047e8c843f6250a3f690
The code for deactivating traps, to be able to update the fpsimd
registers, is the only code in this file that is n/vhe specific.
Move it to specialized functions.
This is also needed for the subsequent patch, since the logic for
deciding which traps to enable/disable will get more complex.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ia0477450aa9319a46a91b3c31c1910ad02fbe246
In subsequent patches, vhe/pKVM(nvhe) will diverge significantly
on saving the host fpsimd/sve state when taking a guest fpsimd
trap. Add a specialized helper to handle that.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ib6b13cafad8bf568694804e3b55e0a5a4fcd70a4
Allocate memory and donate it to hyp at setup time for tracking
the host sve state at hyp in protected mode. This memory is used
in the subsequent patch.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: If07eec9ea9c7b216d02e2d1ea69bd62d99f08081
The code to determine the maximum sve vector length supported by the
system isn't trivial. In subsequent patches hyp needs to know it for
allocating memory for the host sve state.
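To illustrate why the allocation depends on the vector length, here is a
minimal sketch. It assumes only the architectural SVE register file
(32 Z registers of one vector length each, 16 P registers and one FFR of
one eighth of a vector length each). The demo_* names are hypothetical
and are not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural SVE constants (illustrative names, not kernel identifiers). */
#define DEMO_NUM_Z_REGS   32u   /* Z0..Z31, each one vector length wide */
#define DEMO_NUM_P_REGS   16u   /* P0..P15, each VL/8 bytes wide */
#define DEMO_VL_MIN_BYTES 16u   /* 128-bit minimum vector length */
#define DEMO_VL_MAX_BYTES 256u  /* 2048-bit architectural maximum */

/* Nonzero if vl_bytes is a vector length the architecture permits. */
int demo_sve_vl_valid(size_t vl_bytes)
{
	return vl_bytes >= DEMO_VL_MIN_BYTES &&
	       vl_bytes <= DEMO_VL_MAX_BYTES &&
	       vl_bytes % 16u == 0;
}

/* Bytes needed to hold the Z, P and FFR registers for one CPU. */
size_t demo_sve_state_bytes(size_t vl_bytes)
{
	size_t pred_bytes = vl_bytes / 8u; /* one predicate bit per vector byte */

	assert(demo_sve_vl_valid(vl_bytes));
	return DEMO_NUM_Z_REGS * vl_bytes +
	       (DEMO_NUM_P_REGS + 1u) * pred_bytes; /* +1 for FFR */
}
```

With these assumptions the per-CPU buffer grows from 546 bytes at the
minimum vector length to 8736 bytes at the maximum, which is why the
size cannot be a compile-time constant.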
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I2561af67722a99d8a989b26cb47d073eba3869ff
Subsequent patches will augment this state to allocate space for
tracking the host sve state. SVE state size is not static, and
there isn't support for dynamic per_cpu allocation in hyp.
This is done as a first step in allowing us to allocate SVE state
under the same umbrella.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I0902623a5ab81a80105f5b00a26765d257bc1ceb
The state will be augmented in future patches and accessed in
more than one location. It makes it easier to reason about the
code.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: If3a3a9266c201f63c126860b61da9698be9b9faa
Subsequent patches will change how the fpsimd state is allocated,
and add tracking of sve state. Moving this to a helper makes
future code cleaner and patches easier to reason about.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ic46b8889c1fe11f0cfdd7b5f3d2b98bf412183f0
Before the conversion of the various booleans into an enum
representing the state, this helper clarified things. Since the
introduction of the enum, the helper obfuscates rather than
helps.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I83c870146ed2d910bf10d625d1048b95c8b23736
pKVM maintains its own state for tracking the host fpsimd state.
Therefore, there is no need to map and share the host's view with it.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I5e5164a7694881ffa641b5b6a8691a542fd55a14
Expand comment clarifying why the host value representing sve
vector length being restored for ZCR_EL1 on guest exit isn't the
same as it was on guest entry.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I5889407b4391a80dfcf77b31375c3a17705b68da
The GKI policy allows the addition of new symbols to a frozen KMI as
long as doing so has no impact on existing frozen symbols. Interestingly
the hypervisor's ABI is defined by the pkvm_module_ops structure. Any
addition to this struct will be flagged as a type change, which equates
to a KMI breakage in the GKI world. This could become a major problem
long term if it prevented backport of (security) fixes to KMI-frozen
kernels.
To allow such backports, add a set of reserved ABI slots to the
pkvm_module_ops struct. These slots are usually reserved to fix LTS
merges, but given that none of the pKVM module code is upstream yet,
these slots are likely to be used by Android-specific fixes.
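The idea can be sketched as follows. This is a simplified illustration,
not the actual pkvm_module_ops layout: the struct names, members and the
number of reserved slots are all hypothetical. It shows that turning a
reserved slot into a new callback leaves the size of the struct and the
offsets of the existing members unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table as frozen: two callbacks plus two spare slots. */
struct demo_ops_v1 {
	int (*map)(uint64_t pfn);
	int (*unmap)(uint64_t pfn);
	uint64_t reserved1;
	uint64_t reserved2;
};

/* Later fix: one spare slot is repurposed for a new callback. */
struct demo_ops_v2 {
	int (*map)(uint64_t pfn);
	int (*unmap)(uint64_t pfn);
	union {
		int (*flush)(void); /* new member living in the old slot */
		uint64_t reserved1; /* keeps the slot's size and alignment */
	};
	uint64_t reserved2;
};

/* The new pointer must fit in the slot it replaces. */
_Static_assert(sizeof(int (*)(void)) <= sizeof(uint64_t),
	       "callback does not fit in a reserved slot");

/* Nonzero if v2 keeps the size and member offsets of v1. */
int demo_layout_unchanged(void)
{
	return sizeof(struct demo_ops_v1) == sizeof(struct demo_ops_v2) &&
	       offsetof(struct demo_ops_v1, unmap) ==
	       offsetof(struct demo_ops_v2, unmap) &&
	       offsetof(struct demo_ops_v1, reserved1) ==
	       offsetof(struct demo_ops_v2, flush) &&
	       offsetof(struct demo_ops_v1, reserved2) ==
	       offsetof(struct demo_ops_v2, reserved2);
}
```

Appending a member instead would change sizeof() for the struct, which
is the kind of type change the ABI tooling flags.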
Bug: 233587962
Change-Id: I61a00a09947ccff153c96a4829e083ef9ede19d3
Signed-off-by: Quentin Perret <qperret@google.com>
pKVM modules may need to access memory that is kept mapped in the host's
stage-2 page-table. Expose the host_{un}share_hyp() API to allow the
use-case, as well as the pinning API that goes with it.
Bug: 245034629
Change-Id: I1b5abacfcd2f066b1cbb1bbac43b77e6808f559c
Signed-off-by: Quentin Perret <qperret@google.com>
DWARFv5 is the latest iteration of the debug info spec; it contains many
encoding tricks to optimize for space.
For example, with this patch applied (DWARFv5), for
build.config.gki.aarch64:
$ du -h out/android-mainline/dist/vmlinux
304M out/android-mainline/dist/vmlinux
Before (DWARFv4):
$ du -h out/android-mainline/dist/vmlinux
339M out/android-mainline/dist/vmlinux
Bug: 192694378
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: I6644482d9b12eb3e0d1d3676c53ee2eee97a6573
If blk_crypto_evict_key() sees that the key is still in-use (due to a
bug) or that ->keyslot_evict failed, it currently just returns while
leaving the key linked into the keyslot management structures.
However, blk_crypto_evict_key() is only called in contexts such as inode
eviction where failure is not an option. So actually the caller
proceeds with freeing the blk_crypto_key regardless of the return value
of blk_crypto_evict_key().
These two assumptions don't match, and the result is that there can be a
use-after-free in blk_crypto_reprogram_all_keys() after one of these
errors occurs. (Note, these errors *shouldn't* happen; we're just
talking about what happens if they do anyway.)
Fix this by making blk_crypto_evict_key() unlink the key from the
keyslot management structures even on failure.
Also improve some comments.
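The intended behavior can be sketched with a toy model. This is not the
block layer code: the demo_* types and functions are hypothetical, and a
plain singly linked list stands in for the keyslot management
structures. The point is only that eviction reports the error but still
unlinks the key, so a caller that frees the key afterwards leaves no
dangling pointer behind.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_key {
	struct demo_key *next; /* link in the manager's list of keys */
	int in_use;            /* nonzero models a key that still has users */
};

struct demo_manager {
	struct demo_key *keys; /* singly linked list of programmed keys */
};

void demo_add_key(struct demo_manager *m, struct demo_key *k)
{
	k->next = m->keys;
	m->keys = k;
}

int demo_is_linked(const struct demo_manager *m, const struct demo_key *k)
{
	const struct demo_key *it;

	for (it = m->keys; it; it = it->next)
		if (it == k)
			return 1;
	return 0;
}

/* Remove k from the list if it is there; harmless otherwise. */
static void demo_unlink(struct demo_manager *m, struct demo_key *k)
{
	struct demo_key **pp;

	for (pp = &m->keys; *pp; pp = &(*pp)->next) {
		if (*pp == k) {
			*pp = k->next;
			k->next = NULL;
			return;
		}
	}
}

/* Returns 0 or a negative errno, but always unlinks the key. */
int demo_evict_key(struct demo_manager *m, struct demo_key *k)
{
	int err;

	assert(m != NULL && k != NULL);
	err = k->in_use ? -EBUSY : 0;

	/* Unlink unconditionally: the caller frees k whatever we return. */
	demo_unlink(m, k);
	return err;
}
```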
Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 270098322
(cherry picked from commit 5c7cb94452 https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/log/?h=for-next)
Change-Id: I4e8983ad7db94ea8cd422743196da8854adda552
Signed-off-by: Eric Biggers <ebiggers@google.com>
Once all I/O using a blk_crypto_key has completed, filesystems can call
blk_crypto_evict_key(). However, the block layer currently doesn't call
blk_crypto_put_keyslot() until the request is being freed, which happens
after upper layers have been told (via bio_endio()) the I/O has
completed. This causes a race condition where blk_crypto_evict_key()
can see 'slot_refs != 0' without there being an actual bug.
This makes __blk_crypto_evict_key() hit the
'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without
doing anything, eventually causing a use-after-free in
blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only
been seen when per-file keys are being used with fscrypt.)
There are two options to fix this: either release the keyslot before
bio_endio() is called on the request's last bio, or make
__blk_crypto_evict_key() ignore slot_refs. Let's go with the first
solution, since it preserves the ability to report bugs (via
WARN_ON_ONCE) where a key is evicted while still in-use.
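The ordering issue can be sketched with a toy model. It is not the block
layer code: all demo_* names are hypothetical, and the "upper layer" is
reduced to a function that reports the reference count an eviction
attempted at that moment would observe.

```c
#include <assert.h>
#include <stddef.h>

struct demo_slot {
	int refs; /* models slot_refs */
};

struct demo_request {
	struct demo_slot *slot; /* keyslot held for the I/O, or NULL */
};

void demo_get_slot(struct demo_request *rq, struct demo_slot *s)
{
	s->refs++;
	rq->slot = s;
}

void demo_put_slot(struct demo_request *rq)
{
	if (rq->slot) {
		rq->slot->refs--;
		rq->slot = NULL;
	}
}

/* Stand-in for the upper layer being told that the I/O has completed. */
int demo_upper_layer_sees(const struct demo_slot *s)
{
	return s->refs;
}

/* Old ordering: notify first, release the slot when the request is freed. */
int demo_complete_old(struct demo_request *rq, struct demo_slot *s)
{
	int seen = demo_upper_layer_sees(s);

	demo_put_slot(rq);
	return seen;
}

/* Fixed ordering: release the slot, then notify. */
int demo_complete_fixed(struct demo_request *rq, struct demo_slot *s)
{
	demo_put_slot(rq);
	return demo_upper_layer_sees(s);
}
```

In the old ordering the upper layer can still observe a nonzero count
even though the I/O is done; in the fixed ordering a nonzero count at
eviction time indicates a real bug.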
Fixes: a892c8d52c ("block: Inline encryption support for blk-mq")
Cc: stable@vger.kernel.org
Reviewed-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 270098322
(cherry picked from commit 9cd1e56667 https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/log/?h=for-next)
Change-Id: Ic2c2426db7693a06901c7893d481471f30de03b2
Signed-off-by: Eric Biggers <ebiggers@google.com>
Enable the ARMv8 Crypto Extensions implementation of AES-GCM, as it's an
order of magnitude faster than the generic implementation and is more
secure. AES-GCM is used by Android's IPsec support
(https://developer.android.com/reference/android/net/IpSecAlgorithm#AUTH_CRYPT_AES_GCM)
and often is the first choice of algorithm for new purposes as well.
This also makes GKI on arm64 consistent with GKI on x86, as the AES-NI
accelerated AES-GCM is already enabled on x86. (It is not its own
option on x86, but rather is included in CONFIG_CRYPTO_AES_NI_INTEL.)
Bug: 274721410
Change-Id: I2877192dad8f71a961d6f6f465b62b6aeee69540
Signed-off-by: Eric Biggers <ebiggers@google.com>
Map the shadow of the vmalloc area on demand.
On Arm, the vmalloc virtual addresses also lie between MODULE_VADDR and
0x100000000 (ZONE_HIGHMEM), which means their shadow addresses already
fall between KASAN_SHADOW_START and KASAN_SHADOW_END.
Thus nothing needs to change in the Arm memory map.
This fixes ARM_MODULE_PLTS with KASan, and adds KASan support for
highmem and for CONFIG_VMAP_STACK.
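The address argument can be sketched numerically. Generic KASan maps
each 8 bytes of memory to one shadow byte, so the shadow address is the
original address shifted right by 3 plus an offset. The offset and the
MODULE_VADDR value below are illustrative assumptions only (the real
ones depend on the kernel configuration), and the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SHADOW_SCALE_SHIFT 3u /* 1 shadow byte per 8 bytes of memory */

/* Illustrative values only; the real ones depend on the kernel config. */
#define DEMO_SHADOW_OFFSET 0x9f000000u
#define DEMO_MODULE_VADDR  0xbf000000u
#define DEMO_ADDR_MAX      0xffffffffu /* last byte below 0x100000000 */

uint32_t demo_mem_to_shadow(uint32_t addr)
{
	return (addr >> DEMO_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

/* Nonzero if addr's shadow lies within the shadow of [MODULE_VADDR, 4 GiB). */
int demo_shadow_covered(uint32_t addr)
{
	uint32_t s = demo_mem_to_shadow(addr);

	return s >= demo_mem_to_shadow(DEMO_MODULE_VADDR) &&
	       s <= demo_mem_to_shadow(DEMO_ADDR_MAX);
}
```

Because the mapping is monotonic, any address inside the covered range
has its shadow inside the corresponding shadow range, so no new shadow
region has to be reserved.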
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Bug: 275526617
(cherry picked from commit 565cbaad83)
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Change-Id: Ic2cb62e294dad96ba5a98b2ca48fa5efea2c2e57
I found a bug in the previous version; this patch closes the gap with
the upstream version.
Fixes: fcc385fd44 ("FROMGIT: f2fs: factor out discard_cmd usage from general rb_tree use")
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
(cherry picked from commit e39836183be8
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I4dbfb9f1f2cc956685a7c4de5fcfbba705c30cfb
Add a vendor hook for pagecache hit/miss and other
vendor specific functions.
Bug: 174088128
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: Ie9f14a69a86b8ed81de766e44e30f2eba1d9bd84
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit db158b4ae0)
Add a vendor hook for costly order page counting
and other vendor specific functions.
Bug: 174521902
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: I89206727a462548cc3500b695d85c83ff003eec7
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 369de37804)
This reverts commit 3df32812eb which is
commit b1a37ed00d upstream.
It breaks the Android KABI and if needed, should come back in an
abi-safe way.
Bug: 161946584
Cc: Lee Jones <joneslee@google.com>
Change-Id: I1f160797720e8bdf4960542e711fd17940a975d9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 02904e8a2f which is
commit 1c5d422124 upstream.
It breaks the Android KABI and if needed, should come back in an
abi-safe way.
Bug: 161946584
Cc: Lee Jones <joneslee@google.com>
Change-Id: I9a460d9dbc41512ee71ff607e875f2da9be7f9f6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Even if we have multiple queues in the plug list, the chance that they
are heavily interspersed is minimal. Don't bother spending CPU cycles
sorting the list.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Change-Id: Ia85d5c75ef4f2bf3f90e4d3408cffec5c41dcfe2
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 274474142
(cherry picked from commit df87eb0fce)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
In addition to reverting commit 7b05bf7710 ("Revert "block/mq-deadline:
Prioritize high-priority requests""), this patch uses 'jiffies' instead
of ktime_get() in the code for aging lower priority requests.
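A tick-based aging check can be sketched as follows. This is not the
code from the patch: the demo_* names are hypothetical and a 32-bit
counter stands in for jiffies. It uses the usual signed-difference
comparison so that the check stays correct when the counter wraps, and
it relies on the common two's-complement conversion from unsigned to
signed.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" on a free-running tick counter. */
int demo_tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* A low-priority request has aged out once now is past start + expire. */
int demo_prio_aged(uint32_t now, uint32_t start_tick, uint32_t expire_ticks)
{
	return demo_tick_after(now, start_tick + expire_ticks);
}
```

Reading a tick counter is a plain memory load, whereas a high-resolution
clock read is typically more expensive, and tick resolution is usually
sufficient for an aging timeout measured in seconds.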
This patch has been tested as follows:
Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: 555 K IOPS.
Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
Result without and with this patch: about 380 K IOPS.
Ran the following script:
set -e
scriptdir=$(dirname "$0")
if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
modprobe scsi_debug ndelay=1000000 max_queue=16
sd=''
while [ -z "$sd" ]; do
  sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
done
echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
if [ -e /sys/fs/cgroup/io.prio.class ]; then
  cd /sys/fs/cgroup
  echo restrict-to-be >io.prio.class
  echo +io > cgroup.subtree_control
else
  cd /sys/fs/cgroup/blkio/
  echo restrict-to-be >blkio.prio.class
fi
echo $$ >cgroup.procs
mkdir -p hipri
cd hipri
if [ -e io.prio.class ]; then
  echo none-to-rt >io.prio.class
else
  echo none-to-rt >blkio.prio.class
fi
{ "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
echo $$ >cgroup.procs
"${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt
Result:
* 11000 IOPS for the high-priority job
* 40 IOPS for the low-priority job
If the prio aging expiry time is changed from 100s to 0, the IOPS results
change to 6712 and 6796 IOPS.
The max-iops script is a script that runs fio with the following arguments:
--bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
--norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
--iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
--iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
--filename=${positional_argument_1}
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Change-Id: I6eea845db892741089014853e7f5c5756b44288e
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20210927220328.1410161-5-bvanassche@acm.org
[axboe: @latest -> @latest_start]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 274474142
(cherry picked from commit 322cff70d4)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Calculating the sum over all CPUs of per-CPU counters frequently is
inefficient. Hence switch from per-CPU to individual counters. Three
counters are protected by the mq-deadline spinlock since these are
only accessed from contexts that already hold that spinlock. The fourth
counter is atomic because protecting it with the mq-deadline spinlock
would trigger lock contention.
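The trade-off can be sketched as follows. This is a simplified
illustration, not the mq-deadline code: the struct layouts, the counter
names and DEMO_NR_CPUS are assumptions. With per-CPU counters every read
has to walk all CPUs; with individual counters a read is a single
subtraction, and only the counter updated outside the lock needs to be
atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8

/* Per-CPU layout: cheap to update, but reading requires a full sum. */
struct demo_percpu_stats {
	uint32_t inserted[DEMO_NR_CPUS];
	uint32_t completed[DEMO_NR_CPUS];
};

uint32_t demo_percpu_queued(const struct demo_percpu_stats *s)
{
	uint32_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += s->inserted[cpu] - s->completed[cpu];
	return sum;
}

/* Individual counters: three plain (lock-protected), one atomic. */
struct demo_stats {
	uint32_t inserted;     /* updated only with the scheduler lock held */
	uint32_t merged;       /* likewise */
	uint32_t dispatched;   /* likewise */
	atomic_uint completed; /* updated on completion without that lock */
};

void demo_stats_init(struct demo_stats *s)
{
	s->inserted = 0;
	s->merged = 0;
	s->dispatched = 0;
	atomic_init(&s->completed, 0);
}

void demo_complete(struct demo_stats *s)
{
	atomic_fetch_add(&s->completed, 1);
}

uint32_t demo_queued(struct demo_stats *s)
{
	return s->inserted - atomic_load(&s->completed);
}
```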
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Change-Id: If9a323c47dfa6aa1c61d0d43a0b1bfed92e137d8
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 274474142
(cherry picked from commit bce0363ed8)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The scheduler .insert_requests() callback is called when a request is
queued for the first time and also when it is requeued. Only count a
request the first time it is queued. Additionally, since the mq-deadline
scheduler only performs zone locking for requests that have been
inserted, skip the zone unlock code for requests that have not been
inserted into the mq-deadline scheduler.
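The accounting rule can be sketched as follows. This is a toy model with
hypothetical demo_* names, not the scheduler code: a per-request flag
records whether the request has been accounted, insertion counts it only
once, and the finish path skips its bookkeeping for requests that never
went through insertion.

```c
#include <assert.h>

struct demo_request {
	int counted; /* set once the request has been inserted and accounted */
};

struct demo_sched {
	unsigned int inserted;
	unsigned int completed;
};

/* Called on first queueing and again on every requeue. */
void demo_insert_request(struct demo_sched *sd, struct demo_request *rq)
{
	if (!rq->counted) {
		rq->counted = 1;
		sd->inserted++;
	}
	/* Queueing proper would follow here. */
}

/* Called when the request is finished, whether or not it was inserted. */
void demo_finish_request(struct demo_sched *sd, struct demo_request *rq)
{
	/* Never inserted: nothing was counted and no zone was locked. */
	if (!rq->counted)
		return;
	sd->completed++;
	/* Zone unlock would follow here. */
}
```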
Fixes: 38ba64d12d ("block/mq-deadline: Track I/O statistics")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Change-Id: I75923e60be67bd6da62ac25acd7d0635151d99f5
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210927220328.1410161-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 274474142
(cherry picked from commit e2c7275dc0)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The attempts so far to make write pipelining work are unsuccessful.
Revert commit 10d6ef4ce0 until write
pipelining works reliably.
Bug: 274474142
Change-Id: Ie7fd92c40ddefd1803b15329a3b1bd1d94012365
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add a new vendor hook for when the cpuset of a task changes. This allows
Pixel to find a more energy-efficient CPU instead of random distribution.
Bug: 236775946
Change-Id: I407637c85e2ea93585877312f090981fee848979
Signed-off-by: Jing-Ting Wu <Jing-Ting.Wu@mediatek.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
This reverts commit a027f0d72e. Multiple
partners have requested this hook, which has resulted in two
different versions -- android_rvh_set_cpus_allowed_by_task and
android_rvh_set_cpus_allowed_ptr_locked. These have since been
consolidated into a single vendor hook on android-mainline
(https://r.android.com/2135713). So let's update this branch to only
use android_rvh_set_cpus_allowed_by_task().
Bug: 236775946
Change-Id: I86f08021d6d87be96f559e133ccd09031bd1b8cd
Signed-off-by: Will McVicker <willmcvicker@google.com>