commit 13d7a08352 upstream.
The macros for building the kpti trampoline are all behind
CONFIG_UNMAP_KERNEL_AT_EL0, and in a region that outputs to the
.entry.tramp.text section.
Move the macros out so they can be used to generate other kinds of
trampoline. Only the symbols need to be guarded by
CONFIG_UNMAP_KERNEL_AT_EL0 and appear in the .entry.tramp.text section.
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7ce53531765df5ac4eb9a4d814ab561a6df76931
commit ed50da7764 upstream.
The tramp_ventry macro uses tramp_vectors as the address of the vectors
when calculating which ventry in the 'full fat' vectors to branch to.
While there is one set of tramp_vectors, this will be true.
Adding multiple sets of vectors will break this assumption.
Move the generation of the vectors to a macro, and pass the start
of the vectors as an argument to tramp_ventry.
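
As a rough illustration only (a C model, not the kernel's assembly), the calculation can be thought of as an offset from the start of whichever vector set was entered. The function and parameter names below are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model: given the address of the trampoline ventry that was
 * taken and the start of the vector set it belongs to, return the address
 * of the matching ventry in the full-fat vectors.
 */
static uintptr_t full_fat_ventry(uintptr_t tramp_entry,
				 uintptr_t tramp_set_start,
				 uintptr_t full_vectors)
{
	/* Offset of the entry within its own set of trampoline vectors. */
	uintptr_t offset = tramp_entry - tramp_set_start;

	return full_vectors + offset;
}
```

With a single set, tramp_set_start is always the same address. Once a second set exists, using the first set's start for an entry in the second set would give the wrong offset, which is why the start is passed in.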
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idff2a8dcbb088edb40ea0714212ec966d4c065da
commit 6c5bf79b69 upstream.
Systems using kpti enter and exit the kernel through a trampoline mapping
that is always mapped, even when the kernel is not. tramp_alias is a macro
to find the address of a symbol in the trampoline mapping.
Adding extra sets of vectors will expand the size of the .entry.tramp.text
section to beyond 4K. tramp_alias will be unable to generate addresses
for symbols beyond 4K as it uses the 12 bit immediate of the add
instruction.
As there are now two registers available when tramp_alias is called,
use the extra register to avoid the 4K limit of the 12 bit immediate.
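
The limit can be sketched in C as follows. This is a simplified model under stated assumptions, not the macro itself: it ignores the optional shifted form of the add immediate, and the helper names and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An AArch64 add-immediate encodes an unsigned 12-bit value (0..4095). */
static bool fits_add_imm12(uint64_t offset)
{
	return offset < (1u << 12);
}

/*
 * Hypothetical model of the alias calculation: the symbol's offset from the
 * start of the trampoline text is added to the base of the always-mapped
 * alias. With a second register the offset can be held in full instead of
 * being squeezed into the immediate field.
 */
static uint64_t tramp_alias_addr(uint64_t valias_base, uint64_t sym,
				 uint64_t tramp_text_start)
{
	/* In the assembly this value would live in the extra register. */
	uint64_t offset = sym - tramp_text_start;

	return valias_base + offset;
}
```

A symbol 0x1010 bytes into the trampoline text fails fits_add_imm12() but is still handled by tramp_alias_addr().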
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7a70f79baeacb3c8d12c396009968e563a2e69a
commit c091fb6ae0 upstream.
The trampoline code has a data page that holds the address of the vectors,
which is unmapped when running in user-space. This ensures that with
CONFIG_RANDOMIZE_BASE, the randomised address of the kernel can't be
discovered until after the kernel has been mapped.
If the trampoline text page is extended to include multiple sets of
vectors, it will be larger than a single page, making it tricky to
find the data page without knowing the size of the trampoline text
pages, which will vary with PAGE_SIZE.
Move the data page to appear before the text page. This allows the
data page to be found without knowing the size of the trampoline text
pages. 'tramp_vectors' is used to refer to the beginning of the
.entry.tramp.text section; do that explicitly.
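
A minimal sketch of the difference, assuming the data page sits directly next to the text in both layouts. The helper names are hypothetical and page_size stands in for PAGE_SIZE.

```c
#include <stdint.h>

/* Old layout: data follows the text, so the text size must be known. */
static uintptr_t data_page_after_text(uintptr_t text_start,
				      uintptr_t text_pages,
				      uintptr_t page_size)
{
	return text_start + text_pages * page_size;
}

/* New layout: data precedes the text, so only the text start is needed. */
static uintptr_t data_page_before_text(uintptr_t text_start,
				       uintptr_t page_size)
{
	return text_start - page_size;
}
```

The second helper has no text_pages argument, so it keeps working when the trampoline text grows past one page.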
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia72d8b1946c6c63226be2f43c75f1924d027946e
commit 03aff3a77a upstream.
Kpti stashes x30 in far_el1 while it uses x30 for all its work.
Making the vectors a per-cpu data structure will require a second
register.
Allow tramp_exit two registers before it unmaps the kernel, by
leaving x30 on the stack, and stashing x29 in far_el1.
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib61b44ff4dcd83c18888fb60150c563a577518bf
commit d739da1694 upstream.
Subsequent patches will add additional sets of vectors that use
the same tricks as the kpti vectors to reach the full-fat vectors.
The full-fat vectors contain some cleanup for kpti that is patched
in by alternatives when kpti is in use. Once there are additional
vectors, the cleanup will be needed in more cases.
But on big/little systems, the cleanup would be harmful if no
trampoline vector were in use. Instead of forcing CPUs that don't
need a trampoline vector to use one, make the trampoline cleanup
optional.
Entry at the top of the vectors will skip the cleanup. The trampoline
vectors can then skip the first instruction, triggering the cleanup
to run.
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia62262831514064e298fd796e260969a8a46a0f8
commit 1b33d4860d upstream.
The spectre-v4 sequence includes an SMC from the assembly entry code.
spectre_v4_patch_fw_mitigation_conduit is the patching callback that
generates an HVC or SMC depending on the SMCCC conduit type.
As this isn't specific to spectre-v4, rename it
smccc_patch_fw_mitigation_conduit so it can be re-used.
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie99c947a2d728bfb55738dda3833b2efeabb6e2e
commit 4330e2c5c0 upstream.
Subsequent patches add even more code to the ventry slots.
Ensure kernels that overflow a ventry slot don't get built.
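
The idea of failing the build on overflow can be sketched in C. This is an analogy rather than the mechanism the patch uses in the assembly source; the macro and function names are hypothetical. AArch64 vector entries are 0x80 bytes apart, which leaves room for 32 instructions per slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VENTRY_SLOT_BYTES 128u	/* AArch64 vector entries are 0x80 bytes apart */
#define INSN_BYTES 4u		/* every A64 instruction is 4 bytes */

/* Hypothetical run-time form of the check. */
static bool ventry_fits(size_t n_insns)
{
	return n_insns * INSN_BYTES <= VENTRY_SLOT_BYTES;
}

/* Build-time form: compilation stops if the condition is false. */
_Static_assert(32 * INSN_BYTES <= VENTRY_SLOT_BYTES, "ventry slot overflow");
```

Changing the 32 in the static assertion to 33 makes the build fail, which is the behaviour wanted for an oversized ventry.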
Bug: 215557547
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I03d4525ca81ff83c726a9a43258604b9e165fadc
Now that RCU_BOOST handles CFS threads, enable it.
Bug: 217236054
Test: ensure CFS threads are boosted, TH
Signed-off-by: Tim Murray <timmurray@google.com>
Change-Id: Idd02467f6caad063e14aa5496617b8bbaf0e9ab1
Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state. Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing. Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.
This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency. However, some users now need
sub-100-millisecond response-time constraints.
This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Joel Fernandes <joelaf@google.com>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Tim Murray <timmurray@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Bug: 217236054
Bug: 224756824
(cherry picked from commit 10c5357874
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
rcu/next)
Change-Id: Iac442f4cb0648c967ec65d4df0f74c8c25940393
Signed-off-by: Kyle Lin <kylelin@google.com>
commit b1a384d2cb upstream.
The kernel test robot discovered that building without
HARDEN_BRANCH_PREDICTOR issues a warning due to a missing
argument to pr_info().
Add the missing argument.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9dd78194a3 ("ARM: report Spectre v2 status through sysfs")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 90f59cc2f2)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I48fc9dff04f5cf292f069e7544f015d8c8322116
commit 36168e387f upstream.
ld.lld does not support the NOCROSSREFS directive at the moment, which
breaks the build after commit b9baf5c8c5 ("ARM: Spectre-BHB
workaround"):
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:34: AT expected, but got NOCROSSREFS
Support for this directive will eventually be implemented, at which
point a version check can be added. To avoid breaking the build in the
meantime, just define NOCROSSREFS to nothing when using ld.lld, with a
link to the issue for tracking.
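
The effect of defining the directive to nothing can be shown with the C preprocessor, which also processes the linker script. This is a sketch: it assumes the build exposes a symbol such as CONFIG_LD_IS_LLD (defined by hand here), and the stringification helpers exist only to make the expansion visible.

```c
#include <string.h>

/* Assumption for this demo: pretend the kernel is being linked with ld.lld. */
#define CONFIG_LD_IS_LLD 1

#ifdef CONFIG_LD_IS_LLD
#define NOCROSSREFS	/* expands to nothing, so the directive disappears */
#endif

#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the text NOCROSSREFS expands to; empty when it is defined away. */
static const char *nocrossrefs_expansion(void)
{
	return STR(NOCROSSREFS);
}
```

With the macro defined this way, ld.lld never sees the keyword it cannot parse, at the cost of losing the cross-reference check on that linker.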
Cc: stable@vger.kernel.org
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Link: https://github.com/ClangBuiltLinux/linux/issues/1609
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8c4192d126)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I82c14c8fbe885d76acb95d85289211979e0b298e
commit 33970b031d upstream.
In the recent Spectre BHB patches, there was a typo that is only
exposed in certain configurations: mcr p15,0,XX,c7,r5,4 should have
been mcr p15,0,XX,c7,c5,4
Reported-by: kernel test robot <lkp@intel.com>
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1749b553d7)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie941d37cac28fead32d2f3ca9a31038ec3efaa79
Document the functionality of disable_dma32 as introduced in commit
c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make
ZONE_DMA32 empty").
Bug: 199917449
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I32ab2969f59fcc49e9ac49e7e6b545f816d120f9
(cherry picked from commit 135406cecb)
zone_dma32_is_empty() currently lacks the proper validation to ensure
that the NUMA node ID it receives as an argument is valid. This has no
effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the
same pglist_data on these devices, but on kernels with CONFIG_NUMA=y,
this is not the case, and the node passed to NODE_DATA must be
validated.
Rather than trying to find the node containing ZONE_DMA32, replace
calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which
iterates over all nodes and returns false if one of the nodes holds
DMA32 and it is non-empty).
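
The all-nodes check can be sketched as below. The struct is a hypothetical stand-in for per-node data, not the kernel's pglist_data, and the function name is chosen for this example only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node summary used only in this sketch. */
struct node_info {
	unsigned long dma32_pages;	/* pages present in this node's ZONE_DMA32 */
};

/* True only if no node has a populated DMA32 zone. */
static bool all_dma32_zones_empty(const struct node_info *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].dma32_pages != 0)
			return false;
	}
	return true;
}
```

Because every node is visited, the caller never has to supply, or validate, a node ID.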
Bug: 199917449
Fixes: c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty")
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb
(cherry picked from commit bf96382fb9)
ZONE_DMA32 is enabled by default on android12-5.10, yet it is not
needed for all devices, nor is it desirable to have if not needed. For
instance, if a partner in GKI 1.0 did not use ZONE_DMA32, memory can
be lower for ZONE_NORMAL relative to older targets, such that memory
would run out more quickly in ZONE_NORMAL leading kswapd to be invoked
unnecessarily.
Correspondingly, provide a means of making ZONE_DMA32 empty via the
kernel command line when it is compiled in via CONFIG_ZONE_DMA32.
Bug: 199917449
Change-Id: I70ec76914b92e518d61a61072f0b3cb41cb28646
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
(cherry picked from commit c3c2bb34ac)
As pointed out by Evgenii Stepanov one potential issue with the new ABI for
enabling asymmetric mode is that if there are multiple places where MTE is
configured in a process, some of which were compiled with the old prctl.h
and some of which were compiled with the new prctl.h, there may be problems
keeping track of which MTE modes are requested. For example some code may
disable only sync and async modes leaving asymmetric mode enabled when it
intended to fully disable MTE.
In order to avoid such mishaps remove asymmetric mode from the prctl(),
instead implicitly allowing it if both sync and async modes are requested.
This should not disrupt userspace since a process requesting both may
already see a mix of sync and async modes due to differing defaults between
CPUs or changes in default while the process is running but it does mean
that userspace is unable to explicitly request asymmetric mode without
changing the system default for CPUs.
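
The resulting rule can be modelled roughly as follows. The flag values are hypothetical and are not the prctl.h constants, and the fallback order in resolve_mode() is an assumption made for the sketch.

```c
/* Hypothetical flag values for this sketch; not the prctl.h constants. */
#define TCF_SYNC  (1u << 0)
#define TCF_ASYNC (1u << 1)
#define TCF_ASYMM (1u << 2)

/*
 * Modes a task may run in: asymmetric is never requested directly, it is
 * allowed implicitly once both sync and async have been requested.
 */
static unsigned int allowed_modes(unsigned int requested)
{
	unsigned int allowed = requested & (TCF_SYNC | TCF_ASYNC);

	if ((allowed & TCF_SYNC) && (allowed & TCF_ASYNC))
		allowed |= TCF_ASYMM;
	return allowed;
}

/* Use the CPU's preferred mode when allowed (fallback order is assumed). */
static unsigned int resolve_mode(unsigned int requested, unsigned int cpu_pref)
{
	unsigned int allowed = allowed_modes(requested);

	if (allowed & cpu_pref)
		return cpu_pref;
	if (allowed & TCF_ASYNC)
		return TCF_ASYNC;
	if (allowed & TCF_SYNC)
		return TCF_SYNC;
	return 0;
}
```

In this model, code that clears both sync and async also drops asymmetric mode, which avoids the mishap described above.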
Reported-by: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Branislav Rankov <branislav.rankov@arm.com>
Link: https://lore.kernel.org/r/20220309131200.112637-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit cf220ad674
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/mte)
Bug: 217221156
Change-Id: I04eb365809b96a73f438f19069265ca901516bb5
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
MTE3 adds a new mode which is synchronous for reads but asynchronous for
writes. Document the userspace ABI for this feature: the new mode is
called ASYMM, and a new prctl flag and mte_tcf_preferred value are added
for it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220216173224.2342152-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 3f9ab2a698
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/mte)
Bug: 217221156
Change-Id: Ib42652bf2d4924b201274454b98299574c8a5fad
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
This FROMLIST change has been updated. Reverting to be replaced with the
final version FROMGIT.
This reverts commit 926ce98105.
Bug: 217221156
Change-Id: I4e5c19675fc88987da9804a39a050ef050e2453a
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
In for_each_object_track we go through the metadata of the slab
object in function(fn), and as a result a false positive out-of-bounds
access is reported by kasan. Fix this by wrapping that function call
with metadata_access_enable/disable.
Bug: 222651868
Fixes: ee8d2c7884 ("ANDROID: mm: add get_each_object_track function")
Change-Id: Ifb4241a9c3e397a52759d467aa267d1297e297dd
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
(cherry picked from commit cd6e5d5d7d)
Use rvh instead of vh for iommu_setup_dma_ops to prevent
sleeping-while-atomic bugs, as mutexes are used to serialize
access to iova regions and GFP_KERNEL allocations are used as well.
Bug: 214353193
Change-Id: I45f8f0404a247b67fd07a6831ff813bbc50fbca2
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
Changes in 5.10.104
mac80211_hwsim: report NOACK frames in tx_status
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
i2c: bcm2835: Avoid clock stretching timeouts
ASoC: rt5668: do not block workqueue if card is unbound
ASoC: rt5682: do not block workqueue if card is unbound
regulator: core: fix false positive in regulator_late_cleanup()
Input: clear BTN_RIGHT/MIDDLE on buttonpads
KVM: arm64: vgic: Read HW interrupt pending state from the HW
tipc: fix a bit overflow in tipc_crypto_key_rcv()
cifs: fix double free race when mount fails in cifs_get_root()
selftests/seccomp: Fix seccomp failure by adding missing headers
dmaengine: shdma: Fix runtime PM imbalance on error
i2c: cadence: allow COMPILE_TEST
i2c: qup: allow COMPILE_TEST
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
usb: gadget: don't release an existing dev->buf
usb: gadget: clear related members when goto fail
exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
exfat: fix i_blocks for files truncated over 4 GiB
tracing: Add test for user space strings when filtering on string pointers
serial: stm32: prevent TDR register overwrite when sending x_char
ata: pata_hpt37x: fix PCI clock detection
drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
tracing: Add ustring operation to filtering string pointers
ALSA: intel_hdmi: Fix reference to PCM buffer address
riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix config KASAN && DEBUG_VIRTUAL
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
iommu/amd: Recover from event log overflow
drm/i915: s/JSP2/ICP2/ PCH
xen/netfront: destroy queues before real_num_tx_queues is zeroed
thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
ntb: intel: fix port config status offset for SPR
mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
xfrm: fix MTU regression
netfilter: fix use-after-free in __nf_register_net_hook()
bpf, sockmap: Do not ignore orig_len parameter
xfrm: fix the if_id check in changelink
xfrm: enforce validity of offload input flags
e1000e: Correct NVM checksum verification flow
net: fix up skbs delta_truesize in UDP GRO frag_list
netfilter: nf_queue: don't assume sk is full socket
netfilter: nf_queue: fix possible use-after-free
netfilter: nf_queue: handle socket prefetch
batman-adv: Request iflink once in batadv-on-batadv check
batman-adv: Request iflink once in batadv_get_real_netdevice
batman-adv: Don't expect inter-netns unique iflink indices
net: ipv6: ensure we call ipv6_mc_down() at most once
net: dcb: flush lingering app table entries for unregistered devices
net/smc: fix connection leak
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
rcu/nocb: Fix missed nocb_timer requeue
ice: Fix race conditions between virtchnl handling and VF ndo ops
ice: fix concurrent reset and removal of VFs
sched/topology: Make sched_init_numa() use a set for the deduplicating sort
sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
ia64: ensure proper NUMA distance and possible map initialization
mac80211: fix forwarded mesh frames AC & queue selection
net: stmmac: fix return value of __setup handler
mac80211: treat some SAE auth steps as final
iavf: Fix missing check for running netdev
net: sxgbe: fix return value of __setup handler
ibmvnic: register netdev after init of adapter
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
efivars: Respect "block" flag in efivar_entry_set_safe()
firmware: arm_scmi: Remove space in MODULE_ALIAS name
ASoC: cs4265: Fix the duplicated control name
can: gs_usb: change active_channels's type from atomic_t to u8
arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
igc: igc_read_phy_reg_gpy: drop premature return
ARM: Fix kgdb breakpoint for Thumb2
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
selftests: mlxsw: tc_police_scale: Make test more robust
pinctrl: sunxi: Use unique lockdep classes for IRQs
igc: igc_write_phy_reg_gpy: drop premature return
ibmvnic: free reset-work-item when flushing
memfd: fix F_SEAL_WRITE after shmem huge page allocated
s390/extable: fix exception table sorting
ARM: dts: switch timer config to common devkit8000 devicetree
ARM: dts: Use 32KiHz oscillator on devkit8000
soc: fsl: guts: Revert commit 3c0d64e867
soc: fsl: guts: Add a missing memory allocation failure check
soc: fsl: qe: Check of ioremap return value
ARM: tegra: Move panels to AUX bus
ibmvnic: complete init_done on transport events
net: chelsio: cxgb3: check the return value of pci_find_capability()
iavf: Refactor iavf state machine tracking
nl80211: Handle nla_memdup failures in handle_nan_filter
drm/amdgpu: fix suspend/resume hang regression
net: dcb: disable softirqs in dcbnl_flush_dev()
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Input: samsung-keypad - properly state IOMEM dependency
HID: add mapping for KEY_DICTATE
HID: add mapping for KEY_ALL_APPLICATIONS
tracing/histogram: Fix sorting on old "cpu" value
tracing: Fix return value of __setup handlers
btrfs: fix lost prealloc extents beyond eof after full fsync
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
btrfs: add missing run of delayed items after unlink during log replay
Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
hamradio: fix macro redefine warning
Linux 5.10.104
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6db85dae2ee6420dfab7fc72fe79acdb74560637
The pKVM hypervisor currently zeroes all the pages mapped into guests
when tearing them down for confidentiality reasons. However, for pages
that are shared with the host this is unnecessary at best as the content
of memory is already visible. This is particularly bad for non-protected
guests as all their memory is shared with the host by definition.
Add a new flag to distinguish pages that solely need to be updated from
an ownership perspective and those that need to be zeroed.
NOTE: We should probably overhaul the teardown procedure at some point
to avoid the proliferation of those flags, but that would require
significant changes so we might not want that in Android 13.
Bug: 223678931
Change-Id: Icefc85a0bdcdf9958e9eb6871c794f68b06a007f
Signed-off-by: Quentin Perret <qperret@google.com>
The pKVM shadow table is protected by 'shadow_lock', however this lock
is only taken across relatively fine-grained calls when inserting and
removing entries from the table. This poses a problem for higher-level
functions such as __pkvm_init_shadow(), where a partially-initialised
shadow entry is made transiently visible to get_shadow_vcpu() and could
potentially be loaded in an inconsistent state by another CPU.
Push the locking out of the insert/remove functions and up into
__pkvm_{init,teardown}_shadow() so that the shadow state always appears
to be consistent as long as the lock is held.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I74c563a539c1ce35f5da86a8281e47c7d435bd27
There's no reason to make the internal shadow table data directly
accessible outside of pkvm.c, so make it all static and provide an
initialisation function to install the initial pages.
Bug: 216808671
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idc0908796ebbd2b620494f5d4d6b6055455c8013
Add three new symbols to the aarch64 kernel ABI. These are to be
called from vendor modules to register an IOMMU with pKVM and
notify the hypervisor about its PM events.
New symbols:
- pkvm_iommu_s2mpu_register
- pkvm_iommu_suspend
- pkvm_iommu_resume
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I9797326a54cba6abd1b233682379de10139c2303
With new generic IOMMU code in place, and with all S2MPU code
having been migrated to the new pkvm_iommu_ops callbacks, remove
all the now unused code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I6abc7ef0f400250cbb38a673feb1db35116c3f69
Remove the existing 's2mpu_host_stage2_set_owner' hook implementation
and refactor the code to match the prepare/apply split of the generic
IOMMU callbacks for updating host stage-2 mappings.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: If550fe2c41198c320559c8125ec9ecc0479eb249
Core SMPT manipulation code returns mpt_update_flags, signalling whether
the caller should flush the dcache (MPT_UPDATE_L2) or write new L1ATTR
values to S2MPU MMIO registers (MPT_UPDATE_L1).
In preparation for splitting the code into driver-global and
per-device portions, store the value in the corresponding FMPT.
As long as the two code portions are called from a single critical
section, the FMPT value is guaranteed to not change.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Iec06697e8826b0dba682476b39cf64acd6337166
Previously the S2MPU DABT handler would be called directly from the host
DABT handler and it would look up the corresponding S2MPU device. Now the
lookup is done in the generic IOMMU DABT handler and only the actual
S2MPU register access is left to the driver itself.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I5236cf01b9e1dcc65a00081797a13ee92a4e263b
The host is now expected to notify EL2 about PM state changes of
individual IOMMU devices. Remove the old code that intercepted SMCs
and instead rely on callbacks from the core IOMMU code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I2dd49836c01e405562ba62c00efc711f084d5963
Create 'struct pkvm_iommu_ops' for the S2MPU and add a new driver ID to the
list of IOMMU drivers. Implement the 'init' callback, accepting donated
memory from the host to back SMPTs. If the donation is successful,
the SMPTs are assigned to 'host_mpt'.
Export 'pkvm_iommu_s2mpu_register' for a kernel module to call to
register an S2MPU device. The first call to this function will also
run the global S2MPU driver initializer.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3d1aaf8535114beae956993674a2b436414f07c4
The function is superseded by the generic
pkvm_iommu_host_stage2_adjust_range, remove it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I7d138a3c2e2497bdc19e6e6e95b7870ac48d890e
Replace all uses of 'struct s2mpu' with the generic 'struct pkvm_iommu'.
'struct s2mpu_drv_data' is created to accommodate driver-specific values
associated with 'struct pkvm_iommu' and allocated by the generic code.
These changes are safe because the S2MPU code is currently unused.
The EL1 code that initialized it has already been removed.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib12b64ffa3281be83440e33cdd032d80df6e4868
EL2 S2MPU driver relied on EL1 code which parsed the DT and populated
EL2 driver data before deprivileging of the host. The driver is now
moving to later initialization from kernel modules, which will take over
the role of parsing the DT and power management. Remove the unused code.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I96542ceeec4fcf1040658779a922363b1e41e976
S2MPU code previously assumed that all S2MPUs were powered on at boot
and would check the version register and precompute the value of
S2MPU.CONTEXT_CFG_VALID_VID.
With EL1 S2MPU code being removed, and to allow for S2MPUs not powered
at boot, move the code to EL2 and run it on resume.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: Ib3c3926c1ed3b78fe39769758a8d66490963d4c5