enter_exception64() performs an MTE check, which involves dereferencing
vcpu->kvm. While vcpu has already been fixed up to be a HYP VA pointer,
kvm is still a pointer in the kernel VA space.
This only affects nVHE configurations with MTE enabled, as in other
cases, the pointer is either valid (VHE) or not dereferenced (!MTE).
Fix this by first converting kvm to a HYP VA pointer.
Fixes: ea7fc1bb1c ("KVM: arm64: Introduce MTE VM feature")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221027120945.29679-1-ryan.roberts@arm.com
(cherry picked from commit b6bcdc9f6b)
[willdeacon@: Fixed conflict with aosp/2038249 rework moving MTE feature
check into caller]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: Id0aac0fc38dff2569081910af7468ecf97b6eca3
In commit 720c241924 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to tear down and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Note that this patch is specific to stable branches 5.4 and 5.10, since in
newer kernel releases binder no longer caches a pointer to the vma.
Instead, it has been refactored to use vma_lookup() which avoids the
issue described here. This switch was introduced in commit a43cfc87ca
("android: binder: stop saving a pointer to the VMA").
Bug: 254837884
Link: https://lore.kernel.org/all/20221104175450.306810-1-cmllamas@google.com/
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org> # 5.10.x
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Change-Id: Ieabadbfa30f99812da9c226cf1ddd5e60f62c607
This is a stable-specific patch.
I botched the stable-specific rewrite of
commit b67fbebd4c ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"):
As Hugh pointed out, unmap_region() actually operates on a list of VMAs,
and the variable "vma" merely points to the first VMA in that list.
So if we want to check whether any of the VMAs we're operating on is
PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.
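As a rough userspace illustration of the point above (not the kernel code), the sketch below walks a singly linked list and reports whether any entry carries one of the two flags. The struct layout and the flag values are assumptions made for the example; only the flag names come from this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values and a cut-down VMA; not the kernel definitions. */
#define VM_PFNMAP   0x1u
#define VM_MIXEDMAP 0x2u

struct vma {
	unsigned int flags;
	struct vma *next;
};

/* Walk the whole list instead of testing only the first entry. */
bool any_pfn_or_mixed(const struct vma *first)
{
	for (const struct vma *v = first; v; v = v->next) {
		if (v->flags & (VM_PFNMAP | VM_MIXEDMAP))
			return true;
	}
	return false;
}
```

Checking only `first->flags` would miss a flagged VMA further down the list, which is the mistake described above.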
Bug: 245812080
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3998dc50eb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I115183f65fc7df5d33264e6211adcd2ec531d996
[ Upstream commit ba953a9d89 ]
When namespace support was added to xfrm/afkey, it caused the
previously single-threaded call to xfrm_probe_algs to become
multi-threaded. This is buggy and needs to be fixed with a mutex.
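A minimal userspace sketch of the idea, assuming a pthread mutex in place of the kernel mutex: the probe loop updates shared per-algorithm state, so the whole loop runs under one lock. The table size, the names and the "always available" result are placeholders, not the xfrm implementation.

```c
#include <pthread.h>

#define N_ALGS 4                       /* placeholder table size */

static pthread_mutex_t probe_mutex = PTHREAD_MUTEX_INITIALIZER;
static int alg_available[N_ALGS];      /* shared state written by the probe */

/* Probe every algorithm under the mutex; returns how many are available. */
int probe_algs(void)
{
	int i, n = 0;

	pthread_mutex_lock(&probe_mutex);
	for (i = 0; i < N_ALGS; i++) {
		alg_available[i] = 1;  /* stands in for a real availability query */
		n++;
	}
	pthread_mutex_unlock(&probe_mutex);

	return n;
}
```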
Bug: 245674737
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: I71fb89a999447862a6c4b1ff754378bb0452ad3a
Signed-off-by: Lee Jones <joneslee@google.com>
commit b67fbebd4c upstream.
Some drivers rely on having all VMAs through which a PFN might be
accessible listed in the rmap for correctness.
However, on X86, it was possible for a VMA with stale TLB entries
to not be listed in the rmap.
This was fixed in mainline with
commit b67fbebd4c ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"),
but that commit relies on preceding refactoring in
commit 18ba064e42 ("mmu_gather: Let there be one tlb_{start,end}_vma()
implementation") and commit 1e9fdf21a4 ("mmu_gather: Remove per arch
tlb_{start,end}_vma()").
This patch provides equivalent protection without needing that
refactoring, by forcing a TLB flush between removing PTEs in
unmap_vmas() and the call to unlink_file_vma() in free_pgtables().
Bug: 245812080
[This is a stable-specific rewrite of the upstream commit!]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8f539ff0365fb9b5d10fddb84082d5995348b897
Memory donated to the hypervisor needs to be contiguous, which
might be difficult to find. To improve the odds of finding
contiguous memory, break up vcpu state donations per vcpu.
Bug: 232070947
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: Iff19b2e2b6ca58b1e6ef38c4b0f16c80dae34ab9
This is done as the first step towards donating memory per vcpu
in future patches without having to spend potentially too much
time in one hypercall.
Moreover, this has the nice effect of removing the need for
stashing the host vcpus in the memory donated for the pgd.
Bug: 232070947
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I491c358fa29dd62ffc45347d6288696c846d5fc3
Factor out unpinning a single host vcpu from unpin_host_vcpus(),
since it will be used in a future patch in the error path.
No functional change intended.
Bug: 232070947
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I321e41ae624b2daae8fc917432be0673e32235aa
Facilitates future patches that move the initialization of the
shadow vcpu to a separate hyp call.
Removed unused parameter (vcpu_array/pgd) from
init_shadow_structs().
No functional change intended.
Bug: 232070947
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I5c3116e7558d958c03ea28dc5610122696a1fca2
Tidies up code and enables the reuse of this function.
No functional change intended.
Bug: 232070947
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I3a93dd0284e3c177b12d0cabf5e99747dceb0fb4
We need to carry state over from zap_pte_range_tlb_start to
zap_pte_range_tlb_end.
A new variable on the function's stack records whether
trace_android_vh_zap_pte_range_tlb_start was called and
passes that state to trace_android_vh_zap_pte_range_tlb_end.
Thus, trace_android_vh_zap_pte_range_tlb_end knows whether
trace_android_vh_zap_pte_range_tlb_start was called.
If it was called, trace_android_vh_zap_pte_range_tlb_end
performs the matching action to complete the pair. Otherwise,
it is skipped.
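The pairing can be pictured with a small standalone sketch. All function names here are hypothetical stand-ins for the vendor hooks, and the counters exist only so the behaviour can be observed.

```c
#include <stdbool.h>

static int start_calls, end_actions;

/* Hypothetical start hook: records through *started whether it acted. */
void hook_start(bool want, bool *started)
{
	if (want) {
		start_calls++;
		*started = true;
	}
}

/* Hypothetical end hook: only undoes work if the start hook ran. */
void hook_end(bool started)
{
	if (started)
		end_actions++;
}

void zap_range(bool want)
{
	bool started = false;   /* state kept on this function's stack */

	hook_start(want, &started);
	/* ... the work between the two hooks ... */
	hook_end(started);
}
```

Keeping the flag on the stack means each invocation pairs its own start and end, with no shared state between callers.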
Bug: 238728493
Bug: 256549265
Change-Id: I95706d51da66f916ede626686483523f3b68dacb
Signed-off-by: Minchan Kim <minchan@google.com>
In aosp/1979327 we attempted to prevent tasks with pending signals and
PF_FREEZER_SKIP from being immediately rescheduled, because such tasks
would crash the kernel if run while no capable CPUs were online. This was
implemented by declining to immediately reschedule them unless various
conditions were met. However, this ended up causing signals to fail to
be delivered if the signal was received while a task is processing a
syscall, such as futex(2), that will block with PF_FREEZER_SKIP set,
as the kernel relies on a check for TIF_SIGPENDING after setting the
task state to TASK_INTERRUPTIBLE in order to deliver such a signal.
This patch is an alternative solution to the original problem that
avoids introducing the signal delivery bug. It works by changing
how freezer_should_skip() is implemented. Instead of just checking
PF_FREEZER_SKIP, we also use the on_rq field to check whether the task
is not on a runqueue. In this way we ensure that a task that will be
immediately rescheduled will not return true from freezer_should_skip(),
and the task will block the freezer unless it is actually taken off
the runqueue.
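A simplified sketch of the combined check, assuming a cut-down task structure and an illustrative flag value (neither matches the kernel definitions):

```c
#include <stdbool.h>

#define PF_FREEZER_SKIP 0x1u   /* illustrative value, not the kernel's */

struct task {
	unsigned int flags;
	int on_rq;             /* non-zero while the task sits on a runqueue */
};

/* Skip the task only if it asked to be skipped AND is off the runqueue. */
bool freezer_should_skip_sketch(const struct task *p)
{
	return (p->flags & PF_FREEZER_SKIP) && !p->on_rq;
}
```

A task that has set the flag but is still queued, and so may run again immediately, is not skipped and therefore still blocks the freezer.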
Signed-off-by: Peter Collingbourne <pcc@google.com>
Bug: 202918514
Bug: 251700836
Change-Id: I3f9b705ce9ad2ca1d2df959f43cf05bef78560f8
Commit ff05d4b45d upstream.
This is a different version of the commit, changed to store
the non-transmitted profile in the elems, and freeing it in
the few places where it's relevant, since that is only the
case when the last argument for parsing (the non-tx BSSID)
is non-NULL.
When we parse a multi-BSSID element, we might point some
element pointers into the allocated nontransmitted_profile.
However, we free this before returning, causing UAF when the
relevant pointers in the parsed elements are accessed.
Fix this by not allocating the scratch buffer separately but
as part of the returned structure instead, that way, there
are no lifetime issues with it.
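The lifetime argument can be sketched in plain C with a flexible array member. The structure and function names below are hypothetical and far simpler than the mac80211 ones; the point is only that the pointer and the buffer it refers to are freed together.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct parsed_elems {
	const uint8_t *profile;  /* points into scratch[] below */
	size_t profile_len;
	uint8_t scratch[];       /* buffer allocated with the structure */
};

/* Copy the profile into the structure's own scratch space. */
struct parsed_elems *parse_elems(const uint8_t *data, size_t len)
{
	struct parsed_elems *e = malloc(sizeof(*e) + len);

	if (!e)
		return NULL;
	memcpy(e->scratch, data, len);
	e->profile = e->scratch;
	e->profile_len = len;
	return e;
}
```

Because `profile` can never outlive `scratch`, a single `free()` of the returned structure releases both and no dangling element pointer remains.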
The scratch buffer introduction as part of the returned data
here is taken from MLO feature work done by Ilan.
This fixes CVE-2022-42719.
Bug: 253642087
Fixes: 5023b14cf4 ("mac80211: support profile split between elements")
Co-developed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I68b07f5850a7ef363d631043d01f58a08aea9274
This is simply not valid and simplifies the next commit.
I'll make a separate patch for this in the current main
tree as well.
Bug: 254180332
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
(cherry picked from commit 353b5c8d4b)
Change-Id: Ie554c036923c94b125035141a3bffafc129a5aa6
commit c90b93b5b7 upstream.
When updating beacon elements in a non-transmitted BSS,
also update the hidden sub-entries to the same beacon
elements, so that a future update through other paths
won't trigger a WARN_ON().
The warning is triggered because the beacon elements in
the hidden BSSes that are children of the BSS should
always be the same as in the parent.
Bug: 254180332
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iea4669ba97b926dfa67e9592b3a263d3f18508e5
commit b2d03cabe2 upstream.
If beacon protection is active but the beacon cannot be
decrypted or is otherwise malformed, we call the cfg80211
API to report this to userspace, but that uses a netdev
pointer, which isn't present for P2P-Device. Fix this to
call it only conditionally to ensure cfg80211 won't crash
in the case of P2P-Device.
This fixes CVE-2022-42722.
Bug: 253642089
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Fixes: 9eaf183af7 ("mac80211: Report beacon protection failures to user space")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie3336b950136e26debbe835f97ad450d03f6baad
commit 1833b6f46d upstream.
If the tool on the other side (e.g. wmediumd) gets confused
about the rate, we hit a warning in mac80211. Silence that
by effectively duplicating the check here and dropping the
frame silently (in mac80211 it's dropped with the warning).
Bug: 254180332
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ieb3a258b998aca815efc5d09492ce66e461b5b88
commit bcca852027 upstream.
If a non-transmitted BSS shares enough information (both
SSID and BSSID!) with another non-transmitted BSS of a
different AP, then we can find and update it, and then
try to add it to the non-transmitted BSS list. We do a
search for it on the transmitted BSS, but if it's not
there (but belongs to another transmitted BSS), the list
gets corrupted.
Since this is an erroneous situation, simply fail the
list insertion in this case and free the non-transmitted
BSS.
This fixes CVE-2022-42721.
Bug: 253642088
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If83261f8b711f5ad0ce922abea2c35fedbc36c39
commit 0b7808818c upstream.
There are multiple refcounting bugs related to multi-BSSID:
- In bss_ref_get(), if the BSS has a hidden_beacon_bss, then
the bss pointer is overwritten before checking for the
transmitted BSS, which is clearly wrong. Fix this by using
the bss_from_pub() macro.
- In cfg80211_bss_update() we copy the transmitted_bss pointer
from tmp into new, but then if we release new, we'll unref
it erroneously. We already set the pointer and ref it, but
need to NULL it since it was copied from the tmp data.
- In cfg80211_inform_single_bss_data(), if adding to the non-
transmitted list fails, we unlink the BSS and yet still we
return it, but this results in returning an entry without
a reference. We shouldn't return it anyway if it was broken
enough to not get added there.
This fixes CVE-2022-42720.
Bug: 253642015
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de>
Fixes: a3584f56de ("cfg80211: Properly track transmitting and non-transmitting BSS")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I408bf72ca59b6ffbe2aba460f3e9326bf1c94eec
commit 567e14e39e upstream.
When iterating the elements here, ensure the length byte is
present before checking it to see if the entire element will
fit into the buffer.
Longer term, we should rewrite this code using the type-safe
element iteration macros that check all of this.
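For illustration, a standalone walker over id/length/payload elements might check both conditions like this. It is a sketch of the general technique, not the cfg80211 code, and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the well-formed id/length/payload elements in buf.
 * Stops at the first element whose header or payload would run
 * past the end of the buffer.
 */
size_t count_elements(const uint8_t *buf, size_t len)
{
	size_t pos = 0, count = 0;

	while (len - pos >= 2) {           /* id and length bytes present */
		size_t elen = buf[pos + 1];

		if (elen > len - pos - 2)  /* payload would overrun */
			break;
		pos += 2 + elen;
		count++;
	}
	return count;
}
```

The loop condition guarantees the length byte exists before it is read, and the inner check guarantees the payload fits before the position advances.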
Bug: 254180332
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I6ece37c57ca56462566adbcac6def6b35dc5b799
commit 8f033d2bec upstream.
Per spec, the maximum value for the MaxBSSID ('n') indicator is 8,
and the minimum is 1 since a multiple BSSID set with just one BSSID
doesn't make sense (the # of BSSIDs is limited by 2^n).
Limit this in the parsing in both cfg80211 and mac80211, rejecting
any elements with an invalid value.
This fixes potentially bad shifts in the processing of these inside
the cfg80211_gen_new_bssid() function later.
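A minimal sketch of such a range check, with hypothetical helper names, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the MaxBSSID indicator n lies in the permitted 1..8 range. */
bool max_bssid_indicator_valid(uint8_t n)
{
	return n >= 1 && n <= 8;
}

/* Upper bound on the number of BSSIDs in the set (2^n); 0 if n is invalid. */
unsigned int max_bssid_count(uint8_t n)
{
	if (!max_bssid_indicator_valid(n))
		return 0;
	return 1u << n;
}
```

Rejecting out-of-range values up front keeps later shifts by n within a small, well-defined range.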
I found this during the investigation of CVE-2022-41674 fixed by the
previous patch.
Bug: 253641805
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Fixes: 78ac51f815 ("mac80211: support multi-bssid")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I7aa0b1a425fcf3a7797e83afa8ad6dd68b283b48
commit aebe9f4639 upstream.
In the copy code of the elements, we do the following calculation
to reach the end of the MBSSID element:
/* copy the IEs after MBSSID */
cpy_len = mbssid[1] + 2;
This looks fine, however, cpy_len is a u8, the same as mbssid[1],
so the addition of two can overflow. In this case the subsequent
memcpy() will overflow the allocated buffer, since it copies 256
bytes too much due to the way the allocation and memcpy() sizes
are calculated.
Fix this by using size_t for the cpy_len variable.
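The truncation is easy to reproduce outside the kernel. The two helpers below are illustrative only and differ solely in the type of cpy_len:

```c
#include <stddef.h>
#include <stdint.h>

/* Buggy variant: the sum is truncated to 8 bits when it is stored. */
size_t cpy_len_u8(const uint8_t *mbssid)
{
	uint8_t cpy_len = mbssid[1] + 2;   /* a length of 255 wraps to 1 */

	return cpy_len;
}

/* Fixed variant: size_t holds the full element length. */
size_t cpy_len_size_t(const uint8_t *mbssid)
{
	size_t cpy_len = (size_t)mbssid[1] + 2;

	return cpy_len;
}
```

For an element whose length byte is 255, the first helper yields 1 and the second 257, a difference of 256 bytes.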
This fixes CVE-2022-41674.
Bug: 253641805
Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de>
Tested-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de>
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I70d3a1188609751797cbabe905028d92d1700f17
Add vendor hook for bh_lru and lru_cache_disable
Bug: 238728493
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I81bfad317cf6e8633186ebb3238644306d7a102d
commit e64242caef upstream.
We need to prevent users from configuring a screen size that is smaller than the
currently selected font size. Otherwise, rendering characters on the screen will
access memory outside the graphics memory region.
This patch adds a new function fbcon_modechange_possible() which
implements this check and which later may be extended with other checks
if necessary. The new function is called from the FBIOPUT_VSCREENINFO
ioctl handler in fbmem.c, which will return -EINVAL if userspace asked
for a too small screen size.
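In simplified form, and ignoring details such as rotation and multiple consoles, the check amounts to comparing the requested resolution with the font cell size. The types and function names in this sketch are stand-ins, not the fbcon ones.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's screen and font descriptions. */
struct screen_size { unsigned int xres, yres; };
struct font_size   { unsigned int width, height; };

/* True when at least one full character cell fits on the screen. */
bool modechange_possible(struct screen_size s, struct font_size f)
{
	return s.xres >= f.width && s.yres >= f.height;
}

/* Mimics the ioctl path: 0 on success, -EINVAL when the mode is too small. */
int set_screeninfo(struct screen_size s, struct font_size f)
{
	return modechange_possible(s, f) ? 0 : -EINVAL;
}
```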
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: b81212828a
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I6ac4cce2aeea4dcca222ea2b395cc2baa1008894
commit 65a01e601d upstream.
Prevent users from setting a font size that is bigger than the physical screen.
It's unlikely this may happen (because screens are usually much larger than the
fonts and each font char is limited to 32x32 pixels), but it may happen on
smaller screens/LCD displays.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: b81212828a
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I47e139779ab835a16d0b6b060e798ad35cad9f9b
The pagevec batching leads to lru_add_drain_all() calls, which are sometimes too expensive.
sometimes. This patch adds a new vendor hook to drain the pagevec
immediately depending on the page's type.
Bug: 251881967
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Id17e14e69197993ddad511a40c96e51674c02834
The SysMMU_SYNC provides an invalidation-complete signal to the
hypervisor. Currently the hypervisor will wait indefinitely for the SYNC
to set the SYNC_COMP_COMPLETE bit. In practice, this can cause a deadlock as
the hypervisor holds the host lock while waiting for the SYNC.
To avoid deadlock, adjust the algorithm to time out after a given number
of reads of the SYNC_COMP register (new constant SYNC_TIMEOUT_BASE).
This can be a small number as most attempts succeed after a single read
of the SFR.
If the wait-loop times out, the hypervisor will try again, multiplying
the maximum number of SFR reads with SYNC_TIMEOUT_MULTIPLIER each time.
This number was selected to grow quickly, in case there is a lot of DMA
traffic that would be slowing down the SYNC request.
Finally, if the hardware does not set the bit even after
SYNC_MAX_RETRIES, the algorithm will give up to avoid deadlock. The
value was selected so that the worst-case time spent in
__wait_for_invalidation_complete() remains tolerable.
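The retry scheme described above can be sketched as a bounded polling loop. The constant names come from this message, but their values here are placeholders, and the register read is abstracted behind a hypothetical callback so the sketch runs in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; this message names the constants but not their values. */
#define SYNC_TIMEOUT_BASE        3u
#define SYNC_TIMEOUT_MULTIPLIER  4u
#define SYNC_MAX_RETRIES         5u

/* Hypothetical accessor: returns true once the completion bit reads as set. */
typedef bool (*sync_read_fn)(void *ctx);

/*
 * Poll up to `timeout` times per attempt, multiply the budget by
 * SYNC_TIMEOUT_MULTIPLIER after each failed attempt, and give up
 * after SYNC_MAX_RETRIES attempts.
 */
bool wait_for_sync(sync_read_fn read_complete, void *ctx)
{
	uint32_t timeout = SYNC_TIMEOUT_BASE;

	for (uint32_t attempt = 0; attempt < SYNC_MAX_RETRIES; attempt++) {
		for (uint32_t i = 0; i < timeout; i++) {
			if (read_complete(ctx))
				return true;
		}
		timeout *= SYNC_TIMEOUT_MULTIPLIER;
	}
	return false;
}

/* Test double: reports completion after *ctx further reads. */
bool fake_sync_read(void *ctx)
{
	uint32_t *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With these placeholder values the worst case is 3 + 12 + 48 + 192 + 768 = 1023 reads, so the loop always terminates even if the bit is never set.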
Bug: 250727777
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I00098753bcc46a894943bbdb3a61acc3a8e5e5d2
__clean_dcache_guest_page() is optimized to elide cache maintenance
operations on CPUs with FWB. The underlying assumption is that FWB is
always used by KVM when available. Although correct in the normal KVM
world, pKVM actively disables FWB for the host stage-2. As such,
omitting CMOs when guest memory is being reclaimed may provide a
malicious host with the ability to read the content of the recently
reclaimed pages.
Fix this by using the lower level kvm_flush_dcache_to_poc() helper
directly from the reclaim path.
Bug: 243501419
Reported-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8e96ef7a8ccab2a59d3df46cd4d1a73190a2f457
Pierre-Clément reports that the error codes returned by the MMIO guard
map hypercall may end up being incorrectly reported as positive to
callers who interpret them as signed 64-bit integers, as specified in the
SMCCC.
Fix this by storing the return value in a 64-bit variable instead.
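The effect can be reproduced in a few lines, assuming the value previously passed through an unsigned 32-bit variable (this message does not state the original type). The helper names are made up for the example.

```c
#include <stdint.h>

/* Buggy path: a negative error code squeezed through 32 unsigned bits. */
int64_t retval_via_u32(int64_t err)
{
	uint32_t ret = (uint32_t)err;     /* keeps only the low 32 bits */

	return (int64_t)(uint64_t)ret;    /* zero-extended: the sign is lost */
}

/* Fixed path: the value stays 64 bits wide throughout. */
int64_t retval_via_u64(int64_t err)
{
	uint64_t ret = (uint64_t)err;

	return (int64_t)ret;
}
```

An error of -1 comes back as 4294967295 through the first helper and as -1 through the second.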
Bug: 253586500
Reported-by: Pierre-Clément Tosi <ptosi@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I3092856ec1a1fd1648a75c9e4ad4bfebd8830d14
4117cebf1a ("psi: Optimize task switch inside shared cgroups")
introduced a race condition that corrupts internal psi state. This
manifests as kernel warnings, sometimes followed by bogusly high IO
pressure:
psi: task underflow! cpu=1 t=2 tasks=[0 0 0 0] clear=c set=0
(schedule() decreasing RUNNING and ONCPU, both of which are 0)
psi: incosistent task state! task=2412744:systemd cpu=17 psi_flags=e clear=3 set=0
(cgroup_move_task() clearing MEMSTALL and IOWAIT, but task is MEMSTALL | RUNNING | ONCPU)
What the offending commit does is batch the two psi callbacks in
schedule() to reduce the number of cgroup tree updates. When prev is
deactivated and removed from the runqueue, nothing is done in psi at
first; when the task switch completes, TSK_RUNNING and TSK_IOWAIT are
updated along with TSK_ONCPU.
However, the deactivation and the task switch inside schedule() aren't
atomic: pick_next_task() may drop the rq lock for load balancing. When
this happens, cgroup_move_task() can run after the task has been
physically dequeued, but the psi updates are still pending. Since it
looks at the task's scheduler state, it doesn't move everything to the
new cgroup that the task switch that follows is about to clear from
it. cgroup_move_task() will leak the TSK_RUNNING count in the old
cgroup, and psi_sched_switch() will underflow it in the new cgroup.
A similar thing can happen for iowait. TSK_IOWAIT is usually set when
a p->in_iowait task is dequeued, but again this update is deferred to
the switch. cgroup_move_task() can see an unqueued p->in_iowait task
and move a non-existent TSK_IOWAIT. This results in the inconsistent
task state warning, as well as a counter underflow that will result in
permanent IO ghost pressure being reported.
Fix this bug by making cgroup_move_task() use task->psi_flags instead
of looking at the potentially mismatching scheduler state.
[ We used the scheduler state historically in order to not rely on
task->psi_flags for anything but debugging. But that ship has sailed
anyway, and this is simpler and more robust.
We previously already batched TSK_ONCPU clearing with the
TSK_RUNNING update inside the deactivation call from schedule(). But
that ordering was safe and didn't result in TSK_ONCPU corruption:
unlike most places in the scheduler, cgroup_move_task() only checked
task_current() and handled TSK_ONCPU if the task was still queued. ]
bug: b/253347377
Fixes: 4117cebf1a ("psi: Optimize task switch inside shared cgroups")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210503174917.38579-1-hannes@cmpxchg.org
(cherry picked from commit d583d360a6)
Change-Id: Id0a292058d4bffb716d8e1496f72139e8d435410
commit cd11d1a611 upstream.
It is possible for a malicious device to forgo submitting a Feature
Report. The HID Steam driver presently makes no provision for this
and de-references the 'struct hid_report' pointer obtained from the
HID devices without first checking its validity. Let's change that.
Bug: 223455965
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: linux-input@vger.kernel.org
Fixes: c164d6abf3 ("HID: add driver for Valve Steam Controller")
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ica12507b87309a7c46b4cab6fcfe4499cd96f45d