commit efe1dc571a upstream.
Contention for the mailbox interface may occur during driver initialization
(immediately after a function reset), between mailbox commands initiated
via ioctl (bsg) and those driver requested by the driver.
After setting SLI_ACTIVE flag for a port, there is a window in which the
driver will allow an ioctl to be initiated while the adapter is
initializing and issuing mailbox commands via polling. The polling logic
then gets confused.
Correct by having thread setting SLI_ACTIVE spot an active mailbox command
and allow it complete before proceeding.
Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 921ca574cd upstream.
When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP
socket is switched into functional addressing mode, where only single frame
(SF) protocol data units can be send on the specified CAN interface and the
given tp.tx_id after bind().
In opposite to normal and extended addressing this socket does not register a
CAN-ID for reception which would be needed for a 1-to-1 ISOTP connection with a
segmented bi-directional data transfer.
Sending SFs on this socket is therefore a TX-only 'broadcast' operation.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Thomas Wagner <thwa1@web.de>
Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24d7275ce2 upstream.
The syzbot reported the below BUG:
kernel BUG at include/linux/page-flags.h:785!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
Call Trace:
page_mapcount include/linux/mm.h:837 [inline]
smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
walk_pmd_range mm/pagewalk.c:128 [inline]
walk_pud_range mm/pagewalk.c:205 [inline]
walk_p4d_range mm/pagewalk.c:240 [inline]
walk_pgd_range mm/pagewalk.c:277 [inline]
__walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
walk_page_vma+0x277/0x350 mm/pagewalk.c:530
smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
smap_gather_stats fs/proc/task_mmu.c:741 [inline]
show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
seq_read+0x3e0/0x5b0 fs/seq_file.c:162
vfs_read+0x1b5/0x600 fs/read_write.c:479
ksys_read+0x12d/0x250 fs/read_write.c:619
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time. MADV_FREE may split THPs if it is called
for partial THP. It may trigger the below race:
CPU A CPU B
----- -----
smaps walk: MADV_FREE:
page_mapcount()
PageCompound()
split_huge_page()
page = compound_head(page)
PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4]
Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()]
Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e386dfc56f upstream.
Commit 054aa8d439 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed. It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably. Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.
That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with). And as a result it
shows the extra "re-check fd" cost as a sore thumb.
Happily, the regression is fixable by just writing the code to loook up
the fd to be better and clearer. There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.
[ Side note: this patch is also a classic case of one that looks very
messy with the default greedy Myers diff - it's much more legible with
either the patience of histogram diff algorithm ]
Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Carel Si <beibei.si@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfb3aa735f upstream.
An outgoing CPU is marked offline in a stop-machine handler and most
of that CPU's services stop at that point, including IRQ work queues.
However, that CPU must take another pass through the scheduler and through
a number of CPU-hotplug notifiers, many of which contain RCU readers.
In the past, these readers were not a problem because the outgoing CPU
has interrupts disabled, so that rcu_read_unlock_special() would not
be invoked, and thus RCU would never attempt to queue IRQ work on the
outgoing CPU.
This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD
Kconfig option, in which rcu_read_unlock_special() is invoked upon exit
from almost all RCU read-side critical sections. Worse yet, because
interrupts are disabled, rcu_read_unlock_special() cannot immediately
report a quiescent state and will therefore attempt to defer this
reporting, for example, by queueing IRQ work. Which fails with a splat
because the CPU is already marked as being offline.
But it turns out that there is no need to report this quiescent state
because rcu_report_dead() will do this job shortly after the outgoing
CPU makes its final dive into the idle loop. This commit therefore
makes rcu_read_unlock_special() refrain from queuing IRQ work onto
outgoing CPUs.
Fixes: 44bad5b3cc ("rcu: Do full report for .need_qs for strict GPs")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0764db9b49 upstream.
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock:
00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock:
00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg. Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that TLBI invalidation is handled entirely at EL2 for both protected
and non-protected guests when protected KVM has initialised, unplug the
unused TLBI hypercalls.
Bug: 220815716
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Id3e341325237c606c3997cf1626bf092b761b604
It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
result in the pending interrupts being accurately reported if they are
mapped to a HW interrupt. This is particularily visible when acking
the timer interrupt and reading the GICR_ISPENDR1 register immediately
after, for example (the interrupt appears as not-pending while it really
is...).
This is because a HW interrupt has its 'active and pending state' kept
in the *physical* distributor, and not in the virtual one, as mandated
by the spec (this is what allows the direct deactivation). The virtual
distributor only caries the pending and active *states* (note the
plural, as these are two independent and non-overlapping states).
Fix it by reading the HW state back, either from the timer itself or
from the distributor if necessary.
Reported-by: Ricardo Koller <ricarkol@google.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org
Bug: 209777660
(cherry picked from commit 5bfa685e62)
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I974ce5801420bf5d27ac2aad0e93cd31437cc5b7
Handle PSCI SYSTEM_RESET2 similarly to SYSTEM_RESET by advertising PSCI
v1.1 and forwarding the new reset request back to the host along with
its argument register.
Bug: 216801012
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic9ff1cc6dbbcbc969e2863a9f07eab8e5b264266
When handling reset and power-off PSCI calls from the guest, we
initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
re-run the vCPU after issuing the call.
Unfortunately, this also means that the VMM cannot see which PSCI call
was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
and SYSTEM_RESET2 calls, which is necessary in order to determine the
validity of the "reset_type" in X1.
Allocate bit 0 of the previously unused 'flags' field of the
system_event structure so that we can indicate the PSCI call used to
initiate the reset.
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
Bug: 216801012
[willdeacon@: Fix context conflict in uapi header with pKVM CAP]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Idcfb24d3f974d96b1ebbf860741e750e549ff6df
PSCI v1.1 introduces the optional SYSTEM_RESET2 call, which allows the
caller to provide a vendor-specific "reset type" and "cookie" to request
a particular form of reset or shutdown.
Expose this call to the guest and handle it in the same way as PSCI
SYSTEM_RESET, along with some basic range checking on the type argument.
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-3-will@kernel.org
Bug: 216801012
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Id1954c7c5274dda50e247c500ec03e87ca8a2028
Systems with a single early entropy source based on either CPU
instructions or a firmware interface are unable to initialise the crng
unless the CPU is "trusted". By default, the CPU is untrusted and so for
protected virtual machines this causes a significant boot delay as the
crng refuses to initialise solely from the TRNG hypercall:
| [ 0.000000][ T0] random: get_random_u64 called from kmem_cache_open+0x2c/0x390 with crng_init=0
| ...
| [ 1.297022][ T211] EXT4-fs (dm-24): mounted filesystem without journal. Opts: (null)
| [ 3.362924][ C0] random: crng init done
| [ 3.363543][ C0] random: 7 urandom warning(s) missed due to ratelimiting
Since we trust the CPU and the firmware for many other things, such as
executing instructions and initialising the system, flip the default
around to trust CPU-backed entropy sources by default. This can be
disabled on the kernel command-line by passing "random.trust_cpu=off".
Bug: 220354122
Reported-by: Alan Stokes <alanstokes@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ie09b6253936823814d7bfe5092923f0ec825403a
The virtio video driver v2 [1] uses videobuf2 structure
vb2_dma_sg_memops, when virtio device supports non-contiguous DMA video
buffers.
DMA SG memory allocator for videobuf2
(drivers/media/common/videobuf2/videobuf2-dma-sg.c) is a common code and
has no hardware dependencies.
[1]: https://lore.kernel.org/all/20200218202753.652093-2-dmitry.sepp@opensynergy.com/
Bug: 219998156
Signed-off-by: Mikhail Golubev <Mikhail.Golubev@opensynergy.com>
Change-Id: I897898090d7a97b13202c05aae28955595e09468
Even if PVMs can't be straightforwardly reset, we should still propagate
this information to the VMM.
Bug: 216801012
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I41833748da6744605ec3d68897a785305640ea2b
The architecture provides an asymmetric mode for MTE where tag mismatches
are checked asynchronously for reads but synchronously for loads. Allow
userspace processes to select this and make it available as a default mode
via the existing per-CPU sysfs interface.
Since there PR_MTE_TCF_ values are a bitmask (allowing the kernel to choose
between the multiple modes) and there are no free bits adjacent to the
existing PR_MTE_TCF_ bits the set of bits used to specify the mode becomes
disjoint. Programs using the new interface should be aware of this and
programs that do not use it will not see any change in behaviour.
When userspace requests two possible modes but the system default for the
CPU is the third mode (eg, default is synchronous but userspace requests
either asynchronous or asymmetric) the preference order is:
ASYMM > ASYNC > SYNC
This situation is not currently possible since there are only two modes and
it is mandatory to have a system default so there could be no ambiguity and
there is no ABI change. The chosen order is basically arbitrary as we do not
have a clear metric for what is better here.
If userspace requests specifically asymmetric mode via the prctl() and the
system does not support it then we will return an error, this mirrors
how we handle the case where userspace enables MTE on a system that does
not support MTE at all and the behaviour that will be seen if running on
an older kernel that does not support userspace use of asymmetric mode.
Attempts to set asymmetric mode as the default mode will result in an error
if the system does not support it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20220127195712.748150-5-broonie@kernel.org/
Bug: 217221156
Change-Id: I9fef2f29e4afad61aa1a2f9ceee89d9e35af77e1
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
This is useful for testing fuse-bpf directly on a backing folder.
Bug: 217570523
Test: mount -t fuse [DEVNAME] [mntpoint] -o user_id=0,group_id=0,rootmode=0040000,
no_daemon,root_dir=[backingfd]
Change-Id: I9ac13c3f707d71cbb74dba10eda5778bf3e83233
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Adds support for FUSE_STATFS, needed to run various filesystem tests
Bug: 217570523
Test: bpf_test_statfs
Change-Id: I5ee13e880118c5c79c4ca17bb2e902a3e17a7eb8
Signed-off-by: Daniel Rosenberg <drosen@google.com>
filldir used strcpy, potentially leading to writing the ending null past
the current page. fuse_dirents are not null terminated, so we switch to
using memcpy to avoid adding an extraneous null, which would be
overwritten if the name was already byte aligned.
Bug: 217570523
Test: generic/027
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ic5d1f1887a113e1a3319998bad47cfbac3d90baa
If we hit an error during fuse_create_open, some variables will be
undefined during the finalize, so check that they were actually
initialized before accessing.
Bug: 217570523
Test: attempt to over fill disk
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I094564b83e49eec2a6bac5bd050b4f7327b0c979
FF-A memory descriptors may need to be sent in fragments when they don't
fit in the mailboxes. Doing so involves using the FRAG_TX and FRAG_RX
primitives defined in the FF-A protocol.
Add support in the pKVM FF-A relayer for fragmented descriptors by
monitoring outgoing FRAG_TX transactions and by buffering large
descriptors on the reclaim path.
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I701f279cd4820abb0b6d7c2572ee28e0f943edad
Handle FFA_MEM_LEND calls from the host by treating them identically to
FFA_MEM_SHARE calls for the purposes of the host stage-2 page-table, but
forwarding on the original request to EL3.
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8f53bca6f0865fabd9938eefd8427fa0e78016ed
Intecept FFA_MEM_RECLAIM calls from the host and transition the host
stage-2 page-table entries from the SHARED_OWNED state back to the OWNED
state once EL3 has confirmed that the secure mapping has been reclaimed.
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I58365e1b3fafa47f290a292fe57f6d2ed7f9091b
Intercept FFA_MEM_SHARE/FFA_FN64_MEM_SHARE calls from the host and
transition the host stage-2 page-table entries from the OWNED state to
the SHARED_OWNED state prior to forwarding the call onto EL3.
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ic75a943c67e6cb96794c250dccf2a59362bf857e
Extend pKVM's memory protection code so that we can update the host's
stage-2 page-table to track pages shared with secure world by the host
using FF-A and prevent those pages from being mapped into a guest.
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
[willdeacon@: Moved 'pkvm_ffa_id' to nvhe/mem_protect.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib4d404cd1d4fa11d7bf8c1d0b8ec00838a8038a0
Handle FFA_RXTX_MAP and FFA_RXTX_UNMAP calls from the host by sharing
the host's mailbox memory with the hypervisor and establishing a
separate pair of mailboxes between the hypervisor and the SPMD at EL3.
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib5fa89e9b01aa20f7c1b5b41df79d66e98d07f55
The FF-A proxy code needs to allocate its own buffer pair for
communication with EL3 and for forwarding calls from the host at EL1.
Reserve a couple of pages for this purpose and use them to initialise
the hypervisor's FF-A buffer structure.
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Id72cd7f59be20eb6d1faa6f1c5e64ecc8debf929
Filter out advertising unsupported features, and only advertise
features and properties that are supported by the hypervisor proxy.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I071766d6d241f4bdd00b8f80e6b237c184a1e59a
Probe FF-A during pKVM initialisation so that we can detect any
inconsistencies in the version or partition ID early on.
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I7def4c2c497017ba86621bc98298bc65ffdeefae
When KVM is initialised in protected mode, we must take care to filter
certain FFA calls from the host kernel so that the integrity of guest
and hypervisor memory is maintained and is not made available to the
secure world.
As a first step, intercept and block all memory-related FF-A SMC calls
from the host to EL3. This puts the framework in place for handling them
properly.
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
[willdeacon@: Fixed #include context conflict with nvhe/iommu.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5279bce56956c590862a68e8c4803dd2205e3f81
nvhe/mem_protect.h refers to __load_stage2() in the definition of
__load_host_stage2() but doesn't include the relevant header.
Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
don't have to do this themselves.
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ia48a1747fe10fd7ec7ba545c3290fe69b8d1839c
This is consistent with the other comments in the struct.
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I10e9014a0d505fe5e132fb1cd6105b95a3f5f2bf
FF-A function IDs and error codes will be needed in the hypervisor too,
so move to them to the header file where they can be shared. Rename the
version constants with an "FFA_" prefix so that they are less likely
to clash with other code in the tree.
Co-developed-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Andrew Walbran <qwandor@google.com>
Bug: 171706629
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I00ed487279fdfb61ea34ae99140c6fac8ee89187
as a response to comments in I7e2c62e7dd97c6b.
Test: TH
Bug: 215745244
Change-Id: I2a2104216107dd5e22d4b1c6d4d978f0c79a3004
Signed-off-by: Yifan Hong <elsk@google.com>
This reverts commit e2c27194fc.
This change breaks Android userspace tools (tracepoint bpf programs,
simpleperf, atrace, perfetto, ...) that assume access to tracefs.
On Android T and S (QPR2) this is fixed by adding a gid mount option
in userspace, for devices with older kernels the permission change
needs to be reverted.
Bug: 219393750
Bug: 217150407
Bug: 216676030
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I53a63c42b4cf1133a6a2fc1674380ffd8f331392
(cherry picked from commit d2392cf8b12d8c1526e4d5766c642a47aecc28e7)
Changes in 5.10.101
integrity: check the return value of audit_log_start()
ima: Remove ima_policy file before directory
ima: Allow template selection with ima_template[_fmt]= after ima_hash=
ima: Do not print policy rule with inactive LSM labels
mmc: sdhci-of-esdhc: Check for error num after setting mask
can: isotp: fix potential CAN frame reception race in isotp_rcv()
net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs
net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs
NFS: Fix initialisation of nfs_client cl_flags field
NFSD: Clamp WRITE offsets
NFSD: Fix offset type in I/O trace points
drm/amdgpu: Set a suitable dev_info.gart_page_size
tracing: Propagate is_signed to expression
NFS: change nfs_access_get_cached to only report the mask
NFSv4 only print the label when its queried
nfs: nfs4clinet: check the return value of kstrdup()
NFSv4.1: Fix uninitialised variable in devicenotify
NFSv4 remove zero number of fs_locations entries error check
NFSv4 expose nfs_parse_server_name function
NFSv4 handle port presence in fs_location server string
x86/perf: Avoid warning for Arch LBR without XSAVE
drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer
net: sched: Clarify error message when qdisc kind is unknown
powerpc/fixmap: Fix VM debug warning on unmap
scsi: target: iscsi: Make sure the np under each tpg is unique
scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
scsi: qedf: Add stag_work to all the vports
scsi: qedf: Fix refcount issue when LOGO is received during TMF
scsi: pm8001: Fix bogus FW crash for maxcpus=1
scsi: ufs: Treat link loss as fatal error
scsi: myrs: Fix crash in error case
PM: hibernate: Remove register_nosave_region_late()
usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
perf: Always wake the parent event
nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
KVM: eventfd: Fix false positive RCU usage warning
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
riscv: fix build with binutils 2.38
ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
ARM: dts: Fix boot regression on Skomer
ARM: socfpga: fix missing RESET_CONTROLLER
nvme-tcp: fix bogus request completion when failing to send AER
ACPI/IORT: Check node revision for PMCG resources
PM: s2idle: ACPI: Fix wakeup interrupts handling
drm/rockchip: vop: Correct RK3399 VOP register fields
ARM: dts: Fix timer regression for beagleboard revision c
ARM: dts: meson: Fix the UART compatible strings
ARM: dts: meson8: Fix the UART device-tree schema validation
ARM: dts: meson8b: Fix the UART device-tree schema validation
staging: fbtft: Fix error path in fbtft_driver_module_init()
ARM: dts: imx6qdl-udoo: Properly describe the SD card detect
phy: xilinx: zynqmp: Fix bus width setting for SGMII
ARM: dts: imx7ulp: Fix 'assigned-clocks-parents' typo
usb: f_fs: Fix use-after-free for epfile
gpio: aggregator: Fix calling into sleeping GPIO controllers
drm/vc4: hdmi: Allow DBLCLK modes even if horz timing is odd.
misc: fastrpc: avoid double fput() on failed usercopy
netfilter: ctnetlink: disable helper autoassign
arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
ixgbevf: Require large buffers for build_skb on 82599VF
drm/panel: simple: Assign data from panel_dpi_probe() correctly
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
gpio: sifive: use the correct register to read output values
bonding: pair enable_port with slave_arr_updates
net: dsa: mv88e6xxx: don't use devres for mdiobus
net: dsa: ar9331: register the mdiobus under devres
net: dsa: bcm_sf2: don't use devres for mdiobus
net: dsa: felix: don't use devres for mdiobus
net: dsa: lantiq_gswip: don't use devres for mdiobus
ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
nfp: flower: fix ida_idx not being released
net: do not keep the dst cache when uncloning an skb dst and its metadata
net: fix a memleak when uncloning an skb dst and its metadata
veth: fix races around rq->rx_notify_masked
net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
tipc: rate limit warning for received illegal binding update
net: amd-xgbe: disable interrupts during pci removal
dpaa2-eth: unregister the netdev before disconnecting from the PHY
ice: fix an error code in ice_cfg_phy_fec()
ice: fix IPIP and SIT TSO offload
net: mscc: ocelot: fix mutex lock error during ethtool stats read
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
vt_ioctl: fix array_index_nospec in vt_setactivate
vt_ioctl: add array_index_nospec to VT_ACTIVATE
n_tty: wake up poll(POLLRDNORM) on receiving data
eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
usb: dwc2: drd: fix soft connect when gadget is unconfigured
Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
usb: ulpi: Move of_node_put to ulpi_dev_release
usb: ulpi: Call of_node_put correctly
usb: dwc3: gadget: Prevent core from processing stale TRBs
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
USB: gadget: validate interface OS descriptor requests
usb: gadget: rndis: check size of RNDIS_MSG_SET command
usb: gadget: f_uac2: Define specific wTerminalType
usb: raw-gadget: fix handling of dual-direction-capable endpoints
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
USB: serial: option: add ZTE MF286D modem
USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
USB: serial: cp210x: add NCR Retail IO box id
USB: serial: cp210x: add CPI Bulk Coin Recycler id
speakup-dectlk: Restore pitch setting
phy: ti: Fix missing sentinel for clk_div_table
hwmon: (dell-smm) Speed up setting of fan speed
Makefile.extrawarn: Move -Wunaligned-access to W=1
can: isotp: fix error path in isotp_sendmsg() to unlock wait queue
scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
scsi: lpfc: Reduce log messages seen after firmware download
arm64: dts: imx8mq: fix lcdif port node
perf: Fix list corruption in perf_cgroup_switch()
iommu: Fix potential use-after-free during probe
Linux 5.10.101
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6105dcbfc0c7f1373020d378d2048e692dc502ab
commit 5f4e5ce638 upstream.
There's list corruption on cgrp_cpuctx_list. This happens on the
following path:
perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
cpu_ctx_sched_in
ctx_sched_in
ctx_pinned_sched_in
merge_sched_in
perf_cgroup_event_disable: remove the event from the list
Use list_for_each_entry_safe() to allow removing an entry during
iteration.
Fixes: 058fe1c044 ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91f6d5f181 upstream.
The port node does not have a unit-address, remove it.
This fixes the warnings:
lcd-controller@30320000: 'port' is a required property
lcd-controller@30320000: 'port@0' does not match any of the regexes:
'pinctrl-[0-9]+'
Fixes: commit d0081bd02a ("arm64: dts: imx8mq: Add NWL MIPI DSI controller")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>