The existing fuse-bpf freeing logic would free the fuse_file struct
immediately. However, this would break readahead. Move freeing logic
to the same place as done in classic fuse.
Bug: 286287652
Test: fuse_test passes, android boots, cts tests run
Change-Id: If13519f0e956a8da0dc98e7ac4aed2036070e969
Signed-off-by: Paul Lawrence <paullawrence@google.com>
By putting and nulling fuse_inode's bpf field in fuse_evict_inode, we
left a race condition - this inode can still be active. Do not put the
bpf program until we are doing the final free in fuse_free_inode. This
was the root cause of the reported bug.
The backing inode cannot be put in fuse_free_inode, since put_inode can
sleep and this is called from an RCU handler. But the backing inode
cannot be freed until an RCU interval, so move the put_inode to the same
location as in overlayfs, which is destroy_inode.
Remove a path in fuse_handle_bpf_prog whereby bpf can be nulled out.
When we want to be able to null/change the bpf_prog in the future, we
will have to use a mutex or maybe RCU to protect existing users. But
until this time, ban this path.
Bug: 284450048
Test: fuse_test passes, Pixel 6 passes basic tests
Change-Id: Ie6844242f279a5b202eb021eac5a2dd3d08bf09d
Signed-off-by: Paul Lawrence <paullawrence@google.com>
They are controlled by kernel_images.modules_list, which is
set by define_common_kernels already.
The flags in build.configs has no effect.
Test: TH
Bug: 287697703
Signed-off-by: Yifan Hong <elsk@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:9bf4e4620ecc801c7eb824210595d9777b4a2ff8)
Merged-In: I1e322529476b4db67a1574393819900bdbd41311
Change-Id: I1e322529476b4db67a1574393819900bdbd41311
[ Upstream commit 6326442278 ]
In r592_probe, dev->detect_timer was bound with r592_detect_timer.
In r592_irq function, the timer function will be invoked by mod_timer.
If we remove the module which will call hantro_release to make cleanup,
there may be a unfinished work. The possible sequence is as follows,
which will cause a typical UAF bug.
Fix it by canceling the work before cleanup in r592_remove.
CPU0 CPU1
|r592_detect_timer
r592_remove |
memstick_free_host|
put_device; |
kfree(host); |
|
| queue_work
| &host->media_checker //use
Bug: 287729043
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Link: https://lore.kernel.org/r/20230307164338.1246287-1-zyytlz.wz@163.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9a342d4eb9)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idb15f593287ebaeec294b3e276126306fa6743ba
commit 22ed903eee upstream.
syzbot detected a crash during log recovery:
XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
XFS (loop0): Starting recovery (logdev: internal)
==================================================================
BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074
CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:306
print_report+0x107/0x1f0 mm/kasan/report.c:417
kasan_report+0xcd/0x100 mm/kasan/report.c:517
xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
get_tree_bdev+0x400/0x620 fs/super.c:1282
vfs_get_tree+0x88/0x270 fs/super.c:1489
do_new_mount+0x289/0xad0 fs/namespace.c:3145
do_mount fs/namespace.c:3488 [inline]
__do_sys_mount fs/namespace.c:3697 [inline]
__se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89fa3f4aca
Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
</TASK>
The fuzzed image contains an AGF with an obviously garbage
agf_refcount_level value of 32, and a dirty log with a buffer log item
for that AGF. The ondisk AGF has a higher LSN than the recovered log
item. xlog_recover_buf_commit_pass2 reads the buffer, compares the
LSNs, and decides to skip replay because the ondisk buffer appears to be
newer.
Unfortunately, the ondisk buffer is corrupt, but recovery just read the
buffer with no buffer ops specified:
error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
buf_f->blf_len, buf_flags, &bp, NULL);
Skipping the buffer leaves its contents in memory unverified. This sets
us up for a kernel crash because xfs_refcount_recover_cow_leftovers
reads the buffer (which is still around in XBF_DONE state, so no read
verification) and creates a refcountbt cursor of height 32. This is
impossible so we run off the end of the cursor object and crash.
Fix this by invoking the verifier on all skipped buffers and aborting
log recovery if the ondisk buffer is corrupt. It might be smarter to
force replay the log item atop the buffer and then see if it'll pass the
write verifier (like ext4 does) but for now let's go with the
conservative option where we stop immediately.
Bug: 284409747
Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reported-by: Danila Chernetsov <listdansp@mail.ru>
Link: https://lore.kernel.org/linux-xfs/20230601164439.15404-1-listdansp@mail.ru
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a2961463d7)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie5e156221966323a9cb7cc261b4ed17593cfaabd
This is needed to get the module on the system_dlkm image.
Bug: 276339429
Change-Id: Ib8c19d0d23f27bc3872e8d387b20cef07327c600
Signed-off-by: Will McVicker <willmcvicker@google.com>
Set KMI_GENERATION=11 for 6/16 KMI update
function symbol 'int ___pskb_trim(struct sk_buff*, unsigned int)' changed
CRC changed from 0x4fe35e7b to 0xa4e3f70
function symbol 'struct sk_buff* __alloc_skb(unsigned int, gfp_t, int, int)' changed
CRC changed from 0x6e6b7433 to 0x2505e437
function symbol 'int __dev_change_net_namespace(struct net_device*, struct net*, const char*, int)' changed
CRC changed from 0x2ee7d827 to 0xb66ff6d3
... 773 omitted; 776 symbols have only CRC changes
type 'struct usb_phy' changed
byte size changed from 368 to 400
member 'u64 android_kabi_reserved0' was added
member 'u64 android_kabi_reserved1' changed
offset changed by 64
member 'u64 android_kabi_reserved2' was added
member 'u64 android_kabi_reserved3' was added
member 'u64 android_kabi_reserved4' was added
type 'struct pneigh_entry' changed
member changed from 'u8 key[0]' to 'u32 key[0]'
offset changed from 208 to 224
type changed from 'u8[0]' to 'u32[0]'
element type changed from 'u8' = '__u8' = 'unsigned char' to 'u32' = '__u32' = 'unsigned int'
resolved type changed from 'unsigned char' to 'unsigned int'
type 'struct netns_sysctl_ipv6' changed
member changed from 'bool skip_notify_on_dev_down' to 'int skip_notify_on_dev_down'
type changed from 'bool' = '_Bool' to 'int'
resolved type changed from '_Bool' to 'int'
member 'u8 fib_notify_on_flag_change' changed
offset changed by 24
Bug: 287162457
Change-Id: I5c44eacc57c60b81e48faa3c0c335904909d0933
Signed-off-by: Carlos Llamas <cmllamas@google.com>
There is at least one pending change for struct usb_phy that is not
going to make the ABI freeze deadline, but has already been submitted
upstream and is under active development. So reserve a spot for that
new callback to be added, and provide a bit more buffer here to the
structure for any future LTS changes that might be coming in this area
of the kernel
Bug: 151154716
Cc: Stanley Chang <stanley_chang@realtek.com>
Change-Id: I992a46fa35502fd491ee24d503290119c9b9f655
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit ed779fe4c9 ]
After the blamed commit, the member key is longer 4-byte aligned. On
platforms that do not support unaligned access, e.g., MIPS32R2 with
unaligned_action set to 1, this will trigger a crash when accessing
an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer.
Change the type of the key to u32 to make it aligned.
Fixes: 62dd93181a ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
Change-Id: I51f20602275108df08d17710c24aad9ee42fc1db
Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Link: https://lore.kernel.org/r/20230601015432.159066-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ab0eca3f54)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit edf2e1d201 ]
skip_notify_on_dev_down ctl table expects this field
to be an int (4 bytes), not a bool (1 byte).
Because proc_dou8vec_minmax() was added in 5.13,
this patch converts skip_notify_on_dev_down to an int.
Following patch then converts the field to u8 and use proc_dou8vec_minmax().
Fixes: 7c6bb7d2fa ("net/ipv6: Add knob to skip DELROUTE message on device down")
Change-Id: Ifd03ea5a951388d90e33851fe3dd17ed26f3fcda
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9b001a7d1e)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Match by how many bytes or packets a connection has transferred so far, or by average bytes per packet.
Bug: 284571311
Signed-off-by: Vignesh Saravanaperumal <vignesh1.s@samsung.com>
Change-Id: I352bc42ab0da321e29a8cef1069565b7a5f182e7
(cherry picked from commit d80f39a5aed79ae81eb92f009829905da8e4f7a0)
cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.
If multiple controllers are attached to the cgroup next to the cpuset
controller a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().
Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().
Bug: 238390134
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 2ef269ef1ahttps://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup/for-6.5)
[Conflict where we try to pull extra new code that is not applicable for
5.15, reject the new code]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I54944463706cfb91cd441f003d54f78539d15316
While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.
This approach has the issue of not freeing already allocated DL BW in
the following error cases:
(1) The set of tasks includes multiple DL tasks and DL BW overflow
checking fails for one of the subsequent DL tasks.
(2) Another controller next to the cpuset controller which is attached
to the same cgroup fails in its can_attach().
To address this problem rework dl_cpu_busy():
(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
dedicated dl_bw_free().
(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
a `struct task_struct *p` used in dl_cpu_busy(). This allows to
allocate DL BW for a set of tasks too rather than only for a single
task.
Bug: 238390134
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 85989106fehttps://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup/for-6.5)
[Trivial conflict due to clash with GKI specific code added in the same
location in the file]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I569877547ff7c6d20309f006382b7a416d656fb8
Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).
To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.
Bug: 238390134
Reported-by: Qais Yousef <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 6c24849f55https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup/for-6.5)
[Trivial conflicts in cpuset.c and dealdine.c. Reject new code/fields
that are in upstream but not in 5.15]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I8b991e2b25d66adf73a91c656c1a22cc9dbeea7b
Turns out percpu_cpuset_rwsem - commit 1243dc518c ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
Bug: 238390134
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 111cd11bbchttps://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup/for-6.5)
[Conflicts in all the files. Mostly trivial. cpuset.c had new functions
added on mainline and we rejected lots of hunks related to this new
code. Removed ANDROID specific BUG check that uses cpuset_rwsem which
was a left over from a revert of CPU PAUSE feature that should no longer
exist on 5.15]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ibc77724cdd16a98053b277a79f724bbeb49621de
Set KMI_GENERATION=10 for 6/16 KMI update
function symbol 'int ___pskb_trim(struct sk_buff*, unsigned int)' changed
CRC changed from 0x2c424f24 to 0x4fe35e7b
function symbol 'struct sk_buff* __alloc_skb(unsigned int, gfp_t, int, int)' changed
CRC changed from 0xd91d523 to 0x6e6b7433
function symbol 'int __dev_change_net_namespace(struct net_device*, struct net*, const char*, int)' changed
CRC changed from 0x2809aeca to 0x2ee7d827
... 472 omitted; 475 symbols have only CRC changes
type 'struct sock' changed
byte size changed from 872 to 880
member 'int sk_wait_pending' was added
89 members ('struct sk_filter* sk_filter' .. 'u64 android_kabi_reserved8') changed
offset changed by 64
type 'struct usb_gadget' changed
member 'unsigned int wakeup_capable:1' was added
member 'unsigned int wakeup_armed:1' was added
type 'struct tcp_sock' changed
byte size changed from 2360 to 2368
136 members ('u16 tcp_header_len' .. 'u64 android_kabi_reserved1') changed
offset changed by 64
type 'struct vsock_sock' changed
byte size changed from 1472 to 1480
25 members ('const struct vsock_transport* transport' .. 'void* trans') changed
offset changed by 64
type 'struct binder_alloc' changed
member 'struct vm_area_struct* vma' was added
member 'unsigned long vma_addr' was removed
type 'struct usb_gadget_ops' changed
byte size changed from 152 to 160
member 'int(* set_remote_wakeup)(struct usb_gadget*, int)' was added
17 members ('int(* set_selfpowered)(struct usb_gadget*, int)' .. 'u64 android_kabi_reserved4') changed
offset changed by 64
type 'struct inet_connection_sock' changed
byte size changed from 1528 to 1536
32 members ('struct request_sock_queue icsk_accept_queue' .. 'u64 icsk_ca_priv[13]') changed
offset changed by 64
type 'struct inet_sock' changed
byte size changed from 1072 to 1080
29 members ('struct ipv6_pinfo* pinet6' .. 'struct inet_cork_full cork') changed
offset changed by 64
Bug: 287162457
Change-Id: I916f991d57a6caeef27be8f1d2030472823f5ccf
Signed-off-by: Carlos Llamas <cmllamas@google.com>
commit d1d8875c8c upstream.
[ cmllamas: clean forward port from commit 015ac18be7 ("binder: fix
UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
in mainline after the revert of commit a43cfc87ca ("android: binder:
stop saving a pointer to the VMA") as pointed out by Liam. The commit
log and tags have been tweaked to reflect this. ]
In commit 720c241924 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Fixes: c0fd210178 ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable@vger.kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Change-Id: I3a0c5812df50412981fc5c2b77a994332b900f95
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1bb8a65190)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 0fa53349c3 upstream.
Bring back the original lockless design in binder_alloc to determine
whether the buffer setup has been completed by the ->mmap() handler.
However, this time use smp_load_acquire() and smp_store_release() to
wrap all the ordering in a single macro call.
Also, add comments to make it evident that binder uses alloc->vma to
determine when the binder_alloc has been fully initialized. In these
scenarios acquiring the mmap_lock is not required.
Fixes: a43cfc87ca ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Change-Id: If0ceaf5ddbc73777caf01d4b4f180459ba163d48
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[cmllamas: fixed minor merge conflict in binder_alloc_set_vma()]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1cae0d5136)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit c0fd210178 upstream.
This reverts commit a43cfc87ca.
This patch fixed an issue reported by syzkaller in [1]. However, this
turned out to be only a band-aid in binder. The root cause, as bisected
by syzkaller, was fixed by commit 5789151e48 ("mm/mmap: undo ->mmap()
when mas_preallocate() fails"). We no longer need the patch for binder.
Reverting such patch allows us to have a lockless access to alloc->vma
in specific cases where the mmap_lock is not required. This approach
avoids the contention that caused a performance regression.
[1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com
[cmllamas: resolved conflicts with rework of alloc->mm and removal of
binder_alloc_set_vma() also fixed comment section]
Fixes: a43cfc87ca ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Change-Id: I928119621aab98a3c5b166878f2fcc6f76220d42
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[cmllamas: fixed merge conflict in binder_alloc_set_vma()]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dd7aff43d0)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4e8ef34e36 ]
When work in gadget mode, currently driver doesn't update software level
link_state correctly as link state change event is not enabled for most
devices, in function dwc3_gadget_suspend_interrupt(), it will only pass
suspend event to UDC core when software level link state changes, so when
interrupt generated in sequences of suspend -> reset -> conndone ->
suspend, link state is not updated during reset and conndone, so second
suspend interrupt event will not pass to UDC core.
Remove link_state compare in dwc3_gadget_suspend_interrupt() and add a
suspended flag to replace the compare function.
Fixes: 799e9dc829 ("usb: dwc3: gadget: conditionally disable Link State change events")
Cc: stable <stable@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Change-Id: I05d2c6814aaca6ac9bb61298ca49341dc5b7d155
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Link: https://lore.kernel.org/r/20230512004524.31950-1-quic_linyyuan@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f191711553)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit b93c2a68f3 ]
The wakeup bit in the bmAttributes field indicates whether the device
is configured for remote wakeup. But this field should be allowed to
set only if the UDC supports such wakeup mechanism. So configure this
field based on UDC capability. Also inform the UDC whether the device
is configured for remote wakeup by implementing a gadget op.
Reviewed-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Change-Id: I45920af2fe5171587ee801abf4c3d33f8832720f
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Link: https://lore.kernel.org/r/1679694482-16430-2-git-send-email-quic_eserrao@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 4e8ef34e36 ("usb: dwc3: fix gadget mode suspend interrupt handler issue")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7919af1dcb)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Now that the branch is used to create production GKI
images, need to institute ACK DrNo for all commits.
The DrNo approvers are in the android-mainline branch
at /OWNERS_DrNo.
Bug: 287162457
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Id5bb83d7add5f314df6816c1c51b4bf2d8018e79
This locks down OWNERS approval to a small group to guard against
unintentional breakages.
Bug: 287162457
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: I58ca467b1e7786e1ad0f6ad67c7a7a5845a91ec6
Set KMI_GENERATION=9 for 6/16 KMI update
variable symbol changed from 'struct static_key_false cpu_hwcap_keys[75]' to 'struct static_key_false cpu_hwcap_keys[95]'
CRC changed from 0xfe9a697c to 0x41aad71d
type changed from 'struct static_key_false[75]' to 'struct static_key_false[95]'
number of elements changed from 75 to 95
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0x6ad768b0 to 0x8d400dbd
function symbol 'void* PDE_DATA(const struct inode*)' changed
CRC changed from 0x1b12d990 to 0xc3c38b5c
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0x5ed16e08 to 0xf489e5e8
... 3676 omitted; 3679 symbols have only CRC changes
type 'enum cpuhp_state' changed
enumerator 'CPUHP_AP_ARM_SDEI_STARTING' (114) was removed
enumerator 'CPUHP_AP_ARM_VFP_STARTING' value changed from 115 to 114
enumerator 'CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING' value changed from 116 to 115
enumerator 'CPUHP_AP_PERF_ARM_HW_BREAKPOINT_STARTING' value changed from 117 to 116
enumerator 'CPUHP_AP_PERF_ARM_ACPI_STARTING' value changed from 118 to 117
enumerator 'CPUHP_AP_PERF_ARM_STARTING' value changed from 119 to 118
enumerator 'CPUHP_AP_ARM_L2X0_STARTING' value changed from 120 to 119
enumerator 'CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING' value changed from 121 to 120
enumerator 'CPUHP_AP_ARM_ARCH_TIMER_STARTING' value changed from 122 to 121
enumerator 'CPUHP_AP_ARM_GLOBAL_TIMER_STARTING' value changed from 123 to 122
enumerator 'CPUHP_AP_JCORE_TIMER_STARTING' value changed from 124 to 123
enumerator 'CPUHP_AP_ARM_TWD_STARTING' value changed from 125 to 124
enumerator 'CPUHP_AP_QCOM_TIMER_STARTING' value changed from 126 to 125
enumerator 'CPUHP_AP_TEGRA_TIMER_STARTING' value changed from 127 to 126
enumerator 'CPUHP_AP_ARMADA_TIMER_STARTING' value changed from 128 to 127
enumerator 'CPUHP_AP_MARCO_TIMER_STARTING' value changed from 129 to 128
enumerator 'CPUHP_AP_MIPS_GIC_TIMER_STARTING' value changed from 130 to 129
enumerator 'CPUHP_AP_ARC_TIMER_STARTING' value changed from 131 to 130
enumerator 'CPUHP_AP_RISCV_TIMER_STARTING' value changed from 132 to 131
enumerator 'CPUHP_AP_CLINT_TIMER_STARTING' value changed from 133 to 132
enumerator 'CPUHP_AP_CSKY_TIMER_STARTING' value changed from 134 to 133
enumerator 'CPUHP_AP_TI_GP_TIMER_STARTING' value changed from 135 to 134
enumerator 'CPUHP_AP_HYPERV_TIMER_STARTING' value changed from 136 to 135
enumerator 'CPUHP_AP_KVM_STARTING' value changed from 137 to 136
enumerator 'CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING' value changed from 138 to 137
enumerator 'CPUHP_AP_KVM_ARM_VGIC_STARTING' value changed from 139 to 138
enumerator 'CPUHP_AP_KVM_ARM_TIMER_STARTING' value changed from 140 to 139
enumerator 'CPUHP_AP_DUMMY_TIMER_STARTING' value changed from 141 to 140
enumerator 'CPUHP_AP_ARM_XEN_STARTING' value changed from 142 to 141
enumerator 'CPUHP_AP_ARM_CORESIGHT_STARTING' value changed from 143 to 142
enumerator 'CPUHP_AP_ARM_CORESIGHT_CTI_STARTING' value changed from 144 to 143
enumerator 'CPUHP_AP_ARM64_ISNDEP_STARTING' value changed from 145 to 144
enumerator 'CPUHP_AP_SMPCFD_DYING' value changed from 146 to 145
enumerator 'CPUHP_AP_X86_TBOOT_DYING' value changed from 147 to 146
enumerator 'CPUHP_AP_ARM_CACHE_B15_RAC_DYING' value changed from 148 to 147
enumerator 'CPUHP_AP_ONLINE' value changed from 149 to 148
enumerator 'CPUHP_TEARDOWN_CPU' value changed from 150 to 149
enumerator 'CPUHP_AP_ONLINE_IDLE' value changed from 151 to 150
enumerator 'CPUHP_AP_SCHED_WAIT_EMPTY' value changed from 152 to 151
enumerator 'CPUHP_AP_SMPBOOT_THREADS' value changed from 153 to 152
enumerator 'CPUHP_AP_X86_VDSO_VMA_ONLINE' value changed from 154 to 153
enumerator 'CPUHP_AP_IRQ_AFFINITY_ONLINE' value changed from 155 to 154
enumerator 'CPUHP_AP_BLK_MQ_ONLINE' value changed from 156 to 155
enumerator 'CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS' value changed from 157 to 156
enumerator 'CPUHP_AP_X86_INTEL_EPB_ONLINE' value changed from 158 to 157
enumerator 'CPUHP_AP_PERF_ONLINE' value changed from 159 to 158
enumerator 'CPUHP_AP_PERF_X86_ONLINE' value changed from 160 to 159
enumerator 'CPUHP_AP_PERF_X86_UNCORE_ONLINE' value changed from 161 to 160
enumerator 'CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE' value changed from 162 to 161
enumerator 'CPUHP_AP_PERF_X86_AMD_POWER_ONLINE' value changed from 163 to 162
enumerator 'CPUHP_AP_PERF_X86_RAPL_ONLINE' value changed from 164 to 163
enumerator 'CPUHP_AP_PERF_X86_CQM_ONLINE' value changed from 165 to 164
enumerator 'CPUHP_AP_PERF_X86_CSTATE_ONLINE' value changed from 166 to 165
enumerator 'CPUHP_AP_PERF_X86_IDXD_ONLINE' value changed from 167 to 166
enumerator 'CPUHP_AP_PERF_S390_CF_ONLINE' value changed from 168 to 167
enumerator 'CPUHP_AP_PERF_S390_SF_ONLINE' value changed from 169 to 168
enumerator 'CPUHP_AP_PERF_ARM_CCI_ONLINE' value changed from 170 to 169
enumerator 'CPUHP_AP_PERF_ARM_CCN_ONLINE' value changed from 171 to 170
enumerator 'CPUHP_AP_PERF_ARM_HISI_DDRC_ONLINE' value changed from 172 to 171
enumerator 'CPUHP_AP_PERF_ARM_HISI_HHA_ONLINE' value changed from 173 to 172
enumerator 'CPUHP_AP_PERF_ARM_HISI_L3_ONLINE' value changed from 174 to 173
enumerator 'CPUHP_AP_PERF_ARM_HISI_PA_ONLINE' value changed from 175 to 174
enumerator 'CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE' value changed from 176 to 175
enumerator 'CPUHP_AP_PERF_ARM_L2X0_ONLINE' value changed from 177 to 176
enumerator 'CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE' value changed from 178 to 177
enumerator 'CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE' value changed from 179 to 178
enumerator 'CPUHP_AP_PERF_ARM_APM_XGENE_ONLINE' value changed from 180 to 179
enumerator 'CPUHP_AP_PERF_ARM_CAVIUM_TX2_UNCORE_ONLINE' value changed from 181 to 180
enumerator 'CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE' value changed from 182 to 181
enumerator 'CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE' value changed from 183 to 182
enumerator 'CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE' value changed from 184 to 183
enumerator 'CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE' value changed from 185 to 184
enumerator 'CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE' value changed from 186 to 185
enumerator 'CPUHP_AP_PERF_POWERPC_HV_GPCI_ONLINE' value changed from 187 to 186
enumerator 'CPUHP_AP_PERF_CSKY_ONLINE' value changed from 188 to 187
enumerator 'CPUHP_AP_WATCHDOG_ONLINE' value changed from 189 to 188
enumerator 'CPUHP_AP_WORKQUEUE_ONLINE' value changed from 190 to 189
enumerator 'CPUHP_AP_RANDOM_ONLINE' value changed from 191 to 190
enumerator 'CPUHP_AP_RCUTREE_ONLINE' value changed from 192 to 191
enumerator 'CPUHP_AP_BASE_CACHEINFO_ONLINE' value changed from 193 to 192
enumerator 'CPUHP_AP_ONLINE_DYN' value changed from 194 to 193
enumerator 'CPUHP_AP_ONLINE_DYN_END' value changed from 224 to 223
enumerator 'CPUHP_AP_MM_DEMOTION_ONLINE' value changed from 225 to 224
enumerator 'CPUHP_AP_X86_HPET_ONLINE' value changed from 226 to 225
enumerator 'CPUHP_AP_X86_KVM_CLK_ONLINE' value changed from 227 to 226
enumerator 'CPUHP_AP_DTPM_CPU_ONLINE' value changed from 228 to 227
enumerator 'CPUHP_AP_ACTIVE' value changed from 229 to 228
enumerator 'CPUHP_ANDROID_RESERVED_1' value changed from 230 to 229
enumerator 'CPUHP_ANDROID_RESERVED_2' value changed from 231 to 230
enumerator 'CPUHP_ANDROID_RESERVED_3' value changed from 232 to 231
enumerator 'CPUHP_ANDROID_RESERVED_4' value changed from 233 to 232
enumerator 'CPUHP_ONLINE' value changed from 234 to 233
type 'struct task_struct' changed
byte size changed from 4672 to 4736
5 members ('struct sched_rt_entity rt' .. 'struct uclamp_se uclamp[2]') changed
offset changed by -1536
member 'struct sched_statistics stats' was added
189 members ('struct hlist_head preempt_notifiers' .. 'u64 android_kabi_reserved8') changed
offset changed by 832
member 'struct thread_struct thread' changed
offset changed by 768
type 'struct platform_driver' changed
byte size changed from 240 to 248
member 'void(* remove_new)(struct platform_device*)' was added
7 members ('void(* shutdown)(struct platform_device*)' .. 'u64 android_kabi_reserved1') changed
offset changed by 64
type 'struct sched_entity' changed
byte size changed from 512 to 320
member 'struct sched_statistics statistics' was removed
5 members ('int depth' .. 'unsigned long runnable_weight') changed
offset changed by -1728
5 members ('struct sched_avg avg' .. 'u64 android_kabi_reserved4') changed
offset changed by -1536
type 'struct tipc_bearer' changed
member 'u16 encap_hlen' was added
type 'enum kvm_pgtable_prot' changed
enumerator 'KVM_PGTABLE_PROT_PXN' (32) was added
enumerator 'KVM_PGTABLE_PROT_UXN' (64) was added
Bug: 287162457
Change-Id: Icccb0e4826e7693fdae5c4463be6664db1de421c
Signed-off-by: Carlos Llamas <cmllamas@google.com>
[ Upstream commit 35a089b5d7 ]
Checking the bearer min mtu with tipc_udp_mtu_bad() only works for
IPv4 UDP bearer, and IPv6 UDP bearer has a different value for the
min mtu. This patch checks with encap_hlen + TIPC_MIN_BEARER_MTU
for min mtu, which works for both IPv4 and IPv6 UDP bearer.
Note that tipc_udp_mtu_bad() is still used to check media min mtu
in __tipc_nl_media_set(), as m->mtu currently is only used by the
IPv4 UDP bearer as its default mtu value.
Fixes: 682cd3cf94 ("tipc: confgiure and apply UDP bearer MTU on running links")
Change-Id: I585703598475f2de30353fcc7a96e295fe63549b
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 673cb47989)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 56077b56cd ]
When doing link mtu negotiation, a malicious peer may send Activate msg
with a very small mtu, e.g. 4 in Shuang's testing, without checking for
the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(), then
n->links[bearer_id].mtu is set to 4294967228, which is a overflow of
'4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss().
With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning:
tipc: Too large msg, purging xmit list 1 5 0 40 4!
tipc: Too large msg, purging xmit list 1 15 0 60 4!
And with tipc_link_entry.mtu 4294967228, a huge skb was allocated in
named_distribute(), and when purging it in tipc_link_xmit(), a crash
was even caused:
general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19
RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0
Call Trace:
<IRQ>
skb_release_data+0xf9/0x1d0
kfree_skb_reason+0x40/0x100
tipc_link_xmit+0x57a/0x740 [tipc]
tipc_node_xmit+0x16c/0x5c0 [tipc]
tipc_named_node_up+0x27f/0x2c0 [tipc]
tipc_node_write_unlock+0x149/0x170 [tipc]
tipc_rcv+0x608/0x740 [tipc]
tipc_udp_recv+0xdc/0x1f0 [tipc]
udp_queue_rcv_one_skb+0x33e/0x620
udp_unicast_rcv_skb.isra.72+0x75/0x90
__udp4_lib_rcv+0x56d/0xc20
ip_protocol_deliver_rcu+0x100/0x2d0
This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(),
and not updating mtu if it is too small.
Fixes: ed193ece26 ("tipc: simplify link mtu negotiation")
Reported-by: Shuang Li <shuali@redhat.com>
Change-Id: I84fb5694b763c1e9d1a93643d849d8d17dbf5cd8
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 575e84d90a)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 3ae6d66b60 ]
As different media may requires different min mtu, and even the
same media with different net family requires different min mtu,
add tipc_bearer_min_mtu() to calculate min mtu accordingly.
This API will be used to check the new mtu when doing the link
mtu negotiation in the next patch.
Change-Id: Ic9917ba5e26138b813dd037d38c52ce7adb3ea03
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 56077b56cd ("tipc: do not update mtu if msg_max is too small in mtu negotiation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5cf99d5f65)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit d2c48b2387 ]
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra
triggers:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by cpuhp/0/24:
#0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
#1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
#2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130
irq event stamp: 36
hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0
hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248
softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...]
Hardware name: WIWYNN Mt.Jade Server [...]
Call trace:
dump_backtrace+0x114/0x120
show_stack+0x20/0x70
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x228
rt_spin_lock+0x70/0x120
sdei_cpuhp_up+0x3c/0x130
cpuhp_invoke_callback+0x250/0xf08
cpuhp_thread_fun+0x120/0x248
smpboot_thread_fn+0x280/0x320
kthread+0x130/0x140
ret_from_fork+0x10/0x20
sdei_cpuhp_up() is called in the STARTING hotplug section,
which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry
instead to execute the cpuhp cb later, with preemption enabled.
SDEI originally got its own cpuhp slot to allow interacting
with perf. It got superseded by pNMI and this early slot is not
relevant anymore. [1]
Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the
calling CPU. It is checked that preemption is disabled for them.
_ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'.
Preemption is enabled in those threads, but their cpumask is limited
to 1 CPU.
Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb
don't trigger them.
Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call
which acts on the calling CPU.
[1]:
https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/
Suggested-by: James Morse <james.morse@arm.com>
Change-Id: If68806613938a753ba8113cf3421c545934cf3a2
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 66caf22787)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 31088f6f79 ]
typeof is (still) a GNU extension, which means that it cannot be used when
building ISO C (e.g. -std=c99). It should therefore be avoided in uapi
headers in favour of the ISO-friendly __typeof__.
Unfortunately this issue could not be detected by
CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in
any uapi header.
This matters from a userspace perspective, not a kernel one. uapi
headers and their contents are expected to be usable in a variety of
situations, and in particular when building ISO C applications (with
-std=c99 or similar).
This particular problem can be reproduced by trying to use the
__ALIGN_KERNEL macro directly in application code, say:
int align(int x, int a)
{
return __KERNEL_ALIGN(x, a);
}
and trying to build that with -std=c99.
Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com
Fixes: a79ff731a1 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()")
Change-Id: I4204df6f16689acb4d0786e3edf2b6ebc457c4e3
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 397eb669da)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 769fdf83df upstream.
When !SCHEDSTATS schedstat_enabled() is an unconditional 0 and the
whole block doesn't exist, however GCC figures the scoped variable
'stats' is unused and complains about it.
Upgrade the warning from -Wunused-variable to -Wunused-but-set-variable
by writing it in two statements. This fixes the build because the new
warning is in W=1.
Given that whole if(0) {} thing, I don't feel motivated to change
things overly much and quite strongly feel this is the compiler being
daft.
Fixes: cb3e971c435d ("sched: Make struct sched_statistics independent of fair sched class")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Change-Id: I3b1f6cc605ae53a43f4a75a8d1a6cf2a947998ea
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0a008c5098)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit ceeadb83ae ]
If we want to use the schedstats facility to trace other sched classes, we
should make it independent of fair sched class. The struct sched_statistics
is the schedular statistics of a task_struct or a task_group. So we can
move it into struct task_struct and struct task_group to achieve the goal.
After the patch, schestats are orgnized as follows,
struct task_struct {
...
struct sched_entity se;
struct sched_rt_entity rt;
struct sched_dl_entity dl;
...
struct sched_statistics stats;
...
};
Regarding the task group, schedstats is only supported for fair group
sched, and a new struct sched_entity_stats is introduced, suggested by
Peter -
struct sched_entity_stats {
struct sched_entity se;
struct sched_statistics stats;
} __no_randomize_layout;
Then with the se in a task_group, we can easily get the stats.
The sched_statistics members may be frequently modified when schedstats is
enabled, in order to avoid impacting on random data which may in the same
cacheline with them, the struct sched_statistics is defined as cacheline
aligned.
As this patch changes the core struct of scheduler, so I verified the
performance it may impact on the scheduler with 'perf bench sched
pipe', suggested by Mel. Below is the result, in which all the values
are in usecs/op.
Before After
kernel.sched_schedstats=0 5.2~5.4 5.2~5.4
kernel.sched_schedstats=1 5.3~5.5 5.3~5.5
[These data is a little difference with the earlier version, that is
because my old test machine is destroyed so I have to use a new
different test machine.]
Almost no impact on the sched performance.
No functional change.
[lkp@intel.com: reported build failure in earlier version]
Change-Id: I3df219ae37b431796057e380098afa7f6bb2bc63
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20210905143547.4668-3-laoar.shao@gmail.com
Stable-dep-of: 39afe5d6fc ("sched/fair: Fix inaccurate tally of ttwu_move_affine")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c3b9f95598)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 5c5a7680e6 ]
struct platform_driver::remove returning an integer made driver authors
expect that returning an error code was proper error handling. However
the driver core ignores the error and continues to remove the device
because there is nothing the core could do anyhow and reentering the
remove callback again is only calling for trouble.
So this is an source for errors typically yielding resource leaks in the
error path.
As there are too many platform drivers to neatly convert them all to
return void in a single go, do it in several steps after this patch:
a) Convert all drivers to implement .remove_new() returning void instead
of .remove() returning int;
b) Change struct platform_driver::remove() to return void and so make
it identical to .remove_new();
c) Change all drivers back to .remove() now with the better prototype;
d) drop struct platform_driver::remove_new().
While this touches all drivers eventually twice, steps a) and c) can be
done one driver after another and so reduces coordination efforts
immensely and simplifies review.
Change-Id: I35e14b74375e32f1351bfebfa794e2f3fec99776
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221209150914.3557650-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: c766c90faf ("media: rcar_fdp1: Fix refcount leak in probe and remove function")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d18789f434)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Over the lifetime of the kernel, new arm64 cpucaps need to be added to
handle errata and other fun stuff. So reserve 20 spots for us to use in
the future as this is an ABI-stable structure that we can not increase
over time without major problems.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I37bdac374e2570f61ab54919712fd62c7e541e67
FEAT_XNX allows to specify PXN and UXN attributes on stage-2 entries.
Make this usable from pKVM by exposing two new kvm_pgtable_prot entries
for each of them.
No functional changes intended.
Bug: 264070847
Change-Id: I47d861fa64ba511370b182f4609fe1c27695a949
Signed-off-by: Quentin Perret <qperret@google.com>
Nothing currently prevents the donation of an MMIO region to the
hypervisor for backing e.g. guest stage-2 page-tables, tracing buffers,
hyp vm and vcpu metadata, or any other donation to EL2. However, the
only confirmed use-case for MMIO donations are for protecting the IOMMU
registers as well as for vendor module usage.
Restrict the donation of MMIO regions to these two paths only by
introducing a new helper function.
Bug: 264070847
Change-Id: I914508fb3e3547fcfabca8557bdf7948cb796099
Signed-off-by: Quentin Perret <qperret@google.com>
We've historically disallowed state changes for MMIO pages -- the host
had sole ownership of all of them. However, changing the state of those
pages has clearly become a goal both to support vendor extensions to
the hypervisor, as well as to support device assignment in the longer
term. To pave the way towards this support, let's allow certain state
transitions for MMIO pages.
Bug: 264070847
Change-Id: I9803b572c90d8a694c3d43a0ee0d7b4f4124fe4a
Signed-off-by: Quentin Perret <qperret@google.com>
We now allow donations of MMIO ranges, let's also allow modules to
change host stage-2 permissions.
Bug: 264070847
Change-Id: Ia72678bb27559d9a7963dbc5ffb5a101efcbbad2
Signed-off-by: Quentin Perret <qperret@google.com>
There shouldn't be any reason to ever need allocating from the host
stage-2 pool during mem aborts now that the base page-table structure
is pinned. To prevent future regressions in this area, introduce a new
sanity check that will warn when hyp_page_alloc() is used from the mem
wrong paths.
Bug: 264070847
Change-Id: I7a7c606fe01558790e4ffcd3534f8976caf48bd0
Signed-off-by: Quentin Perret <qperret@google.com>
The MMIO register space for IOMMUs controlled by the hypervisor is
currently unmapped from the host stage-2, and we rely on the host abort
path to not accidentally map them. However, this approach becomes
increasingly difficult to maintain as we introduce support for donating
MMIO regions and not just memory -- nothing prevents the host from
donating a protected MMIO register to another entity for example.
Now that MMIO donations are possible, let's use the proper
host-donate-hyp machinery to implement this. As a nice side effect, this
guarantees the host stage-2 page-table is annotated with hyp ownership
for those IOMMU regions, which guarantees the core range alignment
feature in the host mem abort parth will do the right thing without
requiring a second pass in the IOMMU code. This also turns the host
stage-2 PTEs into "non-default" entries, hence avoiding issues with the
coallescing code looking forward.
Bug: 264070847
Change-Id: I1fad1b1be36f3b654190a912617e780141945a8f
Signed-off-by: Quentin Perret <qperret@google.com>
We now support donations of MMIO ranges to the hypervisor. Make sure to
update the donation logic to correctly map these pages with device
mappings.
Bug: 264070847
Change-Id: I36558f05ed47d1e3dc06e4e24151241474b4ff77
Signed-off-by: Quentin Perret <qperret@google.com>