linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 18:41:58 +09:00

Author	SHA1	Message	Date
Eric Dumazet	e3ff5d6bf0	UPSTREAM: bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready() [ Upstream commit `b320a45638` ] syzbot found sk_psock(sk) could return NULL when called from sk_psock_verdict_data_ready(). Just make sure to handle this case. [1] general protection fault, probably for non-canonical address 0xdffffc000000005c: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002e0-0x00000000000002e7] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.4.0-rc3-syzkaller-00588-g4781e965e655 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 RIP: 0010:sk_psock_verdict_data_ready+0x19f/0x3c0 net/core/skmsg.c:1213 Code: 4c 89 e6 e8 63 70 5e f9 4d 85 e4 75 75 e8 19 74 5e f9 48 8d bb e0 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 07 02 00 00 48 89 ef ff 93 e0 02 00 00 e8 29 fd RSP: 0018:ffffc90000147688 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000100 RDX: 000000000000005c RSI: ffffffff8825ceb7 RDI: 00000000000002e0 RBP: ffff888076518c40 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000008000 R15: ffff888076518c40 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f901375bab0 CR3: 000000004bf26000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_data_ready+0x10a/0x520 net/ipv4/tcp_input.c:5006 tcp_data_queue+0x25d3/0x4c50 net/ipv4/tcp_input.c:5080 tcp_rcv_established+0x829/0x1f90 net/ipv4/tcp_input.c:6019 tcp_v4_do_rcv+0x65a/0x9c0 net/ipv4/tcp_ipv4.c:1726 tcp_v4_rcv+0x2cbf/0x3340 net/ipv4/tcp_ipv4.c:2148 ip_protocol_deliver_rcu+0x9f/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2ec/0x520 
net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ip_local_deliver+0x1ae/0x200 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:468 [inline] ip_rcv_finish+0x1cf/0x2f0 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ip_rcv+0xae/0xd0 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5491 __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5605 process_backlog+0x101/0x670 net/core/dev.c:5933 __napi_poll+0xb7/0x6f0 net/core/dev.c:6499 napi_poll net/core/dev.c:6566 [inline] net_rx_action+0x8a9/0xcb0 net/core/dev.c:6699 __do_softirq+0x1d4/0x905 kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x344/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Fixes: `6df7f764cd` ("bpf, sockmap: Wake up polling after data copy") Reported-by: syzbot <syzkaller@googlegroups.com> Change-Id: I7c0f888b35987f8019088e9232fbe0f0491f661b Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230530195149.68145-1-edumazet@google.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `898c9a0ee7`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:41 +00:00
John Fastabend	07873e75c6	UPSTREAM: bpf, sockmap: Incorrectly handling copied_seq [ Upstream commit `e5c6de5fa0` ] The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the application. This results in application errors: if the application does an ioctl(FIONREAD), we return zero because this is calculated from the copied_seq value. To fix this we move tcp->copied_seq accounting into the recv handler so that we update these when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before we were calling the tcp_rcv_space_adjust() which would update 'number of bytes copied to user in last RTT' which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled. Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit trickier. Build a tcp_psock_eat() helper and then call this from skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single recv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layer problem of calling tcp_eat_skb even if it's not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the copied_seq updates on this. 
Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Change-Id: I8dc204d02e26975f8133d7e4d777b2194e30a6aa Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `fe735073a5`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:41 +00:00
John Fastabend	e218734b1b	UPSTREAM: bpf, sockmap: Wake up polling after data copy [ Upstream commit `6df7f764cd` ] When TCP stack has data ready to read sk_data_ready() is called. Sockmap overwrites this with its own handler to call into BPF verdict program. But, the original TCP socket had sock_def_readable that would additionally wake up any user space waiters with sk_wake_async(). Sockmap saved the callback when the socket was created so call the saved data ready callback and then we can wake up any epoll() logic waiting on the read. Note we call on 'copied >= 0' to account for returning 0 when a FIN is received because we need to wake up user for this as well so they can do the recvmsg() -> 0 and detect the shutdown. Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Change-Id: Idf56c7acfeb25791dc6e5f42dce2e64b09d55cf9 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-8-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `dd628fc697`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:41 +00:00
John Fastabend	f9cc0b7f9b	UPSTREAM: bpf, sockmap: TCP data stall on recv before accept [ Upstream commit `ea444185a6` ] A common mechanism to put a TCP socket into the sockmap is to hook the BPF_SOCK_OPS_{ACTIVE,PASSIVE}_ESTABLISHED_CB event with a BPF program that can map the socket info to the correct BPF verdict parser. When the user adds the socket to the map the psock is created and the new ops are assigned to ensure the verdict program will 'see' the sk_buffs as they arrive. Part of this process hooks the sk_data_ready op with a BPF specific handler to wake up the BPF verdict program when data is ready to read. The logic is simple enough (posted here for easy reading): static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; sock->ops->read_skb(sk, sk_psock_verdict_recv); } The oversight here is sk->sk_socket is not assigned until the application accepts() the new socket. However, it's entirely ok for the peer application to do a connect() followed immediately by sends. The socket on the receiver is sitting on the backlog queue of the listening socket until it's accepted and the data is queued up. If the peer never accepts the socket or is slow it will eventually hit data limits and rate limit the session. But, important for BPF sockmap hooks, when this data is received the TCP stack does the sk_data_ready() call but the read_skb() for this data is never called because sk_socket is missing. The data sits on the sk_receive_queue. Then once the socket is accepted if we never receive more data from the peer there will be no further sk_data_ready calls and all the data is still on the sk_receive_queue(). Then user calls recvmsg after accept() and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler. 
The handler checks for data in the sk_msg ingress queue expecting that the BPF program has already run from the sk_data_ready hook and enqueued the data as needed. So we are stuck. To fix do an unlikely check in recvmsg handler for data on the sk_receive_queue and if it exists wake up data_ready. We have the sock locked in both read_skb and recvmsg so should avoid having multiple runners. Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Change-Id: I82bc3eafce486a816cf8dfada1939128922ae174 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ab90b68f65`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:41 +00:00
John Fastabend	028591f2c8	UPSTREAM: bpf, sockmap: Handle fin correctly [ Upstream commit `901546fd8f` ] The sockmap code is returning EAGAIN after a FIN packet is received and no more data is on the receive queue. Correct behavior is to return 0 to the user and the user can then close the socket. The EAGAIN causes many apps to retry which masks the problem. Eventually the socket is evicted from the sockmap because it's released from sockmap sock free handling. The issue creates a delay and can cause some errors on application side. To fix this, check on the sk_msg_recvmsg side whether the length is zero and the FIN flag is set, and if so set the return value to zero. A selftest will be added to check this condition. Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Change-Id: I26d941790b9742534370c0447fd4a92cab55c32e Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-6-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `3a2129ebae`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:41 +00:00
John Fastabend	e69ad7c838	UPSTREAM: bpf, sockmap: Improved check for empty queue [ Upstream commit `405df89dd5` ] We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when it's not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue, we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue it's possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix, instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. 
Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Change-Id: I076ae2689caf17afbae7d4093139407d60cf4d0d Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ba4fec5bd6`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:40 +00:00
John Fastabend	ecfcbe21d7	UPSTREAM: bpf, sockmap: Reschedule is now done through backlog [ Upstream commit `bce22552f9` ] Now that the backlog manages the reschedule() logic correctly we can drop the partial fix to reschedule from recvmsg hook. Rescheduling on recvmsg hook was added to address a corner case where we still had data in the backlog state but had nothing to kick it and reschedule the backlog worker to run and finish copying data out of the state. This had a couple limitations, first it required user space to kick it introducing an unnecessary EBUSY and retry. Second it only handled the ingress case and egress redirects would still be hung. With the correct fix, pushing the reschedule logic down to where the enomem error occurs we can drop this fix. Fixes: `bec217197b` ("skmsg: Schedule psock work if the cached skb exists on the psock") Change-Id: Ibf8b70dbeca5122c2ef954504dbe44724456899e Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-4-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `1e4e379ccd`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:40 +00:00
John Fastabend	42fcf3b6df	UPSTREAM: bpf, sockmap: Convert schedule_work into delayed_work [ Upstream commit `29173d07f7` ] Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. The flow for Cilium's common use case with a stream parser is, tcp_read_sock() sk_psock_verdict_recv ret = bpf_prog_run_pin_on_cpu() sk_psock_verdict_apply(sock, skb, ret) // if system is under memory pressure or app is slow we may // need to queue skb. Do this queuing through ingress_skb and // then kick timer to wake up handler skb_queue_tail(ingress_skb, skb) schedule_work(work); The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, it's possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because it's possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next. Now for the trouble, we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when it's the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side. To fix we could do simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continuing to retry repeatedly. So instead to fix convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from backlog queue on errors. It's not obvious though what a good backoff is so use '1'. 
To test we observed some flakes while running NGINX compliance test with sockmap; we attributed these failed tests to this bug and subsequent issue. From on list discussion. This commit bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock") was intended to address similar race, but had a couple cases it missed. Most obvious it only accounted for receiving traffic on the local socket so if redirecting into another socket we could still get an sk_buff stuck here. Next it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also it's sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actually handle the EAGAIN correctly. Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Change-Id: I61dbe914b0abf5f0f7e16f95d246c8e4fa0f5afa Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `9f4d7efb33`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:40 +00:00
John Fastabend	a59051006b	UPSTREAM: bpf, sockmap: Pass skb ownership through read_skb [ Upstream commit `78fa0d61d9` ] The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this, skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb)) Because we incremented users refcnt from sk_psock_verdict_recv() we hit the BUG_ON with refcnt > 1 and trip it. To fix, let's simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree. Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers. [ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! 
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1 [ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014 [ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330 [ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202 [ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20 [ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8 [ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000 [ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8 [ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8 [ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0 [ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.542255] Call Trace: [ 106.542383] <IRQ> [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? 
sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150 Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay <will@isovalent.com> Change-Id: I0dadf18f695e4305ba1043a7fbec7ef3f58baba7 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `4ae2af3e59`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 23:02:40 +00:00
Elliot Berman	86409bb4e1	ANDROID: virt: gunyah: Sync with latest Gunyah patches Sync changes to Gunyah stack to align with latest changes posted to kernel.org: https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@quicinc.com/ Bug: 287037804 Change-Id: Ia36044894860bb94ff5518cf304254cdad14aaf5 Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>	2023-06-14 22:02:31 +00:00
Elliot Berman	705a9b5feb	ANDROID: virt: gunyah: Sync with latest documentation and sample Sync with latest documentation and sample code from v14 of Gunyah patches: https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@quicinc.com/ Bug: 287037804 Change-Id: I8893922e6b8096fdd5dff1b22ebce96e72cdb7c3 Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>	2023-06-14 22:02:31 +00:00
Howard Yen	60662882b7	FROMLIST: usb: xhci-plat: add xhci_plat_priv_overwrite Add an overwrite to platform specific callback for setting up the xhci_vendor_ops, allowing the vendor to store the xhci_vendor_ops and overwrite them when xhci_plat_probe is invoked. This change depends on the commit in this patch series ("usb: host: add xhci hooks for USB offload"); the vendor needs to invoke xhci_plat_register_vendor_ops() to register the vendor specific vendor_ops. The vendor_ops will then overwrite the vendor_ops inside xhci_plat_priv in xhci_vendor_init() during xhci-plat-hcd probe. Change-Id: I8030fe3bd274615f5926f19014c3a3e066ca9dba Signed-off-by: Howard Yen <howardyen@google.com> Bug: 175358363 Link: https://lore.kernel.org/r/20210119101044.1637023-1-howardyen@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>	2023-06-14 17:35:00 +00:00
Howard Yen	6496f6cfbb	ANDROID: usb: host: export symbols for xhci hooks usage Export symbols for xhci hooks usage: xhci_ring_free - Allow xhci hook to free xhci_ring. xhci_get_slot_ctx - Allow xhci hook to get slot_ctx from the xhci_container_ctx for getting the slot_ctx information to know which slot is offloading and compare the context in remote subsystem memory if needed. xhci_get_ep_ctx - Allow xhci hook to get ep_ctx from the xhci_container_ctx for getting the ep_ctx information to know which ep is offloading and comparing the context in remote subsystem memory if needed. Export below xhci symbols for vendor modules to manage additional secondary rings. These will be used to manage the secondary ring for usb audio offload. xhci_segment_free - Free a segment struct. xhci_remove_stream_mapping - Free for sram xhci_link_segments - Make the prev segment point to the next segment. xhci_initialize_ring_info - Initialize a ring struct. xhci_check_trb_in_td_math - Check TRB math for validation. xhci_address_device - Issue an address device command xhci_bus_suspend xhci_bus_resume - Suspend and resume for power scenario Change-Id: I2d99bded67024b2a7c625f934567e39ac03a6e5f Signed-off-by: Howard Yen <howardyen@google.com> Bug: 175358363 Bug: 183761108 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Signed-off-by: Daehwan Jung <dh10.jung@samsung.com> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>	2023-06-14 17:35:00 +00:00
Howard Yen	90ab8e7f98	ANDROID: usb: host: add xhci hooks for USB offload To enable support for USB offload, define "offload" in the usb controller node of the device tree. The "offload" value can be used to determine which type of offload has been enabled in the SoC. For example: &usbdrd_dwc3 { ... /* support usb offloading, 0: disabled, 1: audio */ offload = <1>; ... }; There are several vendor_ops introduced by this patch: struct xhci_vendor_ops - function callbacks for vendor specific operations { @vendor_init: - called for vendor init process during xhci-plat-hcd probe. @vendor_cleanup: - called for vendor cleanup process during xhci-plat-hcd remove. @is_usb_offload_enabled: - called to check if usb offload enabled. @queue_irq_work: - called to queue vendor specific irq work. @alloc_dcbaa: - called when allocating vendor specific dcbaa during memory initialization. @free_dcbaa: - called to free vendor specific dcbaa when cleaning up the memory. @alloc_transfer_ring: - called when vendor specific transfer ring allocation is required @free_transfer_ring: - called to free vendor specific transfer ring @sync_dev_ctx: - called when synchronization for device context is required @usb_offload_skip_urb: - skip urb control for offloading @alloc_container_ctx: @free_container_ctx: - called to alloc and free vendor specific container context } The xhci hooks are named with the prefix "xhci_vendor_" on the ops in xhci_vendor_ops. For example, vendor_init ops will be invoked by xhci_vendor_init() hook, is_usb_offload_enabled ops will be invoked by xhci_vendor_is_usb_offload_enabled(), and so on. Change-Id: Ib7f6952e6d44a2fcfe9d19a78f1d9f5093417613 Signed-off-by: Howard Yen <howardyen@google.com> Bug: 175358363 Signed-off-by: Greg Kroah-Harktman <gregkh@google.com> Signed-off-by: Puma Hsu <pumahsu@google.com> Signed-off-by: J. Avila <elavila@google.com> Signed-off-by: Daehwan Jung <dh10.jung@samsung.com> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>	2023-06-14 17:35:00 +00:00
Carlos Llamas	88959a53f4	ANDROID: 6/16/2023 KMI update Set KMI_GENERATION=9 for 6/16 KMI update function symbol 'struct block_device* I_BDEV(struct inode*)' changed CRC changed from 0xb3d19fd2 to 0xc8597fa function symbol 'void __ClearPageMovable(struct page*)' changed CRC changed from 0x66921e4f to 0xb4e74d22 function symbol 'void __SetPageMovable(struct page*, const struct movable_operations*)' changed CRC changed from 0x2b34667d to 0xe8b6d861 ... 4484 omitted; 4487 symbols have only CRC changes type 'struct request' changed byte size changed from 312 to 320 member 'u64 alloc_time_ns' was added 19 members ('u64 start_time_ns' .. 'u64 android_kabi_reserved1') changed offset changed by 64 type 'struct bio' changed byte size changed from 152 to 160 member 'u64 bi_iocost_cost' was added 12 members ('struct bio_crypt_ctx* bi_crypt_context' .. 'struct bio_vec bi_inline_vecs[0]') changed offset changed by 64 type 'enum cpuhp_state' changed enumerator 'CPUHP_AP_ARM_SDEI_STARTING' (116) was removed enumerator 'CPUHP_AP_ARM_VFP_STARTING' value changed from 117 to 116 enumerator 'CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING' value changed from 118 to 117 enumerator 'CPUHP_AP_PERF_ARM_HW_BREAKPOINT_STARTING' value changed from 119 to 118 enumerator 'CPUHP_AP_PERF_ARM_ACPI_STARTING' value changed from 120 to 119 enumerator 'CPUHP_AP_PERF_ARM_STARTING' value changed from 121 to 120 enumerator 'CPUHP_AP_PERF_RISCV_STARTING' value changed from 122 to 121 enumerator 'CPUHP_AP_ARM_L2X0_STARTING' value changed from 123 to 122 enumerator 'CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING' value changed from 124 to 123 enumerator 'CPUHP_AP_ARM_ARCH_TIMER_STARTING' value changed from 125 to 124 enumerator 'CPUHP_AP_ARM_GLOBAL_TIMER_STARTING' value changed from 126 to 125 enumerator 'CPUHP_AP_JCORE_TIMER_STARTING' value changed from 127 to 126 enumerator 'CPUHP_AP_ARM_TWD_STARTING' value changed from 128 to 127 enumerator 'CPUHP_AP_QCOM_TIMER_STARTING' value changed from 129 to 128 enumerator 
'CPUHP_AP_TEGRA_TIMER_STARTING' value changed from 130 to 129 enumerator 'CPUHP_AP_ARMADA_TIMER_STARTING' value changed from 131 to 130 enumerator 'CPUHP_AP_MARCO_TIMER_STARTING' value changed from 132 to 131 enumerator 'CPUHP_AP_MIPS_GIC_TIMER_STARTING' value changed from 133 to 132 enumerator 'CPUHP_AP_ARC_TIMER_STARTING' value changed from 134 to 133 enumerator 'CPUHP_AP_RISCV_TIMER_STARTING' value changed from 135 to 134 enumerator 'CPUHP_AP_CLINT_TIMER_STARTING' value changed from 136 to 135 enumerator 'CPUHP_AP_CSKY_TIMER_STARTING' value changed from 137 to 136 enumerator 'CPUHP_AP_TI_GP_TIMER_STARTING' value changed from 138 to 137 enumerator 'CPUHP_AP_HYPERV_TIMER_STARTING' value changed from 139 to 138 enumerator 'CPUHP_AP_KVM_STARTING' value changed from 140 to 139 enumerator 'CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING' value changed from 141 to 140 enumerator 'CPUHP_AP_KVM_ARM_VGIC_STARTING' value changed from 142 to 141 enumerator 'CPUHP_AP_KVM_ARM_TIMER_STARTING' value changed from 143 to 142 enumerator 'CPUHP_AP_DUMMY_TIMER_STARTING' value changed from 144 to 143 enumerator 'CPUHP_AP_ARM_XEN_STARTING' value changed from 145 to 144 enumerator 'CPUHP_AP_ARM_CORESIGHT_STARTING' value changed from 146 to 145 enumerator 'CPUHP_AP_ARM_CORESIGHT_CTI_STARTING' value changed from 147 to 146 enumerator 'CPUHP_AP_ARM64_ISNDEP_STARTING' value changed from 148 to 147 enumerator 'CPUHP_AP_SMPCFD_DYING' value changed from 149 to 148 enumerator 'CPUHP_AP_X86_TBOOT_DYING' value changed from 150 to 149 enumerator 'CPUHP_AP_ARM_CACHE_B15_RAC_DYING' value changed from 151 to 150 enumerator 'CPUHP_AP_ONLINE' value changed from 152 to 151 enumerator 'CPUHP_TEARDOWN_CPU' value changed from 153 to 152 enumerator 'CPUHP_AP_ONLINE_IDLE' value changed from 154 to 153 enumerator 'CPUHP_AP_SCHED_WAIT_EMPTY' value changed from 155 to 154 enumerator 'CPUHP_AP_SMPBOOT_THREADS' value changed from 156 to 155 enumerator 'CPUHP_AP_X86_VDSO_VMA_ONLINE' value changed from 157 to 156 enumerator 
'CPUHP_AP_IRQ_AFFINITY_ONLINE' value changed from 158 to 157 enumerator 'CPUHP_AP_BLK_MQ_ONLINE' value changed from 159 to 158 enumerator 'CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS' value changed from 160 to 159 enumerator 'CPUHP_AP_X86_INTEL_EPB_ONLINE' value changed from 161 to 160 enumerator 'CPUHP_AP_PERF_ONLINE' value changed from 162 to 161 enumerator 'CPUHP_AP_PERF_X86_ONLINE' value changed from 163 to 162 enumerator 'CPUHP_AP_PERF_X86_UNCORE_ONLINE' value changed from 164 to 163 enumerator 'CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE' value changed from 165 to 164 enumerator 'CPUHP_AP_PERF_X86_AMD_POWER_ONLINE' value changed from 166 to 165 enumerator 'CPUHP_AP_PERF_X86_RAPL_ONLINE' value changed from 167 to 166 enumerator 'CPUHP_AP_PERF_X86_CQM_ONLINE' value changed from 168 to 167 enumerator 'CPUHP_AP_PERF_X86_CSTATE_ONLINE' value changed from 169 to 168 enumerator 'CPUHP_AP_PERF_X86_IDXD_ONLINE' value changed from 170 to 169 enumerator 'CPUHP_AP_PERF_S390_CF_ONLINE' value changed from 171 to 170 enumerator 'CPUHP_AP_PERF_S390_SF_ONLINE' value changed from 172 to 171 enumerator 'CPUHP_AP_PERF_ARM_CCI_ONLINE' value changed from 173 to 172 enumerator 'CPUHP_AP_PERF_ARM_CCN_ONLINE' value changed from 174 to 173 enumerator 'CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE' value changed from 175 to 174 enumerator 'CPUHP_AP_PERF_ARM_HISI_DDRC_ONLINE' value changed from 176 to 175 enumerator 'CPUHP_AP_PERF_ARM_HISI_HHA_ONLINE' value changed from 177 to 176 enumerator 'CPUHP_AP_PERF_ARM_HISI_L3_ONLINE' value changed from 178 to 177 enumerator 'CPUHP_AP_PERF_ARM_HISI_PA_ONLINE' value changed from 179 to 178 enumerator 'CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE' value changed from 180 to 179 enumerator 'CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE' value changed from 181 to 180 enumerator 'CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE' value changed from 182 to 181 enumerator 'CPUHP_AP_PERF_ARM_L2X0_ONLINE' value changed from 183 to 182 enumerator 'CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE' value changed from 184 to 183 enumerator 
'CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE' value changed from 185 to 184 enumerator 'CPUHP_AP_PERF_ARM_APM_XGENE_ONLINE' value changed from 186 to 185 enumerator 'CPUHP_AP_PERF_ARM_CAVIUM_TX2_UNCORE_ONLINE' value changed from 187 to 186 enumerator 'CPUHP_AP_PERF_ARM_MARVELL_CN10K_DDR_ONLINE' value changed from 188 to 187 enumerator 'CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE' value changed from 189 to 188 enumerator 'CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE' value changed from 190 to 189 enumerator 'CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE' value changed from 191 to 190 enumerator 'CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE' value changed from 192 to 191 enumerator 'CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE' value changed from 193 to 192 enumerator 'CPUHP_AP_PERF_POWERPC_HV_GPCI_ONLINE' value changed from 194 to 193 enumerator 'CPUHP_AP_PERF_CSKY_ONLINE' value changed from 195 to 194 enumerator 'CPUHP_AP_WATCHDOG_ONLINE' value changed from 196 to 195 enumerator 'CPUHP_AP_WORKQUEUE_ONLINE' value changed from 197 to 196 enumerator 'CPUHP_AP_RANDOM_ONLINE' value changed from 198 to 197 enumerator 'CPUHP_AP_RCUTREE_ONLINE' value changed from 199 to 198 enumerator 'CPUHP_AP_BASE_CACHEINFO_ONLINE' value changed from 200 to 199 enumerator 'CPUHP_AP_ONLINE_DYN' value changed from 201 to 200 enumerator 'CPUHP_AP_ONLINE_DYN_END' value changed from 231 to 230 enumerator 'CPUHP_AP_MM_DEMOTION_ONLINE' value changed from 232 to 231 enumerator 'CPUHP_AP_X86_HPET_ONLINE' value changed from 233 to 232 enumerator 'CPUHP_AP_X86_KVM_CLK_ONLINE' value changed from 234 to 233 enumerator 'CPUHP_AP_ACTIVE' value changed from 235 to 234 enumerator 'CPUHP_ANDROID_RESERVED_1' value changed from 236 to 235 enumerator 'CPUHP_ANDROID_RESERVED_2' value changed from 237 to 236 enumerator 'CPUHP_ANDROID_RESERVED_3' value changed from 238 to 237 enumerator 'CPUHP_ANDROID_RESERVED_4' value changed from 239 to 238 enumerator 'CPUHP_ONLINE' value changed from 240 to 239 type 'struct task_struct' changed byte size changed from 
4736 to 4800 104 members ('const struct cred* ptracer_cred' .. 'struct thread_struct thread') changed offset changed by 384 type 'struct platform_driver' changed byte size changed from 240 to 248 member 'void(* remove_new)(struct platform_device*)' was added 8 members ('void(* shutdown)(struct platform_device*)' .. 'u64 android_kabi_reserved1') changed offset changed by 64 type 'struct tipc_bearer' changed member 'u16 encap_hlen' was added type 'struct posix_cputimers_work' changed byte size changed from 24 to 72 member 'struct mutex mutex' was added member 'unsigned int scheduled' changed offset changed by 384 type 'struct binder_alloc' changed member 'struct vm_area_struct* vma' was added member 'unsigned long vma_addr' was removed type 'struct usb_udc' changed byte size changed from 1000 to 952 member 'struct mutex connect_lock' was removed type 'enum kvm_pgtable_prot' changed enumerator 'KVM_PGTABLE_PROT_PXN' (32) was added enumerator 'KVM_PGTABLE_PROT_UXN' (64) was added Bug: 287162457 Change-Id: Ic3aad43bd3a6083cf91e71e79ece713bef0e8172 Signed-off-by: Carlos Llamas <cmllamas@google.com>	2023-06-14 16:40:59 +00:00
Carlos Llamas	21bc72f339	UPSTREAM: binder: fix UAF of alloc->vma in race with munmap() commit `d1d8875c8c` upstream. [ cmllamas: clean forward port from commit `015ac18be7` ("binder: fix UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed in mainline after the revert of commit `a43cfc87ca` ("android: binder: stop saving a pointer to the VMA") as pointed out by Liam. The commit log and tags have been tweaked to reflect this. ] In commit `720c241924` ("ANDROID: binder: change down_write to down_read") binder assumed the mmap read lock is sufficient to protect alloc->vma inside binder_update_page_range(). This used to be accurate until commit `dd2283f260` ("mm: mmap: zap pages with read mmap_sem in munmap"), which now downgrades the mmap_lock after detaching the vma from the rbtree in munmap(). Then it proceeds to teardown and free the vma with only the read lock held. This means that accesses to alloc->vma in binder_update_page_range() now will race with vm_area_free() in munmap() and can cause a UAF as shown in the following KASAN trace: ================================================================== BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0 Read of size 8 at addr ffff16204ad00600 by task server/558 CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2a0 show_stack+0x18/0x2c dump_stack+0xf8/0x164 print_address_description.constprop.0+0x9c/0x538 kasan_report+0x120/0x200 __asan_load8+0xa0/0xc4 vm_insert_page+0x7c/0x1f0 binder_update_page_range+0x278/0x50c binder_alloc_new_buf+0x3f0/0xba0 binder_transaction+0x64c/0x3040 binder_thread_write+0x924/0x2020 binder_ioctl+0x1610/0x2e5c __arm64_sys_ioctl+0xd4/0x120 el0_svc_common.constprop.0+0xac/0x270 do_el0_svc+0x38/0xa0 el0_svc+0x1c/0x2c el0_sync_handler+0xe8/0x114 el0_sync+0x180/0x1c0 Allocated by task 559: kasan_save_stack+0x38/0x6c __kasan_kmalloc.constprop.0+0xe4/0xf0 kasan_slab_alloc+0x18/0x2c 
kmem_cache_alloc+0x1b0/0x2d0 vm_area_alloc+0x28/0x94 mmap_region+0x378/0x920 do_mmap+0x3f0/0x600 vm_mmap_pgoff+0x150/0x17c ksys_mmap_pgoff+0x284/0x2dc __arm64_sys_mmap+0x84/0xa4 el0_svc_common.constprop.0+0xac/0x270 do_el0_svc+0x38/0xa0 el0_svc+0x1c/0x2c el0_sync_handler+0xe8/0x114 el0_sync+0x180/0x1c0 Freed by task 560: kasan_save_stack+0x38/0x6c kasan_set_track+0x28/0x40 kasan_set_free_info+0x24/0x4c __kasan_slab_free+0x100/0x164 kasan_slab_free+0x14/0x20 kmem_cache_free+0xc4/0x34c vm_area_free+0x1c/0x2c remove_vma+0x7c/0x94 __do_munmap+0x358/0x710 __vm_munmap+0xbc/0x130 __arm64_sys_munmap+0x4c/0x64 el0_svc_common.constprop.0+0xac/0x270 do_el0_svc+0x38/0xa0 el0_svc+0x1c/0x2c el0_sync_handler+0xe8/0x114 el0_sync+0x180/0x1c0 [...] ================================================================== To prevent the race above, revert back to taking the mmap write lock inside binder_update_page_range(). One might expect an increase of mmap lock contention. However, binder already serializes these calls via top level alloc->mutex. Also, there was no performance impact shown when running the binder benchmark tests. Fixes: `c0fd210178` ("Revert "android: binder: stop saving a pointer to the VMA"") Fixes: `dd2283f260` ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn <jannh@google.com> Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver Cc: <stable@vger.kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Liam Howlett <liam.howlett@oracle.com> Change-Id: I4215750a81e94bccf5340e4d79f7b26bb039c573 Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `931ea1ed31`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Carlos Llamas	62c6dbdccd	UPSTREAM: binder: add lockless binder_alloc_(set\|get)_vma() commit `0fa53349c3` upstream. Bring back the original lockless design in binder_alloc to determine whether the buffer setup has been completed by the ->mmap() handler. However, this time use smp_load_acquire() and smp_store_release() to wrap all the ordering in a single macro call. Also, add comments to make it evident that binder uses alloc->vma to determine when the binder_alloc has been fully initialized. In these scenarios acquiring the mmap_lock is not required. Fixes: `a43cfc87ca` ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Change-Id: I2a8040417790b6b82bf44e838146fd68403fdb51 Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `d7cee853bc`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
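The acquire/release pairing described in the commit above can be illustrated in user space. This is only a sketch under stated assumptions: it uses C11 atomics in place of the kernel's smp_store_release() and smp_load_acquire(), and the names vma_stub, alloc_stub and the alloc_stub_*() helpers are hypothetical stand-ins, not binder's actual structures or functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; not the kernel's struct vm_area_struct. */
struct vma_stub {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical stand-in for the allocator state. */
struct alloc_stub {
	/* Stays NULL until the mmap handler has finished buffer setup. */
	_Atomic(struct vma_stub *) vma;
	size_t buffer_size;
};

static void alloc_stub_init(struct alloc_stub *a)
{
	assert(a != NULL);
	atomic_init(&a->vma, NULL);
	a->buffer_size = 0;
}

/* Publisher: the release store orders all earlier setup writes before it. */
static void alloc_stub_set_vma(struct alloc_stub *a, struct vma_stub *vma)
{
	atomic_store_explicit(&a->vma, vma, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
static struct vma_stub *alloc_stub_get_vma(struct alloc_stub *a)
{
	return atomic_load_explicit(&a->vma, memory_order_acquire);
}

/* Returns 1 and fills *size only once setup has been published. */
static int alloc_stub_buffer_size(struct alloc_stub *a, size_t *size)
{
	if (!alloc_stub_get_vma(a))
		return 0;
	*size = a->buffer_size;
	return 1;
}
```

A reader that observes a non-NULL pointer is guaranteed to also observe the fields written before the release store, which is why no lock is needed just to ask whether initialization has completed.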
Carlos Llamas	3cac174682	UPSTREAM: Revert "android: binder: stop saving a pointer to the VMA" commit `c0fd210178` upstream. This reverts commit `a43cfc87ca`. This patch fixed an issue reported by syzkaller in [1]. However, this turned out to be only a band-aid in binder. The root cause, as bisected by syzkaller, was fixed by commit `5789151e48` ("mm/mmap: undo ->mmap() when mas_preallocate() fails"). We no longer need the patch for binder. Reverting such patch allows us to have a lockless access to alloc->vma in specific cases where the mmap_lock is not required. This approach avoids the contention that caused a performance regression. [1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com [cmllamas: resolved conflicts with rework of alloc->mm and removal of binder_alloc_set_vma() also fixed comment section] Fixes: `a43cfc87ca` ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Change-Id: I208b4ebf832790eb155d52ec3115e1e6c58f6f80 Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `72a94f8c14`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Carlos Llamas	dadb40b436	UPSTREAM: Revert "binder_alloc: add missing mmap_lock calls when using the VMA" commit `b15655b12d` upstream. This reverts commit `44e602b4e5`. This caused a performance regression particularly when pages are getting reclaimed. We don't need to acquire the mmap_lock to determine when the binder buffer has been fully initialized. A subsequent patch will bring back the lockless approach for this. [cmllamas: resolved trivial conflicts with renaming of alloc->mm] Fixes: `44e602b4e5` ("binder_alloc: add missing mmap_lock calls when using the VMA") Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: stable@vger.kernel.org Change-Id: If26447c08c59fbbc43731ecbd8b501c928ffbe2d Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20230502201220.1756319-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `7e6b854854`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Xin Long	fcdbf469c5	UPSTREAM: tipc: check the bearer min mtu properly when setting it by netlink [ Upstream commit `35a089b5d7` ] Checking the bearer min mtu with tipc_udp_mtu_bad() only works for IPv4 UDP bearer, and IPv6 UDP bearer has a different value for the min mtu. This patch checks with encap_hlen + TIPC_MIN_BEARER_MTU for min mtu, which works for both IPv4 and IPv6 UDP bearer. Note that tipc_udp_mtu_bad() is still used to check media min mtu in __tipc_nl_media_set(), as m->mtu currently is only used by the IPv4 UDP bearer as its default mtu value. Fixes: `682cd3cf94` ("tipc: confgiure and apply UDP bearer MTU on running links") Change-Id: I384afae6ffa9c43f72c1cda34ad2f1dd611fc675 Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `f215b62f59`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Xin Long	e48a801737	UPSTREAM: tipc: do not update mtu if msg_max is too small in mtu negotiation [ Upstream commit `56077b56cd` ] When doing link mtu negotiation, a malicious peer may send Activate msg with a very small mtu, e.g. 4 in Shuang's testing. Without a check for the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(), then n->links[bearer_id].mtu is set to 4294967228, which is an overflow of '4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss(). With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning: tipc: Too large msg, purging xmit list 1 5 0 40 4! tipc: Too large msg, purging xmit list 1 15 0 60 4! And with tipc_link_entry.mtu 4294967228, a huge skb was allocated in named_distribute(), and when purging it in tipc_link_xmit(), a crash was even caused: general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19 RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0 Call Trace: <IRQ> skb_release_data+0xf9/0x1d0 kfree_skb_reason+0x40/0x100 tipc_link_xmit+0x57a/0x740 [tipc] tipc_node_xmit+0x16c/0x5c0 [tipc] tipc_named_node_up+0x27f/0x2c0 [tipc] tipc_node_write_unlock+0x149/0x170 [tipc] tipc_rcv+0x608/0x740 [tipc] tipc_udp_recv+0xdc/0x1f0 [tipc] udp_queue_rcv_one_skb+0x33e/0x620 udp_unicast_rcv_skb.isra.72+0x75/0x90 __udp4_lib_rcv+0x56d/0xc20 ip_protocol_deliver_rcu+0x100/0x2d0 This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(), and not updating mtu if it is too small. Fixes: `ed193ece26` ("tipc: simplify link mtu negotiation") Reported-by: Shuang Li <shuali@redhat.com> Change-Id: I95f28cbfaf6dc4899e0695ba6168c7c58737f06b Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. 
Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `259683001d`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
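The wraparound quoted in the tipc mtu negotiation commit above is plain 32-bit unsigned arithmetic and can be reproduced outside the kernel. The sketch below is illustrative only: SKETCH_OVERHEAD is an assumed combined value for INT_H_SIZE + EMSG_OVERHEAD, chosen as 72 because that is what makes an mtu of 4 yield the 4294967228 reported in the message, and sketch_update_mtu() is a hypothetical helper showing the shape of a minimum check, not the actual tipc code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed combined overhead (INT_H_SIZE + EMSG_OVERHEAD). The value 72 is
 * inferred from the commit message: 4 - 72 wraps to 4294967228 in 32 bits.
 */
#define SKETCH_OVERHEAD 72u

/* Unchecked subtraction: wraps around when mtu is smaller than the overhead. */
static uint32_t sketch_mss_unchecked(uint32_t mtu)
{
	return mtu - SKETCH_OVERHEAD;
}

/*
 * Hypothetical negotiation step: ignore a peer mtu below the bearer minimum,
 * otherwise lower the link mtu to the peer's value if it is smaller.
 * Returns 1 if the peer value was accepted, 0 if it was rejected.
 */
static int sketch_update_mtu(uint32_t *link_mtu, uint32_t peer_mtu,
			     uint32_t min_mtu)
{
	assert(link_mtu != 0);
	if (peer_mtu < min_mtu)
		return 0;
	if (peer_mtu < *link_mtu)
		*link_mtu = peer_mtu;
	return 1;
}
```

With the check in place, a peer advertising an mtu of 4 leaves the link mtu untouched, so the subtraction is never performed on a value smaller than the overhead.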
Xin Long	461038ba5c	UPSTREAM: tipc: add tipc_bearer_min_mtu to calculate min mtu [ Upstream commit `3ae6d66b60` ] As different media may require different min mtu, and even the same media with different net family requires different min mtu, add tipc_bearer_min_mtu() to calculate min mtu accordingly. This API will be used to check the new mtu when doing the link mtu negotiation in the next patch. Change-Id: I960cf07506388294eb6028938025e1073a2c4be5 Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: `56077b56cd` ("tipc: do not update mtu if msg_max is too small in mtu negotiation") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `735c64ea88`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Francesco Dolcini	d0be9e79ee	UPSTREAM: Revert "usb: gadget: udc: core: Invoke usb_gadget_connect only when started" commit `f22e9b67f1` upstream. This reverts commit `0db213ea8e`. It introduces an issue where configuring the USB gadget hangs forever, on multiple Qualcomm and NXP i.MX SoCs at least. Cc: stable@vger.kernel.org Fixes: `0db213ea8e` ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started") Reported-by: Stephan Gerhold <stephan@gerhold.net> Reported-by: Francesco Dolcini <francesco.dolcini@toradex.com> Link: https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/ Change-Id: I2a294aedee1ca56b293db30fc7d9258e92e61372 Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Link: https://lore.kernel.org/r/20230512131435.205464-3-francesco@dolcini.it Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `ea56ede911`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Shengjiu Wang	66a5c03404	UPSTREAM: ASoC: fsl_micfil: Fix error handler with pm_runtime_enable [ Upstream commit `17955aba78` ] There is an error message when deferred probing happens: fsl-micfil-dai 30ca0000.micfil: Unbalanced pm_runtime_enable! Fix the error handler with pm_runtime_enable and add fsl_micfil_remove() for pm_runtime_disable. Fixes: `47a70e6fc9` ("ASoC: Add MICFIL SoC Digital Audio Interface driver.") Change-Id: I292d01a821e595076795be3088b2b816251a700f Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://lore.kernel.org/r/1683540996-6136-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ce6c7befc2`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Uwe Kleine-König	6e721f991f	UPSTREAM: platform: Provide a remove callback that returns no value [ Upstream commit `5c5a7680e6` ] struct platform_driver::remove returning an integer made driver authors expect that returning an error code was proper error handling. However the driver core ignores the error and continues to remove the device because there is nothing the core could do anyhow and reentering the remove callback is only calling for trouble. So this is a source of errors, typically yielding resource leaks in the error path. As there are too many platform drivers to neatly convert them all to return void in a single go, do it in several steps after this patch: a) Convert all drivers to implement .remove_new() returning void instead of .remove() returning int; b) Change struct platform_driver::remove() to return void and so make it identical to .remove_new(); c) Change all drivers back to .remove() now with the better prototype; d) drop struct platform_driver::remove_new(). While this eventually touches all drivers twice, steps a) and c) can be done one driver after another, which reduces coordination efforts immensely and simplifies review. Change-Id: I7da6828a301462bad53470cf94db94d55ac51d37 Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20221209150914.3557650-1-u.kleine-koenig@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: `17955aba78` ("ASoC: fsl_micfil: Fix error handler with pm_runtime_enable") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `9d3ac384cb`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
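The problem the platform remove commit above describes, an int-returning remove callback whose error the core cannot act on, can be mocked in user space. This is a rough sketch, not driver-core code: sketch_device, sketch_driver, sketch_core_remove() and both callbacks are hypothetical names, and the dispatch order (prefer the void callback, otherwise call the int one and merely report a non-zero result) is an assumption about how such a transition is typically wired.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical device with a flag standing in for its held resources. */
struct sketch_device {
	const char *name;
	int resources_released;
};

/* Hypothetical driver with both the old and the new style callback. */
struct sketch_driver {
	int (*remove)(struct sketch_device *dev);
	void (*remove_new)(struct sketch_device *dev);
};

/* Mock core: removal always proceeds, so an error can only be reported. */
static void sketch_core_remove(const struct sketch_driver *drv,
			       struct sketch_device *dev)
{
	assert(drv != NULL && dev != NULL);
	if (drv->remove_new) {
		drv->remove_new(dev);
	} else if (drv->remove) {
		int ret = drv->remove(dev);

		if (ret)
			fprintf(stderr, "%s: remove returned %d, ignoring\n",
				dev->name, ret);
	}
}

/* Old style: bails out early with an error and leaks its resources. */
static int legacy_remove(struct sketch_device *dev)
{
	(void)dev;
	return -16;
}

/* New style: cannot fail, so it has to finish its cleanup. */
static void new_remove(struct sketch_device *dev)
{
	dev->resources_released = 1;
}
```

The void prototype removes the temptation to return early on error, since there is no return value for a driver author to mistake for error handling.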
Pierre Gondois	07a8c09137	UPSTREAM: firmware: arm_sdei: Fix sleep from invalid context BUG [ Upstream commit `d2c48b2387` ] Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0 preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by cpuhp/0/24: #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130 irq event stamp: 36 hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0 hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248 softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...] Hardware name: WIWYNN Mt.Jade Server [...] Call trace: dump_backtrace+0x114/0x120 show_stack+0x20/0x70 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x228 rt_spin_lock+0x70/0x120 sdei_cpuhp_up+0x3c/0x130 cpuhp_invoke_callback+0x250/0xf08 cpuhp_thread_fun+0x120/0x248 smpboot_thread_fn+0x280/0x320 kthread+0x130/0x140 ret_from_fork+0x10/0x20 sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled. SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1] Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. 
Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them. Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU. [1]: https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/ Suggested-by: James Morse <james.morse@arm.com> Change-Id: I9f73aadd24096d8298b5ae8f26f955e9f6ee2b9a Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `a8267bc8de`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Kevin Brodsky	b065972b7b	UPSTREAM: uapi/linux/const.h: prefer ISO-friendly __typeof__ [ Upstream commit `31088f6f79` ] typeof is (still) a GNU extension, which means that it cannot be used when building ISO C (e.g. -std=c99). It should therefore be avoided in uapi headers in favour of the ISO-friendly __typeof__. Unfortunately this issue could not be detected by CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in any uapi header. This matters from a userspace perspective, not a kernel one. uapi headers and their contents are expected to be usable in a variety of situations, and in particular when building ISO C applications (with -std=c99 or similar). This particular problem can be reproduced by trying to use the __ALIGN_KERNEL macro directly in application code, say: int align(int x, int a) { return __ALIGN_KERNEL(x, a); } and trying to build that with -std=c99. Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com Fixes: `a79ff731a1` ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()") Change-Id: I05462cdee00da59617f3dfb875c233a246f7d2f6 Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Tested-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ef9f854103`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
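The __typeof__ point in the commit above can be tried with a small stand-alone alignment macro. This is a sketch only: SKETCH_ALIGN and SKETCH_ALIGN_MASK are local, hypothetical names modelled on the shape of the uapi alignment macros (they are defined here so the example does not depend on kernel headers), and align_up() is an illustrative wrapper. It assumes a compiler such as GCC or Clang that accepts the __typeof__ spelling in strict ISO modes.

```c
#include <assert.h>

/* Local stand-ins; deliberately not named like the real uapi macros. */
#define SKETCH_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
/*
 * __typeof__ is accepted under -std=c99, whereas the plain typeof
 * spelling is rejected there because it is a GNU extension keyword.
 */
#define SKETCH_ALIGN(x, a) SKETCH_ALIGN_MASK(x, (__typeof__(x))(a) - 1)

/* Rounds x up to the next multiple of a; a must be a power of two. */
static int align_up(int x, int a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return SKETCH_ALIGN(x, a);
}
```

Swapping __typeof__ for typeof in SKETCH_ALIGN and building with -std=c99 should reproduce the kind of failure the commit message describes, while -std=gnu99 would accept either spelling.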
Thomas Gleixner	aaf6ccb6f3	UPSTREAM: posix-cpu-timers: Implement the missing timer_wait_running callback commit `f7abf14f00` upstream. For some unknown reason the introduction of the timer_wait_running callback failed to fix up posix CPU timers, which went unnoticed for almost four years. Marco reported recently that the WARN_ON() in timer_wait_running() triggers with a posix CPU timer test case. Posix CPU timers have two execution models for expiring timers depending on CONFIG_POSIX_CPU_TIMERS_TASK_WORK: 1) If not enabled, the expiry happens in hard interrupt context so spin waiting on the remote CPU is reasonably time bound. Implement an empty stub function for that case. 2) If enabled, the expiry happens in task work before returning to user space or guest mode. The expired timers are marked as firing and moved from the timer queue to a local list head with sighand lock held. Once the timers are moved, sighand lock is dropped and the expiry happens in fully preemptible context. That means the expiring task can be scheduled out, migrated, interrupted etc. So spin waiting on it is more than suboptimal. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way: - Add a mutex to struct posix_cputimers_work. This struct is per task and used to schedule the expiry task work from the timer interrupt. - Add a task_struct pointer to struct cpu_timer which is used to store the task which runs the expiry. That's filled in when the task moves the expired timers to the local expiry list. 
That's not affecting the size of the k_itimer union as there are bigger union members already - Let the task take the expiry mutex around the expiry function - Let the waiter acquire a task reference with rcu_read_lock() held and block on the expiry mutex This avoids spin-waiting on a task which might not even be on a CPU and works nicely for RT too. Fixes: `ec8f954a40` ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Reported-by: Marco Elver <elver@google.com> Change-Id: Ic069585c15bc968dec3c2b99cc70256f56a70b32 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marco Elver <elver@google.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `bccf9fe296`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-06-14 16:40:59 +00:00
Greg Kroah-Hartman	f3b712fcb5	ANDROID: GKI: reserve extra arm64 cpucaps for ABI preservation Over the lifetime of the kernel, new arm64 cpucaps need to be added to handle errata and other fun stuff. So reserve 20 spots for us to use in the future as this is an ABI-stable structure that we can not increase over time without major problems. Bug: 151154716 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I37bdac374e2570f61ab54919712fd62c7e541e67	2023-06-14 16:40:59 +00:00
Jindong Yue	d1c7974b1f	ANDROID: arm64: errata: Add WORKAROUND_NXP_ERR050104 cpucaps This is a placeholder to workaround NXP iMX8QM A53 Cache coherency issue. The full patch is still under review upstream. Considering the patch adds a new cpucap, which breaks KMI, and the KMI freeze date is coming, so use a placeholder here to update KMI before the freeze. According to NXP errata document[1] i.MX8QuadMax SoC suffers from serious cache coherence issue. It was also mentioned in initial support[2] for imx8qm mek machine. Following is excerpt from NXP IMX8_1N94W "Mask Set Errata" document Rev. 5, 3/2023. Just in case it gets lost somehow. "ERR050104: Arm/A53: Cache coherency issue" Description Some maintenance operations exchanged between the A53 and A72 core clusters, involving some Translation Look-aside Buffer Invalidate (TLBI) and Instruction Cache (IC) instructions can be corrupted. The upper bits, above bit-35, of ARADDR and ACADDR buses within in Arm A53 sub-system have been incorrectly connected. Therefore ARADDR and ACADDR address bits above bit-35 should not be used. Workaround The following software instructions are required to be downgraded to TLBI VMALLE1IS: TLBI ASIDE1, TLBI ASIDE1IS, TLBI VAAE1, TLBI VAAE1IS, TLBI VAALE1, TLBI VAALE1IS, TLBI VAE1, TLBI VAE1IS, TLBI VALE1, TLBI VALE1IS The following software instructions are required to be downgraded to TLBI VMALLS12E1IS: TLBI IPAS2E1IS, TLBI IPAS2LE1IS The following software instructions are required to be downgraded to TLBI ALLE2IS: TLBI VAE2IS, TLBI VALE2IS. The following software instructions are required to be downgraded to TLBI ALLE3IS: TLBI VAE3IS, TLBI VALE3IS. 
The following software instructions are required to be downgraded to TLBI VMALLE1IS when the Force Broadcast (FB) bit [9] of the Hypervisor Configuration Register (HCR_EL2) is set: TLBI ASIDE1, TLBI VAAE1, TLBI VAALE1, TLBI VAE1, TLBI VALE1 The following software instruction is required to be downgraded to IC IALLUIS: IC IVAU, Xt Specifically for the IC IVAU, Xt downgrade, setting SCTLR_EL1.UCI to 0 will disable EL0 access to this instruction. Any attempt to execute from EL0 will generate an EL1 trap, where the downgrade to IC IALLUIS can be implemented. [1] https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf [2] commit `307fd14d4b` ("arm64: dts: imx: add imx8qm mek support") Bug: 284762900 Link: https://lore.kernel.org/linux-arm-kernel/20230420112952.28340-1-iivanov@suse.de/ Signed-off-by: Jindong Yue <jindong.yue@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8dd50b369412de73b608805d1b5bb8424ea23280	2023-06-14 16:40:59 +00:00
Quentin Perret	b489c53001	ANDROID: KVM: arm64: Allow setting {P,U}XN in stage-2 PTEs FEAT_XNX allows to specify PXN and UXN attributes on stage-2 entries. Make this usable from pKVM by exposing two new kvm_pgtable_prot entries for each of them. No functional changes intended. Bug: 264070847 Change-Id: I47d861fa64ba511370b182f4609fe1c27695a949 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	b7aff5c603	ANDROID: KVM: arm64: Restrict host-to-hyp MMIO donations Nothing currently prevents the donation of an MMIO region to the hypervisor for backing e.g. guest stage-2 page-tables, tracing buffers, hyp vm and vcpu metadata, or any other donation to EL2. However, the only confirmed use-case for MMIO donations are for protecting the IOMMU registers as well as for vendor module usage. Restrict the donation of MMIO regions to these two paths only by introducing a new helper function. Bug: 264070847 Change-Id: I914508fb3e3547fcfabca8557bdf7948cb796099 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	f5f8c19f6c	ANDROID: KVM: arm64: Allow state changes of MMIO pages We've historically disallowed state changes for MMIO pages -- the host had sole ownership of all of them. However, changing the state of those pages has clearly become a goal both to support vendor extensions to the hypervisor, as well as to support device assignment in the longer term. To pave the way towards this support, let's allow certain state transitions for MMIO pages. Bug: 264070847 Change-Id: I9803b572c90d8a694c3d43a0ee0d7b4f4124fe4a Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	4ddb4ed818	ANDROID: KVM: arm64: Allow MMIO perm changes from modules Now that donations of MMIO ranges are allowed, let's also allow modules to change host stage-2 permissions. Bug: 264070847 Change-Id: Ia72678bb27559d9a7963dbc5ffb5a101efcbbad2 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	5d0225cdf0	ANDROID: KVM: arm64: Don't allocate from handle_host_mem_abort There shouldn't be any reason to ever need allocating from the host stage-2 pool during mem aborts now that the base page-table structure is pinned. To prevent future regressions in this area, introduce a new sanity check that will warn when hyp_page_alloc() is used from the wrong paths. Bug: 264070847 Change-Id: I7a7c606fe01558790e4ffcd3534f8976caf48bd0 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	5136a28ab6	ANDROID: KVM: arm64: Donate IOMMU regions to pKVM The MMIO register space for IOMMUs controlled by the hypervisor is currently unmapped from the host stage-2, and we rely on the host abort path to not accidentally map them. However, this approach becomes increasingly difficult to maintain as we introduce support for donating MMIO regions and not just memory -- nothing prevents the host from donating a protected MMIO register to another entity for example. Now that MMIO donations are possible, let's use the proper host-donate-hyp machinery to implement this. As a nice side effect, this guarantees the host stage-2 page-table is annotated with hyp ownership for those IOMMU regions, which guarantees the core range alignment feature in the host mem abort path will do the right thing without requiring a second pass in the IOMMU code. This also turns the host stage-2 PTEs into "non-default" entries, hence avoiding issues with the coalescing code going forward. Bug: 264070847 Change-Id: I1fad1b1be36f3b654190a912617e780141945a8f Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	23b62ec342	ANDROID: KVM: arm64: Map MMIO donation as device at EL2 We now support donations of MMIO ranges to the hypervisor. Make sure to update the donation logic to correctly map these pages with device mappings. Bug: 264070847 Change-Id: I36558f05ed47d1e3dc06e4e24151241474b4ff77 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	adc78128b3	ANDROID: KVM: arm64: Don't recycle pages from host mem abort We're now guaranteed by construction to not require structural changes to the host stage-2 page-table from the host memory abort path, so let's use the low-level __host_stage2_idmap() function directly instead of the higher-level wrapper that attempts page recycling when running out of memory. Bug: 264070847 Change-Id: I2db34777386931bfb3f93ea3b3e51e1e2a10ea79 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	452ef5ae7b	ANDROID: KVM: arm64: Pin host stage-2 tables Now that the host stage-2 page-table is entirely pre-populated in __pkvm_init_finalize(), we know that by the end of this function, the structure of the page-table will remain stable until the host calls in the hypervisor to require e.g. a page-table changes (by e.g. running a guest). This does not necessarily mean that no host mem aborts will occur -- there may be null PTEs in the host stage-2 due to collapsed block mappings from fix_host_ownership() for example -- but all those aborts should be trivially handled without requiring structural changes to the page-table. This has the nice side effect of guaranteeing that host_mem_abort() will not allocate from the host stage-2 pool. In order to ensure this desirable property is retained for the lifetime of the system even in the presence of the coalescing feature, let's 'pin' the structure of the page-table as-is by taking an additional reference from each table entry. Bug: 264070847 Change-Id: If870d7485cc38f6ad714901e710287911f111897 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	a8bba661e3	ANDROID: KVM: arm64: Move kvm_pte_follow() to header We will soon need to use kvm_pte_follow() from outside pgtable.c, so move it to the header file as static inline. Bug: 264070847 Change-Id: I319dff1b352a4acd8d9a5cc74acb5f1758be358f Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	04ddc7eec0	ANDROID: KVM: arm64: Pre-populate host stage2 We will soon attempt to avoid any memory allocations from the host mem abort path. In order to pave the way towards supporting this, let's pre-populate the host stage-2 for the entire address space using as many block mappings as possible. Some of these mappings may need to be collapsed shortly after from fix_host_ownership() for example, so this doesn't guarantee the absence of memory aborts altogether, but helps getting the structure of the page-table in the right shape early on. Bug: 264070847 Change-Id: Ib3ce25c893f779437ce473d64e08e8876870556c Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	0b6736459a	ANDROID: KVM: arm64: Fix the host ownership later The fix_host_ownership() path walks the hypervisor's stage-1 page-table to adjust the host's stage-2 accordingly. However, this is done before the hyp stage-1 refcount has been fixed up, and before the hyp percpu fixmap has been created. This all works right now as we start off with an empty host stage-2, so none of the changes require the usage of the fixmap for e.g. CMOs. To prepare the ground for doing fix_host_ownership() with a non-empty page-table, finalize the hyp stage-1 upfront. Bug: 264070847 Change-Id: I6aff3ac2f835be3fb3fba7660540c0a9b99c097d Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	cf2d193d9b	ANDROID: KVM: arm64: Don't recycle non-default PTEs When recycling host stage-2 page-table pages, we currently blindly unmap all 'non-moveable' regions. To prepare the ground for allowing the mapping of those regions with non-default attributes, let's switch to using the recently introduced kvm_pgtable_stage2_reclaim_leaf() helper which will only reclaim pages containing PTEs with default attributes. Bug: 264070847 Change-Id: I4a441a20abe84d2405efcfa403908078c10be841 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Quentin Perret	a701418f2f	ANDROID: KVM: arm64: Introduce kvm_pgtable_stage2_reclaim_leaves We will soon improve the mechanism by which the host's stage-2 page-table pages are recycled whenever its pool runs out of pages. To prepare the ground for this, introduce a new helper function in the page-table code allowing to reclaim leaf pages that don't hold counted PTEs. Bug: 264070847 Change-Id: Ie172bf11f2980e45bc908002368759f74f42d195 Signed-off-by: Quentin Perret <qperret@google.com>	2023-06-14 16:40:59 +00:00
Yang Yang	5224fbb5b8	ANDROID: GKI: enable CONFIG_BLK_CGROUP_IOCOST Enable CONFIG_BLK_CGROUP_IOCOST to help control IO resources. Bug: 188749221 Bug: 285074916 Change-Id: I611b3ff5929d0a998fa6241967887803636b7588 (cherry picked from commit `19316b4889`) Signed-off-by: Yang Yang <yang.yang@vivo.com>	2023-06-14 16:40:59 +00:00
Roy Luo	fe10954309	BACKPORT: FROMGIT: usb: core: add sysfs entry for usb device state Expose usb device state to userland as the information is useful in detecting non-compliant setups and diagnosing enumeration failures. For example: - End-to-end signal integrity issues: the device would fail port reset repeatedly and thus be stuck in POWERED state. - Charge-only cables (missing D+/D- lines): the device would never enter POWERED state as the HC would not see any pullup. What's the status quo? We do have error logs such as "Cannot enable. Maybe the USB cable is bad?" to flag potential setup issues, but there's no good way to expose them to userspace. Why add a sysfs entry in struct usb_port instead of struct usb_device? The struct usb_device is not device_add() to the system until it's in ADDRESS state hence we would miss the first two states. The struct usb_port is a better place to keep the information because its life cycle is longer than the struct usb_device that is attached to the port. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202306042228.e532af6e-oliver.sang@intel.com Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Change-Id: Ib78d4c7b4b1db402828c92dc792838a1015f0f2c Signed-off-by: Roy Luo <royluo@google.com> Message-ID: <20230608015913.1679984-1-royluo@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Backport conflicts: the adjacent sysfs entry is different in ABI documentation) Bug: 285199434 (cherry picked from commit `83cb2604f6` https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ usb-testing) Change-Id: I1a0da6686e57be05ef10ae98892599eb37074014 Signed-off-by: Roy Luo <royluo@google.com>	2023-06-14 13:51:57 +00:00
zuoyonghua	251efd6587	ANDROID: GKI: Update symbols to symbol list Add symbol list for oplus in android/abi_gki_aarch64_oplus 1 function symbol(s) added 'int public_key_verify_signature(const struct public_key, const struct public_key_signature)' Bug: 286993971 Change-Id: I748437d61b46b6ee3736b3c7df36ab7249b187f6 Signed-off-by: zuoyonghua <zuoyonghua@oppo.com>	2023-06-14 13:35:51 +00:00
Lee Jones	71761b36c3	ANDROID: HID; Over-ride default maximum buffer size when using UHID Presently, when a report is processed, its proposed size, provided by the user of the API (as Report Size * Report Count) is compared against the subsystem default HID_MAX_BUFFER_SIZE (16k). However, some low-level HID drivers allocate a reduced amount of memory to their buffers (e.g. UHID only allocates UHID_DATA_MAX (4k) buffers), rendering this check inadequate in some cases. In these circumstances, if the received report ends up being smaller than the proposed report size, the remainder of the buffer is zeroed. That is, the space between sizeof(csize) (size of the current report) and the rsize (size proposed i.e. Report Size * Report Count), which can be handled up to HID_MAX_BUFFER_SIZE (16k). Meaning that memset() shoots straight past the end of the buffer boundary and starts zeroing out in-use values, often resulting in calamity. This is an Android specific patch which essentially achieves the same goal as the recently reverted upstream commits `b1a37ed00d` ("HID: core: Provide new max_buffer_size attribute to over-ride the default") and `1c5d422124` ("HID: uhid: Over-ride the default maximum data buffer value with our own") only it does so in an ABI friendly (albeit more hacky) way. Bug: 260007429 Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I1f56673bb67b63ab14b58634bfe74a04b0758e3d	2023-06-13 16:49:26 +00:00
Peng Zhang	c3f3dc31f9	UPSTREAM: maple_tree: make maple state reusable after mas_empty_area() Make mas->min and mas->max point to a node range instead of a leaf entry range. This allows mas to still be usable after mas_empty_area() returns. Users would get unexpected results from other operations on the maple state after calling the affected function. For example, x86 MAP_32BIT mmap() acts as if there is no suitable gap when there should be one. Link: https://lkml.kernel.org/r/20230505145829.74574-1-zhangpeng.00@bytedance.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reported-by: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Reported-by: Tad <support@spotco.us> Reported-by: Michael Keyes <mgkeyes@vigovproductions.net> Link: https://lore.kernel.org/linux-mm/32f156ba80010fd97dbaf0a0cdfc84366608624d.camel@intel.com/ Link: https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.camel@spotco.us/ Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit `0257d9908d`) Bug: 281094761 Change-Id: I381313fa2e84cafbcb9da5ea25fd01e3a868b6d1 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-13 00:53:45 +00:00
Suren Baghdasaryan	d31ddcdbb8	Revert "Revert "mm/mmap: regression fix for unmapped_area{_topdown}"" This reverts commit 52ace503ecf894ec2f63b8137f181868ea61d95a. The issue that required the revert is fixed by: `0257d9908d` ("maple_tree: make maple state reusable after mas_empty_area()") Bug: 281094761 Change-Id: I97b45525689097d0c1369f81a994d50f0662c9c2 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-06-13 00:53:39 +00:00

1149720 Commits