Commit Graph

1150063 Commits

Author SHA1 Message Date
Yu Kuai
404522c763 UPSTREAM: blk-ioc: protect ioc_destroy_icq() by 'queue_lock'
Currently, icq is tracked by both request_queue(icq->q_node) and
task(icq->ioc_node), and ioc_clear_queue() from elevator exit is not
safe because it can access the list without protection:

ioc_clear_queue			ioc_release_fn
 lock queue_lock
 list_splice
 /* move queue list to a local list */
 unlock queue_lock
 /*
  * lock is released, the local list
  * can be accessed through task exit.
  */

				lock ioc->lock
				while (!hlist_empty)
				 icq = hlist_entry
				 lock queue_lock
				  ioc_destroy_icq
				   delete icq->ioc_node
 while (!list_empty)
  icq = list_entry()		   list_del icq->q_node
  /*
   * This is not protected by any lock,
   * list_entry concurrent with list_del
   * is not safe.
   */

				 unlock queue_lock
				unlock ioc->lock

Fix this problem by protecting list 'icq->q_node' by queue_lock from
ioc_clear_queue().

Reported-and-tested-by: Pradeep Pragallapati <quic_pragalla@quicinc.com>
Link: https://lore.kernel.org/lkml/20230517084434.18932-1-quic_pragalla@quicinc.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531073435.2923422-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Bug: 285274586
(cherry picked from commit 5a0ac57c48)
Change-Id: I60f3acfaa32f18bed58c8190178cdca5ebd91100
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
2023-09-05 15:27:33 +00:00
Wei Liu
bd0308e36b ANDROID: GKI: Update symbols to symbol list
Update symbols to symbol list externed by oppo network group.

5 Added function:

  [A] 'function int __rtnl_link_register(rtnl_link_ops*)'
  [A] 'function int ip_local_deliver(struct sk_buff *)'
  [A] 'function iov_iter_advance(struct iov_iter *i, size_t size)'
  [A] 'function int nf_register_net_hook(struct net *net,
					 const struct nf_hook_ops *reg)'
  [A] 'function void nf_unregister_net_hook(struct net *,
					    const struct nf_hook_ops *)'

These functions have been merged in lower versions of the kernel and are still needed by oppo in higher versions.

These functions are needed by other modules that provide functionality for oppo's network, such as the network tracking module, the network warm-up module, etc.

Bug: 297979024

Change-Id: Ic1a4c869b3894a06f7cab7b5120574ed94d519b2
Signed-off-by: Wei Liu <liuwei.a@oppo.com>
2023-09-05 12:38:42 +00:00
John Stultz
87647c0c54 ANDROID: uid_sys_stats: Use llist for deferred work
A use-after-free bug was found in the previous custom lock-free list
implementation for the deferred work, so switch functionality to llist
implementation.

While the previous approach atomically handled the list head, it did not
assure the new node's next pointer was assigned before the head was
pointed to the node, allowing the consumer to traverse to an invalid
next pointer.

Additionally, in switching to llists, this patch pulls the entire list
off the list head once and processes it separately, reducing the number
of atomic operations compared with the custom lists's implementation
which pulled one node at a time atomically from the list head.

BUG: KASAN: use-after-free in process_notifier+0x270/0x2dc
Write of size 8 at addr d4ffff89545c3c58 by task Blocking Thread/3431
Pointer tag: [d4], memory tag: [fe]

call trace:
 dump_backtrace+0xf8/0x118
 show_stack+0x18/0x24
 dump_stack_lvl+0x60/0x78
 print_report+0x178/0x470
 kasan_report+0x8c/0xbc
 kasan_tag_mismatch+0x28/0x3c
 __hwasan_tag_mismatch+0x30/0x60
 process_notifier+0x270/0x2dc
 notifier_call_chain+0xb4/0x108
 blocking_notifier_call_chain+0x54/0x80
 profile_task_exit+0x20/0x2c
 do_exit+0xec/0x1114
 __arm64_sys_exit_group+0x0/0x24
 get_signal+0x93c/0xa78
 do_notify_resume+0x158/0x3fc
 el0_svc+0x54/0x78
 el0t_64_sync_handler+0x44/0xe4
 el0t_64_sync+0x190/0x194

Bug: 294468796
Bug: 295787403
Fixes: 8e86825eec ("ANDROID: uid_sys_stats: Use a single work for deferred updates")
Change-Id: Id377348c239ec720a5237726bc3632544d737e3b
Signed-off-by: John Stultz <jstultz@google.com>
[nkapron: Squashed with other changes and rewrote the commit message]
Signed-off-by: Neill Kapron <nkapron@google.com>
2023-09-05 12:07:09 +00:00
Lin Ma
4b3ab91671 UPSTREAM: net: nfc: Fix use-after-free caused by nfc_llcp_find_local
[ Upstream commit 6709d4b7bc ]

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params   | // nfc_unregister_device
                             |
dev = nfc_get_device(idx);   | device_lock(...)
if (!dev)                    | dev->shutting_down = true;
    return -ENODEV;          | device_unlock(...);
                             |
device_lock(...);            |   // nfc_llcp_unregister_device
                             |   nfc_llcp_find_local()
nfc_llcp_find_local(...);    |
                             |   local_cleanup()
if (!local) {                |
    rc = -ENODEV;            |     // nfc_llcp_local_put
    goto exit;               |     kref_put(.., local_release)
}                            |
                             |       // local_release
                             |       list_del(&local->list)
  // nfc_genl_send_params    |       kfree()
  local->dev->idx !!!UAF!!!  |
                             |

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780  net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
 <TASK>
 __dump_stack  lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x72/0xa0  lib/dump_stack.c:106
 print_address_description  mm/kasan/report.c:319 [inline]
 print_report+0xcc/0x620  mm/kasan/report.c:430
 kasan_report+0xb2/0xe0  mm/kasan/report.c:536
 nfc_genl_send_params  net/nfc/netlink.c:999 [inline]
 nfc_genl_llc_get_params+0x72f/0x780  net/nfc/netlink.c:1045
 genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0  net/netlink/genetlink.c:968
 genl_family_rcv_msg  net/netlink/genetlink.c:1048 [inline]
 genl_rcv_msg+0x503/0x7d0  net/netlink/genetlink.c:1065
 netlink_rcv_skb+0x161/0x430  net/netlink/af_netlink.c:2548
 genl_rcv+0x28/0x40  net/netlink/genetlink.c:1076
 netlink_unicast_kernel  net/netlink/af_netlink.c:1339 [inline]
 netlink_unicast+0x644/0x900  net/netlink/af_netlink.c:1365
 netlink_sendmsg+0x934/0xe70  net/netlink/af_netlink.c:1913
 sock_sendmsg_nosec  net/socket.c:724 [inline]
 sock_sendmsg+0x1b6/0x200  net/socket.c:747
 ____sys_sendmsg+0x6e9/0x890  net/socket.c:2501
 ___sys_sendmsg+0x110/0x1b0  net/socket.c:2555
 __sys_sendmsg+0xf7/0x1d0  net/socket.c:2584
 do_syscall_x64  arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3f/0x90  arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
 </TASK>

Allocated by task 20116:
 kasan_save_stack+0x22/0x50  mm/kasan/common.c:45
 kasan_set_track+0x25/0x30  mm/kasan/common.c:52
 ____kasan_kmalloc  mm/kasan/common.c:374 [inline]
 __kasan_kmalloc+0x7f/0x90  mm/kasan/common.c:383
 kmalloc  include/linux/slab.h:580 [inline]
 kzalloc  include/linux/slab.h:720 [inline]
 nfc_llcp_register_device+0x49/0xa40  net/nfc/llcp_core.c:1567
 nfc_register_device+0x61/0x260  net/nfc/core.c:1124
 nci_register_device+0x776/0xb20  net/nfc/nci/core.c:1257
 virtual_ncidev_open+0x147/0x230  drivers/nfc/virtual_ncidev.c:148
 misc_open+0x379/0x4a0  drivers/char/misc.c:165
 chrdev_open+0x26c/0x780  fs/char_dev.c:414
 do_dentry_open+0x6c4/0x12a0  fs/open.c:920
 do_open  fs/namei.c:3560 [inline]
 path_openat+0x24fe/0x37e0  fs/namei.c:3715
 do_filp_open+0x1ba/0x410  fs/namei.c:3742
 do_sys_openat2+0x171/0x4c0  fs/open.c:1356
 do_sys_open  fs/open.c:1372 [inline]
 __do_sys_openat  fs/open.c:1388 [inline]
 __se_sys_openat  fs/open.c:1383 [inline]
 __x64_sys_openat+0x143/0x200  fs/open.c:1383
 do_syscall_x64  arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3f/0x90  arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
 kasan_save_stack+0x22/0x50  mm/kasan/common.c:45
 kasan_set_track+0x25/0x30  mm/kasan/common.c:52
 kasan_save_free_info+0x2e/0x50  mm/kasan/generic.c:521
 ____kasan_slab_free  mm/kasan/common.c:236 [inline]
 ____kasan_slab_free  mm/kasan/common.c:200 [inline]
 __kasan_slab_free+0x10a/0x190  mm/kasan/common.c:244
 kasan_slab_free  include/linux/kasan.h:162 [inline]
 slab_free_hook  mm/slub.c:1781 [inline]
 slab_free_freelist_hook  mm/slub.c:1807 [inline]
 slab_free  mm/slub.c:3787 [inline]
 __kmem_cache_free+0x7a/0x190  mm/slub.c:3800
 local_release  net/nfc/llcp_core.c:174 [inline]
 kref_put  include/linux/kref.h:65 [inline]
 nfc_llcp_local_put  net/nfc/llcp_core.c:182 [inline]
 nfc_llcp_local_put  net/nfc/llcp_core.c:177 [inline]
 nfc_llcp_unregister_device+0x206/0x290  net/nfc/llcp_core.c:1620
 nfc_unregister_device+0x160/0x1d0  net/nfc/core.c:1179
 virtual_ncidev_close+0x52/0xa0  drivers/nfc/virtual_ncidev.c:163
 __fput+0x252/0xa20  fs/file_table.c:321
 task_work_run+0x174/0x270  kernel/task_work.c:179
 resume_user_mode_work  include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop  kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x108/0x110  kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work  kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x21/0x50  kernel/entry/common.c:297
 do_syscall_64+0x4c/0x90  arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
 kasan_save_stack+0x22/0x50  mm/kasan/common.c:45
 __kasan_record_aux_stack+0x95/0xb0  mm/kasan/generic.c:491
 kvfree_call_rcu+0x29/0xa80  kernel/rcu/tree.c:3328
 drop_sysctl_table+0x3be/0x4e0  fs/proc/proc_sysctl.c:1735
 unregister_sysctl_table.part.0+0x9c/0x190  fs/proc/proc_sysctl.c:1773
 unregister_sysctl_table+0x24/0x30  fs/proc/proc_sysctl.c:1753
 neigh_sysctl_unregister+0x5f/0x80  net/core/neighbour.c:3895
 addrconf_notify+0x140/0x17b0  net/ipv6/addrconf.c:3684
 notifier_call_chain+0xbe/0x210  kernel/notifier.c:87
 call_netdevice_notifiers_info+0xb5/0x150  net/core/dev.c:1937
 call_netdevice_notifiers_extack  net/core/dev.c:1975 [inline]
 call_netdevice_notifiers  net/core/dev.c:1989 [inline]
 dev_change_name+0x3c3/0x870  net/core/dev.c:1211
 dev_ifsioc+0x800/0xf70  net/core/dev_ioctl.c:376
 dev_ioctl+0x3d9/0xf80  net/core/dev_ioctl.c:542
 sock_do_ioctl+0x160/0x260  net/socket.c:1213
 sock_ioctl+0x3f9/0x670  net/socket.c:1316
 vfs_ioctl  fs/ioctl.c:51 [inline]
 __do_sys_ioctl  fs/ioctl.c:870 [inline]
 __se_sys_ioctl  fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x19e/0x210  fs/ioctl.c:856
 do_syscall_x64  arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3f/0x90  arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
 freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
 ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list.  For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

    local = nfc_llcp_find_local(dev); // A
    ..... \
           | raceable
    ..... /
    llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can  drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Bug: 294167961
Fixes: 52feb444a9 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 425d9d3a92)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8e7e7101ce0d5c81da9b8febd4ad78dd1affc4a5
2023-09-04 12:09:23 +01:00
Pablo Neira Ayuso
c603880bd5 UPSTREAM: netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
[ Upstream commit 0ebc1064e4 ]

Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:

 WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
 CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Bug: 296128351
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 268cb07ef3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Icf97f57d18bb2b30ed28a3de6cdd18661d7f1c3d
2023-09-04 09:47:17 +00:00
Laszlo Ersek
d95b2b008e UPSTREAM: net: tap_open(): set sk_uid from current_fsuid()
commit 5c9241f3ce upstream.

Commit 66b2c338ad initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec254 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec254 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").

Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338ad has
no observable effect:

- before, "sk_uid" would be zero, due to undefined behavior
  (CVE-2023-1076),

- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.

What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".

Bug: 295995961
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338ad ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 767800fc40)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ib5f80015e5c0280acf9f35124d3ff267ff0420f0
2023-09-04 09:44:46 +00:00
Heikki Krogerus
b15c3a3df0 UPSTREAM: usb: typec: ucsi: Fix command cancellation
The Cancel command was passed to the write callback as the
offset instead of as the actual command which caused NULL
pointer dereference.

Reported-by: Stephan Bolten <stephan.bolten@gmx.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217517
Fixes: 094902bc6a ("usb: typec: ucsi: Always cancel the command if PPM reports BUSY condition")
Cc: stable@vger.kernel.org
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Message-ID: <20230606115802.79339-1-heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 298597334
Change-Id: I7f23e49c58b566f462ba34f76966db662308a5bc
(cherry picked from commit c4a8bfabef)
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
2023-09-01 09:53:15 +00:00
Will Shiu
0c34d588af UPSTREAM: locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock
As following backtrace, the struct file_lock request , in posix_lock_inode
is free before ftrace function using.
Replace the ftrace function ahead free flow could fix the use-after-free
issue.

[name:report&]===============================================
BUG:KASAN: use-after-free in trace_event_raw_event_filelock_lock+0x80/0x12c
[name:report&]Read at addr f6ffff8025622620 by task NativeThread/16753
[name:report_hw_tags&]Pointer tag: [f6], memory tag: [fe]
[name:report&]
BT:
Hardware name: MT6897 (DT)
Call trace:
 dump_backtrace+0xf8/0x148
 show_stack+0x18/0x24
 dump_stack_lvl+0x60/0x7c
 print_report+0x2c8/0xa08
 kasan_report+0xb0/0x120
 __do_kernel_fault+0xc8/0x248
 do_bad_area+0x30/0xdc
 do_tag_check_fault+0x1c/0x30
 do_mem_abort+0x58/0xbc
 el1_abort+0x3c/0x5c
 el1h_64_sync_handler+0x54/0x90
 el1h_64_sync+0x68/0x6c
 trace_event_raw_event_filelock_lock+0x80/0x12c
 posix_lock_inode+0xd0c/0xd60
 do_lock_file_wait+0xb8/0x190
 fcntl_setlk+0x2d8/0x440
...
[name:report&]
[name:report&]Allocated by task 16752:
...
 slab_post_alloc_hook+0x74/0x340
 kmem_cache_alloc+0x1b0/0x2f0
 posix_lock_inode+0xb0/0xd60
...
 [name:report&]
 [name:report&]Freed by task 16752:
...
  kmem_cache_free+0x274/0x5b0
  locks_dispose_list+0x3c/0x148
  posix_lock_inode+0xc40/0xd60
    do_lock_file_wait+0xb8/0x190
  fcntl_setlk+0x2d8/0x440
  do_fcntl+0x150/0xc18
...

Bug: 290585450
Link:https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/commit/?h=locks-6.6&id=74f6f5912693ce454384eaeec48705646a21c74f
(cherry picked from commit 74f6f59126)
Change-Id: I7daa6e72d1815daff30dd39726e14b1d57b60f5f
Signed-off-by: Will Shiu <Will.Shiu@mediatek.com>
2023-08-31 21:20:34 +00:00
Ramji Jiyani
20266a0652 ANDROID: kleaf: Remove ptp_kvm.ko from i386 modules
commit 638804ea1c ("ANDROID: kleaf: get_gki_modules_list add i386
option") introduced i386 as an option for get_gki_modules_list()
with ptp_kvm.ko as i386 module. ptp_kvm.ko is not a module on
anrdoid14-6.1, and cherry pick from android15-6.1 should have been worked to remove it.

Remove ptp_kvm.ko from i386 list and make it empty for android14-6.1.

Fixes: 638804ea1c ("ANDROID: kleaf: get_gki_modules_list add i386 option")
Bug: 293529933
Test: TH
Change-Id: Ied9d8c06c9f38dc271d541275afee053a87ecd79
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
2023-08-31 17:47:21 +00:00
zhengtangquan
ce18fe6f29 ANDROID: GKI: Add symbols to symbol list for oplus
1 function symbol(s) added
  'int __traceiter_android_vh_tune_swappiness(void*, int*)'

1 variable symbol(s) added
  'struct tracepoint __tracepoint_android_vh_tune_swappiness'

Bug: 297985476
Change-Id: I63e0e77b71df1b81eaa7d7370c6f739337d6c7e3
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
2023-08-31 17:38:17 +00:00
Tangquan Zheng
8e6550add2 ANDROID: vendor_hooks: Add tune swappiness hook in get_scan_count()
Add hook in get_scan_count() for customized swappiness.
Partial cherry-pick of aosp/2119426.

Bug: 297985476

Change-Id: I9d4074cf1a4097ff2a96be04646a01624cbd8dc3
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
2023-08-31 17:38:17 +00:00
ying zuxin
dd87a7122c ANDROID: GKI: Update symbol list for VIVO
INFO: 1 function symbol(s) added
  'void blk_fill_rwbs(char*, blk_opf_t)'

Bug: 298155651
Change-Id: If30ac266aff8ba370e3064a59f082a02035c9dff
Signed-off-by: ying zuxin <yingzuxin@vivo.com>
2023-08-31 13:09:07 +00:00
Ramji Jiyani
638804ea1c ANDROID: kleaf: get_gki_modules_list add i386 option
Adds "i386" as an option to get the list of 32-bit x86
modules in get_gki_modules_list().

virtual_device_i686 Cuttlefish target is a consumer.
Option is named i386 to match the `arch` attributes
in kernel_build rule.

Bug: 293529933
Test: TH
Change-Id: Ic5278aa687999a2bb2d98b97b204b99d1fcd809a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit 2a9967e15f99010ec06ac089b42a2ac20f2a57cb)
2023-08-31 03:24:10 +00:00
Ramji Jiyani
264e2973a4 ANDROID: arm as an option for get_gki_modules_list
If driver config depends on ARM64, driver is not
available for the ARM targets as module.

Introduce arm as an option for get_gki_modules_list()
to separate ARM64 dependent modules.

virtual_device_arm Cuttlefish target is the current
consumer of this; and it fails when there is ARM64
dependent module is introduced like OEM hypervisors.

Bug: 293529933
Test: TH
Change-Id: I462e8968faa48d58721d884688af62ff603c9a3d
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit b0e30c021b79d9cb9a67b12a94d1fe2f61126f14)
2023-08-31 03:21:33 +00:00
David Gow
37edfbc5c4 UPSTREAM: um: Only disable SSE on clang to work around old GCC bugs
As part of the Rust support for UML, we disable SSE (and similar flags)
to match the normal x86 builds. This both makes sense (we ideally want a
similar configuration to x86), and works around a crash bug with SSE
generation under Rust with LLVM.

However, this breaks compiling stdlib.h under gcc < 11, as the x86_64
ABI requires floating-point return values be stored in an SSE register.
gcc 11 fixes this by only doing register allocation when a function is
actually used, and since we never use atof(), it shouldn't be a problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652

Nevertheless, only disable SSE on clang setups, as that's a simple way
of working around everyone's bugs.

Fixes: 8849818679 ("rust: arch/um: Disable FP/SIMD instruction to match x86")
Reported-by: Roberto Sassu <roberto.sassu@huaweicloud.com>
Link: https://lore.kernel.org/linux-um/6df2ecef9011d85654a82acd607fdcbc93ad593c.camel@huaweicloud.com/
Tested-by: Roberto Sassu <roberto.sassu@huaweicloud.com>
Tested-by: SeongJae Park <sj@kernel.org>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Tested-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Richard Weinberger <richard@nod.at>

Bug: 296671039
Change-Id: Ie71e5c59ca9fb6a480895af233fae9a15f5c5ddc
(cherry picked from commit a3046a618a)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
2023-08-30 12:58:15 +00:00
Pratyush Brahma
2a13641a14 ANDROID: GKI: Update abi_gki_aarch64_qcom for page_owner symbols
Update abi_gki_aarch64_qcom to include __set_page_owner
and page_owner_inited symbols.

Bug: 296348400
Change-Id: I3dec65fb596764e51897dd0251aada539a34feca
Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
2023-08-29 23:06:24 +00:00
Pratyush Brahma
f08623648a ANDROID: mm: Export page_owner_inited and __set_page_owner
Export page_owner_inited and __set_page_owner symbol
for loadable vendor modules.

Bug: 296348400
Change-Id: I220ec1b94326ca3c6cc809d54646c51194645197
Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
2023-08-29 23:06:13 +00:00
Ulises Mendez Martinez
e44e3955f7 ANDROID: Use alias for old rules.
* This is in preparation for removal of these targets.

Bug: 293529933
Change-Id: I7b7400bb95b0d2c571be18b97727d878996ab575
Signed-off-by: Ulises Mendez Martinez <umendez@google.com>
(cherry picked from commit 83379c35cd0f39f65d89aacb7fbd4166b4cc9e9a)
2023-08-29 18:19:28 +00:00
Yi-De Wu
67018dd4e4 ANDROID: virt: geniezone: Enable as GKI module for arm64
Enables CONFIG_MTK_GZVM (gzvm.ko) as protected GKI
module for arm64.

Depends on ARM64 so no need to explicitly disable for
other architecture's gki_defconfig files.

Change-Id: I7bbef9192d92db295623f491e2a923147473a196
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
2023-08-29 18:02:40 +00:00
Ulises Mendez Martinez
9a399ca713 ANDROID: Add arch specific gki module list targets
* This is a no-op change preparing for the split of target and files
  based on the architecture used.

Bug: 293529933
Change-Id: I7783b60e591aaad23b5446af5cb04af5765f4b3f
Signed-off-by: Ulises Mendez Martinez <umendez@google.com>
2023-08-29 18:02:40 +00:00
Yi-De Wu
3e079b7691 FROMLIST: virt: geniezone: Add dtb config support
Hypervisor might need to know the accurate address and size of dtb
passed from userspace. And then hypervisor would parse the dtb and get
vm information.

Change-Id: I23194d45f5c60555ba7fde9dd8d393443fd41310
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Liju-clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-10-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
39bd65ec1d FROMLIST: virt: geniezone: Add memory region support
Hypervisor might need to know the precise purpose of each memory
region, so that it can provide specific memory protection. We add a new
uapi to pass address and size of a memory region and its purpose.

Change-Id: I53cc0953fd1e3f0aa3c0a91bb5877b2fb297c858
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Liju-clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-9-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
c26057e351 FROMLIST: virt: geniezone: Add ioeventfd support
Ioeventfd leverages eventfd to provide asynchronous notification
mechanism for VMM. VMM can register a mmio address and bind with an
eventfd. Once a mmio trap occurs on this registered region, its
corresponding eventfd will be notified.

Change-Id: Iff6bb7dd8ba42d08813e531ab40629492a1218bc
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-8-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
e73a5222e6 FROMLIST: virt: geniezone: Add irqfd support
irqfd enables other threads than vcpu threads to inject virtual
interrupt through irqfd asynchronously rather through ioctl interface.
This interface is necessary for VMM which creates separated thread for
IO handling or uses vhost devices.

Change-Id: I3a77cdcec0530193a518352f30c162d08b5b35ef
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-7-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
7427b76faa FROMLIST: virt: geniezone: Add irqchip support for virtual interrupt injection
Enable GenieZone to handle virtual interrupt injection request.

Change-Id: I2dc99a1d30309864eb7bbc91c97570cbb7c548a2
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-6-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
540cff0872 FROMLIST: virt: geniezone: Add vcpu support
VMM use this interface to create vcpu instance which is a fd, and this
fd will be for any vcpu operations, such as setting vcpu registers and
accepts the most important ioctl GZVM_VCPU_RUN which requests GenieZone
hypervisor to do context switch to execute VM's vcpu context.

Change-Id: I76e6e5b3a33b30eb0b841288c3aa041e63564da2
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-5-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
6ce86d075e FROMLIST: virt: geniezone: Add GenieZone hypervisor support
GenieZone is MediaTek hypervisor solution, and it is running in EL2
stand alone as a type-I hypervisor. This patch exports a set of ioctl
interfaces for userspace VMM (e.g., crosvm) to operate guest VMs
lifecycle (creation and destroy) on GenieZone.

Change-Id: I4fbc79bab120fe5ad90e2832f70562e97bbf40c0
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-4-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
40107a0081 FROMLIST: dt-bindings: hypervisor: Add MediaTek GenieZone hypervisor
Add documentation for GenieZone(gzvm) node. This node informs gzvm
driver to start probing if geniezone hypervisor is available and
able to do virtual machine operations.

Change-Id: Ie448a33b8981ee25fe36231a10af5c1372d23012
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-3-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
Yi-De Wu
beaffb638b FROMLIST: docs: geniezone: Introduce GenieZone hypervisor
GenieZone is MediaTek proprietary hypervisor solution, and it is running
in EL2 stand alone as a type-I hypervisor. It is a pure EL2
implementation which implies it does not rely any specific host VM, and
this behavior improves GenieZone's security as it limits its interface.

Change-Id: I8326093b5be79af5f87285fc74ee0cd7f5827808
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 280363874
Link: https://lore.kernel.org/lkml/20230727080005.14474-2-yi-de.wu@mediatek.com/
2023-08-29 18:02:40 +00:00
valis
e0c4636bd2 UPSTREAM: net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
[ Upstream commit b80b829e9e ]

When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

Fix this by no longer copying the tcf_result struct from the old filter.

Bug: 296347075
Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d4d3b53a4c)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iefbd201b92847ec1349f92c107d7ef5aec3fb359
2023-08-29 16:54:04 +00:00
Laszlo Ersek
ec1f17ddac UPSTREAM: net: tun_chr_open(): set sk_uid from current_fsuid()
commit 9bc3047374 upstream.

Commit a096ccca6e initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec254 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec254 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").

Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e
has no observable effect:

- before, "sk_uid" would be zero, due to undefined behavior
  (CVE-2023-1076),

- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.

What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".

Bug: 295995961
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: a096ccca6e ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b6846d7c40)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I2540ac5876ca7dad39e1b867a5e09a5c9c69bb86
2023-08-29 16:47:55 +00:00
Namjae Jeon
0adc759b0c UPSTREAM: exfat: check if filename entries exceeds max filename length
[ Upstream commit d42334578e ]

exfat_extract_uni_name copies characters from a given file name entry into
the 'uniname' variable. This variable is actually defined on the stack of
the exfat_readdir() function. According to the definition of
the 'exfat_uni_name' type, the file name should be limited 255 characters
(+ null teminator space), but the exfat_get_uniname_from_ext_entry()
function can write more characters because there is no check if filename
entries exceeds max filename length. This patch add the check not to copy
filename characters when exceeding max filename length.

Bug: 296393077
Cc: stable@vger.kernel.org
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reported-by: Maxim Suhanov <dfirblog@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c2fdf827f8)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I57a9ab007a5eac9c3415aa460df324c9044908c0
2023-08-29 16:46:10 +00:00
valis
f4ba064f76 UPSTREAM: net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
[ Upstream commit 76e42ae831 ]

When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

Fix this by no longer copying the tcf_result struct from the old filter.

Bug: 296347075
Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7f691439b2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I33c91c83d1cd8e889a7261adfa3779ca6c141088
2023-08-29 16:43:40 +00:00
Stephen Dickey
5b0878fc61 ANDROID: abi_gki_aarch64_qcom: update abi symbols
Add android_rvh_cgroup_force_migration and other symbols.

Symbols added:
   __traceiter_android_rvh_cgroup_force_kthread_migration
   __tracepoint_android_rvh_cgroup_force_kthread_migration

Bug: 184594949
Change-Id: I8ffed8f422a33f141edc95d1b65a07b8fe30b424
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-28 23:28:11 +00:00
Pavankumar Kondeti
7551a1a2a1 ANDROID: cgroup: Add android_rvh_cgroup_force_kthread_migration
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that CPU shares of root cgroup
can't be changed, leaving the tasks inside root cgroup will give
them higher share compared to the other tasks inside important
cgroups. This is mitigated by moving all tasks inside root cgroup to
a different cgroup after Android is booted. However, there are many
kernel tasks stuck in the root cgroup after the boot.

It is possible to relax kernel threads and kworkers migrations under
certain scenarios. However the patch [2] posted at upstream is not
accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.

[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org

Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
[quic_dickey@quicinc.com: port to android-mainline kernel]
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-28 23:28:01 +00:00
Enlin Mu
cd018c99fa FROMGIT: pstore/ram: Check start of empty przs during init
After commit 30696378f6 ("pstore/ram: Do not treat empty buffers as
valid"), initialization would assume a prz was valid after seeing that
the buffer_size is zero (regardless of the buffer start position). This
unchecked start value means it could be outside the bounds of the buffer,
leading to future access panics when written to:

 sysdump_panic_event+0x3b4/0x5b8
 atomic_notifier_call_chain+0x54/0x90
 panic+0x1c8/0x42c
 die+0x29c/0x2a8
 die_kernel_fault+0x68/0x78
 __do_kernel_fault+0x1c4/0x1e0
 do_bad_area+0x40/0x100
 do_translation_fault+0x68/0x80
 do_mem_abort+0x68/0xf8
 el1_da+0x1c/0xc0
 __raw_writeb+0x38/0x174
 __memcpy_toio+0x40/0xac
 persistent_ram_update+0x44/0x12c
 persistent_ram_write+0x1a8/0x1b8
 ramoops_pstore_write+0x198/0x1e8
 pstore_console_write+0x94/0xe0
 ...

To avoid this, also check if the prz start is 0 during the initialization
phase. If not, the next prz sanity check case will discover it (start >
size) and zap the buffer back to a sane state.

Bug: 293538531
Fixes: 30696378f6 ("pstore/ram: Do not treat empty buffers as valid")
Cc: Yunlong Xing <yunlong.xing@unisoc.com>
Cc: stable@vger.kernel.org
Change-Id: I6ff3a11b8b21f6f5ab37d8432751e5d33a441d8c
Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Link: https://lore.kernel.org/r/20230801060432.1307717-1-yunlong.xing@unisoc.com
[kees: update commit log with backtrace and clarifications]
(cherry picked from commit fe8c3623ab
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore)
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chunhui Li <chunhui.li@mediatek.com>
2023-08-28 23:15:44 +00:00
sunshijie
ffaab71302 UPSTREAM: erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF
z_erofs_do_read_page() may loop infinitely due to the inappropriate
truncation in the below statement. Since the offset is 64 bits and min_t()
truncates the result to 32 bits. The solution is to replace unsigned int
with a 64-bit type, such as erofs_off_t.
    cur = end - min_t(unsigned int, offset + end - map->m_la, end);

    - For example:
        - offset = 0x400160000
        - end = 0x370
        - map->m_la = 0x160370
        - offset + end - map->m_la = 0x400000000
        - offset + end - map->m_la = 0x00000000 (truncated as unsigned int)
    - Expected result:
        - cur = 0
    - Actual result:
        - cur = 0x370

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230710093410.44071-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 8191213a58
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I152508ba4c0eb83aeae5d753e22b0ca8d3ada56d
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
8497f46a87 UPSTREAM: erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF
z_erofs_pcluster_readmore() may take a long time to loop when the page
offset is large enough, which is unnecessary should be prevented.

For example, when the following case is encountered, it will loop 4691368
times, taking about 27 seconds:
    - offset = 19217289215
    - inode_size = 1442672

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Fixes: 386292919c ("erofs: introduce readmore decompression strategy")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230710042531.28761-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 936aa701d8
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I279b0fadcfa8c0ff0d638a86c7bb2c6b4d07f194
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
2f805fb912 UPSTREAM: erofs: Fix detection of atomic context
Current check for atomic context is not sufficient as
z_erofs_decompressqueue_endio can be called under rcu lock
from blk_mq_flush_plug_list(). See the stacktrace [1]

In such case we should hand off the decompression work for async
processing rather than trying to do sync decompression in current
context. Patch fixes the detection by checking for
rcu_read_lock_any_held() and while at it use more appropriate
!in_task() check than in_atomic().

Background: Historically erofs would always schedule a kworker for
decompression which would incur the scheduling cost regardless of
the context. But z_erofs_decompressqueue_endio() may not always
be in atomic context and we could actually benefit from doing the
decompression in z_erofs_decompressqueue_endio() if we are in
thread context, for example when running with dm-verity.
This optimization was later added in patch [2] which has shown
improvement in performance benchmarks.

==============================================
[1] Problem stacktrace
[name:core&]BUG: sleeping function called from invalid context at kernel/locking/mutex.c:291
[name:core&]in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1615, name: CpuMonitorServi
[name:core&]preempt_count: 0, expected: 0
[name:core&]RCU nest depth: 1, expected: 0
CPU: 7 PID: 1615 Comm: CpuMonitorServi Tainted: G S      W  OE      6.1.25-android14-5-maybe-dirty-mainline #1
Hardware name: MT6897 (DT)
Call trace:
 dump_backtrace+0x108/0x15c
 show_stack+0x20/0x30
 dump_stack_lvl+0x6c/0x8c
 dump_stack+0x20/0x48
 __might_resched+0x1fc/0x308
 __might_sleep+0x50/0x88
 mutex_lock+0x2c/0x110
 z_erofs_decompress_queue+0x11c/0xc10
 z_erofs_decompress_kickoff+0x110/0x1a4
 z_erofs_decompressqueue_endio+0x154/0x180
 bio_endio+0x1b0/0x1d8
 __dm_io_complete+0x22c/0x280
 clone_endio+0xe4/0x280
 bio_endio+0x1b0/0x1d8
 blk_update_request+0x138/0x3a4
 blk_mq_plug_issue_direct+0xd4/0x19c
 blk_mq_flush_plug_list+0x2b0/0x354
 __blk_flush_plug+0x110/0x160
 blk_finish_plug+0x30/0x4c
 read_pages+0x2fc/0x370
 page_cache_ra_unbounded+0xa4/0x23c
 page_cache_ra_order+0x290/0x320
 do_sync_mmap_readahead+0x108/0x2c0
 filemap_fault+0x19c/0x52c
 __do_fault+0xc4/0x114
 handle_mm_fault+0x5b4/0x1168
 do_page_fault+0x338/0x4b4
 do_translation_fault+0x40/0x60
 do_mem_abort+0x60/0xc8
 el0_da+0x4c/0xe0
 el0t_64_sync_handler+0xd4/0xfc
 el0t_64_sync+0x1a0/0x1a4

[2] Link: https://lore.kernel.org/all/20210317035448.13921-1-huangjianan@oppo.com/

Reported-by: Will Shiu <Will.Shiu@mediatek.com>
Suggested-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20230621220848.3379029-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 12d0a24afd
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I652b189e316b26ca56e1d7b6f1e4c52ae20bb3b7
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
cc6111a287 UPSTREAM: erofs: fix compact 4B support for 16k block size
In compact 4B, two adjacent lclusters are packed together as a unit to
form on-disk indexes for effective random access, as below:

(amortized = 4, vcnt = 2)
       _____________________________________________
      |___@_____ encoded bits __________|_ blkaddr _|
      0        .                                    amortized * vcnt = 8
      .             .
      .                  .              amortized * vcnt - 4 = 4
      .                        .
      .____________________________.
      |_type (2 bits)_|_clusterofs_|

Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs,
since each lcluster can get 16 bits for its type and clusterofs, the
maximum supported lclustersize for compact 4B format is 16k (14 bits).

Fix this to enable compact 4B format for 16k lclusters (blocks), which
is tested on an arm64 server with 16k page size.

Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 001b8ccd06
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I97918294a1d00a65223e741c3d153f375ab50507
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
f11ccb03a0 UPSTREAM: erofs: kill hooked chains to avoid loops on deduplicated compressed images
After heavily stressing EROFS with several images which include a
hand-crafted image of repeated patterns for more than 46 days, I found
two chains could be linked with each other almost simultaneously and
form a loop so that the entire loop won't be submitted.  As a
consequence, the corresponding file pages will remain locked forever.

It can be _only_ observed on data-deduplicated compressed images.
For example, consider two chains with five pclusters in total:
	Chain 1:  2->3->4->5    -- The tail pcluster is 5;
        Chain 2:  5->1->2       -- The tail pcluster is 2.

Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link
to Chain 2 at the same time with pcluster 2.

Since hooked chains are all linked locklessly now, I have no idea how
to simply avoid the race.  Instead, let's avoid hooked chains completely
until I could work out a proper way to fix this and end users finally
tell us that it's needed to add it back.

Actually, this optimization can be found with multi-threaded workloads
(especially even more often on deduplicated compressed images), yet I'm
not sure about the overall system impacts of not having this compared
with implementation complexity.

Fixes: 267f2492c8 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 967c28b23f
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I33607c174bfeb54119c6de271b44c9fe2a7399e6
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
7521b904dc UPSTREAM: erofs: fix potential overflow calculating xattr_isize
Given on-disk i_xattr_icount is 16 bits and xattr_isize is calculated
from i_xattr_icount multiplying 4, xattr_isize has a theoretical maximum
of 256K (64K * 4).

Thus declare xattr_isize as unsigned int to avoid the potential overflow.

Fixes: bfb8674dc0 ("staging: erofs: add erofs in-memory stuffs")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230414061810.6479-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 1b3567a196
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I43d88c7ebc3b320e226ab4d7bc6717432ef5ad82
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
6ec6eee87e UPSTREAM: erofs: stop parsing non-compact HEAD index if clusterofs is invalid
Syzbot generated a crafted image [1] with a non-compact HEAD index of
clusterofs 33024 while valid numbers should be 0 ~ lclustersize-1,
which causes the following unexpected behavior as below:

 BUG: unable to handle page fault for address: fffff52101a3fff9
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 23ffed067 P4D 23ffed067 PUD 0
 Oops: 0000 [#1] PREEMPT SMP KASAN
 CPU: 1 PID: 4398 Comm: kworker/u5:1 Not tainted 6.3.0-rc6-syzkaller-g09a9639e56c0 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
 Workqueue: erofs_worker z_erofs_decompressqueue_work
 RIP: 0010:z_erofs_decompress_queue+0xb7e/0x2b40
 ...
 Call Trace:
  <TASK>
  z_erofs_decompressqueue_work+0x99/0xe0
  process_one_work+0x8f6/0x1170
  worker_thread+0xa63/0x1210
  kthread+0x270/0x300
  ret_from_fork+0x1f/0x30

Note that normal images or images using compact indexes are not
impacted.  Let's fix this now.

[1] https://lore.kernel.org/r/000000000000ec75b005ee97fbaa@google.com

Reported-and-tested-by: syzbot+aafb3f37cfeb6534c4ac@syzkaller.appspotmail.com
Fixes: 02827e1796 ("staging: erofs: add erofs_map_blocks_iter")
Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230410173714.104604-1-hsiangkao@linux.alibaba.com
(cherry picked from commit cc4efd3dd2
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I8e4d7d3f30d70f8c4ab42b33f215af1292c57fcf
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
sunshijie
9089c10d9c UPSTREAM: erofs: initialize packed inode after root inode is assigned
As commit 8f7acdae2c ("staging: erofs: kill all failure handling in
fill_super()"), move the initialization of packed inode after root
inode is assigned, so that the iput() in .put_super() is adequate as
the failure handling.

Otherwise, iput() is also needed in .kill_sb(), in case of the mounting
fails halfway.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Fixes: b15b2e307c ("erofs: support on-disk compressed fragments data")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230407141710.113882-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit cb9bce7951
    https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Bug: 296824280

Change-Id: I3cec91605b42c588e2c8f69629f0bdcc20078de2
Signed-off-by: sunshijie <sunshijie@xiaomi.corp-partner.google.com>
Signed-off-by: sunshijie <sunshijie@xiaomi.com>
2023-08-28 17:42:48 +00:00
Kalesh Singh
797dac42cc ANDROID: GKI: Update ABI for zsmalloc fixes
zs_pool->lock was added upstream as a replacement for the size_class
locks.

The tooling over-cautiously reports this as a ABI breakage but both
of these structs (zs_pool and size_class) are internal to zsmalloc.c.

Update the ABI to allow these changes.

Bug: 297093100
Change-Id: Ib9fc5a036f75d89fb6bee4c146034f6c81759e04
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2023-08-28 16:43:44 +00:00
Andrew Yang
cb440cecb2 BACKPORT: zsmalloc: fix races between modifications of fullness and isolated
We encountered many kernel exceptions of VM_BUG_ON(zspage->isolated ==
0) in dec_zspage_isolation() and BUG_ON(!pages[1]) in zs_unmap_object()
lately.  This issue only occurs when migration and reclamation occur at
the same time.

With our memory stress test, we can reproduce this issue several times
a day.  We have no idea why no one else encountered this issue.  BTW,
we switched to the new kernel version with this defect a few months
ago.

Since fullness and isolated share the same unsigned int, modifications of
them should be protected by the same lock.

[andrew.yang@mediatek.com: move comment]
  Link: https://lkml.kernel.org/r/20230727062910.6337-1-andrew.yang@mediatek.com
Link: https://lkml.kernel.org/r/20230721063705.11455-1-andrew.yang@mediatek.com
Fixes: c4549b8711 ("zsmalloc: remove zspage isolation for migration")
Change-Id: I4aeda0715d65f828bb88ad6fbf36b9927c7a5c4b
Signed-off-by: Andrew Yang <andrew.yang@mediatek.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4b5d1e47b6)
Bug: 297093100
[ Kalesh Singh - Fix trivial conflicts in zs_page_putback()]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2023-08-28 16:43:44 +00:00
yue.shen
c0e84be923 ANDROID: ABI: Update symbols to unisoc whitelist for A14-6.1
Update whitelist for the symbols used by the unisoc in abi_gki_aarch64_unisoc.

1 variable symbol(s) added
  'int percpu_counter_batch'

Bug: 296338673
Change-Id: Idd1d03e9482c5f9c3ea2184066371cd6705ddd0e
Signed-off-by: Yue Shen <yue.shen@unisoc.com>
2023-08-28 16:09:20 +00:00
Nhat Pham
5ef132d564 UPSTREAM: zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks
Currently, zsmalloc has a hierarchy of locks, which includes a pool-level
migrate_lock, and a lock for each size class.  We have to obtain both
locks in the hotpath in most cases anyway, except for zs_malloc.  This
exception will no longer exist when we introduce a LRU into the zs_pool
for the new writeback functionality - we will need to obtain a pool-level
lock to synchronize LRU handling even in zs_malloc.

In preparation for zsmalloc writeback, consolidate these locks into a
single pool-level lock, which drastically reduces the complexity of
synchronization in zsmalloc.

We have also benchmarked the lock consolidation to see the performance
effect of this change on zram.

First, we ran a synthetic FS workload on a server machine with 36 cores
(same machine for all runs), using

fs_mark  -d  ../zram1mnt  -s  100000  -n  2500  -t  32  -k

before and after for btrfs and ext4 on zram (FS usage is 80%).

Here is the result (unit is file/second):

With lock consolidation (btrfs):
Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028

Without lock consolidation (btrfs):
Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665

With lock consolidation (ext4):
Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668

Without lock consolidation (ext4)
Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469

As you can see, we observe a 0.3% regression for btrfs, and a 0.9%
regression for ext4. This is a small, barely measurable difference in my
opinion.

For a more realistic scenario, we also tries building the kernel on zram.
Here is the time it takes (in seconds):

With lock consolidation (btrfs):
real
Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
user
Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
sys
Average: 521.4, Median: 522.0, Stddev: 1.51657508881031

Without lock consolidation (btrfs):
real
Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
user
Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
sys
Average: 520.6, Median: 521.0, Stddev: 1.140175425099138

With lock consolidation (ext4):
real
Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
user
Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
sys
Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317

Without lock consolidation (ext4)
real
Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
user
Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
sys
Average: 520.4, Median: 520.0, Stddev: 1.140175425099138

The difference is entirely within the noise of a typical run on zram.
This hardly justifies the complexity of maintaining both the pool lock and
the class lock.  In fact, for writeback, we would need to introduce yet
another lock to prevent data races on the pool's LRU, further complicating
the lock handling logic.  IMHO, it is just better to collapse all of these
into a single pool-level lock.

Link: https://lkml.kernel.org/r/20221128191616.1261026-4-nphamcs@gmail.com
Change-Id: Ib0eb09d7a69190fc4ffea8f819423c7f66d83379
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c0547d0b6a)
Bug: 297093100
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2023-08-26 05:43:02 +00:00
Maciej Żenczykowski
ec6b3d552a UPSTREAM: netfilter: nfnetlink_log: always add a timestamp
Compared to all the other work we're already doing to deliver
an skb to userspace this is very cheap - at worse an extra
call to ktime_get_real() - and very useful.

(and indeed it may even be cheaper if we're running from other hooks)

(background: Android occasionally logs packets which
caused wake from sleep/suspend and we'd like to have
timestamps reliably associated with these events)

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
(cherry picked from commit 1d85594fd3)
Change-Id: Id9b8bc046204c11bf3321e73a67b444777d387dd
2023-08-26 01:40:43 +00:00
Prakruthi Deepak Heragu
4db95aa21a ANDROID: virt: gunyah: Do not allocate irq for GH_RM_RESOURCE_NO_VIRQ
Resource manager can now return GH_RM_RESOURCE_NO_VIRQ (-1) instead of 0
as the value to mean "there's no vIRQ for this resource".

Bug: 297100131
Change-Id: I93c4f41b881bfc9e094fa6115df7ba6fcdaa7e6e
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
2023-08-26 00:39:19 +00:00