linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-05 10:31:46 +09:00

Author	SHA1	Message	Date
Peng Zhang	eb5048ea90	FROMGIT: maple_tree: introduce interfaces __mt_dup() and mtree_dup() Introduce interfaces __mt_dup() and mtree_dup(), which are used to duplicate a maple tree. They duplicate a maple tree in Depth-First Search (DFS) pre-order traversal. They use memcpy() to copy nodes in the source tree and allocate new child nodes in non-leaf nodes. The new node is exactly the same as the source node except for all the addresses stored in it. This is faster than traversing all elements in the source tree and inserting them one by one into the new tree. The time complexity of these two functions is O(n). The difference between __mt_dup() and mtree_dup() is that mtree_dup() handles locks internally. Analysis of the average time complexity of this algorithm: For simplicity, let's assume that the maximum branching factor of all non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a full tree. Under the given conditions, if there is a maple tree with n elements, the number of its leaves is n/16. From bottom to top, the number of nodes in each level is 1/16 of the number of nodes in the level below. So the total number of nodes in the entire tree is given by the sum of n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has log(n) terms with base 16. According to the formula for the sum of a geometric series, the sum of this series can be calculated as (n-1)/15. Each node has only one parent node pointer, which can be considered as an edge. In total, there are (n-1)/15-1 edges. This algorithm consists of two operations: 1. Traversing all nodes in DFS order. 2. For each node, making a copy and performing necessary modifications to create a new node. For the first part, DFS traversal will visit each edge twice. Let T(ascend) represent the cost of taking one step upwards, and T(descend) represent the cost of taking one step downwards.
And both of them are constants (although mas_ascend() may not be, as it contains a loop, but here we ignore it and treat it as a constant). So the time spent on the first part can be represented as ((n-1)/15-1) * (T(ascend) + T(descend)). For the second part, each node will be copied, and the cost of copying a node is denoted as T(copy_node). For each non-leaf node, it is necessary to reallocate all child nodes, and the cost of this operation is denoted as T(dup_alloc). The behavior behind memory allocation is complex and not specific to the maple tree operation. Here, we assume that the time required for a single allocation is constant. Since the size of a node is fixed, both of these symbols are also constants. We can calculate that the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc). Adding both parts together, the total time spent by the algorithm can be represented as: ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) - n/16 * T(dup_alloc) - (T(ascend) + T(descend)) Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc) Let C2 = T(dup_alloc) Let C3 = T(ascend) + T(descend) Finally, the expression can be simplified as: ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3). This is a linear function, so the average time complexity is O(n). Link: https://lkml.kernel.org/r/20231027033845.90608-4-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. 
Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit fd32e4e9b7646510ee9010e0d5f8b8857d48a6f7 https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) Bug: 308042511 Change-Id: I385759a1184a202498e086458b572c203616b9b4 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-04 22:44:38 +00:00
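The duplication strategy described in the commit above (copy each node wholesale, then replace the child pointers with freshly allocated copies while walking in DFS pre-order) can be illustrated in user space. This is a minimal sketch under stated assumptions, not the kernel's __mt_dup(): `struct toy_node`, `toy_dup()`, `toy_free()` and `toy_count()` are hypothetical names, the walk is recursive instead of following parent pointers, and locking is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_FANOUT 16   /* assumed branching factor, as in the analysis */

/* Hypothetical simplified node; real maple nodes are far more complex. */
struct toy_node {
	int is_leaf;
	int nr;                           /* number of used slots */
	unsigned long pivot[TOY_FANOUT];  /* keys, copied verbatim */
	void *slot[TOY_FANOUT];           /* children, or entries in a leaf */
};

/* Free a subtree; leaf entries are not owned by the tree. */
static void toy_free(struct toy_node *node)
{
	if (!node)
		return;
	if (!node->is_leaf)
		for (int i = 0; i < node->nr; i++)
			toy_free(node->slot[i]);
	free(node);
}

/*
 * Duplicate the subtree rooted at src in pre-order: copy the node with
 * memcpy() first, then overwrite each child pointer with a copy of the child.
 */
static struct toy_node *toy_dup(const struct toy_node *src)
{
	struct toy_node *dst;

	assert(src != NULL);
	dst = malloc(sizeof(*dst));
	if (!dst)
		return NULL;
	memcpy(dst, src, sizeof(*dst));
	if (src->is_leaf)
		return dst;             /* leaf entries are shared as-is */

	for (int i = 0; i < src->nr; i++) {
		dst->slot[i] = toy_dup(src->slot[i]);
		if (!dst->slot[i]) {
			dst->nr = i;    /* free only what was duplicated */
			toy_free(dst);
			return NULL;
		}
	}
	return dst;
}

/* Count nodes, to check that the copy has the same shape as the source. */
static size_t toy_count(const struct toy_node *node)
{
	size_t n = 1;

	if (!node->is_leaf)
		for (int i = 0; i < node->nr; i++)
			n += toy_count(node->slot[i]);
	return n;
}
```

Each node is visited once and copied once, which is where the linear bound in the commit message comes from; no rebalancing happens because the shape of the source tree is reused.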
Peng Zhang	dc9323545b	FROMGIT: maple_tree: introduce {mtree,mas}_lock_nested() In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is introduced. For example, when duplicating maple tree, we need to hold the locks of two trees, in which case nested locks are needed. At the same time, add the definition of spin_lock_nested() in tools for testing. Link: https://lkml.kernel.org/r/20231027033845.90608-3-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit b2472efe4316b2687c153919c1513a098bd82c17 https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) Bug: 308042511 Change-Id: I06f0eb0a32a2f39b7842de08a0e5ce59895345c5 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-04 22:44:38 +00:00
Peng Zhang	4ddcdc519b	FROMGIT: maple_tree: add mt_free_one() and mt_attr() helpers Patch series "Introduce __mt_dup() to improve the performance of fork()", v7. This series introduces __mt_dup() to improve the performance of fork(). During the duplication process of mmap, all VMAs are traversed and inserted one by one into the new maple tree, causing the maple tree to be rebalanced multiple times. Balancing the maple tree is a costly operation. To duplicate VMAs more efficiently, mtree_dup() and __mt_dup() are introduced for the maple tree. They can efficiently duplicate a maple tree. Here are some algorithmic details about {mtree,__mt}_dup(). We perform a DFS pre-order traversal of all nodes in the source maple tree. During this process, we fully copy the nodes from the source tree to the new tree. This involves memory allocation, and when encountering a new node, if it is a non-leaf node, all its child nodes are allocated at once. This idea was originally from Liam R. Howlett's Maple Tree Work email, and I added some of my own ideas to implement it. Some previous discussions can be found in [1]. For a more detailed analysis of the algorithm, please refer to the logs for patch [3/10] and patch [10/10]. There is a "spawn" in byte-unixbench[2], which can be used to test the performance of fork(). I modified it slightly to make it work with different numbers of VMAs. Below are the test results. The first row shows the number of VMAs. The second and third rows show the number of fork() calls per ten seconds, corresponding to next-20231006 and this patchset, respectively. The test results were obtained with CPU binding to avoid scheduler load balancing that could cause unstable results. There are still some fluctuations in the test results, but at least they are better than the original performance.
VMAs           21      121     221     421     821     1621    3221    6421    12821   25621   51221
next-20231006  112100  76261   54227   34035   20195   11112   6017    3161    1606    802     393
This patchset  114558  83067   65008   45824   28751   16072   8922    4747    2436    1233    599
Improvement    2.19%   8.92%   19.88%  34.64%  42.37%  44.64%  48.28%  50.17%  51.68%  53.74%  52.42%
Thanks to Liam and Matthew for the review. This patch (of 10): Add two helpers: 1. mt_free_one(), used to free a maple node. 2. mt_attr(), used to obtain the attributes of maple tree. Link: https://lkml.kernel.org/r/20231027033845.90608-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20231027033845.90608-2-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4f2267b58a22d972be98edef8e6b3c7a67c9fb91 https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable) Bug: 308042511 Change-Id: Ib9b13dee357ac4c85668901c20a3c370fbdd08da Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-04 22:44:38 +00:00
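The percentages in the benchmark results of the commit above appear to be the relative gain of the patched fork() count over the baseline count for the same number of VMAs. A minimal sketch of that arithmetic, assuming this interpretation (`toy_gain_percent()` is a hypothetical helper, not part of the benchmark):

```c
#include <assert.h>

/* Relative gain of 'patched' over 'base', in percent. */
static double toy_gain_percent(unsigned long base, unsigned long patched)
{
	assert(base > 0);
	return ((double)patched - (double)base) * 100.0 / (double)base;
}
```

For the first column (112100 versus 114558 calls) this gives roughly 2.19, and for the last column (393 versus 599 calls) roughly 52.42, which matches the figures quoted in the message.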
Liam R. Howlett	c52d48818b	UPSTREAM: maple_tree: introduce __mas_set_range() mas_set_range() resets the node to MAS_START, which will cause a re-walk of the tree to the range. This is unnecessary when the maple state is already at the correct location of the write. Add a function that only sets the range to avoid unnecessary re-walking of the tree. Link: https://lkml.kernel.org/r/20230724183157.3939892-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit `c1297987cc`) Bug: 308042511 Change-Id: I9e026d0f103e3aa24b47998be6b83e28e7928540 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-04 22:44:38 +00:00
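The difference between the two setters described in the commit above can be modelled in a few lines. This is a user-space sketch with hypothetical stand-ins (`struct toy_state`, `TOY_START`, `toy_set_range()`, `toy_set_range_nowalk()`) for the kernel's maple state, MAS_START, mas_set_range() and __mas_set_range(); it only illustrates the idea that one variant keeps the walk position while the other discards it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's maple state. */
struct toy_state {
	void *node;             /* current position in the tree */
	unsigned long index;    /* start of the range */
	unsigned long last;     /* end of the range (inclusive) */
};

/* Hypothetical sentinel meaning "walk again from the root". */
#define TOY_START ((void *)1UL)

/* Only update the range; the current walk position is kept. */
static void toy_set_range_nowalk(struct toy_state *s,
				 unsigned long start, unsigned long last)
{
	assert(start <= last);
	s->index = start;
	s->last = last;
}

/* Update the range and force the next operation to re-walk the tree. */
static void toy_set_range(struct toy_state *s,
			  unsigned long start, unsigned long last)
{
	toy_set_range_nowalk(s, start, last);
	s->node = TOY_START;
}
```

A caller that already sits on the node it is about to write would use the first variant and skip the re-walk; any caller that is unsure of its position should keep using the resetting variant.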
Kever Yang	066d57de87	ANDROID: GKI: Enable symbols for v4l2 in async and fwnode INFO: 14 function symbol(s) added 'struct v4l2_async_subdev* __v4l2_async_nf_add_fwnode(struct v4l2_async_notifier, struct fwnode_handle, unsigned int)' 'struct v4l2_async_subdev* __v4l2_async_nf_add_fwnode_remote(struct v4l2_async_notifier, struct fwnode_handle, unsigned int)' 'void v4l2_async_nf_cleanup(struct v4l2_async_notifier)' 'void v4l2_async_nf_init(struct v4l2_async_notifier)' 'int v4l2_async_nf_parse_fwnode_endpoints(struct device, struct v4l2_async_notifier, size_t, parse_endpoint_func)' 'int v4l2_async_nf_register(struct v4l2_device, struct v4l2_async_notifier)' 'void v4l2_async_nf_unregister(struct v4l2_async_notifier)' 'int v4l2_async_register_subdev(struct v4l2_subdev)' 'int v4l2_async_register_subdev_sensor(struct v4l2_subdev)' 'int v4l2_async_subdev_nf_register(struct v4l2_subdev, struct v4l2_async_notifier)' 'void v4l2_async_unregister_subdev(struct v4l2_subdev)' 'int v4l2_fwnode_endpoint_alloc_parse(struct fwnode_handle, struct v4l2_fwnode_endpoint)' 'void v4l2_fwnode_endpoint_free(struct v4l2_fwnode_endpoint)' 'int v4l2_fwnode_endpoint_parse(struct fwnode_handle, struct v4l2_fwnode_endpoint*)' Bug: 300024866 Change-Id: I7e4c2faac5c8341a19ea3fed694190d38679dc5b Signed-off-by: Kever Yang <kever.yang@rock-chips.com>	2024-01-04 22:25:09 +00:00
Ken Huang	e74417834e	ANDROID: Update the ABI symbol list Adding the following symbols: - __drmm_crtc_alloc_with_planes Bug: 275278929 Change-Id: I41b6069612d44214f474ed82ee2a4b07ca739302 Signed-off-by: Ken Huang <kenbshuang@google.com>	2024-01-04 23:18:14 +08:00
Vincent Donnefort	15a93de464	ANDROID: KVM: arm64: Fix hyp event alignment The structures that define hyp events must be packed so they match their format definitions in the tracefs file hyp/events/hyp/<event>/format. Bug: 299430621 Change-Id: Ia7e1a686744d5c9c3f8a21881f03228c8acecade Signed-off-by: Vincent Donnefort <vdonnefort@google.com>	2024-01-04 14:09:44 +00:00
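Why packing matters for the commit above: a format description that lists fields back to back only matches the in-memory structure if the compiler inserts no padding. The sketch below uses a hypothetical two-field event (`struct toy_event_packed`, not one of the real hyp events) and the GCC/Clang `packed` attribute to show the layout such a format would assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload without packing: padding may follow 'id'. */
struct toy_event_unpacked {
	uint16_t id;
	uint64_t addr;
};

/* Same payload, packed (GCC/Clang extension): fields are contiguous. */
struct toy_event_packed {
	uint16_t id;
	uint64_t addr;
} __attribute__((packed));

/* The layout a back-to-back format description would advertise. */
static_assert(offsetof(struct toy_event_packed, addr) == sizeof(uint16_t),
	      "addr must directly follow id");

/* Returns 1 if the packed struct has no padding at all. */
static int toy_layout_matches_format(void)
{
	return sizeof(struct toy_event_packed) ==
	       sizeof(uint16_t) + sizeof(uint64_t);
}
```

With the unpacked variant, a reader that trusts the advertised offsets would typically decode `addr` from padding bytes, since most ABIs align a 64-bit field to more than 2 bytes.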
Vincent Donnefort	717d1f8f91	ANDROID: KVM: arm64: Fix host_smc print typo From pKVM point of view, unknown SMCs are simply forwarded, we can't consider them invalid or not. This was probably a typo following a copy of the host_hcall event. Bug: 299430621 Change-Id: Ieb53f985a5187a8b5a9feb4a95982b15cdc1b04a Signed-off-by: Vincent Donnefort <vdonnefort@google.com>	2024-01-04 12:45:10 +00:00
Jaegeuk Kim	8fc25d7862	FROMGIT: f2fs: do not return EFSCORRUPTED, but try to run online repair If we return the error, there's no way to recover the status as of now, since fsck does not fix the xattr boundary issue. Bug: 305658663 Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit 50a472bbc79ff9d5a88be8019a60e936cadf9f13 https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev) Change-Id: I55060a4eede3f5f85066aba22a6ab7155517e5c4 (cherry picked from commit 70113b9d489050d3e7a6f28e0cd6e43f104fc132) (cherry picked from commit 2c1f3789d609bd549f14c019b6c7b311bfd2fa64)	2024-01-04 10:39:11 +00:00
Vincent Donnefort	99288e911a	ANDROID: KVM: arm64: Document module_change_host_prot_range When this pKVM module op was introduced, its documentation was omitted. Bug: 308373293 Change-Id: I9e471414e72a1ee04c132de4ed95d77e815ae8c9 Signed-off-by: Vincent Donnefort <vdonnefort@google.com>	2024-01-04 09:56:20 +00:00
Mukesh Ojha	4d99e41ce1	FROMGIT: PM / devfreq: Synchronize devfreq_monitor_[start/stop] If the governor is switched frequently in a loop, the timer list can be corrupted: the timer is cancelled from two places, once from cancel_delayed_work_sync() and then from expire_timers(), as can be seen in the traces [1]. while true; do echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor; echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor; done This looks like an issue in the devfreq driver, where devfreq_monitor_[start/stop] need to be synchronized so that the delayed work does not get corrupted while it is being queued, running, or being cancelled. Let's use the polling flag and the devfreq lock to prevent the timer instance from being queued twice and the work data from being corrupted. [1] ... .. <idle>-0 [003] 9436.209662: timer_cancel timer=0xffffff80444f0428 <idle>-0 [003] 9436.209664: timer_expire_entry timer=0xffffff80444f0428 now=0x10022da1c function=__typeid__ZTSFvP10timer_listE_global_addr baseclk=0x10022da1c <idle>-0 [003] 9436.209718: timer_expire_exit timer=0xffffff80444f0428 kworker/u16:6-14217 [003] 9436.209863: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2b now=0x10022da1c flags=182452227 vendor.xxxyyy.ha-1593 [004] 9436.209888: timer_cancel timer=0xffffff80444f0428 vendor.xxxyyy.ha-1593 [004] 9436.216390: timer_init timer=0xffffff80444f0428 vendor.xxxyyy.ha-1593 [004] 9436.216392: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2c now=0x10022da1d flags=186646532 vendor.xxxyyy.ha-1593 [005] 9436.220992: timer_cancel timer=0xffffff80444f0428 xxxyyyTraceManag-7795 [004] 9436.261641: timer_cancel timer=0xffffff80444f0428 [2] 9436.261653][ C4] Unable to handle kernel paging request at virtual address dead00000000012a [ 9436.261664][ C4] Mem abort info: [ 9436.261666][ C4] ESR = 0x96000044 [ 9436.261669][ C4] EC = 0x25: DABT 
(current EL), IL = 32 bits [ 9436.261671][ C4] SET = 0, FnV = 0 [ 9436.261673][ C4] EA = 0, S1PTW = 0 [ 9436.261675][ C4] Data abort info: [ 9436.261677][ C4] ISV = 0, ISS = 0x00000044 [ 9436.261680][ C4] CM = 0, WnR = 1 [ 9436.261682][ C4] [dead00000000012a] address between user and kernel address ranges [ 9436.261685][ C4] Internal error: Oops: 96000044 [#1] PREEMPT SMP [ 9436.261701][ C4] Skip md ftrace buffer dump for: 0x3a982d0 ... [ 9436.262138][ C4] CPU: 4 PID: 7795 Comm: TraceManag Tainted: G S W O 5.10.149-android12-9-o-g17f915d29d0c #1 [ 9436.262141][ C4] Hardware name: Qualcomm Technologies, Inc. (DT) [ 9436.262144][ C4] pstate: 22400085 (nzCv daIf +PAN -UAO +TCO BTYPE=--) [ 9436.262161][ C4] pc : expire_timers+0x9c/0x438 [ 9436.262164][ C4] lr : expire_timers+0x2a4/0x438 [ 9436.262168][ C4] sp : ffffffc010023dd0 [ 9436.262171][ C4] x29: ffffffc010023df0 x28: ffffffd0636fdc18 [ 9436.262178][ C4] x27: ffffffd063569dd0 x26: ffffffd063536008 [ 9436.262182][ C4] x25: 0000000000000001 x24: ffffff88f7c69280 [ 9436.262185][ C4] x23: 00000000000000e0 x22: dead000000000122 [ 9436.262188][ C4] x21: 000000010022da29 x20: ffffff8af72b4e80 [ 9436.262191][ C4] x19: ffffffc010023e50 x18: ffffffc010025038 [ 9436.262195][ C4] x17: 0000000000000240 x16: 0000000000000201 [ 9436.262199][ C4] x15: ffffffffffffffff x14: ffffff889f3c3100 [ 9436.262203][ C4] x13: ffffff889f3c3100 x12: 00000000049f56b8 [ 9436.262207][ C4] x11: 00000000049f56b8 x10: 00000000ffffffff [ 9436.262212][ C4] x9 : ffffffc010023e50 x8 : dead000000000122 [ 9436.262216][ C4] x7 : ffffffffffffffff x6 : ffffffc0100239d8 [ 9436.262220][ C4] x5 : 0000000000000000 x4 : 0000000000000101 [ 9436.262223][ C4] x3 : 0000000000000080 x2 : ffffff889edc155c [ 9436.262227][ C4] x1 : ffffff8001005200 x0 : ffffff80444f0428 [ 9436.262232][ C4] Call trace: [ 9436.262236][ C4] expire_timers+0x9c/0x438 [ 9436.262240][ C4] __run_timers+0x1f0/0x330 [ 9436.262245][ C4] run_timer_softirq+0x28/0x58 [ 9436.262255][ C4] 
efi_header_end+0x168/0x5ec [ 9436.262265][ C4] __irq_exit_rcu+0x108/0x124 [ 9436.262274][ C4] __handle_domain_irq+0x118/0x1e4 [ 9436.262282][ C4] gic_handle_irq.30369+0x6c/0x2bc [ 9436.262286][ C4] el0_irq_naked+0x60/0x6c Bug: 317188938 Change-Id: I9a22325f6abbf28217c8f37b093cf77509b0139a Link: https://lore.kernel.org/all/1700860318-4025-1-git-send-email-quic_mojha@quicinc.com/ Reported-by: Joyyoung Huang <huangzaiyang@oppo.com> Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> (cherry picked from commit aed5ed595960c6d301dcd4ed31aeaa7a8054c0c6 https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git devfreq-next) Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>	2024-01-03 23:14:47 +00:00
Suren Baghdasaryan	6c8f710857	FROMGIT: arch/mm/fault: fix major fault accounting when retrying under per-VMA lock A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. 
[1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 46e714c729c8d1d8110bc0545d7ffe8a759c9dc0 https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-hotfixes-stable) Bug: 317385399 Change-Id: Ic7e97bf610dcabb7d3ac2306b2f1213be0ddd269 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
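The accounting problem and its fix, as described in the commit above, can be modelled with a small state function. This is a sketch under stated assumptions: the `TOY_*` bit values are hypothetical, `toy_is_major()` mirrors the rule stated in the message (a fault is accounted as major if the final attempt was major or the TRIED flag is set), and only a single retry is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own. */
#define TOY_FAULT_FLAG_TRIED  (1u << 0)   /* fault flag */
#define TOY_VM_FAULT_MAJOR    (1u << 0)   /* result bit */
#define TOY_VM_FAULT_RETRY    (1u << 1)   /* result bit */

/* Major if the final attempt was major or an earlier attempt was recorded. */
static bool toy_is_major(unsigned int ret, unsigned int flags)
{
	return (ret & TOY_VM_FAULT_MAJOR) || (flags & TOY_FAULT_FLAG_TRIED);
}

/*
 * 'first' is the result under the per-VMA lock, 'second' the result of the
 * retry under mmap_lock. With 'fixed' set, a major first attempt is
 * remembered by setting the TRIED flag before retrying.
 */
static bool toy_fault_accounted_major(unsigned int first, unsigned int second,
				      bool fixed)
{
	unsigned int flags = 0;

	assert(!(second & TOY_VM_FAULT_RETRY)); /* model one retry only */
	if (!(first & TOY_VM_FAULT_RETRY))
		return toy_is_major(first, flags);
	if (fixed && (first & TOY_VM_FAULT_MAJOR))
		flags |= TOY_FAULT_FLAG_TRIED;
	return toy_is_major(second, flags);
}
```

Without the fix, a first attempt that returns both MAJOR and RETRY followed by a clean retry is accounted as minor; with the fix it is accounted as major, while retries that were not major stay minor.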
Matthew Wilcox (Oracle)	4a518d8633	UPSTREAM: mm: handle write faults to RO pages under the VMA lock I think this is a pretty rare occurrence, but for consistency handle faults with the VMA lock held the same way that we handle other faults with the VMA lock held. Link: https://lkml.kernel.org/r/20231006195318.4087158-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4a68fef16df9d88d528094116f8bbd2dbfa62089) Bug: 293665307 Change-Id: I69cec218c8a1fe14df3268722e6b1be6dffe7978 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Matthew Wilcox (Oracle)	c1da94fa44	UPSTREAM: mm: handle read faults under the VMA lock Most file-backed faults are already handled through ->map_pages(), but if we need to do I/O we'll come this way. Since filemap_fault() is now safe to be called under the VMA lock, we can handle these faults under the VMA lock now. Link: https://lkml.kernel.org/r/20231006195318.4087158-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 12214eba1992642eee5813a9cc9f626e5b2d1815) Bug: 293665307 Change-Id: Iee48af98b866d88d88ec01143eb26389ab373b6b Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Matthew Wilcox (Oracle)	6541fffd92	UPSTREAM: mm: handle COW faults under the VMA lock If the page is not currently present in the page tables, we need to call the page fault handler to find out which page we're supposed to COW, so we need to both check that there is already an anon_vma and that the fault handler doesn't need the mmap_lock. Link: https://lkml.kernel.org/r/20231006195318.4087158-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4de8c93a4751e10737b6af65db42c743228c67a6) Bug: 293665307 Change-Id: If749a6f8fcf69d83bbf872c1d45865d1b1b77ea0 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Matthew Wilcox (Oracle)	c7fa581a79	UPSTREAM: mm: handle shared faults under the VMA lock There are many implementations of ->fault and some of them depend on mmap_lock being held. All vm_ops that implement ->map_pages() end up calling filemap_fault(), which I have audited to be sure it does not rely on mmap_lock. So (for now) key off ->map_pages existing as a flag to indicate that it's safe to call ->fault while only holding the vma lock. Link: https://lkml.kernel.org/r/20231006195318.4087158-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 4ed4379881aa62588aba6442a9f362a8cf7624e6) Bug: 293665307 Change-Id: Ifb5ab3df5d05fb182d0cb52820fa24e28e2d6496 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Matthew Wilcox (Oracle)	95af8a80bb	BACKPORT: mm: call wp_page_copy() under the VMA lock It is usually safe to call wp_page_copy() under the VMA lock. The only unsafe situation is when no anon_vma has been allocated for this VMA, and we have to look at adjacent VMAs to determine if their anon_vma can be shared. Since this happens only for the first COW of a page in this VMA, the majority of calls to wp_page_copy() do not need to fall back to the mmap_sem. Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which will return RETRY if we currently hold the VMA lock and need to allocate an anon_vma. This lets us drop the check in do_wp_page(). Link: https://lkml.kernel.org/r/20231006195318.4087158-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 164b06f238b986317131e6b61b2f22aabcbc2cc0) [surenb: resolved merge conflicts due to folio/page differences] Bug: 293665307 Change-Id: I39bdc247b375bd3dae8078b52c60fd4ce12e1850 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
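The decision that the commit above attributes to vmf_anon_prepare() can be sketched as a three-way check. This is a user-space model with hypothetical types (`struct toy_vma`, `enum toy_result`, `toy_anon_prepare()`); the real function works on the kernel's fault and VMA structures and can also fail on allocation, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_result { TOY_OK, TOY_RETRY };

/* Hypothetical VMA: only tracks whether an anon_vma exists. */
struct toy_vma {
	bool has_anon_vma;
};

static enum toy_result toy_anon_prepare(struct toy_vma *vma,
					bool holding_vma_lock)
{
	assert(vma != NULL);
	if (vma->has_anon_vma)
		return TOY_OK;          /* common case: nothing to do */
	if (holding_vma_lock)
		return TOY_RETRY;       /* caller retries under mmap_lock */
	vma->has_anon_vma = true;       /* stands in for the real allocation */
	return TOY_OK;
}
```

Only the first COW in a VMA takes the retry path, so subsequent faults on the same VMA can proceed under the VMA lock.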
Matthew Wilcox (Oracle)	b43b26b4cd	UPSTREAM: mm: make lock_folio_maybe_drop_mmap() VMA lock aware Patch series "Handle more faults under the VMA lock", v2. At this point, we're handling the majority of file-backed page faults under the VMA lock, using the ->map_pages entry point. This patch set attempts to expand that for the following situations: - We have to do a read. This could be because we've hit the point in the readahead window where we need to kick off the next readahead, or because the page is simply not present in cache. - We're handling a write fault. Most applications don't do I/O by writes to shared mmaps for very good reasons, but some do, and it'd be nice to not make that slow unnecessarily. - We're doing a COW of a private mapping (both PTE already present and PTE not-present). These are two different codepaths and I handle both of them in this patch set. There is no support in this patch set for drivers to mark themselves as being VMA lock friendly; they could implement the ->map_pages vm_operation, but if they do, they would be the first. This is probably something we want to change at some point in the future, and I've marked where to make that change in the code. There is very little performance change in the benchmarks we've run; mostly because the vast majority of page faults are handled through the other paths. I still think this patch series is useful for workloads that may take these paths more often, and just for cleaning up the fault path in general (it's now clearer why we have to retry in these cases). This patch (of 6): Drop the VMA lock instead of the mmap_lock if that's the one which is held. 
Link: https://lkml.kernel.org/r/20231006195318.4087158-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231006195318.4087158-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 5d74b2ab2c15d596c470bae6626f345d5575a9d0) Bug: 293665307 Change-Id: Ife2d11ab12fb428868cd44751784cf731fbffe62 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Matthew Wilcox	9c4bc457ab	UPSTREAM: mm/memory.c: fix mismerge Fix a build issue. Link: https://lkml.kernel.org/r/ZNerqcNS4EBJA/2v@casper.infradead.org Fixes: 4aaa60dad4d1 ("mm: allow per-VMA locks on file-backed VMAs") Signed-off-by: Matthew Wilcox <willy@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308121909.XNYBtqNI-lkp@intel.com/ Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit `08dff2810e`) Bug: 293665307 Change-Id: I07ce19f29c44831cdcf709fe1ce122d1963f0be2 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2024-01-03 20:45:51 +00:00
Suren Baghdasaryan	7d50253c27	ANDROID: Export functions to be used with dma_map_ops in modules For modules to reuse default dma_map_ops implementations they need to be exported. Export the following functions: dma_direct_alloc dma_direct_free dma_common_mmap dma_common_get_sgtable dma_direct_get_required_mask Bug: 151050914 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia77b797fcd909fce01da7431bfbde282dc70b3b3 (cherry picked from commit fd31496dae939c7bf2ef874e08d4bf8c6ab738b3) Signed-off-by: Qian-Hao Huang <qhhuang@google.com> (cherry picked from commit `cdc9f6ef94`)	2024-01-03 20:45:29 +00:00
Gao Xiang	37e0a5b868	BACKPORT: FROMGIT: erofs: enable sub-page compressed block support Let's just disable cached decompression and inplace I/Os for partial pages as the first step in order to enable sub-page block initial support. In other words, currently it works primarily based on temporary short-lived pages. Don't expect too much in terms of performance. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-6-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I00238aa437f20c46d015bbe5ab7b706b80b8cfd7 (cherry picked from commit 0ee3a0d59e007320167a2e9f4b8bf1304ada7771 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) [dhavale: resolved conflicts in inode.c in erofs_fill_inode()] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	f466d52164	FROMGIT: erofs: refine z_erofs_transform_plain() for sub-page block support Sub-page block support is still unusable even with previous commits if interlaced PLAIN pclusters exist. Such pclusters can be found if the fragment feature is enabled. This commit tries to handle "the head part" of interlaced PLAIN pclusters first: it was once explained in commit `fdffc091e6` ("erofs: support interlaced uncompressed data for compressed files"). It uses a unique way for both shifted and interlaced PLAIN pclusters. As an added bonus, PLAIN pclusters larger than the block size are also supported now for the upcoming large lclusters. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-5-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I3d50132664f8754f56d62744420060108ed0da4f (cherry picked from commit 192351616a9dde686492bcb9d1e4895a1411a527 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	a18efa4e4a	FROMGIT: erofs: fix ztailpacking for subpage compressed blocks `pageofs_in` should be the compressed data offset of the page rather than of the block. Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231214161337.753049-1-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I0997a69b22b0f42c327c810359f55f5fa6a76275 (cherry picked from commit e5aba911dee5e20fa82efbe13e0af8f38ea459e7 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	0c6a18c75b	BACKPORT: FROMGIT: erofs: fix up compacted indexes for block size < 4096 Previously, the block size always equaled PAGE_SIZE, therefore `lclusterbits` couldn't be less than 12. Since sub-page compressed blocks are now considered, `lobits` for a lcluster in each pack cannot always be `lclusterbits` as before. Otherwise, there is not enough room for the special value `Z_EROFS_VLE_DI_D0_CBLKCNT`. To support smaller block sizes, `lobits` for each compacted lcluster is now calculated as: lobits = max(lclusterbits, ilog2(Z_EROFS_VLE_DI_D0_CBLKCNT) + 1) Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-4-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: Iacd89e2b33ddf39ea40b90e88a2bf99bb5a83b31 (cherry picked from commit 8d2517aaeea3ab8651bb517bca8f3c8664d318ea https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) [dhavale: resolved conflicts in zmap.c due to older naming of constants and updated commit message also to use the older names] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
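The `lobits` formula from the commit above can be worked through for a few block sizes. This is a minimal sketch: `TOY_D0_CBLKCNT` is set to `1 << 11` on the assumption that this is the value of `Z_EROFS_VLE_DI_D0_CBLKCNT` (check the kernel headers before relying on it), and `toy_ilog2()` and `toy_lobits()` are hypothetical helpers.

```c
#include <assert.h>

/* Assumed value of Z_EROFS_VLE_DI_D0_CBLKCNT; verify against the headers. */
#define TOY_D0_CBLKCNT (1u << 11)

/* Integer log2 for v > 0: index of the highest set bit. */
static unsigned int toy_ilog2(unsigned int v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* lobits = max(lclusterbits, ilog2(CBLKCNT) + 1) */
static unsigned int toy_lobits(unsigned int lclusterbits)
{
	unsigned int min_bits = toy_ilog2(TOY_D0_CBLKCNT) + 1;

	return lclusterbits > min_bits ? lclusterbits : min_bits;
}
```

Under that assumption, a 512-byte block (`lclusterbits` = 9) still gets 12 low bits so the special value fits, a 4096-byte block keeps 12 as before, and larger lclusters use their own bit count.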
Gao Xiang	d7bb85f1cb	FROMGIT: erofs: record `pclustersize` in bytes instead of pages Currently, compressed sizes are recorded in pages using `pclusterpages`. However, for tailpacking pclusters, `tailpacking_size` is used instead. This approach doesn't work when dealing with sub-page blocks. To address this, let's switch them to the unified `pclustersize` in bytes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-3-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: Ia8c50a7b4adcf6cd161b1d6f8bfc5a7fd3371079 (cherry picked from commit 54ed3fdd66055d073cb1cd2c6c65bbc0683c40cf https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	9d259220ac	FROMGIT: erofs: support I/O submission for sub-page compressed blocks Add a basic I/O submission path first to support sub-page blocks: - Temporary short-lived pages will be used entirely; - In-place I/O pages can be used partially, but compressed pages need to be able to be mapped in contiguous virtual memory. As a start, currently cache decompression is explicitly disabled for sub-page blocks, which will be supported in the future. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206091057.87027-2-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: Ib2cb6120805ab479a450580fc8774af131271791 (cherry picked from commit 192351616a9dde686492bcb9d1e4895a1411a527 https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	8a49ea9441	FROMGIT: erofs: fix lz4 inplace decompression Currently EROFS can map another compressed buffer for inplace decompression, that was used to handle the cases that some pages of compressed data are actually not in-place I/O. However, like most simple LZ77 algorithms, LZ4 expects the compressed data to be arranged at the end of the decompressed buffer and it explicitly uses memmove() to handle overlapping: __________________________________________________________ |_ direction of decompression --> ____ |_ compressed data _| Although EROFS arranges compressed data like this, it typically maps two individual virtual buffers so the relative order is uncertain. Previously, it was hardly observed since LZ4 only uses memmove() for short overlapped literals and x86/arm64 memmove implementations seem to completely cover it up and they don't have this issue. Juhyung reported that EROFS data corruption can be found on a new Intel x86 processor. After some analysis, it seems that recent x86 processors with the new FSRM feature expose this issue with "rep movsb". Let's strictly use the decompressed buffer for lz4 inplace decompression for now. Later, as a useful improvement, we could try to tie up these two buffers together in the correct order. 
Reported-and-tested-by: Juhyung Park <qkrwngud825@gmail.com> Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2Q@mail.gmail.com Fixes: `0ffd71bcc3` ("staging: erofs: introduce LZ4 decompression inplace") Fixes: `598162d050` ("erofs: support decompress big pcluster for lz4 backend") Cc: stable <stable@vger.kernel.org> # 5.4+ Tested-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: Ifd2981320f9f79b27bc7484d8906501a2fa05359 (cherry picked from commit 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
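Why the relative order of the two buffers matters can be shown with a toy model. The Python sketch below is an illustration only: it does not model LZ4, `rep movsb`, or FSRM, and `copy_forward()` and `copy_is_intact()` are hypothetical names. It shows that a plain ascending byte copy within one buffer is safe when the source sits after the destination (compressed data at the end of the output buffer) and corrupts data when the overlap runs the other way.

```python
def copy_forward(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Naive ascending byte copy inside one buffer, with no overlap handling."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def copy_is_intact(dst: int, src: int, n: int = 4) -> bool:
    """Return True if the n copied bytes match what the source held beforehand."""
    buf = bytearray(b"ABCDEFGH")
    expected = bytes(buf[src:src + n])
    copy_forward(buf, dst, src, n)
    return bytes(buf[dst:dst + n]) == expected


if __name__ == "__main__":
    # Source ahead of destination: every byte is read before it is overwritten.
    print(copy_is_intact(dst=0, src=2))
    # Source behind destination: earlier writes clobber bytes not yet read.
    print(copy_is_intact(dst=2, src=0))
```

A memmove() implementation chooses its copy direction from the addresses it is given, so when the same data is reachable through two separate virtual mappings, the addresses no longer say which of these two cases applies.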
Gao Xiang	bdc5d268ba	FROMGIT: erofs: fix memory leak on short-lived bounced pages Both MicroLZMA and DEFLATE algorithms can use short-lived pages on demand for the overlapped inplace I/O decompression. However, those short-lived pages are actually added to `be->compressed_pages`. Thus, it should be checked instead of `pcl->compressed_bvecs`. The LZ4 algorithm doesn't work like this, so it won't be impacted. Fixes: `67139e36d9` ("erofs: introduce `z_erofs_parse_in_bvecs'") Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231128180431.4116991-1-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: Ia1f602e9944b884022a3e20db12af568304fd80c (cherry picked from commit 93d6fda7f926451a0fa1121b9558d75ca47e861e https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	0d329bbe5c	BACKPORT: erofs: tidy up z_erofs_do_read_page() - Fix a typo: spiltted => split; - Move !EROFS_MAP_MAPPED and EROFS_MAP_FRAGMENT upwards; - Increase `split` in advance to avoid unnecessary repeats. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-4-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I465fd33c7cbbe91d5da4b4ee2343a7b319534148 (cherry picked from commit `e4c1cf523d`) [dhavale: resolved small conflict in zdata.c in z_erofs_do_read_page()] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	dc94c3cc6b	UPSTREAM: erofs: move preparation logic into z_erofs_pcluster_begin() Some preparation logic should be part of z_erofs_pcluster_begin() instead of z_erofs_do_read_page(). Let's move now. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-3-hsiangkao@linux.alibaba.com Bug: 318378021 (cherry picked from commit `aeebae9d77`) Change-Id: I4bf438d719742a18a6f3065a78bf027de5dae293 Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	7751567a71	BACKPORT: erofs: avoid obsolete {collector,collection} terms {collector,collection} were once reserved in order to indicate different runtime logical extent instance of multi-reference pclusters. However, de-duplicated decompression has been landed in a more flexible way, thus `struct z_erofs_collection` was formally removed in commit `87ca34a706` ("erofs: get rid of `struct z_erofs_collection'"). Let's handle the remaining leftovers, for example: `z_erofs_collector_begin` => `z_erofs_pcluster_begin` `z_erofs_collector_end` => `z_erofs_pcluster_end` as well as some comments. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-2-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I61b812b5ae3dd564e52012d082415b1fc198383d (cherry picked from commit `dcba1b232e`) [dhavale: fixed minor conflict zdata.c in z_erofs_do_read_page()] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	d0dbf74792	BACKPORT: erofs: simplify z_erofs_read_fragment() A trivial cleanup to make the fragment handling logic more clear. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817082813.81180-1-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I50c09c65b7d3da5022cfc2ede27aa31a1b331d29 (cherry picked from commit `8b00be163f`) [dhavale: resolved conflict around erofs_bread() in zdata.c] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	4067dd9969	UPSTREAM: erofs: get rid of the remaining kmap_atomic() It's unnecessary to use kmap_atomic() compared with kmap_local_page(). In addition, kmap_atomic() is deprecated now. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230627161240.331-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 (cherry picked from commit `123ec246eb`) Change-Id: I7efee861bb4f079fe6b79123d554be2e1867d13b Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	365ca16da2	UPSTREAM: erofs: simplify z_erofs_transform_plain() Use memcpy_to_page() instead of open-coding them. In addition, add a missing flush_dcache_page() even though almost all modern architectures clear `PG_dcache_clean` flag for new file cache pages so that it doesn't change anything in practice. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230627161240.331-2-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 (cherry picked from commit `c5539762f3`) Change-Id: I4cb665b592936502ca95e2aee20e1c3a56103ff5 Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	187d034575	BACKPORT: erofs: adapt managed inode operations into folios This patch gets rid of erofs_try_to_free_cached_page() and fold it into .release_folio(). It also moves managed inode operations into zdata.c, which simplifies the code a bit. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-5-hsiangkao@linux.alibaba.com Bug: 318378021 Change-Id: I5cb1e44769f68edce788cb4f8084bb3d45b594b3 (cherry picked from commit `7b4e372c36`) [dhavale: changes to internal.h applied manually] Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	3d93182661	UPSTREAM: erofs: avoid on-stack pagepool directly passed by arguments On-stack pagepool is used so that short-lived temporary pages could be shared within a single I/O request (e.g. among multiple pclusters). Moving the remaining frontend-related uses into z_erofs_decompress_frontend to avoid too many arguments. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-3-hsiangkao@linux.alibaba.com Bug: 318378021 (cherry picked from commit `6ab5eed600`) Change-Id: I57d3ba6087904bb40c55b780aca50c16bfba2c0f Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Gao Xiang	5c1827383a	UPSTREAM: erofs: allocate extra bvec pages directly instead of retrying If non-bootstrap bvecs cannot be kept in place (very rarely), an extra short-lived page is allocated. Let's just allocate it immediately rather than do unnecessary -EAGAIN return first and retry as a cleanup. Also it's unnecessary to use __GFP_NOFAIL here since we could gracefully fail out this case instead. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-2-hsiangkao@linux.alibaba.com Bug: 318378021 (cherry picked from commit `05b63d2beb`) Change-Id: I2ac45a943060406bcbb741c5f7aa1094f783f906 Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Yue Hu	bed20ed1d3	UPSTREAM: erofs: clean up z_erofs_pcluster_readmore() The `end` parameter is not needed since it's pointless for !backmost; we can handle it with backmost internally. And we only expand the trailing edge, so the newstart can be replaced with ->headoffset. Also, remove linux/prefetch.h inclusion since that is not used anymore after commit `386292919c` ("erofs: introduce readmore decompression strategy"). Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230525072605.17857-1-zbestahu@gmail.com [ Gao Xiang: update commit description. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 (cherry picked from commit `796e9149a2`) Change-Id: I9412c4111800077c876a43c4256ce9760a7d902e Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Yue Hu	5e861fa97e	UPSTREAM: erofs: remove the member readahead from struct z_erofs_decompress_frontend The struct member is only used to add REQ_RAHEAD during I/O submission. So it is cleaner to pass it as a parameter than keep it in the struct. Also, rename function z_erofs_get_sync_decompress_policy() to z_erofs_is_sync_decompress() for better clarity and conciseness. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230524063944.1655-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 (cherry picked from commit `ef4b4b46c6`) Change-Id: I59cc13e7499968a1e93e13df1cb43a5123d510d9 Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Yue Hu	66595bb17c	UPSTREAM: erofs: fold in z_erofs_decompress() No need this helper since it's just a simple wrapper for decompress method and only one caller. So, let's fold in directly instead. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230426084449.12781-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 (cherry picked from commit `597e2953ae`) Change-Id: I849360f088016cf97542858e8a5a9cee671a2f61 Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
Jingbo Xu	88a1939504	UPSTREAM: erofs: enable large folios for iomap mode Enable large folios for iomap mode. Then the readahead routine will pass down large folios containing multiple pages. Let's enable this for non-compressed format for now, until the compression part supports large folios later. When large folios supported, the iomap routine will allocate iomap_page for each large folio and thus we need iomap_release_folio() and iomap_invalidate_folio() to free iomap_page when these folios get reclaimed or invalidated. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20221130060455.44532-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Bug: 318378021 Change-Id: Iedbb9a2daf132399b7a1b5ea6905977ba123ba3c (cherry picked from commit `ce529cc25b`) Signed-off-by: Sandeep Dhavale <dhavale@google.com>	2024-01-03 18:37:43 +00:00
leonardian	2c085909e7	ANDROID: Update the ABI symbol list Adding the following symbols: - _dev_alert Bug: 311337219 Change-Id: Iaf6710842c45921ccfbacd1361e0b57401cf65d9 Signed-off-by: leonardian <leonardian@google.com>	2024-01-03 11:28:59 +00:00
Roy Luo	d16a15fde5	UPSTREAM: USB: gadget: core: adjust uevent timing on gadget unbind The KOBJ_CHANGE uevent is sent before gadget unbind is actually executed, resulting in inaccurate uevent emitted at incorrect timing (the uevent would have USB_UDC_DRIVER variable set while it would soon be removed). Move the KOBJ_CHANGE uevent to the end of the unbind function so that uevent is sent only after the change has been made. Fixes: `2ccea03a8f` ("usb: gadget: introduce UDC Class") Cc: stable@vger.kernel.org Signed-off-by: Roy Luo <royluo@google.com> Link: https://lore.kernel.org/r/20231128221756.2591158-1-royluo@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Bug: 312543856 Change-Id: Ida7fa7e1cfae3d1b3f3348512a67fe91065f25af (cherry picked from commit 73ea73affe8622bdf292de898da869d441da6a9d) Signed-off-by: Roy Luo <royluo@google.com>	2024-01-02 21:26:12 +00:00
xieliujie	d3006fb944	ANDROID: ABI: Update oplus symbol list 1 function symbol(s) added 'int __traceiter_android_vh_rt_mutex_steal(void, int, int, bool)' 1 variable symbol(s) added 'struct tracepoint __tracepoint_android_vh_rt_mutex_steal' Bug: 317670024 Change-Id: I28f0379adaec041400e49cbd1e497b2f8c5c893d Signed-off-by: xeiliujie <xieliujie@oppo.com>	2023-12-25 15:22:53 +08:00
xieliujie	bc97d5019a	ANDROID: vendor_hooks: Add hooks for rt_mutex steal Add hooks at rt_mutex_steal function so that oems can decide whether tasks with the same priority steal the rt_mutex or not. We did experiments and found that rt_mutex throughput can benefit a lot when threads with the same priority can steal the rt_mutex lock. Bug: 317670024 Change-Id: Id60a7a41c6c77a67808982d3667946cabe4acc8f Signed-off-by: xeiliujie <xieliujie@oppo.com>	2023-12-25 15:22:46 +08:00
Wu Bo	401a2769d9	UPSTREAM: dm verity: don't perform FEC for failed readahead IO We found an issue under Android OTA scenario that many BIOs have to do FEC where the data under dm-verity is 100% complete and no corruption. Android OTA has many dm-block layers, from upper to lower: dm-verity dm-snapshot dm-origin & dm-cow dm-linear ufs DM tables have to change 2 times during Android OTA merging process. When doing table change, the dm-snapshot will be suspended for a while. During this interval, many readahead IOs are submitted to dm_verity from filesystem. Then the kverity works are busy doing FEC process which cost too much time to finish dm-verity IO. This causes needless delay which feels like system is hung. After adding debugging it was found that each readahead IO needed around 10s to finish when this situation occurred. This is due to IO amplification: dm-snapshot suspend erofs_readahead // 300+ io is submitted dm_submit_bio (dm_verity) dm_submit_bio (dm_snapshot) bio return EIO bio got nothing, it's empty verity_end_io verity_verify_io forloop range(0, io->n_blocks) // each io->nblocks ~= 20 verity_fec_decode fec_decode_rsb fec_read_bufs forloop range(0, v->fec->rsn) // v->fec->rsn = 253 new_read submit_bio (dm_snapshot) end loop end loop dm-snapshot resume Readahead BIOs get nothing while dm-snapshot is suspended, so all of them will cause verity's FEC. Each readahead BIO needs to verify ~20 (io->nblocks) blocks. Each block needs to do FEC, and every block needs to do 253 (v->fec->rsn) reads. So during the suspend interval (~200ms), 300 readahead BIOs trigger ~1518000 (300 * 20 * 253) IOs to dm-snapshot. As readahead IO is not required by userspace, and to fix this issue, it is best to pass readahead errors to upper layer to handle it. 
Cc: stable@vger.kernel.org Fixes: `a739ff3f54` ("dm verity: add support for forward error correction") Bug: 316972624 Link: https://lore.kernel.org/dm-devel/b84fb49-bf63-3442-8c99-d565e134f2@redhat.com Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Akilesh Kailash <akailash@google.com> (cherry picked from commit 0193e3966ceeeef69e235975918b287ab093082b) Change-Id: I73560e5660cebdc1997e1f9926cbb8888789eb46	2023-12-21 22:46:28 +00:00
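The amplification figure quoted in the entry above follows directly from the two nested loops in its trace. The short Python sketch below just restates that arithmetic with the numbers given in the commit message (300 readahead BIOs, about 20 blocks per BIO, 253 reads per block); `fec_read_count()` is a hypothetical name used only for illustration.

```python
def fec_read_count(readahead_bios: int, blocks_per_bio: int, rsn: int) -> int:
    """Total reads issued when every block of every failed BIO goes through FEC."""
    return readahead_bios * blocks_per_bio * rsn


if __name__ == "__main__":
    # Figures quoted in the commit message: 300 BIOs, ~20 blocks each, rsn = 253.
    print(fec_read_count(300, 20, 253))
```

Returning the error for readahead BIOs instead of entering FEC removes both inner factors, so the count falls back to the 300 BIOs that were originally submitted.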
Florian Westphal	30bca9e278	UPSTREAM: netfilter: nft_set_pipapo: skip inactive elements during set walk commit 317eb9685095678f2c9f5a8189de698c5354316a upstream. Otherwise set elements can be deactivated twice which will cause a crash. Bug: 316310313 Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: `3c4287f620` ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `189c2a8293`) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: I27fb6ee806642e23ca02700763a387341dd463e6	2023-12-21 11:15:42 +00:00
Charan Teja Kalla	44702d8fa1	FROMLIST: mm: migrate high-order folios in swap cache correctly Large folios occupy N consecutive entries in the swap cache instead of using multi-index entries like the page cache. However, if a large folio is re-added to the LRU list, it can be migrated. The migration code was not aware of the difference between the swap cache and the page cache and assumed that a single xas_store() would be sufficient. This leaves potentially many stale pointers to the now-migrated folio in the swap cache, which can lead to almost arbitrary data corruption in the future. This can also manifest as infinite loops with the RCU read lock held. Bug: 315281107 Change-Id: I455f964a9f21c13089890073777388236b6669d7 [willy@infradead.org: modifications to the changelog & tweaked the fix] Fixes: `3417013e0d` ("mm/migrate: Add folio_migrate_mapping()") Link: https://lkml.kernel.org/r/20231214045841.961776-1-willy@infradead.org Link: https://lore.kernel.org/linux-mm/20231214045841.961776-1-willy@infradead.org/ Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_charante@quicinc.com Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>	2023-12-21 00:41:24 +00:00
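The stale-pointer problem described in the entry above can be pictured with a toy model. In the Python sketch below a plain dict stands in for the swap cache and strings stand in for folios; none of these names correspond to kernel APIs, and the real fix works on the XArray rather than on a dict. The point is only that a large folio occupying N consecutive slots needs all N slots updated on migration.

```python
def make_swap_cache(first: int, nr: int, folio: str) -> dict:
    """A large folio occupies nr consecutive slots, one per subpage."""
    return {first + i: folio for i in range(nr)}


def migrate_single_store(cache: dict, first: int, new_folio: str) -> None:
    """Buggy variant: update only the first slot, as for a multi-index entry."""
    cache[first] = new_folio


def migrate_every_entry(cache: dict, first: int, nr: int, new_folio: str) -> None:
    """Fixed variant: update each of the nr consecutive slots."""
    for i in range(nr):
        cache[first + i] = new_folio


def stale_indices(cache: dict, old_folio: str) -> list:
    """Slots that still point at the folio that was migrated away."""
    return sorted(i for i, f in cache.items() if f == old_folio)


if __name__ == "__main__":
    buggy = make_swap_cache(8, 4, "old")
    migrate_single_store(buggy, 8, "new")
    print(stale_indices(buggy, "old"))

    fixed = make_swap_cache(8, 4, "old")
    migrate_every_entry(fixed, 8, 4, "new")
    print(stale_indices(fixed, "old"))
```

In this model the single-store variant leaves three of the four slots pointing at the old folio, which corresponds to the stale pointers the commit message warns about.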
Paul Lawrence	613d8368e3	ANDROID: fuse-bpf: Follow mounts in lookups Bug: 292925770 Test: fuse_test run. The following steps on Android also now pass: Create /data/123 and /data/media/0/Android/data/45 directories Mount /data/123 directory to /data/media/0/Android/data/45 directory Create 1.txt under the /data/123 directory File 1.txt should appear in /storage/emulated/0/Android/data/45 Change-Id: I1fe27d743ca2981e624a9aa87d9ab6deb313aadc Signed-off-by: Paul Lawrence <paullawrence@google.com>	2023-12-20 23:12:56 +00:00
Kever Yang	07775f9683	ANDROID: GKI: Add symbols for rockchip sata INFO: 24 function symbol(s) added 'size_t __scsi_format_command(char, size_t, const unsigned char, size_t)' 'int attribute_container_register(struct attribute_container)' 'int attribute_container_unregister(struct attribute_container)' 'void pci_intx(struct pci_dev, int)' 'int pcim_iomap_regions_request_all(struct pci_dev, int, const char)' 'void pcim_pin_device(struct pci_dev)' 'int reset_control_rearm(struct reset_control)' 'enum scsi_disposition scsi_check_sense(struct scsi_cmnd)' 'int scsi_device_set_state(struct scsi_device, enum scsi_device_state)' 'void scsi_eh_finish_cmd(struct scsi_cmnd, struct list_head)' 'void scsi_eh_flush_done_q(struct list_head)' 'int scsi_rescan_device(struct scsi_device)' 'void scsi_schedule_eh(struct Scsi_Host)' 'const u8* scsi_sense_desc_find(const u8, int, int)' 'int scsi_set_sense_field_pointer(u8, int, u16, u8, bool)' 'void sdev_evt_send_simple(struct scsi_device, enum scsi_device_event, gfp_t)' 'bool system_entering_hibernation()' 'int transport_add_device(struct device)' 'int transport_class_register(struct transport_class)' 'void transport_class_unregister(struct transport_class)' 'void transport_configure_device(struct device)' 'void transport_destroy_device(struct device)' 'void transport_remove_device(struct device)' 'void transport_setup_device(struct device)' Bug: 300024866 Change-Id: I6a505d48d0d199a710b0d93b6a8df189735a7b89 Signed-off-by: Kever Yang <kever.yang@rock-chips.com>	2023-12-19 18:44:06 +00:00


1155954 Commits