After demand paging is captured during APP launch,
we can do it in advance before next launch.
Add the symbols for it here.
INFO: 4 function symbol(s) added
'unsigned int filemap_get_folios(struct address_space*, unsigned long*, unsigned long, struct folio_batch*)'
'unsigned int find_get_pages_range_tag(struct address_space*, unsigned long*, unsigned long, xa_mark_t, unsigned int, struct page**)'
'void page_cache_async_ra(struct readahead_control*, struct folio*, unsigned long)'
'void page_cache_sync_ra(struct readahead_control*, unsigned long)'
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: I3f42c39c6432303e69f1fbae56fabf620381d8c5
Current implementation blocks the running operations when Plug-out and
Plug-In is performed continuously, process gets stuck in
dwc3_thread_interrupt().
Code Flow:
CPU1
->Gadget_start
->dwc3_interrupt
->dwc3_thread_interrupt
->dwc3_process_event_buf
->dwc3_process_event_entry
->dwc3_endpoint_interrupt
->dwc3_ep0_interrupt
->dwc3_ep0_inspect_setup
->dwc3_ep0_stall_and_restart
By this time if pending_list is not empty, it will get the next request
on the given list and calls dwc3_gadget_giveback which will unmap request
and call its complete() callback to notify upper layers that it has
completed. Currently dwc3_gadget_giveback status is set to -ECONNRESET,
whereas it should be -ESHUTDOWN based on condition if not dwc->connected
is true.
Cc: <stable@vger.kernel.org>
Fixes: d742220b35 ("usb: dwc3: ep0: giveback requests on stall_and_restart")
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20231222094704.20276-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 320413810
(cherry picked from commit e9d40b215e38480fd94c66b06d79045717a59e9c
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ usb-next)
Change-Id: I7f0afebbcfa88b6b4e622a708b9838dd461661fc
Signed-off-by: Sriram Dash <quic_sriramd@quicinc.com>
Introduce a config option for each QMP PHY driver now that the QMP PHY
mega-driver has been split up into different modules. This allows kernel
configurators to limit the binary size of the kernel by only compiling
in the QMP PHY driver that they need.
Leave the old config QCOM_QMP in place and make it into a menuconfig so
that 'make olddefconfig' continues to work. Furthermore, set the default
of the new Kconfig symbols to be QCOM_QMP so that the transition is
smooth.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/all/20230202215330.2152726-1-swboyd@chromium.org/
Bug: 319064658
Change-Id: I633e6e1bbc3e79292bfde927e46f84219f0178ae
(cherry picked from commit d1abd69534)
[quic_kuruva: Resolved minor conflict in drivers/phy/qualcomm/Kconfig ]
Signed-off-by: Rajashekar kuruva <quic_kuruva@quicinc.com>
Add symbols of vendor hooks to capture demand paging during APP launch,
so we can do it in advance in next launch.
INFO: 1 function symbol(s) added
'int __traceiter_android_vh_read_pages(void*, struct readahead_control*)'
1 variable symbol(s) added
'struct tracepoint __tracepoint_android_vh_read_pages'
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: Ibb1e31b6912f7b6b92b76727f7e5043897434def
Add vendor hooks to capture demand paging during APP launch,
so we can do it in advance in next launch.
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: I2698fefd347745fb4ff84b111caedbb3bb365ce3
The current placement of trace_cma_alloc_start/finish misses the fail
cases: !cma || !cma->count || !cma->bitmap.
trace_cma_alloc_finish is also not emitted for the failure case
where bitmap_count > bitmap_maxno.
Fix these missed cases by moving the start event before the failure
checks and moving the finish event to the out label.
Link: https://lkml.kernel.org/r/20240110012234.3793639-1-kaleshsingh@google.com
Fixes: 7bc1aec5e2 ("mm: cma: add trace events for CMA alloc perf testing")
Change-Id: I61153fe078da4f9f3338147f1fbb7697a5554078
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3b08ab9a811caebe1327f25f51557f95200d94bf https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 315897033
[ Remove ret arg from trace_cma_alloc_finish - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Under certain circumstances a SoC can reach a critical temperaturelimit
and is unable to stabilize the temperature around a temperaturecontrol.
The system may ask for a specific power budget butbecause of the OPP
density, we can only choose an OPP with a powerbudget lower than the
requested one and under-utilize the CPU, thuslosing performance. In
other words, one OPP under-utilizes the CPUwith a power less than the
requested power budget and the next OPPexceeds the power budget. The
cpu idle cooling can solve this problem.
Bug: 299411923
Signed-off-by: Aran Dalton <arda@allwinnertech.com>
Change-Id: I1c17b340617e88be075097dc47f30ce94be2a4d7
Enabling CONFIG_NETFILTER_FAMILY_BRIDGE causes the new element,
hooks_bridge[] to be added to netns_nf. Since the KMI is frozen
this could not be added.
The only instantiation of struct netns_nf is as an embedded field
of struct net. So instead of adding the field to struct netns_nf,
a new "struct ext_net" is added that contains struct net and
the new hooks_bridge[] field. An accessor function,
get_nf_hooks_bridge() is added to get a pointer to the new
field.
There is a global init_net of type struct net which must be special
cased since it is not a member of a struct ext_net. All other
instances of struct net are allocated via net_alloc() which now
allocates a struct ext_net.
Since CONFIG_NETFILTER_FAMILY_BRIDGE is a hidden config that is
needed for vendor modules, it is enabled via init/Kconfig.gki.
Bug: 316040984
Fixes: 0145780bfc78 ("fix KASAN-related kernel crash by KMI W/A for NETFILTER_FAMILY_BRIDGE")
Change-Id: I2c7384e3df9b88f12464dc0138986fed12ca626a
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
The commit 5aec776ef8 ("BACKPORT: ANDROID: dma-buf: Move sysfs work
out of DMA-BUF export path) re-purposed kobject as work_struct temporarily
to create the sysfs entries asynchronously. The author knows what he is
doing and rightly added a build assert if kobject struct size is smaller than
the work_struct size. We are hitting this build assert on a non-GKI platform
where CONFIG_ANDROID_KABI_RESERVE is not set. Fix this problem by allocating
a new union with dma_buf_sysfs_entry structure and temporary structure as
members. We only end up allocating more memory (because of union) only when
kobject size is smaller than work_struct which the original patch any way
assumed would never be true.
Bug: 261818147
Bug: 262666413
Change-Id: Ifb089bf80d8a3a44ece9f05fc0b99ee76cb11645
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
(cherry picked from commit ce18af9b5d)
Signed-off-by: T.J. Mercier <tjmercier@google.com>
We have identified an animation lag issue on our Android 14-6.1 product
which seems to be caused by contention in the rwsem lock during the
dmabuf request process. It appears that other processes are holding
sysfs read locks, resulting in the blocking of dmabuf sysfs node
creation. We encountered an issue in android14-6.1 that is similar to
the problem described in [1]. So we cherry-pick this commit to
android14-6.1.
[1] https://android-review.googlesource.com/c/kernel/common/+/2111974
Bug: 311282169
Bug: 206979019
Link: https://lore.kernel.org/lkml/CABdmKX2dNYhgOYdrrJU6-jt6F=LjCidbKhR6t4F7yaa0SPr+-A@mail.gmail.com/T/
Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>
Conflicts:
include/linux/dma-buf.h
1. The android14-6.1 KMI is frozen, and the modification to struct
dma_buf_sysfs_entry in the original patch triggers ABI check
failures. Instead of an anonymous union, use the existing struct
kobject directly as a work_struct with type punning.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Change-Id: Ic0386849b6b248b0a72215633fc1a50782455bac
commit 7315dc1e122c85ffdfc8defffbb8f8b616c2eb1a upstream.
NFT_MSG_DELSET deactivates all elements in the set, skip
set->ops->commit() to avoid the unnecessary clone (for the pipapo case)
as well as the sync GC cycle, which could deactivate again expired
elements in such set.
Bug: 318548348
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0105571f80)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie733688e27d9568d797fc1bc477261883b7dc8c1
Under certain circumstances __get_fault_info() may resolve the faulting
address using the AT instruction. Given that this is being done outside
of the host lock critical section, it is racy and the resolution via AT
may fail. We currently BUG() in this situation, which is obviously less
than ideal. Moving the address resolution to the critical section may
have a performance impact, so let's keep it where it is, but bail out
and return to the host to try a second time.
Bug: 311830307
Change-Id: I26d61b04a4ccf040bd31802abb3c6b998ff4a48b
Signed-off-by: Quentin Perret <qperret@google.com>
commit d920abd1e7 upstream.
From Alon:
"Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel,
a malicious user can cause a UAF and a double free, which may lead to
RCE (may also lead to an LPE in case the attacker already has local
privileges)."
Hence, when a queue initialization fails after the ahash requests are
allocated, it is guaranteed that the queue removal async work will be
called, hence leave the deallocation to the queue removal.
Also, be extra careful not to continue processing the socket, so set
queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error.
Bug: 310114968
Cc: stable@vger.kernel.org
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Tested-by: Alon Zahavi <zahavi.alon@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e985d78bdcf37f7ef73666a43b0d2407715f00d3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifd7ec8294182a6bf6d8c261aeda5d989e909f7ff
Current EP0 dequeue path will share the same as other EPs. However, there
are some special considerations that need to be made for EP0 transfers:
- EP0 transfers never transition into the started_list
- EP0 only has one active request at a time
In case there is a vendor specific control message for a function over USB
FFS, then there is no guarantee on the timeline which the DATA/STATUS stage
is responded to. While this occurs, any attempt to end transfers on
non-control EPs will end up having the DWC3_EP_DELAY_STOP flag set, and
defer issuing of the end transfer command. If the USB FFS application
decides to timeout the control transfer, or if USB FFS AIO path exits, the
USB FFS driver will issue a call to usb_ep_dequeue() for the ep0 request.
In case of the AIO exit path, the AIO FS blocks until all pending USB
requests utilizing the AIO path is completed. However, since the dequeue
of ep0 req does not happen properly, all non-control EPs with the
DWC3_EP_DELAY_STOP flag set will not be handled, and the AIO exit path will
be stuck waiting for the USB FFS data endpoints to receive a completion
callback.
Fix is to utilize dwc3_ep0_reset_state() in the dequeue API to ensure EP0
is brought back to the SETUP state, and ensures that any deferred end
transfer commands are handled. This also will end any active transfers
on EP0, compared to the previous implementation which directly called
giveback only.
Fixes: fcd2def663 ("usb: dwc3: gadget: Refactor dwc3_gadget_ep_dequeue")
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Bug: 318577849
Change-Id: Ic00684db4b502f1aab128f7e49f22510dda24f60
(cherry picked from commit 730e12fbec53ab59dd807d981a204258a4cfb29a https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing)
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
commit 7644b1a1c9 upstream.
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference when the thread is cleared. Grab the SQPOLL data lock before
attempting to get the task cpu and pid for fdinfo, this ensures we have a
stable view of it.
Bug: 309790656
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9236d2ea64)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I044e0285d4535440606ff593230b873e3145db91
commit 4b7de801606e504e69689df71475d27e35336fb3 upstream.
Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.
There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.
The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.
I'm hitting following race during the program load:
CPU 0 CPU 1
bpf_prog_load
bpf_check
do_misc_fixups
prog_array_map_poke_track
map_update_elem
bpf_fd_array_map_update_elem
prog_array_map_poke_run
bpf_arch_text_poke returns -EINVAL
bpf_prog_kallsyms_add
After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.
Similar race exists on the program unload.
Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.
Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.
[0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810
Bug: 309551558
Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 57a6b0a464)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I251c3da579e5d48cd7de4043913fd42d0671d6b5
Export sysctl_sched_min_granularity and
sysctl_sched_idle_min_granularity. In the vendor module, it will use
several static function in GKI, while we do not want to export these
static functions, which will need to make them not static, we copied
them to the vendor module, so we need the export the symbols used in
those static functions. For example, sysctl_sched_min_granularity
and sysctl_sched_idle_min_granularity are referred in sched_slice(),
and they are only used as read-only.
Bug: 316276520
Change-Id: I976d0a1f3a70e8e60099e55fdd3cc99a90053fbb
Signed-off-by: Rick Yiu <rickyiu@google.com>
Setting the PARKMODE_DISABLE_HS bit in the DWC3_USB3_GUCTL1.
When this bit is set to '1' all HS bus instances in park mode are disabled
For some USB wifi devices, if enable this feature it will reduce the
performance. Therefore, add an option for disabling HS park mode by
device-tree.
In Synopsys's dwc3 data book:
In a few high speed devices when an IN request is sent within 900ns of the
ACK of the previous packet, these devices send a NAK. When connected to
these devices, if required, the software can disable the park mode if you
see performance drop in your system. When park mode is disabled,
pipelining of multiple packet is disabled and instead one packet at a time
is requested by the scheduler. This allows up to 12 NAKs in a micro-frame
and improves performance of these slow devices.
Bug: 300024866
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Stanley Chang <stanley_chang@realtek.com>
Link: https://lore.kernel.org/r/20230419020044.15475-1-stanley_chang@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit d21a797a3e)
Change-Id: I43ee416e54779a073a0ba4057edf4be8bd7886de
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
As we reserve only 1GB of memory for the MMIO region don't prepopulate
the entire remaining address space with MMIO as this is prone to failure.
Instead, let the MMIO regions to be created lazily on the fault path and
keep only the RAM regions prepopulated.
Bug: 307805059
Test: Boot pKVM with CONFIG_ARM64_16K_PAGES
Change-Id: I6327f42eb17c6588335a1e04736393c9032114ab
Signed-off-by: Sebastian Ene <sebastianene@google.com>
In dup_mmap(), using __mt_dup() to duplicate the old maple tree and then
directly replacing the entries of VMAs in the new maple tree can result in
better performance. __mt_dup() uses DFS pre-order to duplicate the maple
tree, so it is efficient.
The average time complexity of __mt_dup() is O(n), where n is the number
of VMAs. The proof of the time complexity is provided in the commit log
that introduces __mt_dup(). After duplicating the maple tree, each
element is traversed and replaced (ignoring the cases of deletion, which
are rare). Since it is only a replacement operation for each element,
this process is also O(n).
Analyzing the exact time complexity of the previous algorithm is
challenging because each insertion can involve appending to a node,
pushing data to adjacent nodes, or even splitting nodes. The frequency of
each action is difficult to calculate. The worst-case scenario for a
single insertion is when the tree undergoes splitting at every level. If
we consider each insertion as the worst-case scenario, we can determine
that the upper bound of the time complexity is O(n*log(n)), although this
is a loose upper bound. However, based on the test data, it appears that
the actual time complexity is likely to be O(n).
As the entire maple tree is duplicated using __mt_dup(), if dup_mmap()
fails, there will be a portion of VMAs that have not been duplicated in
the maple tree. To handle this, we mark the failure point with
XA_ZERO_ENTRY. In exit_mmap(), if this marker is encountered, stop
releasing VMAs that have not been duplicated after this point.
There is a "spawn" in byte-unixbench[1], which can be used to test the
performance of fork(). I modified it slightly to make it work with
different number of VMAs.
Below are the test results. The first row shows the number of VMAs. The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and the this patchset, respectively. The
test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results. There are still some
fluctuations in the test results, but at least they are better than the
original performance.
21 121 221 421 821 1621 3221 6421 12821 25621 51221
112100 76261 54227 34035 20195 11112 6017 3161 1606 802 393
114558 83067 65008 45824 28751 16072 8922 4747 2436 1233 599
2.19% 8.92% 19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42%
[1] https://github.com/kdlucas/byte-unixbench/tree/master
Link: https://lkml.kernel.org/r/20231027033845.90608-11-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d2406291483775ecddaee929231a39c70c08fda2
https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
[surenb: open-coded vma_iter_clear_gfp(), vma_iter_bulk_store();
replaced vma_next() with mas_find()]
Bug: 308042511
Change-Id: I42d6620e8ce6a0b16211c231a9b72ba16ba9c0d2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce interfaces __mt_dup() and mtree_dup(), which are used to
duplicate a maple tree. They duplicate a maple tree in Depth-First Search
(DFS) pre-order traversal. It uses memcopy() to copy nodes in the source
tree and allocate new child nodes in non-leaf nodes. The new node is
exactly the same as the source node except for all the addresses stored in
it. It will be faster than traversing all elements in the source tree and
inserting them one by one into the new tree. The time complexity of these
two functions is O(n).
The difference between __mt_dup() and mtree_dup() is that mtree_dup()
handles locks internally.
Analysis of the average time complexity of this algorithm:
For simplicity, let's assume that the maximum branching factor of all
non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a
full tree.
Under the given conditions, if there is a maple tree with n elements, the
number of its leaves is n/16. From bottom to top, the number of nodes in
each level is 1/16 of the number of nodes in the level below. So the
total number of nodes in the entire tree is given by the sum of n/16 +
n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has log(n)
terms with base 16. According to the formula for the sum of a geometric
series, the sum of this series can be calculated as (n-1)/15. Each node
has only one parent node pointer, which can be considered as an edge. In
total, there are (n-1)/15-1 edges.
This algorithm consists of two operations:
1. Traversing all nodes in DFS order.
2. For each node, making a copy and performing necessary modifications
to create a new node.
For the first part, DFS traversal will visit each edge twice. Let
T(ascend) represent the cost of taking one step downwards, and T(descend)
represent the cost of taking one step upwards. And both of them are
constants (although mas_ascend() may not be, as it contains a loop, but
here we ignore it and treat it as a constant). So the time spent on the
first part can be represented as ((n-1)/15-1) * (T(ascend) + T(descend)).
For the second part, each node will be copied, and the cost of copying a
node is denoted as T(copy_node). For each non-leaf node, it is necessary
to reallocate all child nodes, and the cost of this operation is denoted
as T(dup_alloc). The behavior behind memory allocation is complex and not
specific to the maple tree operation. Here, we assume that the time
required for a single allocation is constant. Since the size of a node is
fixed, both of these symbols are also constants. We can calculate that
the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15
- n/16) * T(dup_alloc).
Adding both parts together, the total time spent by the algorithm can be
represented as:
((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) -
n/16 * T(dup_alloc) - (T(ascend) + T(descend))
Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)
Let C2 = T(dup_alloc)
Let C3 = T(ascend) + T(descend)
Finally, the expression can be simplified as:
((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3).
This is a linear function, so the average time complexity is O(n).
Link: https://lkml.kernel.org/r/20231027033845.90608-4-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fd32e4e9b7646510ee9010e0d5f8b8857d48a6f7
https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: I385759a1184a202498e086458b572c203616b9b4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Patch series "Introduce __mt_dup() to improve the performance of fork()", v7.
This series introduces __mt_dup() to improve the performance of fork().
During the duplication process of mmap, all VMAs are traversed and
inserted one by one into the new maple tree, causing the maple tree to be
rebalanced multiple times. Balancing the maple tree is a costly
operation. To duplicate VMAs more efficiently, mtree_dup() and __mt_dup()
are introduced for the maple tree. They can efficiently duplicate a maple
tree.
Here are some algorithmic details about {mtree,__mt}_dup(). We perform a
DFS pre-order traversal of all nodes in the source maple tree. During
this process, we fully copy the nodes from the source tree to the new
tree. This involves memory allocation, and when encountering a new node,
if it is a non-leaf node, all its child nodes are allocated at once.
This idea was originally from Liam R. Howlett's Maple Tree Work email,
and I added some of my own ideas to implement it. Some previous
discussions can be found in [1]. For a more detailed analysis of the
algorithm, please refer to the logs for patch [3/10] and patch [10/10].
There is a "spawn" in byte-unixbench[2], which can be used to test the
performance of fork(). I modified it slightly to make it work with
different number of VMAs.
Below are the test results. The first row shows the number of VMAs. The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and the this patchset, respectively. The
test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results. There are still some
fluctuations in the test results, but at least they are better than the
original performance.
21 121 221 421 821 1621 3221 6421 12821 25621 51221
112100 76261 54227 34035 20195 11112 6017 3161 1606 802 393
114558 83067 65008 45824 28751 16072 8922 4747 2436 1233 599
2.19% 8.92% 19.88% 34.64% 42.37% 44.64% 48.28% 50.17% 51.68% 53.74% 52.42%
Thanks to Liam and Matthew for the review.
This patch (of 10):
Add two helpers:
1. mt_free_one(), used to free a maple node.
2. mt_attr(), used to obtain the attributes of maple tree.
Link: https://lkml.kernel.org/r/20231027033845.90608-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20231027033845.90608-2-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4f2267b58a22d972be98edef8e6b3c7a67c9fb91
https://git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: Ib9b13dee357ac4c85668901c20a3c370fbdd08da
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Adding the following symbols:
- __drmm_crtc_alloc_with_planes
Bug: 275278929
Change-Id: I41b6069612d44214f474ed82ee2a4b07ca739302
Signed-off-by: Ken Huang <kenbshuang@google.com>
The structures that define hyp events must be packed so they match
their format definitions in the tracefs file
hyp/events/hyp/<event>/format.
Bug: 299430621
Change-Id: Ia7e1a686744d5c9c3f8a21881f03228c8acecade
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
From pKVM point of view, unknown SMCs are simply forwarded, we can't
consider them invalid or not. This was probably a typo following a copy
of the host_hcall event.
Bug: 299430621
Change-Id: Ieb53f985a5187a8b5a9feb4a95982b15cdc1b04a
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
If we return the error, there's no way to recover the status as of now, since
fsck does not fix the xattr boundary issue.
Bug: 305658663
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 50a472bbc79ff9d5a88be8019a60e936cadf9f13
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I55060a4eede3f5f85066aba22a6ab7155517e5c4
(cherry picked from commit 70113b9d489050d3e7a6f28e0cd6e43f104fc132)
(cherry picked from commit 2c1f3789d609bd549f14c019b6c7b311bfd2fa64)
When this pKVM module ops has been introduced, the documentation has
been omitted.
Bug: 308373293
Change-Id: I9e471414e72a1ee04c132de4ed95d77e815ae8c9
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>